Legacy Annotations Reference (Deprecated)
Important: Annotation Deprecation Notice
The v1 annotation format is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new unified configuration format (v2) for future compatibility and access to the latest features. See Workload Autoscaler Configuration.
Configuration via Annotations v1
All settings are also available by adding annotations on the workload controller. When any workloads.cast.ai annotation is detected on a workload, the workload is considered managed by annotations. This allows for flexible configuration, combining annotations and scaling policies.
Changes to the settings via the API/UI are no longer permitted for workloads with annotations. When a workload does not have an annotation for a specific setting, the default or scaling policy value is used.
Note
Workloads can be managed through a combination of annotations and scaling policies. For example, you can set the workloads.cast.ai/scaling-policy annotation on a workload and toggle vertical autoscaling on/off in the scaling policy itself. This provides more flexibility in managing workload configurations.
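As a minimal sketch of the combination described in this note, the fragment below sets only the scaling-policy annotation and leaves vertical autoscaling to the policy. The workload name my-app and the policy name my-custom are placeholders, and the rest of the Deployment spec is omitted for brevity.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app  # placeholder workload name
  annotations:
    # Placeholder policy name; vertical autoscaling is toggled on/off in the policy itself.
    workloads.cast.ai/scaling-policy: "my-custom"
```

Because no other annotation is set here, every other setting falls back to the default or scaling policy value.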
The annotations generally follow the pattern workloads.cast.ai/{resource}-{setting}. Currently, the available resources are cpu and memory. Available settings:
Annotation | Possible Values | Default | Info | Required* |
---|---|---|---|---|
workloads.cast.ai/vertical-autoscaling | on, off | - | Automated vertical scaling. | Optional |
workloads.cast.ai/scaling-policy | any valid k8s annotation value | default | Specifies the scaling policy name to use. When set, this annotation allows the workload to be managed by both annotations and the specified scaling policy. The scaling policy can control global settings like enabling/disabling vertical autoscaling. | Optional |
workloads.cast.ai/apply-type | immediate, deferred | immediate | Configures the operating mode the autoscaler uses to apply recommendations. Use immediate to apply recommendations as soon as the thresholds are passed. Note: immediate mode can cause pod restarts. Use deferred to apply recommendations only on natural pod restarts. | Optional |
workloads.cast.ai/vertical-downscale-apply-type | immediate, deferred | - | Configures the autoscaler operating mode specifically for downscaling operations, allowing for different behavior between upscaling and downscaling. When used in combination with workloads.cast.ai/apply-type , it provides fine-grained control over scaling operations. | Optional |
workloads.cast.ai/memory-event-apply-type | immediate, deferred | - | Configures the autoscaler operating mode specifically for memory-related events, such as OOMKill or Node Memory Pressure Eviction. | Optional |
workloads.cast.ai/{resource}-overhead | float >= 0 | cpu: 0, memory: 0.1 | Overhead expressed as a fraction, e.g., 10% would be expressed as 0.1. | Optional |
workloads.cast.ai/{resource}-target | max, p{x} | cpu: p80, memory: max | The x in p{x} is the target percentile, an integer between 0 and 99. | Optional |
workloads.cast.ai/{resource}-apply-threshold | float >= 0 | cpu: 0.1, memory: 0.1 | How much the recommendation must differ from the current requests before it is applied. For example, a 10% difference would be expressed as 0.1. | Optional |
workloads.cast.ai/{resource}-max | 4Gi, 60m, etc. | - | The upper limit for the recommendation. Recommendations won't exceed this value. | Optional |
workloads.cast.ai/{resource}-min | 4Gi, 60m, etc. | - | The lower limit for the recommendation. Min cannot be greater than max. | Optional |
workloads.cast.ai/{resource}-look-back-period-seconds | 86400 <= int <= 604800 | 86400 (24h) | The duration of the look-back period applied to the metric query when generating a recommendation. | Optional |
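To illustrate how the {resource}-{setting} pattern expands, here is a short sketch of a metadata fragment that tunes only memory, using settings from the table above. The values are illustrative assumptions, not recommendations.

```yaml
metadata:
  annotations:
    workloads.cast.ai/apply-type: "deferred"                # apply recommendations only on natural pod restarts
    workloads.cast.ai/memory-event-apply-type: "immediate"  # separate mode for memory events such as OOMKill
    workloads.cast.ai/memory-target: "p95"                  # 95th percentile instead of the default max
    workloads.cast.ai/memory-overhead: "0.2"                # 20% overhead
    workloads.cast.ai/memory-look-back-period-seconds: "604800"  # 7 days, the upper bound of the allowed range
```

Annotation values are always strings in Kubernetes, so numeric settings such as the overhead and look-back period are quoted.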
workloads.cast.ai/vertical-downscale-apply-type
The workloads.cast.ai/vertical-downscale-apply-type annotation is fully compatible with the workloads.cast.ai/apply-type annotation and is meant to be used in combination with it. This allows for fine-grained control over both upscaling and downscaling. Here's how they interact:
- If both annotations are set to the same value (both immediate or both deferred), the behavior remains unchanged.
- If apply-type is set to immediate and vertical-downscale-apply-type is set to deferred:
  - Upscaling operations will be applied immediately.
  - Downscaling operations will be deferred to natural pod restarts.
- If apply-type is set to deferred and vertical-downscale-apply-type is set to immediate:
  - Upscaling operations will be deferred to natural pod restarts.
  - Downscaling operations will be applied immediately.
Example config:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/vertical-autoscaling: "on" # enable vertical automatic scaling
    workloads.cast.ai/scaling-policy: "my-custom" # select my-custom scaling policy
    workloads.cast.ai/apply-type: "immediate" # apply recommendations immediately for upscaling
    workloads.cast.ai/vertical-downscale-apply-type: "deferred" # defer downscaling to natural pod restarts
    workloads.cast.ai/cpu-overhead: "0" # 0%
    workloads.cast.ai/cpu-apply-threshold: "0.05" # 5%
    workloads.cast.ai/cpu-target: "p80" # 80th percentile
    workloads.cast.ai/cpu-max: "400m" # max 0.4 cpu
    workloads.cast.ai/cpu-min: "120m" # min 0.12 cpu
    workloads.cast.ai/cpu-look-back-period-seconds: "259200" # 3 days
    workloads.cast.ai/memory-overhead: "0.1" # 10%
    workloads.cast.ai/memory-apply-threshold: "0.05" # 5%
    workloads.cast.ai/memory-target: "max" # max usage
    workloads.cast.ai/memory-max: "2Gi" # max 2Gi
    workloads.cast.ai/memory-min: "1Gi" # min 1Gi
    workloads.cast.ai/memory-look-back-period-seconds: "172800" # 2 days
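The example config covers the immediate-upscale, deferred-downscale case. For comparison, the reverse combination from the interaction list could be sketched as the following metadata fragment (only the two relevant annotations are shown):

```yaml
metadata:
  annotations:
    workloads.cast.ai/apply-type: "deferred"                      # upscaling waits for natural pod restarts
    workloads.cast.ai/vertical-downscale-apply-type: "immediate"  # downscaling is applied immediately
```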
Configuration Errors
If the workload manifest contains an invalid configuration, for example workloads.cast.ai/vertical-autoscaling: "unknown-value", the configuration will not be updated (the old configuration values are used until the erroneous configuration is fixed), and you should be able to see the error in the workload details in the CAST AI Console. Scaling policy names are not restricted character-wise, so any value can be set, but a non-existent policy will be treated as an invalid configuration.
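As a hedged illustration, the fragment below collects a few annotation values that would be expected to count as invalid based on the constraints in the settings table. The policy name no-such-policy is hypothetical, and the assumption that each case is reported the same way is not confirmed here.

```yaml
metadata:
  annotations:
    workloads.cast.ai/apply-type: "sometimes"            # not one of the listed values: immediate, deferred
    workloads.cast.ai/cpu-min: "500m"
    workloads.cast.ai/cpu-max: "400m"                    # min cannot be greater than max
    workloads.cast.ai/scaling-policy: "no-such-policy"   # hypothetical name of a policy that does not exist
```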