# Legacy annotations reference (deprecated)
**Important: Annotation Deprecation Notice**

The v1 annotation format is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new unified configuration format (v2) for future compatibility and access to the latest features. See Workload Autoscaler Configuration.
## Configuration via annotations v1
All settings are also available by adding annotations on the workload controller. When any `workloads.cast.ai` annotation is detected on a workload, the workload is considered managed by annotations. This allows for flexible configuration, combining annotations and scaling policies.
Changes to the settings via the API/UI are no longer permitted for workloads with annotations. When a workload does not have an annotation for a specific setting, the default or scaling policy value is used.
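The precedence described above (annotation value first, then the scaling policy value, then the built-in default) can be sketched conceptually in Python. This is an illustrative helper, not part of any CAST AI API; the function and dictionary names are hypothetical:

```python
# Conceptual sketch of setting resolution precedence (illustrative only):
# an explicit annotation wins, then the scaling policy value, then the default.
def resolve_setting(annotations: dict, policy: dict, defaults: dict, key: str):
    prefixed = f"workloads.cast.ai/{key}"
    if prefixed in annotations:   # explicit annotation on the workload
        return annotations[prefixed]
    if key in policy:             # value from the assigned scaling policy
        return policy[key]
    return defaults.get(key)      # built-in default

# Example: the workload overrides only the CPU target via an annotation.
annotations = {"workloads.cast.ai/cpu-target": "p90"}
policy = {"memory-target": "max"}
defaults = {"cpu-target": "p80", "memory-target": "max", "cpu-overhead": "0"}

print(resolve_setting(annotations, policy, defaults, "cpu-target"))     # p90 (annotation)
print(resolve_setting(annotations, policy, defaults, "memory-target"))  # max (policy)
print(resolve_setting(annotations, policy, defaults, "cpu-overhead"))   # 0 (default)
```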
**Note**

Workloads can be managed through a combination of annotations and scaling policies. For example, you can set the `workloads.cast.ai/scaling-policy` annotation on a workload and toggle vertical autoscaling on/off in the scaling policy itself. This provides more flexibility in managing workload configurations.
The annotations generally follow the pattern `workloads.cast.ai/{resource}-{setting}`. Currently, the available resources are `cpu` and `memory`. Available settings:
| Annotation | Possible Values | Default | Info | Required |
|---|---|---|---|---|
| `workloads.cast.ai/vertical-autoscaling` | `on`, `off` | | Automated vertical scaling. | Optional |
| `workloads.cast.ai/scaling-policy` | any valid k8s annotation value | `default` | Specifies the scaling policy name to use. When set, this annotation allows the workload to be managed by both annotations and the specified scaling policy. The scaling policy can control global settings like enabling/disabling vertical autoscaling. | Optional |
| `workloads.cast.ai/apply-type` | `immediate`, `deferred` | `immediate` | Allows configuring the autoscaler operating mode to apply the recommendations. | Optional |
| `workloads.cast.ai/vertical-downscale-apply-type` | `immediate`, `deferred` | | Configures the autoscaler operating mode specifically for downscaling operations, allowing for different behavior between upscaling and downscaling. Meant to be used in combination with `workloads.cast.ai/apply-type` (see below). | Optional |
| `workloads.cast.ai/memory-event-apply-type` | `immediate`, `deferred` | | Configures the autoscaler operating mode specifically for memory-related events, such as OOMKill or Node Memory Pressure Eviction. | Optional |
| `workloads.cast.ai/{resource}-overhead` | float >= 0 | cpu: `0`, memory: `0.1` | Overhead expressed as a fraction, e.g., 10% would be expressed as `0.1`. | Optional |
| `workloads.cast.ai/{resource}-target` | `max`, `p{x}` | cpu: `p80`, memory: `max` | The `x` in `p{x}` is the target percentile, an integer between 0 and 99. | Optional |
| `workloads.cast.ai/{resource}-apply-threshold` | float >= 0 | cpu: `0.1` | The amount by which a recommendation must differ from the current requests before it is applied. For example, a 10% difference would be expressed as `0.1`. | Optional |
| `workloads.cast.ai/{resource}-max` | `4Gi`, `60m`, etc. | | The upper limit for the recommendation. Recommendations won't exceed this value. | Optional |
| `workloads.cast.ai/{resource}-min` | `4Gi`, `60m`, etc. | | The lower limit for the recommendation. Min cannot be greater than max. | Optional |
| `workloads.cast.ai/{resource}-look-back-period-seconds` | int, 10800 <= value <= 604800 | `86400` (24h) | The duration of the look-back period applied to the metric query when generating a recommendation. | Optional |
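The value formats in the table above can be checked client-side before applying a manifest, e.g., in a CI step. The following is a minimal, hypothetical validation sketch, not a CAST AI tool:

```python
import re

# Hypothetical pre-apply check for the v1 annotation value formats listed above.
VALIDATORS = {
    "vertical-autoscaling": lambda v: v in ("on", "off"),
    "scaling-policy": lambda v: len(v) > 0,
    "apply-type": lambda v: v in ("immediate", "deferred"),
    "vertical-downscale-apply-type": lambda v: v in ("immediate", "deferred"),
    "memory-event-apply-type": lambda v: v in ("immediate", "deferred"),
}
for res in ("cpu", "memory"):
    VALIDATORS[f"{res}-overhead"] = lambda v: float(v) >= 0
    VALIDATORS[f"{res}-apply-threshold"] = lambda v: float(v) >= 0
    # "max" or p{x}, where x is an integer between 0 and 99
    VALIDATORS[f"{res}-target"] = lambda v: v == "max" or bool(re.fullmatch(r"p[0-9]{1,2}", v))
    VALIDATORS[f"{res}-look-back-period-seconds"] = lambda v: 10800 <= int(v) <= 604800

def check(annotations: dict) -> list:
    """Return (key, value) pairs whose values fail the format rules."""
    errors = []
    for key, value in annotations.items():
        setting = key.removeprefix("workloads.cast.ai/")
        validator = VALIDATORS.get(setting)
        if validator is not None:
            try:
                ok = validator(value)
            except ValueError:  # e.g., non-numeric value for a numeric setting
                ok = False
            if not ok:
                errors.append((key, value))
    return errors

print(check({"workloads.cast.ai/cpu-target": "p80"}))              # no errors
print(check({"workloads.cast.ai/vertical-autoscaling": "maybe"}))  # flagged as invalid
```

Note that this catches only format errors; referencing a non-existent scaling policy can only be detected server-side.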
### workloads.cast.ai/vertical-downscale-apply-type

The `workloads.cast.ai/vertical-downscale-apply-type` annotation is fully compatible with the `workloads.cast.ai/apply-type` annotation and is meant to be used in combination with it. This allows for fine-grained control over both upscaling and downscaling. Here's how they interact:
- If both annotations are set to the same value (both `immediate` or both `deferred`), the behavior remains unchanged.
- If `apply-type` is set to `immediate` and `vertical-downscale-apply-type` is set to `deferred`:
  - Upscaling operations will be applied immediately.
  - Downscaling operations will be deferred to natural pod restarts.
- If `apply-type` is set to `deferred` and `vertical-downscale-apply-type` is set to `immediate`:
  - Upscaling operations will be deferred to natural pod restarts.
  - Downscaling operations will be applied immediately.
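The interaction rules above reduce to a simple lookup: the downscale-specific annotation, when present, overrides the general mode for downscaling only. A conceptual sketch (illustrative, not CAST AI code):

```python
from typing import Optional

# Illustrative sketch of the effective operating mode per scaling direction,
# following the interaction rules described above.
def effective_apply_type(direction: str, apply_type: str = "immediate",
                         downscale_apply_type: Optional[str] = None) -> str:
    """direction is 'up' or 'down'. When the downscale-specific annotation
    is unset, the general apply-type governs both directions."""
    if direction == "down" and downscale_apply_type is not None:
        return downscale_apply_type
    return apply_type

# apply-type=immediate, vertical-downscale-apply-type=deferred:
print(effective_apply_type("up", "immediate", "deferred"))    # immediate
print(effective_apply_type("down", "immediate", "deferred"))  # deferred
```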
Example config:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/vertical-autoscaling: "on"                # enable vertical automatic scaling
    workloads.cast.ai/scaling-policy: "my-custom"               # select my-custom scaling policy
    workloads.cast.ai/apply-type: "immediate"                   # apply recommendations immediately for upscaling
    workloads.cast.ai/vertical-downscale-apply-type: "deferred" # defer downscaling to natural pod restarts
    workloads.cast.ai/cpu-overhead: "0"                         # 0%
    workloads.cast.ai/cpu-apply-threshold: "0.05"               # 5%
    workloads.cast.ai/cpu-target: "p80"                         # 80th percentile
    workloads.cast.ai/cpu-max: "400m"                           # max 0.4 CPU
    workloads.cast.ai/cpu-min: "120m"                           # min 0.12 CPU
    workloads.cast.ai/cpu-look-back-period-seconds: "259200"    # 3 days
    workloads.cast.ai/memory-overhead: "0.1"                    # 10%
    workloads.cast.ai/memory-apply-threshold: "0.05"            # 5%
    workloads.cast.ai/memory-target: "max"                      # max usage
    workloads.cast.ai/memory-max: "2Gi"                         # max 2Gi
    workloads.cast.ai/memory-min: "1Gi"                         # min 1Gi
    workloads.cast.ai/memory-look-back-period-seconds: "172800" # 2 days
```

## Configuration errors
If the workload manifest contains an invalid configuration, for example `workloads.cast.ai/vertical-autoscaling: "unknown-value"`, the configuration will not be updated (the old configuration values will be used until the erroneous configuration is fixed), and the error will be shown in the workload details in the CAST AI Console. Scaling policy names are not restricted character-wise, so any value can be set; however, referencing a non-existent policy is treated as an invalid configuration.
