Legacy Annotations Reference (Deprecated)

🚧

Important: Annotation Deprecation Notice

The v1 annotation format is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new unified configuration format (v2) for future compatibility and access to the latest features. See Workload Autoscaler Configuration.

Configuration via Annotations v1

All settings are also available as annotations on the workload controller. When any workloads.cast.ai annotation is detected on a workload, that workload is considered managed by annotations. This allows for flexible configuration, combining annotations and scaling policies.
Changes to the settings via the API/UI are not permitted for workloads with annotations. When a workload does not have an annotation for a specific setting, the default or scaling policy value is used.
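To illustrate the fallback behavior, the truncated manifest below (the workload name is illustrative) pins only the CPU target via annotation; every other setting falls back to the default or the applicable scaling policy value:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: partially-annotated-app   # illustrative name
  annotations:
    # Only this setting is pinned via annotation. Because a
    # workloads.cast.ai annotation is present, the workload is
    # considered annotation-managed and API/UI edits are rejected.
    workloads.cast.ai/cpu-target: "p90"
    # No memory-target annotation is set, so the default (max)
    # or the scaling policy value applies instead.
```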

📘

Note

Workloads can be managed through a combination of annotations and scaling policies. For example, you can set the workloads.cast.ai/scaling-policy annotation on a workload and toggle vertical autoscaling on/off in the scaling policy itself. This provides more flexibility in managing workload configurations.
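As a sketch of that hybrid setup (the workload and policy names below are made up), a workload can bind itself to a named policy by annotation while the policy itself remains the place where vertical autoscaling is toggled:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api   # illustrative workload name
  annotations:
    # Bind this workload to a named scaling policy; the policy
    # (not an annotation) then decides whether vertical
    # autoscaling is on or off for this workload.
    workloads.cast.ai/scaling-policy: "my-batch-policy"   # hypothetical policy name
```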

The annotations generally follow a pattern of workloads.cast.ai/{resource}-{setting}. Currently, the available resources are cpu and memory. Available settings:

| Annotation | Possible Values | Default | Info | Required |
| --- | --- | --- | --- | --- |
| `workloads.cast.ai/vertical-autoscaling` | `on`, `off` | - | Automated vertical scaling. | Optional |
| `workloads.cast.ai/scaling-policy` | any valid k8s annotation value | `default` | Specifies the scaling policy name to use. When set, this annotation allows the workload to be managed by both annotations and the specified scaling policy. The scaling policy can control global settings such as enabling/disabling vertical autoscaling. | Optional |
| `workloads.cast.ai/apply-type` | `immediate`, `deferred` | `immediate` | Configures the autoscaler operating mode used to apply recommendations. Use `immediate` to apply recommendations as soon as the thresholds are passed. Note: `immediate` mode can cause pod restarts. Use `deferred` to apply recommendations only on natural pod restarts. | Optional |
| `workloads.cast.ai/vertical-downscale-apply-type` | `immediate`, `deferred` | - | Configures the autoscaler operating mode specifically for downscaling operations, allowing different behavior for upscaling and downscaling. When used in combination with `workloads.cast.ai/apply-type`, it provides fine-grained control over scaling operations. | Optional |
| `workloads.cast.ai/memory-event-apply-type` | `immediate`, `deferred` | - | Configures the autoscaler operating mode specifically for memory-related events, such as OOMKill or node memory-pressure eviction. | Optional |
| `workloads.cast.ai/{resource}-overhead` | float >= 0 | cpu: `0`, memory: `0.1` | Overhead expressed as a fraction, e.g., 10% is expressed as `0.1`. | Optional |
| `workloads.cast.ai/{resource}-target` | `max`, `p{x}` | cpu: `p80`, memory: `max` | The `x` in `p{x}` is the target percentile, an integer between 0 and 99. | Optional |
| `workloads.cast.ai/{resource}-apply-threshold` | float >= 0 | cpu: `0.1`, memory: `0.1` | The amount by which a recommendation must differ from the current requests before it is applied. For example, a 10% difference is expressed as `0.1`. | Optional |
| `workloads.cast.ai/{resource}-max` | `4Gi`, `60m`, etc. | - | The upper limit for the recommendation. Recommendations won't exceed this value. | Optional |
| `workloads.cast.ai/{resource}-min` | `4Gi`, `60m`, etc. | - | The lower limit for the recommendation. Min cannot be greater than max. | Optional |
| `workloads.cast.ai/{resource}-look-back-period-seconds` | 86400 <= int <= 604800 | `86400` (24h) | The duration of the look-back period applied to the metric query when generating a recommendation. | Optional |
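The memory-event mode from the table can be combined with the generic apply mode. The fragment below is an illustrative sketch (the workload name is made up) that defers routine recommendations but reacts immediately to memory events:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache-worker   # illustrative name
  annotations:
    # Routine recommendations wait for natural pod restarts...
    workloads.cast.ai/apply-type: "deferred"
    # ...but OOMKill / memory-pressure events are acted on at once.
    workloads.cast.ai/memory-event-apply-type: "immediate"
```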

workloads.cast.ai/vertical-downscale-apply-type

The workloads.cast.ai/vertical-downscale-apply-type annotation is fully compatible with the workloads.cast.ai/apply-type annotation and is meant to be used in combination with it. This allows for fine-grained control over both upscaling and downscaling. Here's how they interact:

  1. If both annotations are set to the same value (both immediate or both deferred), the behavior is the same as setting apply-type alone.
  2. If apply-type is set to immediate and vertical-downscale-apply-type is set to deferred:
    • Upscaling operations will be applied immediately.
    • Downscaling operations will be deferred to natural pod restarts.
  3. If apply-type is set to deferred and vertical-downscale-apply-type is set to immediate:
    • Upscaling operations will be deferred to natural pod restarts.
    • Downscaling operations will be applied immediately.

Example config:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/vertical-autoscaling: "on" # enable vertical automatic scaling
    workloads.cast.ai/scaling-policy: "my-custom" # select my-custom scaling policy
    workloads.cast.ai/apply-type: "immediate" # apply recommendations immediately for upscaling
    workloads.cast.ai/vertical-downscale-apply-type: "deferred" # defer downscaling to natural pod restarts

    workloads.cast.ai/cpu-overhead:                 "0"      # 0%
    workloads.cast.ai/cpu-apply-threshold:          "0.05"   # 5%
    workloads.cast.ai/cpu-target:                   "p80"    # 80th percentile
    workloads.cast.ai/cpu-max:                      "400m"   # max 0.4 cpu
    workloads.cast.ai/cpu-min:                      "120m"   # min 0.12 cpu
    workloads.cast.ai/cpu-look-back-period-seconds: "259200" # 3 days

    workloads.cast.ai/memory-overhead:                 "0.1"    # 10%
    workloads.cast.ai/memory-apply-threshold:          "0.05"   # 5%
    workloads.cast.ai/memory-target:                   "max"    # max usage
    workloads.cast.ai/memory-max:                      "2Gi"    # max 2Gi
    workloads.cast.ai/memory-min:                      "1Gi"    # min 1Gi
    workloads.cast.ai/memory-look-back-period-seconds: "172800" # 2 days
```

Configuration Errors

If the workload manifest contains an invalid configuration, for example workloads.cast.ai/vertical-autoscaling: "unknown-value", the configuration will not be updated: the old configuration values remain in effect until the erroneous configuration is fixed, and the error is shown in the workload details in the CAST AI Console. Scaling policy names are not restricted character-wise, so any value can be set, but a non-existent policy is treated as an invalid configuration.