Annotations reference

All Workload Autoscaler settings can be configured by adding annotations to the workload controller. When the workloads.cast.ai/configuration annotation is detected on a workload, the workload is treated as configured by annotations. Annotations can be combined with scaling policies for flexible configuration.

Changes to the settings via the API/UI are no longer permitted for workloads with annotations. The default or scaling policy value is used when a workload does not have an annotation for a specific setting.

Annotation values take precedence over the values defined in a scaling policy. If the annotation references a scaling policy through scalingPolicyName, every setting defined in the annotation overrides the corresponding policy value. Settings that are not defined in the annotation use the system default or the value defined in the scaling policy.
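
As an illustration of this precedence, the sketch below assumes a scaling policy named custom that already defines CPU settings (the policy name and the values are hypothetical). Only vertical.cpu.target is set in the annotation, so only that value overrides the policy; the remaining CPU settings keep coming from the policy or the system defaults.

```yaml
metadata:
  annotations:
    workloads.cast.ai/configuration: |
      scalingPolicyName: custom   # hypothetical policy name
      vertical:
        optimization: on          # required whenever the vertical block is used
        cpu:
          target: p95             # overrides the CPU target defined in the policy
          # overhead, lookBackPeriod, etc. are not set here,
          # so the policy or system default values apply
```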

Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/configuration: |
      scalingPolicyName: custom
      vertical:
        optimization: on
        applyType: immediate
        anomalyDetection:
          cpuPressure:
            cpuStallThresholdPercentage: 30.0
            minPressuredPodPercentage: 20.0
        excludedContainers:
          - istio-proxy
        antiAffinity:
          considerAntiAffinity: false
        startup:
          period: 5m
        confidence:
          threshold: 0.5
        cpu:
          target: p81
          lookBackPeriod: 25h
          min: 1000m
          max: 2500m
          applyThresholdStrategy:
            type: defaultAdaptive
          overhead: 0.15
          limit:
            type: multiplier
            multiplier: 2.0
        memory:
          target: max
          lookBackPeriod: 30h
          min: 2Gi
          max: 10Gi
          applyThresholdStrategy:
            type: defaultAdaptive
          overhead: 0.35
          limit:
            type: noLimit
        downscaling:
          applyType: immediate
        memoryEvent:
          applyType: immediate
        containers:
          {container_name}:
            cpu:
              min: 10m
              max: 1000m
            memory:
              min: 10Mi
              max: 2048Mi
      rolloutBehavior:
        type: NoDisruption
        preferOneByOne: true
        delaySeconds: 120
      horizontal:
        optimization: on
        minReplicas: 5
        maxReplicas: 10
        scaleDown:
          stabilizationWindow: 5m
      containersGrouping:
        - key: name
          operator: contains
          values: ["data-processor"]
          into: data-processor

Configuration Structure

Below is a configuration structure reference for setting up a workload to be controlled by annotations.

📘 Note: workloads.cast.ai/configuration has to be a valid YAML string. If the annotation contains an invalid YAML string, the entire configuration is ignored.

scalingPolicyName

If not set, the system assigns the workload to system default policies based on workload type: resiliency for StatefulSets and balanced for all other workloads.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| scalingPolicyName | string | No | System default (see below) | Specifies the scaling policy name to use. When set, this annotation allows the workload to be managed by both annotations and the specified scaling policy. |

scalingPolicyName: custom-policy

Fallback for invalid policy names

If the scalingPolicyName value does not match any existing scaling policy, the workload falls back to system default policies:

  • StatefulSets: resiliency
  • All other workloads: balanced

When this fallback occurs, the Cast AI Console displays a warning status on the workload in the Optimization page, such as: Could not find scaling policy by name '<policy-name>'. To resolve this, update the annotation to reference a valid scaling policy name.

vertical

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| vertical | object | No | - | Vertical scaling configuration. |

vertical:
  optimization: on
  applyType: immediate
  antiAffinity:
    considerAntiAffinity: false
  startup:
    period: 5m
  confidence:
    threshold: 0.5
  excludedContainers:
    - istio-proxy
vertical.optimization
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| optimization | string | Yes* | - | Enable vertical scaling ("on"/"off"). |

*If using the vertical configuration option, this field becomes required.

vertical:
  optimization: on
vertical.applyType
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| applyType | string | No | "immediate" | Operating mode the autoscaler uses to apply recommendations. |

  • immediate - Apply recommendations as soon as the thresholds are passed. Note: immediate mode can cause pod restarts.
  • deferred - Apply recommendations only on natural pod restarts.

vertical:
  applyType: immediate
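
To avoid the restarts that immediate mode can cause, the deferred mode described above can be selected instead. A minimal sketch:

```yaml
vertical:
  optimization: on       # required whenever the vertical block is used
  applyType: deferred    # apply recommendations only on natural pod restarts
```
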
vertical.antiAffinity
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| antiAffinity | object | No | - | Configuration for handling pod anti-affinity scheduling constraints. |

vertical:
  antiAffinity:
    considerAntiAffinity: false
vertical.antiAffinity.considerAntiAffinity
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| considerAntiAffinity | boolean | Yes* | false | When true, Workload Autoscaler respects pod anti-affinity rules on hostname or host port and, as a result, issues recommendations for these pods in a deferred manner. When false (default), recommendations for pods containing one of the constraints above are applied immediately instead. |

*If using the vertical.antiAffinity configuration option, this field becomes required.

vertical:
  antiAffinity:
    considerAntiAffinity: false
vertical.startup
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startup | object | No | - | Configuration for handling workload startup behavior. |

See Startup metrics.

vertical:
  startup:
    period: 5m
vertical.startup.period
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| period | duration | Yes* | "2m" | Duration to ignore resource usage metrics after workload startup. Useful for applications with high initial resource usage spikes. Valid values range from 2m to 60m. Set to 0s to disable the startup metrics ignore period completely. |

*If using the vertical.startup configuration option, this field becomes required.

vertical:
  startup:
    period: 5m  # Example: ignore first 5 minutes of metrics
vertical:
  startup:
    period: 0s  # Disable startup metrics ignore period
vertical.confidence
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| confidence | object | No | - | Configuration for recommendation confidence thresholds. |

vertical:
  confidence:
    threshold: 0.5
    required:  false
vertical.confidence.required
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| required | boolean | Yes* | - | Whether confidence metrics are required before optimization. See the values below. |

  • true: The workload requires confidence metrics before optimization, even when directly enabled via annotations. This prevents immediate optimization of workloads with highly variable load patterns until sufficient historical data is available.
  • false: When enabled via annotations, the workload is optimized immediately, bypassing the confidence check. Note: This option is not recommended.

*If using the vertical.confidence configuration option, at least one confidence field is required.

vertical:
  confidence:
    required: true
vertical.confidence.threshold
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| threshold | float | Yes* | 0.9 | Minimum confidence score required to apply recommendations (0.0-1.0). Higher values require more data points for recommendations. |

*If using the vertical.confidence configuration option, at least one confidence field is required.

vertical:
  confidence:
    threshold: 0.5
vertical.anomalyDetection
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| anomalyDetection | object | No | - | Configuration for anomaly detection features, like CPU pressure-based stall detection. |

vertical:
  anomalyDetection:
    cpuPressure:
      cpuStallThresholdPercentage: 10.0
      minPressuredPodPercentage: 50.0
vertical.anomalyDetection.cpuPressure
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| cpuPressure | object | No | - | Configuration for CPU pressure-based stall detection using PSI metrics. |

vertical:
  anomalyDetection:
    cpuPressure:
      cpuStallThresholdPercentage: 10.0
      minPressuredPodPercentage: 50.0
vertical.anomalyDetection.cpuPressure.cpuStallThresholdPercentage
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| cpuStallThresholdPercentage | float | No | Varies by policy | The CPU stall percentage that triggers resource increases. When workloads can't get CPU time and stall exceeds this threshold, they are scaled up to reduce contention. Measured over a 5-minute window. Value range: 1.0–100.0. |

See Stall detection.

vertical:
  anomalyDetection:
    cpuPressure:
      cpuStallThresholdPercentage: 30.0  # 30% stall threshold
vertical.anomalyDetection.cpuPressure.minPressuredPodPercentage
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| minPressuredPodPercentage | float | No | Varies by policy | The percentage of pods that must exceed the stall threshold before scaling triggers. This prevents scaling the entire workload when only a single pod or small subset is experiencing pressure. Value range: 1.0–100.0. |

See Stall detection.

vertical:
  anomalyDetection:
    cpuPressure:
      minPressuredPodPercentage: 20.0  # Scale when 20% of pods are stalling
vertical.cpu
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| cpu | object | No | - | CPU-specific scaling configuration. |

vertical:
  cpu:
    target: p80
    lookBackPeriod: 24h
    min: 100m
    max: 1000m
    applyThresholdStrategy:
      type: defaultAdaptive
    overhead: 0.1
vertical.cpu.target
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target | string | No | "p80" | Resource usage target. See the accepted values below. |

  • max - Use maximum observed usage.
  • p{0-99.9} - Use percentile (e.g., p80 for 80th percentile).

vertical:
  cpu:
    target: p80
vertical.cpu.lookBackPeriod
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| lookBackPeriod | duration | No | "24h" | Historical resource usage data window to consider for recommendations (3h-168h). |

See Look-back Period.

vertical:
  cpu:
    lookBackPeriod: 24h
vertical.cpu.constraints

Defines minimum and maximum CPU bounds for the autoscaler. Two formats are supported:

  • constraints object (required when using percentageOfOriginal): specifies a typed constraint with type and value fields.
  • Legacy flat format (min/max as strings): still supported for constant values and not planned for deprecation.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| constraints | object | No | - | Container for min and max CPU constraint objects. Required when using percentageOfOriginal. |

vertical.cpu.constraints.min
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | constant for an absolute value; percentageOfOriginal for a percentage of the workload's original request. |
| value | string or number | Yes | For constant: Kubernetes CPU notation (e.g., "100m", "2"). For percentageOfOriginal: a number (e.g., 90 = 90%). |

vertical.cpu.constraints.max

Same fields as vertical.cpu.constraints.min.

vertical:
  cpu:
    constraints:
      min:
        type: percentageOfOriginal
        value: 90
      max:
        type: constant
        value: 2000m
📘 Note: percentageOfOriginal requires the workload to have defined resource requests. It is not supported for custom workloads or injected containers. If requests are not defined, the constraint is treated as having no limit set.
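
For comparison, constant bounds can also be written in the legacy flat format mentioned above. The values in this sketch are illustrative only:

```yaml
vertical:
  cpu:
    min: 100m     # legacy flat format, constant value
    max: 2000m    # legacy flat format, constant value
```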

vertical.cpu.applyThreshold
🚧 Deprecation Notice: The applyThreshold configuration option is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new applyThresholdStrategy configuration format for future compatibility and access to the latest features. See applyThresholdStrategy.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| applyThreshold | float | No | 0.1 | The relative difference required between current and recommended resource values to apply a change immediately. Value range: 0.01-2.5. |

  • For upscaling, the difference is calculated relative to current resource requests.
  • For downscaling, it is calculated relative to the new recommended value.

For example, with a threshold of 0.1 (10%), an upscale from 100m to 120m CPU would be applied immediately (20% increase relative to the current 100m), while an upscale from 110m to 120m would not be applied immediately (about 9% increase relative to the current 110m).

vertical:
  cpu:
    applyThreshold: 0.1
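
The relative-difference rule above can be worked through with a few numbers. The request values in the comments are illustrative assumptions, chosen to sit clearly above or below the threshold:

```yaml
vertical:
  cpu:
    applyThreshold: 0.1   # 10%
# Upscale 100m -> 120m:   (120 - 100) / 100 = 0.20 relative to the current 100m, above 0.1, applied immediately
# Downscale 120m -> 100m: (120 - 100) / 100 = 0.20 relative to the new 100m, above 0.1, applied immediately
# Downscale 105m -> 100m: (105 - 100) / 100 = 0.05 relative to the new 100m, below 0.1, not applied immediately
```
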
vertical.cpu.applyThresholdStrategy
🚧 Warning: applyThreshold and applyThresholdStrategy cannot be used simultaneously in a configuration, as that results in an error. applyThresholdStrategy is the latest and recommended configuration option.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| applyThresholdStrategy | object | No | - | Configuration for the strategy used to determine when recommendations should be applied. The strategy determines how the threshold percentage is calculated based on current resource requests. |

vertical:
  cpu:
    applyThresholdStrategy:
      type: defaultAdaptive
vertical.cpu.applyThresholdStrategy.type
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes* | "defaultAdaptive" | The type of threshold strategy to use. See the accepted values below. |

  • defaultAdaptive - Automatically adjusts thresholds based on workload size.
  • percentage - Uses a fixed percentage threshold.
  • customAdaptive - Allows custom configuration of the adaptive threshold formula. Recommended for power users only. It works in the same way as the default adaptive threshold, but it allows tweaking the parameters of the adaptive threshold formula.

The defaultAdaptive threshold option uses the following values:

  • numerator = 0.5
  • denominator = 1
  • exponent = 1 (same effect as not used, i.e., no influence on the calculation)

*Required when using applyThresholdStrategy.

vertical:
  cpu:
    applyThresholdStrategy:
      type: defaultAdaptive  # Using default adaptive threshold
vertical.cpu.applyThresholdStrategy.percentage
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| percentage | float | Yes* | - | The fixed percentage threshold to use. Value range: 0.01-2.5. |

*Required when type is percentage.

vertical:
  cpu:
    applyThresholdStrategy:
      type: percentage      # Using fixed percentage threshold
      percentage: 0.3      # 30% threshold
vertical.cpu.applyThresholdStrategy.numerator
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| numerator | float | Yes* | 0.5 | Affects the vertical stretch of the threshold function. Lower values create smaller thresholds. |

*Required when type is customAdaptive.

vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive    # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.cpu.applyThresholdStrategy.denominator
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| denominator | float | Yes* | 1 | Affects threshold sensitivity for small workloads. Values close to 0 result in larger thresholds for small workloads. For example, when numerator is 1, exponent is 1, and denominator is 0, the threshold for a 0.5 CPU request is 200%. |

*Required when type is customAdaptive.

vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
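
The exact adaptive threshold formula is not spelled out in this section. One form that is consistent with the denominator example above (numerator 1, exponent 1, denominator 0, and a 0.5 CPU request giving 200%) is shown below; treat it as an assumption rather than the definitive formula. Here r denotes the current resource request (a symbol introduced for this sketch), with CPU assumed to be expressed in cores.

```latex
\text{threshold}(r) = \frac{\text{numerator}}{(r + \text{denominator})^{\text{exponent}}}
\qquad\Rightarrow\qquad
\frac{1}{(0.5 + 0)^{1}} = 2 = 200\%
```

Under the same assumption, the defaultAdaptive values (numerator 0.5, denominator 1, exponent 1) would give 0.5 / (1 + 1) = 25% for a 1 CPU request.
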
vertical.cpu.applyThresholdStrategy.exponent
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| exponent | float | Yes* | 1 | Controls how quickly the threshold decreases for larger workloads. Lower values prevent extremely small thresholds for large resources. |

*Required when type is customAdaptive.

vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.cpu.overhead
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| overhead | float | No | 0.1 | Additional resource buffer when applying recommendations (0.0-2.5, e.g., 0.1 = 10%). If a 10% buffer is configured, the issued recommendation has 10% added to it so that the workload can handle further increased resource demand. |

vertical:
  cpu:
    overhead: 0.1
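
As a worked illustration, assume the autoscaler computes a raw CPU recommendation of 200m (a made-up figure) and that the buffer is added on top of that value:

```yaml
vertical:
  cpu:
    overhead: 0.1   # 10% buffer
# raw recommendation: 200m
# with overhead:      200m * (1 + 0.1) = 220m
```
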
vertical.cpu.limit
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| limit | object | No | - | Configuration for container CPU limit scaling. |

Default behavior when not specified:
  • If spec.containers[].resources.limits.cpu is not defined on the workload, no limit is set by Workload Autoscaler.
  • If spec.containers[].resources.limits.cpu is defined on the workload, it is removed by Workload Autoscaler.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
vertical.cpu.limit.type
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes* | - | Type of CPU limit scaling to apply. See the accepted values below. |

  • noLimit - Remove the resource limit from the workload definition entirely.
  • multiplier - Set limit as a multiple of requests using the following formula: resources.requests.cpu * {multiplier}. The behavior can be further controlled with optional flags onlyIfOriginalExist and onlyIfOriginalLower.
  • keepLimits - Preserve the existing limits as defined in the workload manifest without modification.
  • maintainRatio - Scale limits proportionally with recommended requests, preserving the original limit-to-request ratio from the workload manifest. If the original limit is 2× the request, the updated limit stays 2× the new recommended request. If no CPU limit exists on the workload, falls back to the default behavior (limits are removed).

*If using the vertical.cpu.limit configuration option, this field becomes required.

vertical:
  cpu:
    limit:
      type: keepLimits
vertical.cpu.limit.multiplier
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| multiplier | float | Yes* | - | Value to multiply the requests by to set the limit. The calculation is: resources.requests.cpu * {multiplier}. The value must be greater than or equal to 1. |

*Required when type is set to multiplier.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
vertical.cpu.limit.onlyIfOriginalExist
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| onlyIfOriginalExist | boolean | No | false | When set to true, CPU limits are only set if the workload originally had CPU limits defined in its manifest. If the original workload has no CPU limits specified, no limits are added. |

This flag allows conditional limit management based on the original workload configuration.

Only applicable when the type is set to multiplier.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
      onlyIfOriginalExist: true
vertical.cpu.limit.onlyIfOriginalLower
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| onlyIfOriginalLower | boolean | No | false | When set to true, CPU limits are only updated if the original limits are lower than the calculated value (requests × multiplier). If the original limits are already higher than the calculated value, they remain unchanged. |

This flag prevents reducing existing limits and ensures limits only increase when beneficial.

Only applicable when the type is set to multiplier.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
      onlyIfOriginalLower: true

Combining both flags:

When both onlyIfOriginalExist and onlyIfOriginalLower are set to true, the behavior matches Memory's Automatic mode: limits are only set when the workload originally had limits defined AND only when those original limits are lower than the calculated value.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 1.5
      onlyIfOriginalExist: true
      onlyIfOriginalLower: true
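
The combined behavior can be traced through three cases. The recommended request of 500m and the original limits in the comments are illustrative assumptions:

```yaml
vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 1.5
      onlyIfOriginalExist: true
      onlyIfOriginalLower: true
# recommended request 500m -> calculated limit 500m * 1.5 = 750m
# original limit 600m:  lower than 750m, so the limit is set to 750m
# original limit 1000m: higher than 750m, so the limit stays unchanged
# no original limit:    no limit is added
```
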
vertical.cpu.optimization
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| optimization | string | No | - | Disables CPU management for workloads that benefit from memory management only. The workload then uses the CPU requests/limits configured in the template. The only allowed value is off. If not set, the resource inherits the workload management option as normal. The minimum required workload-autoscaler component version for this feature is v0.23.1. |

vertical:
  cpu:
    optimization: off
vertical.memory
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| memory | object | No | - | Memory-specific scaling configuration. |

vertical:
  memory:
    target: max
    lookBackPeriod: 24h
    min: 128Mi
    max: 2Gi
    applyThresholdStrategy:
      type: defaultAdaptive
    overhead: 0.1
vertical.memory.target
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target | string | No | "max" | Resource usage target. See the accepted values below. |

  • max - Use maximum observed usage.
  • p{0-99.9} - Use percentile (e.g., p80 for 80th percentile).

vertical:
  memory:
    target: max
vertical.memory.lookBackPeriod
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| lookBackPeriod | duration | No | "24h" | Historical resource usage data window to consider for recommendations (3h-168h). |

See Look-back Period.

vertical:
  memory:
    lookBackPeriod: 24h
vertical.memory.constraints

Defines minimum and maximum memory bounds for the autoscaler. Two formats are supported:

  • constraints object (required when using percentageOfOriginal): specifies a typed constraint with type and value fields.
  • Legacy flat format (min/max as strings): still supported for constant values and not planned for deprecation.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| constraints | object | No | - | Container for min and max memory constraint objects. Required when using percentageOfOriginal. |

vertical.memory.constraints.min
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | constant for an absolute value; percentageOfOriginal for a percentage of the workload's original request. |
| value | string or number | Yes | For constant: Kubernetes memory notation (e.g., "128Mi", "2Gi"). For percentageOfOriginal: a number (e.g., 90 = 90%). To control memory limits, see vertical.memory.limit. |

vertical.memory.constraints.max

Same fields as vertical.memory.constraints.min. Note: This does not cap the memory limit directly. To control memory limits, see vertical.memory.limit.

vertical:
  memory:
    constraints:
      min:
        type: constant
        value: 128Mi
      max:
        type: percentageOfOriginal
        value: 150
📘 Note: percentageOfOriginal requires the workload to have defined resource requests. It is not supported for custom workloads or injected containers. If requests are not defined, the constraint is treated as having no limit set.

vertical.memory.applyThreshold
🚧 Deprecation Notice: The applyThreshold configuration option is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new applyThresholdStrategy configuration format for future compatibility and access to the latest features. See applyThresholdStrategy.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| applyThreshold | float | No | 0.1 | The relative difference required between current and recommended resource values to apply a change immediately. Value range: 0.01-2.5. |

  • For upscaling, the difference is calculated relative to current resource requests.
  • For downscaling, it is calculated relative to the new recommended value.

For example, with a threshold of 0.1 (10%), an upscale from 100MiB to 120MiB of memory would be applied immediately (20% increase relative to the current 100MiB), while an upscale from 110MiB to 120MiB would not be applied immediately (about 9% increase relative to the current 110MiB).

vertical:
  memory:
    applyThreshold: 0.1
vertical.memory.applyThresholdStrategy
🚧 Warning: applyThreshold and applyThresholdStrategy cannot be used simultaneously in a configuration, as that results in an error. applyThresholdStrategy is the latest and recommended configuration option.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| applyThresholdStrategy | object | No | - | Configuration for the strategy used to determine when recommendations should be applied. The strategy determines how the threshold percentage is calculated based on current resource requests. |

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive    # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.applyThresholdStrategy.type
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes* | "defaultAdaptive" | The type of threshold strategy to use. See the accepted values below. |

  • defaultAdaptive - Automatically adjusts thresholds based on workload size.
  • percentage - Uses a fixed percentage threshold.
  • customAdaptive - Allows custom configuration of the adaptive threshold formula. Recommended for power users only. It works in the same way as the default adaptive threshold, but it allows tweaking the parameters of the adaptive threshold formula.

The defaultAdaptive threshold option uses the following values:

  • numerator = 0.5
  • denominator = 1
  • exponent = 1 (same effect as not used, i.e., no influence on the calculation)

*Required when using applyThresholdStrategy.

vertical:
  memory:
    applyThresholdStrategy:
      type: defaultAdaptive  # Using default adaptive threshold
vertical.memory.applyThresholdStrategy.percentage
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| percentage | float | Yes* | - | The fixed percentage threshold to use. Value range: 0.01-2.5. |

*Required when type is percentage.

vertical:
  memory:
    applyThresholdStrategy:
      type: percentage      # Using fixed percentage threshold
      percentage: 0.3      # 30% threshold
vertical.memory.applyThresholdStrategy.numerator
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| numerator | float | Yes* | 0.5 | Affects the vertical stretch of the threshold function. Lower values create smaller thresholds. |

*Required when type is customAdaptive.

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive    # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.applyThresholdStrategy.denominator
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| denominator | float | Yes* | 1 | Affects threshold sensitivity for small workloads. Values close to 0 result in larger thresholds for small workloads. For example, when numerator is 1, exponent is 1, and denominator is 0, the threshold for a memory request of 0.5 is 200%. |

*Required when type is customAdaptive.

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.applyThresholdStrategy.exponent
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| exponent | float | Yes* | 1 | Controls how quickly the threshold decreases for larger workloads. Lower values prevent extremely small thresholds for large resources. |

*Required when type is customAdaptive.

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.overhead
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| overhead | float | No | 0.1 | Additional resource buffer when applying recommendations (0.0-2.5, e.g., 0.1 = 10%). If a 10% buffer is configured, the issued recommendation has 10% added to it so that the workload can handle further increased resource demand. |

vertical:
  memory:
    overhead: 0.1
vertical.memory.limit
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| limit | object | No | - | Configuration for container memory limit scaling. |

Default behavior when not specified:
  • If spec.containers[].resources.limits.memory is not defined on the workload, no limit is set by Workload Autoscaler.
  • If spec.containers[].resources.limits.memory is defined on the workload, Workload Autoscaler calculates the limit using the following formula: max(max(resources.requests.memory * 1.5, 128MiB), current.resources.limit). It only increases the limit, never lowers it, and enforces a minimum of 128MiB.

vertical:
  memory:
    limit:
      type: multiplier  
      multiplier: 1.5
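
When vertical.memory.limit is not specified and the workload already defines a memory limit, the default formula above can be worked through as follows. The request and limit values are illustrative assumptions.

```latex
\text{limit} = \max\bigl(\max(1.5 \times \text{request},\ 128\,\text{MiB}),\ \text{current limit}\bigr)

% request 64 MiB, current limit 100 MiB: the 128 MiB floor wins
\max(\max(96, 128), 100) = 128\ \text{MiB}

% request 400 MiB, current limit 512 MiB: the limit is raised
\max(\max(600, 128), 512) = 600\ \text{MiB}

% request 200 MiB, current limit 512 MiB: the limit is never lowered
\max(\max(300, 128), 512) = 512\ \text{MiB}
```
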
vertical.memory.limit.type
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes* | - | Type of limit scaling to apply. See the accepted values below. |

  • noLimit - Remove the resource limit from the workload definition entirely.
  • multiplier - Set limit as a multiple of requests using the following formula: resources.requests.memory * {multiplier}.
  • keepLimits - Preserve the existing limits as defined in the workload manifest without modification.
  • maintainRatio - Scale limits proportionally with recommended requests, preserving the original limit-to-request ratio from the workload manifest. If the original limit is 2× the request, the updated limit stays 2× the new recommended request. If the workload has no existing memory limit, a default 1.5× multiplier is applied to the recommended request. The 128MiB minimum memory limit applies.

*If using the vertical.memory.limit configuration option, this field (type) becomes required.

vertical:
  memory:
    limit:
      type: keepLimits  
vertical.memory.limit.multiplier
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| multiplier | float | Yes* | - | Value to multiply the requests by to set the limit on the workload. The calculation is: max(resources.requests.memory * {multiplier}, 128MiB). The value must be greater than or equal to 1. Note that the recommended limit will never be less than 128MiB. |

*Required when type is set to multiplier.

vertical:
  memory:
    limit:
      type: multiplier  
      multiplier: 1.5
vertical.memory.optimization
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| optimization | string | No | - | Disables memory management for workloads that benefit from CPU management only (e.g., Java workloads with a fixed heap size). The workload then uses the memory requests/limits configured in the template. The only allowed value is off. If not set, the resource inherits the workload management option as normal. The minimum required workload-autoscaler component version for this feature is v0.23.1. |

vertical:
  memory:
    optimization: off
vertical.excludedContainers

Specifies container names to exclude from vertical autoscaling optimization. Excluded containers retain their current resource settings and are not scaled by the Workload Autoscaler. Recommendations are still generated and visible in event logs and the workload detail view.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| excludedContainers | array | No | - | List of container names to exclude from optimization. Names must match exactly as defined in the pod specification. Regex or glob patterns are not supported. |

vertical:
  excludedContainers:
    - istio-proxy
    - logging-agent
📘 Note: Dynamically injected sidecar containers are already excluded from automatic optimization by default. Use excludedContainers for application containers or native sidecar containers explicitly defined in your pod spec that you want to prevent from being scaled.

vertical.predictiveScaling
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| predictiveScaling | object | No | - | Predictive scaling configuration for CPU. |

vertical:
  predictiveScaling:
    cpu:
      enabled: true
vertical.predictiveScaling.cpu
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| cpu | object | No | - | CPU-specific predictive scaling settings. |

vertical:
  predictiveScaling:
    cpu:
      enabled: true
vertical.predictiveScaling.cpu.enabled
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | boolean | No | false | Enable predictive scaling for CPU resources. When enabled, the system forecasts CPU usage based on historical patterns and generates proactive recommendations. Requires vertical.optimization to be set to on. |

Predictive scaling can be enabled on any workload via annotations, even if it is not currently eligible for scaling in this manner. The system will automatically activate predictive scaling once the workload becomes predictable and will seamlessly revert to standard scaling if the patterns are lost. This allows preemptive enablement without monitoring for eligibility.

See Predictive workload scaling for more information.

vertical:
  predictiveScaling:
    cpu:
      enabled: true
vertical.downscaling
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| downscaling | object | No | - | Downscaling behavior override. |

vertical:
  downscaling:
    applyType: immediate
vertical.downscaling.applyType
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| applyType | string | No | Taken from the vertical scaling policy controlling the workload. | Override application mode. See the accepted values below. |

  • immediate - Apply changes immediately.
  • deferred - Apply during natural restarts.

vertical:
  downscaling:
    applyType: immediate
vertical.memoryEvent
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| memoryEvent | object | No | - | Memory event behavior override. |

vertical:
  memoryEvent:
    applyType: immediate
vertical.memoryEvent.applyType

This configuration option is fully compatible with other applyType options and is meant to be used in combination with them. This allows for fine-grained control over both upscaling and downscaling. Here's how they interact:

  1. If both configuration options are set to the same value (both immediate or both deferred), the behavior remains unchanged.
  2. If vertical.downscaling.applyType is set to deferred and vertical.memoryEvent.applyType is set to immediate:
    • Upscaling operations will be applied immediately.
    • Downscaling operations will be deferred to natural pod restarts.
  3. If vertical.downscaling.applyType is set to immediate and vertical.memoryEvent.applyType is set to deferred:
    • Upscaling operations will be deferred to natural pod restarts.
    • Downscaling operations will be applied immediately.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| applyType | string | Yes* | Default is taken from the vertical scaling policy controlling the workload. | Override application mode for memory-related events (OOM kills, pressure): immediate (apply changes immediately) or deferred (apply during natural restarts). |

*If using the vertical.memoryEvent configuration option, this field becomes required.

vertical:
  memoryEvent:
    applyType: immediate
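
As an illustration of case 2 in the list above, the following sketch defers downscaling to natural pod restarts while memory events are acted on immediately. It combines only the two documented fields; whether this split suits a given workload is something to validate in your own environment.

```yaml
vertical:
  downscaling:
    applyType: deferred     # downscaling waits for natural pod restarts
  memoryEvent:
    applyType: immediate    # memory events (OOM kills, pressure) are handled right away
```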
vertical.containers

Configuration object that contains container-specific settings.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| containers | object | No | - | Container configuration mapping. |
vertical:
  containers:
vertical.containers.{container_name}

Configuration object that contains resource constraints for a specific container. Replace {container_name} with the name of your container.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| {container_name} | object | No | - | Container resource configuration object. Has to be the name of the container for which resources are being configured. |
vertical:
  containers:
    {container_name}:
📘

Note

Container constraints apply to application containers (spec.containers[]) and native sidecar containers (spec.initContainers[] with restartPolicy: Always). Traditional initContainers that run during pod startup are not optimized by Workload Autoscaler.

vertical.containers.{container_name}.cpu

Container CPU constraints. Set minimum and maximum CPU limits for a specific container to define the Workload Autoscaler's scaling range. At the container level, only the constant type is supported; percentageOfOriginal is not yet available. The constraints.min and constraints.max blocks are optional, but if one is specified, both its type and value are required.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| constraints.min.type | string | Yes | - | Must be constant. |
| constraints.min.value | string | Yes | - | Kubernetes CPU notation (e.g., "10m", "1"). Min cannot be greater than max. |
| constraints.max.type | string | Yes | - | Must be constant. |
| constraints.max.value | string | Yes | - | Kubernetes CPU notation (e.g., "1000m", "2"). Recommendations won't exceed this value. |
vertical:
  containers:
    {container_name}:
      cpu:
        constraints:
          min:
            type: constant
            value: 10m
          max:
            type: constant
            value: 1000m
vertical.containers.{container_name}.memory

Container memory request constraints. Set minimum and maximum memory requests for a specific container to define the Workload Autoscaler's scaling range. These values control requests only; memory limits are managed separately via vertical.memory.limit. At the container level, only the constant type is supported; percentageOfOriginal is not yet available. The constraints.min and constraints.max blocks are optional, but if one is specified, both its type and value are required.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| constraints.min.type | string | Yes | - | Must be constant. |
| constraints.min.value | string | Yes | - | Kubernetes memory notation (e.g., "10Mi", "2Gi"). Recommendations will not go below this value. |
| constraints.max.type | string | Yes | - | Must be constant. |
| constraints.max.value | string | Yes | - | Kubernetes memory notation (e.g., "2048Mi", "4Gi"). Recommendations will not exceed this value. |
vertical:
  containers:
    {container_name}:
      memory:
        constraints:
          min:
            type: constant
            value: 10Mi
          max:
            type: constant
            value: 2048Mi
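
CPU and memory constraints for the same container can be declared together under one container key. The sketch below assumes a container named app (a hypothetical name; substitute the container name from your pod spec) and reuses the values from the examples above.

```yaml
vertical:
  containers:
    app:                      # hypothetical container name
      cpu:
        constraints:
          min:
            type: constant
            value: 10m
          max:
            type: constant
            value: 1000m
      memory:
        constraints:
          min:
            type: constant
            value: 10Mi
          max:
            type: constant
            value: 2048Mi
```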
vertical.excludedContainers

Specifies container names to exclude from vertical autoscaling. Excluded containers retain their current resource settings and are not scaled by the Workload Autoscaler. Recommendations are still generated and visible, but not applied.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| excludedContainers | array | No | - | List of container names to exclude from optimization. Names must match exactly as defined in the pod specification. |
vertical:
  excludedContainers:
    - istio-proxy
    - logging-agent
📘

Note

Dynamically injected sidecar containers are already excluded from automatic optimization by default. Use excludedContainers for application containers or native sidecar containers explicitly defined in your pod spec that you want to prevent from being scaled.

rolloutBehavior

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| rolloutBehavior | object | No | - | Configuration for controlling how recommendations are rolled out. |
rolloutBehavior:
  type: NoDisruption
  preferOneByOne: true
  delaySeconds: 120
rolloutBehavior.type
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | No | - | Controls how recommendation updates are rolled out. |

NoDisruption: Ensures zero-downtime updates for single-replica workloads by temporarily scaling to two replicas during updates. This prevents service interruptions when applying new resource recommendations.

Note: Requires workload-autoscaler component version v0.35.3 or higher. This setting is incompatible with the deferred apply type: if you change the policy to deferred, you must remove the rolloutBehavior setting. It applies only to Deployment resources with a single replica whose rollout strategy allows for downtime.

For comprehensive workload requirements, see Configuring zero-downtime updates.

rolloutBehavior:
  type: NoDisruption
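
Since NoDisruption is incompatible with the deferred apply type, it would be paired with immediate mode. A minimal sketch, assuming rolloutBehavior sits at the top level of the configuration alongside vertical, as the examples in this section suggest:

```yaml
vertical:
  optimization: on
  applyType: immediate      # NoDisruption cannot be combined with deferred
rolloutBehavior:
  type: NoDisruption        # applicable to single-replica Deployments only
```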
rolloutBehavior.preferOneByOne
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| preferOneByOne | boolean | No | false | When true, enables a one-by-one pod restart strategy for immediate mode recommendations. This applies recommendations sequentially, waiting for each pod to become healthy before proceeding to the next. This reduces disruption for workloads without Pod Disruption Budgets or with aggressive rollout strategies. |


Note: Requires workload-autoscaler component version v0.57.0 or higher. This setting applies only to immediate mode and has no effect when using the deferred apply type. Supported for replicated workloads (Deployments, StatefulSets, ReplicaSets, ArgoCD Rollouts).

See Pod restart strategy for more information.

rolloutBehavior:
  preferOneByOne: true
rolloutBehavior.delaySeconds
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| delaySeconds | integer | No | - | Number of seconds to delay between pod restarts when applying recommendations. Accepted range: 0–3600. |
rolloutBehavior:
  delaySeconds: 120

horizontal

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| horizontal | object | No | - | Horizontal autoscaling configuration. |

When horizontal autoscaling is configured via annotations, Cast AI creates and manages a native Kubernetes HorizontalPodAutoscaler (autoscaling/v2) resource in the cluster. For concepts, supported workload types, and compatibility details, see Horizontal autoscaling.

horizontal:
  optimization: true
  useNative: true
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
horizontal.optimization
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| optimization | boolean | Yes* | - | Controls whether Cast AI actively manages horizontal scaling for this workload. Set to true for managed scaling or false for read-only mode (Cast AI monitors but does not scale). |

*Required when the horizontal configuration block is present.

horizontal:
  optimization: true
horizontal.useNative
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| useNative | boolean | Yes* | false | Enables native Kubernetes HPA mode. When true, Cast AI creates and manages a native HorizontalPodAutoscaler resource in the cluster. |

*Required to enable horizontal autoscaling.

horizontal:
  useNative: true
horizontal.takeOwnership
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| takeOwnership | boolean | No | false | When true, Cast AI takes ownership of an existing native HPA on the workload and replaces its configuration with Cast AI-managed settings. The existing HPA must not be managed by a third party such as KEDA. Only HPAs with CPU or memory utilization triggers are currently eligible for ownership. |
horizontal:
  takeOwnership: true
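
Put together with the other required horizontal fields, taking over an existing HPA might look like the sketch below. It assumes takeOwnership is used alongside useNative: true; the replica bounds and CPU target are placeholders for your own values.

```yaml
horizontal:
  optimization: true
  useNative: true
  takeOwnership: true       # replace the existing native HPA's configuration
  minReplicas: 2            # placeholder bounds
  maxReplicas: 10
  metrics:
    - type: resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```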
horizontal.minReplicas
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| minReplicas | integer | Yes* | - | Minimum number of pod replicas. Must be greater than 0. |
horizontal:
  minReplicas: 2
horizontal.maxReplicas
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| maxReplicas | integer | Yes* | - | Maximum number of pod replicas. Must be greater than 0 and ≥ minReplicas. |
horizontal:
  maxReplicas: 10
horizontal.metrics
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| metrics | array | Yes* | - | Array of metric objects that define scaling triggers. At least one metric is required. Only the resource metric type (CPU or memory) is supported via annotations. |
horizontal:
  metrics:
    - type: resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
metrics[].type
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes | - | The metric source type. Only resource is supported via annotations. |

metrics[].resource.name

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | - | The resource to monitor: cpu or memory. |

metrics[].resource.target.type

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes | - | How the target value is interpreted. Accepted values: Utilization (percentage of resource requests averaged across all pods), AverageValue (average raw value across all pods), or Value (raw value). |

metrics[].resource.target.averageUtilization
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| averageUtilization | integer | Conditional | - | Target average utilization percentage across all pods. Required when target.type is Utilization. |
metrics:
  - type: resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
metrics[].resource.target.averageValue
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| averageValue | string | Conditional | - | Target average value as a Kubernetes quantity string (e.g., "100m"). Required when target.type is AverageValue. |

metrics[].resource.target.value

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| value | string | Conditional | - | Target value as a Kubernetes quantity string (e.g., "100m"). Required when target.type is Value. |
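
The examples in this section use the Utilization target type only. As a sketch of a metrics array with more than one entry, the following scales on CPU utilization and on average memory per pod. The 500Mi figure is an illustrative assumption, not a recommended value.

```yaml
horizontal:
  metrics:
    - type: resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 500Mi   # illustrative Kubernetes quantity
```
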
horizontal.behavior
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| behavior | object | No | - | Custom scaling behavior for scale-up and scale-down directions. Maps directly to the Kubernetes HPA v2 behavior spec. |
horizontal:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      selectPolicy: Max
      tolerance: "0.1"
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      selectPolicy: Min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
horizontal.behavior.scaleUp / horizontal.behavior.scaleDown
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| scaleUp | object | No | - | Scale-up behavior configuration. |
| scaleDown | object | No | - | Scale-down behavior configuration. |

The scaleUp and scaleDown objects share the same structure. The following fields apply to both.

behavior.direction.stabilizationWindowSeconds
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| stabilizationWindowSeconds | integer | No | - | The number of seconds the autoscaler looks back at previous scaling recommendations before acting. This prevents rapid fluctuations in replica count. Valid range: 0–3600. |
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
behavior.direction.selectPolicy
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| selectPolicy | string | No | - | When multiple scaling policies are defined, determines which one to use. Max selects the policy allowing the most change, Min selects the most conservative, and Disabled prevents scaling in this direction entirely. |
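
For instance, to prevent scaling in one direction entirely, selectPolicy can be set to Disabled. The sketch below turns off scale-down and leaves scale-up unconfigured:

```yaml
horizontal:
  behavior:
    scaleDown:
      selectPolicy: Disabled   # prevents scaling in the scale-down direction
```
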
behavior.direction.tolerance
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| tolerance | string | No | - | A non-negative quantity value. Scaling is triggered only when the metric deviation exceeds this tolerance. Must be ≥ 0. |
behavior:
  scaleUp:
    tolerance: "0.1"
behavior.direction.policies
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| policies | array | No | - | Array of scaling policy rules that control the rate of change. |

policies[].type

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes | - | The unit for the scaling rate limit: Pods (absolute count) or Percent (percentage of current replicas). |

policies[].value

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| value | integer | Yes | - | Maximum amount of change allowed per period. Must be > 0. |

policies[].periodSeconds

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| periodSeconds | integer | Yes | - | Time window in seconds for the scaling rate limit. Valid range: 1–1800. |

policies:
  - type: Pods
    value: 4
    periodSeconds: 60
  - type: Percent
    value: 100
    periodSeconds: 60

(Deprecated) legacy horizontal fields

🚧

Deprecated

The following fields apply only to legacy horizontal scaling. They are not compatible with native HPA mode (useNative: true). New configurations should use the fields documented in the horizontal section above. To migrate existing workloads, see Migrate from legacy horizontal scaling.

When using legacy Cast AI horizontal scaling (without useNative: true), the horizontal block uses the following structure:

horizontal:
  optimization: on
  minReplicas: 1
  maxReplicas: 10
  scaleDown:
    stabilizationWindow: 5m
  shortAverage: 3m

The minReplicas and maxReplicas fields work identically to the native HPA configuration documented above. The fields below are specific to legacy mode.

horizontal.optimization (legacy)
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| optimization | string | Yes* | - | Enable legacy horizontal scaling. Set to "on" to enable or "off" to disable. |

In native HPA mode, this field is a boolean (true/false) instead. See horizontal.optimization.

*Required when the horizontal configuration block is present.

horizontal:
  optimization: on
horizontal.scaleDown.stabilizationWindow (legacy)
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| stabilizationWindow | duration | Yes* | "5m" | Cooldown period between scale-downs. Must be a parsable duration (e.g., "3m", "10m"). |

*Required if the horizontal.scaleDown block is present.

horizontal:
  scaleDown:
    stabilizationWindow: 5m
horizontal.shortAverage (legacy)
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| shortAverage | duration | No | "3m" | Time period to average CPU metrics over before making scaling decisions. Valid range: 1–10 minutes. Not applicable to native HPA mode; this field is ignored when useNative: true. |
horizontal:
  shortAverage: 3m

containersGrouping

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| containersGrouping | array | No | - | Rules for grouping dynamically generated containers with similar naming patterns. |
containersGrouping:
  - key: name
    operator: contains
    values: ["data-processor"]
    into: data-processor
containersGrouping.[].key
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| key | string | Yes* | - | The attribute used to match containers. Currently supports only name, which refers to the container name property. |

containersGrouping.[].operator

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| operator | string | Yes* | - | Defines how the key is evaluated against the values list. Currently supports only contains. |

containersGrouping.[].values

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| values | array | Yes* | - | A list of string values used for matching against the key with the specified operator. Must contain at least one item. |

containersGrouping.[].into

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| into | string | Yes* | - | The target container name into which matching containers should be grouped. |

Example usage

See Container Grouping for Dynamic Containers.

*Required if the parent object is present in the configuration.
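
As a sketch of a rule with more than one entry in values, the configuration below groups containers whose names contain either listed substring into a single target. It assumes a container matches when its name contains any one of the listed values; the substrings and container names are hypothetical.

```yaml
containersGrouping:
  - key: name
    operator: contains
    values: ["data-processor", "data-worker"]   # hypothetical substrings
    into: data-processor
# Under the assumption above, containers named data-processor-a1b2 and
# data-worker-c3d4 would both be grouped into data-processor.
```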

schedule

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| schedule | object | No | - | Controls whether a custom workload is treated as job-like (sporadic execution) or continuous (always running) for optimization purposes. |
schedule:
  type: jobLike
schedule.type
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| type | string | No | - | Explicitly sets whether the custom workload should be treated as job-like or continuous. |

  • jobLike - Treats the workload as a job with sporadic execution patterns. For confidence, it requires only 3 runs per configured look-back period and keeps recommendations in the cluster even when no pods are running.
  • continuous - Treats the workload as a long-running application. It requires more metric density for confidence.

Note: This configuration is only applicable to custom workloads (those with the workloads.cast.ai/custom-workload label). It does not affect native Kubernetes Jobs or other standard workload types.

schedule:
  type: jobLike  # For sporadic workloads like batch jobs
🚧

Important

The schedule.type configuration only applies to custom job-like workloads identified by the workloads.cast.ai/custom-workload label. Native Kubernetes Jobs with this label are always treated as job-like, and standard workload types (Deployments, StatefulSets, etc.) are always treated as continuous.

🚧

Legacy Annotation Support

For documentation on the legacy annotation format, which is now deprecated, see the Legacy Annotations Reference page.

Migration Guide

📘

Note

The v2 annotation structure cannot be combined with deprecated v1 annotations. When the workloads.cast.ai/configuration annotation is detected, the workload is considered to be configured through that annotation, and all other annotations starting with workloads.cast.ai are ignored.

To migrate from v1 to v2 annotations:

  1. Remove all individual legacy workloads.cast.ai/* annotations
  2. Add the new workloads.cast.ai/configuration annotation
  3. Move all settings into the YAML structure under the new annotation

For example, these v1 annotations:

workloads.cast.ai/vertical-autoscaling: "on"
workloads.cast.ai/cpu-target: "p80"
workloads.cast.ai/memory-max: "2Gi"

Would become:

workloads.cast.ai/configuration: |
  vertical:
    optimization: on
    cpu:
      target: p80
    memory:
      max: 2Gi
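
In context, the migrated annotation sits under metadata.annotations of the workload controller. The sketch below shows the result on a Deployment; the name my-app is a placeholder, and the rest of the manifest (spec, labels) is omitted for brevity.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # placeholder workload name
  annotations:
    # The legacy workloads.cast.ai/* annotations have been removed;
    # only the single v2 annotation remains.
    workloads.cast.ai/configuration: |
      vertical:
        optimization: on
        cpu:
          target: p80
        memory:
          max: 2Gi
```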