Annotations reference

All Workload Autoscaler settings can be configured by adding annotations to the workload controller. When Workload Autoscaler detects the workloads.cast.ai/configuration annotation on a workload, it treats that workload as configured by annotations. This allows flexible configuration that combines annotations with scaling policies.

For workloads configured with annotations, settings can no longer be changed via the API or UI. When a workload has no annotation for a specific setting, the value from the scaling policy or the system default is used.

Annotation values take precedence over values defined in a scaling policy. If the annotation references a scaling policy, every option set in the annotation overrides the corresponding policy value, while options not set in the annotation fall back to the scaling policy or the system defaults.
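
The precedence rule can be pictured as a per-field lookup. The Python sketch below is illustrative only: it assumes the annotation, the scaling policy, and the system defaults are available as nested dictionaries and that precedence is resolved independently for every leaf field. The function name `resolve` is hypothetical and not part of any CAST AI API.

```python
from typing import Any, Dict


def resolve(annotation: Dict[str, Any], policy: Dict[str, Any],
            defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve effective settings: annotation first, then scaling policy, then defaults."""
    result: Dict[str, Any] = {}
    for key in set(defaults) | set(policy) | set(annotation):
        # Values for this key from every layer that sets it, highest precedence first.
        layers = [layer[key] for layer in (annotation, policy, defaults) if key in layer]
        if all(isinstance(value, dict) for value in layers):
            # Nested object (e.g. vertical.cpu): resolve each child field separately.
            result[key] = resolve(annotation.get(key, {}), policy.get(key, {}),
                                  defaults.get(key, {}))
        else:
            # Leaf setting: the highest-precedence layer that defines it wins.
            result[key] = layers[0]
    return result


if __name__ == "__main__":
    defaults = {"vertical": {"cpu": {"target": "p80", "overhead": 0.1},
                             "memory": {"target": "max"}}}
    policy = {"vertical": {"cpu": {"overhead": 0.2}}}       # hypothetical policy
    annotation = {"vertical": {"cpu": {"target": "p95"}}}   # hypothetical annotation
    print(resolve(annotation, policy, defaults))
```

Here the annotation's `target: p95` wins, `overhead` comes from the policy (0.2), and the memory `target` falls back to the default.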

Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/configuration: |
      scalingPolicyName: custom
      vertical:
        optimization: on
        applyType: immediate
        antiAffinity:
          considerAntiAffinity: false
        startup:
          period: 5m
        confidence:
          threshold: 0.5
        cpu:
          target: p81
          lookBackPeriod: 25h
          min: 1000m
          max: 2500m
          applyThresholdStrategy:
            type: defaultAdaptive
          overhead: 0.15
          limit:
            type: multiplier
            multiplier: 2.0
        memory:
          target: max
          lookBackPeriod: 30h
          min: 2Gi
          max: 10Gi
          applyThresholdStrategy:
            type: defaultAdaptive
          overhead: 0.35
          limit:
            type: noLimit
        downscaling:
          applyType: immediate
        memoryEvent:
          applyType: immediate
        containers:
          my-container:  # placeholder: replace with the container's name
            cpu:
              min: 10m
              max: 1000m
            memory:
              min: 10Mi
              max: 2048Mi
      rolloutBehavior:
        type: NoDisruption
        preferOneByOne: true
      horizontal:
        optimization: on
        minReplicas: 5
        maxReplicas: 10
        scaleDown:
          stabilizationWindow: 5m
      containersGrouping:
        - key: name
          operator: contains
          values: ["data-processor"]
          into: data-processor

Configuration Structure

Below is a configuration structure reference for setting up a workload to be controlled by annotations.

📘
Note
The value of workloads.cast.ai/configuration must be a valid YAML string. If the annotation contains invalid YAML, the entire configuration is ignored.

scalingPolicyName

If not set, the system will use the default scaling policy.

Field	Type	Required	Default	Description
`scalingPolicyName`	string	No	"default"	Name of the scaling policy to use. When set, the workload is managed by both the annotation and the specified scaling policy. The scaling policy can control global settings such as enabling or disabling vertical autoscaling.

scalingPolicyName: custom-policy

vertical

Field	Type	Required	Default	Description
`vertical`	object	No	-	Vertical scaling configuration.

vertical:
  optimization: on
  applyType: immediate
  antiAffinity:
    considerAntiAffinity: false
  startup:
    period: 5m
  confidence:
    threshold: 0.5

vertical.optimization

Field	Type	Required	Default	Description
`optimization`	string	Yes		Enable vertical scaling ("on"/"off"). If using the `vertical` configuration option, this field becomes required.

vertical:
  optimization: on

vertical.applyType

Field	Type	Required	Default	Description
`applyType`	string	No	"immediate"	Sets the mode Workload Autoscaler uses to apply recommendations. Use `immediate` to apply recommendations as soon as the thresholds are passed. Note: `immediate` mode can cause pod restarts. Use `deferred` to apply recommendations only when pods restart naturally.

vertical:
  applyType: immediate

vertical.antiAffinity

Field	Type	Required	Default	Description
`antiAffinity`	object	No	-	Configuration for handling pod anti-affinity scheduling constraints.

vertical:
  antiAffinity:
    considerAntiAffinity: false

vertical.antiAffinity.considerAntiAffinity

Field	Type	Required	Default	Description
`considerAntiAffinity`	boolean	Yes	`false`	When `true`, Workload Autoscaler respects pod anti-affinity rules on hostname or host port and, as a result, applies recommendations for these pods in deferred mode. When `false` (default), recommendations for pods with one of these constraints are applied immediately. If using the `vertical.antiAffinity` configuration option, this field becomes required.

vertical:
  antiAffinity:
    considerAntiAffinity: false

vertical.startup

Field	Type	Required	Default	Description
`startup`	object	No		Configuration for handling workload startup behavior. See Startup metrics.

vertical:
  startup:
    period: 5m

vertical.startup.period

Field	Type	Required	Default	Description
`period`	duration	Yes	"2m"	Duration after workload startup during which resource usage metrics are ignored. Useful for applications with high resource usage spikes at startup. Set to `0s` to disable the startup ignore period entirely. Valid values range from `0s` (disabled) to `60m`. If using the `vertical.startup` configuration option, this field becomes required.

vertical:
  startup:
    period: 5m  # Example: ignore first 5 minutes of metrics

vertical:
  startup:
    period: 0s  # Disable startup metrics ignore period
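
To make the ignore period concrete, here is a minimal Python sketch. It assumes each usage sample is a `(seconds_since_start, usage)` pair and that samples taken before `period` has elapsed are simply discarded; the sample format and the helper names are hypothetical, and the parser handles only the `s`, `m`, and `h` units used on this page.

```python
import re
from typing import List, Tuple

# Seconds per unit; only the units used in this reference are supported.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a duration such as "0s", "5m" or "2h" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def drop_startup_samples(samples: List[Tuple[float, float]],
                         period: str = "2m") -> List[Tuple[float, float]]:
    """Keep only (seconds_since_start, usage) samples taken after the startup period."""
    seconds = parse_duration(period)
    if not 0 <= seconds <= 60 * 60:
        raise ValueError("startup period must be between 0s and 60m")
    # With period "0s" nothing is dropped, because every sample has age >= 0.
    return [(age, usage) for age, usage in samples if age >= seconds]


if __name__ == "__main__":
    samples = [(30, 2.0), (200, 1.5), (400, 0.4)]  # hypothetical CPU samples in cores
    print(drop_startup_samples(samples, "5m"))  # [(400, 0.4)]
```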

vertical.confidence

Field	Type	Required	Default	Description
`confidence`	object	No	-	Configuration for recommendation confidence thresholds.

vertical:
  confidence:
    threshold: 0.5
    required: false

vertical.confidence.required

Field	Type	Required	Default	Description
`required`	boolean	Yes		When `true`, the workload requires confidence metrics before optimization, even when optimization is enabled directly via annotations. This prevents immediate optimization of workloads with highly variable load patterns until sufficient historical data is available. When `false`, a workload enabled via annotations is optimized immediately, bypassing the confidence check; this setting is not recommended. If using the `vertical.confidence` configuration option, at least one confidence field is required.

vertical:
  confidence:
    required: true

vertical.confidence.threshold

Field	Type	Required	Default	Description
`threshold`	float	Yes	0.9	Minimum confidence score required to apply recommendations (0.0-1.0). Higher values require more data points for recommendations. If using the `vertical.confidence` configuration option, at least one confidence field is required.

vertical:
  confidence:
    threshold: 0.5

vertical.cpu

Field	Type	Required	Default	Description
`cpu`	object	No	-	CPU-specific scaling configuration.

vertical:
  cpu:
    target: p80
    lookBackPeriod: 24h
    min: 100m
    max: 1000m
    applyThresholdStrategy:
      type: defaultAdaptive
    overhead: 0.1

vertical.cpu.target

Field	Type	Required	Default	Description
`target`	string	No	"p80"	Resource usage target: `max` uses the maximum observed usage; `p{0-99.9}` uses a percentile (e.g., `p80` for the 80th percentile).

vertical:
  cpu:
    target: p80
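
A target value reduces the usage history to a single number. The sketch below shows one way to interpret `max` and `p{0-99.9}`; it uses the nearest-rank percentile method, which is an assumption, since the interpolation Workload Autoscaler uses is not specified here. `target_usage` is a hypothetical helper.

```python
import math
from typing import Sequence


def target_usage(samples: Sequence[float], target: str) -> float:
    """Reduce usage samples to one value according to a `target` such as "max" or "p80"."""
    if not samples:
        raise ValueError("no usage samples")
    ordered = sorted(samples)
    if target == "max":
        return ordered[-1]
    if not target.startswith("p"):
        raise ValueError(f"unsupported target: {target!r}")
    pct = float(target[1:])
    if not 0 <= pct <= 99.9:
        raise ValueError("percentile must be within 0-99.9")
    # Nearest-rank method: the smallest sample with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    usage = [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]  # hypothetical samples
    print(target_usage(usage, "max"), target_usage(usage, "p80"))  # 10 8
```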

vertical.cpu.lookBackPeriod

Field	Type	Required	Default	Description
`lookBackPeriod`	duration	No	"24h"	Historical resource usage data window to consider for recommendations (3h-168h). See Look-back Period.

vertical:
  cpu:
    lookBackPeriod: 24h

vertical.cpu.min

Field	Type	Required	Default	Description
`min`	string	No	"10m"	The lower limit for the recommendation. Uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). The minimum cannot be greater than the maximum.

vertical:
  cpu:
    min: 100m

vertical.cpu.max

Field	Type	Required	Default	Description
`max`	string	No	-	The upper limit for the recommendation. It uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). Recommendations will not exceed this value.

vertical:
  cpu:
    max: 1000m
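
Both bounds use Kubernetes CPU quantities, so a recommendation has to be normalized before it can be compared with them. This Python sketch handles plain cores and millicores only (the full Kubernetes quantity grammar is broader) and assumes the bounds are applied by clamping the final recommendation; `cpu_to_millicores` and `clamp_cpu` are hypothetical names.

```python
from decimal import Decimal
from typing import Optional


def cpu_to_millicores(quantity: str) -> int:
    """Convert "250m", "1" or "0.5" to millicores (other quantity forms are not handled)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def clamp_cpu(recommended: str, min_cpu: str = "10m",
              max_cpu: Optional[str] = None) -> int:
    """Clamp a CPU recommendation to [min, max] and return it in millicores."""
    low = cpu_to_millicores(min_cpu)
    high = cpu_to_millicores(max_cpu) if max_cpu is not None else None
    if high is not None and low > high:
        raise ValueError("min cannot be greater than max")
    value = max(cpu_to_millicores(recommended), low)
    if high is not None:
        value = min(value, high)  # recommendations never exceed max
    return value


if __name__ == "__main__":
    print(clamp_cpu("50m", "100m", "1000m"))  # 100
    print(clamp_cpu("2", "100m", "1000m"))    # 1000
```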

vertical.cpu.applyThreshold

🚧
Deprecation Notice
The applyThreshold configuration option is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new applyThresholdStrategy configuration format for future compatibility and access to the latest features. See applyThresholdStrategy.

Field	Type	Required	Default	Description
`applyThreshold`	float	No	0.1	The relative difference required between current and recommended resource values to apply a change immediately. For upscaling, the difference is calculated relative to the current resource requests; for downscaling, it is calculated relative to the new recommended value. For example, with a threshold of 0.1 (10%), an upscale from 100m to 120m CPU would be applied immediately (a 20% increase relative to the current 100m), while an upscale from 110m to 120m would not (about a 9% increase relative to the current 110m). Value range: 0.01-2.5.

vertical:
  cpu:
    applyThreshold: 0.1
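
The rule above can be expressed in a few lines of Python. This is a sketch of the documented behavior, not CAST AI's code: it assumes a change that exactly meets the threshold is applied, which the description does not state, and the function names are hypothetical.

```python
def relative_change(current: float, recommended: float) -> float:
    """Relative difference as described for applyThreshold."""
    if recommended >= current:
        # Upscaling: measured relative to the current request.
        return (recommended - current) / current
    # Downscaling: measured relative to the new recommended value.
    return (current - recommended) / recommended


def applies_immediately(current: float, recommended: float,
                        threshold: float = 0.1) -> bool:
    if not 0.01 <= threshold <= 2.5:
        raise ValueError("applyThreshold must be within 0.01-2.5")
    # Assumption: meeting the threshold exactly counts as passing it.
    return relative_change(current, recommended) >= threshold


if __name__ == "__main__":
    print(applies_immediately(100, 120))  # True: 20% above the current 100m
    print(applies_immediately(110, 120))  # False: about 9% above the current 110m
```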

vertical.cpu.applyThresholdStrategy

🚧
Warning
applyThreshold and applyThresholdStrategy cannot be used together in the same configuration; doing so results in an error. applyThresholdStrategy is the latest and recommended configuration option.

Field	Type	Required	Default	Description
`applyThresholdStrategy`	object	No	-	Configuration for the strategy used to determine when recommendations should be applied. The strategy determines how the threshold percentage is calculated based on current resource requests.

vertical:
  cpu:
    applyThresholdStrategy:
      type: defaultAdaptive

vertical.cpu.applyThresholdStrategy.type

Field	Type	Required	Default	Description
`type`	string	Yes	"defaultAdaptive"	The type of threshold strategy to use: `defaultAdaptive` automatically adjusts thresholds based on workload size. `percentage` uses a fixed percentage threshold. `customAdaptive` allows custom configuration of the adaptive threshold formula; it works the same way as the Default Adaptive Threshold but lets you tweak the formula's parameters, and is recommended for power users only. The `defaultAdaptive` option uses `numerator` = `0.5`, `denominator` = `1`, and `exponent` = `1` (an exponent of `1` has no influence on the calculation). Required when using `applyThresholdStrategy`.

vertical:
  cpu:
    applyThresholdStrategy:
      type: defaultAdaptive  # Using default adaptive threshold

vertical.cpu.applyThresholdStrategy.percentage

Field	Type	Required	Default	Description
`percentage`	float	Yes		The fixed percentage threshold to use. Value range: 0.01-2.5. Required when type is `percentage`

vertical:
  cpu:
    applyThresholdStrategy:
      type: percentage      # Using fixed percentage threshold
      percentage: 0.3      # 30% threshold

vertical.cpu.applyThresholdStrategy.numerator

Field	Type	Required	Default	Description
`numerator`	float	Yes	0.5	Affects the vertical stretch of the threshold function. Lower values create smaller thresholds. Required when type is `customAdaptive`

vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive    # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2

vertical.cpu.applyThresholdStrategy.denominator

Field	Type	Required	Default	Description
`denominator`	float	Yes	1	Affects threshold sensitivity for small workloads. Values close to `0` result in larger thresholds for small workloads. For example, when `numerator` is `1`, `exponent` is `1`, and `denominator` is `0`, the threshold for a CPU request of `0.5` is 200%. Required when type is `customAdaptive`.

vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2

vertical.cpu.applyThresholdStrategy.exponent

Field	Type	Required	Default	Description
`exponent`	float	Yes	1	Controls how quickly the threshold decreases for larger workloads. Lower values prevent extremely small thresholds for large resources. Required when type is `customAdaptive`

vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
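
The Python sketch below shows how the three strategy types could map a current request to a threshold. The adaptive formula `numerator / (request + denominator) ** exponent` is an assumption inferred from the parameter descriptions and the worked example in the `denominator` row (numerator 1, exponent 1, denominator 0 gives 200% for a 0.5 CPU request); this reference does not state the formula itself, and `threshold_for` is a hypothetical name.

```python
from typing import Any, Dict

# Parameters listed above for the defaultAdaptive strategy.
DEFAULT_ADAPTIVE = {"numerator": 0.5, "denominator": 1.0, "exponent": 1.0}


def threshold_for(request: float, strategy: Dict[str, Any]) -> float:
    """Threshold (0.3 = 30%) for a current request, e.g. in CPU cores.

    Assumed adaptive formula: numerator / (request + denominator) ** exponent.
    """
    kind = strategy.get("type", "defaultAdaptive")
    if kind == "percentage":
        pct = strategy["percentage"]
        if not 0.01 <= pct <= 2.5:
            raise ValueError("percentage must be within 0.01-2.5")
        return pct
    if kind == "defaultAdaptive":
        params = DEFAULT_ADAPTIVE
    elif kind == "customAdaptive":
        params = {name: strategy[name] for name in ("numerator", "denominator", "exponent")}
    else:
        raise ValueError(f"unknown strategy type: {kind!r}")
    return params["numerator"] / (request + params["denominator"]) ** params["exponent"]


if __name__ == "__main__":
    custom = {"type": "customAdaptive", "numerator": 1, "denominator": 0, "exponent": 1}
    print(threshold_for(0.5, custom))  # 2.0, i.e. the 200% from the denominator example
```

Under this assumed formula, the defaults give a 0.25 CPU request a 40% threshold and a 4 CPU request a 10% threshold, so larger workloads are updated on proportionally smaller changes.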

vertical.cpu.overhead

Field	Type	Required	Default	Description
`overhead`	float	No	0.1	Additional resource buffer when applying recommendations (0.0-2.5, e.g., 0.1 = 10%). If a 10% buffer is configured, the issued recommendation will have +10% added to it, so that the workload can handle further increased resource demand.

vertical:
  cpu:
    overhead: 0.1
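
As a worked example of the buffer, assuming the overhead is applied multiplicatively to the raw recommendation (`with_overhead` is a hypothetical helper):

```python
def with_overhead(recommendation: float, overhead: float = 0.1) -> float:
    """Return the recommendation increased by the overhead fraction (0.1 = +10%)."""
    if not 0.0 <= overhead <= 2.5:
        raise ValueError("overhead must be within 0.0-2.5")
    return recommendation * (1 + overhead)


# A 500m CPU recommendation with the default 10% overhead becomes roughly 550m.
print(round(with_overhead(500)))  # 550
```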

vertical.cpu.limit

Field	Type	Required	Default	Description
`limit`	object	No		Configuration for container CPU limit scaling. Default behavior when not specified: if `spec.containers[].resources.limits.cpu` is not defined on the workload, Workload Autoscaler sets no limit; if it is defined, Workload Autoscaler removes it.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0

vertical.cpu.limit.type

Field	Type	Required	Default	Description
`type`	string	*Yes		Type of CPU limit scaling to apply: `noLimit` - Remove the resource limit from the workload definition entirely. `multiplier` - Set limit as a multiple of requests using the following formula: `resources.requests.cpu * {multiplier}`. The behavior can be further controlled with optional flags `onlyIfOriginalExist` and `onlyIfOriginalLower`. `keepLimits` - Preserve the existing limits as defined in the workload manifest without modification. *If using the `vertical.cpu.limit` configuration option, this field becomes required.

vertical:
  cpu:
    limit:
      type: keepLimits

vertical.cpu.limit.multiplier

Field	Type	Required	Default	Description
`multiplier`	float	*Yes		Value to multiply the requests by to set the limit. The calculation is: `resources.requests.cpu * {multiplier}`. The value must be greater than or equal to `1`. *Required when type is set to `multiplier`.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0

vertical.cpu.limit.onlyIfOriginalExist

Field	Type	Required	Default	Description
`onlyIfOriginalExist`	boolean	No	false	When set to `true`, CPU limits will only be set if the workload originally had CPU limits defined in its manifest. If the original workload has no CPU limits specified, no limits will be added. This flag allows conditional limit management based on the original workload configuration. Only applicable when the type is set to `multiplier`.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
      onlyIfOriginalExist: true

vertical.cpu.limit.onlyIfOriginalLower

Field	Type	Required	Default	Description
`onlyIfOriginalLower`	boolean	No	false	When set to `true`, CPU limits will only be updated if the original limits are lower than the calculated value (requests × multiplier). If the original limits are already higher than the calculated value, they remain unchanged. This flag prevents reducing existing limits and ensures limits only increase when beneficial. Only applicable when the type is set to `multiplier`.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
      onlyIfOriginalLower: true

Combining both flags:

When both onlyIfOriginalExist and onlyIfOriginalLower are set to true, the behavior matches Memory's Automatic mode: limits are only set when the workload originally had limits defined AND only when those original limits are lower than the calculated value.

vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 1.5
      onlyIfOriginalExist: true
      onlyIfOriginalLower: true
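
The decision logic for `type` and the two flags can be summarized in a small Python sketch. It works in millicores, uses `None` to mean no limit, and rounds the calculated limit down to whole millicores; the rounding and the function signature are assumptions for illustration.

```python
from typing import Optional


def cpu_limit(request_m: int, original_limit_m: Optional[int], limit_type: str,
              multiplier: float = 1.0, only_if_original_exist: bool = False,
              only_if_original_lower: bool = False) -> Optional[int]:
    """Return the CPU limit in millicores, or None for no limit."""
    if limit_type == "noLimit":
        return None
    if limit_type == "keepLimits":
        return original_limit_m  # left exactly as defined in the manifest
    if limit_type != "multiplier":
        raise ValueError(f"unknown limit type: {limit_type!r}")
    if multiplier < 1:
        raise ValueError("multiplier must be >= 1")
    calculated = int(request_m * multiplier)  # rounded down (assumption)
    if only_if_original_exist and original_limit_m is None:
        return None  # the workload had no CPU limit, so none is added
    if (only_if_original_lower and original_limit_m is not None
            and original_limit_m >= calculated):
        return original_limit_m  # an existing limit that is not lower stays unchanged
    return calculated


if __name__ == "__main__":
    # Both flags set, matching the last YAML example (multiplier 1.5).
    print(cpu_limit(400, None, "multiplier", 1.5, True, True))  # None
    print(cpu_limit(400, 500, "multiplier", 1.5, True, True))   # 600
    print(cpu_limit(400, 2000, "multiplier", 1.5, True, True))  # 2000
```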

vertical.cpu.optimization

Field	Type	Required	Default	Description
`optimization`	string	No		Disables CPU management for workloads that benefit from memory management only; the workload then uses the CPU requests and limits configured in its template. The only allowed value is `off`. If not set, CPU inherits the workload's management option as usual. Requires `workload-autoscaler` component version `v0.23.1` or later.

vertical:
  cpu:
    optimization: off

vertical.memory

Field	Type	Required	Default	Description
`memory`	object	No	-	Memory-specific scaling configuration.

vertical:
  memory:
    target: max
    lookBackPeriod: 24h
    min: 128Mi
    max: 2Gi
    applyThresholdStrategy:
      type: defaultAdaptive
    overhead: 0.1

vertical.memory.target

Field	Type	Required	Default	Description
`target`	string	No	"max"	Resource usage target: `max` - Use maximum observed usage. `p{0-99.9}` - Use percentile (e.g., `p80` for 80th percentile).

vertical:
  memory:
    target: max

vertical.memory.lookBackPeriod

Field	Type	Required	Default	Description
`lookBackPeriod`	duration	No	"24h"	Historical resource usage data window to consider for recommendations (3h-168h). See Look-back Period.

vertical:
  memory:
    lookBackPeriod: 24h

vertical.memory.min

Field	Type	Required	Default	Description
`min`	string	No	"10Mi"	The lower limit for the recommendation. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi").

vertical:
  memory:
    min: 128Mi

vertical.memory.max

Field	Type	Required	Default	Description
`max`	string	No	-	The upper limit for the recommendation. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi"). Recommendations will not exceed this value.

vertical:
  memory:
    max: 2Gi

vertical.memory.applyThreshold

🚧
Deprecation Notice
The applyThreshold configuration option is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new applyThresholdStrategy configuration format for future compatibility and access to the latest features. See applyThresholdStrategy.

Field	Type	Required	Default	Description
`applyThreshold`	float	No	0.1	The relative difference required between current and recommended resource values to apply a change immediately. For upscaling, the difference is calculated relative to the current resource requests; for downscaling, it is calculated relative to the new recommended value. For example, with a threshold of 0.1 (10%), an upscale from 100MiB to 120MiB of memory would be applied immediately (a 20% increase relative to the current 100MiB), while an upscale from 110MiB to 120MiB would not (about a 9% increase relative to the current 110MiB). Value range: 0.01-2.5.

vertical:
  memory:
    applyThreshold: 0.1

vertical.memory.applyThresholdStrategy

🚧
Warning
applyThreshold and applyThresholdStrategy cannot be used together in the same configuration; doing so results in an error. applyThresholdStrategy is the latest and recommended configuration option.

Field	Type	Required	Default	Description
`applyThresholdStrategy`	object	No	-	Configuration for the strategy used to determine when recommendations should be applied. The strategy determines how the threshold percentage is calculated based on current resource requests.

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive    # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2

vertical.memory.applyThresholdStrategy.type

Field	Type	Required	Default	Description
`type`	string	Yes	"defaultAdaptive"	The type of threshold strategy to use: `defaultAdaptive` automatically adjusts thresholds based on workload size. `percentage` uses a fixed percentage threshold. `customAdaptive` allows custom configuration of the adaptive threshold formula; it works the same way as the Default Adaptive Threshold but lets you tweak the formula's parameters, and is recommended for power users only. The `defaultAdaptive` option uses `numerator` = `0.5`, `denominator` = `1`, and `exponent` = `1` (an exponent of `1` has no influence on the calculation). Required when using `applyThresholdStrategy`.

vertical:
  memory:
    applyThresholdStrategy:
      type: defaultAdaptive  # Using default adaptive threshold

vertical.memory.applyThresholdStrategy.percentage

Field	Type	Required	Default	Description
`percentage`	float	Yes		The fixed percentage threshold to use. Value range: 0.01-2.5. Required when type is `percentage`

vertical:
  memory:
    applyThresholdStrategy:
      type: percentage      # Using fixed percentage threshold
      percentage: 0.3      # 30% threshold

vertical.memory.applyThresholdStrategy.numerator

Field	Type	Required	Default	Description
`numerator`	float	Yes	0.5	Affects the vertical stretch of the threshold function. Lower values create smaller thresholds. Required when type is `customAdaptive`

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive    # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2

vertical.memory.applyThresholdStrategy.denominator

Field	Type	Required	Default	Description
`denominator`	float	Yes	1	Affects threshold sensitivity for small workloads. Values close to `0` result in larger thresholds for small workloads. For example, when `numerator` is `1`, `exponent` is `1`, and `denominator` is `0`, the threshold for a memory request of `0.5` is 200%. Required when type is `customAdaptive`.

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2

vertical.memory.applyThresholdStrategy.exponent

Field	Type	Required	Default	Description
`exponent`	float	Yes	1	Controls how quickly the threshold decreases for larger workloads. Lower values prevent extremely small thresholds for large resources. Required when type is `customAdaptive`

vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2

vertical.memory.overhead

Field	Type	Required	Default	Description
`overhead`	float	No	0.1	Additional resource buffer added when applying recommendations. Value range: 0.0-2.5 (e.g., 0.1 = 10%). With a 10% buffer, the issued recommendation is increased by 10% so that the workload can handle further increases in resource demand.

vertical:
  memory:
    overhead: 0.1
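As a worked example, assume the buffer is applied as a simple multiplicative factor on top of the computed recommendation. With that assumption, an overhead of `0.1` turns a hypothetical memory recommendation of 1000Mi into 1100Mi:

$$
\text{issued} = \text{recommendation} \times (1 + \text{overhead}) = 1000\,\text{Mi} \times (1 + 0.1) = 1100\,\text{Mi}
$$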

vertical.memory.limit

Field	Type	Required	Default	Description
`limit`	object	No		Configuration for container memory limit scaling. Default behavior when not specified: if `spec.containers[].resources.limits.memory` is not defined on the workload, Workload Autoscaler sets no limit. If `spec.containers[].resources.limits.memory` is defined, Workload Autoscaler calculates the limit as `max(max(resources.requests.memory * 1.5, 128MiB), current.resources.limit)`. In other words, it only ever raises the limit, never lowers it, and enforces a minimum of 128MiB.

vertical:
  memory:
    limit:
      type: multiplier  
      multiplier: 1.5
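The default limit formula can be traced with a few hypothetical values, all in MiB, for a container that already defines a memory limit:

$$
\begin{aligned}
\text{limit} &= \max\bigl(\max(\text{request} \times 1.5,\ 128),\ \text{currentLimit}\bigr) \\
\text{request}=512,\ \text{currentLimit}=600 &\Rightarrow \max(\max(768,\ 128),\ 600) = 768 \\
\text{request}=64,\ \text{currentLimit}=100 &\Rightarrow \max(\max(96,\ 128),\ 100) = 128 \\
\text{request}=256,\ \text{currentLimit}=1024 &\Rightarrow \max(\max(384,\ 128),\ 1024) = 1024
\end{aligned}
$$

The second case shows the 128MiB floor. The third case shows that an existing higher limit is kept rather than lowered.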

vertical.memory.limit.type

Field	Type	Required	Default	Description
`type`	string	*Yes		Type of limit scaling to apply: `noLimit` - Remove the resource limit from the workload definition entirely. `multiplier` - Set the limit as a multiple of requests, `resources.requests.memory * {multiplier}`, with a 128MiB minimum (see `vertical.memory.limit.multiplier`). `keepLimits` - Preserve the existing limits as defined in the workload manifest without modification. *If using the `vertical.memory.limit` configuration option, this field (`type`) becomes required.

vertical:
  memory:
    limit:
      type: keepLimits

vertical.memory.limit.multiplier

Field	Type	Required	Default	Description
`multiplier`	float	*Yes		Value to multiply the requests by to set the limit on the workload. The calculation is `max(resources.requests.memory * {multiplier}, 128MiB)`, so the recommended limit is never less than 128MiB. The value must be greater than or equal to `1`. *Required when `type` is set to `multiplier`.

vertical:
  memory:
    limit:
      type: multiplier  
      multiplier: 1.5
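Applying the multiplier formula to hypothetical requests, in MiB, shows when the 128MiB floor takes effect:

$$
\begin{aligned}
\text{limit} &= \max(\text{request} \times \text{multiplier},\ 128) \\
\text{request}=512,\ \text{multiplier}=1.5 &\Rightarrow \max(768,\ 128) = 768 \\
\text{request}=50,\ \text{multiplier}=2.0 &\Rightarrow \max(100,\ 128) = 128
\end{aligned}
$$

Unlike the default behavior, this formula as documented has no term for the workload's existing limit.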

vertical.memory.optimization

Field	Type	Required	Default	Description
`optimization`	string	No	-	Disables memory management for workloads that benefit from CPU management only (e.g., Java workloads with a fixed heap size). The workload then uses the memory requests and limits configured in its template. The only allowed value is `off`. If not set, memory inherits the workload management option as normal. Requires `workload-autoscaler` component version `v0.23.1` or higher.

vertical:
  memory:
    optimization: off
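The following is a minimal sketch of the Java case described above. It assumes a workload whose heap size is fixed in its own configuration. Vertical optimization stays on, so CPU is still managed, while memory requests and limits are left as defined in the template. The CPU `target` value is illustrative only.

```yaml
# Hypothetical JVM workload whose heap size is fixed in its own configuration.
vertical:
  optimization: on        # CPU requests are still managed by Workload Autoscaler
  cpu:
    target: p80           # illustrative CPU target
  memory:
    optimization: off     # memory requests/limits stay as defined in the template
```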

vertical.predictiveScaling

Field	Type	Required	Default	Description
`predictiveScaling`	object	No	-	Predictive scaling configuration for CPU.

vertical:
  predictiveScaling:
    cpu:
      enabled: true

vertical.predictiveScaling.cpu

Field	Type	Required	Default	Description
`cpu`	object	No	-	CPU-specific predictive scaling settings.

vertical:
  predictiveScaling:
    cpu:
      enabled: true

vertical.predictiveScaling.cpu.enabled

Field	Type	Required	Default	Description
`enabled`	boolean	No	false	Enable predictive scaling for CPU resources. When enabled, the system forecasts CPU usage based on historical patterns and generates proactive recommendations. Requires `vertical.optimization` to be set to `on`. Predictive scaling can be enabled on any workload via annotations, even if it is not currently eligible for scaling in this manner. The system will automatically activate predictive scaling once the workload becomes predictable and will seamlessly revert to standard scaling if the patterns are lost. This allows preemptive enablement without monitoring for eligibility. See Predictive workload scaling for more information.

vertical:
  predictiveScaling:
    cpu:
      enabled: true
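Predictive scaling requires `vertical.optimization` to be `on`. A self-contained annotation fragment should therefore set both. This sketch shows that minimum combination:

```yaml
vertical:
  optimization: on        # required for predictive scaling
  predictiveScaling:
    cpu:
      enabled: true       # activates automatically once the workload becomes predictable
```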

vertical.downscaling

Field	Type	Required	Default	Description
`downscaling`	object	No	-	Downscaling behavior override.

vertical:
  downscaling:
    applyType: immediate

vertical.downscaling.applyType

Field	Type	Required	Default	Description
`applyType`	string		Taken from the vertical scaling policy controlling the workload	Override application mode: `immediate` - Apply changes immediately. `deferred` - Apply during natural restarts.

vertical:
  downscaling:
    applyType: immediate

vertical.memoryEvent

Field	Type	Required	Default	Description
`memoryEvent`	object	No	-	Memory event behavior override.

vertical:
  memoryEvent:
    applyType: immediate

vertical.memoryEvent.applyType

This configuration option is fully compatible with other applyType options and is meant to be used in combination with them. This allows for fine-grained control over both upscaling and downscaling. Here's how they interact:

- If both configuration options are set to the same value (both `immediate` or both `deferred`), the behavior remains unchanged.
- If `vertical.downscaling.applyType` is set to `deferred` and `vertical.memoryEvent.applyType` is set to `immediate`:
  - Upscaling operations will be applied immediately.
  - Downscaling operations will be deferred to natural pod restarts.
- If `vertical.downscaling.applyType` is set to `immediate` and `vertical.memoryEvent.applyType` is set to `deferred`:
  - Upscaling operations will be deferred to natural pod restarts.
  - Downscaling operations will be applied immediately.

Field	Type	Required	Default	Description
`applyType`	string	*Yes	Taken from the vertical scaling policy controlling the workload	Override application mode for memory-related events (OOM kills, memory pressure): `immediate` - Apply changes immediately. `deferred` - Apply during natural restarts. *If using the `vertical.memoryEvent` configuration option, this field becomes required.

vertical:
  memoryEvent:
    applyType: immediate
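The combinations described above can be expressed in a single fragment. This sketch shows one case: routine downscaling waits for natural pod restarts, while memory-event upscaling is applied right away. The values are illustrative.

```yaml
vertical:
  downscaling:
    applyType: deferred   # downscaling waits for natural pod restarts
  memoryEvent:
    applyType: immediate  # upscaling is applied immediately
```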

vertical.containers

Configuration object that contains container-specific settings.

Field	Type	Required	Default	Description
containers	object	No	-	Container configuration mapping.

vertical:
  containers:

vertical.containers.{container_name}

Configuration object that contains resource constraints for a specific container. Replace {container_name} with the name of your container.

Field	Type	Required	Default	Description
{container_name}	object	No	-	Container resource configuration object. The key must be the name of the container whose resources are being configured.

vertical:
  containers:
    {container_name}:

📘
Note
Container constraints apply to application containers (spec.containers[]) and native sidecar containers (spec.initContainers[] with restartPolicy: Always). Traditional initContainers that run during pod startup are not optimized by Workload Autoscaler.

vertical.containers.{container_name}.cpu

Container CPU constraints. Set the minimum and maximum CPU values for a specific container to define the range within which Workload Autoscaler issues recommendations.

Field	Type	Required	Default	Description
max	string	No	-	The upper limit for the recommendation. Uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). Recommendations won't exceed this value.
min	string	No	-	The lower limit for the recommendation. Uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). Min cannot be greater than max.

vertical:
  containers:
    {container_name}:
      cpu:
        min: 10m
        max: 1000m

vertical.containers.{container_name}.memory

Container memory constraints. Set the minimum and maximum memory values for a specific container to define the range within which Workload Autoscaler issues recommendations.

Field	Type	Required	Default	Description
min	string	No	-	The lower limit for the recommendation. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi").
max	string	No	-	The upper limit for the recommendation. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi").

vertical:
  containers:
    {container_name}:
      memory:
        min: 10Mi
        max: 2048Mi
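The sketch below sets per-container constraints for a pod with one application container and one native sidecar. The container names `app` and `log-shipper` are hypothetical and must be replaced with the actual names from the pod template. Because `min` and `max` are each optional, the sidecar sets only upper bounds.

```yaml
vertical:
  containers:
    app:                  # hypothetical application container in spec.containers[]
      cpu:
        min: 100m
        max: 2000m
      memory:
        min: 256Mi
        max: 4Gi
    log-shipper:          # hypothetical native sidecar (spec.initContainers[] with restartPolicy: Always)
      cpu:
        max: 200m         # only an upper bound; min is optional
      memory:
        max: 512Mi
```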

rolloutBehavior

Field	Type	Required	Default	Description
`rolloutBehavior`	object	No	-	Configuration for controlling how recommendations are rolled out.

rolloutBehavior:
  type: NoDisruption

rolloutBehavior.type

Field	Type	Required	Default	Description
`type`	string	*Yes	-	Controls how recommendation updates are rolled out: `NoDisruption` - Ensures zero-downtime updates for single-replica workloads by temporarily scaling to two replicas during updates. This prevents service interruptions when new resource recommendations are applied. *If using the `rolloutBehavior` configuration option, this field becomes required.

Note: Requires `workload-autoscaler` component version v0.35.3 or higher. This setting is incompatible with the `deferred` apply type: if you change the policy to `deferred`, you must remove the `rolloutBehavior` setting. This setting applies only to Deployment resources with a single replica whose rollout strategy allows for downtime.

rolloutBehavior:
  type: NoDisruption
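The following minimal Deployment sketch meets the constraints listed above. It has a single replica, uses an `immediate` apply type (since `NoDisruption` is incompatible with `deferred`), and uses a rollout strategy that allows downtime. Kubernetes' `Recreate` strategy is used here as one such strategy. The name, labels, and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    workloads.cast.ai/configuration: |
      vertical:
        optimization: on
        applyType: immediate     # rolloutBehavior is incompatible with deferred
      rolloutBehavior:
        type: NoDisruption
spec:
  replicas: 1                    # NoDisruption targets single-replica Deployments
  strategy:
    type: Recreate               # old pod is stopped before the new one starts, so downtime is allowed
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25      # placeholder image
```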

rolloutBehavior.preferOneByOne

Field	Type	Required	Default	Description
`preferOneByOne`	boolean	No	false	When `true`, enables a one-by-one pod restart strategy for immediate mode recommendations. Recommendations are applied sequentially, waiting for each pod to become healthy before proceeding to the next. This reduces disruption for workloads without Pod Disruption Budgets or with aggressive rollout strategies.

Note: Requires `workload-autoscaler` component version v0.57.0 or higher. This setting applies only to immediate mode and has no effect when using the `deferred` apply type. Supported for replicated workloads (Deployments, StatefulSets, ReplicaSets, ArgoCD Rollouts).

See Pod restart strategy for more information.

rolloutBehavior:
  type: NoDisruption
  preferOneByOne: true

horizontal

Field	Type	Required	Default	Description
`horizontal`	object	No	-	Horizontal scaling configuration.

horizontal:
  optimization: on
  minReplicas: 1
  maxReplicas: 10
  scaleDown:
    stabilizationWindow: 5m
  shortAverage: 3m

horizontal.optimization

Field	Type	Required	Default	Description
`optimization`	string	Yes*	-	Enables horizontal scaling (`on` or `off`). *If using the `horizontal` configuration option, this field becomes required.

horizontal:
  optimization: on

horizontal.minReplicas

Field	Type	Required	Default	Description
`minReplicas`	integer	Yes*	-	Minimum number of replicas.

horizontal:
  minReplicas: 1

horizontal.maxReplicas

Field	Type	Required	Default	Description
`maxReplicas`	integer	Yes*	-	Maximum number of replicas.

horizontal:
  maxReplicas: 10

horizontal.scaleDown

Field	Type	Required	Default	Description
`scaleDown`	object	No	-	Contains scale-down configuration options.

horizontal.scaleDown.stabilizationWindow

Field	Type	Required	Default	Description
`stabilizationWindow`	duration	Yes*	`5m`	Cooldown period between scale-downs. *If using the `horizontal.scaleDown` configuration option, this field becomes required.

horizontal:
  scaleDown:
    stabilizationWindow: 5m

*Required if the parent object is present in the configuration.

containersGrouping

Field	Type	Required	Default	Description
`containersGrouping`	array	No	-	Rules for grouping dynamically generated containers with similar naming patterns.

containersGrouping:
  - key: name
    operator: contains
    values: ["data-processor"]
    into: data-processor

containersGrouping.[].key

Field	Type	Required	Default	Description
`key`	string	*Yes	-	The attribute used to match containers. Currently, only `name` is supported, which refers to the container name property.

containersGrouping.[].operator

Field	Type	Required	Default	Description
`operator`	string	*Yes	-	Defines how the `key` is evaluated against the `values` list. Currently, only `contains` is supported.

containersGrouping.[].values

Field	Type	Required	Default	Description
`values`	array	*Yes	-	A list of string values used for matching against the `key` with the specified operator. Must contain at least one item.

containersGrouping.[].into

Field	Type	Required	Default	Description
`into`	string	*Yes	-	The target container name into which matching containers should be grouped.

Example usage

See Container Grouping for Dynamic Containers.

*Required if the parent object is present in the configuration.
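The sketch below defines two grouping rules. The container name substrings `data-processor` and `ingest-worker`, and the generated names in the comments, are hypothetical. Each rule groups every container whose name contains the listed value under the name given in `into`.

```yaml
containersGrouping:
  # Hypothetical generated names data-processor-7f9c2 and data-processor-b41e8
  # both contain "data-processor" and are grouped into "data-processor".
  - key: name
    operator: contains
    values: ["data-processor"]
    into: data-processor
  # Hypothetical generated names ingest-worker-01 and ingest-worker-02
  # are grouped into "ingest-worker".
  - key: name
    operator: contains
    values: ["ingest-worker"]
    into: ingest-worker
```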

schedule

Field	Type	Required	Default	Description
`schedule`	object	No	-	Controls whether a custom workload is treated as job-like (sporadic execution) or continuous (always running) for optimization purposes.

schedule:
  type: jobLike

schedule.type

Field	Type	Required	Default	Description
`type`	string			Explicitly sets whether the custom workload is treated as job-like or continuous: `jobLike` - Treats the workload as a job with sporadic execution patterns. For confidence, it requires only 3 runs per configured look-back period, and it keeps recommendations in the cluster even when no pods are running. `continuous` - Treats the workload as a long-running application, which requires more metric density for confidence.

Note: This configuration is only applicable to custom workloads (those with the `workloads.cast.ai/custom-workload` label). It does not affect native Kubernetes Jobs or other standard workload types.

schedule:
  type: jobLike  # For sporadic workloads like batch jobs

🚧
Important
The schedule.type configuration only applies to custom job-like workloads identified by the workloads.cast.ai/custom-workload label. Native Kubernetes Jobs with this label are always treated as job-like, and standard workload types (Deployments, StatefulSets, etc.) are always treated as continuous.

🚧
Legacy Annotation Support
For documentation on the legacy annotation format, which is now deprecated, see the Legacy Annotations Reference page.

Migration Guide

📘
Note
The annotations V2 structure cannot be combined with deprecated annotations V1. When the annotation workloads.cast.ai/configuration is detected, the workload is considered to be configured by using that annotation and all other annotations starting with workloads.cast.ai will be ignored.

To migrate from v1 to v2 annotations:

Remove all individual legacy workloads.cast.ai/* annotations
Add the new workloads.cast.ai/configuration annotation
Move all settings into the YAML structure under the new annotation

For example, these v1 annotations:

workloads.cast.ai/vertical-autoscaling: "on"
workloads.cast.ai/cpu-target: "p80"
workloads.cast.ai/memory-max: "2Gi"

Would become:

workloads.cast.ai/configuration: |
  vertical:
    optimization: on
    cpu:
      target: p80
    memory:
      max: 2Gi