Workload Autoscaler Configuration
Workload Autoscaling can be configured in different ways to suit your specific needs. You can use the Cast AI API (or change the fields via the UI), or control the autoscaling settings at the workload level using workload annotations.
Upgrading
Currently, the Workload Autoscaler is installed as an in-cluster component via Helm and can be upgraded by running the following command:
helm upgrade -i castai-workload-autoscaler -n castai-agent castai-helm/castai-workload-autoscaler --reuse-values
Upgrade with Helm Test Enabled
To verify the deployment's functionality during an upgrade, you can enable Helm's built-in testing mechanism. This ensures both the reconciliation loop and admission webhook are working correctly:
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler -n castai-agent --reuse-values --set test.enabled=true
Upgrade with Memory Limits & Helm Test Enabled
For clusters with high workload counts or complex scaling requirements, you may need to adjust memory limits to prevent OOM kills. This command combines memory limit configuration with Helm testing:
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler -n castai-agent --reuse-values --set resources.limits.memory=5Gi --set resources.requests.memory=5Gi --set test.enabled=true
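To confirm the new memory settings and observe actual consumption, you can check the component's usage (this assumes the Kubernetes metrics API, e.g., metrics-server, is available in the cluster):
kubectl top pod -n castai-agent -l app=castai-workload-autoscaler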
Rollback to a Previous Version
If you encounter issues after an upgrade, you can roll back to a previous version using the following steps:
Check Release History
Before rolling back, check the current release history to identify available versions:
helm history castai-workload-autoscaler -n castai-agent
Rollback to a Specific Version
Once you've identified the desired version from the history, execute the rollback. Replace [VERSION] with the desired revision number:
helm rollback castai-workload-autoscaler [VERSION] -n castai-agent
This will restore the workload autoscaler to the specified version.
Verify the deployment
After an upgrade or rollback, confirm that the deployment is running correctly:
kubectl get pods -n castai-agent
kubectl logs -l app=castai-workload-autoscaler -n castai-agent
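If the release was installed or upgraded with test.enabled=true, you can also re-run the chart's built-in tests on demand:
helm test castai-workload-autoscaler -n castai-agent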
ArgoCD compatibility
If you're using ArgoCD for deployment, you may need to disable Helm test hooks to prevent sync issues. The Workload Autoscaler includes a test hook that verifies its basic functionality, but this can cause problems with ArgoCD as it doesn't fully support Helm test hooks. To disable test hook creation, set test.enabled=false.
For new installations:
helm install castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
-n castai-agent \
--set test.enabled=false
For upgrades:
helm upgrade -i castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
-n castai-agent \
--reuse-values \
--set test.enabled=false
This option is available from version 0.1.75 (appVersion v0.27.3) onwards.
Dynamically Injected containers
By default, containers injected at runtime (e.g., istio-proxy) won't be managed by the Workload Autoscaler, and recommendations won't be applied to them. To enable management of such containers, configure the in-cluster component with the following command:
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler -n castai-agent --reuse-values --set webhook.reinvocationPolicy=IfNeeded
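To confirm the setting took effect, you can inspect the reinvocation policy on the component's mutating webhook configuration. The exact webhook configuration name varies per installation, so list the candidates first (the castai filter below is an assumption):
kubectl get mutatingwebhookconfigurations -o name | grep -i castai
kubectl get mutatingwebhookconfiguration <webhook-name> -o jsonpath='{.webhooks[*].reinvocationPolicy}'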
Available Workload Settings
The following settings are currently available to configure Cast AI Workload Autoscaling:
- Automation - on/off marks whether Cast AI should apply or just generate recommendations.
- Scaling policy - allows for the selection of a policy by name. It must be one of the policies available for the cluster.
- Recommendation Percentile - which percentile Cast AI will recommend, looking at the last day of usage. The recommendation will be the average target percentile across all pods spanning the recommendation period. Setting the percentile to 100% will no longer use the average of all pods but the maximum observed value over the period.
- Overhead - marks how many extra resources should be added to the recommendation. By default, it's set to 10% for memory and 0% for CPU.
- Optimization Threshold - when automation is enabled, the minimum difference required between the current pod requests and the new recommendation for the recommendation to be applied immediately. Defaults to 10% for both memory and CPU.
- Workload autoscaler constraints - sets the minimum and maximum values for resources; the Workload Autoscaler will not scale CPU/memory above the maximum or below the minimum. The constraints apply to all containers.
- Ignore startup metrics - allows excluding a specified duration of startup metrics from recommendation calculations for workloads with high initial resource usage (e.g., Java applications). Only affects vertical pod autoscaling decisions.
- Look-back period - defines a custom timeframe (between 24 hours and 7 days) the Workload Autoscaler uses to observe CPU and memory usage when calculating scaling recommendations. It can be set separately for CPU and memory.
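Most of these settings map directly onto fields of the workloads.cast.ai/configuration annotation documented in the Configuration Structure reference below. As a minimal sketch (all values are illustrative, not recommendations):
workloads.cast.ai/configuration: |
  vertical:
    optimization: on       # Automation on/off
    startup:
      period: 5m           # Ignore startup metrics
    cpu:
      target: p80          # Recommendation percentile
      overhead: 0.0        # Overhead buffer
      lookBackPeriod: 24h  # Look-back period
      min: 100m            # Workload autoscaler constraints
      max: 1000m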
Note
It is recommended to wait a week before enabling Workload Autoscaling for "all workloads" so that the system understands how resource consumption varies between weekdays and weekends.
Ignore startup metrics
Some workloads, notably Java and .NET applications, may have increased resource usage during startup that can negatively impact vertical pod autoscaling recommendations. To address this, Cast AI allows you to ignore startup metrics for a specified duration when calculating these recommendations.
You can configure this setting in the Cast AI console under the Advanced Settings of a vertical scaling policy:

Startup metrics at the vertical scaling policy level
- Enable the feature by checking the "Ignore workload startup metrics" box.
- Set the duration to exclude from vertical pod autoscaling recommendation generation after a workload starts (between 2 and 60 minutes).
This feature helps prevent inflated vertical scaling recommendations and unnecessary pod restarts caused by temporary resource spikes during application initialization. The startup metrics exclusion only applies to vertical pod autoscaling - horizontal pod autoscaling (HPA) will still respond normally to resource usage during startup.
You can also configure this setting via the API or Terraform.
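The same setting is available through the vertical.startup.period annotation field (a minimal sketch; the 10m value is illustrative and must fall within the 2-60 minute range):
workloads.cast.ai/configuration: |
  vertical:
    startup:
      period: 10m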
Look-back period
The look-back period defines the timeframe the Workload Autoscaler uses to observe CPU and memory usage when calculating scaling recommendations. This feature allows you to customize the historical data window used for generating recommendations, which can be particularly useful for workloads with atypical resource usage patterns.
You can configure the look-back period in the Cast AI console under Advanced Settings of a vertical scaling policy:

Look-back period in Advanced Settings
- Set the look-back period for CPU and memory separately.
- Specify the duration in days (d) and hours (h). The minimum allowed period is 24 hours, and the maximum is 7 days.
This feature allows you to:
- Adjust the recommendation window based on your workload's specific resource usage patterns.
- Account for longer-term trends or cyclical resource usage in your applications.
You can configure this setting at different levels:
- Policy level: Apply the setting to all workloads assigned to a specific scaling policy.
- Individual workload level: Configure the setting for a specific workload using annotations or the UI by overriding policy-level settings.
The look-back period can also be configured via Annotations, the API, or Terraform.
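Via annotations, the corresponding fields are vertical.cpu.lookBackPeriod and vertical.memory.lookBackPeriod (a minimal sketch; the values are illustrative and must stay within the 24h-168h range):
workloads.cast.ai/configuration: |
  vertical:
    cpu:
      lookBackPeriod: 48h
    memory:
      lookBackPeriod: 168h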
Choosing the right look-back period
The optimal look-back period largely depends on your workload's resource usage patterns. Most applications benefit from a shorter look-back period of 1-2 days. This approach works particularly well for standard web applications, capturing daily usage patterns while maintaining high responsiveness to changes. Shorter periods enable more aggressive optimization and often lead to higher savings.
Some workloads, however, require longer observation periods of 3-7 days. Applications with significant differences between weekday and weekend usage patterns benefit from a 7-day period to capture these weekly variations. Batch processing jobs that run every few days need a look-back period that covers at least one full job cycle to prevent potential out-of-memory (OOM) situations.
Common use cases and recommended periods:
- Standard web applications: 1-2 days captures daily patterns while maintaining responsiveness to changes
- Batch processing jobs: Set to cover at least one full job cycle to account for periodic resource spikes
- Weekend-sensitive workloads: 7 days to capture both weekday and weekend patterns
- Variable workloads: Start with 1-2 days and adjust based on observed scaling behavior
Tip
For workloads with variable or uncertain patterns, start with a shorter period and adjust based on observed behavior. The key is to match the look-back period to your application's actual resource usage patterns – whether that's daily consistency, weekly cycles, or periodic processing jobs.
Resource-specific optimization
When configuring vertical scaling, you can enable or disable CPU and memory optimization independently while still receiving recommendations for both resources. Even when optimization is disabled for a resource, Workload autoscaler continues to generate recommendations but won't apply them automatically. This setting can be configured both at the vertical policy level and for individual workloads.

Selective resource optimization controls in the vertical scaling policy settings
Note
At least one resource type must remain enabled - you cannot disable both CPU and memory optimization simultaneously.
Version requirements
The minimum required workload-autoscaler component version to use this feature is v0.23.1.
Configuration options
You can configure resource-specific optimization through:
- The Cast AI console UI using the resource checkboxes
- The Cast AI API or Terraform
- Annotations at the workload level:
workloads.cast.ai/configuration: |
  vertical:
    memory:
      optimization: off
For detailed reference information on Workload autoscaler annotations, see Configuration via annotations.
Custom workload support
The Workload autoscaler supports the scaling of custom workloads through label-based selection. This allows autoscaling for:
- Bare pods (pods without controllers)
- Pods created programmatically (e.g., Spark executors or Airflow workers)
- Jobs and CronJobs
- Workloads with custom controllers not natively supported by Cast AI
- Groups of related workloads that should be scaled together
Label-based workload selection
To enable autoscaling for custom workloads, add the workloads.cast.ai/custom-workload label to the Pod template specification. This is crucial: the label must be present in the Pod template, not just on the controller.
The label value must conform to RFC 1123 DNS subdomain and label name restrictions imposed by Kubernetes, meaning it must:
- Contain only lowercase alphanumeric characters or '-'
- Contain at most 63 characters
- Start and end with an alphanumeric character
- Not contain underscores (_) or other special characters
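Since an invalid value will prevent discovery, it can be worth validating label values before rolling them out. A minimal shell sketch of the same rules (the sample value is hypothetical):
echo "my-custom-workload" | grep -Eq '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$' && echo valid || echo invalid
A minimal Pod manifest carrying the label looks like this: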
apiVersion: v1
kind: Pod
metadata:
  labels:
    workloads.cast.ai/custom-workload: "my-custom-workload"
spec:
  containers:
    - name: app
Workloads with the same label value will be treated as a single workload for autoscaling purposes. The value acts as a unique identifier for the workload group.
Workloads are uniquely identified by their:
- Namespace
- Label value
- Controller kind
Configuring custom workloads via annotations
For custom workload discovery, Cast AI configuration annotations must also be specified either in the Pod template specification or at the top level of the manifest.
The placement of the configuration annotations for each Kind is as follows:
- For Deployments: place annotations in spec.template.metadata.annotations:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-managed-app
spec:
  template:
    metadata:
      annotations:
        workloads.cast.ai/configuration: | # Configuration annotation in pod template spec
          scalingPolicyName: custom
          vertical:
            optimization: on
      labels:
        workloads.cast.ai/custom-workload: "custom-controlled-app"
    spec:
      containers:
        - name: app
          # Container spec...
- For Jobs: place both the custom workload label and all configuration annotations in the Pod template specification (spec.template.metadata):
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job
spec:
  template:
    metadata:
      labels:
        workloads.cast.ai/custom-workload: "batch-processor"
      annotations:
        workloads.cast.ai/configuration: | # Configuration annotation in pod template spec
          scalingPolicyName: custom
          vertical:
            optimization: on
    spec:
      containers:
        # Container spec...
Configuring autoscaling behavior
Both labels and annotations used to configure autoscaling behavior must be specified in the Pod template specification, not on the controller or running Pod.
Key points about label-based workload configuration:
- Workloads are grouped per controller kind (deployments and StatefulSets with the same label will be treated as separate workloads)
- For grouped workloads, the newest/latest matching controller's pod template configuration is used as the workload specification
- Only workloads with the workloads.cast.ai/custom-workload label will be discovered for custom workload autoscaling
- The label value must be unique for each distinct workload or group of workloads you want to scale together
- All configuration labels and annotations must be specified in the Pod template specification
Examples
Scale a bare pod:
apiVersion: v1
kind: Pod
metadata:
  labels:
    workloads.cast.ai/custom-workload: "standalone-pod"
  annotations:
    workloads.cast.ai/configuration: | # Configuration annotation in pod template spec
      scalingPolicyName: custom
      vertical:
        optimization: on
spec:
  containers:
    - name: app
      # Container spec...
Group related jobs:
apiVersion: batch/v1
kind: Job
spec:
  template:
    metadata:
      labels:
        workloads.cast.ai/custom-workload: "batch-processors"
      annotations:
        workloads.cast.ai/configuration: | # Configuration annotation in pod template spec
          scalingPolicyName: custom
          vertical:
            optimization: on
    spec:
      containers:
        - name: processor
          # Container spec...
Schedule recurring workloads:
apiVersion: batch/v1
kind: CronJob
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            workloads.cast.ai/custom-workload: "scheduled-processor"
          annotations:
            workloads.cast.ai/configuration: | # Configuration annotation in pod template spec
              scalingPolicyName: custom
              vertical:
                optimization: on
        spec:
          containers:
            - name: processor
              # Container spec...
Scale workloads with custom controllers:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-managed-app
  ownerReferences: # Custom controller resource
    - apiVersion: customcontroller.example.com/v1alpha1
      kind: CustomResourceType
      name: custom-resource
      uid: abc123
      controller: true
spec:
  template:
    metadata:
      annotations:
        workloads.cast.ai/configuration: | # Configuration annotation in pod template spec
          scalingPolicyName: custom
          vertical:
            optimization: on
      labels:
        workloads.cast.ai/custom-workload: "custom-controlled-app"
    spec:
      containers:
        - name: app
          # Container spec...
The Workload autoscaler will track and scale these workloads based on resource usage patterns, applying the same autoscaling policies and recommendations as standard workloads, except:
- These workloads are only scaled vertically using Vertical Pod Autoscaling (VPA)
- Only the deferred recommendation mode is supported
Note
Custom workload autoscaling uses deferred mode, meaning recommendations are only applied when pods are naturally restarted. This helps ensure safe scaling behavior for workloads without native scaling support.
Scaling Job workloads
The workload autoscaler can automatically manage resources for Job and CronJob workloads when they're labeled as custom workloads.
To enable automatic scaling that is optimized specifically for these workload types, add the custom workload label to your Job specifications:
apiVersion: batch/v1
kind: CronJob
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            workloads.cast.ai/custom-workload: "scheduled-processor"
          annotations:
            workloads.cast.ai/configuration: | # Configuration annotation in pod template spec
              scalingPolicyName: custom
              vertical:
                optimization: on
        spec:
          containers:
            - name: processor
              # Container spec...
Once labeled, the workload autoscaler will recognize these workloads, begin tracking your Job's resource usage patterns, and generate scaling recommendations tailored to this kind of workload.
Creating scaling policies for Jobs
Different types of Jobs benefit from different scaling configurations. For best results, set up separate vertical scaling policies to match your Job frequency patterns.
- Group similar Jobs together: create dedicated scaling policies for Jobs with similar patterns. For example, put all your hourly Jobs under one policy and daily Jobs under another. The optimal settings for scaling Jobs of different frequencies will differ (e.g., they need different look-back periods).
- Consider Job duration: very short-lived Jobs (running for just seconds) might be better served with fixed resource allocations rather than automatic scaling.
Note
To reach full recommendation confidence, a job needs to run at least 3 times within the configured look-back period.
GitLab CI Runner Integration
When using Cast AI's workload autoscaling with GitLab CI runners, you must properly configure custom workload labels to enable optimization. Here's how to set it up.
Configuring GitLab Runner
Add the custom workload label to your GitLab runner configuration in the config.toml file:
[[runners]]
  [runners.kubernetes]
    namespace = "{{.Release.Namespace}}"
    image = "alpine"
    [runners.kubernetes.pod_labels]
      # Use CI_JOB_NAME_SLUG and CI_PROJECT_ID for unique workload identification
      "workloads.cast.ai/custom-workload" = "${CI_JOB_NAME_SLUG}-${CI_PROJECT_ID}"
Important Considerations
These best practices for implementing custom workload labels with GitLab runners should help you avoid potential issues:
- Label naming restrictions: the custom workload label value must conform to Kubernetes naming conventions (RFC 1123).
- Using GitLab CI variables: when constructing the custom workload label, use GitLab's predefined variables that are already sanitized:
  - CI_JOB_NAME_SLUG: a lowercase, sanitized version of CI_JOB_NAME
  - CI_PROJECT_ID: the unique ID of the project
- Shared runners: for environments with shared runners across multiple projects, combining CI_JOB_NAME_SLUG and CI_PROJECT_ID helps ensure unique workload identification per project and job type.
Example Configuration
Here's a complete example of a GitLab runner configuration with workload autoscaling enabled:
[[runners]]
  [runners.kubernetes]
    namespace = "{{.Release.Namespace}}"
    image = "alpine"
    [runners.kubernetes.pod_annotations]
      "workloads.cast.ai/configuration" = "vertical:\n optimization: on\n"
    [runners.kubernetes.pod_labels]
      "workloads.cast.ai/custom-workload" = "${CI_JOB_NAME_SLUG}-${CI_PROJECT_ID}"
This configuration ensures the following key requirements:
- Each job type per project gets its own workload optimization profile
- Label values comply with Kubernetes naming requirements
For more information about available GitLab CI variables, refer to the GitLab CI/CD predefined variables documentation.
Configuration via API/UI
The settings described above can be configured via the Cast AI API or directly in the UI.
Configuration via Annotations
All settings are also available by adding annotations on the workload controller. When the workloads.cast.ai/configuration annotation is detected on a workload, the workload is considered to be configured by annotations. This allows for flexible configuration, combining annotations and scaling policies.
Changes to the settings via the API/UI are no longer permitted for workloads with annotations. The default or scaling policy value is used when a workload does not have an annotation for a specific setting.
Annotation values take precedence over what is defined in a scaling policy. This means that if a scaling policy is defined in the workload configuration under annotations, all of the individual configuration options defined under the annotation will override the respective policy values. Those that are not defined under the annotation will use system defaults or what is defined in the scaling policy.
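You don't have to edit manifests to set the annotation; kubectl can apply it directly to the controller. A minimal sketch (the deployment name my-app is hypothetical):
kubectl annotate deployment my-app --overwrite workloads.cast.ai/configuration='
vertical:
  optimization: on
'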
Example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/configuration: |
      scalingPolicyName: custom
      vertical:
        optimization: on
        applyType: immediate
        antiAffinity:
          considerAntiAffinity: false
        startup:
          period: 5m
        confidence:
          threshold: 0.5
        cpu:
          target: p81
          lookBackPeriod: 25h
          min: 1000m
          max: 2500m
          applyThresholdStrategy:
            type: defaultAdaptive
          overhead: 0.15
          limit:
            type: multiplier
            multiplier: 2.0
        memory:
          target: max
          lookBackPeriod: 30h
          min: 2Gi
          max: 10Gi
          applyThresholdStrategy:
            type: defaultAdaptive
          overhead: 0.35
          limit:
            type: noLimit
        downscaling:
          applyType: immediate
        memoryEvent:
          applyType: immediate
        containers:
          {container_name}:
            cpu:
              min: 10m
              max: 1000m
            memory:
              min: 10Mi
              max: 2048Mi
      horizontal:
        optimization: on
        minReplicas: 5
        maxReplicas: 10
        scaleDown:
          stabilizationWindow: 5m
Configuration Structure
Below is a configuration structure reference for setting up a workload to be controlled by annotations.
Note
workloads.cast.ai/configuration has to be a valid YAML string. If the annotation contains an invalid YAML string, the entire configuration will be ignored.
scalingPolicyName
If not set, the system will use the default scaling policy.
Field | Type | Required | Default | Description |
---|---|---|---|---|
scalingPolicyName | string | No | "default" | Specifies the scaling policy name to use. When set, this annotation allows the workload to be managed by both annotations and the specified scaling policy. The scaling policy can control global settings like enabling/disabling vertical autoscaling. |
scalingPolicyName: custom-policy
vertical
Field | Type | Required | Default | Description |
---|---|---|---|---|
vertical | object | No | - | Vertical scaling configuration. |
vertical:
  optimization: on
  applyType: immediate
  antiAffinity:
    considerAntiAffinity: false
  startup:
    period: 5m
  confidence:
    threshold: 0.5
vertical.optimization
Field | Type | Required | Default | Description |
---|---|---|---|---|
optimization | string | *Yes | - | Enable vertical scaling ("on"/"off"). If using the vertical configuration option, this field becomes required. |
vertical:
  optimization: on
vertical.applyType
Field | Type | Required | Default | Description |
---|---|---|---|---|
applyType | string | No | "immediate" | Allows configuring the autoscaler operating mode to apply the recommendations. Use immediate to apply recommendations as soon as the thresholds are passed.Note: immediate mode can cause pod restarts.Use deferred to apply recommendations only on natural pod restarts. |
vertical:
  applyType: immediate
vertical.antiAffinity
Field | Type | Required | Default | Description |
---|---|---|---|---|
antiAffinity | object | No | - | Configuration for handling pod anti-affinity scheduling constraints. |
vertical:
  antiAffinity:
    considerAntiAffinity: false
vertical.antiAffinity.considerAntiAffinity
Field | Type | Required | Default | Description |
---|---|---|---|---|
considerAntiAffinity | boolean | *Yes | false | When true, workload autoscaler will respect pod anti-affinity rules when making scaling decisions. *If using the vertical.antiAffinity configuration option, this field becomes required. |
vertical:
  antiAffinity:
    considerAntiAffinity: false
vertical.startup
Field | Type | Required | Default | Description |
---|---|---|---|---|
startup | object | No | - | Configuration for handling workload startup behavior. See Startup metrics. |
vertical:
  startup:
    period: 5m
vertical.startup.period
Field | Type | Required | Default | Description |
---|---|---|---|---|
period | duration | *Yes | "0m" | Duration to ignore resource usage metrics after workload startup. Useful for applications with high initial resource usage spikes. *If using the vertical.startup configuration option, this field becomes required. |
vertical:
  startup:
    period: 5m
vertical.confidence
Field | Type | Required | Default | Description |
---|---|---|---|---|
confidence | object | No | - | Configuration for recommendation confidence thresholds. |
vertical:
  confidence:
    threshold: 0.5
vertical.confidence.threshold
Field | Type | Required | Default | Description |
---|---|---|---|---|
threshold | float | *Yes | 0.9 | Minimum confidence score required to apply recommendations (0.0-1.0). Higher values require more data points for recommendations. *If using the vertical.confidence configuration option, this field becomes required. |
vertical:
  confidence:
    threshold: 0.5
vertical.cpu
Field | Type | Required | Default | Description |
---|---|---|---|---|
cpu | object | No | - | CPU-specific scaling configuration. |
vertical:
  cpu:
    target: p80
    lookBackPeriod: 24h
    min: 100m
    max: 1000m
    applyThresholdStrategy:
      type: defaultAdaptive
    overhead: 0.0
vertical.cpu.target
Field | Type | Required | Default | Description |
---|---|---|---|---|
target | string | No | "p80" | Resource usage target: - max - Use maximum observed usage- p{0-99} - Use percentile (e.g., p80 for 80th percentile). |
vertical:
  cpu:
    target: p80
vertical.cpu.lookBackPeriod
Field | Type | Required | Default | Description |
---|---|---|---|---|
lookBackPeriod | duration | No | "24h" | Historical resource usage data window to consider for recommendations (24h-168h). See Look-back Period. |
vertical:
  cpu:
    lookBackPeriod: 24h
vertical.cpu.min
Field | Type | Required | Default | Description |
---|---|---|---|---|
min | string | No | "10m" | The lower limit for the recommendation uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). The minimum cannot be greater than the maximum. |
vertical:
  cpu:
    min: 100m
vertical.cpu.max
Field | Type | Required | Default | Description |
---|---|---|---|---|
max | string | No | - | The upper limit for the recommendation. It uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). Recommendations will not exceed this value. |
vertical:
  cpu:
    max: 1000m
vertical.cpu.applyThreshold
Deprecation Notice
The applyThreshold configuration option is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new applyThresholdStrategy configuration format for future compatibility and access to the latest features. See applyThresholdStrategy.
Field | Type | Required | Default | Description |
---|---|---|---|---|
applyThreshold | float | No | 0.1 | The relative difference required between current and recommended resource values to apply a change immediately: - for upscaling, the difference is calculated relative to current resource requests; - for downscaling, it's calculated relative to the new recommended value. For example, with a threshold of 0.1 (10%), an upscale from 100m to 120m CPU would be applied immediately (20% increase relative to the current 100m), while an upscale from 110m to 120m would not (a ~9% increase relative to the current 110m). Value range: 0.01-2.5. |
vertical:
  cpu:
    applyThreshold: 0.1
vertical.cpu.applyThresholdStrategy
Warning
applyThreshold and applyThresholdStrategy cannot be used simultaneously in a configuration, as that will result in an error. applyThresholdStrategy is the newer and recommended configuration option.
Field | Type | Required | Default | Description |
---|---|---|---|---|
applyThresholdStrategy | object | No | - | Configuration for the strategy used to determine when recommendations should be applied. The strategy determines how the threshold percentage is calculated based on current resource requests. |
vertical:
  cpu:
    applyThresholdStrategy:
      type: defaultAdaptive
vertical.cpu.applyThresholdStrategy.type
Field | Type | Required | Default | Description |
---|---|---|---|---|
type | string | *Yes | "defaultAdaptive" | The type of threshold strategy to use: - defaultAdaptive - Automatically adjusts thresholds based on workload size.- percentage - Uses a fixed percentage threshold.- customAdaptive - Allows custom configuration of the adaptive threshold formula. Recommended for power users only. It works in the same way as the Default Adaptive Threshold, but it allows tweaking the parameters of the adaptive threshold formula.The defaultAdaptive threshold option uses the following values:numerator = 0.5 denominator = 1 exponent = 1 (same effect as not used, i.e., no influence on the calculation)*Required when using applyThresholdStrategy |
vertical:
  cpu:
    applyThresholdStrategy:
      type: defaultAdaptive # Using default adaptive threshold
vertical.cpu.applyThresholdStrategy.percentage
Field | Type | Required | Default | Description |
---|---|---|---|---|
percentage | float | *Yes | - | The fixed percentage threshold to use. Value range: 0.01-2.5. *Required when type is percentage |
vertical:
  cpu:
    applyThresholdStrategy:
      type: percentage # Using fixed percentage threshold
      percentage: 0.3 # 30% threshold
vertical.cpu.applyThresholdStrategy.numerator
Field | Type | Required | Default | Description |
---|---|---|---|---|
numerator | float | *Yes | 0.5 | Affects the vertical stretch of the threshold function. Lower values create smaller thresholds. *Required when type is customAdaptive |
vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.cpu.applyThresholdStrategy.denominator
Field | Type | Required | Default | Description |
---|---|---|---|---|
denominator | float | *Yes | 1 | Affects threshold sensitivity for small workloads. Values close to 0 result in larger thresholds for small workloads. For example, when numerator is 1 , exponent is 1 and denominator is 0 the threshold for 0.5 req. CPU will be 200%.*Required when type is customAdaptive |
vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.cpu.applyThresholdStrategy.exponent
Field | Type | Required | Default | Description |
---|---|---|---|---|
exponent | float | *Yes | 1 | Controls how quickly the threshold decreases for larger workloads. Lower values prevent extremely small thresholds for large resources. *Required when type is customAdaptive |
vertical:
  cpu:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
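The adaptive formula itself is not spelled out on this page, but the documented defaults and the 200% example in the denominator description are consistent with threshold = numerator / (currentRequest + denominator) ^ exponent. A quick sanity check of that assumed formula:
# Denominator example above: numerator=1, exponent=1, denominator=0, 0.5 CPU requested
echo 'scale=2; 1 / (0.5 + 0) ^ 1' | bc   # prints 2.00, i.e., a 200% threshold
# defaultAdaptive values (numerator=0.5, denominator=1, exponent=1) for a 1 CPU request
echo 'scale=2; 0.5 / (1 + 1) ^ 1' | bc   # prints .25, i.e., a 25% threshold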
vertical.cpu.overhead
Field | Type | Required | Default | Description |
---|---|---|---|---|
overhead | float | No | 0.0 | Additional resource buffer when applying recommendations (0.0-2.5, e.g., 0.1 = 10%). If a 10% buffer is configured, the issued recommendation will have +10% added to it, so that the workload can handle further increased resource demand. |
vertical:
  cpu:
    overhead: 0.0
vertical.cpu.limit
Field | Type | Required | Default | Description |
---|---|---|---|---|
limit | object | No | - | Configuration for container CPU limit scaling. Default behaviour when not specified: - If spec.containers[].resources.limits.cpu is not defined on the workload, no limit is set by the workload autoscaler.- If spec.containers[].resources.limits.cpu is defined on the workload, it is removed by workload autoscaler. |
vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
vertical.cpu.limit.type
Field | Type | Required | Default | Description |
---|---|---|---|---|
type | string | *Yes | - | Type of CPU limit scaling to apply: - noLimit - Remove the resource limit from the workload definition entirely.- multiplier - Set limit as a multiple of requests using the following formula:resources.requests.cpu * {multiplier} .*If using the vertical.cpu.limit configuration option, this field becomes required. |
vertical:
  cpu:
    limit:
      type: multiplier
vertical.cpu.limit.multiplier
Field | Type | Required | Default | Description |
---|---|---|---|---|
multiplier | float | *Yes | - | Value to multiply the requests by to set the limit. The calculation is: resources.requests.cpu * {multiplier} . The value must be greater than or equal to 1 .*Required when type is set to multiplier . |
vertical:
  cpu:
    limit:
      type: multiplier
      multiplier: 2.0
vertical.cpu.optimization
Field | Type | Required | Default | Description |
---|---|---|---|---|
optimization | string | No | - | This configuration option can be used to disable CPU management for workloads that benefit from memory management only. The workload will then use the CPU requests/limits configured in the template. The only allowed option is off. If not set, the resource will inherit the workload management option as normal. The minimum required workload-autoscaler component version to use this feature is v0.23.1. |
vertical:
  cpu:
    optimization: off
vertical.memory
Field | Type | Required | Default | Description |
---|---|---|---|---|
memory | object | No | - | Memory-specific scaling configuration. |
vertical:
  memory:
    target: max
    lookBackPeriod: 24h
    min: 128Mi
    max: 2Gi
    applyThresholdStrategy:
      type: defaultAdaptive
    overhead: 0.1
vertical.memory.target
Field | Type | Required | Default | Description |
---|---|---|---|---|
target | string | No | "max" | Resource usage target: - max - Use maximum observed usage.- p{0-99} - Use percentile (e.g., p80 for 80th percentile). |
vertical:
  memory:
    target: max
vertical.memory.lookBackPeriod
Field | Type | Required | Default | Description |
---|---|---|---|---|
lookBackPeriod | duration | No | "24h" | Historical resource usage data window to consider for recommendations (24h-168h). See Look-back Period. |
vertical:
  memory:
    lookBackPeriod: 24h
vertical.memory.min
Field | Type | Required | Default | Description |
---|---|---|---|---|
min | string | No | "10Mi" | Minimum resource limit. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi"). |
vertical:
  memory:
    min: 128Mi
vertical.memory.max
Field | Type | Required | Default | Description |
---|---|---|---|---|
max | string | No | - | Maximum resource limit. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi"). |
vertical:
  memory:
    max: 2Gi
vertical.memory.applyThreshold
Deprecation Notice
The applyThreshold configuration option is deprecated but still supported for backward compatibility. We strongly recommend migrating to the new applyThresholdStrategy configuration format for future compatibility and access to the latest features. See applyThresholdStrategy.
Field | Type | Required | Default | Description |
---|---|---|---|---|
applyThreshold | float | No | 0.1 | The relative difference required between current and recommended resource values to apply a change immediately: - for upscaling, the difference is calculated relative to current resource requests; - for downscaling, it's calculated relative to the new recommended value. For example, with a threshold of 0.1 (10%), an upscale from 100MiB to 120MiB of memory would be applied immediately (20% increase relative to the current 100MiB), while an upscale from 110MiB to 120MiB would not (a ~9% increase relative to the current 110MiB). Value range: 0.01-2.5. |
vertical:
  memory:
    applyThreshold: 0.1
vertical.memory.applyThresholdStrategy
Warning
applyThreshold and applyThresholdStrategy cannot be used simultaneously in a configuration, as that will result in an error. applyThresholdStrategy is the newer and recommended configuration option.
Field | Type | Required | Default | Description |
---|---|---|---|---|
applyThresholdStrategy | object | No | - | Configuration for the strategy used to determine when recommendations should be applied. The strategy determines how the threshold percentage is calculated based on current resource requests. |
vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.applyThresholdStrategy.type
Field | Type | Required | Default | Description |
---|---|---|---|---|
type | string | *Yes | "defaultAdaptive" | The type of threshold strategy to use: - defaultAdaptive - Automatically adjusts thresholds based on workload size.- percentage - Uses a fixed percentage threshold.- customAdaptive - Allows custom configuration of the adaptive threshold formula. Recommended for power users only. It works in the same way as the Default Adaptive Threshold, but it allows tweaking the parameters of the adaptive threshold formula. The defaultAdaptive threshold option uses the following values:numerator = 0.5 denominator = 1 exponent = 1 (same effect as not used, i.e., no influence on the calculation)*Required when using applyThresholdStrategy |
vertical:
  memory:
    applyThresholdStrategy:
      type: defaultAdaptive # Using default adaptive threshold
vertical.memory.applyThresholdStrategy.percentage
Field | Type | Required | Default | Description |
---|---|---|---|---|
percentage | float | *Yes | - | The fixed percentage threshold to use. Value range: 0.01-2.5. *Required when type is percentage |
vertical:
  memory:
    applyThresholdStrategy:
      type: percentage # Using fixed percentage threshold
      percentage: 0.3 # 30% threshold
vertical.memory.applyThresholdStrategy.numerator
Field | Type | Required | Default | Description |
---|---|---|---|---|
numerator | float | *Yes | 0.5 | Affects the vertical stretch of the threshold function. Lower values create smaller thresholds. *Required when type is customAdaptive |
vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive # Using custom adaptive threshold
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.applyThresholdStrategy.denominator
Field | Type | Required | Default | Description |
---|---|---|---|---|
denominator | float | *Yes | 1 | Affects threshold sensitivity for small workloads. Values close to 0 result in larger thresholds for small workloads. For example, when numerator is 1 , exponent is 1 and denominator is 0 the threshold for 0.5 req. Memory will be 200%.*Required when type is customAdaptive |
vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.applyThresholdStrategy.exponent
Field | Type | Required | Default | Description |
---|---|---|---|---|
exponent | float | *Yes | 1 | Controls how quickly the threshold decreases for larger workloads. Lower values prevent extremely small thresholds for large resources. *Required when type is customAdaptive |
vertical:
  memory:
    applyThresholdStrategy:
      type: customAdaptive
      numerator: 0.1
      exponent: 0.1
      denominator: 2
vertical.memory.overhead
Field | Type | Required | Default | Description |
---|---|---|---|---|
overhead | float | No | 0.1 | Additional resource buffer when applying recommendations (0.0-2.5, e.g., 0.1 = 10%). If a 10% buffer is configured, the issued recommendation will have +10% added to it so that the workload can handle further increased resource demand. |
vertical:
  memory:
    overhead: 0.1
vertical.memory.limit
Field | Type | Required | Default | Description |
---|---|---|---|---|
limit | object | No | - | Configuration for container memory limit scaling. The default behavior when not specified: - If spec.containers[].resources.limits.memory is not defined on the workload, no limit is set by workload autoscaler.- If spec.containers[].resources.limits.memory is defined on the workload, workload autoscaler calculates the limit using the following formula: max(resources.requests.memory * 1.5, current.resources.limit) (it will only increase the limit, but never lower it). |
vertical:
  memory:
    limit:
      type: multiplier
      multiplier: 1.5
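As a worked example of the default behavior described above, with hypothetical numbers: if requests.memory is scaled to 2Gi and the workload already defines limits.memory: 2.5Gi, the formula yields max(2Gi * 1.5, 2.5Gi) = max(3Gi, 2.5Gi) = 3Gi, so the limit is raised to 3Gi; it would never be lowered below the existing 2.5Gi.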
vertical.memory.limit.type
Field | Type | Required | Default | Description |
---|---|---|---|---|
type | string | *Yes | - | Type of limit scaling to apply: - noLimit - Remove the resource limit from the workload definition entirely.- multiplier - Set limit as a multiple of requests using the following formula: resources.requests.memory * {multiplier} .*If using the vertical.memory.limit configuration option, this field (type) becomes required. |
vertical:
  memory:
    limit:
      type: multiplier
vertical.memory.limit.multiplier
Field | Type | Required | Default | Description |
---|---|---|---|---|
multiplier | float | *Yes | - | Value to multiply the requests by to set the limit on the workload. The calculation is: resources.requests.memory * {multiplier} . The value must be greater than or equal to 1 .*Required when type is set to multiplier . |
vertical:
  memory:
    limit:
      type: multiplier
      multiplier: 1.5
vertical.memory.optimization
Field | Type | Required | Default | Description |
---|---|---|---|---|
optimization | string | No | - | This configuration option can be used to disable memory management for workloads that benefit from CPU management only (e.g., Java workloads with a fixed heap size). The workload will then use the memory requests/limits configured in the template. The only allowed option is off. If not set, the resource will inherit the workload management option as normal. The minimum required workload-autoscaler component version to use this feature is v0.23.1. |
vertical:
  memory:
    optimization: off
vertical.downscaling
Field | Type | Required | Default | Description |
---|---|---|---|---|
downscaling | object | No | - | Downscaling behavior override. |
vertical:
  downscaling:
    applyType: immediate
vertical.downscaling.applyType
Field | Type | Required | Default | Description |
---|---|---|---|---|
applyType | string | No | Default is taken from the vertical scaling policy controlling the workload. | Override application mode: - immediate - Apply changes immediately- deferred - Apply during natural restarts |
vertical:
  downscaling:
    applyType: immediate
vertical.memoryEvent
Field | Type | Required | Default | Description |
---|---|---|---|---|
memoryEvent | object | No | - | Memory event behavior override. |
vertical:
  memoryEvent:
    applyType: immediate
vertical.memoryEvent.applyType
This configuration option is fully compatible with the other applyType options and is meant to be used in combination with them, allowing fine-grained control over both upscaling and downscaling. Here's how they interact:
- If both configuration options are set to the same value (both immediate or both deferred), the behavior remains unchanged.
- If vertical.downscaling.applyType is set to immediate and vertical.memoryEvent.applyType is set to deferred:
  - Downscaling operations will be applied immediately.
  - Memory-event-driven upscaling will be deferred to natural pod restarts.
- If vertical.downscaling.applyType is set to deferred and vertical.memoryEvent.applyType is set to immediate:
  - Downscaling operations will be deferred to natural pod restarts.
  - Memory-event-driven upscaling will be applied immediately.
Field | Type | Required | Default | Description |
---|---|---|---|---|
applyType | string | *Yes | Default is taken from the vertical scaling policy controlling the workload. | Override application mode for memory-related events (OOM kills, pressure): - immediate - Apply changes immediately- deferred - Apply during natural restarts*If using the vertical.memoryEvent configuration option, this field becomes required. |
vertical:
  memoryEvent:
    applyType: immediate
vertical.containers
Configuration object that contains container-specific settings.
Field | Type | Required | Default | Description |
---|---|---|---|---|
containers | object | No | - | Container configuration mapping. |
vertical:
  containers:
vertical.containers.{container_name}
Configuration object that contains resource constraints for a specific container. Replace {container_name} with the name of your container.
Field | Type | Required | Default | Description |
---|---|---|---|---|
{container_name} | object | No | - | Container resource configuration object. Has to be the name of the container for which resources are being configured. |
vertical:
  containers:
    {container_name}:
vertical.containers.{container_name}.cpu
Container CPU constraints. Set minimum and maximum CPU limits for a specific container to define the workload autoscaler's scaling range.
Field | Type | Required | Default | Description |
---|---|---|---|---|
max | string | No | - | The upper limit for the recommendation. Uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). Recommendations won't exceed this value. |
min | string | No | - | The lower limit for the recommendation. Uses standard Kubernetes CPU notation (e.g., "1000m" or "1"). Min cannot be greater than max. |
vertical:
  containers:
    {container_name}:
      cpu:
        min: 10m
        max: 1000m
vertical.containers.{container_name}.memory
Container memory constraints. Set minimum and maximum memory limits for a specific container to define the workload autoscaler's scaling range.
Field | Type | Required | Default | Description |
---|---|---|---|---|
min | string | No | - | Minimum resource limit. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi"). |
max | string | No | - | Maximum resource limit. Uses standard Kubernetes memory notation (e.g., "2Gi", "1000Mi"). |
vertical:
  containers:
    {container_name}:
      memory:
        min: 10Mi
        max: 2048Mi
horizontal
Field | Type | Required | Default | Description |
---|---|---|---|---|
horizontal | object | No | - | Horizontal scaling configuration. |
horizontal:
  optimization: on
  minReplicas: 1
  maxReplicas: 10
  scaleDown:
    stabilizationWindow: 5m
    shortAverage: 3m
horizontal.optimization
Field | Type | Required | Default | Description |
---|---|---|---|---|
optimization | string | Yes* | - | Enable horizontal scaling ("on"/"off"). *If using the horizontal configuration option, this field becomes required. |
horizontal:
  optimization: on
horizontal.minReplicas
Field | Type | Required | Default | Description |
---|---|---|---|---|
minReplicas | integer | Yes* | - | Minimum number of replicas. |
horizontal:
  minReplicas: 1
horizontal.maxReplicas
Field | Type | Required | Default | Description |
---|---|---|---|---|
maxReplicas | integer | Yes* | - | Maximum number of replicas. |
horizontal:
  maxReplicas: 10
horizontal.scaleDown
Field | Type | Required | Default | Description |
---|---|---|---|---|
scaleDown | object | No | - | Houses scale-down configuration options. |
horizontal.scaleDown.stabilizationWindow
Field | Type | Required | Default | Description |
---|---|---|---|---|
stabilizationWindow | duration | *Yes | "5m" | Cooldown period between scale-downs. *If using the horizontal.scaleDown configuration option, this field becomes required. |
horizontal:
  scaleDown:
    stabilizationWindow: 5m
*Required if the parent object is present in the configuration.
Legacy Annotation Support
For documentation on the legacy annotation format, which is now deprecated, see the Legacy Annotations Reference page.
Migration Guide
Note
The annotations V2 structure cannot be combined with the deprecated V1 annotations. When the workloads.cast.ai/configuration annotation is detected, the workload is considered to be configured using that annotation, and all other annotations starting with workloads.cast.ai will be ignored.
To migrate from v1 to v2 annotations:
- Remove all individual legacy workloads.cast.ai/* annotations
- Add the new workloads.cast.ai/configuration annotation
- Move all settings into the YAML structure under the new annotation
For example, these v1 annotations:
workloads.cast.ai/vertical-autoscaling: "on"
workloads.cast.ai/cpu-target: "p80"
workloads.cast.ai/memory-max: "2Gi"
Would become:
workloads.cast.ai/configuration: |
  vertical:
    optimization: on
    cpu:
      target: p80
    memory:
      max: 2Gi
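If you prefer kubectl over editing manifests, the same migration can be sketched imperatively (the deployment name my-app is hypothetical; a trailing dash on an annotation key removes it):
kubectl annotate deployment my-app workloads.cast.ai/vertical-autoscaling- workloads.cast.ai/cpu-target- workloads.cast.ai/memory-max-
kubectl annotate deployment my-app workloads.cast.ai/configuration='
vertical:
  optimization: on
  cpu:
    target: p80
  memory:
    max: 2Gi
'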