Workload Autoscaler Configuration

Workload Autoscaling can be configured in two ways: through the CAST AI API (or the equivalent fields in the UI), or at the workload level using workload annotations.


Currently, the workload autoscaler is installed as an in-cluster component via Helm and can be upgraded by running the following:

helm upgrade -i castai-workload-autoscaler -n castai-agent castai-helm/castai-workload-autoscaler --reuse-values

Available Settings

The following settings are currently available to configure CAST AI Workload Autoscaling:

  • Automation - on/off - controls whether CAST AI applies the recommendations or only generates them.
  • Recommendation Percentile - the usage percentile that CAST AI targets, based on the last day of usage. The recommendation is the average of the target percentile across all pods over the recommendation period. At 100%, the recommendation is no longer an average across pods but the maximum value observed over the period.
  • Overhead - the extra resources added on top of the recommendation. Defaults to 10% for memory and 0% for CPU.
  • Optimization Threshold - when automation is enabled, the minimum difference between the current pod requests and the new recommendation for the recommendation to be applied immediately. Defaults to 10% for both memory and CPU.
  • Workload autoscaler constraints - the minimum and maximum values for resources. The workload autoscaler will not scale CPU/memory above the maximum or below the minimum. The limits apply to all containers.
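
To make the interaction between these settings concrete, here is a minimal Python sketch of the arithmetic described above. It is not CAST AI's implementation: the function names, the nearest-rank percentile method, the order in which overhead and constraints are applied, and the use of ">=" for the threshold comparison are all assumptions made for illustration.

```python
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 0..99)."""
    ordered = sorted(samples)
    rank = max(1, (len(ordered) * pct + 99) // 100)  # integer ceil(n * pct / 100)
    return ordered[rank - 1]


def recommend(usage_by_pod, pct, overhead, lower=None, upper=None):
    """Hypothetical recommendation: percentile -> overhead -> min/max constraints."""
    if pct == 100:
        # 100% means the maximum observed value, not an average across pods.
        base = max(max(samples) for samples in usage_by_pod.values())
    else:
        # Otherwise average the target percentile across all pods.
        base = mean(percentile(samples, pct) for samples in usage_by_pod.values())
    value = base * (1 + overhead)
    if lower is not None:
        value = max(value, lower)
    if upper is not None:
        value = min(value, upper)
    return value


def should_apply(current_request, recommendation, threshold):
    """True when the relative difference reaches the optimization threshold."""
    if current_request == 0:
        return True
    return abs(recommendation - current_request) / current_request >= threshold


# CPU usage samples in millicores for two pods (made-up numbers).
usage = {"pod-a": [100, 120, 140, 160, 200], "pod-b": [80, 90, 100, 110, 120]}
cpu = recommend(usage, pct=80, overhead=0.0)
print(cpu, should_apply(current_request=100, recommendation=cpu, threshold=0.1))
```

With these sample numbers, the 80th percentile is 160 for pod-a and 110 for pod-b, so the averaged recommendation is 135 millicores, which differs from a 100-millicore request by more than the 10% threshold.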


It is recommended to wait a week before enabling Workload Autoscaling for "all workloads", so that the system can learn how resource consumption varies between weekdays and weekends.

Configuration API/UI

The settings listed above can be configured through the CAST AI API or the UI.

Configuration via Annotations

All of the settings can also be configured by adding annotations to the workload controller. Once workload configuration annotations are detected, changes to the settings via the API/UI are no longer permitted. To start configuring the settings via annotations, the annotation must be set on the workload; configuration examples are shown below. When a workload does not have an annotation for a specific setting, the default is used.
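
The fallback to defaults can be pictured as a simple merge. The Python sketch below is illustrative only: `DEFAULTS` copies the default values from the settings table in this section, keys are written in the `{resource}-{setting}` form, and `effective_settings` is a hypothetical helper, not part of any CAST AI tooling.

```python
# Defaults copied from the settings table; the min/max settings have no default.
DEFAULTS = {
    "cpu-overhead": "0",
    "cpu-target": "p80",
    "cpu-apply-threshold": "0.1",
    "memory-overhead": "0.1",
    "memory-target": "max",
    "memory-apply-threshold": "0.1",
}


def effective_settings(annotations):
    """Overlay the annotations present on a workload onto the defaults."""
    settings = dict(DEFAULTS)  # copy, so the defaults are never mutated
    settings.update(annotations)
    return settings


# Only cpu-target is annotated; every other setting keeps its default.
print(effective_settings({"cpu-target": "p95"}))
```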

The annotations generally follow a pattern of {resource}-{setting}. Currently the available resources are cpu and memory. Available settings:

| Annotation | Possible Values | Default | Info | Required* |
|---|---|---|---|---|
| … | vertical, off | - | automated scaling | Yes |
| {resource}-overhead | float >= 0 | cpu: 0, memory: 0.1 | overhead expressed as a fraction, e.g. 10% is expressed as 0.1 | Optional |
| {resource}-target | max, p{x} | cpu: p80, memory: max | the x in p{x} is the target percentile, an integer between 0 and 99 | Optional |
| {resource}-apply-threshold | float >= 0 | cpu: 0.1, memory: 0.1 | how much the recommendation must differ from the requests before it is applied, e.g. a 10% difference is expressed as 0.1 | Optional |
| {resource}-max | 4Gi, 60m, etc. | - | the upper limit for the recommendation; recommendations won't exceed this value | Optional |
| {resource}-min | 4Gi, 60m, etc. | - | the lower limit for the recommendation; min cannot be greater than max | Optional |
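
As a rough illustration of the value formats listed above, the following Python sketch parses each kind of value. It is an assumption-laden example, not the validator CAST AI uses: the function names are hypothetical, and the quantity parser handles only the `m`, `Ki`, `Mi`, and `Gi` suffixes rather than the full Kubernetes quantity syntax.

```python
import math
import re

_TARGET = re.compile(r"p(\d{1,2})")
_QUANTITY = re.compile(r"(\d+(?:\.\d+)?)(m|Ki|Mi|Gi)?")
_UNITS = {"": 1, "m": 0.001, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_target(value):
    """Return 'max', or the integer percentile (0..99) for values like 'p80'."""
    if value == "max":
        return "max"
    match = _TARGET.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid target: {value!r}")
    return int(match.group(1))


def parse_fraction(value):
    """Parse overhead / apply-threshold values: a finite float >= 0."""
    number = float(value)  # raises ValueError for non-numeric input
    if not math.isfinite(number) or number < 0:
        raise ValueError(f"expected a float >= 0, got {value!r}")
    return number


def parse_quantity(value):
    """Parse a simplified resource quantity such as '60m' or '4Gi'."""
    match = _QUANTITY.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid quantity: {value!r}")
    return float(match.group(1)) * _UNITS[match.group(2) or ""]


def check_bounds(minimum, maximum):
    """Enforce the rule that min cannot be greater than max."""
    if parse_quantity(minimum) > parse_quantity(maximum):
        raise ValueError(f"min {minimum!r} is greater than max {maximum!r}")


check_bounds("120m", "400m")  # passes silently
print(parse_target("p80"), parse_fraction("0.1"), parse_quantity("4Gi"))
```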

Example config:

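apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    …: "vertical"                   # enable vertical automatic scaling
    cpu-overhead: "0"               # 0%
    cpu-apply-threshold: "0.05"     # 5%
    cpu-target: "p80"               # 80th percentile
    cpu-max: "400m"                 # max 0.4 cpu
    cpu-min: "120m"                 # min 0.12 cpu
    memory-overhead: "0.1"          # 10%
    memory-apply-threshold: "0.05"  # 5%
    memory-target: "max"            # max usage
    memory-max: "2Gi"               # max 2Gi
    memory-min: "1Gi"               # min 1Gi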

Configuration Errors

If the workload manifest contains an invalid configuration (for example, a setting with the value "unknown-value"), the configuration is not updated: the old configuration values remain in use until the erroneous configuration is fixed. The error is shown in the workload details in the CAST AI Console.
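
The keep-the-old-values behaviour can be sketched in a few lines of Python. This is only a model of the behaviour described above, not CAST AI code: `resolve_config` and `validate` are hypothetical names, and the validator checks just the overhead settings to keep the example short.

```python
def validate(annotations):
    """Hypothetical validator: overhead values must parse as floats >= 0."""
    for key, value in annotations.items():
        if key.endswith("-overhead") and not float(value) >= 0:
            raise ValueError(f"{key}: expected float >= 0, got {value!r}")


def resolve_config(previous, candidate):
    """Return (active_config, error); invalid candidates leave `previous` active."""
    try:
        validate(candidate)
    except ValueError as err:
        return previous, str(err)  # surface the error, keep the old values
    return candidate, None


old = {"cpu-overhead": "0"}
active, error = resolve_config(old, {"cpu-overhead": "unknown-value"})
print(active, error)
```

Here the invalid value is rejected, `active` is still the old configuration, and `error` carries the message that a console could display.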