Workload Autoscaler Configuration

Workload Autoscaling can be configured in different ways to suit your specific needs: through the CAST AI API (or the equivalent fields in the UI), or at the workload level using workload annotations.


Currently, the workload autoscaler is installed as an in-cluster component via Helm and can be upgraded by running the following:

helm upgrade -i castai-workload-autoscaler -n castai-agent castai-helm/castai-workload-autoscaler --reuse-values

Dynamically Injected Containers

By default, containers that are injected at runtime (e.g., istio-proxy) won't be managed by the workload autoscaler, and recommendations won't be applied to them. To enable management of injected containers, configure the in-cluster component with the following command:

helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler -n castai-agent --reuse-values --set webhook.reinvocationPolicy=IfNeeded

Available Workload Settings

The following settings are currently available to configure CAST AI Workload Autoscaling:

  • Automation - on/off - determines whether CAST AI applies the recommendations or only generates them.
  • Scaling policy - selects the scaling policy by name. Must be one of the policies available for the cluster.
  • Recommendation Percentile - which percentile CAST AI will recommend, looking at the last day of usage. The recommendation is the average of the target percentile across all pods over the recommendation period. Setting the percentile to 100% uses the maximum observed value over the period instead of the average across pods.
  • Overhead - how many extra resources are added on top of the recommendation. By default, it's set to 10% for memory and 0% for CPU.
  • Optimization Threshold - when automation is enabled, how much the new recommendation must differ from the current pod requests for it to be applied immediately. Defaults to 10% for both memory and CPU.
  • Workload autoscaler constraints - sets the minimum and maximum values for resources; the workload autoscaler will not scale CPU/memory above the max or below the min limits. The limits apply to all containers.
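To make the interplay of these settings concrete, here is an illustrative sketch (not CAST AI's actual algorithm) of how a percentile-based recommendation with overhead and an apply threshold could be computed; all function and variable names are hypothetical:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile; p=100 returns the maximum observed value."""
    ordered = sorted(samples)
    if p >= 100:
        return ordered[-1]
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def recommend(pod_usage, target_p, overhead, current_request, apply_threshold):
    """pod_usage: per-pod lists of usage samples over the lookback window.

    Returns the recommendation and whether it differs enough from the
    current request to be applied immediately.
    """
    per_pod = [percentile(samples, target_p) for samples in pod_usage]
    # Percentile 100 uses the maximum across pods; otherwise the per-pod
    # percentile values are averaged, as described above.
    base = max(per_pod) if target_p >= 100 else sum(per_pod) / len(per_pod)
    recommendation = base * (1 + overhead)
    diff = abs(recommendation - current_request) / current_request
    return recommendation, diff >= apply_threshold

usage = [[0.2, 0.4, 0.3], [0.25, 0.35, 0.5]]  # CPU cores sampled per pod
rec, apply_now = recommend(usage, target_p=80, overhead=0.1,
                           current_request=0.5, apply_threshold=0.1)
# rec is about 0.495 cores; only a 1% difference, so apply_now is False
```

With a 10% threshold, the ~1% difference between the 0.5-core request and the ~0.495-core recommendation is not enough to trigger an immediate resize.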


It is recommended to wait a week before enabling Workload Autoscaling for "all workloads", so that the system has an understanding of how resource consumption varies between weekdays and weekends.

Configuration API/UI

All of the aforementioned settings can be configured via the CAST AI API or the UI.

Configuration via Annotations

All of the settings are also available by adding them as annotations on the workload controller. When workload configuration annotations are detected, changes to the settings via the API/UI are no longer permitted. To configure a setting via annotations, set the annotation on the workload; configuration examples are shown below. When a workload does not have an annotation for a specific setting, the default value is used.

The annotations generally follow the pattern {resource}-{setting}. Currently, the available resources are cpu and memory. Available settings:

| Annotation | Possible Values | Default | Info | Required |
|---|---|---|---|---|
| autoscaling | vertical, off | - | Automated scaling. | Yes |
| scaling-policy | valid k8s annotation value | default | Allows selecting the scaling policy by name. When not specified, the default policy is used. | Optional |
| apply-type | immediate, deferred | immediate | Configures how the autoscaler applies recommendations. Use immediate to apply recommendations as soon as the thresholds are passed. Note: immediate mode can cause pod restarts. Use deferred to apply recommendations only on natural pod restarts. | Optional |
| {resource}-overhead | float >= 0 | cpu: 0, memory: 0.1 | Overhead expressed as a fraction, e.g., 10% would be expressed as 0.1. | Optional |
| {resource}-target | max, p{x} | cpu: p80, memory: max | The x in p{x} is the target percentile. Integers between 0 and 99. | Optional |
| {resource}-apply-threshold | float >= 0 | cpu: 0.1, memory: 0.1 | How much the recommendation should differ from the current requests so that it can be applied. For example, a 10% difference would be expressed as 0.1. | Optional |
| {resource}-max | 4Gi, 60m, etc. | - | The upper limit for the recommendation. Recommendations won't exceed this value. | Optional |
| {resource}-min | 4Gi, 60m, etc. | - | The lower limit for the recommendation. Min cannot be greater than max. | Optional |
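As a sketch of the {resource}-{setting} pattern described above, the following hypothetical helper builds annotation keys; the workloads.cast.ai/ prefix is an assumption here, not taken from the table:

```python
# Hypothetical helper illustrating the {resource}-{setting} annotation pattern.
RESOURCES = {"cpu", "memory"}
SETTINGS = {"overhead", "target", "apply-threshold", "max", "min"}
PREFIX = "workloads.cast.ai/"  # assumed annotation prefix

def annotation_key(resource, setting):
    """Build a full annotation key, rejecting unknown resources/settings."""
    if resource not in RESOURCES or setting not in SETTINGS:
        raise ValueError(f"unknown annotation: {resource}-{setting}")
    return f"{PREFIX}{resource}-{setting}"

annotation_key("memory", "overhead")  # "workloads.cast.ai/memory-overhead"
```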

Example config:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/autoscaling: "vertical"        # enable vertical automatic scaling
    workloads.cast.ai/scaling-policy: "my-custom"    # select the my-custom scaling policy
    workloads.cast.ai/apply-type: "deferred"         # use deferred to apply recommendations only on natural pod restarts
    workloads.cast.ai/cpu-overhead: "0"              # 0%
    workloads.cast.ai/cpu-apply-threshold: "0.05"    # 5%
    workloads.cast.ai/cpu-target: "p80"              # 80th percentile
    workloads.cast.ai/cpu-max: "400m"                # max 0.4 CPU
    workloads.cast.ai/cpu-min: "120m"                # min 0.12 CPU
    workloads.cast.ai/memory-overhead: "0.1"         # 10%
    workloads.cast.ai/memory-apply-threshold: "0.05" # 5%
    workloads.cast.ai/memory-target: "max"           # max usage
    workloads.cast.ai/memory-max: "2Gi"              # max 2Gi
    workloads.cast.ai/memory-min: "1Gi"              # min 1Gi

Configuration Errors

If the workload manifest contains an invalid configuration value (for example, "unknown-value"), the configuration will not be updated: the old configuration values will be used until the erroneous configuration is fixed, and the error will be visible in the workload details in the CAST AI Console. Since scaling policy names are not restricted character-wise, any value can be set, but a non-existent policy will be treated as an invalid configuration.
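As a sketch of the kind of value validation this implies, here is a hypothetical check for the {resource}-target setting, using the max / p{x} format from the settings table; an unrecognized string such as "unknown-value" would simply be rejected:

```python
import re

def validate_target(value):
    """Accept "max" or "p{x}" where x is an integer between 0 and 99."""
    if value == "max":
        return True
    m = re.fullmatch(r"p(\d{1,2})", value)
    return m is not None and 0 <= int(m.group(1)) <= 99

validate_target("p80")           # True
validate_target("unknown-value") # False
```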