Workload Autoscaler Configuration

Workload Autoscaling can be configured in different ways to suit your specific needs. This can be achieved by using the CAST AI API (or changing the fields via the UI) or controlling the autoscaling settings at the workload level using workload annotations.

Upgrading

Currently, the workload autoscaler is installed as an in-cluster component via Helm and can be upgraded by running the following:

helm upgrade -i castai-workload-autoscaler -n castai-agent castai-helm/castai-workload-autoscaler --reuse-values

Dynamically Injected containers

By default, containers injected at runtime (e.g., istio-proxy) won't be managed by the workload autoscaler, and recommendations won't be applied to them. To enable this, you must configure the in-cluster component with the following command:

helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler -n castai-agent --reuse-values --set webhook.reinvocationPolicy=IfNeeded

Available Workload Settings

The following settings are currently available to configure CAST AI Workload Autoscaling:

  • Automation - on/off - marks whether CAST AI should apply the recommendations or just generate them.
  • Scaling policy - allows selecting a policy by name. Must be one of the policies available for the cluster.
  • Recommendation Percentile - which percentile CAST AI will recommend, based on the last day of usage. The recommendation is the average of the target percentile across all pods over the recommendation period. Setting the percentile to 100% no longer uses the average across pods, but the maximum observed value over the period.
  • Overhead - marks how much extra resource is added on top of the recommendation. By default, it's set to 10% for memory and 0% for CPU.
  • Optimization Threshold - when automation is enabled, how much the new recommendation must differ from the current pod requests for it to be applied immediately. Defaults to 10% for both memory and CPU.
  • Workload autoscaler constraints - sets the minimum and maximum values for resources; the workload autoscaler will not scale CPU/memory above the max or below the min. The limits apply to all containers.

πŸ“˜

It is recommended to wait a week before enabling Workload Autoscaling for "all workloads", so that the system has an understanding of how resource consumption varies between weekdays and weekends.
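As a rough model of how these settings interact, the sketch below (plain Python; function names are invented for illustration and this is not CAST AI's actual implementation) computes a recommendation from per-pod usage samples and checks it against the optimization threshold:

```python
# Illustrative model of the settings above -- NOT CAST AI's real algorithm.

def recommend(pod_samples, percentile=80, overhead=0.1):
    """pod_samples: one list of usage samples per pod over the period."""
    if percentile == 100:
        # 100% no longer averages across pods: take the maximum observed value
        base = max(max(samples) for samples in pod_samples)
    else:
        # average the target percentile across all pods
        per_pod = []
        for samples in pod_samples:
            ordered = sorted(samples)
            index = min(len(ordered) - 1, len(ordered) * percentile // 100)
            per_pod.append(ordered[index])
        base = sum(per_pod) / len(per_pod)
    return base * (1 + overhead)  # add the configured overhead on top

def should_apply(current_request, recommendation, threshold=0.1):
    # apply only when the relative difference exceeds the optimization threshold
    return abs(recommendation - current_request) / current_request >= threshold
```

For example, with a 10% threshold a move from 1000m to 1050m would be skipped, while a move to 1200m would be applied.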

Configuration API/UI

All of the settings above can be configured via the CAST AI API or in the workload details in the UI.

Configuration via Annotations

All of the settings are also available by adding them as annotations on the workload controller. When workload configuration annotations are detected, changes to the settings via the API/UI are no longer permitted. To start configuring the settings via annotations, set the workloads.cast.ai/autoscaling annotation on the workload; configuration examples are shown below. When a workload does not have an annotation for a specific setting, the default is used.

The annotations generally follow the pattern workloads.cast.ai/{resource}-{setting}. Currently, the available resources are cpu and memory. Available settings:

| Annotation | Possible Values | Default | Info | Required |
|---|---|---|---|---|
| workloads.cast.ai/autoscaling | vertical, off | - | Automated scaling. | Yes |
| workloads.cast.ai/scaling-policy | any valid k8s annotation value | default | Allows selecting the scaling policy by name. When not specified, the default policy is used. | Optional |
| workloads.cast.ai/apply-type | immediate, deferred | immediate | Allows configuring the autoscaler operating mode. Use immediate to apply recommendations as soon as the thresholds are passed (note: immediate mode can cause pod restarts). Use deferred to apply recommendations only on natural pod restarts. | Optional |
| workloads.cast.ai/{resource}-overhead | float >= 0 | cpu: 0, memory: 0.1 | Overhead expressed as a fraction, e.g., 10% is expressed as 0.1. | Optional |
| workloads.cast.ai/{resource}-target | max, p{x} | cpu: p80, memory: max | The x in p{x} is the target percentile: an integer between 0 and 99. | Optional |
| workloads.cast.ai/{resource}-apply-threshold | float >= 0 | cpu: 0.1, memory: 0.1 | How much the recommendation must differ from the current requests before it is applied, e.g., a 10% difference is expressed as 0.1. | Optional |
| workloads.cast.ai/{resource}-max | 4Gi, 60m, etc. | - | The upper limit for the recommendation. Recommendations won't exceed this value. | Optional |
| workloads.cast.ai/{resource}-min | 4Gi, 60m, etc. | - | The lower limit for the recommendation. Min cannot be greater than max. | Optional |
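The value rules in the table can be sketched as a small client-side checker (a hypothetical helper written for illustration; CAST AI performs its own validation server-side, and the quantity regex below is a simplification of the Kubernetes quantity format):

```python
import re

# Simplified Kubernetes resource quantity, e.g. "400m", "2Gi", "0.5"
QUANTITY = re.compile(r"^\d+(\.\d+)?(m|Ki|Mi|Gi|Ti|k|M|G|T)?$")

def is_valid(key, value):
    prefix = "workloads.cast.ai/"
    if not key.startswith(prefix):
        return False
    name = key[len(prefix):]
    if name == "autoscaling":
        return value in ("vertical", "off")
    if name == "apply-type":
        return value in ("immediate", "deferred")
    if name == "scaling-policy":
        return value != ""  # any annotation value; the named policy must exist
    m = re.match(r"^(cpu|memory)-(overhead|target|apply-threshold|max|min)$", name)
    if not m:
        return False
    setting = m.group(2)
    if setting in ("overhead", "apply-threshold"):
        try:
            return float(value) >= 0  # fraction, e.g. 0.1 for 10%
        except ValueError:
            return False
    if setting == "target":
        return value == "max" or bool(re.match(r"^p\d{1,2}$", value))  # p0..p99
    return bool(QUANTITY.match(value))  # max/min take a resource quantity
```

Note that p100 is rejected: per the table, the percentile must be an integer between 0 and 99 (use max for the maximum observed value).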

Example config:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  annotations:
    workloads.cast.ai/autoscaling: "vertical" # enable vertical automatic scaling
    workloads.cast.ai/scaling-policy: "my-custom" # select my-custom scaling policy

    workloads.cast.ai/apply-type: "deferred" # use deferred to apply recommendations only on natural pod restarts

    workloads.cast.ai/cpu-overhead:        "0"    # 0%
    workloads.cast.ai/cpu-apply-threshold: "0.05" # 5% 
    workloads.cast.ai/cpu-target:          "p80"  # 80th percentile
    workloads.cast.ai/cpu-max:             "400m" # max 0.4 cpu
    workloads.cast.ai/cpu-min:             "120m" # min 0.12 cpu

    workloads.cast.ai/memory-overhead:        "0.1"  # 10%
    workloads.cast.ai/memory-apply-threshold: "0.05" # 5%
    workloads.cast.ai/memory-target:          "max"  # max usage
    workloads.cast.ai/memory-max:             "2Gi"  # max 2Gi
    workloads.cast.ai/memory-min:             "1Gi"  # min 1Gi

Configuration Errors

If the workload manifest contains an invalid configuration, for example workloads.cast.ai/autoscaling: "unknown-value", the configuration will not be updated (the old configuration values are used until the erroneous configuration is fixed), and the error is shown in the workload details in the CAST AI Console. Since scaling policy names are not restricted character-wise, any value can be set, but a non-existent policy is treated as an invalid configuration.
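The fallback behaviour described above can be sketched as follows (a hypothetical helper for illustration only; the validator is passed in as a plain callable):

```python
# Sketch: an invalid annotation set leaves the previously applied
# configuration in effect until the error is fixed.

def effective_config(last_good, desired, is_valid):
    if all(is_valid(key, value) for key, value in desired.items()):
        return desired
    return last_good  # keep the old values; the error surfaces in the Console
```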