Overview
Workload Autoscaler automatically adjusts your workloads' resource requests up or down to balance performance and cost.
Getting started
To start using workload optimization, install the Workload Autoscaler component and the custom resource definitions (CRDs) for the recommendation objects. You can get the install script from our API or from the console on the workload optimization page.
Note that your cluster must be running in automated optimization mode, as workload optimization relies on the cluster controller to create the recommendation objects in the cluster.

Metrics collection and recommendation generation
To generate recommendations, CAST AI needs to process metrics, so you need to install a metrics server.
Recommendations are regenerated every 30 minutes. By default, the memory recommendation is the maximum usage over 5 days plus 10% overhead, and the CPU recommendation is the 80th percentile of usage over 5 days.
Note: All generated recommendations will take the current requests/limits into account.
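As a rough illustration of the default configuration, the Python sketch below computes a memory recommendation (maximum usage plus 10% overhead) and a CPU recommendation (80th percentile) from lists of usage samples. It assumes the nearest-rank percentile method and plain lists of samples. The source does not state how CAST AI computes percentiles or stores metrics, so the function names and data shapes are hypothetical.

```python
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], percentile: int) -> float:
    """Return the nearest-rank percentile (1-100) of the samples.

    Assumption: nearest-rank is used only for illustration; the actual
    percentile method used by the product is not documented here.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = -(-percentile * len(ordered) // 100)
    return ordered[rank - 1]


def default_memory_recommendation(usage_mib: Sequence[float]) -> float:
    """Maximum observed usage plus 10% overhead (the documented default)."""
    if not usage_mib:
        raise ValueError("usage_mib must not be empty")
    return max(usage_mib) * 110 / 100


def default_cpu_recommendation(usage_millicores: Sequence[float]) -> float:
    """80th percentile of observed usage, no overhead (the documented default)."""
    return nearest_rank_percentile(usage_millicores, 80)


# Hypothetical samples standing in for 5 days of collected metrics.
cpu_samples = [100, 120, 110, 300, 150, 130, 140, 160, 170, 180]
memory_samples = [400, 512, 480, 450]

cpu_recommendation = default_cpu_recommendation(cpu_samples)
memory_recommendation = default_memory_recommendation(memory_samples)
```

With these samples, the single CPU spike of 300 millicores does not drive the CPU recommendation, while the memory recommendation tracks the peak. This shows why a percentile suits CPU and a maximum suits memory.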
How to enable Workload Autoscaler for your workload
Enabling Workload Autoscaler for a single workload
To enable optimization for a single workload:
- Select the workload you want to optimize.
- In the drawer that opens, you can change the settings and review historical usage, recommendations, and request data for the past 7 days.
- When you have finished, click the Turn Optimization On button and save your changes.

Enabling Workload Autoscaler for multiple workloads
To enable optimization for multiple workloads, follow these steps:
- Go to the workload optimization section.
- Select one or more workloads by clicking on the checkboxes located next to them.
- Click the Optimization ON button at the top of the table.

How to configure recommendations
You can configure recommendations via the API to add overhead for a specific resource or to change the function used to select the baseline for the recommendation.
For example, you can configure the MAX function to be used for CPU and set the overhead to 20%. The CPU recommendation would then be the maximum observed CPU usage over 5 days plus 20% overhead.
You can find the optimization settings for a particular workload in the workload drawer. There, you can carry out the following configuration tasks:
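The effect of such a configuration can be sketched in Python. The `ResourceSettings` class and its field names are hypothetical stand-ins for whatever the API actually accepts, and the nearest-rank percentile is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ResourceSettings:
    """Hypothetical per-resource settings; not the real API schema."""
    function: str = "quantile"   # "max" or "quantile"
    percentile: int = 80         # only used when function == "quantile"
    overhead_percent: int = 0    # extra headroom added on top of the baseline


def recommend(samples: Sequence[float], settings: ResourceSettings) -> float:
    """Pick a baseline with the configured function, then add the overhead."""
    if not samples:
        raise ValueError("samples must not be empty")
    if settings.function == "max":
        baseline = max(samples)
    elif settings.function == "quantile":
        ordered = sorted(samples)
        # Nearest-rank percentile via integer ceiling division (assumption).
        rank = -(-settings.percentile * len(ordered) // 100)
        baseline = ordered[max(rank, 1) - 1]
    else:
        raise ValueError(f"unknown function: {settings.function}")
    return baseline * (100 + settings.overhead_percent) / 100


# Hypothetical CPU usage samples in millicores.
cpu_samples = [200, 250, 400, 300, 350]

default_cpu = recommend(cpu_samples, ResourceSettings())
max_plus_20 = recommend(
    cpu_samples, ResourceSettings(function="max", overhead_percent=20)
)
```

With these samples, switching CPU from the default 80th percentile to MAX with 20% overhead raises the recommendation from 350 to 480 millicores.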
- Scale recommendations by adding overhead.
- Fine-tune the percentile values for CPU and memory recommendations.
- Specify the optimization threshold.

Applying recommendations automatically
Once the recommendation lands in the cluster, the Workload Autoscaler component is notified that a recommendation was created or updated.
Workload Autoscaler:
- Works as an admission webhook for pods: when a pod matching the recommendation target is created, it modifies the pod so that its requests/limits match what is defined in the recommendation.
- Finds the controller and triggers an update so that the pods it manages are re-created. For example, for a deployment object, it adds an annotation to the pod template.
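The two behaviors above can be sketched in Python. This is a minimal illustration of the general Kubernetes mechanisms (a mutating admission response carrying a JSON Patch, and a pod-template annotation change that triggers a rollout), not Workload Autoscaler's actual implementation. The recommendation shape (container name mapped to requests/limits), the function names, and the annotation key `example.com/recommendation-applied-at` are all assumptions.

```python
import base64
import json


def build_resource_patch(pod: dict, recommendation: dict) -> list:
    """Build JSON Patch operations that apply recommended requests/limits.

    `recommendation` maps container names to {"requests": ..., "limits": ...}
    (a hypothetical shape). Values not covered by the recommendation are kept.
    """
    operations = []
    for index, container in enumerate(pod["spec"]["containers"]):
        recommended = recommendation.get(container["name"])
        if recommended is None:
            continue  # no recommendation targets this container
        current = container.get("resources") or {}
        merged = {
            key: {**current.get(key, {}), **recommended.get(key, {})}
            for key in ("requests", "limits")
            if current.get(key) or recommended.get(key)
        }
        # JSON Patch "add" replaces the member if it already exists.
        operations.append({
            "op": "add",
            "path": f"/spec/containers/{index}/resources",
            "value": merged,
        })
    return operations


def admission_response(uid: str, operations: list) -> dict:
    """Wrap the patch in an AdmissionReview response for the API server."""
    response = {"uid": uid, "allowed": True}
    if operations:
        encoded = base64.b64encode(json.dumps(operations).encode("utf-8"))
        response["patchType"] = "JSONPatch"
        response["patch"] = encoded.decode("ascii")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


def rollout_trigger_patch(timestamp: str) -> dict:
    """Merge-patch body that changes the pod template, forcing a rollout.

    The annotation key is hypothetical; any change to the pod template
    causes a deployment to re-create its pods.
    """
    return {"spec": {"template": {"metadata": {"annotations": {
        "example.com/recommendation-applied-at": timestamp,
    }}}}}
```

Because the webhook only mutates pods at creation time, the annotation change is what makes existing pods pick up the new values: the controller re-creates them, and each new pod passes through the webhook.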