Getting started

Workload Autoscaler automatically scales your workloads' resource requests up or down to ensure optimal performance and cost-effectiveness.

To start using workload optimization, you need to install the Workload Autoscaler component along with the custom resource definitions (CRDs) for the recommendation objects.

πŸ“˜

Note

Your cluster must be running in automated optimization mode, as workload optimization relies on the cluster controller to create the recommendation objects in the cluster.

Installation

You can install the Workload Autoscaler component using an install script. You can obtain the script through our API or from the CAST AI console by navigating to the workload optimization page.

Install Workload Autoscaler component

Lorem ipsum

Install script via API

Lorem ipsum

Install script in the console

Lorem ipsum

Install metrics-server

Lorem ipsum

How Workload Autoscaler works

Lorem ipsum

Metrics collection and recommendation generation

Recommendations are regenerated every 30 minutes. By default, the memory recommendation is the maximum usage over the past 5 days plus a 10% overhead, and the CPU recommendation is the 80th percentile of usage over the past 5 days.
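The default recommendation math described above can be sketched as follows. This is an illustrative calculation only, not CAST AI's internal implementation; the function names, sample values, and nearest-rank percentile method are assumptions.

```python
# Minimal sketch of the default recommendation formulas: 80th percentile
# of CPU usage, and maximum memory usage plus a 10% overhead.
# All names and numbers here are illustrative, not CAST AI internals.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of numeric usage samples."""
    ordered = sorted(samples)
    rank = max(0, round(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

def recommend(cpu_samples, memory_samples):
    # CPU: 80th percentile of usage over the lookback window (5 days).
    cpu_request = percentile(cpu_samples, 80)
    # Memory: maximum usage over the window plus a 10% overhead.
    memory_request = max(memory_samples) * 1.10
    return cpu_request, memory_request

# Toy data: CPU in cores, memory in MiB.
cpu, mem = recommend([0.2, 0.3, 0.25, 0.5, 0.4], [200, 220, 256, 240])
print(cpu, mem)  # cpu -> 0.4 cores, mem -> ~281.6 MiB
```

Note that a real implementation would read these samples from the cluster's metrics pipeline rather than from in-memory lists.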

πŸ“˜

Note

All generated recommendations take the workload's current requests and limits into account.

Applying recommendations automatically

Once the recommendation lands in the cluster, the Workload Autoscaler component is notified that a recommendation has been created or updated.

Workload Autoscaler then acts in the following order:

  • It acts as an admission webhook for pods. When a pod matching the recommendation target is created, it mutates the pod so that its requests/limits are set to the values defined in the recommendation.
  • It finds the controller that owns the pods and triggers an update to re-create them (for example, for a deployment object, it adds an annotation to the pod template).
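The webhook's mutation step can be sketched roughly as below. This is a hypothetical simplification: the pod dictionary mirrors the Kubernetes pod spec, but the shape of the recommendation object and the `mutate_pod` function are assumptions, not CAST AI's actual code.

```python
import copy

# Hypothetical sketch of the admission-webhook mutation: when a pod matching
# a recommendation target is created, the requests/limits of its containers
# are replaced with the recommended values. The recommendation is assumed to
# be a dict keyed by container name.

def mutate_pod(pod, recommendation):
    """Return a copy of the pod with recommended resources applied."""
    patched = copy.deepcopy(pod)
    for container in patched["spec"]["containers"]:
        rec = recommendation.get(container["name"])
        if rec:  # only containers covered by the recommendation are changed
            container["resources"] = {
                "requests": rec["requests"],
                "limits": rec["limits"],
            }
    return patched

pod = {"spec": {"containers": [{"name": "app", "resources": {
    "requests": {"cpu": "100m", "memory": "128Mi"}}}]}}
rec = {"app": {"requests": {"cpu": "400m", "memory": "282Mi"},
               "limits": {"cpu": "800m", "memory": "282Mi"}}}
patched = mutate_pod(pod, rec)
print(patched["spec"]["containers"][0]["resources"]["requests"])
```

In the real system this mutation happens server-side via a Kubernetes mutating admission webhook, so the pod is patched before the scheduler ever sees it.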

Supported controllers

Workload Autoscaler currently supports deployments and rollouts.

Default behavior

By default, deployments are updated immediately, which may result in the restart of pods.

Rollouts are updated in a deferred manner: Workload Autoscaler waits for pods to restart naturally (for example, during a new service release, or when a pod dies because of an application or technical error) before applying new recommendations.
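The default behavior per controller kind can be summarized in a tiny sketch. The kind names follow Kubernetes conventions; the function itself is illustrative, not part of any CAST AI API.

```python
# Illustrative summary of the default application mode per controller kind:
# deployments are updated immediately (pods may restart right away), while
# rollouts wait for pods to restart naturally.

def application_mode(controller_kind):
    return "immediate" if controller_kind == "Deployment" else "deferred"

print(application_mode("Deployment"))  # immediate
print(application_mode("Rollout"))     # deferred
```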

For more information on immediate and deferred recommendation application modes, see Immediate vs. deferred scaling.