Workload Autoscaler automatically scales your workload requests up or down to ensure optimal performance and cost-effectiveness.

Getting started

To start using workload optimization, install the Workload Autoscaler component along with the custom resource definitions for the recommendation objects. You can get the install script from our API or from the console on the workload optimization page.

Note that your cluster must be running in automated optimization mode, as workload optimization relies on the cluster controller to create the recommendation objects in the cluster.

Metrics collection and recommendation generation

CAST AI needs to process metrics to generate recommendations, so you need to install a metrics server.

Recommendations are regenerated every 30 minutes. The default configuration is maximum usage over 5 days with 10% overhead for memory and 80th percentile usage over 5 days for CPU.
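
As a rough illustration of the default configuration described above, the sketch below computes a memory recommendation as the maximum observed usage plus 10% overhead, and a CPU recommendation as the 80th percentile of observed usage. The function names, the sample values, and the nearest-rank percentile method are assumptions made for this example; they are not CAST AI's actual implementation.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    if not samples:
        raise ValueError("no usage samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def recommend(samples, function="max", pct=80, overhead=0.0):
    """Pick a baseline from usage samples, then add overhead on top."""
    baseline = max(samples) if function == "max" else percentile(samples, pct)
    return baseline * (1 + overhead)

# Hypothetical usage samples collected over the lookback window.
memory_mib = [410, 455, 430, 500, 470]
cpu_millicores = [120, 150, 200, 180, 900, 160, 140, 170, 190, 210]

# Memory default: maximum usage with 10% overhead.
memory_request = recommend(memory_mib, function="max", overhead=0.10)
# CPU default: 80th percentile usage, no overhead.
cpu_request = recommend(cpu_millicores, function="percentile", pct=80)
```

With these made-up samples, the memory recommendation comes to about 550 MiB and the CPU recommendation to 200 millicores. The single 900-millicore spike does not influence the CPU value because it lies above the 80th percentile.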

Note: All generated recommendations will consider the current requests/limits.

Applying recommendations automatically

Once the recommendation lands in the cluster, the Workload Autoscaler component is notified that a recommendation has been created or updated.

Next, Workload Autoscaler:

  • works as an admission webhook for pods: when a pod matching the recommendation target is created, it sets the pod's requests/limits to the values defined in the recommendation.
  • finds the controller and triggers an update to cause the pods controlled by the controller to be re-created (for example, for a deployment object, it adds an annotation to the pod template).
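
The admission step can be pictured as a function that takes a pod specification and a recommendation and returns the pod with updated requests. The sketch below works on plain dictionaries shaped like a Kubernetes pod spec. The recommendation layout (a mapping from container name to requests and limits) is hypothetical and does not reflect the actual recommendation object schema.

```python
import copy

def apply_recommendation(pod, recommendation):
    """Return a copy of the pod with requests/limits taken from the recommendation.

    `recommendation` maps container name -> {"requests": {...}, "limits": {...}}.
    This layout is hypothetical, not the real recommendation object schema.
    """
    patched = copy.deepcopy(pod)
    for container in patched["spec"]["containers"]:
        rec = recommendation.get(container["name"])
        if rec is None:
            continue  # no recommendation for this container; leave it untouched
        resources = container.setdefault("resources", {})
        for kind in ("requests", "limits"):
            if kind in rec:
                resources.setdefault(kind, {}).update(rec[kind])
    return patched

# Hypothetical pod with one recommended container and one sidecar.
pod = {
    "metadata": {"name": "api-7c9d"},
    "spec": {"containers": [
        {"name": "api", "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}}},
        {"name": "sidecar"},
    ]},
}
recommendation = {"api": {"requests": {"cpu": "200m", "memory": "550Mi"}}}

patched_pod = apply_recommendation(pod, recommendation)
```

Only containers named in the recommendation are modified, and the original pod dictionary is left unchanged. A real mutating webhook would return the change as a patch in its admission response rather than a modified copy.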

Workload Autoscaler currently supports deployments and rollouts. By default, deployments are updated immediately, which may result in the restart of pods.

Rollouts are updated in a deferred manner. Workload Autoscaler waits for pods to restart naturally before applying new recommendations. Example scenarios include a new service release or a pod dying because of a business or technical error.


How to enable Workload Autoscaler

Scaling policies

Scaling policies allow you to manage all your workloads centrally by applying the same settings to multiple workloads at once. You can also create custom policies with different settings and assign them to the workloads that need them.

When you start using the Workload Autoscaler component, all of your workloads are initially assigned the default scaling policy, which uses our default settings. When a new workload appears in the cluster, it is automatically assigned to the default policy as well.

How to configure recommendations

You can configure recommendations via the API to add additional overhead for a particular resource or change the function used to select the baseline for the recommendation.

For example, you can configure the MAX function to be used for CPU and set the overhead to 20%. This means that the CPU recommendation would be the maximum observed CPU usage over 24 hours plus 20% overhead.

You can find the optimization settings in the scaling policies. You can carry out the following configuration tasks:

  • Scale recommendations by adding overhead.
  • Fine-tune the percentile values for CPU and memory recommendations.
  • Specify the optimization threshold.

You can fine-tune the following settings in the scaling policies:

  • Automatically Optimize Workloads – the policy allows you to specify whether our recommendations should be automatically applied to all workloads associated with the scaling policy. This feature enables automation only when there is enough data available to make informed recommendations.
  • Recommendation percentile – determines which usage percentile CAST AI uses as the recommendation, based on the last day of usage. The recommendation is the target percentile averaged across all pods over the recommendation period. Setting the percentile to 100% uses the maximum observed value over the period instead of the average across pods.
  • Overhead – specifies how much extra resource is added on top of the recommendation. By default, it's set to 10% for memory and 0% for CPU.
  • Autoscaler mode – this can be set to immediate or deferred.
  • Optimization threshold – when automation is enabled and Workload Autoscaler works in immediate mode, this value sets the minimum difference between the current pod requests and the new recommendation that is required for the recommendation to be applied immediately. The default value for both memory and CPU is 10%.
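
To make the percentile setting more concrete, the sketch below computes the target percentile for each pod, averages the results across pods, and switches to the overall maximum when the percentile is set to 100%. The function names, pod names, usage values, and nearest-rank percentile method are assumptions for illustration only.

```python
import math

def pod_percentile(samples, pct):
    """Nearest-rank percentile of one pod's usage samples (assumed method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def workload_recommendation(usage_by_pod, pct, overhead=0.0):
    """Average the per-pod percentile, or take the overall maximum at 100%."""
    if pct >= 100:
        baseline = max(max(samples) for samples in usage_by_pod.values())
    else:
        per_pod = [pod_percentile(s, pct) for s in usage_by_pod.values()]
        baseline = sum(per_pod) / len(per_pod)
    return baseline * (1 + overhead)

# Hypothetical CPU usage (millicores) for two pods of the same workload.
usage_by_pod = {
    "api-1": [100, 120, 140, 160, 200],
    "api-2": [200, 220, 240, 260, 400],
}

p80 = workload_recommendation(usage_by_pod, pct=80)
p100 = workload_recommendation(usage_by_pod, pct=100)
```

Here the 80th percentile is 160 for the first pod and 260 for the second, so the averaged recommendation is 210 millicores. At 100%, the recommendation is the highest value seen on any pod, 400 millicores.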

Immediate vs. deferred scaling mode

If the autoscaler mode is set to immediate, Workload Autoscaler checks whether a new recommendation meets the user-set optimization thresholds. If the recommendation doesn’t meet these thresholds, it is not applied. If it does, Workload Autoscaler automatically modifies pod requests per the recommendation.

Workload Autoscaler will also apply new recommendations upon natural pod restarts, such as a new service release or a pod dying due to a business or technical error. This helps to avoid unnecessary pod restarts.

If the scaling mode is set to deferred, Workload Autoscaler will not initiate a forced restart of the pod. Instead, it will apply the recommendation whenever external factors initiate pod restarts.
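
The difference between the two modes can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the threshold is treated as a relative change measured against the current request, a change equal to the threshold counts as passing, and all names are hypothetical.

```python
def exceeds_threshold(current, recommended, threshold):
    """True when the relative change from the current request reaches the threshold.

    Measuring the change relative to the current request is an assumption.
    """
    if current == 0:
        return recommended != 0
    return abs(recommended - current) / current >= threshold

def should_apply(mode, current, recommended, threshold=0.10, pod_restarting=False):
    """Decide whether a recommendation is applied to a pod right now."""
    if pod_restarting:
        return True  # natural restarts pick up the recommendation in both modes
    if mode == "immediate":
        return exceeds_threshold(current, recommended, threshold)
    return False  # deferred: wait for a restart triggered by external factors

# Hypothetical CPU requests in millicores.
small_change = should_apply("immediate", current=500, recommended=520)
large_change = should_apply("immediate", current=500, recommended=400)
deferred_no_restart = should_apply("deferred", current=500, recommended=400)
deferred_on_restart = should_apply("deferred", current=500, recommended=400,
                                   pod_restarting=True)
```

In this example, a 4% change in immediate mode is skipped, a 20% change is applied, and deferred mode applies the same 20% change only once the pod restarts for another reason.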

Getting back to the policies themselves, if the default scaling policy is suitable for your workloads, you can enable scaling in two ways:

  • Globally via the scaling policy by enabling Automatically Optimize Workloads – this will enable scaling only for the workloads we have enough data about. Workloads that aren’t ready will be checked later and enabled once the platform has enough data. When this setting is enabled on the default scaling policy, every new workload created in the cluster will be scaled automatically once the platform has enough data.
  • Directly from the workload – once enabled, autoscaling will start immediately (depending on the autoscaler mode chosen at the policy level).

Mark of recommendation confidence

The "Recommendations Confidence" column can include a mark indicating low confidence in the recommended values.

If an orange mark appears, we don't have sufficient data on workload resource usage to generate trusted recommendations. You can still start using Workload Autoscaler by enabling it at the workload level, but we advise waiting at least one week before enrolling your workloads in workload autoscaling.

This mark can appear next to workloads that have not been running long enough for CAST AI to gather sufficient data and generate accurate recommendations. Workloads that have this mark and belong to a scaling policy with the "Auto enable" option turned on won't be optimized until we have enough data.
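
The interaction between the two ways of enabling scaling and the confidence mark can be sketched as a simple check. The field names below (optimization_on, enough_data, auto_optimize) are hypothetical and chosen only for this example; they are not actual API or CRD fields.

```python
def autoscaling_active(workload, policy):
    """Decide whether recommendations are applied to a workload.

    All field names are hypothetical placeholders for this sketch.
    """
    # Enabled directly on the workload: active even with low confidence.
    if workload.get("optimization_on", False):
        return True
    # Enabled through the policy: active only once enough data is available.
    return policy.get("auto_optimize", False) and workload.get("enough_data", False)

policy = {"auto_optimize": True}
new_workload = {"name": "batch-job", "enough_data": False}
mature_workload = {"name": "api", "enough_data": True}
forced_workload = {"name": "worker", "enough_data": False, "optimization_on": True}

new_active = autoscaling_active(new_workload, policy)
mature_active = autoscaling_active(mature_workload, policy)
forced_active = autoscaling_active(forced_workload, policy)
```

A workload with too little data stays inactive under the policy-level setting, becomes active once enough data exists, and is active right away if optimization is turned on for it directly.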

How to create a new scaling policy

Scaling policies are a great tool for managing multiple workloads at once. Some workloads may require a higher overhead, while for others it would be unnecessary. To create a policy, navigate to Scaling policies and click Create a scaling policy.

Set your desired settings and choose workloads from the list. After everything is set, save the configuration.

Once you have all the required scaling policies, you can switch the policies for your workloads. You can do that in batches or for individual workloads:

  • To change a policy for batch workloads, select your workloads in the table, click Assign the policy, choose the policy you want to use, and save your changes.
  • To change a policy at the workload level, open the workload drawer, choose a new policy in the drop-down list, and save the changes.

When a policy is changed, the new configuration settings take effect in the next recommendation. The workload recommendation graphs will show the new values for the newest data.

Enabling Workload Autoscaler for a single workload

To enable optimization for a single workload:

  • Select the workload you want to optimize.
  • In the drawer that opens, you can change the settings, review the past 7 days' historical usage and recommendations, and request data.
  • Once you’ve completed the review, click the Turn Optimization On button and save the changes.