Vertical scaling policies

Introduction

Scaling policies let you manage all your workloads centrally. You can apply the same settings to many workloads at once, or create custom policies with different settings for different groups of workloads.

When you start using the Workload Autoscaler component, all your workloads are automatically assigned a default scaling policy with our default settings. Any new workload that appears in the cluster is also assigned to the default policy.

Policy settings

You can configure the following settings in your scaling policies:

  1. Automatically optimize workloads: Specify whether recommendations should be automatically applied to all workloads associated with the scaling policy. This feature enables automation only when enough data is available to make informed recommendations.

  2. Recommendation percentile: Set the usage percentile CAST AI targets, based on the last day of usage data. The recommendation is the average of the target percentile across all pods over the recommendation period. Setting the percentile to 100% uses the maximum value observed over the period instead of the average across pods.

  3. Overhead: Specify how much headroom to add on top of the recommendation. By default, it's set to 10% for memory and 0% for CPU.

  4. Autoscaler mode: Choose between immediate or deferred mode.

    • Immediate: Apply recommendations as soon as the optimization threshold is exceeded. This can cause pod restarts.
    • Deferred: Apply recommendations only on natural pod restarts.
      Read more about the differences between these two autoscaler modes here: Scaling modes.

  5. Optimization threshold: When automation is enabled and Workload Autoscaler works in immediate mode, this value sets the minimum difference between the current pod requests and the new recommendation required before the recommendation is applied immediately. The default value for both memory and CPU is 10%.
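To make these settings concrete, here is a rough sketch of how a percentile-based recommendation with overhead and an immediate-mode threshold check could work. The function names and the nearest-rank percentile method are illustrative assumptions, not CAST AI's actual implementation:

```python
import math

def recommend(usage_samples, percentile=80, overhead=0.10):
    """Sketch: derive a resource recommendation from observed usage.

    percentile=100 uses the maximum observed value over the period;
    otherwise the target percentile of the samples is taken
    (nearest-rank method, an assumption for illustration).
    """
    samples = sorted(usage_samples)
    if percentile >= 100:
        base = samples[-1]
    else:
        rank = max(0, math.ceil(percentile / 100 * len(samples)) - 1)
        base = samples[rank]
    return base * (1 + overhead)  # add the configured overhead

def should_apply_now(current_request, recommendation, threshold=0.10):
    """Sketch of immediate mode: apply only when the recommendation
    differs from the current request by more than the threshold."""
    return abs(recommendation - current_request) / current_request > threshold
```

For example, with usage mostly at 100 units and one spike to 200, an 80th-percentile recommendation with 10% overhead lands near 110, while a 100th percentile would track the spike and recommend 220.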

Creating a new scaling policy

To create a new scaling policy:

  1. From the left-hand menu in the CAST AI Console, navigate to Workload autoscaler, then Scaling policies, and click Create scaling policy.
  2. Set your desired settings by referring to the section above.
  3. Choose workloads from the list to associate with this policy.
  4. Save the configuration.

Applying scaling policies

Once you have all the required scaling policies, you can assign your workloads to them, either in batches or individually:

  1. Batch application:

    • Select multiple workloads in the table.
    • Click "Assign the policy."
    • Choose the policy you want to use.
    • Save your changes.
  2. Individual application:

    • Open the workload drawer for a specific workload.
    • Choose a new policy in the drop-down list.
    • Save the changes.

When a policy changes, the new configuration settings affect future recommendations, and workload recommendation graphs will show the updated values for the latest data.

Scaling policy behavior

If the configured scaling policy is suitable for your workloads, you can enable scaling in two ways:

  1. Globally via the scaling policy itself: Enable "Automatically Optimize Workloads." This will enable scaling only for workloads with enough data. Workloads that aren't ready will be checked later and enabled once the platform has enough data. When this setting is enabled on the default scaling policy, every new workload created in the cluster will be scaled automatically once sufficient data is available.

  2. Directly from the workload: Once enabled, autoscaling starts right away, with recommendations applied according to the autoscaler mode chosen at the policy level.
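The policy-level enablement behavior described above can be sketched as follows. The 24-hour data threshold and field names are assumptions for illustration; CAST AI's actual readiness criteria may differ:

```python
def split_by_readiness(workloads, min_data_hours=24):
    """Sketch: with 'Automatically optimize workloads' enabled on a
    policy, scaling starts only for workloads with enough usage data;
    the rest are re-checked later until enough data accumulates."""
    ready, deferred = [], []
    for w in workloads:
        if w["data_hours"] >= min_data_hours:
            ready.append(w["name"])     # start scaling now
        else:
            deferred.append(w["name"])  # revisit on a later check
    return ready, deferred
```

In this sketch, a workload with 30 hours of data would start scaling immediately, while one with only 2 hours would be deferred until enough data is collected.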

By effectively using scaling policies, you can streamline workload management and ensure consistent optimization across your cluster.