Vertical scaling policies

Introduction

Scaling policies allow you to manage all your workloads centrally. You can apply the same settings to multiple workloads simultaneously or create custom policies with different settings and apply them to selected workloads.

When you start using the Workload Autoscaler component, all existing workloads are assigned a default scaling policy that uses our default settings. Any new workload that appears in the cluster is automatically assigned to this default policy.

System Policies

Cast AI offers predefined System Policies designed to provide optimized configurations for specific use cases:

  • cost-savings: Maximizes cost efficiency while maintaining acceptable performance
  • balanced: Provides an optimal balance between cost optimization and stability/performance
  • stability: Prioritizes consistent workload performance with minimal disruption

System Policies are pre-configured with settings based on Cast AI's extensive experience with Kubernetes optimization. While these policies cannot be directly modified, you can duplicate any System Policy to create a fully customizable version with the same starting configuration.

Policy settings

You can configure the following settings in your custom scaling policies:

  1. Automatically optimize workloads: Specify whether recommendations should be automatically applied to all workloads associated with the scaling policy. This feature enables automation only when enough data is available to make informed recommendations.

  2. Recommendation percentile: Choose which usage percentile Cast AI bases its recommendation on, calculated over the last day of usage. The recommendation is the average of the target percentile across all pods over the recommendation period. Setting the percentile to 100% uses the maximum observed value over the period instead of the average across pods.

  3. Overhead: Specify how much headroom is added on top of the recommendation. By default, it is set to 10% for memory and 0% for CPU.

  4. Autoscaler mode: Choose between immediate and deferred mode.

    • Immediate: Apply recommendations as soon as the configured thresholds are exceeded. This can cause pod restarts.
    • Deferred: Apply recommendations only on natural pod restarts.
      Read more about the differences between these two autoscaler modes in Scaling modes.
  5. Optimization threshold: When automation is enabled and the Workload Autoscaler works in immediate mode, this value sets the minimum difference between the current pod requests and the new recommendation that triggers the recommendation to be applied immediately. The default value for both memory and CPU is 10%. See the sketch after this list for how the percentile, overhead, and threshold settings combine.
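
To see how these numbers interact, here is a minimal, purely illustrative Python sketch of a percentile-based recommendation with overhead and an immediate-mode threshold check. The function names, the nearest-rank percentile method, and the sample data are assumptions for illustration only, not Cast AI's actual implementation.

    import math

    def recommended_request(pod_usage_samples, percentile=0.8, overhead=0.10):
        """Hypothetical recommendation: take the target percentile of each pod's
        usage over the window, average across pods, then add the overhead.
        At the 100th percentile, the maximum observed value is used instead."""
        if percentile >= 1.0:
            base = max(max(samples) for samples in pod_usage_samples)
        else:
            per_pod = [sorted(samples)[math.ceil(percentile * len(samples)) - 1]
                       for samples in pod_usage_samples]
            base = sum(per_pod) / len(per_pod)
        return base * (1 + overhead)

    def should_apply_immediately(current_request, recommendation, threshold=0.10):
        """In immediate mode, apply only when the recommendation differs from the
        current request by more than the optimization threshold (default 10%)."""
        return abs(recommendation - current_request) / current_request > threshold

    # Last-day memory usage samples (MiB) for three pods of one workload.
    usage = [[310, 340, 355, 400], [290, 320, 330, 380], [305, 335, 350, 410]]
    rec = recommended_request(usage, percentile=0.8, overhead=0.10)
    print(round(rec), should_apply_immediately(current_request=512, recommendation=rec))

With these sample numbers, the recommendation comes out to roughly 436 MiB, which differs from the current 512 MiB request by about 15%, so in immediate mode it would be applied right away.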

Creating a new scaling policy

To create a new scaling policy:

  1. From the left-hand menu in the Cast AI Console, navigate to Workload autoscaler, then Scaling policies, and click Create scaling policy.
  2. Configure the settings described in the section above.
  3. Choose workloads from the list to associate with this policy.
  4. Save the configuration.

Duplicating a System Policy

If you want to use a System Policy as a starting point but need to customize it:

  1. Navigate to Workload autoscaler > Scaling policies.
  2. Select the System Policy you wish to duplicate.
  3. Click the Duplicate policy option.
  4. The system creates a fully editable copy with all the original settings.
  5. Modify any settings as needed.
  6. Save your customized policy.

Applying scaling policies

Once you have created all the required scaling policies, you can switch the policy assigned to your workloads, either in batches or for individual workloads:

  1. Batch application:

    • Select multiple workloads in the table.
    • Click Assign the policy.
    • Choose the policy you want to use.
    • Save your changes.
  2. Individual application:

    • Open the workload drawer for a specific workload.
    • Choose a new policy in the drop-down list.
    • Save the changes.

When a policy changes, the new configuration settings apply to future recommendations, and the workload recommendation graphs will show the updated values as the latest data comes in.

Scaling policy behavior

If the configured scaling policy is suitable for your workloads, you can enable scaling in two ways:

  1. Globally via the scaling policy itself: Enable Automatically optimize workloads. This enables scaling only for workloads that already have enough data; workloads that aren't ready yet are re-checked and enabled once the platform has collected sufficient data (see the sketch after this list). When this setting is enabled on the default scaling policy, every new workload created in the cluster is scaled automatically once sufficient data is available.

  2. Directly from the workload: Once enabled, autoscaling starts immediately, applying recommendations according to the autoscaler mode chosen at the policy level.
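
As a rough illustration of the "enough data" gating described above, the sketch below enables optimization for a workload only once its metrics history covers a hypothetical minimum observation window. The window length, field names, and reconcile loop are assumptions for illustration only; the platform decides readiness internally.

    from dataclasses import dataclass
    from datetime import timedelta

    # Hypothetical minimum usage history before recommendations are trusted.
    MIN_HISTORY = timedelta(hours=24)

    @dataclass
    class Workload:
        name: str
        metrics_history: timedelta       # how much usage data has been collected
        optimization_enabled: bool = False

    def reconcile(workloads):
        """Enable optimization only for workloads with enough data; the rest are
        left untouched and re-checked on the next pass."""
        for w in workloads:
            if not w.optimization_enabled and w.metrics_history >= MIN_HISTORY:
                w.optimization_enabled = True
        return workloads

    pending = [Workload("api", timedelta(hours=30)), Workload("worker", timedelta(hours=6))]
    print([(w.name, w.optimization_enabled) for w in reconcile(pending)])
    # [('api', True), ('worker', False)] -- 'worker' is enabled on a later pass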

By using scaling policies effectively, you can streamline workload management and ensure consistent optimization across your cluster. System Policies provide an excellent starting point for most common scenarios, while the ability to duplicate and customize them offers the flexibility needed for specific requirements.