Vertical scaling policies

Introduction

Scaling policies allow you to manage all your workloads centrally. You can apply the same settings to multiple workloads simultaneously or create custom policies with different settings and apply them to selected workloads.

When you start using the Workload Autoscaler component, all existing workloads are assigned a default scaling policy that uses our default settings. Any new workload that appears in the cluster is automatically assigned to this default policy.

System Policies

Cast AI offers predefined System Policies designed to provide optimized configurations for specific use cases:

  • cost-savings: Maximizes cost efficiency while maintaining acceptable performance
  • balanced: Provides an optimal balance between cost optimization and stability/performance
  • stability: Prioritizes consistent workload performance with minimal disruption

System Policies are pre-configured with settings based on Cast AI's extensive experience with Kubernetes optimization. While these policies cannot be directly modified, you can duplicate any System Policy to create a fully customizable version with the same starting configuration.

Policy settings

You can configure the following settings in your custom scaling policies:

  1. Automatically optimize workloads: Specify whether recommendations should be automatically applied to all workloads associated with the scaling policy. This feature enables automation only when enough data is available to make informed recommendations.

  2. Recommendation percentile: Choose which usage percentile Cast AI bases its recommendation on, calculated over the last day of usage. The recommendation is the average of the target percentile across all pods over the recommendation period. Setting the percentile to 100% uses the maximum observed value over the period instead of the average across pods.

  3. Overhead: Specify how much headroom is added on top of the recommendation. By default, it is set to 10% for memory and 0% for CPU.

  4. Autoscaler mode: Choose between immediate and deferred mode.

    • Immediate: Apply recommendations as soon as the configured thresholds are exceeded. This can cause pod restarts.
    • Deferred: Apply recommendations only on natural pod restarts.
      Read more about the differences between these two autoscaler modes in Scaling modes.
  5. Optimization threshold: When automation is enabled and the Workload Autoscaler works in immediate mode, this value sets the minimum difference between the current pod requests and the new recommendation that triggers the recommendation to be applied immediately. The default value for both memory and CPU is 10%. See the sketch after this list for how the percentile, overhead, and threshold settings combine.
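
To see how these numbers interact, here is a minimal, purely illustrative Python sketch of a percentile-based recommendation with overhead and an immediate-mode threshold check. The function names, the nearest-rank percentile method, and the sample data are assumptions for illustration only, not Cast AI's actual implementation.

    import math

    def recommended_request(pod_usage_samples, percentile=0.8, overhead=0.10):
        """Hypothetical recommendation: take the target percentile of each pod's
        usage over the window, average across pods, then add the overhead.
        At the 100th percentile, the maximum observed value is used instead."""
        if percentile >= 1.0:
            base = max(max(samples) for samples in pod_usage_samples)
        else:
            per_pod = [sorted(samples)[math.ceil(percentile * len(samples)) - 1]
                       for samples in pod_usage_samples]
            base = sum(per_pod) / len(per_pod)
        return base * (1 + overhead)

    def should_apply_immediately(current_request, recommendation, threshold=0.10):
        """In immediate mode, apply only when the recommendation differs from the
        current request by more than the optimization threshold (default 10%)."""
        return abs(recommendation - current_request) / current_request > threshold

    # Last-day memory usage samples (MiB) for three pods of one workload.
    usage = [[310, 340, 355, 400], [290, 320, 330, 380], [305, 335, 350, 410]]
    rec = recommended_request(usage, percentile=0.8, overhead=0.10)
    print(round(rec), should_apply_immediately(current_request=512, recommendation=rec))

With these sample numbers, the recommendation comes out to roughly 436 MiB, which differs from the current 512 MiB request by about 15%, so in immediate mode it would be applied right away.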

Creating a new scaling policy

To create a new scaling policy:

  1. From the left-hand menu in the Cast AI Console, navigate to Workload autoscaler, then Scaling policies, and click Create scaling policy.
  2. Configure the settings described in the section above.
  3. Choose workloads from the list to associate with this policy.
  4. Save the configuration.

Duplicating a System Policy

If you want to use a System Policy as a starting point but need to customize it:

  1. Navigate to Workload autoscaler > Scaling policies.
  2. Select the System Policy you wish to duplicate.
  3. Click the Duplicate policy option.
  4. The system creates a fully editable copy with all the original settings.
  5. Modify any settings as needed.
  6. Save your customized policy.

Applying scaling policies

Once you have created all the required scaling policies, you can switch the policy assigned to your workloads, either in batches or for individual workloads:

  1. Batch application:

    • Select multiple workloads in the table.
    • Click Assign the policy.
    • Choose the policy you want to use.
    • Save your changes.
  2. Individual application:

    • Open the workload drawer for a specific workload.
    • Choose a new policy in the drop-down list.
    • Save the changes.

When a policy changes, the new configuration settings apply to future recommendations, and the workload recommendation graphs will show the updated values as the latest data comes in.

Scaling policy behavior

If the configured scaling policy is suitable for your workloads, you can enable scaling in two ways:

  1. Globally via the scaling policy itself: Enable Automatically optimize workloads. This enables scaling only for workloads that already have enough data; workloads that aren't ready yet are re-checked and enabled once the platform has collected sufficient data (see the sketch after this list). When this setting is enabled on the default scaling policy, every new workload created in the cluster is scaled automatically once sufficient data is available.

  2. Directly from the workload: Once enabled, autoscaling starts immediately, applying recommendations according to the autoscaler mode chosen at the policy level.
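
As a rough illustration of the "enough data" gating described above, the sketch below enables optimization for a workload only once its metrics history covers a hypothetical minimum observation window. The window length, field names, and reconcile loop are assumptions for illustration only; the platform decides readiness internally.

    from dataclasses import dataclass
    from datetime import timedelta

    # Hypothetical minimum usage history before recommendations are trusted.
    MIN_HISTORY = timedelta(hours=24)

    @dataclass
    class Workload:
        name: str
        metrics_history: timedelta       # how much usage data has been collected
        optimization_enabled: bool = False

    def reconcile(workloads):
        """Enable optimization only for workloads with enough data; the rest are
        left untouched and re-checked on the next pass."""
        for w in workloads:
            if not w.optimization_enabled and w.metrics_history >= MIN_HISTORY:
                w.optimization_enabled = True
        return workloads

    pending = [Workload("api", timedelta(hours=30)), Workload("worker", timedelta(hours=6))]
    print([(w.name, w.optimization_enabled) for w in reconcile(pending)])
    # [('api', True), ('worker', False)] -- 'worker' is enabled on a later pass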

By using scaling policies effectively, you can streamline workload management and ensure consistent optimization across your cluster. System Policies provide an excellent starting point for most common scenarios, while the ability to duplicate and customize them offers the flexibility needed for specific requirements.