Vertical scaling policies
Scaling policies enable centralized management of vertical workload autoscaling across your Kubernetes workloads. These policies define how Cast AI's Workload Autoscaler should optimize CPU and memory resource requests for your pods, allowing you to apply consistent optimization strategies across multiple workloads simultaneously.
When you enable Workload Autoscaler, all workloads are automatically assigned a default scaling policy. As new workloads appear in your cluster, they inherit these default settings, ensuring immediate optimization coverage without manual intervention.
How scaling policies work
Vertical scaling policies control how the Workload Autoscaler analyzes resource usage patterns and applies optimization recommendations to your workloads. Each policy defines:
- Resource optimization targets – Which percentiles of historical usage to target for CPU and memory recommendations
- Automation behavior – Whether to automatically apply recommendations or wait for manual approval
- Scaling modes – When to apply changes (immediately with pod restarts or during natural restart events)
- Safety constraints – Minimum and maximum resource limits to prevent over- or under-provisioning
- Timing settings – How much historical data to consider and when to trigger optimizations
These policies ensure your workloads receive appropriate resource allocations based on their actual usage patterns, leading to improved cost efficiency and performance.
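For illustration, the concepts above can be sketched as a single policy definition. The structure and field names below are assumptions made for readability only; they do not mirror the exact Cast AI console or API schema:

```yaml
# Illustrative sketch only - field names are assumptions, not the actual Cast AI schema.
name: my-custom-policy
automation: enabled                # apply recommendations automatically, or require manual approval
scalingMode: immediate             # apply right away (pod restart) or wait for natural restarts
recommendations:
  cpu:
    targetPercentile: p80          # percentile of historical CPU usage to target
  memory:
    targetPercentile: max          # percentile of historical memory usage to target
constraints:
  cpu:
    min: 100m                      # safety floor to prevent under-provisioning
    max: "4"                       # safety ceiling to prevent over-provisioning
  memory:
    min: 128Mi
    max: 8Gi
lookbackPeriod: 24h                # how much usage history feeds each recommendation
```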
Accessing scaling policies
To access and manage your scaling policies:
- Select your cluster in the Cast AI console
- From the left-hand menu, navigate to Workload Autoscaler
- Select Scaling policies
This dedicated scaling policies page provides a comprehensive view of all your policies, including system policies and custom policies, along with their automation status, optimized resources, associated workloads, and when changes to resource requests will be applied.
System policies
Cast AI provides predefined system policies designed for common optimization scenarios. These policies are pre-configured with settings based on Cast AI's extensive experience with Kubernetes workload optimization:
- cost-savings: Maximizes cost efficiency while maintaining acceptable performance
- balanced: Provides an optimal balance between cost optimization and stability/performance
- stability: Prioritizes consistent workload performance with minimal disruption
- default: The standard configuration applied to new workloads
- default-statefulset: Specialized configuration optimized for StatefulSet workloads
- readonly: Reserved for Cast AI components (cannot be modified)
While system policies cannot be directly modified, you can duplicate any system policy (except readonly) to create a fully customizable version with the same starting configuration. This provides an excellent foundation for creating policies tailored to your specific requirements.
Enabling scaling behavior
You can enable vertical scaling for your workloads in two ways:
- Globally via the scaling policy: Enable "Automatically optimize workloads" in the policy settings. This enables scaling only for workloads with sufficient historical data; workloads without enough data are monitored and automatically enabled once adequate usage patterns are established. When this is enabled on the default scaling policy, every new workload in the cluster is automatically optimized once sufficient data is available.
- Directly from individual workloads: Enable optimization on specific workloads through the workload interface (see the sketch below). Once enabled, autoscaling begins immediately based on the scaling mode configured in the associated policy.
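As a sketch of the workload-level approach, per-workload opt-in is typically expressed through annotations on the workload object. The annotation keys below are assumptions used for illustration; confirm the exact keys in the Cast AI workload autoscaling configuration reference:

```yaml
# Hypothetical sketch: the annotation keys below are assumptions, not confirmed
# Cast AI annotation names - verify them against the workload autoscaling docs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  annotations:
    workloads.cast.ai/vertical-autoscaling: "on"   # opt this workload in directly
    workloads.cast.ai/scaling-policy: "balanced"   # optionally assign a specific policy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api-server
          image: ghcr.io/example/api-server:1.0.0
          resources:
            requests:
              cpu: 250m          # initial requests; the autoscaler adjusts these over time
              memory: 256Mi
```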
Next steps
To get started with scaling policies:
- Create scaling policies - Learn how to create custom policies with specific optimization settings for your workloads
- Manage scaling policies - Discover how to apply policies to workloads, manage policy assignments, and maintain your optimization strategy
- Workload autoscaling configuration - Explore detailed configuration options and advanced settings for fine-tuning your optimization approach
By effectively using scaling policies, you can streamline workload management and ensure consistent optimization across your cluster while maintaining the flexibility to customize settings for specific use cases.