November 2024
Pod Mutations, Workload Autoscaler Annotations v2, and Custom Workload Support
Major Features and Improvements
Introducing Pod Mutations for Simplified Workload Configuration
Pod Mutations streamlines the onboarding and configuration process for Kubernetes workloads. This new feature automates pod setup by offering predefined configurations for labels, tolerations, and topology settings based on Node Templates and other platform features.
Key capabilities include:
- Label-based workload selection and configuration
- Direct integration with Node Templates and Evictor settings
- Support for custom node selectors
The feature benefits organizations managing complex environments by reducing configuration complexity and enabling more efficient resource utilization. It is available both in the console UI and through infrastructure as code.
Head to our docs to learn more.
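As a rough illustration of label-based selection, a workload simply carries labels that a mutation is configured to match; the mutation then injects settings such as tolerations or node selectors derived from the linked Node Template. The manifest below is a hedged sketch: the team label and Deployment details are hypothetical, not part of the feature itself.

```yaml
# Hypothetical workload that a Pod Mutation could select on by label.
# The mutation itself is defined in the console or via infrastructure as code;
# it would inject tolerations, node selectors, or topology settings at admission.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
        team: payments          # illustrative label a mutation might match on
    spec:
      containers:
        - name: payments-api
          image: payments-api:latest
```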
Reworked Workload Autoscaler Annotations
The Workload Autoscaler now features a simplified annotation structure, consolidating all configuration options under a single workloads.cast.ai/configuration key. This new format:
- Combines vertical and horizontal scaling settings in one clear structure
- Removes dependency on explicit autoscaling flags
- Supports more flexible policy overrides
While the previous annotation format remains supported, new features will be developed exclusively for the new structure. For migration guidance, refer to our updated documentation.
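A minimal sketch of how the consolidated annotation could look on a Deployment follows; the nested field names are illustrative assumptions rather than the authoritative schema, so refer to the documentation for the exact structure.

```yaml
# Illustrative only: field names inside the annotation are assumed, not authoritative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  annotations:
    workloads.cast.ai/configuration: |
      vertical:
        optimization: on          # hypothetical vertical scaling settings
      horizontal:
        optimization: off         # hypothetical horizontal scaling settings
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: example-app:latest
```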
Custom Workload Autoscaling
The Workload Autoscaler now supports custom and programmatically created workloads through label-based selection. This enhancement enables autoscaling for:
- Bare pods and programmatically created pods
- Jobs without parent controllers
- Custom controller workloads
Workloads tagged with the workloads.cast.ai/custom-workload label in their pod template specification can now receive resource recommendations and participate in autoscaling. See guidance in our documentation.
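For instance, a bare pod could be tagged as follows; the label key comes from this release note, while the value and the rest of the manifest are placeholders, so check the documentation for the expected value.

```yaml
# Sketch of tagging a bare pod for custom workload autoscaling.
# Only the label key is taken from the release note; the value is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: batch-runner
  labels:
    workloads.cast.ai/custom-workload: batch-runner
spec:
  restartPolicy: Never
  containers:
    - name: batch-runner
      image: batch-runner:latest
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
```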
Cloud Provider Integrations
Enhanced Spot Instance Availability Map
The spot instance availability map now includes GPU-specific data points, offering better visibility into GPU instance availability across cloud providers.
EKS: Support for Prefix Delegation
Native support was added for EKS prefix delegation. When this feature is enabled, the autoscaler now handles subnet calculations, ensuring accurate node provisioning within available subnet space. See our documentation for more information.
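Prefix delegation itself is turned on in the Amazon VPC CNI rather than in Cast AI; one common approach, sketched below as a strategic-merge patch for the aws-node DaemonSet, is to set the ENABLE_PREFIX_DELEGATION environment variable. This is standard VPC CNI configuration, not a Cast AI-specific setting; once it is enabled, the autoscaler takes over the subnet calculations described above.

```yaml
# Sketch: enable prefix delegation on the Amazon VPC CNI.
# Save as prefix-delegation-patch.yaml and apply with:
#   kubectl -n kube-system patch daemonset aws-node --patch-file prefix-delegation-patch.yaml
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: ENABLE_PREFIX_DELEGATION
              value: "true"
```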
EKS: IPv6 Support Added
Extended EKS node configuration to support IPv6.
GCP: Updated C4A Instance Specifications
Updated instance metadata for Google Cloud's C4A machine type series to include accurate CPU architectures, manufacturers, and supported disk configurations.
GCP: Load Balancer Configuration Support
Added load balancer configuration options for GCP nodes through API and Terraform. The feature enables direct management of load balancer associations when provisioning GCP nodes.
Check out the updated endpoint or visit our Terraform module docs.
AKS: Support for Multiple Subnet Prefixes
Extended subnet management for AKS clusters to support multiple IPv4 address prefixes within a single subnet. The autoscaler now properly recognizes and utilizes subnets configured with multiple prefixes, enabling more flexible network address management.
AKS: Improved CNI Overlay Network Calculations
The autoscaler now accounts for pod CIDR limitations in AKS clusters using CNI overlay networking. This prevents scaling issues by accurately calculating available IP space when each node requires a /24 network allocation from the pod CIDR range; for example, a /16 pod CIDR can back at most 256 such node allocations.
AKS: Support for CNI Overlay with Pod CIDR
Added native support for Azure CNI Overlay networking with Pod CIDR configuration. The autoscaler now correctly calculates node capacity based on the available Pod CIDR range, ensuring proper IP address allocation.
Optimization and Cost Management
Phased Rebalancing for Blue/Green Deployments
A new API-driven phased rebalancing capability gives you precise control over cluster node transitions. This feature enables blue/green deployments by pausing the rebalancing process after node cordoning, allowing for custom application migration strategies before final cleanup.
The phased approach includes:
- Initial rebalancing plan creation with node filtering options
- Automated new node provisioning and cordoning of old nodes
- Configurable pause point before drain/deletion phase
This change benefits teams whose applications require custom migration procedures.
Improved Startup Handling in Workload Autoscaler
Our Workload Autoscaler now differentiates between startup resource spikes and runtime surges, preventing unnecessary restarts during application initialization. Surge protection automatically extends to cover the specified startup duration for workloads with configured startup settings. Applications without explicit startup configurations receive a default 2-minute grace period.
Global Resource Constraints for Workload Autoscaling
We've introduced global resource constraints in workload autoscaling policies, allowing you to set default minimum and maximum limits for CPU and memory across all containers managed by a policy. These global constraints serve as guardrails while allowing individual workload-level settings to take precedence when specified.
For more information on configuring global constraints, see our Workload Autoscaler documentation.
Enhanced Problematic Node Detection
The rebalancer now identifies nodes that cannot be safely rebalanced regardless of their workload status, flagging specific blockers such as unknown instance types or removal restrictions. The UI clearly marks these nodes and prevents their selection during rebalancing plan creation, while backend safeguards ensure they remain protected from unintended modifications.
Improved GPU Commitment Handling
The autoscaler now reserves GPU commitment capacity exclusively for GPU workloads, preventing non-GPU workloads from being scheduled on GPU-enabled nodes. This optimization ensures GPU resources remain available for intended workloads and eliminates potential provisioning errors.
Enhanced Rebalancing Plan Visibility
Manual rebalancing now displays full plan details regardless of potential cost impact. Previously hidden negative-savings configurations are now visible, allowing users to evaluate all rebalancing options and make informed decisions based on their specific needs.
Node Configuration
Load Balancer Configuration for AKS
We've added load balancer configuration support to the node configuration API. Users can now specify and manage load balancers and their associated backend pools directly through the API, with UI support coming soon. This configuration is available for AKS and EKS clusters, which were the first to receive the functionality.
Standardized Node Labeling
We began adding a scheduling.cast.ai/on-demand label to nodes provisioned with on-demand instances, completing the set of scheduling labels alongside the existing spot labels. This helps improve label-based tracking across systems.
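If you want to pin a workload to on-demand capacity using this label, a node selector along the following lines should work; the value "true" is an assumption made by analogy with the existing spot label, so verify it against your nodes (for example with kubectl get nodes --show-labels).

```yaml
# Hypothetical: schedule a pod only on Cast AI-provisioned on-demand nodes.
# The label value "true" is assumed by analogy with the spot label; verify on your nodes.
apiVersion: v1
kind: Pod
metadata:
  name: on-demand-only
spec:
  nodeSelector:
    scheduling.cast.ai/on-demand: "true"
  containers:
    - name: app
      image: app:latest
```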
Enhanced Maximum Pods Configuration
We've updated the Maximum Pods formula configuration to support custom pod counts. Users can now directly enter their desired maximum pod value instead of choosing from preset options, providing greater flexibility in node configuration.
Architecture Prioritization in Node Templates
Node templates now support prioritized CPU architecture selection. When both ARM and x86_64 architectures are enabled, users can specify their preferred order:
- Primary architecture receives priority in node selection
- Secondary architecture serves as a fallback
- Leaving both unprioritized selects the most cost-effective option
Default node templates maintain x86_64 as the preferred architecture for backward compatibility.
Kubernetes Security Posture Management
Enhanced Runtime Security Event Details
Runtime security events now include a detailed process tree visualization, providing better context for security anomalies. This hierarchical view helps trace the relationships between processes involved in security events.
AI Enabler
Streaming Support Added
We've added streaming response support to the AI Enabler LLM Proxy. Enable streaming by setting "stream": true in your request body to receive responses in a streaming format compatible with OpenAI's streaming response structure. See our documentation for guidance.
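As a hedged sketch, a streaming request body could look like the fragment below; it is shown in YAML for readability and would be serialized to JSON when sent, and the model name and message are placeholders rather than values from this release.

```yaml
# Sketch of an LLM proxy request body with streaming enabled.
# Shown as YAML for readability; send it as JSON. Model and message are placeholders.
model: gpt-4o
messages:
  - role: user
    content: "Summarize last week's cluster cost report."
stream: true               # opt in to OpenAI-compatible streaming responses
```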
Support for Llama 3.1 and 3.2 Models
Model deployment options now include Llama 3.1 and Llama 3.2, expanding the available choices for AI workloads.
API Path Update
The proxy-component API endpoint has moved to a new path and now returns component status for multiple clusters in a single call. Responses, previously scoped to a single cluster, now return a list.
See the updated endpoint in the documentation.
Model Cleanup API Endpoint Added
The new endpoint enables the programmatic removal of deployed hosted models and their associated providers. The cleanup process preserves any automatically created node templates for future use.
Configurable Router Quality Settings
Router quality weights can now be configured at both organization and request levels. This setting balances model quality against cost when routing requests:
- Organization-wide default configuration option
- Per-request override capability through the routerQualityWeight parameter
- Scale from 0 (cost-focused) to 1 (quality-focused), defaulting to 0.5
Check out the updated endpoint documentation.
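As a hedged sketch of a per-request override: only the routerQualityWeight parameter name and its range come from this release note; the rest of the body is a placeholder and is shown in YAML for readability (send it as JSON).

```yaml
# Sketch of a per-request router quality override.
# Only routerQualityWeight is taken from the release note; other fields are placeholders.
messages:
  - role: user
    content: "Draft an incident postmortem outline."
routerQualityWeight: 0.8   # 0 = cost-focused, 1 = quality-focused; default 0.5
```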
User Interface Improvements
Expanded Workload Autoscaler Event Logs
The Workload Autoscaler event log now includes detailed events related to Horizontal Pod Autoscaling and Vertical Pod Autoscaling, providing better visibility into automated scaling decisions and their impact.
Standardized Search in Workload Autoscaler Logs
The Workload Autoscaler event log search now includes autocomplete functionality for names and IDs, matching the behavior found in other reporting sections of the platform.
Enhanced Resource Surge Event Details
Resource surge events now display both the original recommendation value and the usage value that triggered the surge, providing a clearer context for scaling decisions.
More Focus on Critical Notifications in the UI
We have listened to customer feedback and updated the console UI to better surface critical notifications you might otherwise miss. Critical notifications now appear and persist in your cluster list until you actively dismiss them.
Improved Date Selection across the UI
The Cast AI console interface now features an improved date picker with a more granular time selection. The update enables more precise period analysis and clearer data representation.
API and Metrics Improvements
Resource Usage API Update
We've updated the /v1/cost-reports/clusters/{clusterId}/resource-usage endpoint so that the stepSeconds parameter now supports a smaller minimum aggregation interval of 5 minutes (i.e., stepSeconds=300).
Infrastructure as Code
New Terraform Data Source for Rebalancing Schedules
Added a Terraform data source for rebalancing schedules, enabling better separation between organization-wide and cluster-level configurations. This allows teams to:
- Define rebalancing schedules in organization-level workspaces
- Reference these schedules in cluster-level workspaces
- Avoid naming conflicts across multiple Terraform configurations
See the pull request for more details and updated documentation.
Improved Terraform Credentials Management
The Terraform provider (v7.23.0) introduces credential synchronization between the Terraform state and Cast AI. Key improvements include:
- Automatic drift detection for credential mismatches
- Forced re-application when credentials are reset
These changes prevent state inconsistencies and enable more reliable credential management through infrastructure as code.
Terraform and Agent Updates
We've released an updated version of our Terraform provider. As always, the latest changes are detailed in the changelog. The updated provider and modules are available in the Terraform registry and ready for use in your infrastructure as code projects.
We have released a new version of the CAST AI agent. The complete list of changes is here. To update the agent in your cluster, please follow these steps.