Feature reference
Cast AI for Karpenter brings Cast AI's optimization capabilities to clusters running open-source Karpenter. This page provides an overview of available features and how they integrate with your existing Karpenter setup.
For a conceptual introduction to Cast AI for Karpenter, see Cast AI for Karpenter overview.
Feature availability
The following features are available for Karpenter-managed clusters:
| Feature | Description | Karpenter integration |
|---|---|---|
| Continuous rebalancing | Ongoing workload consolidation with container live migration capabilities | Replaces Karpenter's native consolidation |
| Rebalancer | Cluster-wide cost optimization through Node selection and replacement | Coordinates with Karpenter provisioning |
| Spot intelligence | Interruption prediction | Enhances Karpenter's Spot handling |
| Workload Autoscaler | Continuous workload rightsizing | Feeds optimized requests to Karpenter |
| Pod mutations | Automated Pod spec adjustments | Simplifies workload configuration |
| Cost reporting | Savings analysis and cost monitoring | Read-only analysis of Karpenter clusters |
How Cast AI features work with Karpenter
Cast AI features are designed to extend Karpenter rather than replace it. The integration follows these principles:
Karpenter remains the provisioner
Node creation and deletion continue to flow through Karpenter. Cast AI influences decisions by modifying Karpenter CRDs and providing optimization signals, but Karpenter executes the actual infrastructure changes.
CRD-native configuration
Where possible, Cast AI stores configuration in Kubernetes-native formats. Your existing NodePools and EC2NodeClasses remain the source of truth for provisioning constraints.
Incremental enablement
Each feature can be enabled independently. You can start with cost reporting only, then gradually enable optimization features as you build confidence.
Feature details
Continuous rebalancing
Kentroller's Continuous Rebalancing monitors your cluster on a recurring cycle and consolidates underutilized nodes. When enabled, it takes over consolidation from Karpenter entirely — Karpenter continues to handle drift detection, but cost-driven consolidation is managed exclusively by Kentroller.
What it adds to Karpenter:
- Coordination with Workload Autoscaler to consolidate Pods based on actual resource usage, including pending rightsizing recommendations not yet applied
- Container Live Migration for eligible workloads, preserving Pod state and TCP connections (graceful fallback to traditional eviction included)
- Configurable modes (delete-empty, drain-only, and full) for progressive or aggressive consolidation
- Savings thresholds that ensure rebalancing only runs when projected gains are worthwhile
- Fine-grained per-Pod and per-Node eviction policies via eviction config
Protecting Nodes from consolidation
To exclude specific Nodes from Continuous Rebalancing, apply the autoscaling.cast.ai/removal-disabled label:
kubectl label node <node-name> autoscaling.cast.ai/removal-disabled=true
For full configuration details, see Continuous rebalancing.
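When several Nodes need the same protection, the labeling command above can be wrapped in small shell helpers. The following is a minimal sketch, not part of Cast AI tooling: the function names `protect_nodes` and `unprotect_nodes` are hypothetical, and it assumes `kubectl` is already configured for the target cluster. It also assumes that removing the label makes a Node eligible for Continuous Rebalancing again.

```shell
# Hypothetical helpers (not part of Cast AI tooling).
# Assumes kubectl is configured for the target cluster.

# Exclude each named Node from Continuous Rebalancing.
protect_nodes() {
  for node in "$@"; do
    # --overwrite makes the call safe to repeat on an already-labeled Node.
    kubectl label node "$node" autoscaling.cast.ai/removal-disabled=true --overwrite
  done
}

# Remove the protection label again (a trailing "-" deletes a label).
unprotect_nodes() {
  for node in "$@"; do
    kubectl label node "$node" autoscaling.cast.ai/removal-disabled-
  done
}
```

After sourcing these functions, `protect_nodes <node-a> <node-b>` labels both Nodes in one call, and `unprotect_nodes <node-a>` reverses it for a single Node.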
Rebalancer
The Rebalancer optimizes your entire cluster by identifying Nodes that could be replaced with more cost-effective alternatives for your workloads.
What it adds to Karpenter:
- Cross-NodePool optimization that Karpenter doesn't perform natively
- Awareness of Reserved Instances and Savings Plans
- Coordinated replacements that maintain workload stability
- Integration with Workload Autoscaler to rebalance based on optimized resource requirements
How it differs from standard Cast AI:
| Aspect | With Karpenter | Standard Cast AI |
|---|---|---|
| Node replacement | Rebalancer cordons Nodes; Karpenter provisions replacements | Cast AI handles both cordoning and provisioning |
| Instance selection | Influences Karpenter via CRD modifications | Cast AI selects instances directly |
| Commitment awareness | No native commitments integration | Native commitments integration |
| Drain controls | Limited by Karpenter's drain behavior | Full control over drain timing and aggression |
Rebalancing scope
Cast AI for Karpenter only rebalances nodes that were created and are managed by Karpenter. The following nodes are excluded from rebalancing:
- Legacy nodes created before Karpenter was installed
- EKS-managed node group nodes
- Nodes created by other provisioners (Cluster Autoscaler, etc.)
This limitation exists by design: Cast AI for Karpenter operates through Karpenter's CRDs and permissions, and does not have cloud-side control over non-Karpenter nodes. If you need to optimize non-Karpenter nodes, consider migrating them to Karpenter management or using Cast AI's standard Autoscaler.
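To see which Nodes fall inside this scope, you can split the cluster by the label Karpenter puts on the Nodes it provisions. This is a rough sketch under stated assumptions: it assumes a Karpenter version that sets the `karpenter.sh/nodepool` label (v1beta1 and later), and the function names are hypothetical. The extra `eks.amazonaws.com/nodegroup` column helps identify EKS-managed node group Nodes among the excluded ones.

```shell
# Hypothetical helpers for checking the rebalancing scope.
# Assumes Karpenter labels the Nodes it provisions with karpenter.sh/nodepool.

# Nodes provisioned by Karpenter (in scope); the extra column shows the NodePool.
list_karpenter_nodes() {
  kubectl get nodes -l karpenter.sh/nodepool -L karpenter.sh/nodepool
}

# Nodes without the Karpenter label (out of scope); the extra column shows
# the EKS managed node group name, if the Node belongs to one.
list_non_karpenter_nodes() {
  kubectl get nodes -l '!karpenter.sh/nodepool' -L eks.amazonaws.com/nodegroup
}
```

Any Node returned by `list_non_karpenter_nodes` would not be considered for rebalancing under the rule described above.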
Spot intelligence
Cast AI improves Karpenter's Spot Instance handling with predictive capabilities and reliability improvements.
What it adds to Karpenter:
- Interruption prediction — Identifies at-risk Nodes before AWS announces interruptions
- Spot reliability model (coming in future releases) — Steers toward historically stable Spot pools
- Spot fallback recovery (coming in future releases) — Automatically returns to Spot when capacity becomes available again
How interruption prediction works
Cast AI's Spot interruption prediction provides up to 30 minutes of advance warning before interruptions occur—significantly extending AWS's standard 2-minute notice. The prediction mechanism differs based on whether Container Live Migration is enabled:
| CLM Status | Prediction Response |
|---|---|
| Without CLM | Kentroller signals Karpenter's interruption queue, triggering Karpenter's standard node replacement workflow with extended lead time |
| With CLM | Kentroller uses Continuous Rebalancing to consolidate the at-risk node, using Container Live Migration to move workloads to stable nodes with zero downtime before the interruption occurs |
In both cases, the extended prediction window provides significantly more time for graceful workload migration compared to waiting for AWS's native interruption signal.
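Interruption prediction only matters for Spot capacity, so it can help to know which Nodes are Spot and where they run. The sketch below lists them using Karpenter's `karpenter.sh/capacity-type` label together with the standard Kubernetes instance-type and zone labels. The function name is hypothetical, and the command only reads cluster state; it does not query Cast AI's predictions.

```shell
# Hypothetical helper: list Spot Nodes provisioned by Karpenter, with their
# instance type and availability zone shown as extra columns.
list_spot_nodes() {
  kubectl get nodes \
    -l karpenter.sh/capacity-type=spot \
    -L node.kubernetes.io/instance-type \
    -L topology.kubernetes.io/zone
}
```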
How it differs from standard Cast AI:
| Aspect | With Karpenter | Standard Cast AI |
|---|---|---|
| Pool selection | Influences Karpenter's instance type priorities | Cast AI selects pools directly |
| Fallback handling | Monitors Karpenter's fallback Nodes for recovery | Native fallback and recovery |
| Prediction response | Signals Karpenter to replace at-risk Nodes | Direct Node replacement |
For Spot handling documentation, see Spot Instances and Spot Handler.
Workload Autoscaler
Workload Autoscaler continuously rightsizes workloads based on actual resource usage.
What it adds to Karpenter:
- Automatic adjustment of CPU and memory requests to match actual usage
- Tighter bin-packing as rightsized workloads require less capacity
- Integration with Continuous Rebalancing and Rebalancer for coordinated optimization
How it differs from standard Cast AI:
| Aspect | With Karpenter | Standard Cast AI |
|---|---|---|
| Request updates | Workload Autoscaler updates requests; Karpenter sees new requirements | Same behavior |
| Node impact | Karpenter may consolidate as requests decrease | Cast AI coordinates this directly with Continuous Rebalancing |
| Scaling policies | Applied identically | Applied identically |
Workload Autoscaler behavior is largely identical whether you're using Cast AI for Karpenter or Cast AI's Autoscaler—it operates at the workload level independently of Node provisioning.
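One way to observe rightsizing is to compare the resource requests on running Pods before and after Workload Autoscaler is enabled. The helper below is an illustrative sketch only: the function name is hypothetical, and it assumes applied recommendations are visible in the Pod spec. It prints each Pod name followed by the requests of its containers.

```shell
# Hypothetical helper: print resource requests for Pods matching a selector.
# Usage: show_pod_requests <namespace> <label-selector>
show_pod_requests() {
  kubectl get pods -n "$1" -l "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.requests}{"\n"}{end}'
}
```

For example, `show_pod_requests <namespace> app=<name>` lists the requests that Karpenter sees when it sizes capacity for those Pods.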
For Workload Autoscaler documentation, see Workload Autoscaling.
Pod mutations
Pod mutations automate Pod spec adjustments, which simplifies workload configuration and reduces manual effort for teams.
What it adds to Karpenter:
- Automatic application of labels, tolerations, and NodeSelectors
- Simplified onboarding without modifying Deployment manifests
- Consistent Pod configuration across workloads
How it differs from standard Cast AI:
Pod mutations work identically with Karpenter and standard Cast AI. The mutations apply to Pod specs before creation, independent of which autoscaler provisions Nodes.
For Pod mutations documentation, see Pod mutations.
Cost reporting
The savings report and other cost monitoring capabilities provide visibility into your cluster's optimization potential without making any changes.
What it provides:
- Current vs. optimized cost comparison
- Node utilization and bin-packing analysis
- Commitment utilization tracking
- Spot adoption opportunities
- Workload rightsizing recommendations
How it differs from standard Cast AI:
Cost reporting works identically for Karpenter clusters. The analysis examines your current state and models what Cast AI optimization could achieve.
For general cost monitoring, see Cost Monitoring.
Labels reference
Cast AI for Karpenter uses specific labels to coordinate optimization activities with Karpenter. Understanding these labels helps with troubleshooting and protecting critical nodes.
Node labels
| Label | Purpose | Applied by |
|---|---|---|
| karpenter.sh/do-not-disrupt | Prevents Karpenter from consolidating or disrupting the node | Kentroller (automatically managed during consolidation) |
| autoscaling.cast.ai/removal-disabled | Prevents Continuous Rebalancing from selecting the node | Customer (manual protection) |
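When troubleshooting, it can be useful to see both labels side by side for every node. A minimal sketch, assuming `kubectl` access to the cluster (the function name is hypothetical): the `-L` flag adds one column per label, and the column is empty for nodes that do not carry it.

```shell
# Hypothetical helper: show both protection labels as extra columns per node.
show_node_protection() {
  kubectl get nodes \
    -L autoscaling.cast.ai/removal-disabled \
    -L karpenter.sh/do-not-disrupt
}
```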
Workload labels
| Label | Purpose | Applied by |
|---|---|---|
| live.cast.ai/migration-enabled=true | Indicates the workload is eligible for Container Live Migration | Live Controller (automatic assessment) |
| live.cast.ai/migration-enabled=false | Indicates the workload cannot be live-migrated | Live Controller (automatic assessment) |
Protecting nodes from optimization
To exclude a node from Continuous Rebalancing:
kubectl label node <node-name> autoscaling.cast.ai/removal-disabled=true
To also prevent Karpenter from disrupting the node (including drift-triggered replacements):
kubectl label node <node-name> karpenter.sh/do-not-disrupt=true
Features not available with Karpenter
Some Cast AI capabilities require tighter integration with Node scheduling than the Karpenter-layered approach allows:
| Feature | Why it's not available | Alternative |
|---|---|---|
| Pod Pinner | Requires Cast AI Autoscaler's scheduling integration | Use Karpenter's native Pod affinity |
| Cluster hibernation | Requires direct control over Node lifecycle | Use Karpenter's NodePool weight=0 for manual scaling |
| Commitment-aware instance selection | Rebalancer cannot directly influence Karpenter's instance selection | Use NodePool requirements to prefer commitment-covered instance families |
Additionally, the following Rebalancer capabilities are not available when using Karpenter (see Rebalancing scope for details):
- Aggressive mode
- Graceful eviction controls
- Paused drain configuration
- Progress bar in UI
To benefit from these capabilities, consider migrating to Cast AI Autoscaler.
Related resources
- Cast AI for Karpenter overview — Conceptual introduction
- Getting started — Connect your cluster
