📣
Early Access Feature
This feature is in early access. It may undergo changes based on user feedback and continued development. We recommend testing in non-production environments first and welcome your feedback to help us improve.

Cast AI for Karpenter brings Cast AI's optimization capabilities to clusters running open-source Karpenter. This page provides an overview of available features and how they integrate with your existing Karpenter setup.

For a conceptual introduction to Cast AI for Karpenter, see Cast AI for Karpenter overview.

Feature availability

The following features are available for Karpenter-managed clusters:

Feature	Description	Karpenter integration
Evictor	Workload consolidation with Container Live Migration capabilities	Works alongside Karpenter's consolidation
Rebalancer	Cluster-wide cost optimization through Node selection and replacement	Coordinates with Karpenter provisioning
Spot intelligence	Interruption prediction	Enhances Karpenter's Spot handling
Workload Autoscaler	Continuous workload rightsizing	Feeds optimized requests to Karpenter
Pod mutations	Automated Pod spec adjustments	Simplifies workload configuration
Cost reporting	Savings analysis and cost monitoring	Read-only analysis of Karpenter clusters

How Cast AI features work with Karpenter

Cast AI features are designed to extend Karpenter rather than replace it. The integration follows these principles:

Karpenter remains the provisioner
Node creation and deletion continue to flow through Karpenter. Cast AI influences decisions by modifying Karpenter CRDs and providing optimization signals, but Karpenter executes the actual infrastructure changes.

CRD-native configuration
Where possible, Cast AI stores configuration in Kubernetes-native formats. Your existing NodePools and EC2NodeClasses remain the source of truth for provisioning constraints.

Incremental enablement
Each feature can be enabled independently. You can start with cost reporting only, then gradually enable optimization features as you build confidence.
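
Because Cast AI influences provisioning by modifying Karpenter CRDs, it can be useful to record the state of those resources before enabling an optimization feature. The following is a minimal sketch, assuming the Karpenter v1 or v1beta1 APIs on AWS (resources named nodepools and ec2nodeclasses). The function name and default file name are illustrative and are not part of Cast AI or Karpenter.

```shell
# Save the current NodePool and EC2NodeClass definitions to a file so they can
# be compared after a Cast AI feature has been enabled.
# Assumes kubectl points at the cluster and Karpenter's CRDs are installed.
snapshot_karpenter_crds() {
  # $1: output file (hypothetical default name used when omitted)
  out_file="${1:-karpenter-crds-snapshot.yaml}"
  kubectl get nodepools,ec2nodeclasses -o yaml > "$out_file"
}
```

Running snapshot_karpenter_crds before.yaml before enablement and snapshot_karpenter_crds after.yaml later allows a plain diff of the two files. Expect noise from fields such as resourceVersion and status, which change regardless of Cast AI.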

Feature details

Evictor

Cast AI's Evictor integrates with Karpenter to improve Node utilization while minimizing workload disruption.

Karpenter has native consolidation, but it packs Nodes according to requested resources. Evictor coordinates with Workload Autoscaler and Container Live Migration so that it can consolidate based on actual resource usage rather than just requested resources, reaching utilization levels that Karpenter's consolidation alone cannot.

What it adds to Karpenter:

Coordination with Workload Autoscaler to consolidate Pods according to optimized resource requests, even when those requests are still pending and have not yet been applied to the Pod
Container Live Migration support for eligible workloads, preserving Pod state and TCP connections when moving Pods from Node to Node (graceful fallback to traditional eviction included)
Progressive consolidation that respects Pod Disruption Budgets
Tighter bin-packing by considering actual workload usage patterns, not just static resource requests

How it differs from standard Cast AI:

Aspect	With Karpenter	Standard Cast AI
Node selection	Evictor identifies candidates; Karpenter handles Node lifecycle	Evictor works with Cast AI Autoscaler directly
Consolidation trigger	Coordinates with Karpenter's consolidation settings	Cast AI controls consolidation timing
Node deletion	Karpenter deletes empty Nodes	Cast AI Autoscaler deletes Nodes

Coordinating with Karpenter consolidation

When Evictor is enabled, Cast AI marks Nodes with karpenter.sh/do-not-disrupt to prevent Karpenter from consolidating them. This ensures Evictor and Karpenter's consolidation don't conflict.

Evictor respects your NodePool's consolidateAfter setting. Evictor will not consolidate a Node until both its own grace period and Karpenter's consolidateAfter window have passed. For example, if your NodePool specifies:

spec:
  disruption:
    consolidateAfter: 30m
    consolidationPolicy: WhenEmptyOrUnderutilized

Evictor will wait at least 30 minutes before considering the Node for consolidation, preventing conflicts with Karpenter's consolidation policy.

Consolidation priority

When both Evictor and Karpenter's native consolidation are enabled, Karpenter's consolidation takes precedence. Evictor will not attempt to consolidate a Node until both of the following are true:

Karpenter's consolidateAfter window has passed
Evictor's own grace period has elapsed

This ensures predictable behavior and prevents conflicts between the two consolidation mechanisms. If Evictor does not act on Nodes that appear underutilized, check your NodePool's consolidateAfter setting first, because Evictor is likely waiting for that window to pass.
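
To check the consolidateAfter window that Evictor is waiting on, the setting can be read directly from the NodePools. This is a minimal sketch, assuming the Karpenter v1 or v1beta1 NodePool API, where the values live under spec.disruption. The function name is illustrative.

```shell
# Print each NodePool's consolidation settings in one table.
# Assumes the Karpenter v1 or v1beta1 NodePool API (spec.disruption.*).
nodepool_consolidation_settings() {
  kubectl get nodepools \
    -o custom-columns='NAME:.metadata.name,CONSOLIDATE_AFTER:.spec.disruption.consolidateAfter,POLICY:.spec.disruption.consolidationPolicy'
}
```

A NodePool that shows a long CONSOLIDATE_AFTER value is a likely explanation for Evictor leaving an underutilized Node in place.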

Protecting Nodes from consolidation

To exclude specific Nodes from Evictor consolidation, apply the autoscaling.cast.ai/removal-disabled label. This works similarly to Karpenter's karpenter.sh/do-not-disrupt label and allows you to manually protect critical Nodes.

For Evictor documentation, see Evictor.

Rebalancer

The Rebalancer optimizes your entire cluster by identifying Nodes that could be replaced with more cost-effective alternatives for your workloads.

What it adds to Karpenter:

Cross-NodePool optimization that Karpenter doesn't perform natively
Awareness of Reserved Instances and Savings Plans
Coordinated replacements that maintain workload stability
Integration with Workload Autoscaler to rebalance based on optimized resource requirements

How it differs from standard Cast AI:

Aspect	With Karpenter	Standard Cast AI
Node replacement	Rebalancer cordons Nodes; Karpenter provisions replacements	Cast AI handles both cordoning and provisioning
Instance selection	Influences Karpenter via CRD modifications	Cast AI selects instances directly
Commitment awareness	No native commitments integration	Native commitments integration
Drain controls	Limited by Karpenter's drain behavior	Full control over drain timing and aggression

Rebalancing scope

Cast AI for Karpenter rebalances only nodes that were created and are managed by Karpenter. The following nodes are excluded from rebalancing:

Legacy nodes created before Karpenter was installed
EKS-managed node group nodes
Nodes created by other provisioners (Cluster Autoscaler, etc.)

This limitation exists by design: Cast AI for Karpenter operates through Karpenter's CRDs and permissions, and does not have cloud-side control over non-Karpenter nodes. If you need to optimize non-Karpenter nodes, consider migrating them to Karpenter management or using Cast AI's standard Autoscaler.
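
To see which nodes fall inside the rebalancing scope, the provisioner of each node can usually be inferred from its labels. The sketch below assumes that Karpenter (v1beta1 or later) sets the karpenter.sh/nodepool label on nodes it creates and that EKS managed node groups set eks.amazonaws.com/nodegroup. Older Karpenter versions and other provisioners use different labels, so treat the output as a starting point. The function names are illustrative.

```shell
# List every node with extra columns identifying its provisioner.
# An empty NODEPOOL column suggests the node is not managed by Karpenter.
list_nodes_by_provisioner() {
  kubectl get nodes -L karpenter.sh/nodepool -L eks.amazonaws.com/nodegroup
}

# List only nodes that carry the Karpenter NodePool label (any value).
list_karpenter_nodes() {
  kubectl get nodes -l karpenter.sh/nodepool
}
```

Nodes returned by list_karpenter_nodes are the candidates for rebalancing under the scope described above.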

Rebalancer capabilities not available with Karpenter

When using Rebalancer with Karpenter, the following capabilities from standard Cast AI Rebalancer are not available:

Capability	Why it's unavailable
Aggressive mode	Karpenter controls drain timing
Graceful eviction controls	Karpenter handles node lifecycle
Paused drain configuration	Not supported with Karpenter
Progress bar in UI	Track progress via Rebalancing CRDs instead
Container Live Migration	CLM is only available with Evictor consolidation, not Rebalancer

These limitations exist because Karpenter manages the Node lifecycle. Rebalancer coordinates with Karpenter by cordoning Nodes and creating new NodeClaims, while Karpenter handles the actual provisioning, drain, and deletion.
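
The division of work described above (Rebalancer cordons and creates NodeClaims, Karpenter provisions, drains, and deletes) can be observed from the cluster side. A minimal sketch using standard kubectl queries follows. It assumes the Karpenter NodeClaim CRD is installed, and the function names are illustrative.

```shell
# Nodes that are currently cordoned (spec.unschedulable is true).
list_cordoned_nodes() {
  kubectl get nodes --field-selector spec.unschedulable=true
}

# NodeClaims known to Karpenter, including ones still being provisioned.
list_nodeclaims() {
  kubectl get nodeclaims -o wide
}
```

During a rebalancing operation, cordoned Nodes should appear in the first listing while replacement capacity shows up in the second.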

Tracking rebalancing progress

The Cast AI console may not show detailed progress during rebalancing. For full visibility, inspect the Rebalancing CRD:

kubectl get rebalancing -n castai-agent
kubectl describe rebalancing <name> -n castai-agent

The Rebalancing CRD contains the current phase, human-readable status messages, and related events for each step. This is the source of truth for rebalancing status and troubleshooting.
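
For an operation that is still running, the same resource can be watched or dumped in full. This sketch reuses the resource name and namespace from the commands above and adds only standard kubectl flags. The function names are illustrative, and no particular status field names are assumed.

```shell
# Stream changes to Rebalancing resources as the operation progresses.
watch_rebalancing() {
  kubectl get rebalancing -n castai-agent --watch
}

# Print the full resource, including its status section, for one operation.
# $1: name of the Rebalancing resource
rebalancing_yaml() {
  kubectl get rebalancing "$1" -n castai-agent -o yaml
}
```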

For Rebalancer documentation, see Rebalancer.

Spot intelligence

Cast AI improves Karpenter's Spot Instance handling with predictive capabilities and reliability improvements.

What it adds to Karpenter:

Interruption prediction — Identifies at-risk Nodes before AWS announces interruptions
Spot reliability model (coming in future releases) — Steers toward historically stable Spot pools
Spot fallback recovery (coming in future releases) — Automatically returns to Spot when capacity becomes available again

How interruption prediction works

Cast AI's Spot interruption prediction provides up to 30 minutes of advance warning before an interruption occurs, compared with AWS's standard 2-minute notice. The response to a prediction depends on whether Container Live Migration (CLM) is enabled:

CLM Status	Prediction Response
Without CLM	Kentroller signals Karpenter's interruption queue, triggering Karpenter's standard node replacement workflow with extended lead time
With CLM	Kentroller triggers Evictor to consolidate the at-risk node, using Container Live Migration to move workloads to stable nodes with zero downtime before the interruption occurs

In both cases, the extended prediction window provides significantly more time for graceful workload migration compared to waiting for AWS's native interruption signal.

How it differs from standard Cast AI:

Aspect	With Karpenter	Standard Cast AI
Pool selection	Influences Karpenter's instance type priorities	Cast AI selects pools directly
Fallback handling	Monitors Karpenter's fallback Nodes for recovery	Native fallback and recovery
Prediction response	Signals Karpenter to replace at-risk Nodes	Direct Node replacement

For Spot handling documentation, see Spot Instances and Spot Handler.

Workload Autoscaler

Workload Autoscaler continuously rightsizes workloads based on actual resource usage.

What it adds to Karpenter:

Automatic adjustment of CPU and memory requests to match actual usage
Tighter bin-packing as rightsized workloads require less capacity
Integration with Evictor and Rebalancer for coordinated optimization

How it differs from standard Cast AI:

Aspect	With Karpenter	Standard Cast AI
Request updates	Workload Autoscaler updates requests; Karpenter sees new requirements	Same behavior
Node impact	Karpenter may consolidate as requests decrease	Cast AI coordinates this directly with Evictor
Scaling policies	Applied identically	Applied identically

Workload Autoscaler behavior is largely identical whether you're using Cast AI for Karpenter or Cast AI's Autoscaler—it operates at the workload level independently of Node provisioning.
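
Because Karpenter sizes capacity from Pod requests, comparing requests with measured usage shows what Workload Autoscaler has to work with. The sketch below uses only standard kubectl features. The usage query assumes metrics-server (or an equivalent metrics API) is installed, and the function names are illustrative.

```shell
# Show the CPU and memory requests currently set on each Pod in a namespace.
# $1: namespace
pod_requests() {
  kubectl get pods -n "$1" \
    -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
}

# Show measured usage for the same Pods (requires a metrics API).
# $1: namespace
pod_usage() {
  kubectl top pods -n "$1"
}
```

A large gap between the two outputs indicates requests that rightsizing could reduce, which in turn lowers the capacity Karpenter needs to provision.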

For Workload Autoscaler documentation, see Workload Autoscaling.

Pod mutations

Pod mutations automate Pod spec adjustments to simplify workload configuration and reduce manual effort for teams.

What it adds to Karpenter:

Automatic application of labels, tolerations, and NodeSelectors
Simplified onboarding without modifying Deployment manifests
Consistent Pod configuration across workloads

How it differs from standard Cast AI:

Pod mutations work identically with Karpenter and standard Cast AI. The mutations apply to Pod specs before creation, independent of which autoscaler provisions Nodes.

For Pod mutations documentation, see Pod mutations.

Cost reporting

The savings report and other cost monitoring capabilities provide visibility into your cluster's optimization potential without making any changes.

What it provides:

Current vs. optimized cost comparison
Node utilization and bin-packing analysis
Commitment utilization tracking
Spot adoption opportunities
Workload rightsizing recommendations

How it differs from standard Cast AI:

Cost reporting works identically for Karpenter clusters. The analysis examines your current state and models what Cast AI optimization could achieve.

For general cost monitoring, see Cost Monitoring.

Labels reference

Cast AI for Karpenter uses specific labels to coordinate optimization activities with Karpenter. Understanding these labels helps with troubleshooting and protecting critical nodes.

Node labels

Label	Purpose	Applied by
`karpenter.sh/do-not-disrupt`	Prevents Karpenter from consolidating or disrupting the node	Kentroller (automatically applied when Evictor is active)
`autoscaling.cast.ai/removal-disabled`	Prevents Evictor from consolidating the node	Customer (manual protection)

Workload labels

Label	Purpose	Applied by
`live.cast.ai/migration-enabled=true`	Indicates the workload is eligible for Container Live Migration	Live Controller (automatic assessment)
`live.cast.ai/migration-enabled=false`	Indicates the workload cannot be live-migrated	Live Controller (automatic assessment)
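
To review the Live Controller's assessment, the label can be shown as a column or used as a selector. This sketch assumes the label is present on Pods. If your installation places it on the owning workload object instead, substitute the resource type accordingly. The function names are illustrative.

```shell
# Show the migration eligibility label as an extra column for all Pods.
list_live_migration_status() {
  kubectl get pods -A -L live.cast.ai/migration-enabled
}

# List only Pods assessed as not eligible for live migration.
list_non_migratable() {
  kubectl get pods -A -l live.cast.ai/migration-enabled=false
}
```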

Protecting nodes from optimization

To exclude a node from Evictor consolidation only:

kubectl label node <node-name> autoscaling.cast.ai/removal-disabled=true

This label affects Evictor only. On its own, it does not block Karpenter's native consolidation of the node.

To exclude a node from both Evictor and Karpenter consolidation:

kubectl label node <node-name> karpenter.sh/do-not-disrupt=true

Note: When Evictor is enabled, Kentroller automatically applies karpenter.sh/do-not-disrupt to nodes to prevent conflicts between Evictor and Karpenter consolidation. Evictor then manages consolidation exclusively for those nodes.
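
To confirm which protections are in place, or to lift a manual one, the following sketch may help. It relies on standard kubectl behavior only: -L adds a label column, describe prints all of a node's metadata, and a trailing hyphen on a label key removes that label. The function names are illustrative.

```shell
# Show the Evictor protection label as an extra column for every node.
list_evictor_protection() {
  kubectl get nodes -L autoscaling.cast.ai/removal-disabled
}

# Print any line of a node's description that mentions do-not-disrupt.
# $1: node name
node_has_do_not_disrupt() {
  kubectl describe node "$1" | grep -F 'karpenter.sh/do-not-disrupt'
}

# Remove the manual Evictor protection from a node.
# $1: node name
unprotect_node() {
  kubectl label node "$1" autoscaling.cast.ai/removal-disabled-
}
```

node_has_do_not_disrupt returns a non-zero exit status when the marker is absent, so it can be used in a shell conditional.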

Features not available with Karpenter

Some Cast AI capabilities require tighter integration with Node scheduling than the Karpenter-layered approach allows:

Feature	Why it's not available	Alternative
Pod Pinner	Requires Cast AI Autoscaler's scheduling integration	Use Karpenter's native Pod affinity
Cluster hibernation	Requires direct control over Node lifecycle	Use Karpenter's NodePool weight=0 for manual scaling
Commitment-aware instance selection	Rebalancer cannot directly influence Karpenter's instance selection	Use NodePool requirements to prefer commitment-covered instance families
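
For the commitment-aware alternative in the table above, a NodePool requirement on instance family is one way to steer Karpenter toward covered capacity. The sketch below writes a NodePool fragment to a file for review. It assumes the Karpenter AWS provider's karpenter.k8s.aws/instance-family label. The families m5 and c5 and the file name are placeholders, not recommendations.

```shell
# Write a NodePool fragment that limits provisioning to instance families
# covered by commitments. Replace m5 and c5 with your own covered families.
# This is a fragment to merge into an existing NodePool, not a full manifest.
cat > nodepool-commitment-families.yaml <<'EOF'
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m5", "c5"]
EOF
```

Note that NodePool requirements are hard constraints rather than preferences. A common pattern for expressing a preference is a second, lower-weight NodePool without the restriction, so that Karpenter can fall back to it when the covered families are unavailable.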

Additionally, the following Rebalancer capabilities are not available when using Karpenter (see Rebalancer capabilities not available with Karpenter for details):

Aggressive mode
Graceful eviction controls
Paused drain configuration
Progress bar in UI
Container Live Migration (available with Evictor only)

To benefit from these capabilities, consider migrating to Cast AI Autoscaler.

Related resources

Cast AI for Karpenter overview — Conceptual introduction
Getting started — Connect your cluster