Getting started with Cast AI for Karpenter
This guide walks you through connecting your Karpenter-managed cluster to Cast AI, deploying optimization features, and running your first rebalancing operation. By the end, you'll have Cast AI working alongside Karpenter with active optimization in place.
What you'll accomplish:
- Connect your EKS cluster running Karpenter to Cast AI
- Deploy the full Karpenter Enterprise suite
- Run your first cluster rebalancing
- Enable workload autoscaling for continuous optimization
Prerequisites
Required tools
Ensure the following tools are installed and accessible:
- kubectl (v1.19+) configured to access your cluster. See Install kubectl.
- Helm (v3.0+) for deploying Cast AI components. See Install Helm.
The onboarding script handles component installation via Helm automatically.
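As a quick pre-flight check, the sketch below confirms that both tools are on your PATH before you run the onboarding script. The function names `require_tool` and `check_required_tools` are hypothetical helpers written for this example, not part of Cast AI tooling.

```shell
# require_tool NAME: succeeds if NAME resolves to a command on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 found"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# check_required_tools: reports on kubectl and helm, fails if either is absent.
check_required_tools() {
  status=0
  for tool in kubectl helm; do
    require_tool "$tool" || status=1
  done
  return "$status"
}

# Usage (prints the installed versions once both tools are present):
#   check_required_tools && kubectl version --client && helm version --short
```

Compare the printed versions against the minimums listed above.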
Cluster requirements
Your cluster must meet these requirements:
- Amazon EKS cluster with Karpenter version 0.32.0 or later installed and operational
- kubectl access with cluster-admin or equivalent permissions to create namespaces and deploy workloads
- Outbound internet access to api.cast.ai and grpc.cast.ai
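The version and permission requirements can be checked from the command line. The following is a minimal sketch, not an official check: it assumes the Karpenter controller runs as a Deployment named `karpenter` in the `karpenter` namespace, that its image reference carries a version tag, and that your `sort` supports `-V` (version sort). All function names are hypothetical.

```shell
# version_ge HAVE WANT: succeeds if version HAVE is at least WANT.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  have=${1#v}
  want=${2#v}
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# karpenter_version: prints the controller image tag, e.g. "0.32.0".
# Assumption: a Deployment named "karpenter" in the "karpenter" namespace
# whose image reference includes a version tag.
karpenter_version() {
  kubectl get deployment karpenter -n karpenter \
    -o jsonpath='{.spec.template.spec.containers[0].image}' |
    sed -e 's/@.*$//' -e 's/^.*://' -e 's/^v//'
}

# check_cluster_requirements: runs the version and permission checks.
check_cluster_requirements() {
  version_ge "$(karpenter_version)" 0.32.0 ||
    { echo "Karpenter is older than 0.32.0" >&2; return 1; }
  kubectl auth can-i create namespaces >/dev/null ||
    { echo "current user cannot create namespaces" >&2; return 1; }
  echo "cluster requirements look satisfied"
}
```

If Karpenter was installed under a different name or namespace, adjust `karpenter_version` accordingly. This sketch does not test outbound connectivity from the cluster.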
AWS permissions
The IAM principal running the onboarding script needs permissions to:
- Create IAM roles and policies
- Create instance profiles
- Manage EC2 security groups
For the minimal required permissions policy, see Cloud permissions.
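Before running the script, it can help to confirm which IAM principal your terminal session is actually using. A minimal sketch, assuming the AWS CLI is installed and configured (the function name `current_principal` is hypothetical):

```shell
# current_principal: prints the ARN of the identity the AWS CLI is using.
current_principal() {
  aws sts get-caller-identity --query Arn --output text
}

# Usage:
#   current_principal
```

This only identifies the principal. It does not verify that the principal holds the permissions listed above.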
Cast AI account
If you don't have a Cast AI account, sign up here. The savings report and basic features are available on the free tier.
Step 1: Connect cluster and deploy Cast AI
- Log in to the Cast AI console and click Connect cluster.

- In the connection modal, select EKS as your provider and Karpenter as your autoscaling tool, then click Generate script.

- Copy the generated script on the Enable automation screen and run it in your terminal.
The full Karpenter Enterprise suite includes Rebalancer, Workload Autoscaler, Evictor, Spot interruption prediction, and Pod mutations. These components work alongside Karpenter without replacing it.

What components get installed
The onboarding script installs Cast AI components in the castai-agent namespace:
| Component | Purpose |
|---|---|
| castai-agent | Collects cluster metrics and workload data |
| castai-aws-node | AWS VPC CNI integration for container networking |
| castai-cluster-controller | Coordinates optimization actions |
| castai-evictor | Handles intelligent workload consolidation |
| castai-kentroller | Integrates Cast AI with Karpenter |
| castai-live-controller | Orchestrates container live migration |
| castai-live-daemon | Per-node agent for live migration operations |
| castai-live-patch-daemonset | Patches nodes for live migration support |
| castai-workload-autoscaler | Continuous workload rightsizing |
| castai-pod-mutator | Automates Pod spec adjustments |
For more details, see Hosted components.
The script automatically adds Helm repositories, installs all components, detects your cluster ID, and configures feature integrations. Deployment takes 2-3 minutes.
- After deployment completes, verify all pods are running:
  kubectl get pods -n castai-agent
All pods should show Running status.
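If you prefer a scripted check over reading the pod list by eye, the sketch below flags any pod in the castai-agent namespace whose STATUS column is not Running. The helper names are hypothetical, and the parsing assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, ...).

```shell
# not_running: reads `kubectl get pods --no-headers` output on stdin and
# prints the name and status of every pod whose status is not Running.
not_running() {
  awk '$3 != "Running" { print $1, $3 }'
}

# check_castai_pods: succeeds only if pods exist and all of them are Running.
check_castai_pods() {
  pods=$(kubectl get pods -n castai-agent --no-headers 2>/dev/null)
  if [ -z "$pods" ]; then
    echo "No pods found in castai-agent" >&2
    return 1
  fi
  bad=$(printf '%s\n' "$pods" | not_running)
  if [ -n "$bad" ]; then
    echo "Pods not yet Running:" >&2
    echo "$bad" >&2
    return 1
  fi
  echo "All castai-agent pods are Running"
}

# Usage:
#   check_castai_pods
```

Pods may take a few minutes to start, so rerun the check if it fails right after deployment.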
Step 2: Review optimization potential
After closing the confirmation dialog, you'll land on the Get started page.
The Get started page is your optimization dashboard. At the top, you'll see:
- Potential savings – Total monthly savings available
- Current cluster cost vs. Optimal cluster cost – Where you are now vs. where you could be
Below that are feature cards that show each optimization opportunity, along with projected savings and its current status. The Rebalancer card shows your biggest immediate opportunity. This is what you'll run first.
What if features show 'Install' instead of data?
If you deselected features during connection, those cards will show an Install button instead of data. You can install them anytime:
- Click Install on the feature card
- Follow the installation prompts
- The feature will appear as active once installation completes
The Get started page always reflects what's currently deployed in your cluster.
Where did this data come from?
Cast AI analyzed your cluster after onboarding:
- Inventoried all Nodes, Pods, and workloads
- Measured actual CPU and memory usage
- Identified over-provisioned resources
- Compared your current instance types against more optimal alternatives
- Calculated potential consolidation opportunities
This analysis runs continuously, so the numbers stay current as your cluster changes.
You're now ready to either run Rebalancer immediately or configure features first. Most users start with Rebalancer to see immediate results.
Step 3: Configure features (optional)
Before running your first rebalancing, you may want to enable Evictor and review your Karpenter configuration in Cast AI.
Enable Evictor for ongoing consolidation
Evictor is disabled by default, allowing you to control when active Pod consolidation begins. To enable it, navigate to Settings and toggle Evictor under Additional features.
Container live migration with Evictor is automatically enabled, allowing workloads to move with potentially no downtime when supported. Evictor will now continuously monitor your cluster and consolidate workloads when opportunities arise, respecting Pod Disruption Budgets throughout.
Should I enable Evictor now or later?
Enable now if: You want fully automated, continuous optimization. Evictor runs safely in the background and only consolidates when it won't disrupt workloads.
Enable later if: You prefer to run Rebalancer manually first to see how Cast AI optimizes your cluster, then enable continuous consolidation once you're comfortable.
Either approach works—Rebalancer provides immediate, one-time optimization, while Evictor provides ongoing efficiency.
Review your Karpenter configuration
Cast AI automatically imported your Karpenter NodePools and NodeClasses. Navigate to Karpenter in the left sidebar to see them.
This page shows how Cast AI recognizes and preserves your existing Karpenter setup.
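To cross-check the imported configuration against what is defined in the cluster, you can list the Karpenter resources directly. This sketch assumes the `nodepools.karpenter.sh` and `ec2nodeclasses.karpenter.k8s.aws` resource types used by Karpenter 0.32 and later. The function names are hypothetical.

```shell
# karpenter_names RESOURCE: prints the sorted names of all objects of RESOURCE.
karpenter_names() {
  kubectl get "$1" -o name | sed 's|^.*/||' | sort
}

# show_karpenter_config: lists NodePools and EC2NodeClasses by name.
show_karpenter_config() {
  echo "NodePools:"
  karpenter_names nodepools.karpenter.sh
  echo "EC2NodeClasses:"
  karpenter_names ec2nodeclasses.karpenter.k8s.aws
}

# Usage:
#   show_karpenter_config
```

The names printed here should match the entries shown on the Karpenter page in the console.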
What are Node overlays?
You'll see a Node overlays tab marked "Coming soon." This feature will allow Cast AI to dynamically adjust NodePool configurations based on commitment utilization, Spot availability, and other optimization signals—without permanently modifying your NodePool definitions.
For now, Cast AI optimizes by working with your existing NodePools and NodeClasses as-is.
Once you've reviewed Settings and your Karpenter configuration, you're ready to run your first rebalancing.
Step 4: Run your first rebalancing
Rebalancing replaces nodes with more cost-efficient alternatives. This is your first active optimization.
- Locate the Rebalancer card on the Get started page or the navigation sidebar.
- Review the projected savings and configuration comparison:
  - Current cluster cost and instance count
  - Rebalanced cluster cost and optimized instance count
- Click Rebalance now on the Rebalancer card.
- Review the rebalancing plan that appears:
  - Nodes to be replaced
  - Replacement instance types
  - Expected cost reduction
- Click Start rebalancing to begin the operation.
What happens during rebalancing
The rebalancing process:
- Creates new Nodes with more cost-efficient instance types
- Drains workloads from old Nodes (using live migration where supported)
- Deletes old Nodes once workloads are safely moved
- Respects Pod Disruption Budgets throughout
Rebalancing typically completes within 10-15 minutes, depending on cluster size.
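Because Pod Disruption Budgets are respected, a budget that currently allows zero disruptions can hold up the draining of any node running its pods. The sketch below lists such budgets so you can review them before rebalancing. It reads the standard `.status.disruptionsAllowed` field of each PDB, and the helper names are hypothetical.

```shell
# blocking_pdbs: reads "NAMESPACE NAME ALLOWED" rows on stdin and prints
# namespace/name for every PDB that allows zero disruptions.
blocking_pdbs() {
  awk '$3 == "0" { print $1 "/" $2 }'
}

# list_blocking_pdbs: queries all namespaces and filters with blocking_pdbs.
list_blocking_pdbs() {
  kubectl get pdb -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' |
    blocking_pdbs
}

# Usage:
#   list_blocking_pdbs
```

An empty result means no budget is currently at zero allowed disruptions. To watch nodes being added and removed while the operation runs, `kubectl get nodes -w` is a simple option.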
Monitor rebalancing progress
You can monitor rebalancing from the Rebalancer page:
- Navigate to Rebalancer in the left sidebar.
- View the active rebalancing operation showing:
  - Nodes being replaced
  - Progress percentage
  - Time remaining
Once rebalancing is complete, return to the Get started page to view your updated savings figures.
Step 5: Enable workload autoscaling
Workload Autoscaler rightsizes your workloads based on actual resource usage and runs continuously in the background.
- Locate the Workload Autoscaler card on the Get started page.
- Review the projected savings:
  - CPU and memory over-provisioning to be reduced
  - Monthly cost savings from rightsizing
[SCREENSHOT: Workload Autoscaler card showing projected CPU and memory reductions]
- Click Configure on the Workload Autoscaler card.
[SCREENSHOT: Workload Autoscaler configuration page showing recommendation mode options]
- On the configuration page, review the default settings:
  - Recommendation application mode (automatic or manual)
  - Resource safety margins
  - Workload selection criteria
[SCREENSHOT: Close-up of recommendation settings with safety margins and workload filters]
- Keep the defaults and click Enable Workload Autoscaler.
What happens after enabling
Workload Autoscaler:
- Begins monitoring resource usage across all workloads
- Generates rightsizing recommendations within 24-48 hours
- Applies recommendations automatically (if configured)
- Updates continuously as usage patterns change
You can view recommendations and adjustments on the Workload Autoscaler page.
[SCREENSHOT: Workload Autoscaler page showing active recommendations and applied changes]
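To get a rough feel for over-provisioning yourself, you can compare what pods request with what they use. The sketch below is illustrative arithmetic only and is not how Workload Autoscaler computes its recommendations. The function names and the example namespace are hypothetical.

```shell
# overprovisioned_pct REQUEST USAGE: integer percentage of REQUEST left unused.
# Both arguments must be in the same unit (for example, CPU millicores).
overprovisioned_pct() {
  if [ "$1" -le 0 ]; then
    echo "request must be positive" >&2
    return 1
  fi
  echo $(( ($1 - $2) * 100 / $1 ))
}

# show_requests NAMESPACE: prints CPU and memory requests per pod.
show_requests() {
  kubectl get pods -n "$1" \
    -o custom-columns='POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
}

# Usage: compare requests with live usage for a namespace of your choice.
#   show_requests default
#   kubectl top pods -n default
#   overprovisioned_pct 1000 250
```

For example, a container requesting 1000 millicores while using 250 leaves 75 percent of its request unused.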
Next steps
You've successfully connected your Karpenter cluster and enabled active optimization. From here, you can:
Review the Cluster Scorecard
The Cluster Scorecard provides an overall health and efficiency rating, showing opportunities for further optimization.
Customize feature settings
Visit the Settings page to adjust advanced configurations for each feature.
Understand Karpenter integration
Learn more about how Cast AI works alongside Karpenter in Karpenter Enterprise overview and Karpenter Enterprise features.
Set up alerts and reporting
Configure Slack or email notifications for optimization events and schedule regular savings reports. See Notifications.
Troubleshooting
Components not starting
If pods in the castai-agent namespace are not reaching Running status:
- Check pod events:
  kubectl describe pod -n castai-agent <pod-name>
- Check component logs:
  kubectl logs -n castai-agent -l app.kubernetes.io/name=castai-agent
- Verify IAM permissions were created correctly by the onboarding script.
For more guidance, see Troubleshooting Cast AI components.
Cluster not appearing in console
If your cluster doesn't appear after running the script:
- Verify the agent is running:
  kubectl get pods -n castai-agent -l app.kubernetes.io/name=castai-agent
- Check agent logs for connection errors:
  kubectl logs -n castai-agent -l app.kubernetes.io/name=castai-agent -c agent
- Ensure your cluster has outbound internet access to api.cast.ai and grpc.cast.ai.
For EKS-specific issues, see EKS connection troubleshooting.
Karpenter not detected
If Cast AI doesn't detect Karpenter in your cluster:
- Verify Karpenter is installed:
  kubectl get pods -n karpenter
- Check that Karpenter CRDs exist:
  kubectl get crd | grep karpenter
If Karpenter is installed but not detected, contact Cast AI support with your cluster details.
Rebalancing not starting
If rebalancing won't start or shows errors:
- Verify the Kentroller component is running:
  kubectl get pods -n castai-agent -l app=castai-kentroller
- Check Kentroller logs:
  kubectl logs -n castai-agent -l app=castai-kentroller
- Ensure Karpenter is operational and able to provision nodes:
  kubectl get nodeclaims
Rebalancing requires both Cast AI and Karpenter to be fully operational.
Workload Autoscaler not generating recommendations
If Workload Autoscaler is enabled but not showing recommendations after 48 hours:
- Verify the Workload Autoscaler pod is running:
  kubectl get pods -n castai-agent -l app=castai-workload-autoscaler
- Check that metrics-server is installed and working:
  kubectl top nodes
  kubectl top pods -A
- Review Workload Autoscaler logs for errors:
  kubectl logs -n castai-agent -l app=castai-workload-autoscaler
Workload Autoscaler requires metrics-server for resource usage data. The onboarding script installs it automatically if not present.
Related resources
- Cast AI for Karpenter overview – How Cast AI extends Karpenter
- Cast AI for Karpenter features – Detailed feature documentation
- Rebalancer – Understanding cluster rebalancing
- Workload Autoscaler – Workload rightsizing details
- Migration from Karpenter – Moving to full Cast AI Autoscaler
