Spot Handler
Spot Handler is an open-source component that monitors Spot Instance interruption events from your cloud provider and reports them to the Cast AI platform. You can find the source code on GitHub.
What Spot Handler does
Spot Handler watches for Spot Instance interruption signals from your cloud provider and forwards each one to Cast AI. Cast AI uses this data to improve its Spot reliability and interruption prediction models.
Important: Spot Handler does not take any action on nodes or workloads. It does not:
- Drain nodes when receiving interruption signals
- Initiate pod rescheduling
- Perform graceful shutdowns
How Spot interruptions are handled
When a Spot Instance is interrupted by your cloud provider:
- Spot Handler detects the interruption signal and reports it to Cast AI
- The cloud provider terminates the node according to its own policies (e.g., GCP provides 30 seconds of notice before termination)
- Kubernetes handles pod termination using standard procedures
- Cast AI's Autoscaler provisions replacement capacity
To mitigate interruptions proactively, before your cloud provider reclaims a node, enable these Autoscaler features:
- Spot reliability: Targets instance types less likely to be interrupted
- Interruption prediction model: Gracefully drains and rebalances nodes predicted for interruption
Install Spot Handler
Spot Handler is installed automatically during Phase 1 cluster onboarding using the Helm chart available at https://github.com/castai/helm-charts/tree/main/charts/castai-spot-handler. This ensures that Cast AI can begin collecting Spot interruption data immediately after connecting your cluster, improving Spot reliability predictions from the start.
If Spot Handler was uninstalled or you need to reinstall it manually, you can do so using the following commands.
- Add the Cast AI Helm charts repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
- Install Spot Handler:
helm upgrade --install castai-spot-handler castai-helm/castai-spot-handler -n castai-agent \
--set castai.apiKey=<your-api-token> \
--set castai.clusterID=<your-cluster-id> \
--set castai.provider=<your-CSP-provider> # AWS|AZURE|GCP
Upgrade Spot Handler
To upgrade Spot Handler to the latest version:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade castai-spot-handler castai-helm/castai-spot-handler --reset-then-reuse-values -n castai-agent
Improve Spot interruption handling
To minimize the impact of Spot interruptions:
- Design for high availability: Use multiple replicas and distribute pods across failure domains
- Enable Spot handling mechanisms: Configure Spot reliability and interruption prediction in your Node templates
- Configure Pod Disruption Budgets: Set appropriate PDBs for critical workloads
- Use preStop hooks: Configure lifecycle hooks in your containers for graceful shutdown
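As an illustration of the Pod Disruption Budget recommendation above, the following is a minimal sketch of a shell helper that prints a PDB manifest. The function name render_pdb and the example values (web-pdb, web, 2) are hypothetical and not part of Cast AI's tooling; the manifest assumes your workload's pods carry an app label. Note that a PDB limits voluntary evictions, such as a node drain, and does not delay a termination initiated by the cloud provider.

```shell
# Print a PodDisruptionBudget manifest to stdout.
# Arguments (hypothetical example inputs):
#   $1 = name of the PDB object
#   $2 = value of the "app" label on the pods to protect (assumed label key)
#   $3 = minimum number of pods that must stay available during evictions
render_pdb() {
  local name="$1" app="$2" min_available="$3"
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${name}
spec:
  minAvailable: ${min_available}
  selector:
    matchLabels:
      app: ${app}
EOF
}
```

To apply the result, you could pipe it to kubectl, for example: render_pdb web-pdb web 2 | kubectl apply -f -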
For more information, see Spot Instances.
Troubleshooting
Check Spot Handler logs:
kubectl logs -l name=spot-handler -n castai-agent
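When checking logs, it can also help to confirm that the Spot Handler pods are running and to pull more than the default number of log lines (kubectl returns only the last 10 lines per pod when a label selector is used). The sketch below wraps both steps in two helper functions. The function names are hypothetical; the namespace and label selector are the ones used in the log command above.

```shell
# Namespace and label selector taken from the log command on this page.
NAMESPACE="castai-agent"
SELECTOR="name=spot-handler"

# List the Spot Handler pods, including the node each one is scheduled on.
spot_handler_pods() {
  kubectl get pods -l "$SELECTOR" -n "$NAMESPACE" -o wide
}

# Show the most recent log lines from every matching pod.
# $1 = number of lines per pod (defaults to 100).
# --prefix labels each line with the pod it came from.
spot_handler_recent_logs() {
  local lines="${1:-100}"
  kubectl logs -l "$SELECTOR" -n "$NAMESPACE" --tail="$lines" --prefix
}
```

For example, spot_handler_pods followed by spot_handler_recent_logs 200 shows pod status first and then the last 200 log lines from each pod.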
