Spot Handler

Spot Handler is an open-source component that monitors Spot Instance interruption events from your cloud provider and reports them to the Cast AI platform. You can find the source code on GitHub.

What Spot Handler does

Spot Handler watches for the interruption notices that your cloud provider issues for Spot Instances and forwards them to Cast AI. Cast AI uses this data to improve its Spot reliability and interruption prediction models.

Important: Spot Handler does not take any action on nodes or workloads. It does not:

  • Drain nodes when receiving interruption signals
  • Initiate pod rescheduling
  • Perform graceful shutdowns

How Spot interruptions are handled

When a Spot Instance is interrupted by your cloud provider:

  1. Spot Handler detects the interruption signal and reports it to Cast AI
  2. The cloud provider terminates the node according to its own policies (for example, GCP gives 30 seconds of notice before termination)
  3. Kubernetes handles pod termination using standard procedures
  4. Cast AI's Autoscaler provisions replacement capacity

To mitigate interruptions proactively, before the cloud provider reclaims a node, enable the Autoscaler's Spot reliability and interruption prediction features in your Node templates.

Install Spot Handler

Spot Handler is installed automatically during Phase 1 cluster onboarding using the Helm chart available at https://github.com/castai/helm-charts/tree/main/charts/castai-spot-handler. This ensures that Cast AI can begin collecting Spot interruption data immediately after connecting your cluster, improving Spot reliability predictions from the start.

If Spot Handler was uninstalled or you need to reinstall it manually, you can do so using the following commands.

⚠️ Helm 4 users

Helm 4 currently has a known issue causing installation failures on the first attempt due to Server-Side Apply conflicts. Add --force-conflicts to your install command if you encounter this. See GitHub issues #31516 and #31510.

  1. Add the Cast AI Helm charts repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
  2. Install Spot Handler:
helm upgrade --install castai-spot-handler castai-helm/castai-spot-handler -n castai-agent \
  --set castai.apiKey=<your-api-token> \
  --set castai.clusterID=<your-cluster-id> \
  --set castai.provider=<your-CSP-provider>  # one of: AWS, AZURE, GCP
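
For Helm 4, the install command above can be combined with the --force-conflicts flag from the note. The following is a minimal sketch rather than an official script: the function name and the environment variables CASTAI_API_TOKEN, CASTAI_CLUSTER_ID, and CASTAI_PROVIDER are hypothetical names chosen for this example, and any extra arguments are passed straight through to helm.

```shell
# Sketch: wrap the documented install command in a function so extra flags
# can be appended. The variable names below are assumptions for this example,
# not names defined by the chart.
install_spot_handler() {
  helm upgrade --install castai-spot-handler castai-helm/castai-spot-handler \
    -n castai-agent \
    --set castai.apiKey="${CASTAI_API_TOKEN:?set CASTAI_API_TOKEN}" \
    --set castai.clusterID="${CASTAI_CLUSTER_ID:?set CASTAI_CLUSTER_ID}" \
    --set castai.provider="${CASTAI_PROVIDER:?set CASTAI_PROVIDER}" \
    "$@"   # extra flags supplied by the caller, e.g. --force-conflicts
}

# Usage on Helm 4, if the first attempt fails with a Server-Side Apply conflict:
#   install_spot_handler --force-conflicts
```

Reading the token from an environment variable instead of typing it inline also keeps it out of your shell history.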

Upgrade Spot Handler

For Operator-managed installations

If Spot Handler is managed by the Cast AI Operator, upgrades are handled automatically through the Cast AI console:

  1. Navigate to Manage Org > Component Control in the Console
  2. Select Spot Handler
  3. Click Update when a new version is available

No additional steps are required.

For manual (Helm) installations

To upgrade Spot Handler to the latest version manually:

helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade castai-spot-handler castai-helm/castai-spot-handler --reuse-values -n castai-agent
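
After a manual upgrade, it can be useful to confirm which chart version is deployed and that the pods came back up. The sketch below assumes the release name and namespace used in the commands above, and that the pods carry the name=spot-handler label used in the Troubleshooting section; the function name is hypothetical.

```shell
# Sketch: post-upgrade check for a Helm-managed Spot Handler installation.
check_spot_handler() {
  # The CHART column of "helm list" shows the deployed chart version.
  helm list -n castai-agent --filter '^castai-spot-handler$'
  # Show the Spot Handler pods and the nodes they run on.
  kubectl get pods -n castai-agent -l name=spot-handler -o wide
}

# Usage:
#   check_spot_handler
```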

Recommendation

Consider installing the Cast AI Operator to simplify future upgrades and enable automatic component management.

Improve Spot interruption handling

To minimize the impact of Spot interruptions:

  1. Design for high availability: Use multiple replicas and distribute pods across failure domains
  2. Enable Spot handling mechanisms: Configure Spot reliability and interruption prediction in your Node templates
  3. Configure Pod Disruption Budgets: Set appropriate PDBs for critical workloads
  4. Use preStop hooks: Configure lifecycle hooks in your containers for graceful shutdown
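
As a rough illustration of items 3 and 4, the sketch below creates a Pod Disruption Budget and adds a simple preStop hook using kubectl. It rests on several assumptions: the function names are hypothetical, the workload is a Deployment whose pods are labeled app=<name>, and the container image includes a sleep binary. Note that a PDB limits voluntary evictions, such as node drains, and does not delay a termination forced by the cloud provider.

```shell
# Sketch: create a PDB for pods labeled app=<app>.
# Arguments: app name, namespace, minimum available pods (default 1).
create_pdb() {
  app="$1"; namespace="$2"; min_available="${3:-1}"
  kubectl create poddisruptionbudget "${app}-pdb" \
    --namespace "$namespace" \
    --selector "app=${app}" \
    --min-available "$min_available"
}

# Sketch: add a preStop hook that sleeps before the container is stopped,
# giving in-flight requests time to finish.
# Arguments: deployment, namespace, container name, seconds (default 10).
add_prestop_sleep() {
  deployment="$1"; namespace="$2"; container="$3"; seconds="${4:-10}"
  patch=$(printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","lifecycle":{"preStop":{"exec":{"command":["sleep","%s"]}}}}]}}}}' "$container" "$seconds")
  kubectl patch deployment "$deployment" -n "$namespace" --type strategic -p "$patch"
}

# Usage (hypothetical workload "web" in namespace "prod"):
#   create_pdb web prod 2
#   add_prestop_sleep web prod app 15
```

Keep the preStop delay shorter than both the pod's terminationGracePeriodSeconds and the provider's interruption notice, since the node is terminated regardless once that notice expires.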

For more information, see Spot Instances.

Troubleshooting

Check Spot Handler logs:

kubectl logs -l name=spot-handler -n castai-agent
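
When a label selector is used, kubectl logs returns only the last 10 lines per pod unless --tail is set. The sketch below, which reuses the label and namespace from the command above, requests more history and prefixes each line with its pod name; the function names are hypothetical.

```shell
# Sketch: fetch recent Spot Handler logs from all matching pods.
# Argument: number of lines per pod (default 200).
spot_handler_logs() {
  lines="${1:-200}"
  kubectl logs -l name=spot-handler -n castai-agent --tail="$lines" --prefix
}

# Sketch: show pod status and recent events, useful when pods are not
# starting and therefore have no logs yet.
spot_handler_describe() {
  kubectl describe pods -l name=spot-handler -n castai-agent
}

# Usage:
#   spot_handler_logs 500
#   spot_handler_describe
```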