Spot Handler

Spot Handler is an open-source component that monitors Spot Instance interruption events from your cloud provider and reports them to the Cast AI platform. You can find the source code on GitHub.

What Spot Handler does

Spot Handler watches for Spot Instance interruption signals from your cloud provider and forwards each event to Cast AI. Cast AI uses this data to improve its Spot reliability and interruption prediction models.

Important: Spot Handler does not take any action on nodes or workloads. It does not:

  • Drain nodes when receiving interruption signals
  • Initiate pod rescheduling
  • Perform graceful shutdowns

How Spot interruptions are handled

When a Spot Instance is interrupted by your cloud provider:

  1. Spot Handler detects the interruption signal and reports it to Cast AI
  2. The cloud provider terminates the node according to its own policies (e.g., GCP gives 30 seconds' notice before termination)
  3. Kubernetes handles pod termination using standard procedures
  4. Cast AI's Autoscaler provisions replacement capacity
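
To make the "interruption signal" in step 1 concrete, the following is a minimal sketch of how a preemption notice can be read on a GCP VM through the metadata server's instance/preempted value. It is an illustration only and does not describe how Spot Handler itself is implemented. The function name preemption_status is made up for this example, and other cloud providers expose their notices differently.

```shell
# Sketch only: run this from inside a GCP VM. Anywhere else the metadata
# host does not resolve, curl fails, and the function prints "unknown".
METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/preempted"

# Prints TRUE if the VM has been preempted, FALSE if not, "unknown" on error.
preemption_status() {
  # -f makes curl return non-zero on HTTP errors.
  # The Metadata-Flavor header is required by the GCP metadata server.
  if result=$(curl -s -f --max-time 2 -H "Metadata-Flavor: Google" "$METADATA_URL"); then
    echo "$result"
  else
    echo "unknown"
  fi
}
```

Calling preemption_status on a running Spot VM is expected to print FALSE until the provider issues the preemption notice.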

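For proactive mitigation before nodes are interrupted, enable the Autoscaler's Spot reliability and interruption prediction features, which are configured in your Node templates (see Improve Spot interruption handling).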

Install Spot Handler

Spot Handler is installed automatically during Phase 1 cluster onboarding using the Helm chart available at https://github.com/castai/helm-charts/tree/main/charts/castai-spot-handler. This ensures that Cast AI can begin collecting Spot interruption data immediately after connecting your cluster, improving Spot reliability predictions from the start.

If Spot Handler was uninstalled or you need to reinstall it manually, you can do so using the following commands.

  1. Add the Cast AI Helm charts repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
  2. Install Spot Handler:
helm upgrade --install castai-spot-handler castai-helm/castai-spot-handler -n castai-agent \
  --set castai.apiKey=<your-api-token> \
  --set castai.clusterID=<your-cluster-id> \
  --set castai.provider=<your-CSP-provider> #AWS|AZURE|GCP
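
If you script the reinstall, a small wrapper can fail fast when a value is missing instead of installing a misconfigured release. The sketch below runs the same Helm command as step 2. The environment variable names (CASTAI_API_TOKEN, CASTAI_CLUSTER_ID, CASTAI_PROVIDER) and the function names are assumptions made for this example, not names defined by Cast AI.

```shell
# Hypothetical helper: returns non-zero if the given value is empty.
# $1 = variable name (used in the message), $2 = its value
require() {
  if [ -z "$2" ]; then
    echo "missing required variable: $1" >&2
    return 1
  fi
}

# Validates the inputs, then runs the install command from step 2.
install_spot_handler() {
  require CASTAI_API_TOKEN "${CASTAI_API_TOKEN:-}" || return 1
  require CASTAI_CLUSTER_ID "${CASTAI_CLUSTER_ID:-}" || return 1
  require CASTAI_PROVIDER "${CASTAI_PROVIDER:-}" || return 1

  helm upgrade --install castai-spot-handler castai-helm/castai-spot-handler \
    -n castai-agent \
    --set castai.apiKey="$CASTAI_API_TOKEN" \
    --set castai.clusterID="$CASTAI_CLUSTER_ID" \
    --set castai.provider="$CASTAI_PROVIDER"
}
```

Export the three variables and then call install_spot_handler. Reading the token from the environment also keeps it out of the command you type.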

Upgrade Spot Handler

To upgrade Spot Handler to the latest version:

helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade castai-spot-handler castai-helm/castai-spot-handler --reset-then-reuse-values -n castai-agent
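
After the upgrade you may want to confirm that the release is deployed and that the values set at install time were carried over. The sketch below uses helm list and helm get values for that. The function name check_upgrade is made up for this example. The sed filter masks the API key so it does not end up in terminal scrollback or CI logs.

```shell
# Shows the release row, then the user-supplied values with apiKey masked.
check_upgrade() {
  # --filter takes a regular expression; anchor it to match this release only.
  helm list -n castai-agent --filter '^castai-spot-handler$' || return 1

  # helm get values prints the values that were set explicitly, as YAML.
  helm get values castai-spot-handler -n castai-agent \
    | sed 's/\(apiKey:\).*/\1 <redacted>/'
}
```

If clusterID or provider is missing from the output, rerun the install command with the --set flags from the Install Spot Handler section.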

Improve Spot interruption handling

To minimize the impact of Spot interruptions:

  1. Design for high availability: Use multiple replicas and distribute pods across failure domains
  2. Enable Spot handling mechanisms: Configure Spot reliability and interruption prediction in your Node templates
  3. Configure Pod Disruption Budgets: Set appropriate PDBs for critical workloads
  4. Use preStop hooks: Configure lifecycle hooks in your containers for graceful shutdown
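
Items 1, 3, and 4 above can be sketched in a single manifest. The function below prints a Deployment with three replicas spread across zones and a preStop hook, plus a matching Pod Disruption Budget. Everything specific in it is a placeholder chosen for this example: the name web, the nginx image, the 10-second sleep, and the 25-second grace period. Replace them with values that fit your workload.

```shell
# Prints an example manifest to stdout; it does not touch the cluster.
spot_friendly_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                      # multiple replicas (item 1)
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Placeholder value; choose one that fits your provider's notice window.
      terminationGracePeriodSeconds: 25
      topologySpreadConstraints:   # spread pods across zones (item 1)
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.27
          lifecycle:
            preStop:               # graceful shutdown hook (item 4)
              exec:
                command: ["sh", "-c", "sleep 10"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget          # item 3
metadata:
  name: web
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF
}
```

To validate the output without creating anything, pipe it through kubectl apply --dry-run=client -f -. Keep in mind that a Pod Disruption Budget only limits voluntary disruptions such as drains and evictions. It does not stop a cloud provider from reclaiming a node.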

For more information, see Spot Instances.

Troubleshooting

Check Spot Handler logs:

kubectl logs -l name=spot-handler -n castai-agent
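
When kubectl logs is used with a label selector, it prints only the last 10 lines per pod by default, which is often too little for troubleshooting. The sketch below first lists the pods and then fetches a longer, time-bounded log window. It reuses the label selector and namespace from the command above. The function name spot_handler_logs and the defaults of 1h and 200 lines are assumptions made for this example.

```shell
# Usage: spot_handler_logs [since]   e.g. spot_handler_logs 30m
spot_handler_logs() {
  since="${1:-1h}"

  # Check that the pods exist and see which nodes they run on.
  kubectl get pods -l name=spot-handler -n castai-agent -o wide || return 1

  # --prefix labels each line with its source pod and container;
  # --tail raises the 10-line default that applies when a selector is used.
  kubectl logs -l name=spot-handler -n castai-agent \
    --since="$since" --tail=200 --prefix
}
```

If the pod list is empty, Spot Handler is probably not installed in the castai-agent namespace, and the steps in the Install Spot Handler section apply.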