Spot interruption prediction API

The Spot Interruption Prediction API allows you to predict whether Spot Instances will be interrupted by the cloud provider within a defined time window. This enables proactive management of workloads running on Spot Instances.

Limited Availability

This API is available upon request. Contact your Cast AI Account Manager or Customer Success team for access.

Overview

Spot Instances offer significant cost savings compared to On-Demand Instances, but cloud providers can interrupt them with minimal notice (30 seconds to 2 minutes, depending on the provider). This notice is delivered on a best-effort basis, so an interruption may arrive with no advance warning at all. Interruptions can cause application downtime and service disruptions.

Cast AI uses machine learning models trained on historical interruption data to predict Spot Instance interruptions based on near real-time cloud information. By predicting interruptions before they occur, you can:

  • Proactively migrate workloads to new instances before interruption signals arrive
  • Minimize downtime by starting replacement instances ahead of time
  • Reduce pod eviction delays from minutes to seconds

The API accepts Spot Instance characteristics and returns predictions with associated probability scores for each instance.

Endpoint access: Contact your Cast AI Account Manager or Customer Success team to obtain the API endpoint URL.

Supported cloud providers and prediction windows:

  • AWS: Predictions cover the next 1 hour
  • GCP: Predictions cover the next 3 hours
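
Because the window differs per provider, client code usually needs to know how long a given prediction stays relevant. The following is a minimal sketch of that lookup. The names PREDICTION_WINDOWS and prediction_horizon are hypothetical helpers and are not part of the API; only the 1-hour and 3-hour values come from this page.

```python
from datetime import datetime, timedelta, timezone

# Prediction windows as stated on this page, keyed by the `cloud` request value.
PREDICTION_WINDOWS = {
    "AWS": timedelta(hours=1),
    "GCP": timedelta(hours=3),
}


def prediction_horizon(cloud: str, requested_at: datetime) -> datetime:
    """Return the time up to which a prediction requested at `requested_at` applies."""
    try:
        window = PREDICTION_WINDOWS[cloud]
    except KeyError:
        raise ValueError(f"no prediction window documented for cloud {cloud!r}")
    return requested_at + window


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("AWS prediction covers until", prediction_horizon("AWS", now).isoformat())
```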

When to use this API

This API is for users who want to integrate Spot interruption predictions into their own infrastructure management solutions, independent of Cast AI's cluster management platform.

Use this API if you:

  • Use infrastructure tools like Karpenter that lack native Spot interruption prediction
  • Build custom autoscaling logic and need interruption predictions as input
  • Manage Spot Instances outside of Cast AI's autoscaler
  • Develop custom schedulers that factor in interruption risk

Use Cast AI's platform instead if you want Cast AI to manage your clusters with built-in interruption handling. Configure it through node templates by enabling the Interruption prediction model feature. Cast AI then handles node rebalancing automatically when interruptions are predicted.

How it works

Traditional reactive approaches wait for interruption signals from cloud providers before taking action. This leaves insufficient time to provision replacement instances, especially on GCP and Azure where mean node creation time (50-160 seconds) exceeds the interruption notice period.

The Spot Interruption Prediction API enables a proactive approach:

  1. You send Spot Instance metadata to the API at regular intervals
  2. The API uses machine learning models to predict interruption likelihood within the prediction window (1 hour for AWS, 3 hours for GCP)
  3. When a high-probability interruption is predicted, you can create replacement instances and migrate workloads before the actual interruption occurs
  4. This reduces pod downtime to only the time required to start pods on the new instance
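
Step 3 above leaves the definition of "high-probability" to the caller. One way to sketch that decision is shown below. The function name nodes_to_replace and the 0.8 default threshold are assumptions chosen for illustration; the API does not prescribe a threshold, and the right value depends on how costly an unnecessary replacement is for your workloads.

```python
from typing import Iterable


def nodes_to_replace(predictions: Iterable[dict], threshold: float = 0.8) -> list[str]:
    """Return IDs of nodes whose predicted interruption probability meets the threshold.

    `predictions` is the list found under "predictions" in the API response.
    The result is ordered riskiest first so the most urgent nodes are handled first.
    """
    risky = [p for p in predictions if p["probability"] >= threshold]
    risky.sort(key=lambda p: p["probability"], reverse=True)
    return [p["id"] for p in risky]


if __name__ == "__main__":
    sample = [
        {"id": "node-a", "interruption": True, "probability": 0.87},
        {"id": "node-b", "interruption": False, "probability": 0.12},
    ]
    for node_id in nodes_to_replace(sample):
        # Provisioning the replacement and draining the node is up to your tooling.
        print("schedule replacement for", node_id)
```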

The models use features including instance type, region, Availability Zone, Spot and On-Demand pricing trends, node mortality rates (historical interruption frequency), cluster operation patterns, and node age and lifecycle events.

Authentication

All API requests require authentication using a Cast AI API token. Include your API token in the request header:

X-API-Key: <your-api-token>

To obtain an API token, see Obtaining API access key. You can generate a Full Access token in the Cast AI console.
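
As an illustration, the header can be attached with Python's standard library as sketched below. The helper name build_request is hypothetical, and the endpoint URL is a parameter because it is issued by Cast AI on request rather than documented here.

```python
import json
import urllib.request


def build_request(endpoint_url: str, api_token: str, body: dict) -> urllib.request.Request:
    """Build an authenticated POST request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-API-Key": api_token,  # Cast AI API token
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the result to urllib.request.urlopen(req, timeout=10) and read the JSON response body. Load the token from an environment variable or a secret store rather than hard-coding it.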

Request format

Send a POST request with a JSON body. The body requires a nodes array listing the Spot Instances to predict interruptions for.

Node object parameters:

Each node in the nodes array supports the following fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Unique identifier for the node. Used to correlate requests and responses. |
| cloud | string | Yes | Cloud provider where the node runs. Possible values: AWS, GCP, CLOUD_UNSPECIFIED. |
| region | string | Yes | Cloud region (e.g., us-west-2, eu-central-1). |
| availabilityZoneId | string | Yes | Cloud-specific zone identifier. For AWS, use the Availability Zone ID (e.g., use1-az2). |
| instanceType | string | Yes | Instance type (e.g., t3.medium, n1-standard-4). |
| nodeCreateTime | string (ISO 8601) | Yes | Timestamp when the node was created, in ISO 8601 format. |
| rebalanceRecommendationTime | string (ISO 8601) | No | Timestamp when a rebalance recommendation was received, in ISO 8601 format. AWS only. |
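
A node object with these fields could be assembled as in the sketch below. The helpers iso8601_utc and build_node are hypothetical names. Rejecting a rebalance recommendation time for non-AWS nodes is a client-side choice made here for illustration; this page only states that the field is AWS only.

```python
from datetime import datetime, timezone
from typing import Optional


def iso8601_utc(ts: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC, e.g. 2025-10-20T10:00:00Z."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_node(
    node_id: str,
    cloud: str,
    region: str,
    availability_zone_id: str,
    instance_type: str,
    node_create_time: datetime,
    rebalance_recommendation_time: Optional[datetime] = None,
) -> dict:
    """Build one entry for the request's nodes array."""
    node = {
        "id": node_id,
        "cloud": cloud,
        "region": region,
        "availabilityZoneId": availability_zone_id,
        "instanceType": instance_type,
        "nodeCreateTime": iso8601_utc(node_create_time),
    }
    # All six fields above are required, so reject empty values early.
    missing = [key for key, value in node.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if rebalance_recommendation_time is not None:
        if cloud != "AWS":
            raise ValueError("rebalanceRecommendationTime applies to AWS only")
        node["rebalanceRecommendationTime"] = iso8601_utc(rebalance_recommendation_time)
    return node
```

The request body is then {"nodes": [build_node(...), ...]}.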

To retrieve node information for your predictions, use the Cast AI API:

  1. List nodes in your cluster using the List nodes endpoint to get node IDs
  2. Get detailed node information using the Get node endpoint with each node ID

Response format

The API returns predictions for each node submitted in the request.

Response body parameters:

| Field | Type | Description |
| --- | --- | --- |
| predictions | array | Array of prediction objects, one for each node. |

Prediction object parameters:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Node identifier from the request. |
| interruption | boolean | Whether the Spot Instance is predicted to be interrupted within the prediction window (1 hour for AWS, 3 hours for GCP). |
| probability | float | Probability of interruption within the prediction window (0.0 to 1.0). The prediction window is 1 hour for AWS and 3 hours for GCP. |
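
Since id is what ties each prediction back to a submitted node, it is convenient to index the response by it. The sketch below shows one way to do that. The Prediction class and parse_predictions function are hypothetical names, and the code assumes every prediction object carries all three fields listed above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    id: str
    interruption: bool
    probability: float


def parse_predictions(response_body: str) -> dict[str, Prediction]:
    """Parse the JSON response body into predictions keyed by node ID."""
    payload = json.loads(response_body)
    by_id = {}
    for item in payload["predictions"]:
        prediction = Prediction(
            id=item["id"],
            interruption=bool(item["interruption"]),
            probability=float(item["probability"]),
        )
        by_id[prediction.id] = prediction
    return by_id
```

A caller can then look up by_id[node_id].probability for any node it submitted.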

Example API calls

curl -X POST <API_ENDPOINT_URL> \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "nodes": [
      {
        "id": "i-1234567890abcdef0",
        "cloud": "AWS",
        "region": "us-east-1",
        "availabilityZoneId": "use1-az2",
        "instanceType": "t3.medium",
        "nodeCreateTime": "2025-10-20T10:00:00Z",
        "rebalanceRecommendationTime": "2025-10-20T11:30:00Z"
      }
    ]
  }'

Example response:

{
  "predictions": [
    {
      "id": "i-1234567890abcdef0",
      "interruption": true,
      "probability": 0.87
    }
  ]
}

Error handling

The API returns standard HTTP status codes:

| Status Code | Description |
| --- | --- |
| 400 | Invalid request format or missing required fields. |
| 401 | Authentication failed. Verify your API token. |
| 500 | Internal server error. Retry the request. |

Error responses include a Status object with a numeric code, human-readable message, and additional error context in details (if available).
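
A client might turn these rules into a small helper such as the sketch below. The names parse_error and backoff_delays are hypothetical. The JSON keys code, message, and details are assumed to match the field names described above, and exponential backoff is a common retry policy chosen here, not one prescribed by the API.

```python
import json

# Only 500 is documented as worth retrying; 400 and 401 need a fixed request or token.
RETRYABLE_STATUS = {500}


def parse_error(status_code: int, body: str) -> dict:
    """Summarize an error response, tolerating a body that is not a JSON object."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return {
        "http_status": status_code,
        "retryable": status_code in RETRYABLE_STATUS,
        "code": payload.get("code"),
        "message": payload.get("message", ""),
        "details": payload.get("details", []),
    }


def backoff_delays(attempts: int, base_seconds: float = 1.0) -> list[float]:
    """Return the wait before each retry: base, 2x base, 4x base, and so on."""
    return [base_seconds * (2 ** i) for i in range(attempts)]
```

With urllib, an error status raises urllib.error.HTTPError; its code attribute and read() method supply the two arguments to parse_error.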
