The Spot Interruption Prediction API allows you to predict whether Spot Instances will be interrupted by the cloud provider within a defined time window. This enables proactive management of workloads running on Spot Instances.

📘
Limited Availability
This API is available upon request. Contact your Cast AI Account Manager or Customer Success team for access.

Overview

Spot Instances offer significant cost savings compared to On-Demand Instances, but cloud providers can interrupt them with minimal notice (30 seconds to 2 minutes, depending on the provider). That notice is delivered on a best-effort basis, so an interruption can arrive with no advance warning at all. These interruptions can cause application downtime and service disruptions.

Cast AI uses machine learning models trained on historical interruption data to predict Spot Instance interruptions based on near real-time cloud information. By predicting interruptions before they occur, you can:

Proactively migrate workloads to new instances before interruption signals arrive
Minimize downtime by starting replacement instances ahead of time
Reduce pod eviction delays from minutes to seconds

The API accepts Spot Instance characteristics and returns predictions with associated probability scores for each instance.

Endpoint access: Contact your Cast AI Account Manager or Customer Success team to obtain the API endpoint URL.

Supported cloud providers and prediction windows:

AWS: Predictions cover the next 1 hour
GCP: Predictions cover the next 3 hours
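
The windows above can be captured in a small lookup. The following Python sketch is illustrative only: the names `PREDICTION_WINDOWS` and `prediction_window_end` are hypothetical, and it assumes the window is measured from the moment the prediction is requested, which this document does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# Prediction windows as listed in this document.
PREDICTION_WINDOWS = {
    "AWS": timedelta(hours=1),
    "GCP": timedelta(hours=3),
}


def prediction_window_end(cloud: str, requested_at: datetime) -> datetime:
    """Return the time up to which a prediction requested at `requested_at` applies.

    Assumption: the window starts when the prediction is requested.
    """
    if cloud not in PREDICTION_WINDOWS:
        raise ValueError(f"no prediction window documented for cloud {cloud!r}")
    return requested_at + PREDICTION_WINDOWS[cloud]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("AWS prediction valid until:", prediction_window_end("AWS", now).isoformat())
```

A lookup like this can help you pick a polling interval that is comfortably shorter than the window for each provider.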

When to use this API

This API is for users who want to integrate Spot interruption predictions into their own infrastructure management solutions, independent of Cast AI's cluster management platform.

Use this API if you:

Use infrastructure tools like Karpenter that lack native Spot interruption prediction
Build custom autoscaling logic and need interruption predictions as input
Manage Spot Instances outside of Cast AI's autoscaler
Develop custom schedulers that factor in interruption risk

Use Cast AI's platform instead if you let Cast AI manage your clusters and want built-in interruption handling. Configure it through node templates by enabling the Interruption prediction model feature. Cast AI then automatically rebalances nodes when interruptions are predicted.

How it works

Traditional reactive approaches wait for an interruption signal from the cloud provider before taking action. This leaves too little time to provision replacement instances, especially on GCP and Azure, where the mean node creation time (50-160 seconds) exceeds the interruption notice period.

The Spot Interruption Prediction API enables a proactive approach:

You send Spot Instance metadata to the API at regular intervals
The API uses machine learning models to predict interruption likelihood within the prediction window (1 hour for AWS, 3 hours for GCP)
When a high-probability interruption is predicted, you can create replacement instances and migrate workloads before the actual interruption occurs
This reduces pod downtime to only the time required to start pods on the new instance
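
A minimal Python sketch of the decision in step 3 above, assuming the predictions have already been fetched and decoded into dictionaries with the `id` and `probability` fields described under Response format. The function name `nodes_to_replace` and the 0.8 default threshold are hypothetical, not values recommended by Cast AI.

```python
from typing import Dict, Iterable, List


def nodes_to_replace(predictions: Iterable[Dict], threshold: float = 0.8) -> List[str]:
    """Return the IDs of nodes whose interruption probability meets the threshold.

    Assumption: the caller decides what counts as "high probability";
    0.8 is only an illustrative default.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [p["id"] for p in predictions if p["probability"] >= threshold]


if __name__ == "__main__":
    sample = [
        {"id": "node-a", "interruption": True, "probability": 0.87},
        {"id": "node-b", "interruption": False, "probability": 0.12},
    ]
    for node_id in nodes_to_replace(sample):
        # In a real integration this is where you would provision a
        # replacement instance and drain the node.
        print("would replace", node_id)
```

A lower threshold triggers more replacements and reduces the chance of being caught by an interruption, at the cost of replacing nodes that would have survived.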

The models use features including instance type, region, Availability Zone, Spot and On-Demand pricing trends, node mortality rates (historical interruption frequency), cluster operation patterns, and node age and lifecycle events.

Authentication

All API requests require authentication using a Cast AI API token. Include your API token in the request header:

X-API-Key: <your-api-token>

To obtain an API token, see Obtaining API access key. You can generate a Full Access token in the Cast AI console.
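
As a rough illustration, the header can be attached with Python's standard `urllib.request` module. The helper name `build_prediction_request` and the URL in the usage line are placeholders; use the endpoint URL provided by Cast AI.

```python
import json
import urllib.request
from typing import Dict, List


def build_prediction_request(
    endpoint_url: str, api_token: str, nodes: List[Dict]
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request.

    The request body wraps the given nodes in a top-level "nodes" array.
    """
    body = json.dumps({"nodes": nodes}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            # HTTP header names are case-insensitive; urllib normalizes
            # the capitalization internally.
            "X-API-Key": api_token,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder URL and token, for illustration only.
    req = build_prediction_request("https://example.invalid/predict", "YOUR_API_KEY_HERE", [])
    print(req.get_method(), req.full_url)
```

To send the request, pass the returned object to `urllib.request.urlopen(req, timeout=10)` and read the response body.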

Request format

Send a POST request with a JSON body. The body must contain a `nodes` array listing the Spot Instances to predict interruptions for.

Node object parameters:

Each node in the nodes array must include:

Field	Type	Required	Description
`id`	string	Yes	Unique identifier for the node. Used to correlate requests and responses.
`cloud`	string	Yes	Cloud provider where the node runs. Possible values: `AWS`, `GCP`, `CLOUD_UNSPECIFIED`.
`region`	string	Yes	Cloud region (e.g., `us-west-2`, `eu-central-1`).
`availabilityZoneId`	string	Yes	Cloud-specific zone identifier. For AWS, use the Availability Zone ID (e.g., `use1-az2`).
`instanceType`	string	Yes	Instance type (e.g., `t3.medium`, `n1-standard-4`).
`nodeCreateTime`	string (ISO 8601)	Yes	Timestamp when the node was created in ISO 8601 format.
`rebalanceRecommendationTime`	string (ISO 8601)	No	Timestamp when a rebalance recommendation was received. AWS only. ISO 8601 format.
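
The following Python sketch shows one way to assemble a node object from the fields in the table. The helper names `iso8601_utc` and `build_node` are hypothetical, and the client-side checks (rejecting empty required fields, rejecting `rebalanceRecommendationTime` for non-AWS nodes) are assumptions about sensible validation rather than documented API behavior.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def iso8601_utc(ts: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC, e.g. 2025-10-20T10:00:00Z."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_node(
    node_id: str,
    cloud: str,
    region: str,
    availability_zone_id: str,
    instance_type: str,
    node_create_time: datetime,
    rebalance_recommendation_time: Optional[datetime] = None,
) -> Dict[str, str]:
    """Build one entry for the `nodes` array of the request body."""
    node = {
        "id": node_id,
        "cloud": cloud,
        "region": region,
        "availabilityZoneId": availability_zone_id,
        "instanceType": instance_type,
        "nodeCreateTime": iso8601_utc(node_create_time),
    }
    # Client-side check (an assumption): fail early on empty required fields.
    missing = [name for name, value in node.items() if not value]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    if rebalance_recommendation_time is not None:
        # The field is documented as AWS only.
        if cloud != "AWS":
            raise ValueError("rebalanceRecommendationTime applies to AWS nodes only")
        node["rebalanceRecommendationTime"] = iso8601_utc(rebalance_recommendation_time)
    return node


if __name__ == "__main__":
    created = datetime(2025, 10, 20, 10, 0, tzinfo=timezone.utc)
    print(build_node("i-1234567890abcdef0", "AWS", "us-east-1", "use1-az2", "t3.medium", created))
```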

To retrieve node information for your predictions, use the Cast AI API:

List nodes in your cluster using the List nodes endpoint to get node IDs
Get detailed node information using the Get node endpoint with each node ID

Response format

The API returns predictions for each node submitted in the request.

Response body parameters:

Field	Type	Description
`predictions`	array	Array of prediction objects, one for each node.

Prediction object parameters:

Field	Type	Description
`id`	string	Node identifier from the request.
`interruption`	boolean	Whether the Spot Instance is predicted to be interrupted within the prediction window (1 hour for AWS, 3 hours for GCP).
`probability`	float	Probability of interruption within the prediction window, from 0.0 to 1.0.
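
A short Python sketch of decoding a response body into typed objects, based only on the fields in the tables above. The `Prediction` dataclass and `parse_predictions` function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Prediction:
    node_id: str          # `id` from the request, used to correlate results
    interruption: bool    # predicted to be interrupted within the window
    probability: float    # 0.0 to 1.0


def parse_predictions(response_body: str) -> List[Prediction]:
    """Decode the JSON response body into a list of Prediction objects."""
    payload = json.loads(response_body)
    return [
        Prediction(
            node_id=item["id"],
            interruption=bool(item["interruption"]),
            probability=float(item["probability"]),
        )
        for item in payload["predictions"]
    ]


if __name__ == "__main__":
    body = '{"predictions": [{"id": "node-a", "interruption": true, "probability": 0.87}]}'
    for prediction in parse_predictions(body):
        print(prediction)
```

Indexing the result by `node_id` (for example, `{p.node_id: p for p in predictions}`) makes it easy to match predictions back to the nodes you submitted.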

Example API call

curl -X POST <API_ENDPOINT_URL> \
  -H "X-API-Key: YOUR_API_KEY_HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "nodes": [
      {
        "id": "i-1234567890abcdef0",
        "cloud": "AWS",
        "region": "us-east-1",
        "availabilityZoneId": "use1-az2",
        "instanceType": "t3.medium",
        "nodeCreateTime": "2025-10-20T10:00:00Z",
        "rebalanceRecommendationTime": "2025-10-20T11:30:00Z"
      }
    ]
  }'

Example response:

{
  "predictions": [
    {
      "id": "i-1234567890abcdef0",
      "interruption": true,
      "probability": 0.87
    }
  ]
}

Error handling

The API returns standard HTTP status codes:

Status Code	Description
`400`	Invalid request format or missing required fields.
`401`	Authentication failed. Verify your API token.
`500`	Internal server error. Retry the request.

Error responses include a `Status` object with a numeric code, a human-readable message, and, when available, additional error context in `details`.
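
One possible way to act on these status codes is sketched below in Python. It treats `400` and `401` as non-retryable and retries `500` with exponential backoff. The backoff policy, the attempt limit, the treatment of any 2xx status as success, and the names `PredictionRequestError` and `send_with_retry` are all assumptions; the API documentation only says to retry on `500`. The `send` argument is any callable that performs the HTTP call and returns a `(status_code, body)` tuple.

```python
import time
from typing import Callable, Tuple


class PredictionRequestError(Exception):
    """Raised when the API returns an error that will not be retried."""

    def __init__(self, status: int, body: str):
        super().__init__(f"request failed with HTTP {status}")
        self.status = status
        self.body = body


def send_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send` until it succeeds, retrying only on HTTP 500."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if 200 <= status < 300:  # assumption: any 2xx status means success
            return body
        if status == 500 and attempt < max_attempts:
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
            continue
        # 400 and 401 indicate a problem with the request or token, so
        # retrying the same request would not help.
        raise PredictionRequestError(status, body)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    replies = iter([(500, "temporary failure"), (200, '{"predictions": []}')])
    print(send_with_retry(lambda: next(replies), base_delay_s=0.1))
```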

Additional resources

Cast AI API authentication documentation
Spot Instances documentation - Learn about Cast AI's platform features for Spot Instance management