Create and manage batch jobs

Upload input files, create batch jobs, track progress, and download results using the OpenAI-compatible Batch and Files APIs.

This guide walks through the full batch processing workflow: preparing an input file, uploading it, creating a batch job, monitoring progress, and downloading results. All operations use the OpenAI-compatible APIs exposed by the AI Enabler proxy in your cluster.

For background on how batch processing works, see Batch processing.

Before you start

Before creating a batch job, ensure the following:

  • GKE cluster onboarded with AI Enabler — batch processing currently requires a GKE cluster. AWS (EKS) and Azure (AKS) support is not yet available. The cluster must be onboarded through the Cast AI console with the storage component included. The GCS storage bucket is provisioned automatically during onboarding. See Hosted model deployment for setup instructions.
  • Hosted model deployed and running — batch processing uses an existing hosted model in your cluster. Deploy one from the Model library tab in AI Enabler > Self-hosted models if you haven't already. See Hosted model deployment for details.
  • Tools installed — you need kubectl (configured for your cluster context), gcloud CLI (authenticated), and curl or another HTTP client.

To verify that storage is configured, check for the bucket using your cluster ID. The bucket name follows the pattern castai-ai-optimizer-proxy-storage-<CLUSTER_ID_PREFIX>, where the prefix is the first 8 characters of your cluster ID:

gcloud storage buckets describe gs://castai-ai-optimizer-proxy-storage-<CLUSTER_ID_PREFIX>
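To script this check, the sketch below builds the bucket name from a cluster ID using the naming pattern above and then runs the same gcloud command. It assumes gcloud is installed and authenticated. The function names are hypothetical.

```python
import subprocess

# Naming pattern from this guide: fixed prefix + first 8 characters of the cluster ID.
BUCKET_PREFIX = "castai-ai-optimizer-proxy-storage-"


def storage_bucket_name(cluster_id: str) -> str:
    """Return the expected GCS bucket name for a cluster ID."""
    cluster_id = cluster_id.strip()
    if len(cluster_id) < 8:
        raise ValueError("cluster ID must have at least 8 characters")
    return BUCKET_PREFIX + cluster_id[:8]


def bucket_exists(cluster_id: str) -> bool:
    """Run `gcloud storage buckets describe` and report whether it succeeded."""
    uri = "gs://" + storage_bucket_name(cluster_id)
    result = subprocess.run(
        ["gcloud", "storage", "buckets", "describe", uri],
        capture_output=True,  # keep gcloud output out of the terminal
        text=True,
    )
    return result.returncode == 0
```

A False result from bucket_exists can mean either that the bucket is missing or that your gcloud credentials cannot read it, so check the command output manually if the result is unexpected.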

Access the AI Enabler proxy

The AI Enabler proxy is not externally exposed. Port-forward the service to access the APIs from your local machine:

kubectl port-forward svc/castai-ai-optimizer-proxy 8080:443 -n castai-agent

All API calls in this guide use http://localhost:8080 as the base URL. In each request, replace <YOUR_API_KEY> in the Authorization header with your Cast AI API key.

Note: Keep the port-forward session running in a separate terminal while you work through the steps below.

Step 1: Prepare the input file

Create a JSONL file where each line is a JSON object representing one inference request. Every line must include custom_id, method, url, and a body object that contains a model field.

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama3.2:1b", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Summarize the concept of GPU scheduling."}], "max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Translate the following to French: Hello, how are you?"}], "max_tokens": 1000}}

All requests in a single file must use the same endpoint — either /v1/chat/completions or /v1/embeddings.
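For more than a handful of requests, generating the file is less error-prone than writing it by hand. The sketch below produces lines in the same shape as the example above, numbering them request-1, request-2, and so on. The model ID default and the function names are illustrative assumptions; substitute a model from your Deployed models list.

```python
import json
from typing import Iterable, List, Optional

# Hypothetical values for illustration: use a model ID from your Deployed models list.
DEFAULT_MODEL = "llama3.2:1b"
CHAT_ENDPOINT = "/v1/chat/completions"


def chat_request(custom_id: str, prompt: str, model: str = DEFAULT_MODEL,
                 max_tokens: int = 1000, system: Optional[str] = None) -> dict:
    """Build one batch line for the chat completions endpoint."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": CHAT_ENDPOINT,  # every line in one file must use the same endpoint
        "body": {"model": model, "messages": messages, "max_tokens": max_tokens},
    }


def to_jsonl(requests: Iterable[dict]) -> str:
    """Serialize requests as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in requests)


def write_batch_file(path: str, prompts: List[str], model: str = DEFAULT_MODEL) -> int:
    """Write prompts as request-1, request-2, ... and return the number of lines."""
    requests = [chat_request(f"request-{i}", p, model=model)
                for i, p in enumerate(prompts, start=1)]
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(requests))
    return len(requests)
```

json.dumps escapes newline characters inside prompts, so each request stays on a single physical line as JSONL requires.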

Warning: The model value in each request must match a self-hosted model deployed in your cluster. The models list endpoint returns all models the proxy can route to, including SaaS provider models and serverless endpoints, but only self-hosted models are valid for batch processing.

The simplest way to find the correct model ID is to check AI Enabler > Self-hosted models > Deployed models in the Cast AI console. The model name shown there is the ID to use in your input file.

Alternatively, you can query the API and look for your deployed model in the response:

curl http://localhost:8080/openai/v1/models \
  --header 'Authorization: Bearer <YOUR_API_KEY>'

This returns all routable models. Match the ID against what you see in the Deployed models list in AI Enabler > Self-hosted models to confirm it's a valid batch target.
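The same lookup can be done programmatically. The sketch below assumes the proxy returns the OpenAI list shape ({"data": [{"id": ...}, ...]}) and that BASE_URL points at the port-forwarded proxy. Finding an ID in this list only shows that the proxy can route to it; you still need to confirm in the console that it is a self-hosted model.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:8080/openai/v1"  # assumes the port-forward from this guide


def fetch_models(api_key: str, base_url: str = BASE_URL) -> dict:
    """GET /models through the proxy and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def model_ids(payload: dict) -> List[str]:
    """Extract IDs, assuming the OpenAI list shape {"data": [{"id": ...}, ...]}."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]
```

For example, `"llama3.2:1b" in model_ids(fetch_models("<YOUR_API_KEY>"))` checks whether a given ID is routable.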

Input file limits:

Limit | Value
--- | ---
Maximum file size | 100 MB
Maximum lines (requests) | 50,000

If you need higher limits, contact Cast AI support.
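Checking the file locally before uploading can catch format problems early. The sketch below checks the rules stated in this step: the size and line limits, the required keys, a model field in body, a supported url, and a single endpoint per file. Two parts are assumptions rather than documented behavior. It reads "100 MB" as 100,000,000 bytes, and it flags duplicate custom_id values because results are matched on them. The function names are hypothetical.

```python
import json
import os
from typing import Iterable, List

# Limits from the table above. "100 MB" is read as 100,000,000 bytes (the stricter
# of the decimal and binary interpretations); this is an assumption.
MAX_FILE_BYTES = 100_000_000
MAX_LINES = 50_000
SUPPORTED_URLS = {"/v1/chat/completions", "/v1/embeddings"}
REQUIRED_KEYS = ("custom_id", "method", "url", "body")


def validate_batch_lines(lines: Iterable[str], size_bytes: int) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: List[str] = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    urls = set()
    seen_ids = set()
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {lineno}: empty line")
            continue
        count += 1
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            problems.append(f"line {lineno}: not a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in obj]
        if missing:
            problems.append(f"line {lineno}: missing {', '.join(missing)}")
            continue
        body, url, custom_id = obj["body"], obj["url"], obj["custom_id"]
        if not isinstance(body, dict) or "model" not in body:
            problems.append(f"line {lineno}: body has no model field")
        if not isinstance(url, str) or url not in SUPPORTED_URLS:
            problems.append(f"line {lineno}: unsupported url {url!r}")
        else:
            urls.add(url)
        # Results are matched on custom_id, so duplicates are flagged (assumed rule).
        if not isinstance(custom_id, str):
            problems.append(f"line {lineno}: custom_id is not a string")
        elif custom_id in seen_ids:
            problems.append(f"line {lineno}: duplicate custom_id {custom_id}")
        else:
            seen_ids.add(custom_id)
    if count > MAX_LINES:
        problems.append(f"{count} requests; limit is {MAX_LINES}")
    if len(urls) > 1:
        problems.append(f"mixed endpoints: {sorted(urls)}")
    return problems


def validate_batch_file(path: str) -> List[str]:
    """Validate a JSONL file on disk against the limits and line format above."""
    with open(path, encoding="utf-8") as f:
        return validate_batch_lines(f, os.path.getsize(path))
```

The validator is split this way so the line checks can run on any iterable of strings, whether that is a file object or a list in a test.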

Step 2: Upload the input file

Upload the JSONL file using the Files API. Set the purpose to batch, which applies a default expiration of 30 days.

curl http://localhost:8080/openai/v1/files \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --form 'purpose="batch"' \
  --form 'file=@"my_batch_input.jsonl"'

The response includes the id you'll need when creating the batch job:

{
  "id": "file_50718733-4ef8-4e25-9fd5-92d8475454b9",
  "object": "file",
  "bytes": 17537,
  "created_at": 1771945452,
  "filename": "my_batch_input.jsonl",
  "purpose": "batch",
  "expires_at": 1774537452
}

The file is stored in your cluster's GCS bucket at gs://castai-ai-optimizer-proxy-storage-<CLUSTER_ID_PREFIX>/openai/files/<FILE_ID>.jsonl. Public access is disabled, so use the Files API to retrieve contents or authenticate directly with GCS.

You can set a custom TTL using the expires_after parameter. See the OpenAI Files API documentation for details.
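For scripted uploads, the sketch below sends the same multipart request as the curl command in this step, using only the Python standard library. The optional TTL uses the expires_after[anchor] and expires_after[seconds] form fields from the OpenAI Files API. Whether the proxy honors them is assumed here from its OpenAI compatibility. BASE_URL and the function names are hypothetical.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

BASE_URL = "http://localhost:8080/openai/v1"  # assumes the port-forward from this guide


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str,
                     content: bytes) -> Tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data (like curl --form)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    chunks.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    chunks.append(content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def upload_batch_file(path: str, api_key: str, ttl_seconds: Optional[int] = None,
                      base_url: str = BASE_URL) -> dict:
    """POST the file to /files with purpose=batch and return the file object."""
    fields = {"purpose": "batch"}
    if ttl_seconds is not None:
        # Optional custom TTL using OpenAI-style expires_after form fields (assumption).
        fields["expires_after[anchor]"] = "created_at"
        fields["expires_after[seconds]"] = str(ttl_seconds)
    with open(path, "rb") as f:
        content = f.read()  # input files are capped at 100 MB, so one read is acceptable
    body, content_type = encode_multipart(fields, "file", os.path.basename(path), content)
    req = urllib.request.Request(
        f"{base_url}/files",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `upload_batch_file("my_batch_input.jsonl", "<YOUR_API_KEY>")["id"]` returns the file ID. In the sample response above, expires_at minus created_at is 2,592,000 seconds, which is the 30-day default.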

Step 3: Create a batch job

Create a batch job by calling the Batch API with the uploaded file ID:

curl http://localhost:8080/openai/v1/batches \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --data '{
    "input_file_id": "file_50718733-4ef8-4e25-9fd5-92d8475454b9",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
    "metadata": {
      "cast.ai/batch-name": "My content translation batch"
    }
  }'

Required parameters:

Parameter | Description
--- | ---
input_file_id | The file ID returned from the upload step.
endpoint | Either /v1/chat/completions or /v1/embeddings. Must match the url values in the input file.
completion_window | Set to 24h. This is the only supported value.

Optional parameters:

Parameter | Description
--- | ---
metadata | Key-value pairs for tracking. Use cast.ai/batch-name to assign a custom display name visible in the console.

The job starts processing immediately after creation. The response includes the batch ID and an initial status of VALIDATING:

{
  "id": "batch_46f310d4-8e1c-41a6-a9dc-15f66a7be0d2",
  "object": "BATCH",
  "endpoint": "/v1/chat/completions",
  "input_file_id": "file_50718733-4ef8-4e25-9fd5-92d8475454b9",
  "completion_window": "24h",
  "status": "VALIDATING",
  "output_file_id": "",
  "error_file_id": "",
  "created_at": 1771948236,
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "metadata": {
    "cast.ai/batch-name": "My content translation batch"
  }
}

Warning: If the job does not complete within 24 hours, processing terminates and the job status becomes expired. Partial results remain available in the output file.
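The create call from this step can also be issued from Python. The sketch below builds the same JSON body as the curl example, enforcing the two supported endpoints and the fixed 24h window from the parameter tables. BASE_URL and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080/openai/v1"  # assumes the port-forward from this guide
SUPPORTED_ENDPOINTS = {"/v1/chat/completions", "/v1/embeddings"}


def batch_payload(input_file_id: str, endpoint: str, name: Optional[str] = None) -> dict:
    """Build the create-batch body described in the parameter tables above."""
    if endpoint not in SUPPORTED_ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint}")
    payload = {
        "input_file_id": input_file_id,
        "endpoint": endpoint,  # must match the url values inside the input file
        "completion_window": "24h",  # the only supported value
    }
    if name:
        payload["metadata"] = {"cast.ai/batch-name": name}  # display name in the console
    return payload


def create_batch(api_key: str, input_file_id: str, endpoint: str,
                 name: Optional[str] = None, base_url: str = BASE_URL) -> dict:
    """POST /batches and return the batch object from the response."""
    data = json.dumps(batch_payload(input_file_id, endpoint, name)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/batches",
        data=data,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping payload construction separate from the HTTP call lets you validate the request body without contacting the proxy.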

Step 4: Track job progress

Via API

Poll the batch endpoint to monitor progress. The request_counts field updates periodically during execution:

curl http://localhost:8080/openai/v1/batches/<BATCH_ID> \
  --header 'Authorization: Bearer <YOUR_API_KEY>'

Key fields in the response:

Field | Description
--- | ---
status | Current job status. See Batch job lifecycle for all values.
request_counts | Object with total, completed, and failed counts.
output_file_id | File ID for the results. Available once processing begins.
error_file_id | File ID of the error file, which contains only the failed request entries.
usage | Token counts: input_tokens, output_tokens, total_tokens.

For the full response schema, see the Batch API reference.
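A simple polling loop can watch a job until it finishes. The sketch below assumes the terminal statuses follow OpenAI's batch lifecycle (completed, failed, expired, cancelled); confirm the values in Batch job lifecycle. Statuses are compared case-insensitively because the sample create response reports VALIDATING while the prose uses lowercase values. BASE_URL and the function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8080/openai/v1"  # assumes the port-forward from this guide
# Assumed terminal states, borrowed from OpenAI's batch lifecycle.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def get_batch(api_key: str, batch_id: str, base_url: str = BASE_URL) -> dict:
    """GET /batches/<id> and return the batch object."""
    req = urllib.request.Request(
        f"{base_url}/batches/{batch_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_terminal(status: str) -> bool:
    return status.lower() in TERMINAL_STATUSES


def progress(request_counts: dict) -> float:
    """Fraction of requests finished (completed or failed); 0.0 while total is unknown."""
    total = request_counts.get("total", 0)
    if not total:
        return 0.0
    return (request_counts.get("completed", 0) + request_counts.get("failed", 0)) / total


def wait_for_batch(fetch: Callable[[], dict], interval_s: float = 60.0) -> dict:
    """Call fetch() until the batch reaches a terminal status, then return it."""
    while True:
        batch = fetch()
        status = batch.get("status", "")
        counts = batch.get("request_counts", {})
        print(f"{status}: {progress(counts):.0%} done")
        if is_terminal(status):
            return batch
        time.sleep(interval_s)
```

For example, `wait_for_batch(lambda: get_batch("<YOUR_API_KEY>", "<BATCH_ID>"))` polls once a minute. Passing the fetch step in as a function keeps the loop independent of the HTTP client. Because jobs expire after 24 hours, the loop should eventually see a terminal status.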

Via console

Navigate to AI Enabler > Batch jobs in the Cast AI console. The list view shows all batch jobs with their status, progress, duration, and cost.

Click a job to open the detail drawer, which includes:

  • Status and timestamps (created, started)
  • GPU type and instance type used for processing
  • Progress breakdown with completed requests, errors, and remaining requests
  • Duration and estimated time remaining
  • Links to input, output, and error files (requires GCS bucket access permissions)
  • Cost (total and per 1M tokens)
  • Token statistics (input, output, total, average per request)

Via kubectl

You can also locate the batch job processor running in your cluster:

kubectl get jobs -n castai-agent | grep castai-ai-optimizer-batch-processor

The processor is a short-lived Kubernetes job that restarts automatically if interrupted and terminates upon completion.

Step 5: Download the output file

Once the job reaches completed status (or while it's still in_progress for partial results), download the output file using the file ID from the batch response:

curl http://localhost:8080/openai/v1/files/<OUTPUT_FILE_ID>/content \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --output results.jsonl

Each line in the output file is a JSON object corresponding to one input request:

{"id": "batch_req_123", "custom_id": "request-1", "response": {"status_code": 200, "request_id": "req_123", "body": {"id": "chatcmpl-abc", "object": "chat.completion", "created": 1711652795, "model": "llama3.2:1b", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello."}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 22, "completion_tokens": 2, "total_tokens": 24}}}, "error": null}

Lines are not guaranteed to appear in input order. Use the custom_id field to match results to requests.
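Because output order is not guaranteed, it helps to index results by custom_id after downloading. The sketch below streams the file to disk and then builds that index. It assumes the record layout shown in the sample line above. The chat_text helper handles chat completion records only, since embeddings responses have a different body. BASE_URL and the function names are hypothetical.

```python
import json
import shutil
import urllib.request
from typing import Dict, Iterable, Optional

BASE_URL = "http://localhost:8080/openai/v1"  # assumes the port-forward from this guide


def download_file(api_key: str, file_id: str, dest: str, base_url: str = BASE_URL) -> None:
    """Stream /files/<id>/content to a local file (same call as the curl above)."""
    req = urllib.request.Request(
        f"{base_url}/files/{file_id}/content",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def index_by_custom_id(lines: Iterable[str]) -> Dict[str, dict]:
    """Map custom_id -> output record, since output order is not guaranteed."""
    results = {}
    for line in lines:
        if line.strip():
            record = json.loads(line)
            results[record["custom_id"]] = record
    return results


def chat_text(record: dict) -> Optional[str]:
    """Assistant text for a successful chat completion record, or None otherwise."""
    response = record.get("response")
    if record.get("error") or not response or response.get("status_code") != 200:
        return None
    return response["body"]["choices"][0]["message"]["content"]
```

After `download_file("<YOUR_API_KEY>", "<OUTPUT_FILE_ID>", "results.jsonl")`, open results.jsonl and pass the file object to index_by_custom_id. Then look up each request by the custom_id you assigned in the input file.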

You can also download output files from the console. Open the batch job detail drawer and click the output file link. Error files are available the same way through the error file link. Both links point to objects in your GCS bucket, so you need the appropriate bucket access permissions.

If the batch had any request-level errors, the error file contains only the failed request entries. It follows the same JSONL structure, with the error field populated and response set to null.
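One way to resubmit only the failed requests is to select them from the original input file using the custom_id values in the error file. This is a client-side approach, not a documented retry feature. The sketch relies only on the error file structure described above. The function names are hypothetical.

```python
import json
from typing import Iterable, List, Set


def failed_custom_ids(error_lines: Iterable[str]) -> Set[str]:
    """Collect custom_id values from an error file (one JSON object per line)."""
    return {json.loads(line)["custom_id"] for line in error_lines if line.strip()}


def retry_requests(input_lines: Iterable[str], error_lines: Iterable[str]) -> List[str]:
    """Return the original input lines whose custom_id appears in the error file."""
    failed = failed_custom_ids(error_lines)
    retry = []
    for line in input_lines:
        if line.strip() and json.loads(line)["custom_id"] in failed:
            retry.append(line.rstrip("\n"))
    return retry
```

Write the returned lines to a new JSONL file, one per line, then upload it and create a new batch job from it as described in Steps 2 and 3.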

Cancel a batch job

You can cancel a running batch job at any time. Cancelling stops all processing, including in-flight requests, and tears down the job deployment:

curl -X POST http://localhost:8080/openai/v1/batches/<BATCH_ID>/cancel \
  --header 'Authorization: Bearer <YOUR_API_KEY>'

You can also cancel a batch job from the console. Open the batch job detail drawer, click the overflow menu (three dots), and select Cancel.

Cancellation can take up to 10 minutes to complete. During this time, the job status is cancelling; once cancellation finishes, the status becomes cancelled.

Partial output files remain available after cancellation. You cannot restart a cancelled job — create a new one instead.
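To cancel from a script and wait for the job to stop, the sketch below sends the same POST as the curl command. It then polls until the status leaves cancelling, giving up after the documented 10 minutes. It also treats completed, failed, and expired as final, on the assumption that a job can finish before the cancel takes effect. Those status names come from OpenAI's lifecycle. BASE_URL and the function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8080/openai/v1"  # assumes the port-forward from this guide
CANCEL_TIMEOUT_S = 10 * 60  # cancellation can take up to 10 minutes
# Assumed terminal statuses (OpenAI lifecycle names); a job may finish before the cancel lands.
TERMINAL = {"cancelled", "completed", "failed", "expired"}


def cancel_batch(api_key: str, batch_id: str, base_url: str = BASE_URL) -> dict:
    """POST /batches/<id>/cancel with an empty body, like the curl call above."""
    req = urllib.request.Request(
        f"{base_url}/batches/{batch_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_until_stopped(fetch: Callable[[], dict], timeout_s: float = CANCEL_TIMEOUT_S,
                       interval_s: float = 15.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the batch leaves cancelling; raise TimeoutError after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        batch = fetch()
        status = batch.get("status", "")
        if status.lower() in TERMINAL:
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch still {status} after {timeout_s} s")
        sleep(interval_s)
```

The clock and sleep parameters exist only so the wait logic can be exercised without real delays. In normal use, keep the defaults and pass a fetch function that retrieves the batch by ID.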

Delete files

Remove input or output files you no longer need:

curl -X DELETE http://localhost:8080/openai/v1/files/<FILE_ID> \
  --header 'Authorization: Bearer <YOUR_API_KEY>'

Files uploaded with purpose: "batch" expire automatically after 30 days unless a custom TTL was specified.
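If you clean up files as part of a script, the sketch below issues the same DELETE request as the curl command. It assumes the proxy returns a JSON body, as the OpenAI Files API does. BASE_URL and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/openai/v1"  # assumes the port-forward from this guide


def delete_request(api_key: str, file_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build DELETE /files/<id>, matching the curl call above."""
    return urllib.request.Request(
        f"{base_url}/files/{file_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def delete_file(api_key: str, file_id: str, base_url: str = BASE_URL) -> dict:
    """Send the DELETE; assumes a JSON response body, as OpenAI's Files API returns."""
    with urllib.request.urlopen(delete_request(api_key, file_id, base_url)) as resp:
        return json.load(resp)
```

For example, once you have downloaded the results, you can call delete_file for both the input file ID and the output file ID instead of waiting for the 30-day expiry.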

See also