Supported LLM providers
The AI Enabler Proxy integrates with various Large Language Model (LLM) providers, enabling efficient request routing based on complexity and cost. This document outlines how to discover supported providers and their available models for routing and proxying.
Getting current provider information
Cast AI supports a comprehensive range of LLM providers, including both external API services and self-hosted deployment options. Since provider support and available models are continuously expanding, we recommend using our API to get the most current information.
Using the supported providers API
To get the most up-to-date list of supported providers and their available models, use the /v1/llm/openai/supported-providers API endpoint:
curl --request GET \
  --url https://api.cast.ai/v1/llm/openai/supported-providers \
  --header "X-API-Key: $CASTAI_API_KEY" \
  --header 'accept: application/json'

This endpoint returns a comprehensive list that includes:
- Provider identifiers and their complete model catalogs
- Detailed model specifications including token limits and transparent pricing
- Routing capabilities indicating which models support intelligent routing
- Supported modalities (text, image, embedding types)
- Model types (chat, embedding) and deployment options
Response structure
The API response provides detailed information about each supported provider and their models:
{
"supportedProviders": [
{
"provider": "openai",
"models": [
{
"name": "gpt-4o-2024-05-13",
"maxInputTokens": 128000,
"promptPricePerMilTokens": "5",
"completionPricePerMilTokens": "15",
"isRoutable": true,
"modalities": [
"text",
"image"
],
"type": "chat"
}
],
"pricingUrl": "https://openai.com/pricing",
"websiteUrl": "https://openai.com",
"rateLimitsPerModel": true
},
{
"provider": "hosted_vllm",
"models": [
{
"name": "llama3.1:8b",
"maxInputTokens": 128000,
"promptPricePerMilTokens": "0",
"completionPricePerMilTokens": "0",
"isRoutable": true,
"modalities": [
"text"
],
"type": "chat"
}
],
"pricingUrl": "https://docs.vllm.ai/en/latest",
"websiteUrl": "https://docs.vllm.ai/en/latest",
"rateLimitsPerModel": false
}
]
}

Response fields explained:
| Field | Description |
|---|---|
| provider | The provider identifier (e.g., "openai", "anthropic", "hosted_vllm") |
| models | An array of available models with detailed specifications |
| ├─ name | Model identifier used in API requests |
| ├─ maxInputTokens | Maximum input context length, in tokens |
| ├─ promptPricePerMilTokens | Cost per million input tokens in USD |
| ├─ completionPricePerMilTokens | Cost per million output tokens in USD |
| ├─ isRoutable | Whether the model is eligible for intelligent routing |
| ├─ modalities | Array of supported input types ("text", "image") |
| └─ type | Model category ("chat", "embedding") |
| pricingUrl | Link to the provider's official pricing information |
| websiteUrl | Provider's main website |
| rateLimitsPerModel | Whether rate limits are applied per model (boolean) |
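For a quick view of which models are eligible for intelligent routing, you can filter the response with jq. The snippet below is a minimal sketch that assumes jq is installed and CASTAI_API_KEY is set; the output shown is illustrative and will change as provider support expands:

```bash
# List every routable chat model as a "provider/model" pair.
curl -s https://api.cast.ai/v1/llm/openai/supported-providers \
  --header "X-API-Key: $CASTAI_API_KEY" \
| jq -r '.supportedProviders[]
         | .provider as $p
         | .models[]
         | select(.isRoutable and .type == "chat")
         | "\($p)/\(.name)"'
# Illustrative output:
# openai/gpt-4o-2024-05-13
# hosted_vllm/llama3.1:8b
```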
Provider types and deployment options
AI Enabler integrates with two main categories of AI providers to give you flexibility in how you access and deploy models.
External API providers like OpenAI, Google Gemini, Anthropic, and Mistral operate as traditional cloud services. You provide an API key, make requests, and pay per token used. These providers offer the latest models with minimal setup but charge based on usage volume.
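Because prices are quoted per million tokens, the approximate cost of a single request is prompt tokens / 1,000,000 × prompt price plus completion tokens / 1,000,000 × completion price. As a hypothetical worked example using the gpt-4o-2024-05-13 rates from the response above:

```bash
# Hypothetical request: 2,000 prompt tokens and 500 completion tokens on
# gpt-4o-2024-05-13 ($5 prompt / $15 completion per million tokens).
echo "scale=4; (2000 / 1000000) * 5 + (500 / 1000000) * 15" | bc
# 0.0175 (USD)
```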
Self-hosted deployments run models directly in your infrastructure using Cast AI's hosting and autoscaling capabilities. The hosted_vllm provider deploys models in your Kubernetes cluster. These models show $0 per-token pricing because you pay for the underlying compute resources rather than per-token API usage.
Routing capability varies by model. Models marked as routable work with AI Enabler's intelligent routing, which automatically selects the best model for each request based on complexity and cost. Non-routable models can only be accessed directly, but they still benefit from AI Enabler's unified endpoint and monitoring. A rough sketch of a request through that endpoint follows below.
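The proxy speaks the OpenAI-compatible chat completions format. The base URL below is a placeholder, and the exact address, headers, and routing parameters depend on your own setup (see the getting started guide):

```bash
# Sketch only: <your-proxy-endpoint> is a placeholder for your configured
# AI Enabler Proxy address; authentication details depend on your setup.
curl https://<your-proxy-endpoint>/v1/chat/completions \
  --header "X-API-Key: $CASTAI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4o-2024-05-13",
    "messages": [{"role": "user", "content": "Summarize our pricing options in one sentence."}]
  }'
```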
Modalities and capabilities differ across providers. Most models handle text conversations, while newer models also process images. Specialized embedding models can convert text into numerical representations for semantic search and similarity tasks.
Next steps
Once you've identified the providers and models you want to use:
- Register your providers using the provider registration API
- Configure your proxy settings to enable routing and other features
- Start making requests to the AI Enabler Proxy endpoint
For detailed setup instructions, see the getting started guide.
Stay updated: Provider support and available models are regularly updated. We recommend checking the supported providers API endpoint periodically.