Supported LLM providers

The AI Enabler Proxy integrates with various Large Language Model (LLM) providers, enabling efficient request routing based on complexity and cost. This document outlines how to discover supported providers and their available models for routing and proxying.

Getting current provider information

Cast AI supports a comprehensive range of LLM providers, including both external API services and self-hosted deployment options. Since provider support and available models are continuously expanding, we recommend using our API to get the most current information.

Using the supported providers API

To get the most up-to-date list of supported providers and their available models, use the /v1/llm/openai/supported-providers API endpoint:

curl --request GET \
     --url https://api.cast.ai/v1/llm/openai/supported-providers \
     --header "X-API-Key: $CASTAI_API_KEY" \
     --header 'accept: application/json'
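
The command reads your Cast AI API key from the CASTAI_API_KEY environment variable, so export it before running the request (the value below is a placeholder, not a real key):

export CASTAI_API_KEY="<your-cast-ai-api-key>"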

This endpoint returns a comprehensive list that includes:

  • Provider identifiers and their complete model catalogs
  • Detailed model specifications, including token limits and per-token pricing
  • Routing capabilities indicating which models support intelligent routing
  • Supported modalities (text, image, embedding types)
  • Model types (chat, embedding) and deployment options

Response structure

The API response provides detailed information about each supported provider and their models:

{
  "supportedProviders": [
    {
      "provider": "openai",
      "models": [
        {
          "name": "gpt-4o-2024-05-13",
          "maxInputTokens": 128000,
          "promptPricePerMilTokens": "5",
          "completionPricePerMilTokens": "15",
          "isRoutable": true,
          "modalities": [
            "text",
            "image"
          ],
          "type": "chat"
        }
      ],
      "pricingUrl": "https://openai.com/pricing",
      "websiteUrl": "https://openai.com",
      "rateLimitsPerModel": true
    },
    {
      "provider": "hosted_vllm",
      "models": [
        {
          "name": "llama3.1:8b",
          "maxInputTokens": 128000,
          "promptPricePerMilTokens": "0",
          "completionPricePerMilTokens": "0",
          "isRoutable": true,
          "modalities": [
            "text"
          ],
          "type": "chat"
        }
      ],
      "pricingUrl": "https://docs.vllm.ai/en/latest",
      "websiteUrl": "https://docs.vllm.ai/en/latest",
      "rateLimitsPerModel": false
    }
  ]
}

Response fields explained:

  • provider: The provider identifier (e.g., "openai", "anthropic", "hosted_vllm")
  • models: An array of available models with detailed specifications
      ├─ name: Model identifier used in API requests
      ├─ maxInputTokens: Maximum input context length
      ├─ promptPricePerMilTokens: Cost per million input tokens in USD
      ├─ completionPricePerMilTokens: Cost per million output tokens in USD
      ├─ isRoutable: Whether the model can be used for routing
      ├─ modalities: Array of supported input types ("text", "image")
      └─ type: Model category ("chat", "embedding")
  • pricingUrl: Link to the provider's official pricing information
  • websiteUrl: Provider's main website
  • rateLimitsPerModel: Whether rate limits are applied per model (boolean)
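
To get a quick list of the models that are eligible for intelligent routing, you can pipe the response through jq. The filter below is a sketch based on the example response above; the field names (supportedProviders, provider, models, isRoutable, name, type) are taken from that response:

curl --silent --request GET \
     --url https://api.cast.ai/v1/llm/openai/supported-providers \
     --header "X-API-Key: $CASTAI_API_KEY" \
     --header 'accept: application/json' \
  | jq -r '.supportedProviders[] | .provider as $p | .models[] | select(.isRoutable) | "\($p)  \(.name)  \(.type)"'

For the example response above, this prints one line per routable model, such as "openai  gpt-4o-2024-05-13  chat".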

Provider types and deployment options

AI Enabler integrates with two main categories of AI providers to give you flexibility in how you access and deploy models.

External API providers like OpenAI, Google Gemini, Anthropic, and Mistral operate as traditional cloud services. You provide an API key, make requests, and pay per token used. These providers offer the latest models with minimal setup but charge based on usage volume.

Self-hosted deployments run models directly in your infrastructure using Cast AI's hosting and autoscaling capabilities. The hosted_vllm provider deploys models in your Kubernetes cluster. These options show $0 per-token pricing because you pay for the underlying compute resources rather than per-token API usage.

Routing and intelligence varies by model. Models marked as routable work with AI Enabler's intelligent routing, which automatically selects the best model for each request based on complexity and cost. Non-routable models can only be accessed directly, but still benefit from AI Enabler's unified endpoint and monitoring.

Modalities and capabilities differ across providers. Most models handle text conversations, while newer models also process images. Specialized embedding models can convert text into numerical representations for semantic search and similarity tasks.
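
If you want to check these capabilities programmatically, the same jq approach works on the modalities and type fields shown in the response above. Assuming the response has been saved to a local file named providers.json (a hypothetical filename used here for illustration):

# Models that accept image input
jq -r '.supportedProviders[].models[] | select(.modalities | index("image")) | .name' providers.json

# Embedding models
jq -r '.supportedProviders[].models[] | select(.type == "embedding") | .name' providers.json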

Next steps

Once you've identified the providers and models you want to use:

  1. Register your providers using the provider registration API
  2. Configure your proxy settings to enable routing and other features
  3. Start making requests to the AI Enabler Proxy endpoint

For detailed setup instructions, see the getting started guide.


Stay updated

Provider support and available models are regularly updated. We recommend checking the supported providers API endpoint periodically.