Supported LLM providers

The AI Enabler Proxy integrates with various Large Language Model (LLM) providers, acting as a gateway. This document outlines how to discover supported providers and their available models.

Getting current provider information

Cast AI supports a comprehensive range of LLM providers, including both external API services and self-hosted deployment options. Since provider support and available models are continuously expanding, we recommend using our API to get the most current information.

Using the supported providers API

To get the most up-to-date list of supported providers and their available models, use the /v1/llm/openai/supported-providers API endpoint:

curl --request GET \
     --url https://api.cast.ai/v1/llm/openai/supported-providers \
     --header "X-API-Key: $CASTAI_API_KEY" \
     --header 'accept: application/json'
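The same request can be issued from code. The sketch below builds it with Python's standard library; the URL and header names are taken from the curl example above, and CASTAI_API_KEY is assumed to be set in your environment:

```python
import os
import urllib.request

# Endpoint and headers mirror the curl example above.
API_URL = "https://api.cast.ai/v1/llm/openai/supported-providers"
api_key = os.environ.get("CASTAI_API_KEY", "")

request = urllib.request.Request(
    API_URL,
    headers={
        "X-API-Key": api_key,
        "Accept": "application/json",
    },
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode()
```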

This endpoint returns a comprehensive list that includes:

  • Provider identifiers and their complete model catalogs
  • Detailed model specifications including token limits and transparent pricing
  • Supported modalities (text, image, embedding types)
  • Model types (chat, embedding) and deployment options

Response structure

The API response provides detailed information about each supported provider and their models:

{
  "supportedProviders": [
    {
      "provider": "openai",
      "models": [
        {
          "name": "gpt-4o-2024-05-13",
          "maxInputTokens": 128000,
          "promptPricePerMilTokens": "5",
          "completionPricePerMilTokens": "15",
          "modalities": [
            "text",
            "image"
          ],
          "type": "chat"
        }
      ],
      "pricingUrl": "https://openai.com/pricing",
      "websiteUrl": "https://openai.com",
      "rateLimitsPerModel": true
    },
    {
      "provider": "hosted_vllm",
      "models": [
        {
          "name": "llama3.1:8b",
          "maxInputTokens": 128000,
          "promptPricePerMilTokens": "0",
          "completionPricePerMilTokens": "0",
          "modalities": [
            "text"
          ],
          "type": "chat"
        }
      ],
      "pricingUrl": "https://docs.vllm.ai/en/latest",
      "websiteUrl": "https://docs.vllm.ai/en/latest",
      "rateLimitsPerModel": false
    }
  ]
}
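As a quick illustration, the response shown above can be parsed to enumerate each provider and its models. This is a minimal sketch using Python's standard library; the payload is an abbreviated copy of the sample response:

```python
import json

# Abbreviated copy of the sample supported-providers response.
response_body = """
{
  "supportedProviders": [
    {"provider": "openai",
     "models": [{"name": "gpt-4o-2024-05-13", "maxInputTokens": 128000,
                 "promptPricePerMilTokens": "5", "completionPricePerMilTokens": "15",
                 "modalities": ["text", "image"], "type": "chat"}]},
    {"provider": "hosted_vllm",
     "models": [{"name": "llama3.1:8b", "maxInputTokens": 128000,
                 "promptPricePerMilTokens": "0", "completionPricePerMilTokens": "0",
                 "modalities": ["text"], "type": "chat"}]}
  ]
}
"""

data = json.loads(response_body)

# Map each provider identifier to its list of model names.
models_by_provider = {
    entry["provider"]: [model["name"] for model in entry["models"]]
    for entry in data["supportedProviders"]
}
print(models_by_provider)
# {'openai': ['gpt-4o-2024-05-13'], 'hosted_vllm': ['llama3.1:8b']}
```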

Response fields explained:

Field                            Description
provider                         The provider identifier (e.g., "openai", "anthropic", "hosted_vllm")
models                           An array of available models with detailed specifications
├─ name                          Model identifier used in API requests
├─ maxInputTokens                Maximum input context length in tokens
├─ promptPricePerMilTokens       Cost per million input tokens in USD
├─ completionPricePerMilTokens   Cost per million output tokens in USD
├─ modalities                    Array of supported input types ("text", "image")
└─ type                          Model category ("chat", "embedding")
pricingUrl                       Link to the provider's official pricing information
websiteUrl                       Provider's main website
rateLimitsPerModel               Whether rate limits are applied per model (boolean)
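Since prices are quoted per million tokens, estimating the cost of a single request is a simple proportion. The helper below is a hypothetical sketch, not part of the API; the prices used are the gpt-4o-2024-05-13 figures from the sample response:

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      prompt_price_per_mil: str,
                      completion_price_per_mil: str) -> float:
    """Estimate request cost from the per-million-token prices in the response.

    The API returns prices as strings, so they are converted to float here.
    """
    return (prompt_tokens / 1_000_000 * float(prompt_price_per_mil)
            + completion_tokens / 1_000_000 * float(completion_price_per_mil))

# gpt-4o-2024-05-13 sample pricing: $5 prompt / $15 completion per million tokens.
cost = estimate_cost_usd(10_000, 2_000, "5", "15")
print(f"${cost:.2f}")  # $0.08
```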

Provider types and deployment options

AI Enabler integrates with two main categories of AI providers to give you flexibility in how you access and deploy models.

External API providers like OpenAI, Google Gemini, Anthropic, and Mistral operate as traditional cloud services. You provide an API key, make requests, and pay per token used. These providers offer the latest models with minimal setup but charge based on usage volume.

Self-hosted deployments run models directly in your infrastructure using Cast AI's hosting and autoscaling capabilities. The hosted_vllm provider deploys models in your Kubernetes cluster. These options show $0 per-token pricing because you pay for the underlying compute resources rather than per-token usage.

Modalities and capabilities differ across providers. Most models handle text conversations, while newer models also process images. Specialized embedding models can convert text into numerical representations for semantic search and similarity tasks.
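The modalities and type fields make it straightforward to narrow the catalog to models with a given capability. The sketch below filters for image-capable chat models; the model records mirror the sample response shown earlier:

```python
# Model records as they appear in the supported-providers response.
models = [
    {"name": "gpt-4o-2024-05-13", "modalities": ["text", "image"], "type": "chat"},
    {"name": "llama3.1:8b", "modalities": ["text"], "type": "chat"},
]

# Keep only chat models that also accept image input.
image_capable = [m["name"] for m in models
                 if m["type"] == "chat" and "image" in m["modalities"]]
print(image_capable)  # ['gpt-4o-2024-05-13']
```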

Next steps

Once you've identified the providers and models you want to use:

  1. Register your providers using the provider registration API
  2. Start making requests to the AI Enabler Proxy endpoint

For detailed setup instructions, see the getting started guide.

Stay updated

Provider support and available models are regularly updated. We recommend checking the supported providers API endpoint periodically.