Getting started
The AI Enabler Proxy routes each request to a Large Language Model (LLM) provider based on the request's complexity and the associated cost, so that it is served by the best-fitting and cheapest model. This guide explains how to configure and use the AI Enabler Proxy.
You can run the AI Enabler Proxy in your Kubernetes cluster or use the one on the CAST AI platform. In both cases, the Proxy expects the request to follow the OpenAI API contract described in the OpenAI API Reference documentation. The response will also follow the OpenAI API contract.
The only supported endpoint is /openai/v1/chat/completions, which mimics OpenAI's /v1/chat/completions endpoint.
Streaming
The API fully supports both streaming and non-streaming responses.
To enable streaming, add "stream": true to your request body. When streaming is enabled, you'll receive the response as a stream of data, following the same format as OpenAI's streaming responses.
Example request with streaming enabled:
curl https://llm.cast.ai/openai/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer $CASTAI_API_KEY' \
-X POST -d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "What kind of instance types to use in GCP for running an AI training model?"
}
],
"stream": true
}'
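Because the stream follows OpenAI's streaming format, a client has to reassemble the answer from individual events. The following is a minimal Python sketch of that step, assuming the usual OpenAI-style events: lines of the form data: {json} carrying choices[].delta.content, terminated by data: [DONE]. The function name iter_stream_text and the sample events are hypothetical and were written by hand, not captured from the Proxy.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # OpenAI-style streams end with a literal [DONE] sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events (an assumption, not real Proxy output).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Use "}}]}',
    'data: {"choices": [{"delta": {"content": "GPU instances."}}]}',
    "data: [DONE]",
    'data: {"choices": [{"delta": {"content": "ignored"}}]}',
]
print("".join(iter_stream_text(sample)))
```

In a real client the lines would come from the HTTP response body read line by line; the parsing logic stays the same.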
Supported providers
You can find the list of supported LLM providers and their supported models here. CAST AI can proxy requests to any provider and model combination from this list once they are registered.
Register LLM providers
To enable the AI Enabler Proxy to route your requests to the appropriate LLM provider, you must register the providers you want to use (e.g., OpenAI, Gemini, Groq, Azure).
To register the LLM providers, make a POST request to the relevant CAST AI API endpoint. Below is an example of OpenAI, Azure, Gemini, and VertexAI providers being registered, specifying authentication, available models, and provider-specific parameters.
# Registers two OpenAI entries, then Azure OpenAI, Google's Gemini API, and Google Cloud Vertex AI Gemini.
curl https://api.cast.ai/v1/llm/providers \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'X-API-Key: $CASTAI_API_KEY' \
-X POST -d '{
"providers": [
{
"name": "openai-gpt3.5",
"supportedProvider": "OPENAI",
"apiKey": "<openai-api-key-1>",
"models": ["gpt-3.5-turbo-0125"]
},
{
"name": "openai-gpt4+",
"supportedProvider": "OPENAI",
"apiKey": "<openai-api-key-2>",
"models": ["gpt-4o-2024-05-13", "gpt-4-0613"]
},
{
"name": "azure-provider",
"supportedProvider": "AZURE",
"url": "https://something-azure-openai.openai.azure.com",
"apiKey": "<azure-api-key>",
"apiVersion": "2024-02-01",
"models": ["gpt-3.5-turbo-0125", "gpt-3.5-turbo-0301", "gpt-4o"],
"isHosted": true
},
{
"name": "gemini-api-provider",
"supportedProvider": "GEMINI",
"apiKey": "<gemini-api-key>",
"models": ["gemini-1.5-flash", "gemini-1.5-pro"]
},
{
"name": "vertex-ai-gemini-provider",
"supportedProvider": "VERTEXAIGEMINI",
"apiKey": "<gcloud-access-token>",
"models": ["gemini-1.5-flash", "gemini-1.5-pro"],
"url": "https://us-central1-aiplatform.googleapis.com/v1/projects/some-project/locations/us-central1",
"isHosted": true
}
]
}'
- Replace $CASTAI_API_KEY with your actual CAST AI API key (inside single quotes the shell does not expand it as a variable), and each provider placeholder, such as <openai-api-key-1>, with the API key for the provider you are registering.
- Modify the supportedProvider field to match the provider you are registering.
- Specify the models you want to use for each provider in the models array.
- The isHosted field specifies whether the LLM Provider is hosted on the user side and should be picked over the non-hosted ones.
Note that you may register a single Provider multiple times. For instance, you can have an OpenAI Provider per OpenAI API Key to limit the models that can be used by each API Key.
Note
The Provider API Keys are not stored on the CAST AI side. They are securely stored in a Secret Vault and accessed only when proxying/routing requests. CAST AI stores only the last 4 characters of each used API Key for reporting purposes.
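For readers who register providers from a script, here is a minimal Python sketch that builds the same kind of payload. The endpoint URL, header, and field names are taken from the example above; the helper names provider_entry and build_register_request are hypothetical, and the network call is left commented out so the sketch runs offline. It registers OpenAI twice, once per API key, to illustrate limiting the models available to each key.

```python
import json
import urllib.request

PROVIDERS_URL = "https://api.cast.ai/v1/llm/providers"


def provider_entry(name, supported_provider, api_key, models, **extra):
    """Build one entry for the "providers" array (hypothetical helper)."""
    entry = {
        "name": name,
        "supportedProvider": supported_provider,
        "apiKey": api_key,
        "models": list(models),
    }
    # Provider-specific fields from the example, e.g. url, apiVersion, isHosted.
    entry.update(extra)
    return entry


def build_register_request(castai_api_key, providers):
    """Prepare (but do not send) the POST request that registers providers."""
    body = json.dumps({"providers": providers}).encode("utf-8")
    return urllib.request.Request(
        PROVIDERS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-API-Key": castai_api_key,
        },
    )


# One OpenAI registration per API key, each restricted to its own models.
providers = [
    provider_entry("openai-gpt3.5", "OPENAI", "<openai-api-key-1>",
                   ["gpt-3.5-turbo-0125"]),
    provider_entry("openai-gpt4+", "OPENAI", "<openai-api-key-2>",
                   ["gpt-4o-2024-05-13", "gpt-4-0613"]),
]
request = build_register_request("<castai-api-key>", providers)

# Uncomment to send the request for real:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
print(request.get_method(), request.full_url)
```

Serializing with json.dumps guarantees the body is valid JSON, which is easy to get wrong when editing a long curl payload by hand.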
Configure the Proxy
To configure the Proxy's behavior, such as enabling request routing and prompt sharing, follow these steps:
- Make a PUT request to the CAST AI API endpoint for updating proxy settings:
curl https://api.cast.ai/v1/llm/settings \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'X-API-Key: $CASTAI_API_KEY' \
-X PUT -d '{"promptSharingEnabled": true, "routingEnabled": true, "apiKey": "<cast-ai-api-key>"}'
- Set promptSharingEnabled to true for CAST AI to store the prompts and allow you to provide feedback on prompt categorization and response quality. This feedback is used to improve the Proxy's decision-making.
- Set routingEnabled to true to enable request routing to the registered providers. If set to false, requests can only be proxied to OpenAI. No other Provider is supported for proxying.
- (Optional) Set apiKey to the CAST AI API Key that these settings should apply to. If apiKey is unset, the settings will be organization-wide.
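The settings call can be sketched in Python as follows. The endpoint and the three field names come from the steps above; the helper names settings_body and build_settings_request are hypothetical. The sketch omits apiKey when no key is given, which per the description above makes the settings organization-wide, and it leaves the network call commented out.

```python
import json
import urllib.request

SETTINGS_URL = "https://api.cast.ai/v1/llm/settings"


def settings_body(prompt_sharing_enabled, routing_enabled, scoped_api_key=None):
    """Build the settings payload; omit apiKey for organization-wide settings."""
    body = {
        "promptSharingEnabled": bool(prompt_sharing_enabled),
        "routingEnabled": bool(routing_enabled),
    }
    if scoped_api_key is not None:
        body["apiKey"] = scoped_api_key
    return body


def build_settings_request(castai_api_key, body):
    """Prepare (but do not send) the PUT request that updates the settings."""
    return urllib.request.Request(
        SETTINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-API-Key": castai_api_key,
        },
    )


request = build_settings_request("<castai-api-key>", settings_body(True, True))

# Uncomment to send the request for real:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
print(request.get_method(), request.data.decode("utf-8"))
```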
Make requests to the Proxy
To start making requests to the AI Enabler Proxy running on the CAST AI platform, follow these steps:
- Generate an API Access Key from your CAST AI account.
- Include the API Access Key in the X-API-Key header or the Authorization header with the Bearer schema when making requests to the Proxy endpoint.
- Make a POST request to the Proxy endpoint with the desired payload:
curl https://llm.cast.ai/openai/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'Authorization: Bearer $CASTAI_API_KEY' \
-X POST -d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "What kind of instance types to use in GCP for running an AI training model?"
}
]
}'
The same request using the X-API-Key header:
curl https://llm.cast.ai/openai/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'X-API-Key: $CASTAI_API_KEY' \
-X POST -d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "What kind of instance types to use in GCP for running an AI training model?"
}
]
}'
Modify the request payload as needed, following the OpenAI API Reference documentation.
Note
You can specify any model that you've registered, and CAST AI will route the request to the appropriate provider.
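As an illustration of the two accepted authentication styles, here is a minimal Python sketch that prepares the same chat completion request with either header. The endpoint and header names are from the steps above; the helper names auth_headers and build_chat_request are hypothetical. The commented-out part assumes the response follows the OpenAI contract, where the answer text sits under choices[0].message.content.

```python
import json
import urllib.request

PROXY_URL = "https://llm.cast.ai/openai/v1/chat/completions"


def auth_headers(api_key, use_bearer=True):
    """Return one of the two header styles described above."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}


def build_chat_request(api_key, model, prompt, use_bearer=True):
    """Prepare (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    headers.update(auth_headers(api_key, use_bearer))
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers=headers,
    )


request = build_chat_request(
    "<castai-api-key>",
    "gpt-4",
    "What kind of instance types to use in GCP for running an AI training model?",
)

# Uncomment to send the request and print the answer:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
print(request.get_method(), request.full_url)
```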
Supported endpoints
Different tools and integrations may require different base URLs for the AI Enabler Proxy. Here's a table of known endpoint requirements:
Tool/Integration | Base URL |
---|---|
Default | https://llm.cast.ai/openai/v1/chat/completions |
LangChain | https://llm.cast.ai/openai/v1 |
MemGPT | https://llm.cast.ai/openai |
Please verify the correct endpoint for your specific use case. This list will be updated as more tool-specific requirements are discovered.
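The base URLs in the table are all prefixes of the default endpoint, which suggests that each tool appends the remaining path itself. The Python sketch below makes that relationship explicit. The URLs are copied from the table; the helper name remaining_path is hypothetical, and the idea that a tool appends exactly this suffix is an assumption to verify against the tool's own documentation.

```python
DEFAULT_ENDPOINT = "https://llm.cast.ai/openai/v1/chat/completions"

# Base URLs copied from the table above.
BASE_URLS = {
    "Default": DEFAULT_ENDPOINT,
    "LangChain": "https://llm.cast.ai/openai/v1",
    "MemGPT": "https://llm.cast.ai/openai",
}


def remaining_path(tool):
    """Return the part of the default endpoint a tool would have to append."""
    base = BASE_URLS[tool]
    if not DEFAULT_ENDPOINT.startswith(base):
        raise ValueError(f"{base} is not a prefix of the default endpoint")
    return DEFAULT_ENDPOINT[len(base):]


for tool, base in BASE_URLS.items():
    print(f"{tool}: {base} + {remaining_path(tool)!r}")
```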
Run the AI Enabler Proxy in-cluster
If you prefer to run the AI Enabler Proxy in your own Kubernetes cluster, follow these steps:
- Install the AI Enabler Proxy using Helm:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade --install castai-ai-optimizer-proxy castai-helm/castai-ai-optimizer-proxy \
-n castai-agent --create-namespace \
--set castai.apiKey=<CASTAI_API_KEY>,castai.clusterID=<CLUSTER_ID>,castai.apiURL=https://api.cast.ai
Replace <CASTAI_API_KEY> with your actual CAST AI API key and <CLUSTER_ID> with the ID of your Kubernetes cluster.
- Make requests to the in-cluster Proxy endpoint. The requests to the proxy are the same, except that you no longer need to provide any authorization header with the CAST AI API Key. If you have a pod running in the same cluster, you can access the Proxy like so:
curl http://castai-ai-optimizer-proxy.castai-agent.svc.cluster.local:443/openai/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-v -X POST -d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "How to use golang generics?"
}
]
}'
Ensure you have registered the providers and adjusted the proxy settings on the CAST AI platform as described in prior sections.
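From application code inside the cluster, the same call can be sketched in Python. The service name, namespace, port, and path are taken from the curl example above and follow the standard Kubernetes service DNS pattern of service.namespace.svc.cluster.local; the helper names are hypothetical. Note that no authorization header is set, and the network call is commented out because it only works from a pod in the cluster.

```python
import json
import urllib.request


def in_cluster_endpoint(service="castai-ai-optimizer-proxy",
                        namespace="castai-agent", port=443):
    """Build the in-cluster Proxy URL from the Kubernetes service DNS name."""
    host = f"{service}.{namespace}.svc.cluster.local"
    return f"http://{host}:{port}/openai/v1/chat/completions"


def build_in_cluster_request(model, prompt):
    """Prepare (but do not send) a chat request; no CAST AI API key is needed."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        in_cluster_endpoint(),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


request = build_in_cluster_request("gpt-4", "How to use golang generics?")

# Uncomment when running inside a pod in the same cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
print(request.full_url)
```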