AI Enabler settings

Learn how to configure AI Enabler settings to control LLM proxying behavior, optimize costs, and manage prompt data.

Service account token scopes

Service account tokens in Cast AI can be scoped at two levels: organization or cluster. For AI Enabler, the token scope affects which providers and models are accessible, and how usage is tracked.

Understanding token scopes

| Token scope | Description | Supported model types |
| --- | --- | --- |
| Organization-scoped | Grants access to all AI Enabler resources across your organization. This is the default scope for service accounts and is required for SaaS provider models. | SaaS providers (OpenAI, Anthropic, etc.) and hosted models |
| Cluster-scoped | Restricts access to AI Enabler resources associated with a specific cluster. Use this for stricter environment separation, such as different tokens for staging versus production. | Hosted/self-hosted models only |

SaaS provider models (OpenAI, Anthropic, Google Gemini, Mistral, and others) always require organization-scoped tokens because they operate at the organization level. Cluster-scoped tokens are only supported for hosted model deployments running in your Kubernetes clusters.
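The scope rule above can be sketched as a small helper that selects the right token for a given provider. This is illustrative only: the token values and provider names are invented, and the set of SaaS providers is taken from the examples in this section.

```python
# Hypothetical tokens created under Manage organization > Access control.
ORG_SCOPED_TOKEN = "org-token-example"          # required for SaaS provider models
CLUSTER_SCOPED_TOKEN = "cluster-token-example"  # valid only for hosted models

# SaaS providers operate at the organization level.
SAAS_PROVIDERS = {"openai", "anthropic", "gemini", "mistral"}

def pick_token(provider: str) -> str:
    """SaaS providers always need an organization-scoped token;
    hosted deployments running in your clusters may use a cluster-scoped one."""
    if provider in SAAS_PROVIDERS:
        return ORG_SCOPED_TOKEN
    return CLUSTER_SCOPED_TOKEN

assert pick_token("openai") == ORG_SCOPED_TOKEN
assert pick_token("hosted-llama") == CLUSTER_SCOPED_TOKEN
```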

📘 Note

To create a cluster-scoped service account, navigate to Manage organization > Access control > Service Accounts and select specific cluster access when configuring the service account. See Creating service accounts for detailed instructions.

Behavior with cluster-scoped tokens

When you use a cluster-scoped token with AI Enabler, the following behaviors apply:

Provider filtering

Requests made with cluster-scoped tokens can only access providers and models associated with that specific cluster. For hosted model deployments, this means:

  • You only see hosted models deployed to that cluster
  • Provider lists returned by the API are filtered to show cluster-relevant options
  • Attempts to access models from other clusters will fail
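The filtering behavior can be illustrated with a short sketch. The deployment data and cluster IDs below are invented for the example; they are not part of the Cast AI API.

```python
# Illustrative inventory of hosted model deployments across clusters.
deployments = [
    {"model": "llama-3-8b", "cluster_id": "prod-cluster"},
    {"model": "mistral-7b", "cluster_id": "staging-cluster"},
]

def visible_models(token_cluster_id: str) -> list[str]:
    """A cluster-scoped token only sees models deployed to its own cluster;
    models in other clusters are filtered out of provider listings."""
    return [d["model"] for d in deployments if d["cluster_id"] == token_cluster_id]

print(visible_models("prod-cluster"))  # ['llama-3-8b']
```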

Usage tracking

When you use a cluster-scoped token, the AI Enabler proxy attributes usage to the associated cluster, ensuring accurate per-cluster cost reporting and analytics.

Settings endpoint

The settings endpoint (/v1/llm/settings) works with both organization-scoped and cluster-scoped tokens. When using a cluster-scoped token:

  • Settings queries return the configuration relevant to that cluster
  • Settings updates apply to the cluster context
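A request to the settings endpoint might be constructed as follows. This is a sketch under stated assumptions: the base URL is a placeholder, and bearer-token authentication is assumed; consult the API reference for your actual endpoint and auth header.

```python
import urllib.request

# Placeholder base URL; substitute your actual AI Enabler endpoint.
BASE_URL = "https://example-ai-enabler.cast.ai"

def settings_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GET for the settings endpoint.
    With a cluster-scoped token, the response is scoped to that cluster."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/llm/settings",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )

req = settings_request("cluster-token-example")
print(req.full_url)  # https://example-ai-enabler.cast.ai/v1/llm/settings
```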

Fallback model behavior

When a hosted model has a SaaS provider configured as a fallback, the fallback continues to work even when using cluster-scoped tokens. This is because:

  1. The primary request uses your cluster-scoped token to access the hosted model
  2. If the hosted model is unavailable (hibernating, scaling, or erroring), the system routes to the fallback
  3. The SaaS fallback operates at the organization level, using the organization context associated with your cluster

This ensures service continuity without requiring separate organization-scoped tokens for fallback scenarios.
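The three-step flow above can be sketched as a routing function. The status values and provider names are illustrative; the actual routing happens inside the proxy and is not exposed to your application.

```python
def route_request(hosted_status: str) -> str:
    """Route to the hosted model when it is ready; otherwise fall back to the
    organization-level SaaS provider configured as its fallback."""
    if hosted_status == "ready":
        return "hosted-model"
    # Hibernating, scaling, or erroring -> organization-level SaaS fallback,
    # reached via the organization context associated with the cluster.
    return "saas-fallback"

print(route_request("ready"))        # hosted-model
print(route_request("hibernating"))  # saas-fallback
```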

📘 Note

Fallback routing to SaaS providers is transparent to your application. The cluster-scoped token handles authentication, and the system manages the fallback routing internally.

API Reference

For developers looking to override settings programmatically, here are the available request headers:

| Header | Type | Values | Description |
| --- | --- | --- | --- |
| X-Provider-Name | String | Provider name | Routes the request to a specific registered provider. The provider must belong to your Cast AI organization. |

Remember that header overrides have the highest priority and will take precedence over both API key settings and organization settings for the specific request.

Troubleshooting

Settings changes not taking effect

Check the following:

  • Check whether request headers are overriding your settings for the request
  • Verify that the API key you're using doesn't carry settings that conflict with your expectations
  • Verify that the API key you're using is scoped to the correct resource (cluster or organization)
  • Allow a few minutes for changes to propagate

Remember the priority order: Headers → API Key Settings → Organization Settings. A higher-priority setting will always override a lower-priority one.
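The priority order can be sketched as a simple lookup that walks the sources from highest to lowest priority. The setting names and values below are invented for the example.

```python
def effective_setting(name, headers, api_key_settings, org_settings):
    """Return the value from the highest-priority source that defines it:
    Headers > API key settings > Organization settings."""
    for source in (headers, api_key_settings, org_settings):
        if name in source:
            return source[name]
    return None

# Illustrative settings at each level.
headers = {"X-Provider-Name": "openai"}
api_key_settings = {"X-Provider-Name": "anthropic", "routing": "cost-optimized"}
org_settings = {"routing": "default"}

print(effective_setting("X-Provider-Name", headers, api_key_settings, org_settings))  # openai
print(effective_setting("routing", headers, api_key_settings, org_settings))          # cost-optimized
```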

For additional assistance with AI Enabler, contact Cast AI support or visit our community Slack channel.