Hosted components
Cast AI components hosted on customer clusters.
The Cast AI cluster connection process installs several components into a customer's cluster in phases, providing different levels of functionality:
- Phase 1: Provides visibility into connected clusters without the ability to tune them. This phase operates in a read-only mode.
- Phase 2: Enables full functionality of the Cast AI platform, primarily for cluster optimization. In this phase, Cast AI can instruct clusters and cloud providers to reorganize resources for optimal performance.
Phase 1 components
Phase 1 provides visibility into connected clusters, but does not allow for modification. This phase operates in read-only mode and installs the following components:
```
$ kubectl get pods -n castai-agent
NAME                                READY   STATUS    RESTARTS   AGE
castai-agent-7f9d7ff65b-8qm7p       1/1     Running   0          78m
castai-agent-cpvpa-56f749fb-n2wzp   1/1     Running   0          22d
castai-spot-handler-44shj           1/1     Running   0          43m
```
- The Cast AI Kubernetes Agent sends cluster state data (snapshots) to the Cast AI SaaS platform.
- The Cluster Proportional Vertical Autoscaler adjusts the resources allocated to `castai-agent` pods based on a predefined formula.
- The Spot Handler monitors Spot Instance interruption events from cloud providers and reports them to Cast AI. This data improves Cast AI's Spot reliability and interruption prediction models. Spot Handler does not take any action on nodes or workloads.
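If you want a quick inventory of these components, pod names are a practical handle. The shell sketch below maps each pod from `kubectl get pods -n castai-agent` to a component using the name prefixes visible in the sample output. Those prefixes are an assumption based on this listing, not a documented naming contract, and `component_for_pod` and `summarize_pods` are hypothetical helper names.

```shell
#!/bin/sh
# Minimal sketch: guess which Phase 1 Cast AI component a pod belongs to.
# Assumption: pod names follow the prefixes seen in the sample output;
# this is a heuristic, not an official Cast AI interface.

component_for_pod() {
  case "$1" in
    # The more specific CPVPA prefix must come before the generic agent prefix.
    castai-agent-cpvpa-*)  echo "Cluster Proportional Vertical Autoscaler" ;;
    castai-agent-*)        echo "Kubernetes Agent" ;;
    castai-spot-handler-*) echo "Spot Handler" ;;
    *)                     echo "unknown" ;;
  esac
}

# Read `kubectl get pods` output on stdin, skip the header row,
# and print "<pod name>: <component>" for every pod.
summarize_pods() {
  tail -n +2 | while read -r pod_name _rest; do
    printf '%s: %s\n' "$pod_name" "$(component_for_pod "$pod_name")"
  done
}
```

Usage: `kubectl get pods -n castai-agent | summarize_pods`. The CPVPA pattern is listed first because `case` uses the first pattern that matches, and `castai-agent-cpvpa-*` pods would otherwise be reported as the agent.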
Phase 2 autoscaling components
When a connected cluster is promoted to Phase 2 by enabling automation, Cast AI installs additional components to support automated cluster management and feature delivery:
```
$ kubectl get pods -n castai-agent
NAME                                         READY   STATUS    RESTARTS   AGE
castai-agent-7f9d7ff65b-8qm7p                1/1     Running   0          80m
castai-agent-7f9d7ff65b-kf2zp                1/1     Running   0          5h7m
castai-agent-cpvpa-56f749fb-n2wzp            1/1     Running   0          22d
castai-cluster-controller-757997ff6c-r6x25   1/1     Running   0          27d
castai-cluster-controller-757997ff6c-xw54g   1/1     Running   0          27d
castai-evictor-5684748495-kl2q4              1/1     Running   0          22d
castai-kvisor-787c5dd946-gmzs5               1/1     Running   0          6d18h
castai-spot-handler-44shj                    1/1     Running   0          43m
castai-live-controller-6c89d5f7d9-xyz12      1/1     Running   0          2h15m
castai-pod-mutator-7b4f6d9c5a-abc23          1/1     Running   0          4h20m
castai-pod-pinner-56d9f8c7b2-def45           1/1     Running   0          3h15m
```
- The Cluster Controller executes actions received from the central platform, such as accepting newly created nodes into the cluster and managing Container Live Migration operations.
- The Evictor removes pods from underutilized nodes to reduce the overall number of cluster nodes. When Container Live Migration is enabled, Evictor automatically attempts to live-migrate eligible workloads before falling back to traditional eviction.
- The Live Controller (AWS EKS only) manages Container Live Migration operations, including workload eligibility assessment, migration orchestration, and specialized VPC CNI management. This component is installed automatically during Phase 2 onboarding.
- The Pod Mutator modifies pod specifications for improved efficiency, implementing optimizations like GPU driver injection and resource adjustments.
- The Pod Pinner controls pod placement for optimal resource usage, ensuring workloads are placed on appropriate nodes.
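One way to tell whether automation has been enabled is to look for the Phase 2 pods. The sketch below reads `kubectl get pods -n castai-agent` output and prints each of the Cluster Controller, Evictor, Pod Mutator, and Pod Pinner that has no matching pod. It assumes these pods run in the `castai-agent` namespace with the name prefixes shown in the sample output; Live Controller is left out because it is installed only on AWS EKS. `missing_phase2_components` is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch: list Phase 2 autoscaling components with no matching pod.
# Assumption: pod-name prefixes match the sample `kubectl get pods` output.
# Live Controller is omitted because it is EKS-only.

missing_phase2_components() {
  # Keep only the NAME column, dropping the header row.
  pod_names=$(tail -n +2 | awk '{ print $1 }')
  for prefix in castai-cluster-controller castai-evictor castai-pod-mutator castai-pod-pinner; do
    # Report the prefix if no pod name starts with "<prefix>-".
    if ! printf '%s\n' "$pod_names" | grep -q "^${prefix}-"; then
      echo "$prefix"
    fi
  done
}
```

Usage: `kubectl get pods -n castai-agent | missing_phase2_components`. Empty output suggests the core Phase 2 autoscaling components are present; it does not confirm that they are healthy.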
Phase 2 workload autoscaling components
- The Workload Autoscaler dynamically adjusts workload resource requests based on actual usage patterns.
- The Workload Autoscaler Exporter collects workload metrics from your cluster to support recommendation generation. It is installed automatically alongside the Workload Autoscaler.
Phase 2 security components
- Kvisor enables image vulnerability scanning, Kubernetes YAML manifest linting, and other security and networking features offered by Cast AI. You will find more information in the Kvisor documentation.
- The Audit Logs Receiver captures cluster events for analysis and compliance reporting.
Additional components
AI Enabler
- The AI Enabler Proxy routes LLM requests to the most appropriate provider based on cost and performance. See AI Enabler.
Database optimization
- The DB Optimizer monitors database performance and provides cost optimization recommendations. See Database Optimizer.
Reporting
- The GPU Metrics Exporter captures GPU usage metrics for specialized compute workloads.
- The Egressd Exporter (deprecated) collects network traffic information for visibility and optimization. It has been replaced by Kvisor, which provides all of Egressd's capabilities and more.
OMNI
When OMNI is enabled to extend a cluster to other regions and cloud providers, additional components are deployed in the `castai-omni` namespace:
- OMNI Agent: Manages edge location connections and node provisioning.
- Liqo components: Enable multi-cluster topology and virtual node functionality.
  - liqo-controller-manager
  - liqo-crd-replicator
  - liqo-fabric
  - liqo-ipam
  - liqo-metric-agent
  - liqo-proxy
  - liqo-webhook
See OMNI Overview for more details about extending your cluster to other regions and cloud providers.
Component upgrade methods
Cast AI components installed in your cluster are upgraded using different methods. Knowing which components are upgraded automatically and which require manual intervention helps you keep your cluster running optimally.
The table below outlines the upgrade method for each Cast AI component:
| Product | Component | Upgrade Method | Frequency | Description |
|---|---|---|---|---|
| Cluster Autoscaling | Agent | Manual* | N/A | Must be manually upgraded by running the upgrade script or the Helm command. *See "Automatic upgrades" below. |
| | Evictor | Auto* | Upon new release | Automatically upgraded by Cast AI as soon as new versions are available. *See "Automatic upgrades" below. |
| | Spot-handler | Manual* | N/A | Must be manually upgraded using the Helm command. *See "Automatic upgrades" below. |
| | Cluster Controller | Manual* | Manual process | Cluster Controller updates are handled through a manual process by Cast AI. *See "Automatic upgrades" below. |
| | Pod Pinner | Auto | Upon new release | Automatically upgraded by Cast AI as soon as new versions are available. |
| | Pod Mutator | Manual | N/A | Must be manually upgraded using the Helm command. |
| | Live Controller | Manual | N/A | Must be manually upgraded using the Helm command. |
| Workload Autoscaling | Workload Autoscaler | Manual | N/A | Must be manually upgraded using the Helm command. |
| | Workload Autoscaler Exporter | Manual | N/A | Upgraded together with the Workload Autoscaler using the Helm command. |
| Security | kvisor | Manual | N/A | Must be manually upgraded using the Helm command. |
| | audit-logs-receiver | Manual | N/A | Must be manually upgraded using the Helm command. |
| Reporting | gpu-metrics-exporter | Manual | N/A | Must be manually upgraded using the Helm command. |
| | Egressd exporter | Manual | N/A | Must be manually upgraded using the Helm command. |
| AI Enabler | ai-optimizer-proxy | Manual | N/A | Must be manually upgraded using the Helm command. |
| Database Optimization | db-optimizer | Manual | N/A | Must be manually upgraded using the Helm command. |
| OMNI | omni-agent | Manual | N/A | Must be manually upgraded using the Helm command. |
| | Liqo components | Auto | N/A | As OMNI dependencies, Liqo components are updated automatically when omni-agent is updated. |
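If you script your upgrade routine, the table can be encoded as a simple lookup. The sketch below transcribes only the Upgrade Method column; the lowercase component keys are hypothetical identifiers chosen for this example, and the asterisk qualifications explained under "Automatic upgrades" are not modeled.

```shell
#!/bin/sh
# Minimal sketch: the "Upgrade Method" column of the table as a lookup.
# The component keys are hypothetical identifiers chosen for this example.

upgrade_method() {
  case "$1" in
    evictor|pod-pinner|liqo) echo "Auto" ;;
    agent|spot-handler|cluster-controller|pod-mutator|live-controller) echo "Manual" ;;
    workload-autoscaler|workload-autoscaler-exporter) echo "Manual" ;;
    kvisor|audit-logs-receiver|gpu-metrics-exporter|egressd-exporter) echo "Manual" ;;
    ai-optimizer-proxy|db-optimizer|omni-agent) echo "Manual" ;;
    *) echo "unknown" ;;
  esac
}

# Print only those arguments (e.g. your installed components)
# that need a manual upgrade.
manual_upgrades() {
  for component in "$@"; do
    if [ "$(upgrade_method "$component")" = "Manual" ]; then
      echo "$component"
    fi
  done
}
```

For example, `manual_upgrades agent evictor kvisor` prints `agent` and `kvisor`, the two components in that list you would upgrade yourself.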
Automatic upgrades
Components marked as "Auto" are automatically upgraded by Cast AI to ensure you always have the latest features and security updates. These upgrades typically occur shortly after a new version is released. Cluster administrators do not need to take any action for these components.
Although the cluster-controller can theoretically update itself by receiving an update action from Cast AI, these updates are managed through a manual internal process. Without additional permissions, it cannot update other components, such as castai-evictor, castai-spot-handler, or castai-agent. To let cluster-controller manage all other Cast AI components automatically, explicitly bind a role, such as cluster-admin, to the castai-cluster-controller service account. For more details, see the Cluster controller auto-update documentation.
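A minimal sketch of such a binding, assuming the service account is named `castai-cluster-controller` and lives in the `castai-agent` namespace (where the Cast AI pods run in the listings above). The binding name `castai-cluster-controller-admin` is hypothetical. The function only prints a standard Kubernetes `ClusterRoleBinding` manifest; because `cluster-admin` grants full access to the cluster, check the Cluster controller auto-update documentation before applying anything like it.

```shell
#!/bin/sh
# Minimal sketch: print a ClusterRoleBinding that grants cluster-admin
# to the Cluster Controller service account.
# Assumptions: service account "castai-cluster-controller" in namespace
# "castai-agent" (override with $1); the binding name is hypothetical.

print_cluster_controller_binding() {
  namespace="${1:-castai-agent}"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: castai-cluster-controller-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: castai-cluster-controller
    namespace: ${namespace}
EOF
}
```

After reviewing the output, you could apply it with `print_cluster_controller_binding | kubectl apply -f -`. Under the same assumptions, `kubectl create clusterrolebinding castai-cluster-controller-admin --clusterrole=cluster-admin --serviceaccount=castai-agent:castai-cluster-controller` creates an equivalent binding imperatively.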
Self-managed component options
For customers who prefer to manage their own update schedules, we provide self-managed installation options for several components:
- Evictor: Documentation for self-managed installation is available at Manually Install Evictor
- Pod Pinner: Documentation for self-managed installation is available at Self-Managed Pod Pinner
Self-managed components can be updated using tools like Argo CD or Helm on your preferred schedule, giving you greater control over your infrastructure.
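For example, a scheduled job (cron or a CI pipeline) could run a script like the sketch below. It assumes the Cast AI Helm repository was added under the alias `castai-helm`, that the charts and releases are named `castai-evictor` and `castai-pod-pinner`, and that both are installed in the `castai-agent` namespace; verify these names against the self-managed installation pages. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Minimal sketch: upgrade self-managed Cast AI charts with Helm.
# Assumptions: repo alias "castai-helm", the chart/release names below,
# and the "castai-agent" namespace. Set DRY_RUN=1 to print commands only.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_self_managed() {
  run helm repo update
  for chart in castai-evictor castai-pod-pinner; do
    # --reuse-values reuses the last release's values instead of resetting them.
    run helm upgrade "$chart" "castai-helm/$chart" --namespace castai-agent --reuse-values
  done
}
```

With Argo CD, the equivalent is typically to pin the chart version in the Application's source and change it on your own schedule.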
Manual upgrades
Components marked as "Manual" require cluster administrators to perform upgrades when new versions are released. These upgrades can typically be performed using Helm commands or upgrade scripts provided in the component documentation.
For detailed manual upgrade instructions, refer to each component's dedicated documentation section.
Note: Always check the release notes before upgrading manually updated components to understand potential impacts and required actions.
