Hosted components

Cast AI components installed in your Kubernetes cluster and how they map to the umbrella chart operating modes.

When you connect a cluster, Cast AI installs several components into it. Which components are present depends on the operating mode you chose during onboarding. The umbrella Helm chart (castai-helm/castai) uses tag-based modes to control the component set:

| Mode | Tag | What it provides |
| --- | --- | --- |
| Read-only | tags.readonly=true | Cost visibility and security telemetry |
| Workload Autoscaler | tags.workload-autoscaler=true | Right-sizing of workload resource requests |
| Node Autoscaler | tags.node-autoscaler=true | Automated node provisioning and bin-packing |
| Full | tags.full=true | Node Autoscaler + Workload Autoscaler combined |
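
As a minimal sketch, the mode-to-tag mapping above can be wrapped in a small shell helper that prints (but does not run) a Helm command for a chosen mode. The function names mode_flag and print_install_cmd are hypothetical. The release name castai and namespace castai-agent are borrowed from the upgrade command later on this page, and any credentials or cluster-specific values your installation needs are not covered in this section and are deliberately left out.

```shell
#!/usr/bin/env bash

# Map an operating mode to its umbrella-chart tag flag.
# Tag names come from the mode table; the function name is hypothetical.
mode_flag() {
  case "$1" in
    readonly|workload-autoscaler|node-autoscaler|full)
      printf -- '--set tags.%s=true\n' "$1" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Print the Helm command for a mode instead of executing it.
# Assumption: release "castai" in namespace "castai-agent"; required
# credentials or other values are omitted because they are not shown here.
print_install_cmd() {
  local flag
  flag="$(mode_flag "$1")" || return 1
  echo "helm upgrade --install castai castai-helm/castai -n castai-agent ${flag}"
}

print_install_cmd full
```

Printing the command first makes it easy to review the selected tag before applying anything to a cluster.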

Installation methods

These components are installed automatically when you connect a cluster using castctl, the Cast AI console, or the umbrella Helm chart. You do not need to install them individually unless you have a specific reason to do so.

Components by mode

The table below shows which components each mode installs.

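| Component | readonly | workload-autoscaler | node-autoscaler | full |
| --- | --- | --- | --- | --- |
| castai-agent | Yes | Yes | Yes | Yes |
| castai-spot-handler | Yes | Yes | Yes | Yes |
| castai-kvisor | Yes | Yes | Yes | Yes |
| castai-cluster-controller | - | Yes | Yes | Yes |
| castai-evictor | - | Yes | Yes | Yes |
| castai-pod-mutator | - | Yes | Yes | Yes |
| castai-workload-autoscaler | - | Yes | - | Yes |
| castai-workload-autoscaler-exporter | - | Yes | - | Yes |
| castai-pod-pinner | - | - | Yes | Yes |
| castai-live (Container Live Migration) | - | - | Yes | Yes |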

You can also enable or disable any component individually using autoscaler.<component>.enabled overrides in your Helm values. Explicit overrides always take precedence over the mode tag.
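
The component matrix above can also be expressed as a lookup, which is handy when checking a cluster against the mode it was onboarded with. This is only an illustrative sketch: components_for_mode is a hypothetical helper, and the lists simply restate the table. It does not account for individual autoscaler.<component>.enabled overrides.

```shell
#!/usr/bin/env bash

# Print the components a mode is expected to install, space-separated.
# The groupings mirror the "Components by mode" table; the helper is hypothetical.
components_for_mode() {
  local base="castai-agent castai-spot-handler castai-kvisor"
  local autoscaling="castai-cluster-controller castai-evictor castai-pod-mutator"
  local workload="castai-workload-autoscaler castai-workload-autoscaler-exporter"
  local node="castai-pod-pinner castai-live"
  case "$1" in
    readonly)            echo "$base" ;;
    workload-autoscaler) echo "$base $autoscaling $workload" ;;
    node-autoscaler)     echo "$base $autoscaling $node" ;;
    full)                echo "$base $autoscaling $workload $node" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Example: list the expected components for node-autoscaler mode, one per line.
components_for_mode node-autoscaler | tr ' ' '\n'
```

The output can be compared by eye with the workloads actually running in your Cast AI namespace.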

Shared base components

These components are installed in every mode, including read-only.

  • castai-agent sends cluster state snapshots to the Cast AI platform.
  • castai-spot-handler monitors Spot Instance interruption events from your cloud provider and reports them to Cast AI. This data improves Spot reliability and interruption prediction models. Spot Handler does not take any action on nodes or workloads. Learn more in the Spot Handler documentation.
  • castai-kvisor is a security and telemetry agent. In read-only mode it collects network traffic flows, resource usage statistics, GPU metrics, and storage-related metrics. When security features are enabled, it additionally performs image vulnerability scanning and Kubernetes manifest linting. See the Kvisor documentation.

Operator-managed components

When clusters are onboarded using the Cast AI Operator, castai-agent and castai-spot-handler are installed and managed automatically. Updates can be applied through the Cast AI console without running additional scripts.

Autoscaling components

These components are added in the workload-autoscaler, node-autoscaler, and full modes.

  • castai-cluster-controller executes actions received from the Cast AI platform, such as accepting newly created nodes into the cluster and managing Container Live Migration operations. See the Cluster Controller documentation.
  • castai-evictor removes pods from underutilized nodes to reduce the overall number of cluster nodes. When Container Live Migration is enabled, Evictor automatically attempts to live-migrate eligible workloads before falling back to traditional eviction. See Evictor.
  • castai-pod-mutator modifies pod specifications for improved efficiency, including GPU driver injection and resource adjustments. See Pod mutations.

Workload Autoscaler components

These components are added in the workload-autoscaler and full modes.

  • castai-workload-autoscaler dynamically adjusts workload resource requests based on actual usage patterns. See Workload Autoscaler.
  • castai-workload-autoscaler-exporter collects workload metrics from your cluster to support recommendation generation. It is installed automatically alongside the Workload Autoscaler.

Node Autoscaler components

These components are added in the node-autoscaler and full modes.

  • castai-pod-pinner controls pod placement for optimal resource usage, ensuring workloads are placed on appropriate nodes. See Pod Pinner.
  • castai-live (Container Live Migration controller) manages live migration operations, including workload eligibility assessment, migration orchestration, and specialized VPC CNI management. The umbrella chart installs this component in node-autoscaler and full modes, but Container Live Migration must also be enabled in your node templates before it becomes active. Supported on EKS, GKE (partial support), and AKS (partial support). See Container Live Migration for requirements and limitations.

Karpenter integration

For clusters using Karpenter as their node provisioner, an additional component replaces the standard node autoscaling path.

  • castai-kentroller coordinates Cast AI optimization and automation features with Karpenter. This component is installed automatically when a Karpenter-managed cluster is connected through the onboarding script or castctl. See Karpenter Enterprise suite.

Additional components

These components serve specialized use cases and are enabled independently of the mode tags.

  • castai-db-optimizer monitors database performance and provides cost optimization recommendations. See Database Optimizer.
  • castai-audit-logs-receiver captures cluster events for analysis and compliance reporting. See Audit Logs Receiver.

OMNI components

When OMNI is enabled for cluster extension to other regions and cloud providers, additional components are deployed in the castai-omni namespace:

  • OMNI Agent manages edge location connections and node provisioning.
  • Liqo components enable multi-cluster topology and virtual node functionality (controller-manager, crd-replicator, fabric, ipam, metric-agent, proxy, webhook).

Component upgrade methods

Cast AI components are upgraded using different methods. The table below outlines the upgrade approach for each component.

| Category | Component | Upgrade method | Notes |
| --- | --- | --- | --- |
| Core | castai-agent | Manual | Upgrade via helm upgrade or the upgrade script |
| Core | castai-spot-handler | Manual | Upgrade via helm upgrade |
| Core | castai-kvisor | Manual | Upgrade via helm upgrade |
| Autoscaling | castai-cluster-controller | Auto | Can self-update via actions from the Cast AI platform |
| Autoscaling | castai-evictor | Auto | Automatically upgraded when new versions are available |
| Autoscaling | castai-pod-mutator | Manual | Upgrade via helm upgrade |
| Autoscaling | castai-pod-pinner | Auto | Automatically upgraded when new versions are available |
| Autoscaling | castai-live | Manual | Upgrade via helm upgrade |
| Workload Autoscaling | castai-workload-autoscaler | Manual | Upgrade both charts together |
| Workload Autoscaling | castai-workload-autoscaler-exporter | Manual | Upgraded alongside the Workload Autoscaler |
| Karpenter | castai-kentroller | Manual | Upgrade via helm upgrade |
| Other | castai-db-optimizer | Manual | Upgrade via helm upgrade |
| Other | castai-audit-logs-receiver | Manual | Upgrade via helm upgrade |
| Other | castai-omni-agent | Manual | Liqo components update automatically with the agent |
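
For scripting maintenance checks, the upgrade methods listed above could be captured in a small lookup. This sketch only restates the table; upgrade_method is a hypothetical helper name and is not part of any Cast AI tooling.

```shell
#!/usr/bin/env bash

# Return "auto" or "manual" for a component, per the upgrade methods table.
# Hypothetical helper; unknown names yield "unknown" and a non-zero status.
upgrade_method() {
  case "$1" in
    castai-cluster-controller|castai-evictor|castai-pod-pinner)
      echo auto ;;
    castai-agent|castai-spot-handler|castai-kvisor|castai-pod-mutator|castai-live|\
    castai-workload-autoscaler|castai-workload-autoscaler-exporter|castai-kentroller|\
    castai-db-optimizer|castai-audit-logs-receiver|castai-omni-agent)
      echo manual ;;
    *)
      echo unknown
      return 1 ;;
  esac
}

# Example: report the upgrade method for a few components.
for component in castai-agent castai-evictor castai-live; do
  printf '%s\t%s\n' "$component" "$(upgrade_method "$component")"
done
```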

Umbrella chart upgrades

When using the umbrella chart, upgrade all components at once:

helm repo update castai-helm
helm upgrade castai castai-helm/castai -n castai-agent --reset-then-reuse-values

Note

The --reset-then-reuse-values flag requires Helm v3.14.0 or higher.
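
Before running the upgrade, it may help to confirm that the local Helm client meets this version requirement. The sketch below compares version strings with sort -V, which assumes a sort implementation that supports version ordering (GNU coreutils does). The function name supports_reset_then_reuse is hypothetical.

```shell
#!/usr/bin/env bash

# Succeed if the given Helm version (with or without a leading "v")
# is at least 3.14.0, the minimum stated for --reset-then-reuse-values.
# Assumption: sort supports -V (version sort).
supports_reset_then_reuse() {
  local have="${1#v}"
  local need="3.14.0"
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]
}

# Demonstration with fixed version strings.
for version in v3.13.3 v3.14.0; do
  if supports_reset_then_reuse "$version"; then
    echo "$version: flag supported"
  else
    echo "$version: upgrade Helm first"
  fi
done

# With Helm installed, the client version could be passed in like this:
#   supports_reset_then_reuse "$(helm version --template '{{.Version}}')"
```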

Automatic upgrades

Components marked as "Auto" are upgraded by the Cast AI platform without manual intervention. The castai-cluster-controller can also update other components (castai-evictor, castai-spot-handler, castai-agent) when it has sufficient permissions. By default, it cannot apply updates that require permission changes. You can bind a role such as cluster-admin to the castai-cluster-controller service account to enable it to manage all Cast AI components automatically. See the Cluster Controller auto-update documentation.
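
The role binding described above could look roughly like the following. This is a sketch under stated assumptions: the binding name castai-cluster-controller-admin is hypothetical, and the namespace castai-agent is assumed from the Helm command on this page. The script only prints the kubectl command so it can be reviewed first, since cluster-admin grants broad permissions.

```shell
#!/usr/bin/env bash

# Assumed defaults; override via environment variables if your setup differs.
NAMESPACE="${NAMESPACE:-castai-agent}"                           # assumed namespace
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-castai-cluster-controller}"  # service account named in the text
BINDING_NAME="${BINDING_NAME:-castai-cluster-controller-admin}"  # hypothetical binding name

# Print the kubectl command that would bind cluster-admin to the service account.
binding_cmd() {
  echo "kubectl create clusterrolebinding ${BINDING_NAME} --clusterrole=cluster-admin --serviceaccount=${NAMESPACE}:${SERVICE_ACCOUNT}"
}

binding_cmd
```

Consult the Cluster Controller auto-update documentation for the recommended permissions before applying a binding like this.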

Self-managed component options

For environments that require control over update schedules, several components offer self-managed installation options. Self-managed components can be updated on your preferred schedule using tools such as Argo CD or Helm.

Next steps