Cluster controller
The Cluster Controller is responsible for handling specific Kubernetes actions, including draining and deleting nodes, adding labels, and approving certificate signing requests (CSRs). It's open source and can be found on GitHub.
Install Cluster Controller
The Cluster Controller is installed during Phase 2 (automation) cluster onboarding using the Helm chart available at https://github.com/castai/helm-charts/tree/main/charts/castai-cluster-controller.
Installation methods:
- Via Cast AI Operator (recommended): When onboarding new clusters with automation enabled (Phase 2), the Cast AI Operator automatically installs and manages Cluster Controller when extendedPermissions: true is set. This ensures you receive updates through the Cast AI console without running additional scripts.
- Via Helm (manual): If Cluster Controller was uninstalled or you're managing an older cluster without the Operator, you can reinstall it manually using the instructions below.
Manual installation
If Cluster Controller was uninstalled or you need to reinstall it manually:
Add the Cast AI helm charts repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
You can list all available components and versions:
helm search repo castai-helm
Expected example output:
NAME                                    CHART VERSION   APP VERSION   DESCRIPTION
castai-helm/castai-agent                0.18.0          v0.23.0       CAST AI agent deployment chart.
castai-helm/castai-cluster-controller   0.17.0          v0.14.0       CAST AI cluster controller deployment chart.
castai-helm/castai-evictor              0.10.0          0.5.1         Cluster utilization defragmentation tool
castai-helm/castai-spot-handler         0.3.0           v0.3.0        CAST AI spot handler daemonset chart.
Now let's install it.
helm upgrade --install cluster-controller castai-helm/castai-cluster-controller -n castai-agent \
--set castai.apiKey=<your-api-token> \
--set castai.clusterID=<your-cluster-id>
Upgrade Cluster Controller
The Cluster Controller supports auto-update out of the box, and the feature is enabled by default. However, changes in RBAC sometimes prevent the automatic update, and a manual upgrade is required.
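The manual upgrade command in this section uses --reset-then-reuse-values, a flag that needs Helm 3.14.0 or later. A minimal pre-flight sketch follows, assuming a POSIX shell and a sort that supports -V (GNU coreutils does). The helper name version_ge is hypothetical and not part of Helm.

```shell
# Returns 0 when version $1 is greater than or equal to version $2.
# A leading "v" (as in v3.14.0) is stripped before comparing.
version_ge() {
  have="${1#v}"
  want="${2#v}"
  # sort -V puts the lower version first; if that is $want, then have >= want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Only run the check when helm is on the PATH.
if command -v helm >/dev/null 2>&1; then
  # --template prints just the version string instead of the full build info.
  helm_version="$(helm version --template '{{.Version}}')"
  if version_ge "$helm_version" "3.14.0"; then
    echo "helm $helm_version supports --reset-then-reuse-values"
  else
    echo "helm $helm_version is too old; upgrade to 3.14.0 or later" >&2
  fi
fi
```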
Upgrade to the latest version:
# Requires Helm 3.14.0 or later
helm repo update
helm upgrade cluster-controller castai-helm/castai-cluster-controller --reset-then-reuse-values -n castai-agent
Troubleshooting
Check Cluster Controller logs:
kubectl logs -l app.kubernetes.io/name=castai-cluster-controller -n castai-agent
Throttling due to rate limiting
The Cluster Controller implements a client-side rate limiter to regulate requests to the Kubernetes API server, preventing excessive load on the control plane. This rate limiter uses the token bucket algorithm with default settings designed for typical cluster environments.
While the default configuration works well for most deployments, large or highly dynamic clusters may experience performance issues due to conservative rate limits. You can adjust the rate-limiting parameters if you observe throttling-related delays in a cluster with an appropriately scaled control plane.
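To make the two rate-limit knobs concrete, here is a small token bucket simulation. It is a simplified model under stated assumptions, not the Cluster Controller's implementation: time advances in one-second ticks, requests that cannot get a token wait for the next tick, and the numbers are illustrative rather than the controller's defaults. The function name simulate_bucket is hypothetical.

```shell
# simulate_bucket QPS BURST DEMAND...
# Each DEMAND is the number of new requests arriving in a one-second tick.
simulate_bucket() {
  qps=$1
  burst=$2
  shift 2
  tokens=$burst   # the bucket starts full
  backlog=0       # requests still waiting for a token
  tick=0
  for demand in "$@"; do
    tick=$((tick + 1))
    pending=$((backlog + demand))
    # Admit as many pending requests as there are tokens.
    if [ "$pending" -le "$tokens" ]; then
      admitted=$pending
    else
      admitted=$tokens
    fi
    tokens=$((tokens - admitted))
    backlog=$((pending - admitted))
    echo "tick=$tick admitted=$admitted waiting=$backlog"
    # Refill qps tokens for the next tick, never exceeding burst.
    tokens=$((tokens + qps))
    if [ "$tokens" -gt "$burst" ]; then
      tokens=$burst
    fi
  done
}

# A spike of 20 requests against qps=5, burst=10 (illustrative numbers):
simulate_bucket 5 10 20 0 0
# tick=1 admitted=10 waiting=10
# tick=2 admitted=5 waiting=5
# tick=3 admitted=5 waiting=0
```

In this model, raising the burst to 20 admits the whole spike in the first tick, while raising the per-second rate shortens the time the backlog takes to drain.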
Adjust rate limit settings
Modify the following environment variables in the Cluster Controller deployment:
# Requests per second rate limit (tokens replenished per second). This should be higher than the observed continuous load
KUBECLIENT_QPS=<number>
# Maximum allowed burst of requests. This can be adjusted to match extreme spikes or expected bursts of operations
KUBECLIENT_BURST=<number>
When to adjust settings
Consider increasing these values when:
- The cluster has a scaled-up control plane capable of handling higher throughput
- Logs show frequent throttling messages
- Operations that modify many resources (like large-scale rebalancing) are taking longer than expected
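One rough way to gauge how often throttling shows up is to count matching lines in recent logs. The sketch below assumes that throttling messages contain the substring "throttl"; the exact wording of the controller's log lines is not specified here, so adjust the pattern to what you actually see. The helper name count_throttled is hypothetical.

```shell
# count_throttled reads log lines on stdin and prints how many mention throttling.
count_throttled() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the helper
  # usable in scripts that run under "set -e".
  grep -ci 'throttl' || true
}

# Example (needs cluster access, so it is left commented out):
# kubectl logs -l app.kubernetes.io/name=castai-cluster-controller -n castai-agent --tail=1000 | count_throttled
```

A count that stays high across repeated checks, on a cluster whose control plane has headroom, is the kind of signal the list above describes.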
How to apply changes
When using Helm, you can set these values through the additionalEnv or envFrom parameters:
helm upgrade cluster-controller castai-helm/castai-cluster-controller -n castai-agent \
--reuse-values \
--set additionalEnv.KUBECLIENT_QPS=50 \
--set additionalEnv.KUBECLIENT_BURST=200
For the complete default configuration, refer to the cluster-controller repository.
Automatic updates
By default, the cluster-controller component can update itself by receiving an update action (scheduled by Cast AI). It can also update other components, such as castai-evictor, castai-spot-handler, or castai-agent, with one caveat: the cluster-controller can't change permissions for other components or for itself.
However, permission changes are sometimes required for new features. To make this possible, you can explicitly bind a role such as cluster-admin to the cluster-controller service account. This allows the cluster-controller to manage other Cast AI components automatically, including their permissions.
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: castai-cluster-controller-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: castai-cluster-controller
    namespace: castai-agent
EOF
