Kubernetes permissions

Kubernetes Service Accounts and permissions used by Cast AI components.

Cast AI components running on your clusters use predefined service accounts with the permissions they need to perform their functions, such as sending data about the cluster state. This section lists all required service accounts and the permissions granted to Cast AI components.

Kubernetes Service Accounts used by Cast AI components

Each Cast AI component installed in your cluster uses a dedicated service account. Such a setup allows you to fine-tune permissions for each component:

» kubectl get serviceAccounts -n castai-agent
NAME                                   SECRETS   AGE
castai-agent                           0         97d
castai-ai-optimizer-proxy              0         55d
castai-aibrix-controller-manager       0         55d
castai-aws-node                        0         57d
castai-cluster-controller              0         97d
castai-evictor                         0         97d
castai-kvisor                          0         97d
castai-kvisor-controller               0         97d
castai-live-controller                 0         57d
castai-live-daemon                     3         57d
castai-live-patch-sa                   0         16d
castai-pod-mutator                     0         85d
castai-pod-pinner                      0         97d
castai-spot-handler                    0         97d
castai-workload-autoscaler             0         97d
castware-operator-controller-manager   0         3d16h
default                                0         97d

The Cast AI Operator's permissions

The Cast AI Operator manages the installation, configuration, and updating of Cast AI components. Because it installs and manages components through Helm charts, the Operator needs both read permissions to understand the cluster state and write permissions to create and update resources.

Privilege escalation prevention

The Operator follows Kubernetes privilege escalation prevention mechanisms. When creating or updating roles and cluster roles, the Operator's service account must already have all permissions contained in the resource being created or updated. This prevents the Operator from granting itself or other components permissions it doesn't already have.

A subject can create or update a role or cluster role that contains permissions it does not hold only if it has been granted the escalate verb on the roles or clusterroles resource. The Operator is not granted this verb.
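To make the rule concrete, here is a minimal Python sketch of the check. It is a simplified model, not the Kubernetes authorizer or Cast AI code: the Rule type and the allows and can_create_role helpers are hypothetical, resourceNames and non-resource URLs are ignored, and * is treated as a wildcard. A requester may create a role only if it can create roles at all and already holds every permission in the new role, unless it also holds escalate.

```python
from dataclasses import dataclass

WILDCARD = "*"
RBAC_GROUP = "rbac.authorization.k8s.io"


@dataclass(frozen=True)
class Rule:
    """Hypothetical simplified RBAC rule; resourceNames and nonResourceURLs are omitted."""
    api_groups: frozenset
    resources: frozenset
    verbs: frozenset


def rule(groups, resources, verbs):
    return Rule(frozenset(groups), frozenset(resources), frozenset(verbs))


def _matches(allowed, value):
    return WILDCARD in allowed or value in allowed


def allows(rules, group, resource, verb):
    """True if any rule grants verb on group/resource."""
    return any(
        _matches(r.api_groups, group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


def can_create_role(requester_rules, new_role_rules, kind="roles"):
    """Model of RBAC escalation prevention; kind is 'roles' or 'clusterroles'."""
    if not allows(requester_rules, RBAC_GROUP, kind, "create"):
        return False
    # Holding escalate on roles/clusterroles skips the coverage check.
    if allows(requester_rules, RBAC_GROUP, kind, "escalate"):
        return True
    # Otherwise every permission in the new role must already be held.
    return all(
        allows(requester_rules, g, res, v)
        for r in new_role_rules
        for g in r.api_groups
        for res in r.resources
        for v in r.verbs
    )


operator = [
    rule([RBAC_GROUP], ["roles"], ["create", "patch"]),
    rule(["apps"], ["deployments"], ["get", "list", "patch"]),
]
print(can_create_role(operator, [rule(["apps"], ["deployments"], ["get", "list"])]))  # True
print(can_create_role(operator, [rule(["apps"], ["deployments"], ["delete"])]))       # False
```

The second call fails because the requester does not hold delete on deployments, which is exactly the gap that escalation prevention blocks.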

Phase 1 permissions

Phase 1 permissions allow the Operator to install and manage the castai-agent component. These permissions include all the read-only permissions the agent needs, plus additional write permissions to manage the agent's lifecycle.

Namespace-wide permissions in castai-agent

Namespace-level permissions in the castai-agent namespace are lower risk than cluster-level permissions as they only affect Cast AI resources.

Permissions shared by both the agent and Operator:

API Group | Resources | Verbs | Description
coordination.k8s.io | leases | create, get, list, watch, update | Required for leader election
core | configmaps | get, list, watch | Required for cost savings estimation features
apps | deployments | patch | Required for the Cluster Proportional Vertical Autoscaler to adjust castai-agent requests/limits. Scoped to the castai-agent deployment only

Operator-only permissions in castai-agent namespace:

API Group | Resources | Verbs | Description
apps | deployments | create, delete, get, list, patch, update | Required to create/update/delete agent deployment
core | secrets | create, delete, get, list, patch, update, watch | Required to create/update/delete secrets containing Helm release state and Operator webhook certificates
core | serviceaccounts | create, delete, get, list, patch, update, watch | Required to create/update/delete agent service accounts
core | services | create, delete, get, list, patch, update, watch | Required to create/update/delete Operator webhook services
policy | poddisruptionbudgets | create, delete, get, list, patch, update, watch | Required to create/update/delete agent pod disruption budget
rbac.authorization.k8s.io | rolebindings, roles | create, delete, get, list, patch, update, watch | Required to create/update/delete agent role and role binding
core | pods/log | get | Required by the Operator to read agent logs and extract cluster ID
core | resourcequotas | create, delete, get, list, patch, update, watch | Required to create/update/delete agent resource quota
core | configmaps | create, update, patch, delete | Required to create/update/delete agent configmap and webhook CA configmap
batch | jobs | create, delete | Required to run self-upgrade and post-uninstall jobs
coordination.k8s.io | leases | create, update, patch, delete | Required for Operator leader election
core | events | create, patch | Required for event recording
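The leader election these leases support follows a simple pattern: a replica becomes leader by writing its identity into a Lease, keeps leadership by renewing it, and another replica may take over only after the holder stops renewing. The Python sketch below models that logic in memory. The Lease class and try_acquire_or_renew function are hypothetical, they omit the API calls (get, create, update) that the permissions above authorize, and they are not the Operator's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical in-memory stand-in for a coordination.k8s.io/v1 Lease spec."""
    holder_identity: Optional[str]
    lease_duration_seconds: int
    renew_time: float  # seconds since epoch of the holder's last renewal
    lease_transitions: int = 0


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if candidate is the leader after this attempt."""
    if lease.holder_identity == candidate:
        lease.renew_time = now  # the current leader renews its lease
        return True
    expired = now > lease.renew_time + lease.lease_duration_seconds
    if lease.holder_identity is None or expired:
        # Take over a free or expired lease. A real client writes this with an
        # update carrying the object's resourceVersion, so only one concurrent
        # writer succeeds.
        lease.holder_identity = candidate
        lease.renew_time = now
        lease.lease_transitions += 1
        return True
    return False


lease = Lease(holder_identity=None, lease_duration_seconds=15, renew_time=0.0)
print(try_acquire_or_renew(lease, "operator-a", now=100.0))  # True: lease was free
print(try_acquire_or_renew(lease, "operator-b", now=105.0))  # False: held and fresh
print(try_acquire_or_renew(lease, "operator-b", now=121.0))  # True: operator-a expired
```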

Operator-only permissions in kube-system namespace:

API Group | Resources | Verbs | Description
rbac.authorization.k8s.io | roles, rolebindings | delete | Required by the Operator to uninstall the agent

Resource names are scoped to castai-agent only.
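Scoping by resource name means a rule only matches requests that target one of the listed objects. The minimal Python sketch below shows that matching for the kube-system rules above. ScopedRule and authorize are hypothetical names; namespaces, wildcards, and the real authorizer are out of scope. Note that in Kubernetes a create request cannot be restricted by resource name, because the new object's name may not be known at authorization time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScopedRule:
    """Hypothetical simplified rule; an empty resource_names tuple means any name."""
    api_group: str  # "" is the core API group
    resource: str
    verbs: Tuple[str, ...]
    resource_names: Tuple[str, ...] = ()


def authorize(rules, api_group, resource, verb, name: Optional[str] = None) -> bool:
    for r in rules:
        if (r.api_group, r.resource) != (api_group, resource) or verb not in r.verbs:
            continue
        if not r.resource_names:
            return True
        # A name-scoped rule only matches requests aimed at a listed object.
        if name is not None and name in r.resource_names:
            return True
    return False


# Operator-only rules in kube-system, scoped to castai-agent.
kube_system_rules = [
    ScopedRule("rbac.authorization.k8s.io", "roles", ("delete",), ("castai-agent",)),
    ScopedRule("rbac.authorization.k8s.io", "rolebindings", ("delete",), ("castai-agent",)),
]
print(authorize(kube_system_rules, "rbac.authorization.k8s.io", "roles", "delete", "castai-agent"))  # True
print(authorize(kube_system_rules, "rbac.authorization.k8s.io", "roles", "delete", "coredns"))       # False
```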

Namespace-wide permissions in kube-system

API Group | Resources | Verbs | Description
core | configmaps | get, list, watch | Required by the agent only

Cluster-wide permissions

Cluster-level permissions are required by the agent to read resources in the cluster and create snapshots to send to the platform, and by the Operator to grant those permissions to the agent.

Permissions shared by both the agent and Operator (read-only):

API Group | Resources | Verbs | Description
core | pods, nodes, replicationcontrollers, persistentvolumeclaims, persistentvolumes, services, namespaces, events, limitranges, resourcequotas | get, list, watch | Required for cost savings estimation features
core | namespaces | get
apps | deployments, replicasets, daemonsets, statefulsets | get, list, watch
storage.k8s.io | storageclasses, csinodes | get, list, watch
batch | jobs, cronjobs | get, list, watch
autoscaling | horizontalpodautoscalers | get, list, watch
metrics.k8s.io | pods | get, list
policy | poddisruptionbudgets | get, list, watch
(non-resource URL) | /version | get
networking.k8s.io | networkpolicies, ingresses | get, list, watch | Required for security and k8s compliance reporting
rbac.authorization.k8s.io | roles, rolebindings, clusterroles, clusterrolebindings | get, list, watch | Required for security and k8s compliance reporting
karpenter.sh | provisioners, machines, nodepools, nodeclaims, nodeoverlays | get, list, watch | Required for Karpenter resource monitoring
karpenter.k8s.aws | awsnodetemplates, ec2nodeclasses | get, list, watch | Required for Karpenter AWS resource monitoring
datadoghq.com | extendeddaemonsetreplicasets | get, list, watch | Required for Datadog resource monitoring
argoproj.io | rollouts | get, list, watch | Required for Argo Rollouts monitoring
autoscaling.cast.ai | recommendations | get, list, watch | Required for Cast AI autoscaling recommendations
pod-mutations.cast.ai | podmutations | get, list, watch | Required for Cast AI pod mutation monitoring
resource.k8s.io | deviceclasses, resourceclaims, resourceclaimtemplates, resourceslices, devicetaintrules | get, list, watch | Required for Kubernetes resource management API monitoring
runbooks.cast.ai | recommendationsyncs | get, list, watch | Required for Agentic AI workflows

Operator-only cluster permissions:

API Group | Resources | Verbs | Description
castware.cast.ai | clusters, components | create, delete, get, list, patch, update, watch | Required to create/update/delete Operator custom resources
castware.cast.ai | clusters/status, components/status | get, patch, update | Required to update status of Operator custom resources
castware.cast.ai | clusters/finalizers, components/finalizers | update | Required to add finalizers to Operator custom resources
apiextensions.k8s.io | customresourcedefinitions | create, delete, get, list, patch, update | Required to create/update/delete Operator CRDs during installation, runtime, and upgrades. Scoped to clusters.castware.cast.ai and components.castware.cast.ai only. The create, update, and patch permissions enable automatic CRD upgrades during Helm operations and allow the Operator to self-manage CRD schemas.
rbac.authorization.k8s.io | clusterrolebindings, clusterroles | delete | Required to delete the agent cluster role and cluster role binding when the agent is uninstalled. Scoped to castai-agent only
rbac.authorization.k8s.io | clusterrolebindings, clusterroles, rolebindings, roles | create, patch | Required to create RBAC resources for managed components
admissionregistration.k8s.io | mutatingwebhookconfigurations, validatingwebhookconfigurations | get, list, patch, update, watch | Required to read and update Operator webhook configurations

Extended permissions (Phase 2)

Phase 2 permissions are optional and not required to install the agent. They're disabled by default and can be enabled by setting the Helm value extendedPermissions to true. They are also enabled automatically when you run the Cluster Controller install script with the Operator.

Phase 2 permissions match those required by the Cluster Controller because the Operator cannot escalate privileges and needs the same permissions as the components it installs.

Namespace-wide permissions in castai-agent (Phase 2)

When Phase 2 permissions are enabled, both the Cluster Controller and Operator become namespace administrators for the castai-agent namespace:

API Group | Resources | Verbs | Description
* | * | * | Full administrative permissions in castai-agent namespace
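How the * wildcards behave can be sketched with a simplified matcher modeled on Kubernetes RBAC semantics, in which * in a rule's resources also matches subresources such as pods/log, and an entry like */status matches that subresource on any resource. The sketch assumes the Phase 2 grant is bound in the castai-agent namespace (through a RoleBinding), so it has no effect in other namespaces. PolicyRule and namespaced_grant_allows are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

ALL = "*"


@dataclass(frozen=True)
class PolicyRule:
    api_groups: Tuple[str, ...]
    resources: Tuple[str, ...]
    verbs: Tuple[str, ...]


def resource_matches(rule_resources, resource, subresource=""):
    combined = f"{resource}/{subresource}" if subresource else resource
    for rr in rule_resources:
        if rr == ALL or rr == combined:
            return True  # "*" also covers subresources such as pods/log
        if subresource and rr == f"*/{subresource}":
            return True  # e.g. "*/status" matches the status subresource of any resource
    return False


def namespaced_grant_allows(grant_namespace, rules, namespace, api_group, verb,
                            resource, subresource=""):
    """Rules granted through a RoleBinding apply only inside that binding's namespace."""
    if namespace != grant_namespace:
        return False
    return any(
        (ALL in r.api_groups or api_group in r.api_groups)
        and (ALL in r.verbs or verb in r.verbs)
        and resource_matches(r.resources, resource, subresource)
        for r in rules
    )


namespace_admin = [PolicyRule((ALL,), (ALL,), (ALL,))]
print(namespaced_grant_allows("castai-agent", namespace_admin,
                              "castai-agent", "", "get", "pods", "log"))  # True
print(namespaced_grant_allows("castai-agent", namespace_admin,
                              "kube-system", "", "delete", "pods"))       # False
```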

Cluster-wide permissions (Phase 2)

Cluster-level Phase 2 permissions are granted to both the Cluster Controller and Operator:

API Group | Resources | Verbs | Description
core | nodes, pods | get, list, watch | Read-only permissions required for autoscaler
core | nodes | patch, update | Write permissions required for node draining, patching, and deletion
core | nodes, pods | delete | Write permissions required for node draining, patching, and deletion
core | nodes/status | patch | Required to update node status
core | pods/eviction | create | Required for pod eviction
certificates.k8s.io | certificatesigningrequests/approval | patch, update | Read/write permissions required for CSR approval
certificates.k8s.io | signers | approve | Required for approving kubelet certificates. Scoped to kubernetes.io/kube-apiserver-client-kubelet and kubernetes.io/kubelet-serving
certificates.k8s.io | certificatesigningrequests | create, delete, get, list, watch | Required for certificate signing request management
core | namespaces | delete, get | Required to install/update/uninstall Cast AI components. Delete scoped to castai-llms only
core | namespaces | get | Required for namespace access
core | namespaces | delete | Required to delete Cast AI namespace. Scoped to castai-agent only
autoscaling.cast.ai | * | create, delete, get, list, patch, update, watch | Required for autoscaling CRD management
live.cast.ai | * | create, delete, get, list, patch, update, watch | Required for Live Migration
pod-mutations.cast.ai | * | create, delete, get, list, patch, update, watch | Required for pod mutations related custom resources
storage.k8s.io | volumeattachments | delete, get, list | Required to manipulate volume attachments
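Approving a CSR takes two checks: the approver needs update on the certificatesigningrequests/approval subresource, and approve on the signers resource, where the resource name is the CSR's signerName (or the signer's domain followed by /*). The Python sketch below models only the signer check and the shape of the Approved condition. The function names and the reason string are hypothetical, and this is not the Cluster Controller's code.

```python
def can_approve_signer(signer_name, approvable):
    """approve on signers is checked against the exact signer name or '<domain>/*'."""
    domain = signer_name.split("/", 1)[0]
    return signer_name in approvable or f"{domain}/*" in approvable


def approved_condition(reason, message):
    """Condition added to status.conditions through the approval subresource."""
    return {"type": "Approved", "status": "True", "reason": reason, "message": message}


# resourceNames of the signers rule listed in the table above.
kubelet_signers = {
    "kubernetes.io/kube-apiserver-client-kubelet",
    "kubernetes.io/kubelet-serving",
}
print(can_approve_signer("kubernetes.io/kubelet-serving", kubelet_signers))       # True
print(can_approve_signer("kubernetes.io/kube-apiserver-client", kubelet_signers))  # False
print(approved_condition("HypotheticalApprover", "node certificate approved"))
```

Because the signers rule is scoped to the two kubelet signer names, a CSR for a generic client certificate falls outside it.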

The Cast AI agent's permissions (Read-only)

The Cast AI agent collects cluster operational details (snapshots) and delivers them to the central platform, which assesses whether there is room for optimization. That's why the agent needs cluster-wide read permissions:

API Group | Resources | Verbs
core | pods, nodes, replicationcontrollers, persistentvolumeclaims, persistentvolumes, services | get, list, watch
core | namespaces | get
apps | deployments, replicasets, daemonsets, statefulsets | get, list, watch
storage.k8s.io | storageclasses, csinodes | get, list, watch
batch | jobs | get, list, watch

The Cast AI agent's resource consumption depends heavily on cluster size, so its resource limits must scale with the size of your cluster. For that purpose, the Cluster Proportional Vertical Autoscaler patches the Cast AI agent's deployment with re-estimated limits, which requires the following permission:

API Group | Resources | Verbs | Description
apps | deployments | patch | Used only to patch the castai-agent deployment
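Proportional sizing can be sketched as a linear function of node count that produces a patch for the castai-agent deployment. The formula, the base and step values, the optional cap, and the container name "agent" are assumptions for illustration; they are not the Cluster Proportional Vertical Autoscaler's configuration format or Cast AI's defaults.

```python
import math


def proportional_value(base, step, nodes_per_step, node_count, maximum=None):
    """Hypothetical linear model: add one step for every nodes_per_step nodes."""
    value = base + step * math.ceil(node_count / nodes_per_step)
    return value if maximum is None else min(value, maximum)


def resources_patch(container, cpu_millicores, memory_mib):
    """Strategic merge patch for a Deployment; containers are merged by name."""
    resources = {
        "requests": {"cpu": f"{cpu_millicores}m", "memory": f"{memory_mib}Mi"},
        "limits": {"memory": f"{memory_mib}Mi"},
    }
    return {"spec": {"template": {"spec": {
        "containers": [{"name": container, "resources": resources}]
    }}}}


nodes = 42
cpu = proportional_value(base=100, step=10, nodes_per_step=5, node_count=nodes)
memory = proportional_value(base=256, step=16, nodes_per_step=5, node_count=nodes, maximum=512)
print(cpu, memory)  # 190 400
print(resources_patch("agent", cpu, memory))  # "agent" is a hypothetical container name
```

Sending a body like this as a patch on a single, named deployment is all the deployments patch permission above is needed for.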

Cluster Controller's permissions

Cast AI's Cluster Controller component is installed when your connected cluster moves to Phase 2, where you can enable managed cost savings:

» kubectl get deployments -n castai-agent
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
castai-agent                1/1     1            1           43h
castai-cluster-controller   2/2     2            2           64m
castai-evictor              0/0     0            0           64m

Cluster-wide permissions used by the Cluster Controller

The Cluster Controller operates mostly at the cluster level because it performs the operations required to optimize the cluster's costs:

API Group | Resources | Verbs | Description
core | namespaces | get
core | pods, nodes | get, list
core | nodes | patch, update | Used for node draining and patching.
core | pods, nodes | delete
core | pods/eviction | create
certificates.k8s.io | certificatesigningrequests | get, list, delete, create | Used for creating a new certificate when adding a node to the cluster.
certificates.k8s.io | certificatesigningrequests/approval | patch, update | Used for creating a new certificate when adding a node to the cluster.
certificates.k8s.io | signers | approve | Applicable only to kubelet signers.
core | events | list, create, patch
rbac.authorization.k8s.io | roles, clusterroles, clusterrolebindings | get, patch, update, delete, escalate | Applicable to all Cast AI components.
core | namespaces | delete | Applicable only to the Cast AI agent namespace.
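The node-draining permissions map onto a cordon, evict, delete sequence similar to what kubectl drain does: patch the node as unschedulable, evict its pods, then delete the node. The sketch below only plans those steps. The function name and step labels are hypothetical, and it is not the Cluster Controller's implementation. Like kubectl drain, it skips mirror (static) pods, which cannot be deleted through the API server, and DaemonSet pods, which the DaemonSet controller would recreate on the node anyway.

```python
MIRROR_ANNOTATION = "kubernetes.io/config.mirror"


def drain_plan(node_name, pods):
    """Return ordered (action, target) steps to drain and remove a node."""
    # Cordon first so no new pods are scheduled onto the node.
    steps = [("patch-node", (node_name, {"spec": {"unschedulable": True}}))]
    for pod in pods:
        meta = pod["metadata"]
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue
        if MIRROR_ANNOTATION in meta.get("annotations", {}):
            continue  # static pods are managed by the kubelet, not the API server
        if any(o.get("kind") == "DaemonSet" for o in meta.get("ownerReferences", [])):
            continue  # the DaemonSet controller would recreate these on the node
        steps.append(("evict-pod", f"{meta['namespace']}/{meta['name']}"))
    steps.append(("delete-node", node_name))
    return steps


pods = [
    {"metadata": {"name": "api-1", "namespace": "shop"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "fluentd-x", "namespace": "logging",
                  "ownerReferences": [{"kind": "DaemonSet", "name": "fluentd"}]},
     "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "api-2", "namespace": "shop"}, "spec": {"nodeName": "node-b"}},
]
for step in drain_plan("node-a", pods):
    print(step)
```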

Namespace-wide (castai-agent) permissions used by the Cluster Controller

One of the main tasks of the Cluster Controller is to update Cast AI components. For that reason, it is granted all permissions in the castai-agent namespace, which covers both current and future changes.

Additionally, it holds two cluster-wide permissions: one for managing the RBAC of Cast AI components and one for deleting the Cast AI namespace (see above).

Evictor permissions

When you onboard the cluster to Phase 2 for automated cost optimization, Cast AI installs other components in addition to the Cluster Controller. One such component is Evictor, which minimizes the number of nodes your cluster uses.

Cluster-wide permissions used by Evictor

When installed, Evictor handles non-Cast AI pods, so it requires a set of cluster-wide permissions:

API Group | Resources | Verbs | Description
core | events | create, patch
core | nodes | get, list, watch, patch, update | Used to find a suitable node for eviction.
core | pods | get, list, watch, patch, update, create, delete | Used to list pods to find a suitable node for eviction and to delete a stuck pod from the node.
apps | replicasets | get | Used to determine whether it's safe to evict a pod (the pod belongs to a ReplicaSet that has replicas).
core | pods/eviction | create | Used for pod eviction.
coordination.k8s.io | leases | * | Used for leader election so that only a single instance is active at a time.
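The ReplicaSet check and the eviction call can be sketched as follows. The is_safe_to_evict heuristic and its threshold of two replicas are assumptions based on the table's description, not Evictor's actual logic. eviction_request shows the standard policy/v1 Eviction object posted to the pods/eviction subresource. The API server rejects an eviction that would violate a PodDisruptionBudget (HTTP 429), which is what makes eviction safer than a plain delete.

```python
def is_safe_to_evict(pod, replicasets, min_replicas=2):
    """Hypothetical check: the pod belongs to a ReplicaSet with at least min_replicas replicas."""
    meta = pod["metadata"]
    for owner in meta.get("ownerReferences", []):
        if owner.get("kind") == "ReplicaSet":
            rs = replicasets.get((meta["namespace"], owner["name"]))
            # spec.replicas defaults to 1 when unset
            return rs is not None and rs.get("spec", {}).get("replicas", 1) >= min_replicas
    return False  # bare pods and pods with other owners are left alone in this sketch


def eviction_request(namespace, name):
    """Path and body for creating an Eviction through the pods/eviction subresource."""
    path = f"/api/v1/namespaces/{namespace}/pods/{name}/eviction"
    body = {"apiVersion": "policy/v1", "kind": "Eviction",
            "metadata": {"name": name, "namespace": namespace}}
    return path, body


pod = {"metadata": {"name": "web-7d4b9-x2k", "namespace": "shop",
                    "ownerReferences": [{"kind": "ReplicaSet", "name": "web-7d4b9"}]}}
replicasets = {("shop", "web-7d4b9"): {"spec": {"replicas": 3}}}
if is_safe_to_evict(pod, replicasets):
    print(eviction_request("shop", "web-7d4b9-x2k")[0])
```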

Pod Pinner permissions

To function correctly, Pod Pinner requires the following cluster-wide permissions:

API Group | Resources | Verbs | Description
core | nodes | create, delete, get | Required to execute pod pinning actions
core | pods | delete, get, list | Required to execute pod pinning actions
core | pods/binding | create | Required to execute pod pinning actions
admissionregistration.k8s.io | mutatingwebhookconfigurations | list, watch | Required for webhook functionality
admissionregistration.k8s.io | mutatingwebhookconfigurations | watch, get, patch, update | Required for webhook functionality. Scoped to pod-pinner resource only
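Creating pods/binding is the same operation a scheduler performs to assign a pending pod to a node. The sketch below builds that request in Python. It illustrates the standard v1 Binding object, not Pod Pinner's code, and the pod and node names are hypothetical. The API server only accepts a binding for a pod that has not been assigned a node yet, which the can_bind helper mirrors.

```python
def can_bind(pod):
    """A pod can be bound only while spec.nodeName is still empty."""
    return not pod.get("spec", {}).get("nodeName")


def binding_request(namespace, pod_name, node_name):
    """Path and body for binding a pending pod to a node via the pods/binding subresource."""
    path = f"/api/v1/namespaces/{namespace}/pods/{pod_name}/binding"
    body = {
        "apiVersion": "v1",
        "kind": "Binding",
        "metadata": {"name": pod_name, "namespace": namespace},
        "target": {"apiVersion": "v1", "kind": "Node", "name": node_name},
    }
    return path, body


pending = {"metadata": {"name": "batch-job-abc12", "namespace": "default"}, "spec": {}}
if can_bind(pending):
    path, body = binding_request("default", "batch-job-abc12", "node-pinned-1")
    print(path)  # /api/v1/namespaces/default/pods/batch-job-abc12/binding
```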