Kubernetes permissions
Kubernetes Service Accounts and permissions used by Cast AI components.
Cast AI components running on your clusters use predefined service accounts and relevant permissions to perform functions such as sending data about the cluster state. This section discusses all required service accounts and permissions granted to Cast AI components.
Kubernetes Service Accounts used by Cast AI components
Each Cast AI component installed in your cluster uses a dedicated service account. Such a setup allows you to fine-tune permissions for each component:
» kubectl get serviceAccounts -n castai-agent
NAME SECRETS AGE
castai-agent 0 97d
castai-ai-optimizer-proxy 0 55d
castai-aibrix-controller-manager 0 55d
castai-aws-node 0 57d
castai-cluster-controller 0 97d
castai-evictor 0 97d
castai-kvisor 0 97d
castai-kvisor-controller 0 97d
castai-live-controller 0 57d
castai-live-daemon 3 57d
castai-live-patch-sa 0 16d
castai-pod-mutator 0 85d
castai-pod-pinner 0 97d
castai-spot-handler 0 97d
castai-workload-autoscaler 0 97d
castware-operator-controller-manager 0 3d16h
default 0 97d
The Cast AI Operator's permissions
The Cast AI Operator manages the installation, configuration, and updating of Cast AI components. The Operator requires permissions to install and manage components through Helm charts, which include both read permissions to understand the cluster state and write permissions to create and update resources.
Privilege escalation prevention
The Operator follows Kubernetes privilege escalation prevention mechanisms. When creating or updating roles and cluster roles, the Operator's service account must already have all permissions contained in the resource being created or updated. This prevents the Operator from granting itself or other components permissions it doesn't already have.
A role or cluster role that grants more than its creator already holds can only be created or updated by a subject that has the escalate verb on the roles or clusterroles resource. The Operator does not have that verb.
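The escalation check described above can be illustrated with a simplified model. This is a minimal sketch, not the Kubernetes implementation: it ignores resourceNames and non-resource URLs, and the names `Rule`, `make_rule`, `is_covered`, and `can_grant` are hypothetical helpers introduced only for this example. The core API group is written as the empty string, as it is in RBAC manifests.

```python
from itertools import product
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Simplified RBAC rule: sets of API groups, resources, and verbs."""
    api_groups: frozenset
    resources: frozenset
    verbs: frozenset


def make_rule(api_groups, resources, verbs) -> Rule:
    return Rule(frozenset(api_groups), frozenset(resources), frozenset(verbs))


def _matches(held: frozenset, item: str) -> bool:
    # A wildcard in the held set matches anything; otherwise require an exact entry.
    return "*" in held or item in held


def is_covered(held_rules: List[Rule], wanted: Rule) -> bool:
    """True if every (group, resource, verb) combination in `wanted`
    is allowed by at least one rule the creator already holds."""
    for group, resource, verb in product(
        wanted.api_groups, wanted.resources, wanted.verbs
    ):
        if not any(
            _matches(h.api_groups, group)
            and _matches(h.resources, resource)
            and _matches(h.verbs, verb)
            for h in held_rules
        ):
            return False
    return True


def can_grant(held_rules: List[Rule], role_rules: List[Rule]) -> bool:
    """A role may be created only if all of its rules are already held."""
    return all(is_covered(held_rules, wanted) for wanted in role_rules)


# Two of the shared agent/Operator rules from this page.
operator_rules = [
    make_rule([""], ["configmaps"], ["get", "list", "watch"]),
    make_rule(["coordination.k8s.io"], ["leases"],
              ["create", "get", "list", "watch", "update"]),
]

agent_role = [make_rule([""], ["configmaps"], ["get", "list"])]
too_broad_role = [make_rule([""], ["secrets"], ["get"])]

print(can_grant(operator_rules, agent_role))      # allowed: subset of held rules
print(can_grant(operator_rules, too_broad_role))  # rejected: secrets are not held
```

In this model, a role that asks for anything outside the creator's own rules is rejected, which is why the Operator needs every permission it hands to the components it installs.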
Phase 1 permissions
Phase 1 permissions allow the Operator to install and manage the castai-agent component. These permissions include all the read-only permissions the agent needs, plus additional write permissions to manage the agent's lifecycle.
Namespace-wide permissions in castai-agent
Namespace-level permissions in the castai-agent namespace are lower risk than cluster-level permissions because they only affect Cast AI resources.
Permissions shared by both the agent and Operator:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| coordination.k8s.io | leases | create, get, list, watch, update | Required for leader election |
| core | configmaps | get, list, watch | Required for cost savings estimation features |
| apps | deployments | patch | Required for the Cluster Proportional Vertical Autoscaler to adjust castai-agent requests/limits. Scoped to the castai-agent deployment only |
Operator-only permissions in castai-agent namespace:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| apps | deployments | create, delete, get, list, patch, update | Required to create/update/delete agent deployment |
| core | secrets | create, delete, get, list, patch, update, watch | Required to create/update/delete secrets containing Helm release state and Operator webhook certificates |
| core | serviceaccounts | create, delete, get, list, patch, update, watch | Required to create/update/delete agent service accounts |
| core | services | create, delete, get, list, patch, update, watch | Required to create/update/delete Operator webhook services |
| policy | poddisruptionbudgets | create, delete, get, list, patch, update, watch | Required to create/update/delete agent pod disruption budget |
| rbac.authorization.k8s.io | rolebindings, roles | create, delete, get, list, patch, update, watch | Required to create/update/delete agent role and role binding |
| core | pods/log | get | Required by the Operator to read agent logs and extract cluster ID |
| core | resourcequotas | create, delete, get, list, patch, update, watch | Required to create/update/delete agent resource quota |
| core | configmaps | create, update, patch, delete | Required to create/update/delete agent configmap and webhook CA configmap |
| batch | jobs | create, delete | Required to run self-upgrade and post-uninstall jobs |
| coordination.k8s.io | leases | create, update, patch, delete | Required for Operator leader election |
| core | events | create, patch | Required for event recording |
Operator-only permissions in kube-system namespace:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| rbac.authorization.k8s.io | roles, rolebindings | delete | Required by the Operator to uninstall the agent |
Resource names are scoped to castai-agent only.
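Scoping by resource name is expressed in RBAC with the `resourceNames` field of a rule. The following is a minimal sketch of what such a rule could look like for the kube-system table above, built as a Python dictionary and printed as JSON (kubectl accepts JSON manifests as well as YAML). The role name `example-operator-uninstall` and the helper `scoped_delete_role` are hypothetical; the actual objects created by the Cast AI Helm charts may be named and structured differently.

```python
import json


def scoped_delete_role(namespace: str, role_name: str, resource_names: list) -> dict:
    """Build a Role that allows deleting only the named roles and role bindings."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["rbac.authorization.k8s.io"],
                "resources": ["roles", "rolebindings"],
                "verbs": ["delete"],
                # Restricts the rule to objects with these names.
                "resourceNames": list(resource_names),
            }
        ],
    }


# Hypothetical role name; namespace and resource name come from the table above.
manifest = scoped_delete_role(
    "kube-system", "example-operator-uninstall", ["castai-agent"]
)
print(json.dumps(manifest, indent=2))
```

With a rule like this, a delete request for any role or role binding other than castai-agent in kube-system is denied.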
Namespace-wide permissions in kube-system
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| core | configmaps | get, list, watch | Required by the agent only in kube-system |
Cluster-wide permissions
Cluster-level permissions are required by the agent to read resources in the cluster and create snapshots to send to the platform, and by the Operator to grant those permissions to the agent.
Permissions shared by both the agent and Operator (read-only):
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| core | pods, nodes, replicationcontrollers, persistentvolumeclaims, persistentvolumes, services, namespaces, events, limitranges, resourcequotas | get, list, watch | Required for cost savings estimation features |
| core | namespaces | get | |
| apps | deployments, replicasets, daemonsets, statefulsets | get, list, watch | |
| storage.k8s.io | storageclasses, csinodes | get, list, watch | |
| batch | jobs, cronjobs | get, list, watch | |
| autoscaling | horizontalpodautoscalers | get, list, watch | |
| metrics.k8s.io | pods | get, list | |
| policy | poddisruptionbudgets | get, list, watch | |
| (non-resource URL) | /version | get | |
| networking.k8s.io | networkpolicies, ingresses | get, list, watch | Required for security and k8s compliance reporting |
| rbac.authorization.k8s.io | roles, rolebindings, clusterroles, clusterrolebindings | get, list, watch | Required for security and k8s compliance reporting |
| karpenter.sh | provisioners, machines, nodepools, nodeclaims, nodeoverlays | get, list, watch | Required for Karpenter resource monitoring |
| karpenter.k8s.aws | awsnodetemplates, ec2nodeclasses | get, list, watch | Required for Karpenter AWS resource monitoring |
| datadoghq.com | extendeddaemonsetreplicasets | get, list, watch | Required for Datadog resource monitoring |
| argoproj.io | rollouts | get, list, watch | Required for Argo Rollouts monitoring |
| autoscaling.cast.ai | recommendations | get, list, watch | Required for Cast AI autoscaling recommendations |
| pod-mutations.cast.ai | podmutations | get, list, watch | Required for Cast AI pod mutation monitoring |
| resource.k8s.io | deviceclasses, resourceclaims, resourceclaimtemplates, resourceslices, devicetaintrules | get, list, watch | Required for Kubernetes resource management API monitoring |
| runbooks.cast.ai | recommendationsyncs | get, list, watch | Required for Agentic AI workflows |
Operator-only cluster permissions:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| castware.cast.ai | clusters, components | create, delete, get, list, patch, update, watch | Required to create/update/delete Operator custom resources |
| castware.cast.ai | clusters/status, components/status | get, patch, update | Required to update status of Operator custom resources |
| castware.cast.ai | clusters/finalizers, components/finalizers | update | Required to add finalizers to Operator custom resources |
| apiextensions.k8s.io | customresourcedefinitions | create, delete, get, list, patch, update | Required to create/update/delete Operator CRDs during installation, runtime, and upgrades. Scoped to clusters.castware.cast.ai and components.castware.cast.ai only. The create, update, and patch permissions enable automatic CRD upgrades during Helm operations and allow the Operator to self-manage CRD schemas. |
| rbac.authorization.k8s.io | clusterrolebindings, clusterroles | delete | Required to delete the agent cluster role and cluster role binding when the agent is uninstalled. Scoped to castai-agent only |
| rbac.authorization.k8s.io | clusterrolebindings, clusterroles, rolebindings, roles | create, patch | Required to create RBAC resources for managed components |
| admissionregistration.k8s.io | mutatingwebhookconfigurations, validatingwebhookconfigurations | get, list, patch, update, watch | Required to read and update Operator webhook configurations |
Extended permissions (Phase 2)
Phase 2 permissions are optional and not required to install the agent. They're disabled by default and can be enabled by setting the Helm value extendedPermissions to true. These permissions are enabled automatically when running the install script for cluster-controller with the Operator.
Phase 2 permissions match those required by the Cluster Controller because the Operator cannot escalate privileges and needs the same permissions as the components it installs.
Namespace-wide permissions in castai-agent (Phase 2)
When Phase 2 permissions are enabled, both the Cluster Controller and Operator become namespace administrators for the castai-agent namespace:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| * | * | * | Full administrative permissions in castai-agent namespace |
Cluster-wide permissions (Phase 2)
Cluster-level Phase 2 permissions are granted to both the Cluster Controller and Operator:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| core | nodes, pods | get, list, watch | Read-only permissions required for autoscaler |
| core | nodes | patch, update | Write permissions required for node draining, patching, and deletion |
| core | nodes, pods | delete | Write permissions required for node draining, patching, and deletion |
| core | nodes/status | patch | Required to update node status |
| core | pods/eviction | create | Required for pod eviction |
| certificates.k8s.io | certificatesigningrequests/approval | patch, update | Read/write permissions required for CSR approval |
| certificates.k8s.io | signers | approve | Required for approving kubelet certificates. Scoped to kubernetes.io/kube-apiserver-client-kubelet and kubernetes.io/kubelet-serving |
| certificates.k8s.io | certificatesigningrequests | create, delete, get, list, watch | Required for certificate signing request management |
| core | namespaces | delete, get | Required to install/update/uninstall Cast AI components. Delete scoped to castai-llms only |
| core | namespaces | get | Required for namespace access |
| core | namespaces | delete | Required to delete Cast AI namespace. Scoped to castai-agent only |
| autoscaling.cast.ai | * | create, delete, get, list, patch, update, watch | Required for autoscaling CRD management |
| live.cast.ai | * | create, delete, get, list, patch, update, watch | Required for Live Migration |
| pod-mutations.cast.ai | * | create, delete, get, list, patch, update, watch | Required for pod mutations related custom resources |
| storage.k8s.io | volumeattachments | delete, get, list | Required to manipulate volume attachments |
The Cast AI agent's permissions (Read-only)
The Cast AI agent collects cluster operational details (snapshots) and delivers them to the central platform to assess whether there is room for optimization. To do so, it needs the following cluster-wide read permissions:
| API Group | Resources | Verbs |
|---|---|---|
| core | pods, nodes, replicationcontrollers, persistentvolumeclaims, persistentvolumes, services | get, list, watch |
| core | namespaces | get |
| apps | deployments, replicasets, daemonsets, statefulsets | get, list, watch |
| storage.k8s.io | storageclasses, csinodes | get, list, watch |
| batch | jobs | get, list, watch |
The Cast AI agent's resource consumption depends heavily on the cluster size, so the agent must be able to adjust its resource limits in proportion to the size of your cluster. For that purpose, the Cluster Proportional Vertical Autoscaler patches the Cast AI agent's deployment with re-estimated limits, which requires the following permission:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| apps | deployments | patch | Used only to patch the castai-agent deployment |
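The read-only grants listed above can be spot-checked with `kubectl auth can-i` while impersonating the agent's service account. The sketch below only builds and prints the commands; it does not run them. It assumes the service account is named castai-agent in the castai-agent namespace, as in the listing earlier on this page, and that your own user is allowed to impersonate service accounts. The helper names are hypothetical.

```python
import shlex


def service_account_user(namespace: str, name: str) -> str:
    """Username that Kubernetes assigns to a service account."""
    return f"system:serviceaccount:{namespace}:{name}"


def can_i_command(verb: str, resource: str, namespace: str, service_account: str) -> list:
    """Build a `kubectl auth can-i` invocation for a cluster-wide check."""
    return [
        "kubectl", "auth", "can-i", verb, resource,
        "--as", service_account_user(namespace, service_account),
        "--all-namespaces",
    ]


# (verb, resource) pairs taken from the agent table above; resources outside
# the core group use kubectl's TYPE.GROUP form.
checks = [
    ("get", "pods"),
    ("list", "nodes"),
    ("watch", "deployments.apps"),
    ("list", "jobs.batch"),
    ("delete", "pods"),  # not in the table, so the expected answer is "no"
]

for verb, resource in checks:
    print(shlex.join(can_i_command(verb, resource, "castai-agent", "castai-agent")))
```

Each printed command answers "yes" or "no" when run against a cluster. A "yes" for the delete check would indicate that the service account holds more than the read-only permissions documented here.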
Cluster Controller's permissions
Cast AI's Cluster Controller component is installed when your connected cluster moves to Phase 2, in which you can enable managed cost savings:
» kubectl get deployments -n castai-agent
NAME READY UP-TO-DATE AVAILABLE AGE
castai-agent 1/1 1 1 43h
castai-cluster-controller 2/2 2 2 64m
castai-evictor 0/0 0 0 64m
Cluster-wide permissions used by the Cluster Controller
The Cluster Controller operates mostly at the cluster level because it performs the operations required to optimize the cluster's costs:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| core | namespaces | get | |
| core | pods, nodes | get, list | |
| core | nodes | patch, update | Used for node draining and patching. |
| core | pods, nodes | delete | |
| core | pods/eviction | create | |
| certificates.k8s.io | certificatesigningrequests | get, list, delete, create | Used for creating a new certificate when adding a node to the cluster. |
| certificates.k8s.io | certificatesigningrequests/approval | patch, update | Used for creating a new certificate when adding a node to the cluster. |
| certificates.k8s.io | signers | approve | Applicable only for kubelet. |
| core | events | list, create, patch | |
| rbac.authorization.k8s.io | roles, clusterroles, clusterrolebindings | get, patch, update, delete, escalate | Applicable to all Cast AI components. |
| core | namespaces | delete | Applicable only to the Cast AI agent. |
Namespace-wide (castai-agent) permissions used by the Cluster Controller
One of the main tasks of the Cluster Controller is to update Cast AI components. The Cluster Controller is granted all permissions in the castai-agent namespace that are necessary for current and future changes.
Additionally, the Cluster Controller has two cluster-wide permissions: one for managing the RBAC of Cast AI components and one for deleting the Cast AI namespace (see the table above).
Evictor permissions
When you onboard the cluster to Phase 2 for automated cost optimization, Cast AI installs additional components besides the Cluster Controller. One such component is Evictor, which minimizes the number of nodes your cluster uses.
Cluster-wide permissions used by Evictor
When installed, Evictor handles non-Cast AI pods, so it requires a set of cluster-wide permissions:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| core | events | create, patch | |
| core | nodes | get, list, watch, patch, update | Used to find a suitable node for eviction. |
| core | pods | get, list, watch, patch, update, create, delete | List pods to find a suitable node for eviction and delete a stuck pod from the node. |
| apps | replicasets | get | Used to find out if it's safe to evict a pod (it belongs to a ReplicaSet that has replicas). |
| core | pods/eviction | create | Used for pod eviction. |
| coordination.k8s.io | leases | * | Used for leader election, so that only a single instance is active at a time. |
Pod Pinner permissions
To function correctly, Pod Pinner requires the following cluster-wide permissions:
| API Group | Resources | Verbs | Description |
|---|---|---|---|
| core | nodes | create, delete, get | Required to execute pod pinning actions |
| core | pods | delete, get, list | Required to execute pod pinning actions |
| core | pods/binding | create | Required to execute pod pinning actions |
| admissionregistration.k8s.io | mutatingwebhookconfigurations | list, watch | Required for Webhook functionality |
| admissionregistration.k8s.io | mutatingwebhookconfigurations | watch, get, patch, update | Required for Webhook functionality. Scoped to pod-pinner resource only |
