Once automated cost optimization is enabled for a cluster, the CAST AI central system can start performing operations at the cloud provider level (AWS, GCP, or Azure). An example of such an operation is a request to add a node to the cluster.

Performing such operations requires relevant credentials and permissions specific to your Cloud Service Provider. This guide describes permission setups for AWS, GCP, and Azure.


AWS permissions with access granted using a cross-account IAM role

When enabling cost optimization for a connected cluster, you grant permissions using a cross-account IAM role.

This feature creates a dedicated user for the cluster in the CAST AI AWS account. A trust policy on the role defined in your AWS account allows that user to assume the role.

Keeping the role definition and the user in separate AWS accounts lets CAST AI store the user's credentials on its own side, so they are never handed over when you run the onboarding script. This improves security.

You can verify the set of permissions using the following commands:

aws iam list-attached-role-policies --role-name <role name>
aws iam list-role-policies --role-name <role name>

{
    "PolicyVersion": {
        "Document": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "PassRoleEC2",
                    "Action": "iam:PassRole",
                    "Effect": "Allow",
                    "Resource": "arn:aws:iam::*:role/*",
                    "Condition": {
                        "StringEquals": {
                            "iam:PassedToService": ""
                        }
                    }
                },
                {
                    "Sid": "NonResourcePermissions",
                    "Effect": "Allow",
                    "Action": [
                        ...
                    ],
                    "Resource": "*"
                },
                {
                    "Sid": "RunInstancesPermissions",
                    "Effect": "Allow",
                    "Action": "ec2:RunInstances",
                    "Resource": [
                        ...
                    ]
                }
            ]
        },
        "VersionId": "v83",
        "IsDefaultVersion": true,
        "CreateDate": "2022-05-12T12:49:01+00:00"
    }
}

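To review what a policy version like the one above actually grants, the JSON can be inspected programmatically. The following is a minimal Python sketch, assuming the output has the `PolicyVersion` shape shown above (for example, as returned by `aws iam get-policy-version`). The function name `allowed_actions_by_sid` and the trimmed sample data are hypothetical and only illustrate the idea.

```python
import json


def allowed_actions_by_sid(policy_version_json):
    """Map each Allow statement's Sid to the list of actions it grants."""
    document = json.loads(policy_version_json)["PolicyVersion"]["Document"]
    statements = document["Statement"]
    if isinstance(statements, dict):  # IAM permits a single bare statement
        statements = [statements]

    result = {}
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        result[statement.get("Sid", f"statement-{index}")] = list(actions)
    return result


# Hypothetical, heavily trimmed sample; real output contains more statements.
sample_output = json.dumps({
    "PolicyVersion": {
        "Document": {
            "Version": "2012-10-17",
            "Statement": [
                {"Sid": "PassRoleEC2", "Effect": "Allow",
                 "Action": "iam:PassRole", "Resource": "arn:aws:iam::*:role/*"},
                {"Sid": "RunInstancesPermissions", "Effect": "Allow",
                 "Action": "ec2:RunInstances", "Resource": "*"},
            ],
        },
        "IsDefaultVersion": True,
    }
})

for sid, actions in allowed_actions_by_sid(sample_output).items():
    print(f"{sid}: {', '.join(actions)}")
```

The sketch only lists what each statement allows. It does not evaluate resources or conditions, so treat it as a review aid rather than a policy simulator.
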
Additionally, you can create a trust relationship with the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:user/cast-crossrole-f8f82b9c-d375-40d2-9483-123456789012"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

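A trust policy like the one above can be checked to confirm that it names the expected CAST AI user. The following is a rough Python sketch under simplifying assumptions: it matches principals by exact ARN only and ignores wildcards, `Condition` blocks, and `Deny` statements. The function name `can_assume_role` is hypothetical, and the ARN is the placeholder from the example above.

```python
import json


def can_assume_role(trust_policy_json, principal_arn):
    """Return True if an Allow statement lets principal_arn call sts:AssumeRole."""
    statements = json.loads(trust_policy_json).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):  # e.g. "*", not handled here
            continue
        aws_principals = principal.get("AWS", [])
        if isinstance(aws_principals, str):
            aws_principals = [aws_principals]
        if principal_arn in aws_principals:
            return True
    return False


# Placeholder ARN copied from the example trust relationship.
CAST_USER_ARN = (
    "arn:aws:iam::123456789012:user/"
    "cast-crossrole-f8f82b9c-d375-40d2-9483-123456789012"
)

sample_trust_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": CAST_USER_ARN},
            "Action": "sts:AssumeRole",
        }
    ],
})

print(can_assume_role(sample_trust_policy, CAST_USER_ARN))
```
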

The GCP service account used by CAST AI

The onboarding script creates a dedicated GCP service account that CAST AI uses to request and manage GCP resources on your behalf.

The service account name follows the castai-gke-<cluster-name-hash> naming convention. You can verify the service account with the following command:

gcloud iam service-accounts describe castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com
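
The naming convention can be turned into the full service account email with a small helper. This is an illustrative Python sketch: it assumes GCP's standard `NAME@PROJECT_ID.iam.gserviceaccount.com` email format, takes the cluster name hash as an input (how CAST AI derives the hash is not covered here), and uses hypothetical function names and placeholder values.

```python
def service_account_email(cluster_name_hash, project_id):
    """Build the service account email from the castai-gke-<hash> convention."""
    return f"castai-gke-{cluster_name_hash}@{project_id}.iam.gserviceaccount.com"


def describe_command(cluster_name_hash, project_id):
    """Return the gcloud invocation as an argument list (nothing is executed)."""
    email = service_account_email(cluster_name_hash, project_id)
    return ["gcloud", "iam", "service-accounts", "describe", email]


# Hypothetical placeholder values, not a real cluster or project.
print(" ".join(describe_command("abc123", "my-project")))
```
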

The service account created by CAST AI includes the following roles:

| Role name | Description |
| --- | --- |
| castai.gkeAccess | A CAST AI-managed role used to handle CAST AI add/delete node operations. You can find a full list of permissions below. |
| container.developer | A GCP-managed role for full access to Kubernetes API objects inside the Kubernetes cluster. |
| iam.serviceAccountUser | A GCP-managed role to allow running operations as a service account. |

IAM Conditions

When that service account is created, you can enforce conditional, attribute-based access on the iam.serviceAccountUser role.
The service account can either act as all other service accounts, or be scoped to only those used by node pools in the GKE cluster. The scoped option is more secure and recommended, and the onboarding script uses it by default.

When onboarding the cluster with Terraform, you can use the castai_gke_iam module to specify which method you want to use. You can find an example here.

List of castai.gkeAccess role permissions:

gcloud iam roles describe --project=<your-project-name> castai.gkeAccess

description: Role to manage GKE cluster via CAST AI
etag: example-tag
includedPermissions:
- compute.addresses.use
- compute.disks.create
- compute.disks.setLabels
- compute.disks.use
- compute.images.useReadOnly
- compute.instanceGroupManagers.get
- compute.instanceGroupManagers.update
- compute.instanceGroups.get
- compute.instanceTemplates.create
- compute.instanceTemplates.delete
- compute.instanceTemplates.get
- compute.instanceTemplates.list
- compute.instances.create
- compute.instances.delete
- compute.instances.get
- compute.instances.list
- compute.instances.setLabels
- compute.instances.setMetadata
- compute.instances.setServiceAccount
- compute.instances.setTags
- compute.instances.start
- compute.instances.stop
- compute.networks.use
- compute.networks.useExternalIp
- compute.subnetworks.get
- compute.subnetworks.use
- compute.subnetworks.useExternalIp
- compute.zones.get
- compute.zones.list
- container.certificateSigningRequests.approve
- container.clusters.get
- container.clusters.update
- container.operations.get
- resourcemanager.projects.getIamPolicy
name: projects/<your-project-name>/roles/castai.gkeAccess
stage: ALPHA
title: Role to manage GKE cluster via CAST AI
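
To confirm that the role in your project still contains the permissions you expect, the role description can be compared against a checklist. The following is a minimal Python sketch, assuming the role was exported as JSON (for example, with gcloud's `--format=json` flag), where the permissions appear under the `includedPermissions` key. The function name `missing_permissions` and the shortened sample data are hypothetical.

```python
import json


def missing_permissions(role_json, required):
    """Return the required permissions that the role does not include, sorted."""
    granted = set(json.loads(role_json).get("includedPermissions", []))
    return sorted(set(required) - granted)


# Shortened, hypothetical sample of a role description in JSON form.
sample_role = json.dumps({
    "description": "Role to manage GKE cluster via CAST AI",
    "includedPermissions": [
        "compute.instances.create",
        "compute.instances.delete",
        "container.clusters.get",
    ],
    "stage": "ALPHA",
})

# A checklist taken from the permission list above.
required = [
    "compute.instances.create",
    "compute.instances.delete",
    "container.clusters.update",
]

print(missing_permissions(sample_role, required))
```

An empty result means every permission on the checklist is present in the role.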


An overview of Azure permissions used by CAST AI

The onboarding script creates a dedicated Azure app registration for CAST AI to request and manage Azure resources on your behalf.

App registration naming follows this convention: CAST.AI ${CLUSTER_NAME}-${CASTAI_CLUSTER_ID:0:8}.

The CAST AI app registration has a custom role bound to it. The custom role name follows this naming convention: CastAKSRole-${CASTAI_CLUSTER_ID:0:8}.
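
In both conventions, `${CASTAI_CLUSTER_ID:0:8}` is a shell substring expansion that takes the first eight characters of the cluster ID. The following Python sketch derives the same names, which can help when searching for these resources in Azure. The function name `azure_resource_names` and the sample cluster name and ID are hypothetical.

```python
def azure_resource_names(cluster_name, castai_cluster_id):
    """Return (app registration name, custom role name) per the conventions."""
    short_id = castai_cluster_id[:8]  # equivalent to ${CASTAI_CLUSTER_ID:0:8}
    app_registration = f"CAST.AI {cluster_name}-{short_id}"
    role_name = f"CastAKSRole-{short_id}"
    return app_registration, role_name


# Hypothetical cluster name and ID, for illustration only.
app_name, role_name = azure_resource_names(
    "my-aks-cluster", "f8f82b9c-d375-40d2-9483-123456789012"
)
print(app_name)
print(role_name)
```
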

The role only has a predefined list of permissions to manage cluster resource groups.

List of CastAKSRole role permissions:

{
   "Name": "'"$ROLE_NAME"'",
   "Description": "CAST.AI role used to manage '"$CLUSTER_NAME"' AKS cluster",
   "IsCustom": true,
   "Actions": [...],
   "AssignableScopes": [...]
}