Cloud permissions

Permissions Cast AI uses with cloud providers (AWS, GCP, Azure, Oracle)

Permission setup by cloud provider

Cast AI requires different permission sets depending on the integration type: cluster onboarding and automation, Cloud Connect resource discovery, or Commitments data import.

Permissions for cluster onboarding and automation

When a cluster enters automated cost optimization mode, Cast AI performs operations at the cloud provider level, such as adding or removing nodes. These operations require specific credentials and permissions.

AWS

📘

Note

Cast AI strongly advises creating a dedicated IAM role and instance profile for Cast AI-managed nodes.

Reusing IAM roles with EKS Managed Node Groups is not recommended and can lead to unexpected service outages if AWS deletes the profile and role when the last managed node group is removed.

AWS permissions with access granted using a cross-account IAM role

When enabling cost optimization for a connected cluster, you grant permissions using a cross-account IAM role.

This creates a dedicated per-cluster IAM user in the Cast AI AWS account. The IAM role in your AWS account has a trust policy that allows this user to assume it.

Keeping the role and the user in separate AWS accounts improves security: the user's credentials are stored on the Cast AI side and are never handed over when you run the onboarding script.

You can verify the set of permissions and trust relationship on your IAM role using the following commands:

aws iam get-role --role-name <role name>
aws iam list-attached-role-policies --role-name <role name>
aws iam list-role-policies --role-name <role name>

The following AWS-managed permissions are required for Cast AI to work:

In addition, the following custom policy is necessary:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PassRoleEC2",
            "Action": "iam:PassRole",
            "Effect": "Allow",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "ec2.amazonaws.com"
                }
            }
        },
        {
            "Sid": "NonResourcePermissions",
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole",
                "ec2:CreateKeyPair",
                "ec2:DeleteKeyPair",
                "ec2:CreateTags",
                "ec2:ImportKeyPair"
            ],
            "Resource": "*"
        },
        {
            "Sid": "RunInstancesPermissions",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                "arn:aws:ec2:*:<account-id>:network-interface/*",
                "arn:aws:ec2:*:<account-id>:security-group/*",
                "arn:aws:ec2:*:<account-id>:volume/*",
                "arn:aws:ec2:*:<account-id>:key-pair/*",
                "arn:aws:ec2:*::image/*"
            ]
        },
        {
            "Sid": "RunInstancesTagRestriction",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:<region>:<account-id>:instance/*",
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/kubernetes.io/cluster/<cluster-name>": "owned"
                }
            }
        },
        {
            "Sid": "RunInstancesVpcRestriction",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:<region>:<account-id>:subnet/*",
            "Condition": {
                "StringEquals": {
                    "ec2:Vpc": "arn:aws:ec2:<region>:<account-id>:vpc/<vpc-id>"
                }
            }
        },
        {
            "Sid": "InstanceActionsTagRestriction",
            "Effect": "Allow",
            "Action": [
                "ec2:TerminateInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:<region>:<account-id>:instance/*",
            "Condition": {
                "StringEquals": {
                    "ec2:ResourceTag/kubernetes.io/cluster/<cluster-name>": [
                        "owned",
                        "shared"
                    ]
                }
            }
        },
        {
            "Sid": "AutoscalingActionsTagRestriction",
            "Effect": "Allow",
            "Action": [
                "autoscaling:UpdateAutoScalingGroup",
                "autoscaling:SuspendProcesses",
                "autoscaling:ResumeProcesses",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "arn:aws:autoscaling:<region>:<account-id>:autoScalingGroup:*:autoScalingGroupName/*",
            "Condition": {
                "StringEquals": {
                    "autoscaling:ResourceTag/kubernetes.io/cluster/<cluster-name>": [
                        "owned",
                        "shared"
                    ]
                }
            }
        },
        {
            "Sid": "EKS",
            "Effect": "Allow",
            "Action": [
                "eks:Describe*",
                "eks:List*",
                "eks:TagResource",
                "eks:UntagResource"
            ],
            "Resource": [
                "arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>",
                "arn:aws:eks:<region>:<account-id>:nodegroup/<cluster-name>/*/*"
            ]
        }
    ]
}
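When you create this policy by hand instead of through the onboarding script, each <region>, <account-id>, <cluster-name>, and <vpc-id> placeholder must be replaced with a real value. The Python sketch below shows one way to do that while failing fast on a missed placeholder. The render_policy helper and the sample values are hypothetical and not part of any Cast AI tooling.

```python
import json
import re

# Matches placeholders such as <region>, <account-id>, <cluster-name>, <vpc-id>.
PLACEHOLDER = re.compile(r"<([a-z-]+)>")


def render_policy(template, values):
    """Replace <placeholder> tokens and return the parsed policy document.

    Hypothetical helper: raises ValueError if a placeholder has no value.
    """

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise ValueError(f"no value supplied for <{key}>")
        return values[key]

    rendered = PLACEHOLDER.sub(substitute, template)
    # json.loads also confirms the result is still valid JSON.
    return json.loads(rendered)


# A trimmed excerpt of the custom policy; the sample values are made up.
template = """{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "EKS",
    "Effect": "Allow",
    "Action": ["eks:Describe*", "eks:List*"],
    "Resource": "arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>"
  }]
}"""
policy = render_policy(
    template,
    {"region": "eu-central-1", "account-id": "123456789012", "cluster-name": "demo"},
)
print(policy["Statement"][0]["Resource"])
# arn:aws:eks:eu-central-1:123456789012:cluster/demo
```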

The IAM role needs the following trust relationship to allow Cast AI to assume it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::809060229965:user/cast-crossrole-<cluster-id>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<cluster-id>"
                }
            }
        }
    ]
}
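The trust relationship differs between clusters only in the cluster ID, which appears twice: in the Cast AI user name and as the external ID. The following Python sketch builds the document for a given cluster ID. The cluster ID in the example is a made-up placeholder; substitute your own Cast AI cluster ID.

```python
import json

# Cast AI AWS account ID, as shown in the trust relationship above.
CASTAI_ACCOUNT_ID = "809060229965"


def trust_policy(cluster_id):
    """Build the trust policy that lets the per-cluster Cast AI user assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{CASTAI_ACCOUNT_ID}:user/cast-crossrole-{cluster_id}"
                },
                "Action": "sts:AssumeRole",
                # The external ID must equal the Cast AI cluster ID.
                "Condition": {"StringEquals": {"sts:ExternalId": cluster_id}},
            }
        ],
    }


# Hypothetical cluster ID, for illustration only.
doc = trust_policy("11111111-2222-3333-4444-555555555555")
print(json.dumps(doc, indent=2))
```

You could save the output to a file and apply it with `aws iam update-assume-role-policy --role-name <role name> --policy-document file://trust.json`.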

EBS volume permissions

If your cluster uses Amazon EBS volumes, make sure the IAM roles associated with your nodes have the permissions needed to manage those volumes, including attaching and detaching them.

To manage EBS volumes, add the following permissions to your IAM role:

{
    "Sid": "ManageEBSVolumes",
    "Effect": "Allow",
    "Action": [
        "ec2:AttachVolume",
        "ec2:DetachVolume"
    ],
    "Resource": "*"
}

You can add these permissions to your existing IAM role policies or attach the AWS-managed AmazonEBSCSIDriverPolicy, which already includes them.

Example standalone policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageEBSVolumes",
            "Effect": "Allow",
            "Action": [
                "ec2:AttachVolume",
                "ec2:DetachVolume"
            ],
            "Resource": "*"
        }
    ]
}

Ensure that these permissions are attached to the IAM role used by the instances running in your cluster to avoid issues with persistent volumes.
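If you maintain the node role's policy as JSON, you can add the statement above idempotently. The following is a minimal Python sketch, assuming the policy document is already loaded as a dict (for example, from a file you manage); add_statement is a hypothetical helper.

```python
import copy

EBS_STATEMENT = {
    "Sid": "ManageEBSVolumes",
    "Effect": "Allow",
    "Action": ["ec2:AttachVolume", "ec2:DetachVolume"],
    "Resource": "*",
}


def add_statement(policy, statement):
    """Return a copy of policy with statement added, replacing any statement with the same Sid."""
    merged = copy.deepcopy(policy)
    statements = merged.setdefault("Statement", [])
    # IAM accepts a single statement object as well as a list; normalise to a list.
    if isinstance(statements, dict):
        statements = [statements]
    statements = [s for s in statements if s.get("Sid") != statement["Sid"]]
    statements.append(copy.deepcopy(statement))
    merged["Statement"] = statements
    return merged


existing = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "Other", "Effect": "Allow", "Action": "ec2:DescribeVolumes", "Resource": "*"}
    ],
}
updated = add_statement(existing, EBS_STATEMENT)
print([s["Sid"] for s in updated["Statement"]])  # ['Other', 'ManageEBSVolumes']
```

Because a statement with the same Sid is replaced rather than duplicated, running the merge again leaves the policy unchanged.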

IPv6 support for EKS clusters

If your cluster uses IPv6 addressing, your nodes need an additional IAM permission so that IPv6 addresses can be assigned to pods.

Add the following policy to the IAM role in your nodes' instance profile:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:AssignIpv6Addresses",
            "Resource": "*"
        }
    ]
}

📘

Note

The Cast AI onboarding script and Terraform module create this policy by default. You only need to add it manually if you set up permissions outside these standard deployment methods.
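When you set up permissions manually, it can help to sanity-check that a policy grants ec2:AssignIpv6Addresses. The Python sketch below does a rough, Allow-only wildcard match on action names; it ignores Deny, NotAction, Resource, and Condition, so it is not a policy evaluator. For an authoritative answer, use the IAM policy simulator (for example, `aws iam simulate-principal-policy`). The allows_action helper is hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single value or a list in Statement and Action."""
    return value if isinstance(value, list) else [value]


def allows_action(policy, action):
    """Rough check: does any Allow statement list an action pattern matching `action`?

    IAM action names are case-insensitive and support * and ? wildcards,
    so both sides are lowercased before matching.
    """
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for pattern in _as_list(stmt.get("Action", [])):
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


ipv6_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ec2:AssignIpv6Addresses", "Resource": "*"}
    ],
}
print(allows_action(ipv6_policy, "ec2:AssignIpv6Addresses"))    # True
print(allows_action(ipv6_policy, "ec2:UnassignIpv6Addresses"))  # False
```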

GCP

The GCP service account used by Cast AI

📘

Note

To learn more about using service account impersonation on GKE, see GKE service account impersonation.

The onboarding script creates a dedicated GCP service account that Cast AI uses to request and manage GCP resources on your behalf.

The service account name follows the castai-gke-<cluster-name-hash> convention. You can verify the service account with the following command:

gcloud iam service-accounts describe castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com

The service account created by Cast AI includes the following roles:

Role name | Description
castai.gkeAccess | A Cast AI-managed role used to handle Cast AI add/delete node operations. You can find a full list of permissions below.
iam.serviceAccountUser | A GCP-managed role to allow running operations as a service account.

📘

Important update for existing customers

As of 2024-08-19, Cast AI no longer requires the roles/container.developer role for new cluster onboarding. This change enhances security by reducing permissions. If you onboarded your cluster before this date, you may still see the container.developer role assigned to your Cast AI service account. You can safely remove this role to align with our updated, more secure permissions model. To remove the role, use the following command:

gcloud projects remove-iam-policy-binding <your-gcp-project> \
    --member=serviceAccount:castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com \
    --role=roles/container.developer

This action will not affect Cast AI's ability to manage your cluster.
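Before removing the binding, you may want to confirm that the role is still assigned. The Python sketch below reads a project IAM policy as JSON (for example, the output of `gcloud projects get-iam-policy <your-gcp-project> --format=json`) and lists the roles bound to one member. The project name, service account hash, and sample policy are hypothetical.

```python
import json


def roles_for_member(policy_json, member):
    """Return the roles bound to `member` in a project-level IAM policy (JSON).

    Only project-level bindings are inspected; roles granted on other
    resources (such as individual service accounts) are not included.
    """
    policy = json.loads(policy_json)
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    )


sa = "serviceAccount:castai-gke-abc123@my-project.iam.gserviceaccount.com"
# Hypothetical, trimmed output of:
#   gcloud projects get-iam-policy my-project --format=json
sample_policy = json.dumps({
    "bindings": [
        {"role": "roles/container.developer", "members": [sa]},
        {"role": "projects/my-project/roles/castai.gkeAccess", "members": [sa]},
        {"role": "roles/viewer", "members": ["user:alice@example.com"]},
    ]
})
print(roles_for_member(sample_policy, sa))
# ['projects/my-project/roles/castai.gkeAccess', 'roles/container.developer']
```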

IAM conditions

When the service account is created, you can use IAM Conditions to enforce conditional, attribute-based access on its iam.serviceAccountUser role binding.

The condition controls which service accounts Cast AI can act as: either all other service accounts, or only the service accounts used by the node pools in the GKE cluster. The narrower scope is more secure and recommended, and the onboarding script uses it by default.

When onboarding the cluster with Terraform, you can use the castai_gke_iam module to specify which method you want to use. You can find an example here.

Required APIs to be enabled for a GCP project

API | Description
cloudresourcemanager.googleapis.com | API to create, read, and update metadata for GCP resource containers.
serviceusage.googleapis.com | API to list, enable, and disable GCP services.
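To check a project against this list, you can compare it with the names of its enabled services. A minimal Python sketch, assuming the input is one service name per line, as printed by `gcloud services list --enabled --format="value(config.name)"`; the sample input is made up.

```python
REQUIRED_APIS = {
    "cloudresourcemanager.googleapis.com",
    "serviceusage.googleapis.com",
}


def missing_apis(enabled):
    """Return the required APIs that are not in the list of enabled services."""
    return REQUIRED_APIS - {name.strip() for name in enabled if name.strip()}


# Hypothetical list of enabled services, one per line.
enabled_output = """compute.googleapis.com
container.googleapis.com
serviceusage.googleapis.com
"""
print(sorted(missing_apis(enabled_output.splitlines())))
# ['cloudresourcemanager.googleapis.com']
```

Any API reported as missing can be enabled with `gcloud services enable <api>`.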

List of castai.gkeAccess role permissions:

gcloud iam roles describe --project=<your-project-name> castai.gkeAccess

description: Role to manage GKE cluster via CAST AI
etag: example-tag
includedPermissions:
- compute.addresses.use
- compute.disks.create
- compute.disks.setLabels
- compute.disks.use
- compute.images.useReadOnly
- compute.images.get
- compute.instanceGroupManagers.get
- compute.instanceGroupManagers.update
- compute.instanceGroups.get
- compute.instanceTemplates.create
- compute.instanceTemplates.delete
- compute.instanceTemplates.get
- compute.instanceTemplates.list
- compute.instances.create
- compute.instances.delete
- compute.instances.get
- compute.instances.list
- compute.instances.setLabels
- compute.instances.setMetadata
- compute.instances.setServiceAccount
- compute.instances.setTags
- compute.instances.start
- compute.instances.stop
- compute.networks.use
- compute.networks.useExternalIp
- compute.subnetworks.get
- compute.subnetworks.use
- compute.subnetworks.useExternalIp
- compute.zones.get
- compute.zones.list
- compute.zoneOperations.get
- compute.regionOperations.get
- container.certificateSigningRequests.approve
- container.clusters.get
- container.clusters.update
- container.operations.get
- serviceusage.services.list
- resourcemanager.projects.getIamPolicy
name: projects/<your-project-name>/roles/castai.gkeAccess
stage: ALPHA
title: Role to manage GKE cluster via Cast AI
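To check whether the role in your project still matches an expected permission list (for example, after someone edits it manually), you can request the same output as JSON with gcloud's global `--format=json` flag and compare sets. A minimal Python sketch; the sample JSON is trimmed and hypothetical, and the expected set is yours to maintain.

```python
import json


def permission_drift(role_json, expected):
    """Compare a role's includedPermissions with an expected set.

    Returns (missing, extra): permissions expected but absent, and
    permissions present but not expected.
    """
    role = json.loads(role_json)
    actual = set(role.get("includedPermissions", []))
    return expected - actual, actual - expected


# Hypothetical, trimmed output of:
#   gcloud iam roles describe castai.gkeAccess --project=<your-project-name> --format=json
sample = json.dumps({
    "name": "projects/demo/roles/castai.gkeAccess",
    "includedPermissions": ["compute.instances.create", "compute.instances.delete"],
})
expected = {"compute.instances.create", "compute.instances.delete", "compute.instances.get"}
missing, extra = permission_drift(sample, expected)
print(sorted(missing), sorted(extra))  # ['compute.instances.get'] []
```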

Azure

An overview of Azure permissions used by Cast AI

The onboarding script creates a dedicated Azure app registration for Cast AI to request and manage Azure resources on your behalf.

The app registration name follows this convention: "CAST.AI ${CLUSTER_NAME}-${CASTAI_CLUSTER_ID:0:8}".

The Cast AI app registration has a custom role bound to it. The custom role name follows this convention: CastAKSRole-${CASTAI_CLUSTER_ID:0:8}.

The role is limited to a predefined list of permissions, and its assignable scopes are restricted to the cluster's resource group and node resource group.

List of CastAKSRole role permissions:

ROLE_NAME="CastAKSRole-${CASTAI_CLUSTER_ID:0:8}"
ROLE_DEF='{
   "Name": "'"$ROLE_NAME"'",
   "Description": "CAST.AI role used to manage '"$CLUSTER_NAME"' AKS cluster",
   "IsCustom": true,
   "Actions": [
       "Microsoft.Compute/*/read",
       "Microsoft.Compute/virtualMachines/*",
       "Microsoft.Compute/virtualMachineScaleSets/*",
       "Microsoft.Compute/disks/write",
       "Microsoft.Compute/disks/delete",
       "Microsoft.Compute/disks/beginGetAccess/action",
       "Microsoft.Compute/galleries/write",
       "Microsoft.Compute/galleries/delete",
       "Microsoft.Compute/galleries/images/write",
       "Microsoft.Compute/galleries/images/delete",
       "Microsoft.Compute/galleries/images/versions/write",
       "Microsoft.Compute/galleries/images/versions/delete",
       "Microsoft.Compute/snapshots/write",
       "Microsoft.Compute/snapshots/delete",
       "Microsoft.Network/*/read",
       "Microsoft.Network/networkInterfaces/write",
       "Microsoft.Network/networkInterfaces/delete",
       "Microsoft.Network/networkInterfaces/join/action",
       "Microsoft.Network/networkSecurityGroups/join/action",
       "Microsoft.Network/virtualNetworks/subnets/join/action",
       "Microsoft.Network/applicationGateways/backendhealth/action",
       "Microsoft.Network/applicationGateways/backendAddressPools/join/action",
       "Microsoft.Network/applicationSecurityGroups/joinIpConfiguration/action",
       "Microsoft.Network/loadBalancers/backendAddressPools/write",
       "Microsoft.Network/loadBalancers/backendAddressPools/join/action",
       "Microsoft.ContainerService/*/read",
       "Microsoft.ContainerService/managedClusters/start/action",
       "Microsoft.ContainerService/managedClusters/stop/action",
       "Microsoft.ContainerService/managedClusters/runCommand/action",
       "Microsoft.ContainerService/managedClusters/agentPools/*",
       "Microsoft.Resources/*/read",
       "Microsoft.Resources/tags/write",
       "Microsoft.Authorization/locks/read",
       "Microsoft.Authorization/roleAssignments/read",
       "Microsoft.Authorization/roleDefinitions/read",
       "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action"
   ],
   "AssignableScopes": [
       "/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$CLUSTER_GROUP"'",
       "/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$NODE_GROUP"'"
   ]
}'
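The Bash snippet relies on nested quoting to splice variables into JSON. Expressing the same structure in Python makes the naming and scoping rules easier to see: the role name uses the first eight characters of the Cast AI cluster ID, and the assignable scopes are the cluster and node resource groups. This is an illustrative sketch; all argument values are hypothetical and the actions list is truncated.

```python
import json


def cast_aks_role(cluster_name, castai_cluster_id, subscription_id,
                  cluster_group, node_group, actions):
    """Build a role definition shaped like the CastAKSRole definition above.

    castai_cluster_id[:8] mirrors the Bash expansion ${CASTAI_CLUSTER_ID:0:8}.
    """
    short_id = castai_cluster_id[:8]
    scope_prefix = f"/subscriptions/{subscription_id}/resourceGroups"
    return {
        "Name": f"CastAKSRole-{short_id}",
        "Description": f"CAST.AI role used to manage {cluster_name} AKS cluster",
        "IsCustom": True,
        "Actions": list(actions),
        # Assignable only to the cluster resource group and the node resource group.
        "AssignableScopes": [
            f"{scope_prefix}/{cluster_group}",
            f"{scope_prefix}/{node_group}",
        ],
    }


role = cast_aks_role(
    cluster_name="my-aks",
    castai_cluster_id="0123abcd-0000-0000-0000-000000000000",
    subscription_id="00000000-0000-0000-0000-000000000000",
    cluster_group="my-rg",
    node_group="MC_my-rg_my-aks_westeurope",
    actions=["Microsoft.Compute/*/read", "Microsoft.Network/*/read"],  # truncated list
)
print(role["Name"])  # CastAKSRole-0123abcd
print(json.dumps(role, indent=2))
```

The resulting JSON has the same shape as ROLE_DEF and could be passed to `az role definition create --role-definition`, which accepts either a JSON string or a path to a JSON file.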

Permissions for Cloud Connect

Cloud Connect creates read-only IAM roles or service accounts to discover and inventory cloud resources. Permission scopes can be configured during setup to control the level of access granted.

📘

Note

For AWS, discovered resources are synchronized every hour. For GCP, Cloud Connect currently performs a one-time discovery.

AWS

Default

Attaches the AWS-managed ReadOnlyAccess policy (arn:aws:iam::aws:policy/ReadOnlyAccess), which provides read-only access to all AWS services.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "*"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": [
            "ec2.amazonaws.com"
          ]
        }
      }
    }
  ]
}

Minimal permissions

Grants only the specific read permissions needed for core AWS services, including EC2, EKS, RDS, SageMaker, CloudWatch metrics, and commitments data.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeAddresses",
        "eks:ListClusters",
        "eks:DescribeCluster",
        "rds:DescribeDBInstances",
        "rds:DescribeDBClusters",
        "sagemaker:ListEndpoints",
        "sagemaker:DescribeEndpoint",
        "sagemaker:ListTransformJobs",
        "sagemaker:DescribeTransformJob",
        "sagemaker:ListTrainingJobs",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:ListNotebookInstances",
        "sagemaker:DescribeNotebookInstance",
        "sagemaker:ListProcessingJobs",
        "sagemaker:DescribeProcessingJob",
        "cloudwatch:GetMetricData",
        "savingsplans:Describe*",
        "savingsplans:List*",
        "ec2:DescribeReservedInstances",
        "ec2:DescribeReservedInstancesListings",
        "ec2:DescribeReservedInstancesModifications",
        "ec2:DescribeReservedInstancesOfferings",
        "organizations:ListAccounts",
        "organizations:DescribeOrganization",
        "account:ListRegions"
      ],
      "Resource": "*"
    }
  ]
}

AWS AI services

Provides List*, Describe*, and Get* permissions for AWS AI and machine learning services.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:List*",
        "bedrock:Describe*",
        "bedrock:Get*",
        "cloudwatch:List*",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "codeguru:List*",
        "codeguru:Describe*",
        "codeguru:Get*",
        "codeguru-reviewer:List*",
        "codeguru-reviewer:Describe*",
        "codeguru-reviewer:Get*",
        "comprehend:List*",
        "comprehend:Describe*",
        "comprehend:Get*",
        "comprehendmedical:List*",
        "comprehendmedical:Describe*",
        "comprehendmedical:Get*",
        "devops-guru:List*",
        "devops-guru:Describe*",
        "devops-guru:Get*",
        "forecast:List*",
        "forecast:Describe*",
        "forecast:Get*",
        "frauddetector:List*",
        "frauddetector:Describe*",
        "frauddetector:Get*",
        "healthlake:List*",
        "healthlake:Describe*",
        "healthlake:Get*",
        "kendra:List*",
        "kendra:Describe*",
        "kendra:Get*",
        "lex:List*",
        "lex:Describe*",
        "lex:Get*",
        "lookoutequipment:List*",
        "lookoutequipment:Describe*",
        "lookoutequipment:Get*",
        "lookoutmetrics:List*",
        "lookoutmetrics:Describe*",
        "lookoutmetrics:Get*",
        "lookoutvision:List*",
        "lookoutvision:Describe*",
        "lookoutvision:Get*",
        "monitron:List*",
        "monitron:Describe*",
        "monitron:Get*",
        "personalize:List*",
        "personalize:Describe*",
        "personalize:Get*",
        "polly:List*",
        "polly:Describe*",
        "polly:Get*",
        "rekognition:List*",
        "rekognition:Describe*",
        "rekognition:Get*",
        "sagemaker:List*",
        "sagemaker:Describe*",
        "sagemaker:Get*",
        "q:List*",
        "q:Describe*",
        "q:Get*",
        "textract:List*",
        "textract:Describe*",
        "textract:Get*",
        "transcribe:List*",
        "transcribe:Describe*",
        "transcribe:Get*",
        "translate:List*",
        "translate:Describe*",
        "translate:Get*",
        "deepcomposer:List*",
        "deepcomposer:Describe*",
        "deepcomposer:Get*",
        "deepracer:List*",
        "deepracer:Describe*",
        "deepracer:Get*",
        "panorama:List*",
        "panorama:Describe*",
        "panorama:Get*"
      ],
      "Resource": "*"
    }
  ]
}
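The AI services policy follows one pattern: each service prefix gets List*, Describe*, and Get* wildcards. Generating the Action list from the prefixes makes that pattern explicit and easier to review. The following Python sketch reproduces the Action list above in the same order; the helper name is hypothetical.

```python
# Service prefixes from the AWS AI services policy above, in the same order.
AI_SERVICES = [
    "bedrock", "cloudwatch", "codeguru", "codeguru-reviewer", "comprehend",
    "comprehendmedical", "devops-guru", "forecast", "frauddetector", "healthlake",
    "kendra", "lex", "lookoutequipment", "lookoutmetrics", "lookoutvision",
    "monitron", "personalize", "polly", "rekognition", "sagemaker", "q",
    "textract", "transcribe", "translate", "deepcomposer", "deepracer", "panorama",
]
READ_VERBS = ("List*", "Describe*", "Get*")


def read_only_actions(services):
    """Expand each service prefix into its List*, Describe*, and Get* wildcards."""
    return [f"{svc}:{verb}" for svc in services for verb in READ_VERBS]


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": read_only_actions(AI_SERVICES), "Resource": "*"}
    ],
}
print(len(policy["Statement"][0]["Action"]))  # 81
```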

GCP

Default

Uses the standard GCP roles for comprehensive resource access:

- roles/reader
- roles/viewer

Minimal permissions

Grants specific viewer roles for essential Cast AI functionality:

- roles/compute.viewer
- roles/container.viewer
- roles/cloudsql.viewer
- roles/billing.viewer
- roles/resourcemanager.projectViewer
- roles/aiplatform.viewer

Oracle

Default

Oracle Cloud Infrastructure uses policy statements for resource access:

Allow group CastAI-Group to read compartments in tenancy
Allow group CastAI-Group to read cluster-family in tenancy
Allow group CastAI-Group to read instance-family in tenancy
Allow group CastAI-Group to read volume-family in tenancy
Allow group CastAI-Group to read database-family in tenancy
Allow group CastAI-Group to read virtual-network-family in tenancy
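All six statements share one template: a group, the read verb, a resource type or family, and a location. A small Python sketch that produces them from that template, assuming the group is named CastAI-Group as in the statements above; the read_statements helper is hypothetical.

```python
# Resource types and families from the statements above.
RESOURCE_FAMILIES = [
    "compartments",
    "cluster-family",
    "instance-family",
    "volume-family",
    "database-family",
    "virtual-network-family",
]


def read_statements(group, families=RESOURCE_FAMILIES, location="tenancy"):
    """Return one 'Allow group ... to read ... in ...' statement per resource family."""
    return [f"Allow group {group} to read {family} in {location}" for family in families]


statements = read_statements("CastAI-Group")
print("\n".join(statements))
```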

Permission configuration

Permission scopes are configurable during the Cloud Connect integration setup, so you can select the permission level that matches your security requirements.

(Screenshot: example permission scope selection for Cloud Connect.)

Permissions for Commitments import

Commitments import provides access to reserved instances, savings plans, and cost commitment data.

AWS

📘

Note

For AWS, commitment data is synchronized every hour.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "savingsplans:Describe*",
        "savingsplans:List*",
        "ec2:DescribeReservedInstances",
        "ec2:DescribeReservedInstancesListings",
        "ec2:DescribeReservedInstancesModifications",
        "ec2:DescribeReservedInstancesOfferings",
        "organizations:ListAccounts",
        "organizations:DescribeOrganization",
        "account:ListRegions"
      ],
      "Resource": "*"
    }
  ]
}

Azure

Importing Azure commitments requires read access to subscription and billing data:

{
  "Name": "CastAI-Commitments-Reader",
  "Actions": [
    "Microsoft.Consumption/*/read",
    "Microsoft.Billing/*/read",
    "Microsoft.Commerce/*/read",
    "Microsoft.CostManagement/*/read",
    "Microsoft.Resources/subscriptions/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read"
  ],
  "AssignableScopes": [
    "/subscriptions/{subscription-id}"
  ]
}