Cloud permissions
Permissions used by Cloud Providers (AWS/GCP/Azure/Oracle)
Cast AI requires different permission sets depending on the integration type: cluster onboarding and automation, Cloud Connect resource discovery, or Commitments data import.
Permissions for cluster onboarding and automation
When a cluster enters automated cost optimization mode, Cast AI performs operations at the cloud provider level, such as adding or removing nodes. These operations require specific credentials and permissions.
AWS
Note: Cast AI strongly advises creating a dedicated IAM role and instance profile for Cast AI-managed nodes.
Reusing IAM roles with EKS Managed Node Groups is not recommended and can lead to unexpected service outages if AWS deletes the profile and role when the last managed node group is removed.
AWS permissions with access granted using a cross-account IAM role
When enabling cost optimization for a connected cluster, you grant permissions using a cross-account IAM role.
This feature creates a dedicated cluster user in the Cast AI AWS account. The trust policy on the role defined in your AWS account allows that user to assume the role.
Keeping role definitions and users in separate AWS accounts lets Cast AI store the user's credentials on its side, so they are never handed over when you run the onboarding script. This improves security.
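The cross-account relationship described above can be sketched as a small check against a role's trust policy. This is a simplified illustration, not a full IAM policy evaluator: the function name `allows_assume_role` and the cluster ID `1234abcd` are hypothetical, and the check only handles the plain `StringEquals` form of the `sts:ExternalId` condition.

```python
def allows_assume_role(trust_policy, principal_arn, external_id):
    """Simplified check: does any Allow statement let principal_arn call
    sts:AssumeRole when it presents external_id? Not a full IAM evaluator."""
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*"; out of scope for this sketch
        principals = principal.get("AWS", [])
        if isinstance(principals, str):
            principals = [principals]
        if principal_arn not in principals:
            continue
        condition = stmt.get("Condition", {}).get("StringEquals", {})
        if condition.get("sts:ExternalId") == external_id:
            return True
    return False


CLUSTER_ID = "1234abcd"  # hypothetical cluster ID, for illustration only
CAST_USER = f"arn:aws:iam::809060229965:user/cast-crossrole-{CLUSTER_ID}"

TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": CAST_USER},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": CLUSTER_ID}},
        }
    ],
}

print(allows_assume_role(TRUST_POLICY, CAST_USER, CLUSTER_ID))  # True
print(allows_assume_role(TRUST_POLICY, CAST_USER, "other-id"))  # False
```

The second call shows why the external ID matters: the same principal is rejected when it presents an ID that does not match the one in the trust policy.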
You can verify the set of permissions and trust relationship on your IAM role using the following commands:
aws iam get-role --role-name <role name>
aws iam list-attached-role-policies --role-name <role name>
aws iam list-role-policies --role-name <role name>
The following AWS-managed permissions are required for Cast AI to work:
In addition, the following custom policy is necessary:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "PassRoleEC2",
"Action": "iam:PassRole",
"Effect": "Allow",
"Resource": "arn:aws:iam::*:role/*",
"Condition": {
"StringEquals": {
"iam:PassedToService": "ec2.amazonaws.com"
}
}
},
{
"Sid": "NonResourcePermissions",
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole",
"ec2:CreateKeyPair",
"ec2:DeleteKeyPair",
"ec2:CreateTags",
"ec2:ImportKeyPair"
],
"Resource": "*"
},
{
"Sid": "RunInstancesPermissions",
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": [
"arn:aws:ec2:*:<account-id>:network-interface/*",
"arn:aws:ec2:*:<account-id>:security-group/*",
"arn:aws:ec2:*:<account-id>:volume/*",
"arn:aws:ec2:*:<account-id>:key-pair/*",
"arn:aws:ec2:*::image/*"
]
},
{
"Sid": "RunInstancesTagRestriction",
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:<region>:<account-id>:instance/*",
"Condition": {
"StringEquals": {
"aws:RequestTag/kubernetes.io/cluster/<cluster-name>": "owned"
}
}
},
{
"Sid": "RunInstancesVpcRestriction",
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:<region>:<account-id>:subnet/*",
"Condition": {
"StringEquals": {
"ec2:Vpc": "arn:aws:ec2:<region>:<account-id>:vpc/<vpc-id>"
}
}
},
{
"Sid": "InstanceActionsTagRestriction",
"Effect": "Allow",
"Action": [
"ec2:TerminateInstances",
"ec2:StartInstances",
"ec2:StopInstances",
"ec2:CreateTags"
],
"Resource": "arn:aws:ec2:<region>:<account-id>:instance/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/kubernetes.io/cluster/<cluster-name>": [
"owned",
"shared"
]
}
}
},
{
"Sid": "AutoscalingActionsTagRestriction",
"Effect": "Allow",
"Action": [
"autoscaling:UpdateAutoScalingGroup",
"autoscaling:SuspendProcesses",
"autoscaling:ResumeProcesses",
"autoscaling:TerminateInstanceInAutoScalingGroup"
],
"Resource": "arn:aws:autoscaling:<region>:<account-id>:autoScalingGroup:*:autoScalingGroupName/*",
"Condition": {
"StringEquals": {
"autoscaling:ResourceTag/kubernetes.io/cluster/<cluster-name>": [
"owned",
"shared"
]
}
}
},
{
"Sid": "EKS",
"Effect": "Allow",
"Action": [
"eks:Describe*",
"eks:List*",
"eks:TagResource",
"eks:UntagResource"
],
"Resource": [
"arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>",
"arn:aws:eks:<region>:<account-id>:nodegroup/<cluster-name>/*/*"
]
}
]
}
The IAM role needs the following trust relationship to allow Cast AI to assume it:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::809060229965:user/cast-crossrole-<cluster-id>"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<cluster-id>"
}
}
}
]
}
EBS Volume Permissions
If your cluster uses Amazon EBS volumes, ensure that the IAM roles associated with your nodes have the necessary permissions to manage them, including the ability to attach and detach them.
To manage EBS volumes, add the following permissions to your IAM role:
{
"Sid": "ManageEBSVolumes",
"Effect": "Allow",
"Action": [
"ec2:AttachVolume",
"ec2:DetachVolume"
],
"Resource": "*"
}
You can include these permissions in your existing IAM role policies or attach the AmazonEBSCSIDriverPolicy, which includes these permissions.
Example Policy Statement:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ManageEBSVolumes",
"Effect": "Allow",
"Action": [
"ec2:AttachVolume",
"ec2:DetachVolume"
],
"Resource": "*"
}
]
}
Ensure that these permissions are attached to the IAM role used by the instances running in your cluster to avoid issues with persistent volumes.
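As a rough way to confirm that a policy document grants the two EBS actions named above, the following sketch scans the `Allow` statements and reports any required action that no statement covers. It is a simplification: the helper name `missing_actions` is hypothetical, and `Deny` statements, `Resource`, and `Condition` elements are deliberately ignored.

```python
from fnmatch import fnmatchcase

REQUIRED_EBS_ACTIONS = {"ec2:AttachVolume", "ec2:DetachVolume"}


def missing_actions(policy, required=REQUIRED_EBS_ACTIONS):
    """Return the required actions not matched by any Allow statement.
    Action wildcards such as "ec2:*" are matched case-insensitively."""
    patterns = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        patterns.extend([actions] if isinstance(actions, str) else actions)
    return {
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p.lower()) for p in patterns)
    }


ebs_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageEBSVolumes",
            "Effect": "Allow",
            "Action": ["ec2:AttachVolume", "ec2:DetachVolume"],
            "Resource": "*",
        }
    ],
}
attach_only = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ec2:AttachVolume", "Resource": "*"}
    ],
}

print(missing_actions(ebs_policy))   # set()
print(missing_actions(attach_only))  # {'ec2:DetachVolume'}
```

In practice, the policy dictionary would come from the JSON returned by the AWS CLI or console for the role attached to your nodes.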
IPv6 support for EKS clusters
If your cluster uses IPv6 addressing, you need an additional IAM policy to allow pods to assign IPv6 addresses.
Add the following policy to your instance profile:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "ec2:AssignIpv6Addresses",
"Resource": "*"
}
]
}
Note: The Cast AI onboarding script and Terraform module automatically create this policy by default. You only need to add it manually if you're setting up permissions outside of these standard deployment methods.
GCP
The GCP service account used by Cast AI
Note: To learn more about using service account impersonation on GKE, see GKE service account impersonation.
The onboarding script creates a dedicated GCP service account that Cast AI uses to request and manage GCP resources on your behalf.
The service account follows a castai-gke-<cluster-name-hash> convention. You can verify the service account with the following command:
gcloud iam service-accounts describe castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com
The service account created by Cast AI includes the following roles:
| Role name | Description |
|---|---|
| castai.gkeAccess | A Cast AI-managed role used to handle Cast AI add/delete node operations. You can find a full list of permissions below. |
| iam.serviceAccountUser | A GCP-managed role to allow running operations as a service account. |
Important update for existing customers
As of 2024-08-19, Cast AI no longer requires the roles/container.developer role for new cluster onboarding. This change enhances security by reducing permissions. If you onboarded your cluster before this date, you may still see the container.developer role assigned to your Cast AI service account. You can safely remove this role to align with our updated, more secure permissions model. To remove the role, use the following command:
gcloud projects remove-iam-policy-binding YOUR_PROJECT_ID \
  --member=serviceAccount:castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com \
  --role=roles/container.developer
This action will not affect Cast AI's ability to manage your cluster.
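To see whether the legacy role is still bound before removing it, one option is to inspect the project's IAM policy as JSON (for example, the output of `gcloud projects get-iam-policy <project> --format=json`). The sketch below assumes that JSON shape, a `bindings` list of `role` and `members` entries. The helper name `roles_for_member`, the project name, and the service account hash are hypothetical.

```python
import json


def roles_for_member(policy, member):
    """List the roles bound to `member` in a project IAM policy document."""
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    )


# Hypothetical policy excerpt; a real one would be loaded from gcloud output.
sample_policy = json.loads("""
{
  "bindings": [
    {
      "role": "roles/container.developer",
      "members": ["serviceAccount:castai-gke-abc123@example-project.iam.gserviceaccount.com"]
    },
    {
      "role": "roles/iam.serviceAccountUser",
      "members": ["serviceAccount:castai-gke-abc123@example-project.iam.gserviceaccount.com"]
    },
    {
      "role": "roles/viewer",
      "members": ["user:someone@example.com"]
    }
  ]
}
""")

cast_member = "serviceAccount:castai-gke-abc123@example-project.iam.gserviceaccount.com"
bound_roles = roles_for_member(sample_policy, cast_member)
print(bound_roles)
print("roles/container.developer" in bound_roles)  # True
```

If the last line prints `True` for your own policy, the legacy role is still assigned to the service account.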
IAM Conditions
When creating that service account, you can enforce conditional, attribute-based access on the iam.serviceAccountUser role.
The Cast AI service account can either act as all other service accounts or be scoped to only the service accounts used by node pools in the GKE cluster. The scoped option is more secure and recommended, and the onboarding script uses it by default.
When onboarding the cluster with Terraform, you can use the castai_gke_iam module to specify which method you want to use. You can find an example here.
Required APIs to be enabled for a GCP project
| API | Description |
|---|---|
| cloudresourcemanager.googleapis.com | API to create, read, and update metadata for GCP resource containers. |
| serviceusage.googleapis.com | API to list, enable, and disable GCP services. |
List of castai.gkeAccess role permissions:
gcloud iam roles describe --project=<your-project-name> castai.gkeAccess
description: Role to manage GKE cluster via CAST AI
etag: example-tag
includedPermissions:
- compute.addresses.use
- compute.disks.create
- compute.disks.setLabels
- compute.disks.use
- compute.images.useReadOnly
- compute.images.get
- compute.instanceGroupManagers.get
- compute.instanceGroupManagers.update
- compute.instanceGroups.get
- compute.instanceTemplates.create
- compute.instanceTemplates.delete
- compute.instanceTemplates.get
- compute.instanceTemplates.list
- compute.instances.create
- compute.instances.delete
- compute.instances.get
- compute.instances.list
- compute.instances.setLabels
- compute.instances.setMetadata
- compute.instances.setServiceAccount
- compute.instances.setTags
- compute.instances.start
- compute.instances.stop
- compute.networks.use
- compute.networks.useExternalIp
- compute.subnetworks.get
- compute.subnetworks.use
- compute.subnetworks.useExternalIp
- compute.zones.get
- compute.zones.list
- compute.zoneOperations.get
- compute.regionOperations.get
- container.certificateSigningRequests.approve
- container.clusters.get
- container.clusters.update
- container.operations.get
- serviceusage.services.list
- resourcemanager.projects.getIamPolicy
name: projects/<your-project-name>/roles/castai.gkeAccess
stage: ALPHA
title: Role to manage GKE cluster via Cast AI
Azure
An overview of Azure permissions used by Cast AI
The onboarding script creates a dedicated Azure app registration for Cast AI to request and manage Azure resources on your behalf.
App registration naming follows this convention: CAST.AI ${CLUSTER_NAME}-${CASTAI_CLUSTER_ID:0:8}.
The created app registration has a custom role bound to it. The custom role name follows this convention: CastAKSRole-${CASTAI_CLUSTER_ID:0:8}.
The role contains only a predefined list of permissions for managing the cluster's resource groups.
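The naming conventions above rely on the shell expansion `${CASTAI_CLUSTER_ID:0:8}`, which takes the first eight characters of the cluster ID. A minimal sketch of the same derivation follows. The function name `azure_resource_names` and the sample cluster name and ID are hypothetical.

```python
def azure_resource_names(cluster_name, cluster_id):
    """Derive the app registration and custom role names from a cluster."""
    short_id = cluster_id[:8]  # equivalent to ${CASTAI_CLUSTER_ID:0:8} in bash
    return {
        "app_registration": f"CAST.AI {cluster_name}-{short_id}",
        "custom_role": f"CastAKSRole-{short_id}",
    }


# Hypothetical values, for illustration only.
names = azure_resource_names("demo-aks", "0123abcd-4567-89ef-0123-456789abcdef")
print(names["app_registration"])  # CAST.AI demo-aks-0123abcd
print(names["custom_role"])       # CastAKSRole-0123abcd
```

Knowing the derived names makes it easier to locate the app registration and role in the Azure portal for a given cluster.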
List of CastAKSRole role permissions:
ROLE_NAME="CastAKSRole-${CASTAI_CLUSTER_ID:0:8}"
ROLE_DEF='{
"Name": "'"$ROLE_NAME"'",
"Description": "CAST.AI role used to manage '"$CLUSTER_NAME"' AKS cluster",
"IsCustom": true,
"Actions": [
"Microsoft.Compute/*/read",
"Microsoft.Compute/virtualMachines/*",
"Microsoft.Compute/virtualMachineScaleSets/*",
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/delete",
"Microsoft.Compute/disks/beginGetAccess/action",
"Microsoft.Compute/galleries/write",
"Microsoft.Compute/galleries/delete",
"Microsoft.Compute/galleries/images/write",
"Microsoft.Compute/galleries/images/delete",
"Microsoft.Compute/galleries/images/versions/write",
"Microsoft.Compute/galleries/images/versions/delete",
"Microsoft.Compute/snapshots/write",
"Microsoft.Compute/snapshots/delete",
"Microsoft.Network/*/read",
"Microsoft.Network/networkInterfaces/write",
"Microsoft.Network/networkInterfaces/delete",
"Microsoft.Network/networkInterfaces/join/action",
"Microsoft.Network/networkSecurityGroups/join/action",
"Microsoft.Network/virtualNetworks/subnets/join/action",
"Microsoft.Network/applicationGateways/backendhealth/action",
"Microsoft.Network/applicationGateways/backendAddressPools/join/action",
"Microsoft.Network/applicationSecurityGroups/joinIpConfiguration/action",
"Microsoft.Network/loadBalancers/backendAddressPools/write",
"Microsoft.Network/loadBalancers/backendAddressPools/join/action",
"Microsoft.ContainerService/*/read",
"Microsoft.ContainerService/managedClusters/start/action",
"Microsoft.ContainerService/managedClusters/stop/action",
"Microsoft.ContainerService/managedClusters/runCommand/action",
"Microsoft.ContainerService/managedClusters/agentPools/*",
"Microsoft.Resources/*/read",
"Microsoft.Resources/tags/write",
"Microsoft.Authorization/locks/read",
"Microsoft.Authorization/roleAssignments/read",
"Microsoft.Authorization/roleDefinitions/read",
"Microsoft.ManagedIdentity/userAssignedIdentities/assign/action"
],
"AssignableScopes": [
"/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$CLUSTER_GROUP"'",
"/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$NODE_GROUP"'"
]
}'
Permissions for Cloud Connect
Cloud Connect creates read-only IAM roles or service accounts to discover and inventory cloud resources. Permission scopes can be configured during setup to control the level of access granted.
Note: For AWS, discovered resources are synchronized every hour. For GCP, Cloud Connect currently performs a one-time discovery.
AWS
Default
Grants the AWS ReadOnlyAccess managed policy, providing comprehensive read-only access to all AWS services.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"*"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"iam:PassedToService": [
"ec2.amazonaws.com"
]
}
}
}
]
}
Minimal permissions
Provides targeted permissions for core AWS services, including EC2, EKS, RDS, SageMaker, and commitments data.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeAddresses",
"eks:ListClusters",
"eks:DescribeCluster",
"rds:DescribeDBInstances",
"rds:DescribeDBClusters",
"sagemaker:ListEndpoints",
"sagemaker:DescribeEndpoint",
"sagemaker:ListTransformJobs",
"sagemaker:DescribeTransformJob",
"sagemaker:ListTrainingJobs",
"sagemaker:DescribeTrainingJob",
"sagemaker:ListNotebookInstances",
"sagemaker:DescribeNotebookInstance",
"sagemaker:ListProcessingJobs",
"sagemaker:DescribeProcessingJob",
"cloudwatch:GetMetricData",
"savingsplans:Describe*",
"savingsplans:List*",
"ec2:DescribeReservedInstances",
"ec2:DescribeReservedInstancesListings",
"ec2:DescribeReservedInstancesModifications",
"ec2:DescribeReservedInstancesOfferings",
"organizations:ListAccounts",
"organizations:DescribeOrganization",
"account:ListRegions"
],
"Resource": "*"
}
]
}
AWS AI services
Provides List*, Describe*, and Get* permissions for AWS AI and machine learning services.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:List*",
"bedrock:Describe*",
"bedrock:Get*",
"cloudwatch:List*",
"cloudwatch:Describe*",
"cloudwatch:Get*",
"codeguru:List*",
"codeguru:Describe*",
"codeguru:Get*",
"codeguru-reviewer:List*",
"codeguru-reviewer:Describe*",
"codeguru-reviewer:Get*",
"comprehend:List*",
"comprehend:Describe*",
"comprehend:Get*",
"comprehendmedical:List*",
"comprehendmedical:Describe*",
"comprehendmedical:Get*",
"devops-guru:List*",
"devops-guru:Describe*",
"devops-guru:Get*",
"forecast:List*",
"forecast:Describe*",
"forecast:Get*",
"frauddetector:List*",
"frauddetector:Describe*",
"frauddetector:Get*",
"healthlake:List*",
"healthlake:Describe*",
"healthlake:Get*",
"kendra:List*",
"kendra:Describe*",
"kendra:Get*",
"lex:List*",
"lex:Describe*",
"lex:Get*",
"lookoutequipment:List*",
"lookoutequipment:Describe*",
"lookoutequipment:Get*",
"lookoutmetrics:List*",
"lookoutmetrics:Describe*",
"lookoutmetrics:Get*",
"lookoutvision:List*",
"lookoutvision:Describe*",
"lookoutvision:Get*",
"monitron:List*",
"monitron:Describe*",
"monitron:Get*",
"personalize:List*",
"personalize:Describe*",
"personalize:Get*",
"polly:List*",
"polly:Describe*",
"polly:Get*",
"rekognition:List*",
"rekognition:Describe*",
"rekognition:Get*",
"sagemaker:List*",
"sagemaker:Describe*",
"sagemaker:Get*",
"q:List*",
"q:Describe*",
"q:Get*",
"textract:List*",
"textract:Describe*",
"textract:Get*",
"transcribe:List*",
"transcribe:Describe*",
"transcribe:Get*",
"translate:List*",
"translate:Describe*",
"translate:Get*",
"deepcomposer:List*",
"deepcomposer:Describe*",
"deepcomposer:Get*",
"deepracer:List*",
"deepracer:Describe*",
"deepracer:Get*",
"panorama:List*",
"panorama:Describe*",
"panorama:Get*"
],
"Resource": "*"
}
]
}
GCP
Default
Uses the standard GCP roles for comprehensive resource access:
- roles/reader
- roles/viewer
Minimal permissions
Grants specific viewer roles for essential Cast AI functionality:
- roles/compute.viewer
- roles/container.viewer
- roles/cloudsql.viewer
- roles/billing.viewer
- roles/resourcemanager.projectViewer
- roles/aiplatform.viewer
Oracle
Default
Oracle Cloud Infrastructure uses policy statements for resource access:
Allow group CastAI-Group to read compartments in tenancy
Allow group CastAI-Group to read cluster-family in tenancy
Allow group CastAI-Group to read instance-family in tenancy
Allow group CastAI-Group to read volume-family in tenancy
Allow group CastAI-Group to read database-family in tenancy
Allow group CastAI-Group to read virtual-network-family in tenancy
Permission Configuration
Permission scopes are configurable during the Cloud Connect integration setup. Based on their security requirements, users can select from available permission levels.
Example permission scope selection for Cloud Connect:
Permissions for Commitments import
Commitments import provides access to reserved instances, savings plans, and cost commitment data.
AWS
Note: For AWS, commitment data is synchronized every hour.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"savingsplans:Describe*",
"savingsplans:List*",
"ec2:DescribeReservedInstances",
"ec2:DescribeReservedInstancesListings",
"ec2:DescribeReservedInstancesModifications",
"ec2:DescribeReservedInstancesOfferings",
"organizations:ListAccounts",
"organizations:DescribeOrganization",
"account:ListRegions"
],
"Resource": "*"
}
]
}
Azure
Azure commitments require read access to subscription and billing data:
{
"Name": "CastAI-Commitments-Reader",
"Actions": [
"Microsoft.Consumption/*/read",
"Microsoft.Billing/*/read",
"Microsoft.Commerce/*/read",
"Microsoft.CostManagement/*/read",
"Microsoft.Resources/subscriptions/read",
"Microsoft.Resources/subscriptions/resourceGroups/read"
],
"AssignableScopes": [
"/subscriptions/{subscription-id}"
]
}
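One quick sanity check for a role definition like the one above is to confirm that every listed action is a read operation. The sketch below uses a naming heuristic only, treating an action as read-only when it ends in `/read`. The helper name `non_read_actions` is hypothetical, and the heuristic does not cover actions of the `/action` kind.

```python
def non_read_actions(actions):
    """Return the actions that do not end in '/read' (naming heuristic)."""
    return [a for a in actions if not a.lower().endswith("/read")]


commitments_actions = [
    "Microsoft.Consumption/*/read",
    "Microsoft.Billing/*/read",
    "Microsoft.Commerce/*/read",
    "Microsoft.CostManagement/*/read",
    "Microsoft.Resources/subscriptions/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
]

print(non_read_actions(commitments_actions))  # []
print(non_read_actions(commitments_actions + ["Microsoft.Compute/disks/write"]))
# ['Microsoft.Compute/disks/write']
```

An empty result for the commitments role is consistent with its stated purpose of read access to subscription and billing data.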
