Cloud permissions
Permissions used by Cloud Providers (AWS/GCP/Azure)
Once a cluster is switched to automated cost optimization, the Cast AI central system can start performing operations at the cloud provider (AWS/GCP/Azure) level. For example, it may request that a node be added to the cluster.
Performing such operations requires relevant credentials and permissions specific to your Cloud Service Provider. This guide describes permission setups for AWS, GCP, and Azure.
AWS
AWS permissions with access granted using a cross-account IAM role
When enabling cost optimization for a connected cluster, you grant permissions using a cross-account IAM role.
This approach creates a dedicated, per-cluster user in the Cast AI AWS account. The trust policy on the IAM role in your AWS account allows that user to assume the role.
Because the role definition and the user live in separate AWS accounts, the user's credentials stay on the Cast AI side and never need to be handed over when you run the onboarding script, which reduces the risk of credential exposure.
You can verify the set of permissions and trust relationship on your IAM role using the following commands:
aws iam get-role --role-name <role name>
aws iam list-attached-role-policies --role-name <role name>
aws iam list-role-policies --role-name <role name>
The following AWS-managed permissions are required for Cast AI to work:
In addition, the following custom policy is necessary:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "PassRoleEC2",
"Action": "iam:PassRole",
"Effect": "Allow",
"Resource": "arn:aws:iam::*:role/*",
"Condition": {
"StringEquals": {
"iam:PassedToService": "ec2.amazonaws.com"
}
}
},
{
"Sid": "NonResourcePermissions",
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole",
"ec2:CreateKeyPair",
"ec2:DeleteKeyPair",
"ec2:CreateTags",
"ec2:ImportKeyPair"
],
"Resource": "*"
},
{
"Sid": "RunInstancesPermissions",
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": [
"arn:aws:ec2:*:<account-id>:network-interface/*",
"arn:aws:ec2:*:<account-id>:security-group/*",
"arn:aws:ec2:*:<account-id>:volume/*",
"arn:aws:ec2:*:<account-id>:key-pair/*",
"arn:aws:ec2:*::image/*"
]
},
{
"Sid": "RunInstancesTagRestriction",
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:<region>:<account-id>:instance/*",
"Condition": {
"StringEquals": {
"aws:RequestTag/kubernetes.io/cluster/<cluster-name>": "owned"
}
}
},
{
"Sid": "RunInstancesVpcRestriction",
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:<region>:<account-id>:subnet/*",
"Condition": {
"StringEquals": {
"ec2:Vpc": "arn:aws:ec2:<region>:<account-id>:vpc/<vpc-id>"
}
}
},
{
"Sid": "InstanceActionsTagRestriction",
"Effect": "Allow",
"Action": [
"ec2:TerminateInstances",
"ec2:StartInstances",
"ec2:StopInstances",
"ec2:CreateTags"
],
"Resource": "arn:aws:ec2:<region>:<account-id>:instance/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/kubernetes.io/cluster/<cluster-name>": [
"owned",
"shared"
]
}
}
},
{
"Sid": "AutoscalingActionsTagRestriction",
"Effect": "Allow",
"Action": [
"autoscaling:UpdateAutoScalingGroup",
"autoscaling:SuspendProcesses",
"autoscaling:ResumeProcesses",
"autoscaling:TerminateInstanceInAutoScalingGroup"
],
"Resource": "arn:aws:autoscaling:<region>:<account-id>:autoScalingGroup:*:autoScalingGroupName/*",
"Condition": {
"StringEquals": {
"autoscaling:ResourceTag/kubernetes.io/cluster/<cluster-name>": [
"owned",
"shared"
]
}
}
},
{
"Sid": "EKS",
"Effect": "Allow",
"Action": [
"eks:Describe*",
"eks:List*",
"eks:TagResource",
"eks:UntagResource"
],
"Resource": [
"arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>",
"arn:aws:eks:<region>:<account-id>:nodegroup/<cluster-name>/*/*"
]
}
]
}
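The placeholders in the policy above (`<region>`, `<account-id>`, `<cluster-name>`, `<vpc-id>`) have to be replaced with real values before the policy can be created. The following is a minimal sketch of that substitution, assuming the policy is kept as a text template. The function name `render_policy`, the shortened template, and the sample values are hypothetical and are not part of the Cast AI tooling.

```python
import json
import re

# A shortened excerpt of the custom policy, used only to keep the example small.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RunInstancesVpcRestriction",
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:<region>:<account-id>:subnet/*",
      "Condition": {
        "StringEquals": {
          "ec2:Vpc": "arn:aws:ec2:<region>:<account-id>:vpc/<vpc-id>"
        }
      }
    }
  ]
}
"""

# Matches placeholders such as <region> or <account-id>.
PLACEHOLDER = re.compile(r"<([a-z-]+)>")


def render_policy(template: str, values: dict[str, str]) -> dict:
    """Replace every <placeholder> in the template and parse the result as JSON.

    Raises KeyError if the template contains a placeholder with no value,
    so a half-filled policy is never produced.
    """
    rendered = PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
    return json.loads(rendered)


# Hypothetical values for illustration only.
policy = render_policy(
    POLICY_TEMPLATE,
    {"region": "us-east-1", "account-id": "123456789012", "vpc-id": "vpc-0abc"},
)
print(json.dumps(policy, indent=2))
```

Failing on an unknown placeholder is a deliberate choice here: a policy that still contains a literal `<vpc-id>` would be syntactically valid but would not match any real resource.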
The IAM role needs the following trust relationship to allow Cast AI to assume it:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::809060229965:user/cast-crossrole-<cluster-id>"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<cluster-id>"
}
}
}
]
}
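To check that a role's trust relationship matches the one above, you can inspect the trust policy document, for example the `Role.AssumeRolePolicyDocument` field in the output of `aws iam get-role`. The sketch below assumes the document has already been loaded into a Python dictionary. The function name `trust_policy_allows` and the sample cluster ID are hypothetical.

```python
def trust_policy_allows(policy: dict, principal_arn: str, external_id: str) -> bool:
    """Return True if an Allow statement lets principal_arn call sts:AssumeRole
    with the given external ID (StringEquals on sts:ExternalId)."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "Principal": "*" is not the setup described here
        aws_principals = principal.get("AWS", [])
        if isinstance(aws_principals, str):
            aws_principals = [aws_principals]
        condition = stmt.get("Condition", {}).get("StringEquals", {})
        if principal_arn in aws_principals and condition.get("sts:ExternalId") == external_id:
            return True
    return False


# Hypothetical cluster ID, for illustration only.
CLUSTER_ID = "11111111-2222-3333-4444-555555555555"
CAST_USER_ARN = f"arn:aws:iam::809060229965:user/cast-crossrole-{CLUSTER_ID}"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": CAST_USER_ARN},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": CLUSTER_ID}},
        }
    ],
}

print(trust_policy_allows(trust_policy, CAST_USER_ARN, CLUSTER_ID))
```

This only looks at the shape shown in this guide. It does not evaluate other condition operators or Deny statements.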
EBS Volume Permissions
If your cluster uses Amazon EBS volumes, you need to ensure that the IAM roles associated with your nodes have the necessary permissions to manage them, including the ability to attach and detach them.
Permissions Required
To manage EBS volumes, add the following permissions to your IAM role:
{
"Sid": "ManageEBSVolumes",
"Effect": "Allow",
"Action": [
"ec2:AttachVolume",
"ec2:DetachVolume"
],
"Resource": "*"
}
You can include these permissions in your existing IAM role policies or attach the AmazonEBSCSIDriverPolicy, which includes these permissions.
Example Policy Statement:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ManageEBSVolumes",
"Effect": "Allow",
"Action": [
"ec2:AttachVolume",
"ec2:DetachVolume"
],
"Resource": "*"
}
]
}
Ensure that these permissions are attached to the IAM role used by the instances running in your cluster to avoid issues with persistent volumes.
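One way to confirm that a node role policy covers the EBS actions is to scan its Allow statements. The sketch below is a simplified check under stated assumptions: it matches action names case-insensitively with `*` and `?` wildcards, and it ignores Deny statements, `NotAction`, resources, and conditions, so it is not a substitute for the IAM policy simulator. The names `missing_actions` and `REQUIRED_EBS_ACTIONS` are hypothetical.

```python
from fnmatch import fnmatchcase

REQUIRED_EBS_ACTIONS = ["ec2:AttachVolume", "ec2:DetachVolume"]


def _as_list(value):
    """IAM accepts a single item where a list is expected; normalize to a list."""
    return value if isinstance(value, list) else [value]


def missing_actions(policy: dict, required: list[str]) -> list[str]:
    """Return the required actions that no Allow statement in the policy matches."""
    patterns = [
        pattern.lower()
        for stmt in _as_list(policy.get("Statement", []))
        if stmt.get("Effect") == "Allow"
        for pattern in _as_list(stmt.get("Action", []))
    ]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


ebs_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageEBSVolumes",
            "Effect": "Allow",
            "Action": ["ec2:AttachVolume", "ec2:DetachVolume"],
            "Resource": "*",
        }
    ],
}

print(missing_actions(ebs_policy, REQUIRED_EBS_ACTIONS))
```

An empty list means every required action is matched by at least one Allow statement.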
IPv6 support for EKS clusters
If your cluster uses IPv6 addressing, you need an additional IAM policy to allow pods to assign IPv6 addresses.
Add the following policy to your instance profile:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "ec2:AssignIpv6Addresses",
"Resource": "*"
}
]
}
Note
The Cast AI onboarding script and Terraform module automatically create this policy by default. You only need to add it manually if you're setting up permissions outside of these standard deployment methods.
GCP
The GCP service account used by CAST AI
The onboarding script creates a dedicated GCP service account that CAST AI uses to request and manage GCP resources on your behalf.
The service account name follows the castai-gke-<cluster-name-hash> convention. You can verify the service account with the following command:
gcloud iam service-accounts describe castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com
The service account created by CAST AI includes the following roles:
Role name | Description |
---|---|
castai.gkeAccess | A CAST AI-managed role used to handle CAST AI add/delete node operations. You can find a full list of permissions below. |
iam.serviceAccountUser | A GCP-managed role to allow running operations as a service account. |
Important update for existing customers
As of 2024-08-19, CAST AI no longer requires the roles/container.developer role for new cluster onboarding. This change enhances security by reducing permissions.
If you onboarded your cluster before this date, you may still see the container.developer role assigned to your CAST AI service account. You can safely remove this role to align with our updated, more secure permissions model.
To remove the role, use the following command:
gcloud projects remove-iam-policy-binding YOUR_PROJECT_ID \
  --member=serviceAccount:castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com \
  --role=roles/container.developer
This action will not affect CAST AI's ability to manage your cluster.
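Before removing the binding, you may want to confirm whether it is present at all. The sketch below assumes you have the project IAM policy as JSON (for example from `gcloud projects get-iam-policy YOUR_PROJECT_ID --format=json`), which contains a `bindings` list of roles and members. The function name `has_role`, the name hash, and the project ID are hypothetical.

```python
def has_role(iam_policy: dict, member: str, role: str) -> bool:
    """Return True if the IAM policy binds the given role to the given member."""
    return any(
        binding.get("role") == role and member in binding.get("members", [])
        for binding in iam_policy.get("bindings", [])
    )


# Hypothetical name hash and project ID, for illustration only.
NAME_HASH = "a1b2c3d4"
PROJECT_ID = "my-gcp-project"
MEMBER = f"serviceAccount:castai-gke-{NAME_HASH}@{PROJECT_ID}.iam.gserviceaccount.com"

# Hand-written sample in the shape of a project IAM policy.
sample_policy = {
    "bindings": [
        {"role": "roles/container.developer", "members": [MEMBER]},
        {"role": "roles/iam.serviceAccountUser", "members": [MEMBER]},
    ],
    "version": 1,
}

if has_role(sample_policy, MEMBER, "roles/container.developer"):
    print("roles/container.developer is still bound and can be removed")
```

Note that this compares the plain role and member only. Bindings that carry an IAM condition would need extra handling.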
IAM Conditions
When creating that service account, you can enforce conditional, attribute-based access on the iam.serviceAccountUser role.
The service account can either access and act as all other service accounts or be restricted to the service accounts used by node pools in the GKE cluster. The restricted option is more secure and is recommended; the onboarding script uses it by default.
When onboarding the cluster with Terraform, you can use the castai_gke_iam module to specify which method you want to use. You can find an example here.
Required APIs to be enabled for a GCP project
API | Description |
---|---|
cloudresourcemanager.googleapis.com | API to create, read, and update metadata for GCP resource containers. |
serviceusage.googleapis.com | API to list, enable and disable GCP services. |
List of castai.gkeAccess role permissions:
» gcloud iam roles describe --project=<your-project-name> castai.gkeAccess
description: Role to manage GKE cluster via CAST AI
etag: example-tag
includedPermissions:
- compute.addresses.use
- compute.disks.create
- compute.disks.setLabels
- compute.disks.use
- compute.images.useReadOnly
- compute.images.get
- compute.instanceGroupManagers.get
- compute.instanceGroupManagers.update
- compute.instanceGroups.get
- compute.instanceTemplates.create
- compute.instanceTemplates.delete
- compute.instanceTemplates.get
- compute.instanceTemplates.list
- compute.instances.create
- compute.instances.delete
- compute.instances.get
- compute.instances.list
- compute.instances.setLabels
- compute.instances.setMetadata
- compute.instances.setServiceAccount
- compute.instances.setTags
- compute.instances.start
- compute.instances.stop
- compute.networks.use
- compute.networks.useExternalIp
- compute.subnetworks.get
- compute.subnetworks.use
- compute.subnetworks.useExternalIp
- compute.zones.get
- compute.zones.list
- compute.zoneOperations.get
- compute.regionOperations.get
- container.certificateSigningRequests.approve
- container.clusters.get
- container.clusters.update
- container.operations.get
- serviceusage.services.list
- resourcemanager.projects.getIamPolicy
name: projects/<your-project-name>/roles/castai.gkeAccess
stage: ALPHA
title: Role to manage GKE cluster via CAST AI
Azure
An overview of Azure permissions used by CAST AI
The onboarding script creates a dedicated Azure app registration for CAST AI to request and manage Azure resources on your behalf.
The app registration name follows the convention CAST.AI ${CLUSTER_NAME}-${CASTAI_CLUSTER_ID:0:8}, that is, the cluster name followed by the first eight characters of the CAST AI cluster ID.
The created app registration has a custom role bound to it. The custom role name follows the convention CastAKSRole-${CASTAI_CLUSTER_ID:0:8}.
The role only has a predefined list of permissions to manage cluster resource groups.
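The naming conventions use the bash substring expansion `${CASTAI_CLUSTER_ID:0:8}`, which takes the first eight characters of the cluster ID. As a small sketch, the same names can be derived as follows. The function name `cast_azure_names`, the cluster name, and the cluster ID are hypothetical.

```python
def cast_azure_names(cluster_name: str, cluster_id: str) -> tuple[str, str]:
    """Return (app registration name, custom role name) for a cluster."""
    short_id = cluster_id[:8]  # equivalent to ${CASTAI_CLUSTER_ID:0:8} in bash
    return f"CAST.AI {cluster_name}-{short_id}", f"CastAKSRole-{short_id}"


# Hypothetical cluster name and ID, for illustration only.
app_name, role_name = cast_azure_names(
    "demo-aks", "0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"
)
print(app_name)   # CAST.AI demo-aks-0f1e2d3c
print(role_name)  # CastAKSRole-0f1e2d3c
```

Knowing the expected names makes it easier to locate the app registration and role in the Azure portal or CLI output.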
List of CastAKSRole role permissions:
ROLE_NAME="CastAKSRole-${CASTAI_CLUSTER_ID:0:8}"
ROLE_DEF='{
"Name": "'"$ROLE_NAME"'",
"Description": "CAST.AI role used to manage '"$CLUSTER_NAME"' AKS cluster",
"IsCustom": true,
"Actions": [
"Microsoft.Compute/*/read",
"Microsoft.Compute/virtualMachines/*",
"Microsoft.Compute/virtualMachineScaleSets/*",
"Microsoft.Compute/disks/write",
"Microsoft.Compute/disks/delete",
"Microsoft.Compute/disks/beginGetAccess/action",
"Microsoft.Compute/galleries/write",
"Microsoft.Compute/galleries/delete",
"Microsoft.Compute/galleries/images/write",
"Microsoft.Compute/galleries/images/delete",
"Microsoft.Compute/galleries/images/versions/write",
"Microsoft.Compute/galleries/images/versions/delete",
"Microsoft.Compute/snapshots/write",
"Microsoft.Compute/snapshots/delete",
"Microsoft.Network/*/read",
"Microsoft.Network/networkInterfaces/write",
"Microsoft.Network/networkInterfaces/delete",
"Microsoft.Network/networkInterfaces/join/action",
"Microsoft.Network/networkSecurityGroups/join/action",
"Microsoft.Network/virtualNetworks/subnets/join/action",
"Microsoft.Network/applicationGateways/backendhealth/action",
"Microsoft.Network/applicationGateways/backendAddressPools/join/action",
"Microsoft.Network/applicationSecurityGroups/joinIpConfiguration/action",
"Microsoft.Network/loadBalancers/backendAddressPools/write",
"Microsoft.Network/loadBalancers/backendAddressPools/join/action",
"Microsoft.ContainerService/*/read",
"Microsoft.ContainerService/managedClusters/start/action",
"Microsoft.ContainerService/managedClusters/stop/action",
"Microsoft.ContainerService/managedClusters/runCommand/action",
"Microsoft.ContainerService/managedClusters/agentPools/*",
"Microsoft.Resources/*/read",
"Microsoft.Resources/tags/write",
"Microsoft.Authorization/locks/read",
"Microsoft.Authorization/roleAssignments/read",
"Microsoft.Authorization/roleDefinitions/read",
"Microsoft.ManagedIdentity/userAssignedIdentities/assign/action"
],
"AssignableScopes": [
"/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$CLUSTER_GROUP"'",
"/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$NODE_GROUP"'"
]
}'
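Several entries in the `Actions` list use `*` wildcards, so it is not always obvious whether a specific operation is covered. The sketch below is a rough check, assuming that `*` matches any run of characters (including `/`) and that action names compare case-insensitively. It ignores `NotActions` and data actions. The function name `is_action_allowed` is hypothetical, and the list is only an excerpt of the role definition.

```python
from fnmatch import fnmatchcase

# Excerpt of the CastAKSRole "Actions" list, shortened for the example.
ACTIONS_EXCERPT = [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/*",
    "Microsoft.Network/networkInterfaces/write",
]


def is_action_allowed(action: str, patterns: list[str]) -> bool:
    """Return True if the action matches at least one (possibly wildcard) pattern."""
    lowered = action.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


for candidate in (
    "Microsoft.Compute/disks/read",
    "Microsoft.Compute/virtualMachines/restart/action",
    "Microsoft.Storage/storageAccounts/read",
):
    print(candidate, is_action_allowed(candidate, ACTIONS_EXCERPT))
```

Against this excerpt, the first two candidates match a wildcard entry and the third does not, because no entry covers the `Microsoft.Storage` provider.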