Cloud permissions

Permission setups used by CAST AI for each cloud provider (AWS/GCP/Azure)

When a cluster enters automated cost optimization mode, the CAST AI central system can start performing operations at the cloud provider (AWS/GCP/Azure) level. An example of such an operation is a request to add a node to a cluster.

Performing such operations requires relevant credentials and permissions specific to your Cloud Service Provider. This guide describes permission setups for AWS, GCP and Azure.

AWS

AWS permissions with access granted using a cross-account IAM role

When you enable cost optimization for a connected cluster, you grant permissions using a cross-account IAM role.

This feature creates a dedicated cluster user in the CAST AI AWS account. The trust policy of the role defined in your AWS account allows that user to assume the role.

Keeping role definitions and users in separate AWS accounts means the user's credentials stay on the CAST AI side and are never handed over when you run the onboarding script, which improves security.

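You can verify the set of permissions using the following commands:

aws iam list-attached-role-policies --role-name <role name>
aws iam list-role-policies --role-name <role name>

The policy document looks similar to the following example (shown in the format returned by aws iam get-policy-version):
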
{
    "PolicyVersion": {
        "Document": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "PassRoleEC2",
                    "Action": "iam:PassRole",
                    "Effect": "Allow",
                    "Resource": "arn:aws:iam::*:role/*",
                    "Condition": {
                        "StringEquals": {
                            "iam:PassedToService": "ec2.amazonaws.com"
                        }
                    }
                },
                {
                    "Sid": "NonResourcePermissions",
                    "Effect": "Allow",
                    "Action": [
                        "iam:CreateServiceLinkedRole",
                        "ec2:CreateKeyPair",
                        "ec2:DeleteKeyPair",
                        "ec2:CreateTags",
                        "ec2:ImportKeyPair"
                    ],
                    "Resource": "*"
                },
                {
                    "Sid": "RunInstancesPermissions",
                    "Effect": "Allow",
                    "Action": "ec2:RunInstances",
                    "Resource": [
                        "arn:aws:ec2:*:028075177508:network-interface/*",
                        "arn:aws:ec2:*:028075177508:security-group/*",
                        "arn:aws:ec2:*:028075177508:volume/*",
                        "arn:aws:ec2:*:028075177508:key-pair/*",
                        "arn:aws:ec2:*::image/*"
                    ]
                }
            ]
        },
        "VersionId": "v83",
        "IsDefaultVersion": true,
        "CreateDate": "2022-05-12T12:49:01+00:00"
    }
}
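
To review such a document programmatically, the following minimal sketch loads a policy version in the format shown above and summarizes each statement, flagging statements that apply to all resources or carry a condition. The helper names (`as_list`, `summarize_statements`) and the trimmed sample document are illustrative assumptions, not part of any CAST AI tooling.

```python
import json

# Trimmed sample in the same shape as the policy version shown above.
POLICY_VERSION_JSON = """
{
  "PolicyVersion": {
    "Document": {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "PassRoleEC2",
          "Action": "iam:PassRole",
          "Effect": "Allow",
          "Resource": "arn:aws:iam::*:role/*",
          "Condition": {"StringEquals": {"iam:PassedToService": "ec2.amazonaws.com"}}
        },
        {
          "Sid": "NonResourcePermissions",
          "Effect": "Allow",
          "Action": ["iam:CreateServiceLinkedRole", "ec2:CreateTags"],
          "Resource": "*"
        }
      ]
    }
  }
}
"""


def as_list(value):
    """IAM allows either a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def summarize_statements(policy_version):
    """Return one summary dict per statement in the policy document."""
    statements = policy_version["PolicyVersion"]["Document"]["Statement"]
    summaries = []
    for statement in statements:
        summaries.append({
            "sid": statement.get("Sid", ""),
            "actions": as_list(statement["Action"]),
            "all_resources": "*" in as_list(statement["Resource"]),
            "conditional": "Condition" in statement,
        })
    return summaries


summaries = summarize_statements(json.loads(POLICY_VERSION_JSON))
for summary in summaries:
    print(summary)
```

Statements flagged with `all_resources` are a reasonable place to start a manual review, since they are not limited to specific ARNs.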

Additionally, you can create a trust relationship with the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:user/cast-crossrole-f8f82b9c-d375-40d2-9483-123456789012"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
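
As a sanity check on a trust relationship like the one above, the sketch below collects every principal that is allowed to call sts:AssumeRole and compares the result with the single expected CAST AI user. The function name `principals_allowed_to_assume` is a hypothetical helper, and the ARN is the placeholder from the example, not a real account.

```python
import json

# Same shape as the trust relationship shown above (placeholder ARN).
TRUST_POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:user/cast-crossrole-f8f82b9c-d375-40d2-9483-123456789012"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
"""

EXPECTED_PRINCIPAL = (
    "arn:aws:iam::123456789012:user/"
    "cast-crossrole-f8f82b9c-d375-40d2-9483-123456789012"
)


def as_list(value):
    """IAM allows either a single string or a list in most fields."""
    return [value] if isinstance(value, str) else list(value)


def principals_allowed_to_assume(trust_policy):
    """Collect AWS principals granted sts:AssumeRole by Allow statements."""
    allowed = set()
    for statement in trust_policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, str):
            # A bare "*" principal means anyone may assume the role.
            allowed.add(principal)
            continue
        allowed.update(as_list(principal.get("AWS", [])))
    return allowed


allowed = principals_allowed_to_assume(json.loads(TRUST_POLICY_JSON))
only_expected = allowed == {EXPECTED_PRINCIPAL}
print(only_expected)
```

This sketch only looks at the `AWS` principal type and ignores conditions, so treat a passing check as a first filter rather than a full audit.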

GCP

The GCP service account used by CAST AI

The onboarding script creates a dedicated GCP service account that CAST AI uses to request and manage GCP resources on your behalf.

The service account name follows the castai-gke-<cluster-name-hash> convention. You can verify the service account by running:

gcloud iam service-accounts describe castai-gke-<cluster-name-hash>@<your-gcp-project>.iam.gserviceaccount.com
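
When scanning a project's service accounts, it can help to recognize which ones follow this naming convention. The sketch below is one way to do that with a regular expression. The pattern is an assumption derived from the convention above: the format of the hash is not documented here, so it accepts any non-empty string, and the sample hash and project ID are hypothetical.

```python
import re

# Assumed pattern: castai-gke-<cluster-name-hash>@<project>.iam.gserviceaccount.com
SERVICE_ACCOUNT_PATTERN = re.compile(
    r"^castai-gke-(?P<hash>[^@]+)@(?P<project>[^.@]+)\.iam\.gserviceaccount\.com$"
)


def parse_castai_service_account(email):
    """Return (cluster_name_hash, project) if the email follows the convention, else None."""
    match = SERVICE_ACCOUNT_PATTERN.match(email)
    if match is None:
        return None
    return match.group("hash"), match.group("project")


# Hypothetical values for illustration only.
parsed = parse_castai_service_account(
    "castai-gke-0a1b2c3d@my-project.iam.gserviceaccount.com"
)
print(parsed)
```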

The service account created by CAST AI includes the following roles:

| Role name | Description |
| --- | --- |
| castai.gkeAccess | A CAST AI-managed role used to handle CAST AI add/delete node operations. You can find a full list of permissions below. |
| container.developer | A GCP-managed role for full access to Kubernetes API objects inside the Kubernetes cluster. |
| iam.serviceAccountUser | A GCP-managed role to allow running operations as a service account. |

IAM Conditions

When creating the service account, you can enforce conditional, attribute-based access on the iam.serviceAccountUser role.
With this role, the service account can either act as all other service accounts or be scoped to only those used by node pools in the GKE cluster. The scoped option is more secure and therefore recommended. By default, the onboarding script uses the scoped option.

When onboarding the cluster with Terraform, you can use the castai_gke_iam module to specify which method to use. You can find an example here.

List of castai.gkeAccess role permissions:

» gcloud iam roles describe --project=<your-project-name> castai.gkeAccess

description: Role to manage GKE cluster via CAST AI
etag: example-tag
includedPermissions:
- compute.addresses.use
- compute.disks.create
- compute.disks.setLabels
- compute.disks.use
- compute.images.useReadOnly
- compute.instanceGroupManagers.get
- compute.instanceGroupManagers.update
- compute.instanceGroups.get
- compute.instanceTemplates.create
- compute.instanceTemplates.delete
- compute.instanceTemplates.get
- compute.instanceTemplates.list
- compute.instances.create
- compute.instances.delete
- compute.instances.get
- compute.instances.list
- compute.instances.setLabels
- compute.instances.setMetadata
- compute.instances.setServiceAccount
- compute.instances.setTags
- compute.instances.start
- compute.instances.stop
- compute.networks.use
- compute.networks.useExternalIp
- compute.subnetworks.get
- compute.subnetworks.use
- compute.subnetworks.useExternalIp
- compute.zones.get
- compute.zones.list
- container.certificateSigningRequests.approve
- container.clusters.get
- container.clusters.update
- container.operations.get
- serviceusage.services.list
- resourcemanager.projects.getIamPolicy
name: projects/<your-project-name>/roles/castai.gkeAccess
stage: ALPHA
title: Role to manage GKE cluster via CAST AI
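
To compare a role's actual permissions against a list you expect, the sketch below parses output in the format shown above and reports anything missing. It is a minimal illustration: the function names are hypothetical, the sample output is trimmed, and the parser only handles the simple `- permission` list layout shown here rather than general YAML.

```python
# Trimmed sample in the same layout as the describe output shown above.
DESCRIBE_OUTPUT = """
description: Role to manage GKE cluster via CAST AI
etag: example-tag
includedPermissions:
- compute.instances.create
- compute.instances.delete
- container.clusters.get
name: projects/<your-project-name>/roles/castai.gkeAccess
stage: ALPHA
"""


def parse_included_permissions(describe_output):
    """Extract the '- permission' entries listed under includedPermissions."""
    permissions = []
    in_list = False
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped == "includedPermissions:":
            in_list = True
        elif in_list and stripped.startswith("- "):
            permissions.append(stripped[2:].strip())
        elif in_list and stripped:
            break  # the first non-list line ends the permissions block
    return permissions


def missing_permissions(expected, actual):
    """Return expected permissions that the role does not include, sorted."""
    return sorted(set(expected) - set(actual))


actual = parse_included_permissions(DESCRIBE_OUTPUT)
expected = [
    "compute.instances.create",
    "compute.instances.delete",
    "container.clusters.get",
    "container.clusters.update",
]
missing = missing_permissions(expected, actual)
print(missing)
```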

Azure

An overview of Azure permissions used by CAST AI

The onboarding script creates a dedicated Azure app registration for CAST AI to request and manage Azure resources on your behalf.

App registration naming follows this convention: CAST.AI ${CLUSTER_NAME}-${CASTAI_CLUSTER_ID:0:8}.

The created CAST AI app registration has a custom role bound to it. The custom role name follows this convention: CastAKSRole-${CASTAI_CLUSTER_ID:0:8}.
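
The `${CASTAI_CLUSTER_ID:0:8}` expansion in these conventions takes the first eight characters of the cluster ID. The sketch below mirrors both naming conventions so you can predict the names to look for in your Azure tenant. The function names and the sample cluster name and ID are hypothetical.

```python
def short_cluster_id(castai_cluster_id):
    """Mirror the shell expansion ${CASTAI_CLUSTER_ID:0:8}: the first 8 characters."""
    return castai_cluster_id[:8]


def app_registration_name(cluster_name, castai_cluster_id):
    """Build the app registration name from the convention above."""
    return f"CAST.AI {cluster_name}-{short_cluster_id(castai_cluster_id)}"


def custom_role_name(castai_cluster_id):
    """Build the custom role name from the convention above."""
    return f"CastAKSRole-{short_cluster_id(castai_cluster_id)}"


# Hypothetical values for illustration only.
cluster_name = "demo-aks"
cluster_id = "f8f82b9c-d375-40d2-9483-123456789012"

print(app_registration_name(cluster_name, cluster_id))
print(custom_role_name(cluster_id))
```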

The role has a predefined list of permissions scoped only to the managed cluster's resource groups.

List of CastAKSRole role permissions:

ROLE_NAME="CastAKSRole-${CASTAI_CLUSTER_ID:0:8}"
ROLE_DEF='{
   "Name": "'"$ROLE_NAME"'",
   "Description": "CAST.AI role used to manage '"$CLUSTER_NAME"' AKS cluster",
   "IsCustom": true,
   "Actions": [
       "Microsoft.Compute/*/read",
       "Microsoft.Compute/virtualMachines/*",
       "Microsoft.Compute/virtualMachineScaleSets/*",
       "Microsoft.Compute/disks/write",
       "Microsoft.Compute/disks/delete",
       "Microsoft.Compute/disks/beginGetAccess/action",
       "Microsoft.Compute/galleries/write",
       "Microsoft.Compute/galleries/delete",
       "Microsoft.Compute/galleries/images/write",
       "Microsoft.Compute/galleries/images/delete",
       "Microsoft.Compute/galleries/images/versions/write",
       "Microsoft.Compute/galleries/images/versions/delete",
       "Microsoft.Compute/snapshots/write",
       "Microsoft.Compute/snapshots/delete",
       "Microsoft.Network/*/read",
       "Microsoft.Network/networkInterfaces/write",
       "Microsoft.Network/networkInterfaces/delete",
       "Microsoft.Network/networkInterfaces/join/action",
       "Microsoft.Network/networkSecurityGroups/join/action",
       "Microsoft.Network/virtualNetworks/subnets/join/action",
       "Microsoft.Network/applicationGateways/backendhealth/action",
       "Microsoft.Network/applicationGateways/backendAddressPools/join/action",
       "Microsoft.Network/applicationSecurityGroups/joinIpConfiguration/action",
       "Microsoft.Network/loadBalancers/backendAddressPools/write",
       "Microsoft.Network/loadBalancers/backendAddressPools/join/action",
       "Microsoft.ContainerService/*/read",
       "Microsoft.ContainerService/managedClusters/start/action",
       "Microsoft.ContainerService/managedClusters/stop/action",
       "Microsoft.ContainerService/managedClusters/runCommand/action",
       "Microsoft.ContainerService/managedClusters/agentPools/*",
       "Microsoft.Resources/*/read",
       "Microsoft.Resources/tags/write",
       "Microsoft.Authorization/locks/read",
       "Microsoft.Authorization/roleAssignments/read",
       "Microsoft.Authorization/roleDefinitions/read",
       "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action"
   ],
   "AssignableScopes": [
       "/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$CLUSTER_GROUP"'",
       "/subscriptions/'"$SUBSCRIPTION_ID"'/resourceGroups/'"$NODE_GROUP"'"
   ]
}'
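
To reason about what a role definition like this permits, the sketch below checks whether a given operation matches any entry in `Actions` and whether every assignable scope is limited to a resource group. It is an approximation under stated assumptions: wildcard matching is emulated with Python's `fnmatch` (case-insensitively), `NotActions` is ignored, and the trimmed definition, subscription ID, and resource group names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Trimmed, hypothetical role definition in the same shape as ROLE_DEF above.
ROLE_DEFINITION = {
    "Name": "CastAKSRole-f8f82b9c",
    "IsCustom": True,
    "Actions": [
        "Microsoft.Compute/*/read",
        "Microsoft.Compute/virtualMachines/*",
        "Microsoft.Network/networkInterfaces/write",
    ],
    "AssignableScopes": [
        "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/demo-cluster-rg",
        "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/demo-node-rg",
    ],
}

# A scope that ends at a single resource group, not a whole subscription.
RESOURCE_GROUP_SCOPE = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+$", re.IGNORECASE
)


def is_action_allowed(role_definition, operation):
    """Approximate wildcard matching of an operation against the Actions list."""
    wanted = operation.lower()
    return any(
        fnmatchcase(wanted, pattern.lower())
        for pattern in role_definition["Actions"]
    )


def scopes_outside_resource_groups(role_definition):
    """Return assignable scopes that are broader than a single resource group."""
    return [
        scope
        for scope in role_definition["AssignableScopes"]
        if not RESOURCE_GROUP_SCOPE.match(scope)
    ]


print(is_action_allowed(ROLE_DEFINITION, "Microsoft.Compute/virtualMachines/start/action"))
print(is_action_allowed(ROLE_DEFINITION, "Microsoft.Storage/storageAccounts/read"))
print(scopes_outside_resource_groups(ROLE_DEFINITION))
```

An empty result from `scopes_outside_resource_groups` is consistent with the statement that the role is scoped only to the managed cluster's resource groups.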