Enable automation

How it works

CAST AI needs to access your cluster to enable automated cost optimization and security reporting. The following section describes the steps to onboard a cluster to CAST AI, together with the actions the onboarding script performs in your account.

📘

Important!

Before starting to onboard a cluster, make sure that the read-only agent is already running.

Cluster onboarding is carried out by an automated script, which you can get by clicking the Enable CAST AI button or via the API.

The following sections describe the prerequisites for cluster onboarding for each of the supported cloud providers and the actions the onboarding script performs.

EKS

Prerequisites

  • AWS CLI – a command-line tool for working with AWS services using commands in your command-line shell. For more details, see Installing AWS CLI.

  • jq – a lightweight command-line JSON processor. For more details, click here.

  • kubectl – a Kubernetes command-line tool that allows running commands against Kubernetes clusters. For more details, see kubectl.

  • helm – a command-line tool that simplifies deploying applications and services to Kubernetes clusters. For more details, see helm.

  • IAM permissions – the IAM security principal you're using must have permission to work with AWS EKS, AWS IAM, and related resources. Additionally, you should have access to the EKS cluster that you wish to onboard into the CAST AI console.

  • The CAST AI agent must be running on the cluster. Learn more about Installing the CAST AI agent.
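Before running the script, you can confirm the prerequisites with a quick check (the deployment and namespace names follow the CAST AI defaults shown in the console output later in this document):

```shell
# Verify the required command-line tools are installed.
aws --version
jq --version
kubectl version --client
helm version --short

# Confirm the CAST AI read-only agent is running (default namespace and name).
kubectl get deployment castai-agent -n castai-agent
```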

Here's an example of a least-privilege policy for the administrator account with the permissions needed to run the onboarding script; it is used once per cluster during onboarding.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateRole",
                "iam:CreatePolicy",
                "iam:GetPolicy",
                "iam:ListPolicyVersions",
                "iam:PutRolePolicy",
                "iam:AttachRolePolicy",
                "iam:CreateInstanceProfile",
                "iam:GetInstanceProfile",
                "iam:AddRoleToInstanceProfile",
                "iam:UpdateAssumeRolePolicy",
                "ec2:CreateSecurityGroup",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:AuthorizeSecurityGroupIngress"
            ],
            "Resource": "*"
        }
    ]
}
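If you prefer to create this policy explicitly rather than attach the permissions inline, one possible approach looks like the following (the policy name and file name are placeholders, not part of the onboarding script):

```shell
# Hypothetical example: create the least-privilege onboarding policy
# from a local JSON file containing the policy document above.
aws iam create-policy \
  --policy-name CastAIOnboardingPolicy \
  --policy-document file://castai-onboarding-policy.json
```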

Actions the onboarding script performs

The script performs the following actions:

  1. Create a cast-eks-*cluster-name*-cluster-role IAM role with the permissions required to manage the cluster:

    • AmazonEC2ReadOnlyAccess
    • IAMReadOnlyAccess
    • Manage instances in the specified cluster, restricted to the cluster's VPC.
    • Manage autoscaling groups in the specified cluster.
    • Manage EKS Node Groups in the specified cluster.
  2. Create a CastEKSPolicy policy used to manage EKS clusters. The policy contains the following permissions:

    • Create & delete instance profiles.
    • Create & manage roles.
    • Create & manage EC2 security groups, key pairs, and tags.
    • Run EC2 instances.
  3. Create the following roles:

    • cast-*cluster-name*-eks-####### used by the instance profile, with the following AWS-managed permission policies attached:
      • AmazonEKSWorkerNodePolicy
      • AmazonEC2ContainerRegistryReadOnly
      • AmazonEKS_CNI_Policy

    The role ARN is printed and sent to the CAST AI console, which then uses it to assume the role when making programmatic AWS calls.

📘

The scope of permissions

All Write permissions are scoped to the specified EKS cluster without access to the resources of any other clusters in the AWS account.

You can grant extra permissions to CAST AI roles if you need access to resources and actions not included in the onboarding script's policies.
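For instance, granting an additional permission might look like the following (the role name and policy ARN here are placeholders for illustration, not values produced by the onboarding script):

```shell
# Hypothetical example: attach an extra AWS-managed policy to the
# CAST AI cluster role created during onboarding.
aws iam attach-role-policy \
  --role-name "cast-eks-my-cluster-cluster-role" \
  --policy-arn "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
```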

Manual credential onboarding

You can complete the steps listed above manually without using the script. However, keep in mind that when you create an Amazon EKS cluster, the IAM entity (user or role, e.g., a federated user) that creates the cluster automatically receives system:masters permissions in the cluster's RBAC configuration on the control plane.

To allow additional AWS users or roles to interact with your cluster, you need to edit the aws-auth ConfigMap in Kubernetes. For more information, see Managing users or IAM roles for your cluster.
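One way to add such a mapping is with eksctl, which edits the aws-auth ConfigMap for you (the cluster name, region, ARN, and username below are placeholders):

```shell
# Hypothetical example: grant an extra IAM role access to the cluster
# by adding an identity mapping to the aws-auth ConfigMap.
eksctl create iamidentitymapping \
  --cluster my-cluster \
  --region us-east-1 \
  --arn arn:aws:iam::111122223333:role/my-admin-role \
  --username admin \
  --group system:masters
```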

Using AWS services

CAST AI relies on the agent running inside your cluster. Its operation consumes the following resources:

  • A portion of the EC2 node resources in your cluster. The CAST AI agent uses the cluster-proportional vertical autoscaler to consume the minimum resources required for the cluster size.
  • A small amount of network traffic to communicate with the CAST AI SaaS.
  • EC2 instances, their storage, and intra-cluster network traffic to manage the Kubernetes cluster and perform autoscaling.
  • IAM resources, as detailed in the onboarding section.

You can find a full overview of permissions used by the CAST AI-created IAM role here.

GKE

Prerequisites

  • gcloud – a command-line tool for working with Google Cloud services using commands in your command-line shell. For more details, see Installing gcloud.

  • jq – a lightweight command-line JSON processor. For more details, click here.

  • IAM permissions – The IAM security principal that you use to onboard the cluster must include:

    • Access to the project where the cluster is created.
    • Permission to work with IAM, GKE, and compute resources.
  • kubectl – a Kubernetes command-line tool that allows running commands against Kubernetes clusters. For more details, see kubectl.

  • helm – a command-line tool that simplifies the deployment of applications and services to Kubernetes clusters. For more details, see helm.

  • The CAST AI agent must be running on the cluster. Learn more about installing the CAST AI agent.

Here's an example of the least privilege permissions for the administrator account needed to run the onboarding script, used once per cluster during its onboarding:

- serviceusage.services.enable
- servicemanagement.services.bind
- container.clusters.get
- container.clusters.list
- iam.serviceAccounts.get
- iam.serviceAccounts.create
- iam.roles.get
- iam.roles.update
- iam.roles.create
- iam.serviceAccountKeys.create
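As an illustration, these permissions could be bundled into a custom role for the onboarding principal (the role ID `castai.onboarding` and `$PROJECT_ID` are placeholders, not part of the onboarding script):

```shell
# Hypothetical example: create a custom role carrying only the
# least-privilege permissions listed above.
gcloud iam roles create castai.onboarding \
  --project "$PROJECT_ID" \
  --title "CAST AI onboarding" \
  --permissions "serviceusage.services.enable,servicemanagement.services.bind,container.clusters.get,container.clusters.list,iam.serviceAccounts.get,iam.serviceAccounts.create,iam.roles.get,iam.roles.update,iam.roles.create,iam.serviceAccountKeys.create"
```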

Actions the onboarding script performs

The Phase 2 onboarding script performs several actions to obtain the permissions required to manage GKE and GCP resources on your behalf:

  • Enables the necessary GCP services and APIs for the project.
  • Creates the IAM service account and assigns the required roles to it.
  • Generates an IAM service account key that CAST AI components use to manage GKE and GCP resources on your behalf.
  • Enables the following GCP services and APIs for the project in which the GKE cluster is running:
    • serviceusage.googleapis.com – API to list, enable, and disable GCP services.
    • iam.googleapis.com – API to manage identity and access control for GCP resources.
    • cloudresourcemanager.googleapis.com – API to create, read, and update metadata for GCP resource containers.
    • container.googleapis.com – API to manage GKE.
    • compute.googleapis.com – API to manage GCP virtual machines.
  • Creates a dedicated GCP service account castai-gke-<cluster-name-hash> for CAST AI to request and manage GCP resources on your behalf.

  • Creates a custom role castai.gkeAccess with the permissions CAST AI requires.

  • Attaches required roles to the castai-gke-<cluster-name-hash> service account.

  • Installs Kubernetes components required for a successful experience with CAST AI:

$ kubectl get deployments.apps   -n castai-agent
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
castai-agent                1/1     1            1           15m
castai-agent-cpvpa          1/1     1            1           15m
castai-cluster-controller   2/2     2            2           15m
castai-evictor              0/0     0            0           15m
castai-kvisor               1/1     1            1           15m

$ kubectl get daemonsets.apps -n castai-agent
NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                            AGE
castai-spot-handler    0         0         0       0            0           scheduling.cast.ai/spot=true             15m
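To double-check the script's results on the GCP side, you can query the created service account and the enabled APIs (assuming `$PROJECT_ID` is set; the service-account prefix follows the naming described above):

```shell
# List the CAST AI service account created by the script.
gcloud iam service-accounts list \
  --project "$PROJECT_ID" \
  --filter "email ~ ^castai-gke-"

# Confirm the required APIs are enabled.
gcloud services list --enabled \
  --project "$PROJECT_ID" \
  --filter "config.name=container.googleapis.com OR config.name=compute.googleapis.com"
```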

You can find a full overview of hosted components here.

For a full overview of permissions a CAST AI-created service account uses, click here.

GKE node pools created by CAST AI

After the cluster is onboarded, CAST AI will create two GKE node pools:

  • castpool – gathers necessary data required for creating CAST AI-managed GKE x86 nodes.
  • castpool-arm – collects the data required for creating CAST AI-managed GKE ARM64 nodes. It's created only if the cluster region supports ARM64 VMs.
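You can verify that these node pools exist with a command along these lines (`$CLUSTER_NAME` and `$REGION` are placeholders):

```shell
# List node pools; castpool (and castpool-arm, where ARM64 VMs are
# supported) should appear after onboarding.
gcloud container node-pools list \
  --cluster "$CLUSTER_NAME" \
  --region "$REGION"
```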

AKS

Prerequisites

  • az CLI – a command-line tool for working with Azure services using commands in your command-line shell. For more details, see Installing az CLI.

  • jq – a lightweight command-line JSON processor. For more details, click here.

  • Azure AD permissions – the Azure identity that you use to onboard the cluster must include:

    • Access to the subscription where the cluster was created.
    • Permission to work with AKS and compute resources.
    • Permission to create an App registration.
  • kubectl – a Kubernetes command-line tool that allows running commands against Kubernetes clusters. For more details, see kubectl.

  • helm – a command-line tool that simplifies the deployment of applications and services to Kubernetes clusters helm.

  • The CAST AI agent must be running on the cluster. Learn more about Installing the CAST AI agent.

Actions the onboarding script performs

The script performs the following actions:

  • Create a CastAKSRole-${CASTAI_CLUSTER_ID:0:8} role to manage the onboarded AKS cluster.

  • Create an app registration CAST.AI ${CLUSTER_NAME}-${CASTAI_CLUSTER_ID:0:8} that uses the CastAKSRole-${CASTAI_CLUSTER_ID:0:8} role.
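You can verify both objects afterwards, for example as follows (the name patterns follow the steps above; `$CLUSTER_NAME` and `$CASTAI_CLUSTER_ID` are assumed to be set):

```shell
# Check the custom role created for the cluster.
az role definition list \
  --custom-role-only true \
  --query "[?contains(roleName, 'CastAKSRole')].roleName"

# Check the app registration created for the cluster.
az ad app list \
  --display-name "CAST.AI ${CLUSTER_NAME}-${CASTAI_CLUSTER_ID:0:8}" \
  --query "[].displayName"
```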

📘

The scope of permissions

All Write permissions are scoped to the resource group where your cluster runs. The script won't access the resources of any other clusters in your Azure subscription.

  • Install Kubernetes components required for the successful operation of CAST AI:

$ kubectl get deployments.apps   -n castai-agent
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
castai-agent                1/1     1            1           3h26m
castai-agent-cpvpa          1/1     1            1           3h26m
castai-cluster-controller   2/2     2            2           3h26m
castai-evictor              0/0     0            0           3h26m
$ kubectl get daemonsets.apps -n castai-agent
NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                            AGE
castai-aks-init-data   0         0         0       0            0           provisioner.cast.ai/aks-init-data=true   3h26m
castai-spot-handler    0         0         0       0            0           scheduling.cast.ai/spot=true             3h26m

You can find a full overview of hosted components here.

For a full overview of permissions used by CAST AI-created service accounts, click here.

Azure agent pools created by CAST AI

After the cluster is onboarded, CAST AI will create two AKS agent pools:

  • castpool – schedules the node required for CAST AI AKS image creation. CAST AI AKS images are re-created after every AKS control plane upgrade or every 30 days. While image creation is in progress, the castpool node named aks-castpool-xxxxxxxx-vmssxxxxxx may appear in the "Not Ready" state for some time.
  • castworkers – acts as a container for CAST AI-managed AKS nodes. Removing this agent pool removes all CAST AI-created nodes.
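To confirm both agent pools exist, you can list them with the az CLI (`$RESOURCE_GROUP` and `$CLUSTER_NAME` are placeholders):

```shell
# List AKS agent pools; castpool and castworkers should appear
# after onboarding completes.
az aks nodepool list \
  --resource-group "$RESOURCE_GROUP" \
  --cluster-name "$CLUSTER_NAME" \
  --output table
```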