GKE service account impersonation

Cast AI supports GKE service account impersonation as an alternative to using service account keys. This method enhances security by eliminating the need for key management while allowing Cast AI to access your GCP resources.

How impersonation works

Cast AI uses a two-tier service account architecture for GKE impersonation:

Organization-level impersonation service account: Cast AI creates one impersonation service account per organization, shared across all clusters within that organization. This service account uses the naming pattern cast-gke-<hash>@prod-cast-identity.iam.gserviceaccount.com.

Cluster-specific service accounts: Each cluster has its own dedicated GCP service account in your project, named according to the castai-gke-<cluster-name-hash> convention and used for direct resource management.
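To make the two naming patterns concrete, the following sketch classifies a service account email by its shape. The glob patterns are an assumption derived only from the patterns quoted above; the exact hash format is not documented here, and the function name classify_castai_sa is hypothetical.

```shell
# Classify a service account email by the naming patterns described above.
# Assumption: only the prefix and domain are checked, because the exact
# format of the hash portion is not specified.
classify_castai_sa() {
  case "$1" in
    cast-gke-?*@prod-cast-identity.iam.gserviceaccount.com)
      echo "organization-impersonation" ;;
    castai-gke-?*@?*.iam.gserviceaccount.com)
      echo "cluster-specific" ;;
    *)
      echo "other" ;;
  esac
}
```

A check like this can help when auditing IAM bindings, for example to confirm that the member you are granting access to looks like the organization-level account rather than a cluster-specific one.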

Prerequisites

Before setting up impersonation, ensure you have:

  • An existing GCP service account with the necessary permissions for your GKE cluster
  • The Cast AI API token and cluster ID
  • The gcloud CLI configured with appropriate permissions
  • The jq command-line JSON processor installed
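The prerequisites above can be checked before running any setup commands. The sketch below is one possible preflight check, not part of Cast AI's tooling; the function names are hypothetical, and the variable names are the ones used by the commands later on this page.

```shell
# Print the name of every listed variable that is unset or empty.
# The function body is a subshell, so its loop variables do not leak.
missing_vars() (
  for name in "$@"; do
    eval "value=\${$name-}"
    [ -n "$value" ] || echo "$name"
  done
)

# Print the name of every listed command that is not found on PATH.
missing_commands() (
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
)

# Succeeds only when every required variable and command is present.
preflight() (
  vars=$(missing_vars CASTAI_API_TOKEN CASTAI_API_URL CASTAI_CLUSTER_ID PROJECT_ID SERVICE_ACCOUNT_EMAIL)
  cmds=$(missing_commands gcloud jq curl)
  # Unquoted expansion below intentionally joins the names onto one line.
  if [ -n "$vars" ]; then echo "Missing variables:" $vars >&2; fi
  if [ -n "$cmds" ]; then echo "Missing commands:" $cmds >&2; fi
  [ -z "$vars" ] && [ -z "$cmds" ]
)
```

Calling `preflight || exit 1` at the top of a setup script stops early with a list of what is missing. It does not verify that gcloud is authenticated or that the service account has the required permissions.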

Setup process

Using the Terraform module (optional)

Cast AI provides a Terraform module for GKE IAM impersonation that automates the IAM policy configuration process. This module handles the complex IAM policy bindings and conditions required for impersonation.

If you use the Terraform module, you'll still need to call the /gcp-create-sa API endpoint to complete the registration with Cast AI's system. This API call is required for Cast AI's internal cluster linking and cannot be replaced by infrastructure provisioning alone.

Step 1: Register the impersonation service account

Call the /gcp-create-sa API endpoint to register your service account for impersonation. This step is mandatory for every cluster, even when managing service accounts through infrastructure-as-code.

curl -X POST \
  -H "X-API-Key: $CASTAI_API_TOKEN" \
  "$CASTAI_API_URL/v1/kubernetes/external-clusters/$CASTAI_CLUSTER_ID/gcp-create-sa" \
  -d '{
    "gke": {
      "project_id": "'"$PROJECT_ID"'",
      "gke_sa_impersonate": "'"$SERVICE_ACCOUNT_EMAIL"'"
    }
  }'
Note

This API call returns the same impersonation service account for all clusters in your organization. Although the returned service account does not change, the call is still required for each cluster to:

  • Link the GCP project to the specific cluster in Cast AI's system
  • Associate the impersonated service account with the cluster configuration
  • Enable proper permissions mapping for cluster operations
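Because the registration call must be repeated for each cluster, it can be wrapped in a loop. The sketch below assumes all listed clusters share the same GCP project and service account, which may not match your setup; the function names are hypothetical, and the request itself mirrors the curl command in Step 1.

```shell
# Build the registration endpoint for one cluster.
# A trailing slash on the base URL is tolerated.
gcp_create_sa_url() {
  printf '%s/v1/kubernetes/external-clusters/%s/gcp-create-sa\n' "${1%/}" "$2"
}

# Register the same service account against every cluster ID passed in.
# Assumption: all clusters use $PROJECT_ID and $SERVICE_ACCOUNT_EMAIL.
# Stops at the first failed request (curl --fail exits non-zero on HTTP errors).
register_clusters() {
  for cluster_id in "$@"; do
    echo "Registering cluster $cluster_id" >&2
    curl -sS --fail -X POST \
      -H "X-API-Key: $CASTAI_API_TOKEN" \
      "$(gcp_create_sa_url "$CASTAI_API_URL" "$cluster_id")" \
      -d '{"gke":{"project_id":"'"$PROJECT_ID"'","gke_sa_impersonate":"'"$SERVICE_ACCOUNT_EMAIL"'"}}' \
      || return 1
  done
}
```

For example, `register_clusters "$CLUSTER_ID_A" "$CLUSTER_ID_B"` (both variable names are placeholders) registers two clusters in sequence.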

Step 2: Configure service account permissions

Grant the following permissions to your service account:

Permission                              Purpose
roles/iam.serviceAccountUser            Allows Cast AI to act as the service account
roles/iam.serviceAccountTokenCreator    Enables token generation for impersonation
compute.subnetworks.useExternalIp       Required for network operations
compute.networks.useExternalIp          Required for network operations
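After granting the two IAM roles, you may want to confirm that the bindings exist. The sketch below is one way to do that with gcloud's generic --flatten, --filter, and --format flags; the function names are hypothetical, and it covers only the two roles/ entries, not the compute permissions.

```shell
# List the members bound to a role on a service account's IAM policy,
# one member per line. $1 is the service account email, $2 is the role.
list_role_members() {
  gcloud iam service-accounts get-iam-policy "$1" \
    --project "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --filter="bindings.role:$2" \
    --format="value(bindings.members)"
}

# Succeeds if stdin contains the given service account as an exact member line.
has_member() {
  grep -Fxq -- "serviceAccount:$1"
}

# Example (requires gcloud authentication, so it is left commented out):
# list_role_members "$SERVICE_ACCOUNT_EMAIL" roles/iam.serviceAccountTokenCreator \
#   | has_member "$CASTAI_SERVICE_ACCOUNT" || echo "Token creator binding missing" >&2
```

Here CASTAI_SERVICE_ACCOUNT is assumed to hold the impersonation service account email returned by the /gcp-create-sa call.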

Step 3: Set configuration variables

Configure your environment with the following settings:

  • Set CASTAI_IMPERSONATE=true in your environment variables
  • Include both gkeSaImpersonate and projectId fields in your GKE configuration block
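When a script branches on CASTAI_IMPERSONATE, a strict comparison avoids treating values such as false or 0 as enabled, which a plain non-empty test would do. The helper below is a minimal sketch; the function name is hypothetical.

```shell
# Succeeds only when CASTAI_IMPERSONATE is exactly "true".
impersonation_enabled() {
  [ "${CASTAI_IMPERSONATE:-}" = "true" ]
}

if impersonation_enabled; then
  echo "Impersonation is enabled"
else
  echo "Impersonation is disabled"
fi
```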

Implementation example

The following script demonstrates the complete impersonation setup process:

if [[ "$CASTAI_IMPERSONATE" == "true" ]]; then
  echo "Registering service account for impersonation: $SERVICE_ACCOUNT_EMAIL"
  
  # Register service account with Cast AI
  RESPONSE=$(curl -sSL --write-out "HTTP_STATUS:%{http_code}" \
    -X POST -H "X-API-Key: $CASTAI_API_TOKEN" \
    "$CASTAI_API_URL/v1/kubernetes/external-clusters/$CASTAI_CLUSTER_ID/gcp-create-sa" \
    -d '{"gke":{"project_id":"'"$PROJECT_ID"'","gke_sa_impersonate":"'"$SERVICE_ACCOUNT_EMAIL"'"}}')
  
  RESPONSE_STATUS=$(echo "$RESPONSE" | tr -d '\n' | sed -e 's/.*HTTP_STATUS://')
  RESPONSE_BODY=$(echo "$RESPONSE" | sed -e 's/HTTP_STATUS\:.*//g')
  
  if [[ $RESPONSE_STATUS != "200" ]]; then
    echo "Failed to register service account for impersonation. HTTP status: $RESPONSE_STATUS"
    echo "$RESPONSE_BODY"
    exit 1
  fi

  # Extract Cast AI service account from response
  CASTAI_SERVICE_ACCOUNT=$(echo "$RESPONSE_BODY" | jq -r '.serviceAccountEmail')
  echo "Cast AI impersonation service account: $CASTAI_SERVICE_ACCOUNT"
  
  if [[ "$CASTAI_SERVICE_ACCOUNT" == "" || "$CASTAI_SERVICE_ACCOUNT" == "null" ]]; then
    echo "Failed to retrieve Cast AI service account from response"
    echo "$RESPONSE_BODY"
    exit 1
  fi

  # Clean up existing IAM policy bindings on the service account.
  # Errors are suppressed because the bindings may not exist yet.
  echo "Removing existing IAM policy bindings"
  gcloud iam service-accounts remove-iam-policy-binding "$SERVICE_ACCOUNT_EMAIL" \
    --member="serviceAccount:$CASTAI_SERVICE_ACCOUNT" \
    --project "$PROJECT_ID" \
    --role='roles/iam.serviceAccountUser' \
    --all --no-user-output-enabled >/dev/null 2>&1
    
  gcloud iam service-accounts remove-iam-policy-binding "$SERVICE_ACCOUNT_EMAIL" \
    --member="serviceAccount:$CASTAI_SERVICE_ACCOUNT" \
    --project "$PROJECT_ID" \
    --role='roles/iam.serviceAccountTokenCreator' \
    --all --no-user-output-enabled >/dev/null 2>&1
  
  # Grant token creator permissions
  echo "Configuring impersonation permissions"
  gcloud iam service-accounts add-iam-policy-binding $SERVICE_ACCOUNT_EMAIL \
    --member="serviceAccount:$CASTAI_SERVICE_ACCOUNT" \
    --role="roles/iam.serviceAccountTokenCreator" \
    --condition="title=AlwaysTrueCondition,description=This condition is always true,expression=true" \
    --project $PROJECT_ID

  # Grant impersonation permissions with conditional access
  gcloud iam service-accounts add-iam-policy-binding $SERVICE_ACCOUNT_EMAIL \
    --member="serviceAccount:$CASTAI_SERVICE_ACCOUNT" \
    --role="roles/iam.serviceAccountUser" \
    --condition="title=SpecificServiceAccountCondition,description=Allow impersonation only for Cast AI service account,expression=request.auth.claims.email == \"$CASTAI_SERVICE_ACCOUNT\"" \
    --project $PROJECT_ID
    
  echo "Waiting for IAM permissions to propagate (180 seconds)"
  sleep 180

  # Update cluster configuration
  echo "Updating cluster configuration with impersonation settings"
  RESPONSE=$(curl -sSL --write-out "HTTP_STATUS:%{http_code}" \
    -X POST -H "X-API-Key: $CASTAI_API_TOKEN" \
    "$CASTAI_API_URL/v1/kubernetes/external-clusters/$CASTAI_CLUSTER_ID" \
    -d '{"credentials":"{}"}')
    
  RESPONSE_STATUS=$(echo "$RESPONSE" | tr -d '\n' | sed -e 's/.*HTTP_STATUS://')
  RESPONSE_BODY=$(echo "$RESPONSE" | sed -e 's/HTTP_STATUS\:.*//g')

  if [[ $RESPONSE_STATUS -eq 200 ]]; then
    echo "Impersonation setup completed successfully"
  else
    echo "Failed to update cluster configuration with impersonation settings"
    echo "Error details: HTTP $RESPONSE_STATUS - $RESPONSE_BODY"
    exit 1
  fi
fi
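The script separates the HTTP status from the response body by appending an HTTP_STATUS: marker with curl's --write-out option and then splitting on it with sed. As an alternative sketch, the same split can be done with shell parameter expansion, which also handles multi-line bodies; the function names are hypothetical.

```shell
# Print everything after the last "HTTP_STATUS:" marker (the status code).
response_status() {
  printf '%s\n' "${1##*HTTP_STATUS:}"
}

# Print everything before the last "HTTP_STATUS:" marker (the response body).
response_body() {
  printf '%s\n' "${1%HTTP_STATUS:*}"
}

# Usage with a response captured as in the script above:
# RESPONSE=$(curl -sSL --write-out "HTTP_STATUS:%{http_code}" ...)
# STATUS=$(response_status "$RESPONSE")
# BODY=$(response_body "$RESPONSE")
```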

Troubleshooting

HTTP 500 Internal Server Error during cluster update

Symptoms: The cluster update fails with a 500 Internal Server Error after successfully configuring IAM permissions.

Common causes:

  • The /gcp-create-sa API endpoint was not called before attempting the cluster update
  • Mixing impersonation and non-impersonation authentication methods in the same script
  • Insufficient IAM permissions on the service account
  • IAM permission changes have not yet propagated

Resolution steps:

  1. Verify that you called /gcp-create-sa before updating the cluster configuration
  2. Ensure your script uses either impersonation or key-based authentication consistently
  3. Confirm all required permissions are properly configured on your service account
  4. Wait at least 3 minutes after configuring IAM permissions before updating the cluster
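Since IAM propagation time varies, retrying the cluster update a few times can be more forgiving than a single fixed wait. The helper below is a generic retry sketch, not something Cast AI provides; update_cluster in the usage comment is a hypothetical function wrapping the cluster update request so that it exits non-zero on failure.

```shell
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    "$@" && return 0
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      return 1
    fi
    echo "Attempt $retry_n of $retry_attempts failed; retrying in ${retry_delay}s" >&2
    sleep "$retry_delay"
    retry_n=$((retry_n + 1))
  done
}

# Usage (update_cluster is hypothetical):
# retry 6 30 update_cluster || echo "Cluster update still failing" >&2
```

With 6 attempts and a 30-second delay, the helper waits up to 150 seconds in total between attempts, which is comparable to the fixed wait recommended above.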