TPU Instances (GKE)

Configure the Cast AI Autoscaler to scale your GKE cluster using TPU-optimized instances with Google Cloud TPU v5 support.

Autoscaling using TPU instances

The Cast AI Autoscaler can scale GKE clusters using TPU-optimized instances. This guide describes the steps needed to configure your cluster so that TPU nodes can join it and workloads that request TPU resources are scheduled correctly.

Supported providers

| Provider | TPU versions supported |
| --- | --- |
| GCP GKE | v5e, v5p |
📘 Note

TPU instances are only supported on GCP GKE. AWS EKS and Azure AKS are not supported.

How does it work?

Once a pending pod requests google.com/tpu resources, Cast AI's Autoscaler detects the workload and provisions a TPU node that satisfies the request.

Cast AI supports single-host TPU slices only. Multi-host (pod slice) topologies are not supported.

To enable provisioning of TPU nodes, you need:

  • A pod that requests google.com/tpu resources
  • A toleration for the google.com/tpu=true:NoSchedule taint
  • Node selectors that specify the TPU version and topology

Cast AI automatically selects the correct TPU instance type based on the node selectors in your pod specification — no device plugins or additional configuration are required. GKE handles TPU driver installation automatically on TPU nodes.
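The chip count a pod must request follows directly from the topology label: it is the product of the topology's dimensions. A minimal illustration (the function name is hypothetical, not part of any Cast AI tooling):

```python
from math import prod

def chips_for_topology(topology: str) -> int:
    """Return the TPU chip count implied by a GKE topology label.

    The chip count is the product of the topology's dimensions,
    e.g. "2x2" -> 4 chips (v5e, 2D) and "2x2x1" -> 4 chips (v5p, 3D).
    """
    return prod(int(dim) for dim in topology.split("x"))

print(chips_for_topology("2x2"))    # -> 4
print(chips_for_topology("2x2x1"))  # -> 4
print(chips_for_topology("2x4"))    # -> 8
```

A pod selecting `cloud.google.com/gke-tpu-topology: 2x4` should therefore request `google.com/tpu: 8`.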

TPU node labels

Cast AI applies the following labels to TPU nodes, which you can use in node selectors or affinities:

| Label | Example value | Description |
| --- | --- | --- |
| cloud.google.com/gke-tpu-accelerator | tpu-v5-lite-podslice | TPU version and type |
| cloud.google.com/gke-tpu-topology | 2x2 | Physical chip topology of the TPU slice |
| cloud.google.com/gke-accelerator-count | 4 | Number of TPU chips on the node |

Supported TPU versions and accelerator labels

| TPU version | GKE machine type prefix | gke-tpu-accelerator label value | Topology format | Chips per host |
| --- | --- | --- | --- | --- |
| TPU v5e (Lite) | ct5lp-hightpu-{N}t | tpu-v5-lite-podslice | 2D (e.g. 2x2) | 4 |
| TPU v5p | ct5p-hightpu-{N}t | tpu-v5p-slice | 3D (e.g. 2x2x1) | 4 |

Supported single-host topologies

TPU v5e (2D topologies)

| Chips | Topology |
| --- | --- |
| 1 | 1x1 |
| 4 | 2x2 |
| 8 | 2x4 |
| 16 | 4x4 |
| 32 | 4x8 |
| 64 | 8x8 |
| 128 | 8x16 |
| 256 | 16x16 |

TPU v5p (3D topologies)

| Chips | Topologies (most balanced first) |
| --- | --- |
| 1 | 1x1x1 |
| 4 | 2x2x1 |
| 8 | 2x2x2, 2x4x1 |
| 16 | 2x2x4, 4x2x2, 4x4x1 |
| 32 | 2x4x4, 4x4x2, 4x8x1 |
| 64 | 4x4x4, 8x8x1 |
| 128 | 4x4x8, 8x8x2, 4x8x4 |
| 256 | 4x8x8, 8x8x4 |

When multiple topologies are valid for a given chip count, Cast AI prefers the most balanced (closest to cube/square) topology.
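One way to think about "most balanced" is minimizing the spread between the largest and smallest dimension of the topology. The sketch below illustrates that preference using a few rows from the v5p table above; it is an illustration of the selection criterion, not Cast AI's actual algorithm:

```python
# Valid single-host v5p topologies with multiple options,
# keyed by chip count (taken from the table above).
V5P_TOPOLOGIES = {
    8: ["2x2x2", "2x4x1"],
    16: ["2x2x4", "4x2x2", "4x4x1"],
    128: ["4x4x8", "8x8x2", "4x8x4"],
}

def most_balanced(topologies):
    """Pick the topology closest to a cube: the one with the
    smallest spread between its largest and smallest dimension."""
    def spread(topology):
        dims = [int(d) for d in topology.split("x")]
        return max(dims) - min(dims)
    return min(topologies, key=spread)

print(most_balanced(V5P_TOPOLOGIES[8]))    # -> 2x2x2
print(most_balanced(V5P_TOPOLOGIES[16]))   # -> 2x2x4
print(most_balanced(V5P_TOPOLOGIES[128]))  # -> 4x4x8
```

For each chip count, the winner matches the first (most balanced) entry in the table above.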

Tainting of TPU nodes

TPU nodes will have the google.com/tpu=true:NoSchedule taint applied automatically. Your TPU workloads must include a matching toleration; without it, the pod will not be scheduled onto a TPU node, and Cast AI will not provision one.

Workload configuration examples

The following examples show common patterns for configuring TPU workloads.

TPU v5e — 4 chips (2x2 topology)

```yaml
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
    cloud.google.com/gke-tpu-topology: 2x2
  tolerations:
    - key: "google.com/tpu"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-tpu-image
      name: tpu-workload
      resources:
        requests:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
        limits:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
```

TPU v5p — 4 chips (2x2x1 topology)

```yaml
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5p-slice
    cloud.google.com/gke-tpu-topology: 2x2x1
  tolerations:
    - key: "google.com/tpu"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-tpu-image
      name: tpu-workload
      resources:
        requests:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
        limits:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
```

TPU v5e — 8 chips (2x4 topology)

```yaml
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
    cloud.google.com/gke-tpu-topology: 2x4
  tolerations:
    - key: "google.com/tpu"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-tpu-image
      name: tpu-workload
      resources:
        requests:
          cpu: 8
          memory: 16Gi
          google.com/tpu: 8
        limits:
          cpu: 8
          memory: 16Gi
          google.com/tpu: 8
```

Required pod fields

The following fields in your pod specification are required for Cast AI to correctly provision a TPU node:

| Field | Required | Description |
| --- | --- | --- |
| spec.nodeSelector["cloud.google.com/gke-tpu-accelerator"] | Yes | Selects the TPU version. Must match a supported accelerator label value. |
| spec.nodeSelector["cloud.google.com/gke-tpu-topology"] | Yes | Selects the chip topology. Must be a valid single-host topology for the chosen TPU version. |
| spec.tolerations with key google.com/tpu | Yes | Required to allow scheduling onto TPU nodes, which are tainted google.com/tpu=true:NoSchedule. |
| resources.limits["google.com/tpu"] | Yes | Number of TPU chips to request. Must match the chip count implied by the topology. |
📘 Note

The google.com/tpu resource must be set in both requests and limits, and both values must be equal.
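The required fields above can be checked mechanically before deploying. Below is a hedged sketch of such a pre-flight check against a pod spec (the function name and error messages are illustrative, not part of any Cast AI tooling):

```python
from math import prod

def validate_tpu_pod(spec: dict) -> list[str]:
    """Return a list of problems that would prevent a TPU node
    from being provisioned for this pod spec."""
    problems = []
    selector = spec.get("nodeSelector", {})
    topology = selector.get("cloud.google.com/gke-tpu-topology")
    if "cloud.google.com/gke-tpu-accelerator" not in selector:
        problems.append("missing gke-tpu-accelerator node selector")
    if topology is None:
        problems.append("missing gke-tpu-topology node selector")
    if not any(t.get("key") == "google.com/tpu"
               for t in spec.get("tolerations", [])):
        problems.append("missing google.com/tpu toleration")
    for container in spec.get("containers", []):
        resources = container.get("resources", {})
        requests = resources.get("requests", {}).get("google.com/tpu")
        limits = resources.get("limits", {}).get("google.com/tpu")
        if limits is None:
            problems.append("google.com/tpu limit not set")
        elif requests != limits:
            problems.append("google.com/tpu requests and limits differ")
        elif limits != prod(int(d) for d in topology.split("x")):
            problems.append("google.com/tpu count does not match topology")
    return problems

# A spec matching the v5e 2x2 example above passes all checks.
spec = {
    "nodeSelector": {
        "cloud.google.com/gke-tpu-accelerator": "tpu-v5-lite-podslice",
        "cloud.google.com/gke-tpu-topology": "2x2",
    },
    "tolerations": [{"key": "google.com/tpu", "operator": "Exists",
                     "effect": "NoSchedule"}],
    "containers": [{"resources": {"requests": {"google.com/tpu": 4},
                                  "limits": {"google.com/tpu": 4}}}],
}
print(validate_tpu_pod(spec))  # -> []
```

Dropping the toleration or changing the chip count without updating the topology label would each surface as a problem in the returned list.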

No device plugin required

Unlike GPU instances, TPU nodes on GKE do not require an additional device plugin. GKE manages TPU drivers and the google.com/tpu extended resource automatically on TPU nodes. Cast AI does not validate for any TPU device plugin before autoscaling.

Further reading