TPU Instances (GKE)

Configure the Cast AI Autoscaler to scale your GKE cluster using TPU-optimized instances with Google Cloud TPU v5 support.

Autoscaling using TPU instances

The Cast AI Autoscaler can scale GKE clusters using TPU-optimized instances. This guide describes the steps needed to configure your cluster so that TPU nodes can join it and workloads that request TPU resources are scheduled correctly.

Supported providers

| Provider | TPU versions supported |
| --- | --- |
| GCP GKE | v5e, v5p |
📘 Note

TPU instances are only supported on GCP GKE. AWS EKS and Azure AKS are not supported.

How does it work?

Once a pending pod requests google.com/tpu resources, Cast AI's Autoscaler detects the workload and provisions a TPU node that satisfies the request.

Cast AI supports single-host TPU slices only. Multi-host (pod slice) topologies are not supported.

To enable provisioning of TPU nodes, you need:

  • A pod that requests google.com/tpu resources
  • A toleration for the google.com/tpu=true:NoSchedule taint
  • Node selectors that specify the TPU version and topology

Cast AI automatically selects the correct TPU instance type based on the node selectors in your pod specification — no device plugins or additional configuration are required. GKE handles TPU driver installation automatically on TPU nodes.
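The chip count a pod must request follows directly from the topology label: it is the product of the topology's dimensions. A minimal illustration (the function name is hypothetical, not part of any Cast AI tooling):

```python
from math import prod

def chips_for_topology(topology: str) -> int:
    """Return the TPU chip count implied by a GKE topology label.

    The chip count is the product of the topology's dimensions,
    e.g. "2x2" -> 4 chips (v5e, 2D) and "2x2x1" -> 4 chips (v5p, 3D).
    """
    return prod(int(dim) for dim in topology.split("x"))

print(chips_for_topology("2x2"))    # -> 4
print(chips_for_topology("2x2x1"))  # -> 4
print(chips_for_topology("2x4"))    # -> 8
```

A pod selecting `cloud.google.com/gke-tpu-topology: 2x4` should therefore request `google.com/tpu: 8`.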

TPU node labels

Cast AI applies the following labels to TPU nodes, which you can use in node selectors or affinities:

| Label | Example value | Description |
| --- | --- | --- |
| cloud.google.com/gke-tpu-accelerator | tpu-v5-lite-podslice | TPU version and type |
| cloud.google.com/gke-tpu-topology | 2x2 | Physical chip topology of the TPU slice |
| cloud.google.com/gke-accelerator-count | 4 | Number of TPU chips on the node |

Supported TPU versions and accelerator labels

| TPU version | GKE machine type prefix | gke-tpu-accelerator label value | Topology format | Chips per host |
| --- | --- | --- | --- | --- |
| TPU v5e (Lite) | ct5lp-hightpu-{N}t | tpu-v5-lite-podslice | 2D (e.g. 2x2) | 4 |
| TPU v5p | ct5p-hightpu-{N}t | tpu-v5p-slice | 3D (e.g. 2x2x1) | 4 |

Supported single-host topologies

TPU v5e (2D topologies)

| Chips | Topology |
| --- | --- |
| 1 | 1x1 |
| 4 | 2x2 |
| 8 | 2x4 |
| 16 | 4x4 |
| 32 | 4x8 |
| 64 | 8x8 |
| 128 | 8x16 |
| 256 | 16x16 |

TPU v5p (3D topologies)

| Chips | Topologies (most balanced first) |
| --- | --- |
| 1 | 1x1x1 |
| 4 | 2x2x1 |
| 8 | 2x2x2, 2x4x1 |
| 16 | 2x2x4, 4x2x2, 4x4x1 |
| 32 | 2x4x4, 4x4x2, 4x8x1 |
| 64 | 4x4x4, 8x8x1 |
| 128 | 4x4x8, 8x8x2, 4x8x4 |
| 256 | 4x8x8, 8x8x4 |

When multiple topologies are valid for a given chip count, Cast AI prefers the most balanced (closest to cube/square) topology.
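One way to think about "most balanced" is minimizing the spread between the largest and smallest dimension of the topology. The sketch below illustrates that preference using a few rows from the v5p table above; it is an illustration of the selection criterion, not Cast AI's actual algorithm:

```python
# Valid single-host v5p topologies with multiple options,
# keyed by chip count (taken from the table above).
V5P_TOPOLOGIES = {
    8: ["2x2x2", "2x4x1"],
    16: ["2x2x4", "4x2x2", "4x4x1"],
    128: ["4x4x8", "8x8x2", "4x8x4"],
}

def most_balanced(topologies):
    """Pick the topology closest to a cube: the one with the
    smallest spread between its largest and smallest dimension."""
    def spread(topology):
        dims = [int(d) for d in topology.split("x")]
        return max(dims) - min(dims)
    return min(topologies, key=spread)

print(most_balanced(V5P_TOPOLOGIES[8]))    # -> 2x2x2
print(most_balanced(V5P_TOPOLOGIES[16]))   # -> 2x2x4
print(most_balanced(V5P_TOPOLOGIES[128]))  # -> 4x4x8
```

For each chip count, the winner matches the first (most balanced) entry in the table above.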

Tainting of TPU nodes

TPU nodes will have the google.com/tpu=true:NoSchedule taint applied automatically. Your TPU workloads must include a matching toleration; without it, the pod will not be scheduled onto a TPU node, and Cast AI will not provision one.

Workload configuration examples

The following examples show common patterns for configuring TPU workloads.

TPU v5e — 4 chips (2x2 topology)

```yaml
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
    cloud.google.com/gke-tpu-topology: 2x2
  tolerations:
    - key: "google.com/tpu"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-tpu-image
      name: tpu-workload
      resources:
        requests:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
        limits:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
```

TPU v5p — 4 chips (2x2x1 topology)

```yaml
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5p-slice
    cloud.google.com/gke-tpu-topology: 2x2x1
  tolerations:
    - key: "google.com/tpu"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-tpu-image
      name: tpu-workload
      resources:
        requests:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
        limits:
          cpu: 4
          memory: 8Gi
          google.com/tpu: 4
```

TPU v5e — 8 chips (2x4 topology)

```yaml
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
    cloud.google.com/gke-tpu-topology: 2x4
  tolerations:
    - key: "google.com/tpu"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-tpu-image
      name: tpu-workload
      resources:
        requests:
          cpu: 8
          memory: 16Gi
          google.com/tpu: 8
        limits:
          cpu: 8
          memory: 16Gi
          google.com/tpu: 8
```

Required pod fields

The following fields in your pod specification are required for Cast AI to correctly provision a TPU node:

| Field | Required | Description |
| --- | --- | --- |
| spec.nodeSelector["cloud.google.com/gke-tpu-accelerator"] | Yes | Selects the TPU version. Must match a supported accelerator label value. |
| spec.nodeSelector["cloud.google.com/gke-tpu-topology"] | Yes | Selects the chip topology. Must be a valid single-host topology for the chosen TPU version. |
| spec.tolerations with key google.com/tpu | Yes | Required to allow scheduling onto TPU nodes, which are tainted google.com/tpu=true:NoSchedule. |
| resources.limits["google.com/tpu"] | Yes | Number of TPU chips to request. Must match the chip count implied by the topology. |
📘 Note

The google.com/tpu resource must be set in both requests and limits, and both values must be equal.
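The required fields above can be checked mechanically before deploying. Below is a hedged sketch of such a pre-flight check against a pod spec (the function name and error messages are illustrative, not part of any Cast AI tooling):

```python
from math import prod

def validate_tpu_pod(spec: dict) -> list[str]:
    """Return a list of problems that would prevent a TPU node
    from being provisioned for this pod spec."""
    problems = []
    selector = spec.get("nodeSelector", {})
    topology = selector.get("cloud.google.com/gke-tpu-topology")
    if "cloud.google.com/gke-tpu-accelerator" not in selector:
        problems.append("missing gke-tpu-accelerator node selector")
    if topology is None:
        problems.append("missing gke-tpu-topology node selector")
    if not any(t.get("key") == "google.com/tpu"
               for t in spec.get("tolerations", [])):
        problems.append("missing google.com/tpu toleration")
    for container in spec.get("containers", []):
        resources = container.get("resources", {})
        requests = resources.get("requests", {}).get("google.com/tpu")
        limits = resources.get("limits", {}).get("google.com/tpu")
        if limits is None:
            problems.append("google.com/tpu limit not set")
        elif requests != limits:
            problems.append("google.com/tpu requests and limits differ")
        elif limits != prod(int(d) for d in topology.split("x")):
            problems.append("google.com/tpu count does not match topology")
    return problems

# A spec matching the v5e 2x2 example above passes all checks.
spec = {
    "nodeSelector": {
        "cloud.google.com/gke-tpu-accelerator": "tpu-v5-lite-podslice",
        "cloud.google.com/gke-tpu-topology": "2x2",
    },
    "tolerations": [{"key": "google.com/tpu", "operator": "Exists",
                     "effect": "NoSchedule"}],
    "containers": [{"resources": {"requests": {"google.com/tpu": 4},
                                  "limits": {"google.com/tpu": 4}}}],
}
print(validate_tpu_pod(spec))  # -> []
```

Dropping the toleration or changing the chip count without updating the topology label would each surface as a problem in the returned list.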

No device plugin required

Unlike GPU instances, TPU nodes on GKE do not require an additional device plugin. GKE manages TPU drivers and the google.com/tpu extended resource automatically on TPU nodes. Cast AI does not validate for any TPU device plugin before autoscaling.

Further reading