GPU instances

Autoscaling using GPU instances

The Cast AI Autoscaler can scale the cluster using GPU-optimized instances. This guide describes the steps needed to configure the cluster so that GPU nodes can join it.

Supported providers

| Provider | GPUs supported |
| --- | --- |
| AWS EKS | NVIDIA |
| GCP GKE | NVIDIA |
| Azure AKS * | NVIDIA |

* - Please contact Cast AI support to enable this feature for your organization.

How does it work?

Once activated, Cast AI's Autoscaler detects workloads requiring GPU resources and starts provisioning them.

To enable the provisioning of GPU nodes, you need a few things:

  • Choose a GPU instance type or attach a GPU to the instance type
  • Install GPU drivers
  • Expose the GPU to Kubernetes as a consumable resource.

Cast AI ensures that the correct GPU instance type is selected. All you have to do is define GPU resources and add a toleration for the GPU taint or the node template taint. You can also target specific GPU characteristics using node selectors or affinities on the GPU labels listed below.

| Label | Value example | Description |
| --- | --- | --- |
| nvidia.com/gpu | true | Node has an NVIDIA GPU attached |
| nvidia.com/gpu.name | nvidia-tesla-t4 | Attached GPU type |
| nvidia.com/gpu.count | 1 | Attached GPU count |
| nvidia.com/gpu.memory | 15258 | Available single GPU memory in MiB |

📘

Tainting of GPU nodes

A GPU node added using the "default-by-castai" node template will have the taint nvidia.com/gpu=true:NoSchedule applied to it. In contrast, a GPU node added using a custom node template will not be tainted unless a taint is specified in the template definition.
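
You can confirm the taints on a provisioned GPU node with kubectl; <node-name> below is a placeholder:

# Show the taints applied to a node
kubectl describe node <node-name> | grep Taints
# A node created from the "default-by-castai" template shows: nvidia.com/gpu=true:NoSchedule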

Workload configuration examples

Requesting GPUs with the default node template:

spec:
  tolerations:
    - key: "nvidia.com/gpu"
      operator: Exists
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

Targeting a custom GPU node template:

spec:
  nodeSelector:
    scheduling.cast.ai/node-template: "gpu-node-template"
  tolerations:
    - key: "gpu-node-template"
      value: "template-affinity"
      operator: "Equal"
      effect: "NoSchedule"
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

Selecting a specific GPU type:

spec:
  nodeSelector:
    nvidia.com/gpu.name: "nvidia-tesla-t4"
  tolerations:
    - key: "nvidia.com/gpu"
      operator: Exists
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

Combining a custom node template with a specific GPU type:

spec:
  nodeSelector:
    scheduling.cast.ai/node-template: "gpu-node-template"
    nvidia.com/gpu.name: "nvidia-tesla-p4"
  tolerations:
    - key: "scheduling.cast.ai/node-template"
      value: "gpu-node-template"
      operator: "Equal"
      effect: "NoSchedule"
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

Requiring a minimum amount of GPU memory via node affinity:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.memory
            operator: Gt
            values:
            - "10000"
  tolerations:
    - key: "nvidia.com/gpu"
      operator: Exists
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

Targeting a GPU sharing (time-slicing) node template:

# No changes to pod specifications are required for GPU sharing
# Pods continue to request nvidia.com/gpu: 1
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: "gpu-sharing-template"
  tolerations:
    - key: "gpu-sharing-template"
      value: "template-affinity"
      operator: "Equal"
      effect: "NoSchedule"
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

GPU sharing (time-slicing)

Cast AI supports GPU sharing through time-slicing, which allows multiple workloads to share a single physical GPU by rapidly switching between processes. This feature enables better GPU utilization and cost optimization for workloads that don't require dedicated GPU access.

Once configured, you can monitor your GPU sharing efficiency with GPU utilization metrics.

What is GPU time-slicing?

GPU time-slicing is achieved through rapid context switching, where:

  • Each process gets an equal share of GPU time
  • Compute resources are assigned to one process at a time
  • GPU memory is shared between all processes
  • The NVIDIA device plugin manages the sharing configuration

For more information on GPU time-slicing, see NVIDIA's documentation for the Kubernetes device plugin.

Supported configurations

| Provider | GPU sharing support | Notes |
| --- | --- | --- |
| AWS EKS | ✓ (Bottlerocket only) | Available with time-slicing configuration |
| GCP GKE | - | - |
| Azure AKS | Not yet supported | - |

How it works

  1. Configuration: Enable GPU sharing in your node template with time-slicing parameters
  2. Resource calculation: Cast AI calculates the extended GPU capacity as GPU_COUNT * SHARED_CLIENTS_PER_GPU (see the check after this list)
  3. Node provisioning: The autoscaler provisions nodes according to the specified sharing configuration
  4. Workload scheduling: Pods continue to request nvidia.com/gpu: 1 - no changes to pod specifications required
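
As a quick sanity check of the resource calculation, you can read the extended GPU capacity that a provisioned node advertises; <node-name> and the numbers in the comment below are illustrative:

# Read the extended GPU capacity advertised by a time-sliced node
kubectl get node <node-name> -o jsonpath="{.status.allocatable['nvidia\.com/gpu']}"
# For example, a node with 2 physical GPUs and 8 shared clients per GPU reports 16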

Configuring GPU sharing

GPU sharing can be configured through multiple methods.

Console UI

GPU sharing is configured through node templates in the Cast AI console:

  1. Create or edit a node template
  2. Enable Use GPU time sharing
  3. Configure sharing parameters:
    • Default shared clients per GPU: The default number of workloads that can share each GPU (1-48)
    • Sharing configuration per GPU type: Override defaults for specific GPU models

The maximum number of shared clients per physical GPU is 48.

API

Use the Node Templates API to configure GPU sharing programmatically. Include the gpu object in your node template configuration:

{
  "gpu": {
    "enableTimeSharing": true,
    "defaultSharedClientsPerGpu": 10,
    "sharingConfiguration": {
      "nvidia-tesla-t4": {
        "sharedClientsPerGpu": 8
      },
      "nvidia-tesla-a100": {
        "sharedClientsPerGpu": 16
      }
    }
  }
}

Terraform

Configure GPU sharing using the Cast AI Terraform provider. Add the gpu block to your node template resource:

resource "castai_node_template" "example" {
  # ... other configuration

  gpu {
    enable_time_sharing             = true
    default_shared_clients_per_gpu  = 10
    
    sharing_configuration = {
      "nvidia-tesla-t4" = {
        shared_clients_per_gpu = 8
      }
      "nvidia-tesla-a100" = {
        shared_clients_per_gpu = 16
      }
    }
  }
} 

GPU drivers

After creating a node of an instance type with a GPU, the node becomes part of the cluster, but GPU resources are not immediately usable. To make GPUs accessible to Kubernetes, you need to install GPU drivers on the node.

GPU driver plugins help achieve this goal. The installation of GPU driver plugins on the cluster/node varies depending on the cloud provider or desired behavior.

Cast AI validates that the driver exists on the cluster before performing any autoscaling. If it doesn't detect the driver, it creates a pod event with details on how to solve the problem.
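
To see that event, describe the pod that is waiting for a GPU node; the pod name and namespace are placeholders:

# The Events section at the end of the output contains the Cast AI message
kubectl describe pod <pod-name> -n <namespace>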

Driver detection

Cast AI assumes that the GPU driver plugin is installed if it finds a daemonset that matches plugin characteristics, and a pod created from that daemonset can run on a node.

Cast AI supports all default GPU driver plugins that match specific name patterns. It also allows you to mark custom plugins as supported by labeling them with nvidia-device-plugin: "true".
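
For example, assuming your custom plugin runs as a daemonset and the label is applied to that daemonset, it can be marked as supported like this (the daemonset name and namespace are placeholders):

# Label a custom device plugin daemonset so Cast AI detects it as a GPU driver plugin
kubectl label daemonset <custom-plugin-daemonset> -n <namespace> nvidia-device-plugin="true"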

A daemonset whose name matches one of the following patterns is considered a known official GPU driver plugin:

  • *nvidia-device-plugin*
  • *nvidia-gpu-device-plugin*
  • *nvidia-driver-installer*

GPU drivers on AWS (EKS)

By default, EKS clusters come without a GPU device plugin installed. There are several ways to add one to a cluster:

During Cast AI onboarding

Cast AI allows you to enable the GPU device plugin at the cluster onboarding stage. By ticking the checkbox in the UI, you can install plugins automatically during onboarding.

Manually installing device plugin

Alternatively, you can manually install the plugin from the NVIDIA Helm repository.

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
noglob helm upgrade -i nvdp nvdp/nvidia-device-plugin -n castai-agent \
    --set-string nodeSelector."nvidia\.com/gpu"=true \
    --set \
tolerations[0].key=CriticalAddonsOnly,tolerations[0].operator=Exists,\
tolerations[1].effect=NoSchedule,tolerations[1].key="nvidia\.com/gpu",tolerations[1].operator=Exists,\
tolerations[2].key="scheduling\.cast\.ai/spot",tolerations[2].operator=Exists,\
tolerations[3].key="scheduling\.cast\.ai/scoped-autoscaler",tolerations[3].operator=Exists,\
tolerations[4].key="scheduling\.cast\.ai/node-template",tolerations[4].operator=Exists

Custom GPU device plugin

Cast AI assumes a custom plugin controls the driver installation process and node management. If you wish to autoscale GPU nodes using a custom plugin, it must be detectable by Cast AI.

NVIDIA drivers and Amazon Machine Images (AMIs)

The NVIDIA device plugin requires that NVIDIA drivers and the nvidia-container-toolkit already exist on the machine, or it will fail to start or expose GPU resources properly. By default, Cast AI detects GPU-enabled nodes and uses the EKS-optimized and GPU-enabled AMI by Amazon, which already bundles these. However, if custom AMIs are used, then the installation of these prerequisites must also be included in the AMI building process or the node's user data scripts.

See Node Configuration documentation for more details on AMI choice.

🚧

Known issues

NVIDIA dropped support for Kepler-architecture GPUs after driver version 470. Since Cast AI uses AMIs that bundle newer driver versions, these AMIs cannot be used with GPU instances that carry such GPUs (P2 instance types). To use those instance types, set an older GPU-enabled AMI or a custom AMI in the node configuration.

GPU drivers on GCP (GKE)

GKE clusters come with NVIDIA driver plugins preinstalled by default. If Cast AI finds the default plugins, it uses them and instructs them to install the default NVIDIA driver version based on the cluster version, GPU, and instance type.

Alternatively, you can manually install driver plugins or use custom drivers. If Cast AI finds a custom or manually installed plugin, it takes priority over the preinstalled drivers.

Manually installing driver plugin

A manually installed driver plugin is used to install the GPU drivers, while the preinstalled GPU plugin still manages both the GPU and the node. Use this command to install the drivers:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
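
To confirm that the installer is running, you can list its daemonset and pods; this assumes the daemonset keeps its upstream name, nvidia-driver-installer:

# Confirm the NVIDIA driver installer daemonset and its pods are running
kubectl get daemonsets,pods --all-namespaces | grep nvidia-driver-installer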

Custom GPU driver plugin

Cast AI assumes that a custom plugin has complete control over driver installation and over compatibility with the preinstalled plugins. For Cast AI to autoscale GPU nodes using a custom plugin, the plugin must be detectable by Cast AI.

GPU drivers on Azure (AKS)

Install the device plugin daemonset:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/main/deployments/static/nvidia-device-plugin.yml

Verify that the pods of the daemonset are up and running on GPU nodes.
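
One way to do this, assuming the daemonset keeps its upstream name:

# List the NVIDIA device plugin pods and the nodes they run on
kubectl get pods --all-namespaces -o wide | grep nvidia-device-plugin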

You can verify the GPU by running an NVIDIA plugin job or an Azure GPU job.