Autoscaling using GPU instances

The Cast AI Autoscaler can scale the cluster using GPU-optimized instances. This guide describes the steps needed to configure the cluster so that GPU nodes can join it.

Supported providers

Provider	GPUs supported
AWS EKS	NVIDIA
GCP GKE	NVIDIA
Azure AKS *	NVIDIA

* - Please contact Cast AI support to enable this feature for your organization.

How does it work?

Once activated, Cast AI's Autoscaler detects workloads requiring GPU resources and starts provisioning them.

To enable the provisioning of GPU nodes, you need a few things:

Choose a GPU instance type or attach a GPU to the instance type.
Install the NVIDIA device plugin (needed for EKS and AKS).
Expose the GPU to Kubernetes as a consumable resource.

Cast AI ensures that the correct GPU instance type is selected. All you have to do is request GPU resources in the workload and add a toleration for the GPU taint or the Node Template taint. You can also target specific GPU characteristics by using node selectors or node affinities on the GPU labels.

Label	Value Example	Description
`nvidia.com/gpu`	`true`	Node has an NVIDIA GPU attached
`nvidia.com/gpu.name`	`nvidia-tesla-t4`	Attached GPU type
`nvidia.com/gpu.count`	`1`	Attached GPU count
`nvidia.com/gpu.memory`	`15258`	Available memory of a single GPU, in MiB

📘
Tainting of GPU nodes
GPU nodes will have the nvidia.com/gpu=true:NoSchedule taint applied automatically, except when the Node Template has GPU constraints configured or limits instance types to GPU instances only. In such cases, the taint is not applied since the template is specifically designed for only GPU workloads.
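As an illustration, the labels and taint described above would appear on a GPU node roughly as follows. This is a sketch only: the node name is hypothetical, the label values are the example values from the table, and a real node object carries many more fields.

```yaml
# Illustrative fragment of a GPU node object, as it might appear in the output of
# `kubectl get node <name> -o yaml`.
apiVersion: v1
kind: Node
metadata:
  name: example-gpu-node          # hypothetical node name
  labels:
    nvidia.com/gpu: "true"
    nvidia.com/gpu.name: nvidia-tesla-t4
    nvidia.com/gpu.count: "1"     # label values are always strings
    nvidia.com/gpu.memory: "15258"
spec:
  taints:
    # Applied automatically, except in the GPU-only Node Template cases noted above
    - key: nvidia.com/gpu
      value: "true"
      effect: NoSchedule
```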

GPU sharing

Cast AI supports GPU sharing to enable multiple workloads to utilize GPU resources more efficiently. Multiple sharing methods are available, each optimized for different use cases:

Time-slicing - Share GPUs through rapid context switching
Multi-Instance GPU (MIG) - Partition GPUs with hardware-level isolation
Fractional GPUs - Pre-partitioned GPU portions from AWS (1/8 to full GPU)

For a detailed comparison and guidance on choosing the right method, see the GPU sharing overview.

NVIDIA device plugin

After creating a node of an instance type with a GPU, the node becomes part of the cluster, but GPU resources are not immediately usable. To make GPUs accessible to Kubernetes, you must install the NVIDIA device plugin on the node.

Cast AI verifies that the NVIDIA device plugin exists on a cluster before performing any kind of autoscaling. If it doesn't detect the NVIDIA device plugin, it creates a pod event with details on how to resolve the problem.

NVIDIA device plugin detection

Cast AI assumes that the NVIDIA device plugin is installed if it finds a DaemonSet that matches the plugin's characteristics, and a pod created from that DaemonSet can run on a node.

Cast AI supports all default NVIDIA device plugins that match specific name patterns. It also lets you mark any custom plugin as supported by adding the label nvidia-device-plugin: "true".

A DaemonSet whose name matches one of the following patterns is considered a known official NVIDIA device plugin:

*nvidia-device-plugin*
*nvidia-gpu-device-plugin*

📘
NVIDIA device plugin requires successful pod scheduling
Cast AI considers an NVIDIA device plugin present only if a pod from the matching DaemonSet can actually run on the GPU node. A DaemonSet that exists in the cluster but whose pods cannot be scheduled onto a node (for example, because the node has a taint that the DaemonSet does not tolerate) will not satisfy NVIDIA device plugin detection. This is a common cause of the "NVIDIA Device Plugin is required" error message.
If your Node Template includes custom taints, the NVIDIA device plugin DaemonSet must have a matching toleration for each of those taints. Without it, the device plugin pod will not be scheduled on newly provisioned GPU nodes, and Cast AI will report a detection failure even if the plugin is otherwise correctly installed.
If you see pod events with the error message "NVIDIA Device Plugin is required", check that the NVIDIA device plugin DaemonSet has tolerations for:

nvidia.com/gpu:NoSchedule (applied automatically by Cast AI on most GPU nodes)

All Cast AI system taints: scheduling.cast.ai/spot, scheduling.cast.ai/scoped-autoscaler, scheduling.cast.ai/node-template

Any custom taints defined in your Node Templates

Example: If your Node Template has the taint team=ml-workloads:NoSchedule, add a specific toleration:
tolerations:
  - key: "team"
    operator: "Equal"
    value: "ml-workloads"
    effect: "NoSchedule"
Alternatively, you can use a wildcard toleration that matches all taints:
tolerations:
  - operator: Exists  # Tolerates ALL taints
Note that the wildcard is permissive — it allows the pod to be scheduled on any node regardless of its taints. Use specific tolerations in production where possible.

Installing the NVIDIA device plugin

By default, EKS and AKS clusters do not come with an NVIDIA device plugin installed. There are several ways to add an NVIDIA device plugin to a cluster:

📘
GKE clusters come with the NVIDIA device plugin pre-installed by default. No manual installation is required unless you want to use a custom NVIDIA device plugin.

During Cast AI onboarding (only for EKS)

Cast AI allows you to enable the GPU device plugin at the cluster onboarding stage. Tick the checkbox in the UI to have the plugin installed automatically during onboarding.

Manually installing NVIDIA device plugin

Alternatively, you can manually install the plugin from the NVIDIA Helm repository.

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

Install the plugin using a values file. Copy the command below, which pipes the values directly via stdin:

helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace castai-agent \
  --create-namespace \
  -f - <<'EOF'
nodeSelector:
  nvidia.com/gpu: "true"

tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  - key: scheduling.cast.ai/spot
    operator: Exists
  - key: scheduling.cast.ai/scoped-autoscaler
    operator: Exists
  - key: scheduling.cast.ai/node-template
    operator: Exists

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        # Discrete GPU nodes (PCI vendor ID 10de = NVIDIA)
        - matchExpressions:
            - key: feature.node.kubernetes.io/pci-10de.present
              operator: In
              values:
                - "true"
            - key: nvidia.com/gpu.dra
              operator: NotIn
              values:
                - "true"
        # Tegra-based systems (CPU vendor NVIDIA)
        - matchExpressions:
            - key: feature.node.kubernetes.io/cpu-model.vendor_id
              operator: In
              values:
                - "NVIDIA"
            - key: nvidia.com/gpu.dra
              operator: NotIn
              values:
                - "true"
        # Manually labeled GPU nodes
        - matchExpressions:
            - key: nvidia.com/gpu.present
              operator: In
              values:
                - "true"
            - key: nvidia.com/gpu.dra
              operator: NotIn
              values:
                - "true"
EOF

Custom GPU device plugin

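Cast AI assumes a custom plugin controls the driver installation process and node management. If you wish to autoscale GPU nodes using a custom plugin, it must be detectable by Cast AI: either its DaemonSet name matches one of the known name patterns, or the plugin carries the label nvidia-device-plugin: "true".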
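The sketch below shows one way a custom plugin could be tagged. It assumes the nvidia-device-plugin: "true" label is set on the DaemonSet's own metadata. The DaemonSet name, namespace, app label, and image are hypothetical placeholders, not values from Cast AI or NVIDIA.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-custom-gpu-plugin          # hypothetical; does not match the known name patterns
  namespace: kube-system              # hypothetical namespace
  labels:
    nvidia-device-plugin: "true"      # marks the plugin as supported (placement on the DaemonSet is an assumption)
spec:
  selector:
    matchLabels:
      app: my-custom-gpu-plugin
  template:
    metadata:
      labels:
        app: my-custom-gpu-plugin
    spec:
      tolerations:
        # The plugin pod must be schedulable on GPU nodes for detection to succeed
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: plugin
          image: registry.example.com/my-custom-gpu-plugin:1.0.0   # hypothetical image
```

As with the official plugin, add tolerations for any Cast AI system taints and custom Node Template taints that apply to your GPU nodes.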

NVIDIA drivers and Amazon Machine Images (AMIs)

The NVIDIA device plugin requires that the NVIDIA drivers and the nvidia-container-toolkit already exist on the machine. Otherwise, it fails to start or does not expose GPU resources properly. By default, Cast AI detects GPU-enabled nodes and uses the EKS-optimized, GPU-enabled AMI by Amazon, which already bundles these. If you use custom AMIs, the installation of these prerequisites must be included in the AMI build process or in the node's user data scripts.

Amazon Linux 2023 AMI

Amazon Linux 2023 (AL2023) AMIs support GPU time-slicing. When using AL2023 for GPU nodes with time-slicing enabled, Cast AI automatically configures the required NVIDIA device plugin settings.

See the GPU sharing with time-slicing documentation for detailed AL2023 configuration instructions.

Bottlerocket AMI considerations

Bottlerocket AMIs come with the NVIDIA device plugin pre-installed. Do not install additional NVIDIA device plugins on clusters where you plan to use Bottlerocket GPU nodes, as they may conflict with the pre-installed plugin and fail to start. If you do install one, make sure its DaemonSet pods do not run on the Bottlerocket nodes, for example by leaving out the tolerations for a taint applied to those nodes.
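Another way to keep an additional device plugin DaemonSet off Bottlerocket nodes is a node affinity rule in its pod template. The fragment below is only a sketch: the label key and value are hypothetical, so substitute whatever label actually identifies the Bottlerocket nodes in your cluster.

```yaml
# Fragment of the additional device plugin DaemonSet's pod template spec.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Hypothetical label; NotIn also matches nodes that lack the label entirely
            - key: example.com/os-family
              operator: NotIn
              values:
                - bottlerocket
```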

When using Bottlerocket for GPU nodes, the OS handles driver management automatically. For more information, see Bottlerocket support for NVIDIA GPUs.

See Node Configuration documentation for more details on AMI choice.

🚧
Known issues
NVIDIA dropped support for Kepler architecture GPUs after driver version 470. Since Cast AI uses AMIs that bundle newer driver versions, these AMIs cannot be used with GPU instances that utilize such GPUs (P2 instance types). In order to use those instance types, an older GPU-enabled AMI or custom AMI must be set in the node configuration.

Workload configuration examples

The following examples show common patterns for configuring GPU workloads. For GPU sharing-specific configurations, see the time-slicing and MIG documentation.

# Request one GPU and tolerate the default GPU taint
spec:
  tolerations:
    - key: "nvidia.com/gpu"
      operator: Exists
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

# Schedule onto nodes created from a specific Node Template
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: "gpu-node-template"
  tolerations:
    - key: "scheduling.cast.ai/node-template"
      value: "gpu-node-template"
      operator: "Equal"
      effect: "NoSchedule"
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

# Require a specific GPU model through the nvidia.com/gpu.name label
spec:
  nodeSelector:
    nvidia.com/gpu.name: "nvidia-tesla-t4"
  tolerations:
    - key: "nvidia.com/gpu"
      operator: Exists
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

# Combine a Node Template with a specific GPU model
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: "gpu-node-template"
    nvidia.com/gpu.name: "nvidia-tesla-p4"
  tolerations:
    - key: "scheduling.cast.ai/node-template"
      value: "gpu-node-template"
      operator: "Equal"
      effect: "NoSchedule"
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1

# Require more than 10000 MiB of memory per GPU using node affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.memory
            operator: Gt
            values:
            - "10000"
  tolerations:
    - key: "nvidia.com/gpu"
      operator: Exists
  containers:
    - image: my-image
      name: gpu-test
      resources:
        requests:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
        limits:
          cpu: 1
          memory: 1Gi
          nvidia.com/gpu: 1
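The examples above show only the pod spec. As a minimal sketch of how the first pattern fits into a complete manifest, the Deployment below wraps it in a pod template. The Deployment name and the app label are hypothetical placeholders, and my-image is the same placeholder image used above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test           # hypothetical label
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      tolerations:
        # Tolerate the taint applied to GPU nodes
        - key: "nvidia.com/gpu"
          operator: Exists
      containers:
        - name: gpu-test
          image: my-image     # placeholder image
          resources:
            requests:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: 1
            limits:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: 1
```

Kubernetes requires the request and limit for an extended resource such as nvidia.com/gpu to be equal when both are set, which is why every example uses the same value in both places.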