AWS Neuron Instances (EKS)

Configure Cast AI Autoscaler to scale your EKS cluster using AWS Neuron accelerator instances — including AWS Inferentia and AWS Trainium — for machine learning inference and training workloads.

Autoscaling using Neuron instances

The Cast AI Autoscaler can scale EKS clusters using AWS Neuron-powered instances. This guide describes what is required for Neuron nodes to join the cluster and for Neuron workloads to be scheduled correctly.

Supported providers

| Provider | Supported |
| --- | --- |
| AWS EKS | Yes |
| GCP GKE | No |
| Azure AKS | No |

Supported instance families

Cast AI supports all AWS EC2 instance families equipped with Neuron devices:

| Instance family | Hardware | Primary use |
| --- | --- | --- |
| inf1 | AWS Inferentia | Inference |
| inf2 | AWS Inferentia2 | Inference |
| trn1 / trn1n | AWS Trainium | Training |
| trn2 | AWS Trainium2 | Training & inference |
| trn3 | AWS Trainium3 | Training & inference |

How does it work?

Once a pending pod requests aws.amazon.com/neuron, aws.amazon.com/neuroncore, or aws.amazon.com/neurondevice resources, Cast AI detects the workload and provisions a Neuron-capable node.

To enable provisioning of Neuron nodes, you need:

  • A pod whose container requests Neuron resources, with equal values in requests and limits
  • A toleration for the aws.amazon.com/neuron=true:NoSchedule taint
  • The Neuron device plugin DaemonSet installed and running in the cluster

Cast AI validates that the Neuron device plugin DaemonSet is present before provisioning any Neuron node. If the DaemonSet is not detected, Cast AI emits a pod event with the message "Neuron daemon set driver is required" and does not attempt to scale.

Neuron node resources

Cast AI exposes the following extended resources on Neuron nodes:

| Resource name | Description |
| --- | --- |
| aws.amazon.com/neuron | Number of Neuron devices on the node |
| aws.amazon.com/neurondevice | Alias for the number of Neuron devices (equivalent to aws.amazon.com/neuron) |
| aws.amazon.com/neuroncore | Number of individual NeuronCores on the node |

Use aws.amazon.com/neuron or aws.amazon.com/neurondevice to request whole devices. Use aws.amazon.com/neuroncore to request individual NeuronCores for finer-grained allocation.
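
Because aws.amazon.com/neurondevice is described above as an alias for aws.amazon.com/neuron, a container should be able to request a whole device under either name. The fragment below is a minimal sketch of a container resources block that uses the alias; the CPU and memory figures are illustrative assumptions, not recommendations.

```yaml
# Container resources fragment (illustrative CPU and memory values).
resources:
  requests:
    cpu: 4
    memory: 8Gi
    aws.amazon.com/neurondevice: 1   # one whole Neuron device, requested via the alias
  limits:
    cpu: 4
    memory: 8Gi
    aws.amazon.com/neurondevice: 1   # kept equal to the request
```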

Tainting of Neuron nodes

Neuron nodes have the aws.amazon.com/neuron=true:NoSchedule taint applied automatically. Your Neuron workloads must include a matching toleration; without it, pods will not be scheduled onto Neuron nodes and Cast AI will not provision one.
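
As a rough illustration of how the taint and a matching toleration relate, the sketch below shows the taint as it would typically appear in a Node object, followed by a pod toleration that matches it exactly with operator: Equal. The workload examples in this guide use operator: Exists instead, which matches the key regardless of value; either form should satisfy the taint described above.

```yaml
# Node side: how the aws.amazon.com/neuron=true:NoSchedule taint
# would typically appear in the Node object.
spec:
  taints:
    - key: aws.amazon.com/neuron
      value: "true"
      effect: NoSchedule
---
# Pod side: a toleration that matches the taint on key, value, and effect.
spec:
  tolerations:
    - key: aws.amazon.com/neuron
      operator: Equal
      value: "true"
      effect: NoSchedule
```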

Neuron device plugin

The Neuron device plugin is required for Neuron resources to be exposed to Kubernetes. Cast AI checks for a DaemonSet whose pod name contains neuron-device-plugin before performing any Neuron node autoscaling.

Install the Neuron Helm Chart, which includes the device plugin:

helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart

To install the device plugin only (without the Node Problem Detector):

helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \
    --set "npd.enabled=false"
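
The same setting can be kept in a values file rather than passed with --set. A minimal sketch, assuming a file named neuron-values.yaml (a hypothetical name) that is supplied to the helm command with -f neuron-values.yaml:

```yaml
# neuron-values.yaml (hypothetical file name)
# Equivalent to --set "npd.enabled=false": installs the chart
# without the Node Problem Detector.
npd:
  enabled: false
```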

Verify the device plugin DaemonSet is running:

kubectl get ds neuron-device-plugin -n kube-system

Verify that Neuron resources are visible on nodes:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,NeuronDevice:.status.allocatable.aws\.amazon\.com/neuron"
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,NeuronCore:.status.allocatable.aws\.amazon\.com/neuroncore"

Workload configuration examples

Request whole Neuron devices (aws.amazon.com/neuron)

spec:
  tolerations:
    - key: "aws.amazon.com/neuron"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-neuron-image
      name: neuron-workload
      resources:
        requests:
          cpu: 4
          memory: 8Gi
          aws.amazon.com/neuron: 1
        limits:
          cpu: 4
          memory: 8Gi
          aws.amazon.com/neuron: 1

Request individual NeuronCores (aws.amazon.com/neuroncore)

spec:
  tolerations:
    - key: "aws.amazon.com/neuron"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-neuron-image
      name: neuron-workload
      resources:
        requests:
          cpu: 4
          memory: 8Gi
          aws.amazon.com/neuroncore: 2
        limits:
          cpu: 4
          memory: 8Gi
          aws.amazon.com/neuroncore: 2

Request multiple NeuronCores, which may span several devices

spec:
  tolerations:
    - key: "aws.amazon.com/neuron"
      operator: Exists
      effect: NoSchedule
  containers:
    - image: my-neuron-image
      name: neuron-workload
      resources:
        requests:
          cpu: 8
          memory: 16Gi
          aws.amazon.com/neuroncore: 8
        limits:
          cpu: 8
          memory: 16Gi
          aws.amazon.com/neuroncore: 8
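
The fragments above start at the pod spec. To show where that spec sits in a complete manifest, here is a sketch of a Deployment that wraps the first example. The Deployment name, labels, and replica count are placeholders chosen for illustration, and my-neuron-image is the same placeholder image used above.

```yaml
# Illustrative Deployment; name, labels, and replica count are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: neuron-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: neuron-workload
  template:
    metadata:
      labels:
        app: neuron-workload
    spec:
      # Tolerate the taint that Neuron nodes carry.
      tolerations:
        - key: "aws.amazon.com/neuron"
          operator: Exists
          effect: NoSchedule
      containers:
        - name: neuron-workload
          image: my-neuron-image   # placeholder image
          resources:
            requests:
              cpu: 4
              memory: 8Gi
              aws.amazon.com/neuron: 1
            limits:
              cpu: 4
              memory: 8Gi
              aws.amazon.com/neuron: 1
```

In a Deployment, the tolerations and resources belong under spec.template.spec, not under the top-level spec.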

Required pod fields

| Field | Required | Description |
| --- | --- | --- |
| spec.tolerations with key aws.amazon.com/neuron | Yes | Required to allow scheduling onto Neuron nodes, which carry the aws.amazon.com/neuron=true:NoSchedule taint. |
| resources.limits["aws.amazon.com/neuron"] or ["aws.amazon.com/neuroncore"] or ["aws.amazon.com/neurondevice"] | Yes (at least one) | Amount of Neuron resources to allocate. Must be set in both requests and limits with equal values. |
Note: AWS requires that requests and limits are equal for Neuron resources. Cast AI enforces this and will emit an error event if they differ.
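
To make the rule concrete, the sketch below contrasts a resources block that breaks it (shown commented out) with one that satisfies it. The NeuronCore counts are arbitrary illustrative values.

```yaml
# Not accepted: request and limit differ for the Neuron resource.
# resources:
#   requests:
#     aws.amazon.com/neuroncore: 1
#   limits:
#     aws.amazon.com/neuroncore: 2

# Accepted: request and limit are equal.
resources:
  requests:
    aws.amazon.com/neuroncore: 2
  limits:
    aws.amazon.com/neuroncore: 2
```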

Further reading