This document lists all labels and taints that Cast AI automatically applies to nodes it manages.
🚧
Important: Reserved Labels and Taints
Cast AI reserves the labels and taints listed in this document. If you configure any of these in a Node Template, Cast AI will overwrite your values with the ones it determines based on the node's instance type, cloud provider, and feature configuration.
Do not configure these reserved labels or taints in Node Templates. Doing so will not produce the expected result and may prevent pods from being scheduled correctly.
Labels
Node Identity & Management
| Label | Value | When Applied |
| --- | --- | --- |
| provisioner.cast.ai/node-id | UUID | Set at node creation. Unique identifier for the node assigned by Cast AI. |
| provisioner.cast.ai/managed-by | cast.ai | Set at node creation. Marks the node as managed by Cast AI. Used by controllers to filter Cast AI-owned nodes. |
| provisioner.cast.ai/node-configuration-name | config name | Set at node creation. References the node configuration name used to provision the node. |
| provisioner.cast.ai/node-configuration-id | config UUID | Set at node creation. References the node configuration ID used to provision the node. |
| provisioner.cast.ai/hyper-threading-disabled | true | Set when the node configuration disables hyper-threading. Used to distinguish nodes where HT has been turned off at the OS level. |
| charts.cast.ai/managed | varies | Set on nodes managed via Helm chart deployments. |
Lifecycle / Instance Type
| Label | Value | When Applied |
| --- | --- | --- |
| scheduling.cast.ai/spot | true | Set when the node is a spot/preemptible instance. Used for spot-aware scheduling. |
| scheduling.cast.ai/on-demand | true | Set when the node is an on-demand instance. Mutually exclusive with scheduling.cast.ai/spot. |
| scheduling.cast.ai/spot-fallback | true | Set when the node is a spot-fallback instance (on-demand used as fallback for a spot configuration). |
| scheduling.cast.ai/spot-reliability | score (0–100) | Reliability score of the spot instance type, used by the scheduler to prefer more stable spot instances. |
| scheduling.cast.ai/interrupted | true | Set when a spot node receives an interruption notice. Signals the node is being drained due to spot preemption. |
For details on configuring workloads for spot instances, including fallback and diversity settings, see Spot Instances.
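As a quick way to see how the lifecycle labels above are distributed across a cluster, the following sketch wraps a `kubectl get nodes` call that prints each label as an extra column. The function name `castai_lifecycle_columns` is made up for this example; it assumes `kubectl` is installed and pointed at the cluster.

```shell
# Hypothetical helper: list nodes with Cast AI lifecycle labels as columns.
# Empty cells mean the label is not set on that node.
castai_lifecycle_columns() {
  kubectl get nodes \
    -L scheduling.cast.ai/spot,scheduling.cast.ai/on-demand,scheduling.cast.ai/spot-fallback,scheduling.cast.ai/spot-reliability
}
```

Call `castai_lifecycle_columns` after sourcing the snippet. Because spot and on-demand are mutually exclusive, a node should show a value in at most one of the first two columns.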
Node Templates
| Label | Value | When Applied |
| --- | --- | --- |
| scheduling.cast.ai/node-template | template name | Set when the node is provisioned using a specific node template. |
| scheduling.cast.ai/node-template-version | version string | Set when a node template is applied or updated. Tracks the version of the node template configuration. |
For information on creating and configuring node templates, including custom labels and workload targeting, see Node Templates.
Compute / Storage Optimization
| Label | Value | When Applied |
| --- | --- | --- |
| scheduling.cast.ai/compute-optimized | true | Set when the node is compute-optimized (high CPU-to-memory ratio). Used to schedule compute-intensive workloads. |
| scheduling.cast.ai/storage-optimized | true | Set when the node is storage-optimized (high local storage). |
| scheduling.cast.ai/cpu-manufacturer | e.g. Intel, AMD | Set based on the instance type CPU vendor. Enables CPU-manufacturer-aware scheduling. |
| scheduling.cast.ai/premium-storage | true | Set on AKS nodes that support premium storage (Premium_LRS, PremiumV2_LRS, Premium_ZRS). |
GPU / Accelerators
| Label | Value | When Applied |
| --- | --- | --- |
| nvidia.com/gpu | true | Set by the autoscaler during GPU node provisioning. Applied when the instance type has an NVIDIA GPU device. |
| nvidia.com/gpu.present | true | Set by the autoscaler alongside nvidia.com/gpu during GPU node provisioning. Acts as an alternative indicator of GPU presence. |
| nvidia.com/gpu.name | GPU model | GPU model name (e.g. A100). Set by the autoscaler based on the instance type's GPU device metadata. |
| nvidia.com/gpu.count | integer | Number of physical GPUs on the node. Set by the autoscaler based on instance type GPU count. |
| nvidia.com/gpu.memory | MiB | Memory per single GPU in MiB. Set by the autoscaler based on instance type GPU memory. |
| nvidia.com/gpu.total-memory | MiB | Total GPU memory across all GPUs on the node. |
| nvidia.com/gpu.mig | true | Set when MIG (Multi-Instance GPU) partitioning is enabled. |
| nvidia.com/gpu.mig-partition-{size} | true | Set per MIG partition size (e.g. nvidia.com/gpu.mig-partition-1g.10gb). Indicates available MIG slice profiles. |
| nvidia.com/gpu.dra | true | Set when the node uses the NVIDIA DRA (Dynamic Resource Allocation) driver instead of the NVIDIA device plugin. |
| scheduling.cast.ai/gpu.count | integer | Cast AI internal GPU count used during scheduling simulation. Set alongside NVIDIA labels. |
| scheduling.cast.ai/gpu-shared | integer | Set when GPU time-sharing or MPS is configured. Value is the maximum number of shared clients per GPU. |
| scheduling.cast.ai/gpu-sharing-strategy | time-sharing or mps | Set alongside scheduling.cast.ai/gpu-shared to indicate the GPU sharing strategy configured for the node. |
Set on green nodes created as part of a rebalancing plan. Used to associate the node with its rebalancing operation.
| Label | Value | When Applied |
| --- | --- | --- |
| rebalancing.cast.ai/operation-id | UUID | Set on green nodes to identify the specific operation within a rebalancing plan. |
| scheduling.cast.ai/delete-reason | plugin name | Set when the autoscaler marks a node for deletion. |
| autoscaling.cast.ai/draining | reason string | Set on nodes being drained by the rebalancer or evictor. Possible values: rebalancing, aws-rebalance-recommendation, spot-prediction, spot-fallback, spot-interruption, evictor. Applied alongside the autoscaling.cast.ai/draining taint. |
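Since the autoscaling.cast.ai/draining label carries the drain reason as its value, a label-existence selector combined with a label column shows which nodes are draining and why. This is a minimal sketch; the function name `castai_draining_nodes` is hypothetical, and it assumes `kubectl` has access to the cluster.

```shell
# Hypothetical helper: list nodes that carry the draining label,
# with the drain reason (the label value) shown as an extra column.
castai_draining_nodes() {
  kubectl get nodes \
    -l autoscaling.cast.ai/draining \
    -L autoscaling.cast.ai/draining
}
```

The `-l` flag with a bare key matches any node where the label exists, regardless of value, so all of the reason strings listed above are covered by one query.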
| Label | Value | When Applied |
| --- | --- | --- |
| autoscaling.cast.ai/removal-disabled | true | Set on a node or pod to prevent the rebalancer from evicting pods from that node. |
| autoscaling.cast.ai/removal-disabled-until | Unix timestamp (seconds) | Set to temporarily prevent removal until the given timestamp. Applied during green node initialization to protect it from premature rebalancing. |
| autoscaling.cast.ai/live-migration-disabled | true | Pod-level label/annotation. Disables live migration for pods that cannot tolerate it. |
kubectl: Query removal-protected nodes and pods

```shell
# List nodes protected from removal
kubectl get nodes -l autoscaling.cast.ai/removal-disabled=true

# List pods protected from removal
kubectl get pods -A -l autoscaling.cast.ai/removal-disabled=true

# List pods opted out of live migration
kubectl get pods -A -l autoscaling.cast.ai/live-migration-disabled=true
```
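The autoscaling.cast.ai/removal-disabled-until value is a Unix timestamp in seconds, which is hard to read at a glance. The sketch below converts it to a UTC date. Both function names are hypothetical, the `-d "@..."` form assumes GNU `date` (as found on most Linux systems; BSD/macOS `date` uses `-r` instead), and the node name is supplied by the caller.

```shell
# Convert a Unix timestamp (seconds) to an ISO 8601 UTC string.
# Assumes GNU date; on BSD/macOS use `date -u -r "$1"` instead.
epoch_to_utc() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical helper: print when removal protection ends for a node.
# $1 is the node name. Prints nothing if the label is not set.
removal_protected_until() {
  ts=$(kubectl get node "$1" \
    -o jsonpath='{.metadata.labels.autoscaling\.cast\.ai/removal-disabled-until}')
  if [ -n "$ts" ]; then
    epoch_to_utc "$ts"
  fi
}

epoch_to_utc 1700000000   # prints 2023-11-14T22:13:20Z
```

The backslashes in the JSONPath expression escape the dots inside the label key so that `kubectl` treats the key as a single field name.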
Set by Cast AI to indicate Container Live Migration (CLM) component installation status on the node.
| Label | Value | When Applied |
| --- | --- | --- |
| live.cast.ai/migration-enabled | true | Set by the LIVE daemonset once all LIVE components are installed and operational on the node. Indicates the node is eligible as both migration source and destination. |
kubectl: Check CLM status

```shell
# List nodes with Container Live Migration enabled
kubectl get nodes -l live.cast.ai/migration-enabled=true

# List pods eligible for live migration
kubectl get pods -A -l live.cast.ai/migration-enabled=true

# Monitor active migrations
kubectl get migrations -A -w
```
Set on nodes that were rebalanced due to an ML-predicted spot interruption. Specifies how many minutes the replacement node should be kept alive after the original was evacuated.
Volume Support
| Label | Value | When Applied |
| --- | --- | --- |
| volume.scheduling.cast.ai/{volume-name} | true | Set per storage volume/class to indicate the node supports that volume type. |
Topology
| Label | Value | When Applied |
| --- | --- | --- |
| topology.cast.ai/csp | aws, gcp, azure | Set at node creation. Identifies the cloud service provider. |
| topology.cast.ai/subnet-id | subnet ID | Set at node creation. Identifies the subnet the node was provisioned in. |
| topology.cast.ai/pod-subnet-id | subnet ID | Set on AKS nodes. Identifies the subnet used for pod IP allocation. |
| topology.cast.ai/resource-group | resource group | Set on AKS nodes. Azure resource group containing the node. |
| topology.cast.ai/virtual-network | vnet name | Set on AKS nodes. Azure virtual network name. |
| topology.cast.ai/subscription-id | subscription ID | Set on AKS nodes. Azure subscription ID. |
| topology.disk.csi.azure.com/zone | AZ name | Set on Azure nodes. Availability zone for Azure CSI disk topology. |
| topology.ebs.csi.aws.com/zone | AZ name | Set on AWS nodes. Availability zone for EBS CSI disk topology. |
| network-tag.gcp.cast.ai/{tag-name} | true | Set on GCP nodes for each network tag associated with the node. Prefix-based, one label per tag. |
| topology.kubernetes.io/region | region | Standard Kubernetes label. Set at node creation with the cloud region. |
| topology.kubernetes.io/zone | AZ name | Standard Kubernetes label. Set at node creation with the availability zone. |
| topology.gke.io/zone | AZ name | GKE-specific zone label. |
For information on configuring pod placement by topology, see Pod placement. For subnet configuration, see Subnets.
Cloud Provider Specific
AKS (Azure)
| Label | Value | When Applied |
| --- | --- | --- |
| kubernetes.azure.com/agentpool | pool name | Set to castai for Cast AI provisioned nodes. Required by AKS for node pool membership. |
| kubernetes.azure.com/cluster | cluster name | Set to identify the AKS cluster. |
| kubernetes.azure.com/mode | system or user | Set to system on system node pools. |
| kubernetes.azure.com/role | agent | Set on all AKS agent nodes. |
| agentpool (deprecated) | pool name | Deprecated AKS agent pool label (pre k8s 1.24). Replaced by kubernetes.azure.com/agentpool. |
| kubernetes.io/role (deprecated) | role | Deprecated AKS role label (pre k8s 1.24). |
EKS (AWS)
| Label | Value | When Applied |
| --- | --- | --- |
| eks.amazonaws.com/compute-type | fargate | Set on Fargate nodes. Used to distinguish Fargate from EC2 node types. |
Standard Kubernetes Labels (set by autoscaler)
| Label | Value | When Applied |
| --- | --- | --- |
| kubernetes.io/arch | amd64, arm64 | Set at node creation based on instance architecture. |
| kubernetes.io/os | linux, windows | Set at node creation based on instance OS. |
| beta.kubernetes.io/arch | amd64, arm64 | Legacy beta arch label, set alongside kubernetes.io/arch for backwards compatibility. |
| beta.kubernetes.io/os | linux, windows | Legacy beta OS label. |
| node.kubernetes.io/instance-type | instance type name | Standard label set to the cloud provider instance type (e.g. m5.xlarge). |
| kubernetes.io/hostname | hostname | Standard hostname label. |
Resource Offering
| Label | Value | When Applied |
| --- | --- | --- |
| autoscaling.cast.ai/provisioned-resource-offering | offering type | Set at node creation to record the resource offering type used for provisioning. |
OMNI Edge
| Label | Value | When Applied |
| --- | --- | --- |
| virtual-node.omni.cast.ai/not-allowed | true | Set on nodes that are not allowed to run workloads in the OMNI edge context. |
Taints
Rebalancing Taints
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| rebalancing.cast.ai/preparing | (none) | NoSchedule | Applied to green (replacement) nodes during rebalancing plan execution. Prevents new pods from being scheduled until the node is fully prepared. Removed once the node is ready to receive workloads. |
| scheduling.cast.ai/pod-pinning-preparing | (none) | NoSchedule | Applied during rebalancing when pod pinning is enabled. Prevents scheduling until pod pinning setup is complete. |
| autoscaling.cast.ai/draining | true | NoSchedule | Applied to blue (old) nodes when they are being drained during rebalancing or spot interruption handling. Applied alongside the autoscaling.cast.ai/draining label. |
| provisioner.cast.ai/uninitialized | (none) | NoSchedule | Applied to newly provisioned nodes before initialization is complete. Prevents pod scheduling until the node is fully set up. |
For details on how rebalancing operates and the blue/green node replacement process, see Rebalancing. For paused drain configuration, see Paused drain configuration.
Lifecycle / Spot Taint
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| scheduling.cast.ai/spot | (none) | NoSchedule | Applied to spot nodes when the lifecycle taint feature is enabled in node template configuration. Requires pods to explicitly tolerate spot instances. Only applied when both spot and on-demand nodes exist in the cluster. |
For workload configuration patterns using spot tolerations and node selectors, see Spot Instances.
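To illustrate how the spot taint and the spot label work together, here is a minimal sketch that writes a Pod manifest tolerating the scheduling.cast.ai/spot taint and selecting nodes by the scheduling.cast.ai/spot label. The file name, pod name, and container image are placeholders chosen for this example, not values taken from Cast AI.

```shell
# Write a hypothetical Pod manifest that opts in to spot nodes.
cat > spot-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: spot-demo                      # placeholder name
spec:
  nodeSelector:
    scheduling.cast.ai/spot: "true"    # spot label from the Labels section
  tolerations:
    - key: scheduling.cast.ai/spot     # spot taint has no value, so match on key
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: nginx                     # placeholder image
EOF

# To apply it (requires kubectl and cluster access):
# kubectl apply -f spot-pod.yaml
```

The toleration alone only permits scheduling on spot nodes; the node selector is what restricts the pod to them. Dropping the `nodeSelector` would let the pod land on either spot or on-demand capacity.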
Node Template Taint
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| scheduling.cast.ai/node-template | template name (e.g. default-by-castai) | NoSchedule | Applied to all nodes provisioned via a node template. |
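Because the node template taint carries the template name as its value, a toleration for it would typically match on both key and value. The sketch below generates such a manifest from a shell variable. `TEMPLATE_NAME`, the file name, the pod name, and the image are all assumptions for illustration; substitute the name of a template that exists in your cluster.

```shell
# Hypothetical template name; replace with a real node template name.
TEMPLATE_NAME="my-template"

# Unquoted heredoc delimiter so ${TEMPLATE_NAME} is expanded by the shell.
cat > template-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: template-demo
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: "${TEMPLATE_NAME}"
  tolerations:
    - key: scheduling.cast.ai/node-template
      operator: Equal
      value: "${TEMPLATE_NAME}"
      effect: NoSchedule
  containers:
    - name: app
      image: nginx
EOF

# To apply it (requires kubectl and cluster access):
# kubectl apply -f template-pod.yaml
```

Using `operator: Equal` with an explicit value keeps the pod off nodes that belong to other templates, which an `Exists` toleration would not do.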
Scoped Autoscaler Taint
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| scheduling.cast.ai/scoped-autoscaler | (none) | NoSchedule | Applied when the scoped autoscaler feature is enabled for a node template. Restricts scheduling to workloads explicitly intended for the scoped autoscaler. |
Storage Optimization Taint
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| scheduling.cast.ai/storage-optimized | (none) | NoSchedule | Applied to storage-optimized nodes. Requires workloads to explicitly tolerate storage-optimized instances. |
GPU / Accelerators Taints
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| nvidia.com/gpu | true | NoSchedule | Applied by the autoscaler during GPU node provisioning. Restricts scheduling to GPU-tolerant workloads, preventing non-GPU pods from consuming GPU nodes. |
| nvidia.com/gpu.mig | true | NoSchedule | Applied by the autoscaler when provisioning nodes with MIG (Multi-Instance GPU) partitioning enabled. Restricts to MIG-compatible workloads. |
| aws.amazon.com/neuron | true | NoSchedule | Applied to AWS Inferentia/Trainium nodes. Restricts scheduling to workloads that explicitly require Neuron accelerators. |
For GPU workload configuration examples and toleration patterns, see GPU Instances.
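As a rough illustration of a GPU-tolerant workload, the following sketch writes a Pod manifest that tolerates the nvidia.com/gpu taint and requests one GPU through the standard `nvidia.com/gpu` extended resource. It assumes the NVIDIA device plugin (not the DRA driver) exposes that resource on the node; the file name, pod name, and image reference are placeholders.

```shell
# Write a hypothetical Pod manifest for a single-GPU workload.
cat > gpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo                       # placeholder name
spec:
  restartPolicy: Never
  tolerations:
    - key: nvidia.com/gpu              # GPU taint from the table above
      operator: Exists
      effect: NoSchedule
  containers:
    - name: worker
      image: registry.example.com/gpu-workload:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1            # request one whole GPU
EOF

# To apply it (requires kubectl and cluster access):
# kubectl apply -f gpu-pod.yaml
```

A node selector on a label such as nvidia.com/gpu.name could narrow this to a specific GPU model, but the exact value format should be checked on a live node first.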
Architecture Taint
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| kubernetes.io/arch | arm64 | NoSchedule | Applied to ARM64 nodes. Requires pods to tolerate ARM64 architecture, preventing x86-only images from being scheduled on ARM nodes. |
Eviction Taints (applied by Evictor)
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| evictor.cast.ai/evicting | (none) | varies | Applied by the evictor when it starts draining a node. Signals that the node is being evacuated. |
| evictor.cast.ai/evicted | (none) | varies | Applied by the evictor after node eviction is complete. Signals the node has been fully drained. |
For Evictor operating modes, override rules, and advanced configuration, see Evictor.
OMNI Edge Taint
| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| virtual-node.omni.cast.ai/not-allowed | true | NoExecute | Applied to OMNI virtual nodes that are not allowed to run workloads. Evicts any existing pods in addition to blocking new scheduling. |
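To check which of the taints in this section are currently present on each node, one option is a `kubectl` custom-columns query over `.spec.taints`. This is a minimal sketch; the function name `castai_node_taints` is hypothetical, and it assumes `kubectl` has access to the cluster.

```shell
# Hypothetical helper: print every node with the keys of its taints.
# Nodes without taints show <none> in the TAINTS column.
castai_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

The output lists taint keys only. To see values and effects as well, `kubectl describe node <name>` prints the full taint entries.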
Annotations (related, node-level)
These annotations are not labels or taints but are closely related and set on nodes:
| Annotation | Value | Purpose |
| --- | --- | --- |
| autoscaling.cast.ai/removal-delay-seconds | integer | Sets a removal delay in seconds for a node before it can be deleted. |
| autoscaling.cast.ai/paused-draining-until | timestamp | Pauses the draining process on a node until the given timestamp. |
| rebalancing.cast.ai/status | drain-failed | Set on a node when a drain operation during rebalancing has failed. |
| evictor.cast.ai/eviction-status | varies | Tracks the current eviction status of a node set by the evictor. |
| predictions.cast.ai/remove-after | timestamp | Set on nodes rebalanced due to ML spot interruption predictions. Indicates when the node should be removed. |
kubectl: Query nodes by annotation

```shell
# Find nodes with a removal delay configured
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.annotations["autoscaling.cast.ai/removal-delay-seconds"] != null) | .metadata.name'

# Find nodes with paused draining
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.annotations["autoscaling.cast.ai/paused-draining-until"] != null) | "\(.metadata.name) paused until \(.metadata.annotations["autoscaling.cast.ai/paused-draining-until"])"'

# Find nodes with failed drain operations
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.annotations["rebalancing.cast.ai/status"] == "drain-failed") | .metadata.name'
```