# Autoscaler Node Labels and Taints

This document lists all labels and taints that Cast AI automatically applies to nodes it manages.

> 🚧 **Important: Reserved Labels and Taints**
>
> Cast AI reserves the labels and taints listed in this document. If you configure any of them in a Node Template, Cast AI will overwrite your values with the ones it determines from the node's instance type, cloud provider, and feature configuration.
>
> Do not configure these reserved labels or taints in Node Templates. Doing so will not produce the expected result and may prevent pods from being scheduled correctly.

## Labels

### Node Identity & Management

| Label | Value | When Applied |
| --- | --- | --- |
| `provisioner.cast.ai/node-id` | UUID | Set at node creation. Unique identifier for the node assigned by Cast AI. |
| `provisioner.cast.ai/managed-by` | `cast.ai` | Set at node creation. Marks the node as managed by Cast AI. Used by controllers to filter Cast AI-owned nodes. |
| `provisioner.cast.ai/node-configuration-name` | config name | Set at node creation. References the node configuration name used to provision the node. |
| `provisioner.cast.ai/node-configuration-id` | config UUID | Set at node creation. References the node configuration ID used to provision the node. |
| `provisioner.cast.ai/hyper-threading-disabled` | `true` | Set when the node configuration disables hyper-threading. Used to distinguish nodes where HT has been turned off at the OS level. |
| `charts.cast.ai/managed` | varies | Set on nodes managed via Helm chart deployments. |

### Lifecycle / Instance Type

| Label | Value | When Applied |
| --- | --- | --- |
| `scheduling.cast.ai/spot` | `true` | Set when the node is a spot/preemptible instance. Used for spot-aware scheduling. |
| `scheduling.cast.ai/on-demand` | `true` | Set when the node is an on-demand instance. Mutually exclusive with `scheduling.cast.ai/spot`. |
| `scheduling.cast.ai/spot-fallback` | `true` | Set when the node is a spot-fallback instance (on-demand used as fallback for a spot configuration). |
| `scheduling.cast.ai/spot-reliability` | score (0–100) | Reliability score of the spot instance type, used by the scheduler to prefer more stable spot instances. |
| `scheduling.cast.ai/interrupted` | `true` | Set when a spot node receives an interruption notice. Signals the node is being drained due to spot preemption. |

For details on configuring workloads for spot instances, including fallback and diversity settings, see Spot Instances.
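As an illustration, a workload can target spot capacity with a node selector on the `scheduling.cast.ai/spot` label and, where the lifecycle taint is enabled, a matching toleration. A minimal sketch (the Deployment name and image are placeholders):

```yaml
# Sketch: schedule a workload onto Cast AI spot nodes.
# The name and image are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-tolerant-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spot-tolerant-app
  template:
    metadata:
      labels:
        app: spot-tolerant-app
    spec:
      nodeSelector:
        scheduling.cast.ai/spot: "true"   # request a spot node
      tolerations:
        - key: scheduling.cast.ai/spot    # needed only when the lifecycle taint is enabled
          operator: Exists
          effect: NoSchedule
      containers:
        - name: app
          image: nginx:1.27               # placeholder image
```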

### Node Templates

| Label | Value | When Applied |
| --- | --- | --- |
| `scheduling.cast.ai/node-template` | template name | Set when the node is provisioned using a specific node template. |
| `scheduling.cast.ai/node-template-version` | version string | Set when a node template is applied or updated. Tracks the version of the node template configuration. |

For information on creating and configuring node templates, including custom labels and workload targeting, see Node Templates.
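For example, a pod can be pinned to nodes created from a particular template by combining the label and the corresponding taint's toleration. A sketch, where `my-template` is a placeholder template name:

```yaml
# Sketch: pin a pod to nodes provisioned from a specific node template.
# "my-template" and the image are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: template-pinned-pod
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: my-template
  tolerations:
    - key: scheduling.cast.ai/node-template
      operator: Equal
      value: my-template                  # the taint value is the template name
      effect: NoSchedule
  containers:
    - name: app
      image: nginx:1.27                   # placeholder image
```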

### Compute / Storage Optimization

| Label | Value | When Applied |
| --- | --- | --- |
| `scheduling.cast.ai/compute-optimized` | `true` | Set when the node is compute-optimized (high CPU-to-memory ratio). Used to schedule compute-intensive workloads. |
| `scheduling.cast.ai/storage-optimized` | `true` | Set when the node is storage-optimized (high local storage). |
| `scheduling.cast.ai/cpu-manufacturer` | e.g. `Intel`, `AMD` | Set based on the instance type CPU vendor. Enables CPU-manufacturer-aware scheduling. |
| `scheduling.cast.ai/premium-storage` | `true` | Set on AKS nodes that support premium storage (`Premium_LRS`, `PremiumV2_LRS`, `Premium_ZRS`). |

### GPU / Accelerators

| Label | Value | When Applied |
| --- | --- | --- |
| `nvidia.com/gpu` | `true` | Set by the autoscaler during GPU node provisioning. Applied when the instance type has an NVIDIA GPU device. |
| `nvidia.com/gpu.present` | `true` | Set by the autoscaler alongside `nvidia.com/gpu` during GPU node provisioning. Acts as an alternative indicator of GPU presence. |
| `nvidia.com/gpu.name` | GPU model | GPU model name (e.g. `A100`). Set by the autoscaler based on the instance type's GPU device metadata. |
| `nvidia.com/gpu.count` | integer | Number of physical GPUs on the node. Set by the autoscaler based on instance type GPU count. |
| `nvidia.com/gpu.memory` | MiB | Memory per single GPU in MiB. Set by the autoscaler based on instance type GPU memory. |
| `nvidia.com/gpu.total-memory` | MiB | Total GPU memory across all GPUs on the node. |
| `nvidia.com/gpu.mig` | `true` | Set when MIG (Multi-Instance GPU) partitioning is enabled. |
| `nvidia.com/gpu.mig-partition-{size}` | `true` | Set per MIG partition size (e.g. `nvidia.com/gpu.mig-partition-1g.10gb`). Indicates available MIG slice profiles. |
| `nvidia.com/gpu.dra` | `true` | Set when the node uses the NVIDIA DRA (Dynamic Resource Allocation) driver instead of the NVIDIA device plugin. |
| `scheduling.cast.ai/gpu.count` | integer | Cast AI internal GPU count used during scheduling simulation. Set alongside NVIDIA labels. |
| `scheduling.cast.ai/gpu-shared` | integer | Set when GPU time-sharing or MPS is configured. Value is the maximum number of shared clients per GPU. |
| `scheduling.cast.ai/gpu-sharing-strategy` | `time-sharing` or `mps` | Set alongside `scheduling.cast.ai/gpu-shared` to indicate the GPU sharing strategy configured for the node. |
| `scheduling.cast.ai/nvidia-device-plugin-static-pod` | `true` | Set on AWS AL2023 nodes with GPU sharing enabled. Signals that the NVIDIA device plugin is deployed as a static pod rather than a DaemonSet. |
| `scheduling.cast.ai/gpu-partition-size` | partition size (e.g. `1g.5gb`) | Set on nodes where MIG is configured in single mode. Indicates the active MIG partition size. |
| `scheduling.cast.ai/bottlerocket-gpu-partition-size` | partition size (e.g. `1g.5gb`) | Set on AWS Bottlerocket nodes with MIG configured in single mode. |
| `scheduling.cast.ai/preinstalled-nvidia-driver` | `true` | Set on nodes that come with a pre-installed NVIDIA driver (e.g. Bottlerocket). |
| `aws.amazon.com/neuron` | `true` | Set on AWS instances with Inferentia/Trainium (Neuron) accelerators. Used together with the `aws.amazon.com/neuron` taint. |
| `cloud.google.com/gke-accelerator` | GPU type | Set on GKE GPU nodes by the autoscaler. Indicates the accelerator type (e.g. `nvidia-tesla-t4`). |
| `cloud.google.com/gke-gpu-driver-version` | version | Set on GKE nodes when using the default NVIDIA device plugin. Specifies the GPU driver version to install. |
| `cloud.google.com/gke-gpu-sharing-strategy` | `time-sharing` or `mps` | Set on GKE nodes when GPU sharing is configured. GKE-specific counterpart to `scheduling.cast.ai/gpu-sharing-strategy`. |
| `cloud.google.com/gke-max-shared-clients-per-gpu` | integer | Set on GKE nodes when GPU sharing is configured. GKE-specific counterpart to `scheduling.cast.ai/gpu-shared`. |
| `gke-no-default-nvidia-gpu-device-plugin` | `true` | Set on GKE nodes when Cast AI manages GPU plugin deployment instead of the default GKE NVIDIA DaemonSet. |
| `cloud.google.com/gke-gpu-partition-size` | partition size | Set on GKE nodes with MIG enabled. Indicates the MIG partition profile for the node. |

For GPU provisioning setup, workload configuration examples, and sharing strategies, see GPU Instances. For MIG partitioning, see GPU sharing with MIG. For time-slicing, see GPU sharing with time-slicing.
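To show how these labels and the GPU taint come together, here is a sketch of a pod that requests one NVIDIA GPU and tolerates the `nvidia.com/gpu` taint (the pod name and image are placeholders):

```yaml
# Sketch: a pod that requests one NVIDIA GPU and tolerates the GPU taint.
# The name and image are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1               # GPU request drives GPU node provisioning
```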

### Rebalancing

| Label | Value | When Applied |
| --- | --- | --- |
| `rebalancing.cast.ai/plan-id` | UUID | Set on green nodes created as part of a rebalancing plan. Used to associate the node with its rebalancing operation. |
| `rebalancing.cast.ai/operation-id` | UUID | Set on green nodes to identify the specific operation within a rebalancing plan. |
| `scheduling.cast.ai/delete-reason` | plugin name | Set when the autoscaler marks a node for deletion. |
| `autoscaling.cast.ai/draining` | reason string | Set on nodes being drained by the rebalancer or evictor. Possible values: `rebalancing`, `aws-rebalance-recommendation`, `spot-prediction`, `spot-fallback`, `spot-interruption`, `evictor`. Applied alongside the `autoscaling.cast.ai/draining` taint. |

For information on rebalancing operations and how to prepare workloads, see Rebalancing and Workload preparation. To understand how Evictor and Rebalancer work together, see Evictor vs. Rebalancer.

### Removal Control

| Label | Value | When Applied |
| --- | --- | --- |
| `autoscaling.cast.ai/removal-disabled` | `true` | Set on a node or pod to prevent the rebalancer from evicting pods from that node. |
| `autoscaling.cast.ai/removal-disabled-until` | Unix timestamp (seconds) | Set to temporarily prevent removal until the given timestamp. Applied during green node initialization to protect it from premature rebalancing. |
| `autoscaling.cast.ai/live-migration-disabled` | `true` | Pod-level label/annotation. Disables live migration for pods that cannot tolerate it. |
**kubectl: Query removal-protected nodes and pods**

```shell
# List nodes protected from removal
kubectl get nodes -l autoscaling.cast.ai/removal-disabled=true

# List pods protected from removal
kubectl get pods -A -l autoscaling.cast.ai/removal-disabled=true

# List pods opted out of live migration
kubectl get pods -A -l autoscaling.cast.ai/live-migration-disabled=true
```

For details on Evictor override rules and advanced configuration, see Evictor. For Container Live Migration opt-out behavior, see CLM Labels, Annotations, and Events.
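Since `autoscaling.cast.ai/removal-disabled` can be set on a pod as well as a node, a workload can opt itself out of removal directly in its manifest. A sketch (the pod name and image are placeholders):

```yaml
# Sketch: opt a pod out of eviction/rebalancing with the removal-disabled label.
# The name and image are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: critical-batch-job
  labels:
    autoscaling.cast.ai/removal-disabled: "true"
spec:
  containers:
    - name: job
      image: busybox:1.36                 # placeholder image
      command: ["sh", "-c", "sleep 3600"]
```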

### Live Migration (CLM / LIVE)

| Label | Value | When Applied |
| --- | --- | --- |
| `live.cast.ai/install` | varies | Set by Cast AI to indicate CLM (Container Live Migration) component installation status on the node. |
| `live.cast.ai/migration-enabled` | `true` | Set by the LIVE DaemonSet once all LIVE components are installed and operational on the node. Indicates the node is eligible as both migration source and destination. |
**kubectl: Check CLM status**

```shell
# List nodes with Container Live Migration enabled
kubectl get nodes -l live.cast.ai/migration-enabled=true

# List pods eligible for live migration
kubectl get pods -A -l live.cast.ai/migration-enabled=true

# Monitor active migrations
kubectl get migrations -A -w
```

For an overview of Container Live Migration, see Container Live Migration. For setup instructions, see Getting started with CLM. For the full CLM labels and annotations reference, see CLM Labels, Annotations, and Events.

### Predictions / ML

| Label | Value | When Applied |
| --- | --- | --- |
| `predictions.cast.ai/ttl-minutes` | integer | Set on nodes that were rebalanced due to an ML-predicted spot interruption. Specifies how many minutes the replacement node should be kept alive after the original was evacuated. |

### Volume Support

| Label | Value | When Applied |
| --- | --- | --- |
| `volume.scheduling.cast.ai/{volume-name}` | `true` | Set per storage volume/class to indicate the node supports that volume type. |

### Topology

| Label | Value | When Applied |
| --- | --- | --- |
| `topology.cast.ai/csp` | `aws`, `gcp`, `azure` | Set at node creation. Identifies the cloud service provider. |
| `topology.cast.ai/subnet-id` | subnet ID | Set at node creation. Identifies the subnet the node was provisioned in. |
| `topology.cast.ai/pod-subnet-id` | subnet ID | Set on AKS nodes. Identifies the subnet used for pod IP allocation. |
| `topology.cast.ai/resource-group` | resource group | Set on AKS nodes. Azure resource group containing the node. |
| `topology.cast.ai/virtual-network` | vnet name | Set on AKS nodes. Azure virtual network name. |
| `topology.cast.ai/subscription-id` | subscription ID | Set on AKS nodes. Azure subscription ID. |
| `topology.disk.csi.azure.com/zone` | AZ name | Set on Azure nodes. Availability zone for Azure CSI disk topology. |
| `topology.ebs.csi.aws.com/zone` | AZ name | Set on AWS nodes. Availability zone for EBS CSI disk topology. |
| `network-tag.gcp.cast.ai/{tag-name}` | `true` | Set on GCP nodes for each network tag associated with the node. Prefix-based, one label per tag. |
| `topology.kubernetes.io/region` | region | Standard Kubernetes label. Set at node creation with the cloud region. |
| `topology.kubernetes.io/zone` | AZ name | Standard Kubernetes label. Set at node creation with the availability zone. |
| `topology.gke.io/zone` | AZ name | GKE-specific zone label. |

For information on configuring pod placement by topology, see Pod placement. For subnet configuration, see Subnets.
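The standard zone label above is what topology spread constraints typically key on. A sketch of spreading replicas evenly across availability zones (the Deployment name and image are placeholders):

```yaml
# Sketch: spread replicas across availability zones using the standard zone label.
# The name and image are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zonal-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zonal-app
  template:
    metadata:
      labels:
        app: zonal-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                      # at most one extra replica per zone
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: zonal-app
      containers:
        - name: app
          image: nginx:1.27               # placeholder image
```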

### Cloud Provider Specific

#### AKS (Azure)

| Label | Value | When Applied |
| --- | --- | --- |
| `kubernetes.azure.com/agentpool` | pool name | Set to `castai` for Cast AI-provisioned nodes. Required by AKS for node pool membership. |
| `kubernetes.azure.com/cluster` | cluster name | Set to identify the AKS cluster. |
| `kubernetes.azure.com/mode` | `system` or `user` | Set to `system` on system node pools. |
| `kubernetes.azure.com/role` | `agent` | Set on all AKS agent nodes. |
| `agentpool` (deprecated) | pool name | Deprecated AKS agent pool label (pre-Kubernetes 1.24). Replaced by `kubernetes.azure.com/agentpool`. |
| `kubernetes.io/role` (deprecated) | role | Deprecated AKS role label (pre-Kubernetes 1.24). |

#### EKS (AWS)

| Label | Value | When Applied |
| --- | --- | --- |
| `eks.amazonaws.com/compute-type` | `fargate` | Set on Fargate nodes. Used to distinguish Fargate from EC2 node types. |

### Standard Kubernetes Labels (set by autoscaler)

| Label | Value | When Applied |
| --- | --- | --- |
| `kubernetes.io/arch` | `amd64`, `arm64` | Set at node creation based on instance architecture. |
| `kubernetes.io/os` | `linux`, `windows` | Set at node creation based on instance OS. |
| `beta.kubernetes.io/arch` | `amd64`, `arm64` | Legacy beta arch label, set alongside `kubernetes.io/arch` for backwards compatibility. |
| `beta.kubernetes.io/os` | `linux`, `windows` | Legacy beta OS label. |
| `node.kubernetes.io/instance-type` | instance type name | Standard label set to the cloud provider instance type (e.g. `m5.xlarge`). |
| `kubernetes.io/hostname` | hostname | Standard hostname label. |

### Resource Offering

| Label | Value | When Applied |
| --- | --- | --- |
| `autoscaling.cast.ai/provisioned-resource-offering` | offering type | Set at node creation to record the resource offering type used for provisioning. |

### OMNI Edge

| Label | Value | When Applied |
| --- | --- | --- |
| `virtual-node.omni.cast.ai/not-allowed` | `true` | Set on nodes that are not allowed to run workloads in the OMNI edge context. |

## Taints

### Rebalancing Taints

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `rebalancing.cast.ai/preparing` | (none) | `NoSchedule` | Applied to green (replacement) nodes during rebalancing plan execution. Prevents new pods from being scheduled until the node is fully prepared. Removed once the node is ready to receive workloads. |
| `scheduling.cast.ai/pod-pinning-preparing` | (none) | `NoSchedule` | Applied during rebalancing when pod pinning is enabled. Prevents scheduling until pod pinning setup is complete. |
| `autoscaling.cast.ai/draining` | `true` | `NoSchedule` | Applied to blue (old) nodes when they are being drained during rebalancing or spot interruption handling. Applied alongside the `autoscaling.cast.ai/draining` label. |
| `provisioner.cast.ai/uninitialized` | (none) | `NoSchedule` | Applied to newly provisioned nodes before initialization is complete. Prevents pod scheduling until the node is fully set up. |

For details on how rebalancing operates and the blue/green node replacement process, see Rebalancing. For paused drain configuration, see Paused drain configuration.

### Lifecycle / Spot Taint

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `scheduling.cast.ai/spot` | (none) | `NoSchedule` | Applied to spot nodes when the lifecycle taint feature is enabled in node template configuration. Requires pods to explicitly tolerate spot instances. Only applied when both spot and on-demand nodes exist in the cluster. |

For workload configuration patterns using spot tolerations and node selectors, see Spot Instances.

### Node Template Taint

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `scheduling.cast.ai/node-template` | template name (e.g. `default-by-castai`) | `NoSchedule` | Applied to all nodes provisioned via a node template. |

### Scoped Autoscaler Taint

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `scheduling.cast.ai/scoped-autoscaler` | (none) | `NoSchedule` | Applied when the scoped autoscaler feature is enabled for a node template. Restricts scheduling to workloads explicitly intended for the scoped autoscaler. |

### Storage Optimization Taint

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `scheduling.cast.ai/storage-optimized` | (none) | `NoSchedule` | Applied to storage-optimized nodes. Requires workloads to explicitly tolerate storage-optimized instances. |

### GPU / Accelerators Taints

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `nvidia.com/gpu` | `true` | `NoSchedule` | Applied by the autoscaler during GPU node provisioning. Restricts scheduling to GPU-tolerant workloads, preventing non-GPU pods from consuming GPU nodes. |
| `nvidia.com/gpu.mig` | `true` | `NoSchedule` | Applied by the autoscaler when provisioning nodes with MIG (Multi-Instance GPU) partitioning enabled. Restricts to MIG-compatible workloads. |
| `aws.amazon.com/neuron` | `true` | `NoSchedule` | Applied to AWS Inferentia/Trainium nodes. Restricts scheduling to workloads that explicitly require Neuron accelerators. |

For GPU workload configuration examples and toleration patterns, see GPU Instances.

### Architecture Taint

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `kubernetes.io/arch` | `arm64` | `NoSchedule` | Applied to ARM64 nodes. Requires pods to tolerate ARM64 architecture, preventing x86-only images from being scheduled on ARM nodes. |
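A workload whose image is built for ARM64 can opt in by tolerating this taint and selecting the architecture label. A sketch (the pod name and image are placeholders; the image must be multi-arch or ARM64-built):

```yaml
# Sketch: allow a multi-arch workload onto ARM64 nodes.
# The name and image are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: arm64-capable-pod
spec:
  nodeSelector:
    kubernetes.io/arch: arm64
  tolerations:
    - key: kubernetes.io/arch
      operator: Equal
      value: arm64
      effect: NoSchedule
  containers:
    - name: app
      image: nginx:1.27                   # multi-arch placeholder image
```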

### Eviction Taints (applied by Evictor)

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `evictor.cast.ai/evicting` | (none) | varies | Applied by the evictor when it starts draining a node. Signals that the node is being evacuated. |
| `evictor.cast.ai/evicted` | (none) | varies | Applied by the evictor after node eviction is complete. Signals the node has been fully drained. |

For Evictor operating modes, override rules, and advanced configuration, see Evictor.

### OMNI Edge Taint

| Key | Value | Effect | When Applied |
| --- | --- | --- | --- |
| `virtual-node.omni.cast.ai/not-allowed` | `true` | `NoExecute` | Applied to OMNI virtual nodes that are not allowed to run workloads. Evicts any existing pods in addition to blocking new scheduling. |

## Annotations (related, node-level)

These annotations are not labels or taints but are closely related and set on nodes:

| Annotation | Value | Purpose |
| --- | --- | --- |
| `autoscaling.cast.ai/removal-delay-seconds` | integer | Sets a removal delay in seconds for a node before it can be deleted. |
| `autoscaling.cast.ai/paused-draining-until` | timestamp | Pauses the draining process on a node until the given timestamp. |
| `rebalancing.cast.ai/status` | `drain-failed` | Set on a node when a drain operation during rebalancing has failed. |
| `evictor.cast.ai/eviction-status` | varies | Tracks the current eviction status of a node, set by the evictor. |
| `predictions.cast.ai/remove-after` | timestamp | Set on nodes rebalanced due to ML spot interruption predictions. Indicates when the node should be removed. |
**kubectl: Query nodes by annotation**

```shell
# Find nodes with a removal delay configured
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.annotations["autoscaling.cast.ai/removal-delay-seconds"] != null) | .metadata.name'

# Find nodes with paused draining
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.annotations["autoscaling.cast.ai/paused-draining-until"] != null) | "\(.metadata.name) paused until \(.metadata.annotations["autoscaling.cast.ai/paused-draining-until"])"'

# Find nodes with failed drain operations
kubectl get nodes -o json | jq -r '.items[] | select(.metadata.annotations["rebalancing.cast.ai/status"] == "drain-failed") | .metadata.name'
```

For configuring paused drain behavior during rebalancing, see Paused drain configuration.