Configure pod placement by topology

This guide describes placing pods on a particular node, zone, region, cloud, etc., using labels and advanced Kubernetes scheduling features.

Kubernetes supports this step with the following methods:

All these methods require using specific labels on each Kubernetes node.

Supported labels

Cast AI supports the following labels:

Label	Type	Description	Example(s)
`kubernetes.io/arch` and `beta.kubernetes.io/arch`	well-known	Node CPU architecture	`amd64`, `arm64`
`kubernetes.io/os` and `beta.kubernetes.io/os`	well-known	Node Operating System	`linux`, `windows`
`kubernetes.io/hostname`	well-known	Node Hostname	`ip-192-168-32-94.eu-central-1.compute.internal`, `testcluster-31qd-gcp-3ead`
`topology.kubernetes.io/region` and `failure-domain.beta.kubernetes.io/region`	well-known	Node region in the CSP	`eu-central-1`, `europe-central1`
`topology.kubernetes.io/zone` and `failure-domain.beta.kubernetes.io/zone`	well-known	Node zone of the region in the CSP	`eu-central-1a`,`europe-central1-a`
`provisioner.cast.ai/managed-by`	Cast-specific	Cast AI managed node	`cast.ai`
`provisioner.cast.ai/node-id`	Cast-specific	Cast AI node ID	`816d634e-9fd5-4eed-b13d-9319933c9ef0`
`scheduling.cast.ai/spot`	Cast-specific	Node resource offering type - spot	`true`
`scheduling.cast.ai/spot-fallback`	Cast-specific	A fallback for Spot Instance	`true`
`topology.cast.ai/subnet-id`	Cast-specific	Node subnet ID	`subnet-006a6d1f18fc5d390`
`scheduling.cast.ai/storage-optimized`	Cast-specific	Local SSD attached node	`true`
`scheduling.cast.ai/compute-optimized`	Cast-specific	A compute-optimized instance	`true`

Scheduling using pod affinities

For pod affinity, Cast AI supports the following labels:

Label	Type	Description
`topology.kubernetes.io/zone` and `failure-domain.beta.kubernetes.io/zone`	well-known	Node zone of the region in the CSP
`topology.gke.io/zone`	GCP specific	Availability zone of a persistent disk
`topology.disk.csi.azure.com/zone`	Azure specific	Availability zone of a persistent disk
`topology.ebs.csi.aws.com/zone`	AWS specific	Availability zone of a persistent disk

🚧
Important!
Currently, Cast AI does not support pod affinity using the kubernetes.io/hostname topology key.

Example

Let's consider an example of three pods (app=frontend, app=backend, app=cache) that should run in the same zone (any zone) but not on the same node.

If you want the pods to run in any zone, you can not just specify the nodeSelector, but also create a dependency chain between them.

First, you select one pod that will be the leading one (no affinities) so it can schedule freely. Let's say it's app=backend.

Then, you will need to add pod affinities for the pod app=cache:

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - backend
    topologyKey: topology.kubernetes.io/zone

and for the app=frontend:

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - backend #to have the ability to run without cache
        - cache
    topologyKey: topology.kubernetes.io/zone

With this setup, all three pods will always run in the same zone.

Pod app=backend will choose a node and zone, and the other two will follow.

However, all these pods can still be scheduled on the same node. To fix this, you must specify their pod anti-affinity for each other.

Example for app=frontend:

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - backend
        - cache
    topologyKey: kubernetes.io/hostname

Highly-available pod scheduling

You can schedule pods in a highly available way using the topology spread constraints feature. Cast AI supports the following topology keys:

topology.kubernetes.io/zone - enables your pods to be spread between availability zones, taking advantage of cloud redundancy.
kubernetes.io/hostname - enables your pods to be spread evenly between available hosts.

📘
Note
Cast AI will only create nodes in different fault domains when the whenUnstatisfiable property equals DoNotSchedule. The value ScheduleAnyway means that the spread is just a preference, so the autoscaler will keep binpacking those pods, which might result in scheduling them all on the same fault domain.

🚧
Important limitations

When multiple topology spread constraints are specified:

Cast AI prioritizes zonal topology keys (like topology.kubernetes.io/zone) for node provisioning decisions. If multiple zonal constraints exist, the last one in the list will be used.

Non-zonal topology constraints (like kubernetes.io/hostname) influence node provisioning indirectly. Cast AI uses the Kubernetes scheduler to validate if pods can fit on prospective nodes, for example, so these constraints may trigger additional node creation if the scheduler cannot place pods.

Cast AI does not currently support the minDomains option for topology spread constraints.

The deployment below will be spread and scheduled across all availability zones supported by your cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: az-topology-spread
  name: az-topology-spread
spec:
  replicas: 30
  selector:
    matchLabels:
      app: az-topology-spread
  template:
    metadata:
      labels:
        app: az-topology-spread
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - az-topology-spread
      containers:
        - image: nginx
          name: nginx

Scheduling on nodes with locally attached SSD

Storage-optimized nodes have local SSDs backed by NVMe drivers, which provide higher throughput and lower latency than standard disks. That's why they're ideal for workloads that require efficient local storage.

Pods can request a storage-optimized node by defining a nodeSelector (or a required node affinity) and toleration for the label scheduling.cast.ai/storage-optimized.

Furthermore, pods can control the amount of available storage by specifying ephemeral storage resource requests. If you don't specify resource requests, Cast AI will still provision a storage-optimized node, but the available storage amount will be the lowest possible based on the cloud offering.

📘
Note
For AKS, the created nodes can be different from those specified in the Azure Storage Optimized VMs documentation as all instance types supporting attached SSDs will be considered storage-optimized.

This pod will be scheduled on a node with locally attached SSD disks:

apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  nodeSelector:
    scheduling.cast.ai/storage-optimized: "true"
  tolerations:
    - key: scheduling.cast.ai/storage-optimized
      operator: Exists
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        ephemeral-storage: "2Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
    - name: ephemeral
      emptyDir: {}

Scheduling on compute-optimized nodes

Compute-optimized instances are ideal for compute-bound applications that benefit from high-performance processors. They offer the highest consistent performance per core to support real-time application performance.

This pod will be scheduled on a compute-optimized instance:

apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  nodeSelector:
    scheduling.cast.ai/compute-optimized: "true"
  containers:
  - name: app
    image: nginx

Scheduling on ARM nodes

ARM processors are designed to deliver the best price-performance for your workloads.

This pod will be scheduled on an ARM instance:

apiVersion: v1
kind: Pod
metadata:
  name: demo-arm-pod
spec:
  nodeSelector:
    kubernetes.io/arch: "arm64"
  containers:
  - name: app
    image: nginx

When utilizing a multi-architecture Node template, also use the nodeSelector kubernetes.io/arch: "arm64" to ensure that the pod lands on an ARM node.

📘
Note
Cast AI added ARM nodes do not have any taint applied by default.

Scheduling on Windows nodes

📘
Note
Autoscaling using Windows nodes is supported only for AKS clusters and has to be first enabled for the organization in order to leverage it. Contact the support team for more details.

By default, Cast AI creates Windows Server 2019 nodes when a workload has the nodeSelector kubernetes.io/os: windows. However, if multiple versions of Windows need to be handled, an additional nodeSelector node.kubernetes.io/windows-build has to be provided to ensure pods are scheduled on the nodes with the correct Windows Server version.

To schedule a pod on a Windows node with the required version, use the following nodeSelectors:

nodeSelector:
  kubernetes.io/os: windows
  node.kubernetes.io/windows-build: '10.0.20348'

nodeSelector:
  kubernetes.io/os: windows
  node.kubernetes.io/windows-build: '10.0.17763'

How to isolate specific workloads

It's best practice to set workload requests and limits identically and distribute various workloads among all the nodes in the cluster so that the Law of Averages can provide the best performance and availability.

However, in some edge cases, it may be better to isolate volatile workloads to their nodes and avoid mixing them with other workloads in the same clusters. That's where you can use affinity.podAntiAffinity:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchExpressions:
        - key: <any POD label>
          operator: DoesNotExist

Pod example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worklow-jobs
  labels:
    app: workflows
spec:
  replicas: 2
  selector:
    matchLabels:
      app: workflows
  template:
    metadata:
      labels:
        app: workflows
        no-requests-workflows: "true"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchExpressions:
              - key: no-requests-workflows
                operator: DoesNotExist
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 300m