Configure pod placement by topology

This guide shows how to place pods on a particular node, zone, region, or cloud using labels and advanced Kubernetes scheduling features such as node selectors, node affinity, topology spread constraints, and pod anti-affinity.

All of these methods require special labels to be present on each Kubernetes node.

External clusters connected to CAST AI

CAST AI supports the following labels:

| Label | Type | Description | Example(s) |
| --- | --- | --- | --- |
| kubernetes.io/arch and beta.kubernetes.io/arch | well-known | Node CPU architecture | amd64 |
| node.kubernetes.io/instance-type and beta.kubernetes.io/instance-type | well-known | Node type (cloud-specific) | t3a.large, e2-standard-4 |
| kubernetes.io/os and beta.kubernetes.io/os | well-known | Node operating system | linux |
| kubernetes.io/hostname | well-known | Node hostname | ip-192-168-32-94.eu-central-1.compute.internal, testcluster-31qd-gcp-3ead |
| topology.kubernetes.io/region and failure-domain.beta.kubernetes.io/region | well-known | Node region in the CSP | eu-central-1, europe-central1 |
| topology.kubernetes.io/zone and failure-domain.beta.kubernetes.io/zone | well-known | Node zone of the region in the CSP | eu-central-1a, europe-central1-a |
| provisioner.cast.ai/managed-by | CAST AI specific | CAST AI managed node | cast.ai |
| provisioner.cast.ai/node-id | CAST AI specific | CAST AI node ID | 816d634e-9fd5-4eed-b13d-9319933c9ef0 |
| scheduling.cast.ai/spot | CAST AI specific | Node lifecycle type: spot | true |
| scheduling.cast.ai/spot-backup | CAST AI specific | A fallback for a spot instance | true |
| topology.cast.ai/subnet-id | CAST AI specific | Node subnet ID | subnet-006a6d1f18fc5d390 |
| scheduling.cast.ai/storage-optimized | CAST AI specific | Node with local SSD attached | true |
| scheduling.cast.ai/compute-optimized | CAST AI specific | A compute-optimized instance | true |
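
Any of these labels can be used directly as a scheduling constraint. As an illustrative sketch (not part of the original guide), the pod below uses a nodeSelector to target spot nodes via the scheduling.cast.ai/spot label; the toleration assumes spot nodes carry a matching taint:

apiVersion: v1
kind: Pod
metadata:
  name: spot-demo   # hypothetical example name
spec:
  nodeSelector:
    scheduling.cast.ai/spot: "true"   # schedule only on spot nodes
  tolerations:
    - key: scheduling.cast.ai/spot    # assumes spot nodes are tainted with this key
      operator: Exists
  containers:
    - name: app
      image: nginx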

Highly available pod scheduling

Pods can be scheduled in a highly available fashion by using the topology spread constraints feature. CAST AI supports the following fault domains, i.e., topology keys:

  • topology.kubernetes.io/zone - spreads your pods across availability zones, taking advantage of cloud redundancy.

CAST AI will only create nodes in different fault domains when the whenUnsatisfiable property is set to DoNotSchedule. The value ScheduleAnyway makes the spread a preference only, so the autoscaler will keep bin-packing those pods, which might result in all of them landing in the same fault domain.

The deployment described below will be spread across all availability zones supported by your cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: az-topology-spread
  name: az-topology-spread
spec:
  replicas: 30
  selector:
    matchLabels:
      app: az-topology-spread
  template:
    metadata:
      labels:
        app: az-topology-spread
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                # pod counts per zone may differ by at most 1
          topologyKey: topology.kubernetes.io/zone  # spread across availability zones
          whenUnsatisfiable: DoNotSchedule          # hard requirement, not just a preference
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - az-topology-spread
      containers:
        - image: nginx
          name: nginx
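
For example, with replicas: 30 and maxSkew: 1, a cluster spanning three availability zones ends up with 10 pods in each zone, since no zone may hold more than one pod above the minimum zone count.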

Scheduling on nodes with locally attached SSD

Storage-optimized nodes have local SSDs backed by NVMe, which provide higher throughput and lower latency than standard disks. They are an ideal choice for workloads that require fast, efficient local storage.

Pods can request a storage-optimized node by defining a node selector (or a required node affinity) and a toleration for the scheduling.cast.ai/storage-optimized label. Pods can also control the amount of available storage by specifying ephemeral-storage resource requests. If resource requests aren't specified, CAST AI will still provision a storage-optimized node, but the available storage will be the smallest amount the cloud provider offers.

Currently supported clouds for storage-optimized nodes:

  • AWS
  • GCP

The pod described below will be scheduled on a node with locally attached SSDs:

apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  nodeSelector:
    scheduling.cast.ai/storage-optimized: "true"
  tolerations:
    - key: scheduling.cast.ai/storage-optimized
      operator: Exists
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          ephemeral-storage: "2Gi"   # sizes the local SSD-backed storage
      volumeMounts:
        - name: ephemeral
          mountPath: "/tmp"
  volumes:
    - name: ephemeral
      emptyDir: {}
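
As noted above, a required node affinity can be used in place of the nodeSelector. A minimal sketch of the equivalent affinity block (same label and value; the toleration is still needed):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: scheduling.cast.ai/storage-optimized
              operator: In
              values:
                - "true"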

Scheduling on compute-optimized nodes

Compute-optimized instances are ideal for compute-bound applications that benefit from high-performance processors. They offer the highest consistent performance per core to support real-time application performance.

The pod described below will be scheduled on a compute-optimized instance:

apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  nodeSelector:
    scheduling.cast.ai/compute-optimized: "true"
  containers:
  - name: app
    image: nginx

CAST AI multi-cloud Kubernetes clusters

CAST AI multi-cloud Kubernetes cluster nodes come equipped with the following labels:

| Label | Type | Description | Example(s) |
| --- | --- | --- | --- |
| node.kubernetes.io/instance-type | well-known | Node type (cloud-specific) | t3a.large, e2-standard-4 |
| kubernetes.io/arch | well-known | Node CPU architecture | amd64 |
| kubernetes.io/hostname | well-known | Node hostname | ip-10-10-2-81, testcluster-31qd-gcp-3ead |
| kubernetes.io/os | well-known | Node operating system | linux |
| topology.kubernetes.io/region | well-known | Node region in the CSP | eu-central-1 |
| topology.kubernetes.io/zone | well-known | Node zone of the region in the CSP | eu-central-1a |
| topology.cast.ai/csp | CAST AI specific | Node Cloud Service Provider | aws, gcp, azure |
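
These labels can be combined with the scheduling mechanisms shown above. For example, a pod could be pinned to nodes from a single cloud provider via the topology.cast.ai/csp label; a minimal sketch (the pod name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: aws-only-pod   # hypothetical example name
spec:
  nodeSelector:
    topology.cast.ai/csp: aws   # restrict scheduling to AWS nodes
  containers:
    - name: app
      image: nginx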

How to isolate specific workloads

As a best practice, set workload requests and limits to identical values and distribute workloads across all the nodes in the cluster, letting the law of averages deliver the best performance and availability. That said, there are edge cases where volatile workloads should be isolated on their own nodes rather than mixed with other workloads in the same cluster. In such a scenario, use affinity.podAntiAffinity:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchExpressions:
        - key: <any POD label>
          operator: DoesNotExist

Deployment example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: workflow-jobs
  labels:
    app: workflows
spec:
  replicas: 2
  selector:
    matchLabels:
      app: workflows
  template:
    metadata:
      labels:
        app: workflows
        no-requests-workflows: "true"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchExpressions:
              - key: no-requests-workflows
                operator: DoesNotExist
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 300m
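
Because the anti-affinity rule uses operator: DoesNotExist, each workflow pod refuses to share a node (the kubernetes.io/hostname topology) with any pod that lacks the no-requests-workflows label. Since only the workflow pods themselves carry that label, they end up on dedicated nodes, isolated from the cluster's other workloads.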