Pod placement
Learn how to place pods using labels and other Kubernetes scheduling features
Configure pod placement by topology
This guide describes how to place pods on a particular node, zone, region, cloud, and so on, using labels and advanced Kubernetes scheduling features.
Kubernetes supports this step with the following methods:
- nodeSelector
- node affinity and anti-affinity
- topology spread constraints
- pod affinity and anti-affinity
All these methods require using specific labels on each Kubernetes node.
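For instance, here is a minimal sketch (the pod names and the eu-central-1a zone are hypothetical placeholders) showing how the same zone constraint can be expressed either with a nodeSelector or with a required node affinity:

```yaml
# Simple form: nodeSelector on a well-known topology label
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-pod              # hypothetical example name
spec:
  nodeSelector:
    topology.kubernetes.io/zone: eu-central-1a
  containers:
    - name: app
      image: nginx
---
# Equivalent expressed as a required node affinity
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-pod-affinity     # hypothetical example name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - eu-central-1a
  containers:
    - name: app
      image: nginx
```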
Supported labels
CAST AI supports the following labels:
| Label | Type | Description | Example(s) |
| --- | --- | --- | --- |
| `kubernetes.io/arch` and `beta.kubernetes.io/arch` | well-known | Node CPU architecture | `amd64`, `arm64` |
| `kubernetes.io/os` and `beta.kubernetes.io/os` | well-known | Node operating system | `linux`, `windows` |
| `kubernetes.io/hostname` | well-known | Node hostname | `ip-192-168-32-94.eu-central-1.compute.internal`, `testcluster-31qd-gcp-3ead` |
| `topology.kubernetes.io/region` and `failure-domain.beta.kubernetes.io/region` | well-known | Node region in the CSP | `eu-central-1`, `europe-central1` |
| `topology.kubernetes.io/zone` and `failure-domain.beta.kubernetes.io/zone` | well-known | Node zone of the region in the CSP | `eu-central-1a`, `europe-central1-a` |
| `provisioner.cast.ai/managed-by` | CAST AI specific | CAST AI managed node | `cast.ai` |
| `provisioner.cast.ai/node-id` | CAST AI specific | CAST AI node ID | `816d634e-9fd5-4eed-b13d-9319933c9ef0` |
| `scheduling.cast.ai/on-demand` | CAST AI specific | Node resource offering type: on-demand | `true` |
| `scheduling.cast.ai/spot` | CAST AI specific | Node resource offering type: spot | `true` |
| `scheduling.cast.ai/spot-fallback` | CAST AI specific | A fallback for spot instances | `true` |
| `topology.cast.ai/subnet-id` | CAST AI specific | Node subnet ID | `subnet-006a6d1f18fc5d390` |
| `scheduling.cast.ai/storage-optimized` | CAST AI specific | Node with locally attached SSD | `true` |
| `scheduling.cast.ai/compute-optimized` | CAST AI specific | A compute-optimized instance | `true` |
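As an illustration, a pod could request a CAST AI spot node by selecting on the `scheduling.cast.ai/spot` label. This is only a sketch: the pod name is a hypothetical placeholder, and it assumes spot nodes in your cluster carry a matching `scheduling.cast.ai/spot` taint, so a toleration is included:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spot-demo-pod                # hypothetical example name
spec:
  # Request a spot node via the CAST AI specific label
  nodeSelector:
    scheduling.cast.ai/spot: "true"
  # Assumes spot nodes are tainted with scheduling.cast.ai/spot
  tolerations:
    - key: scheduling.cast.ai/spot
      operator: Exists
  containers:
    - name: app
      image: nginx
```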
Scheduling using pod affinities
For pod affinity, CAST AI supports the following labels:
| Label | Type | Description |
| --- | --- | --- |
| `topology.kubernetes.io/zone` and `failure-domain.beta.kubernetes.io/zone` | well-known | Node zone of the region in the CSP |
| `topology.gke.io/zone` | GCP specific | Availability zone of a persistent disk |
| `topology.disk.csi.azure.com/zone` | Azure specific | Availability zone of a persistent disk |
| `topology.ebs.csi.aws.com/zone` | AWS specific | Availability zone of a persistent disk |
Important!
Currently, CAST AI does not support pod affinity using the `kubernetes.io/hostname` topology key.
Example
Let's consider an example of three pods (`app=frontend`, `app=backend`, `app=cache`) that should run in the same zone (any zone) but not on the same node.
Because any zone will do, you cannot simply pin the pods to a zone with a `nodeSelector`; instead, you need to create a dependency chain between them.
First, select one pod to be the leading one (no affinities) so that it can schedule freely. Let's say it's `app=backend`.
Then, add pod affinities for the `app=cache` pod:
```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app
            operator: In
            values:
              - backend
      topologyKey: topology.kubernetes.io/zone
```
and for `app=frontend`:
```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app
            operator: In
            values:
              - backend # to have the ability to run without cache
              - cache
      topologyKey: topology.kubernetes.io/zone
```
With this setup, all three pods will always run in the same zone.
The `app=backend` pod will choose a node and zone, and the other two will follow.
However, all three pods can still be scheduled on the same node. To prevent this, add pod anti-affinity between them.
Example for `app=frontend`:
```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app
            operator: In
            values:
              - backend
              - cache
      topologyKey: kubernetes.io/hostname
```
Highly-available pod scheduling
You can schedule pods in a highly available way using the topology spread constraints feature. CAST AI supports the following topology keys:
- `topology.kubernetes.io/zone` - enables your pods to be spread between availability zones, taking advantage of cloud redundancy.
- `kubernetes.io/hostname` - enables your pods to be spread evenly between available hosts.
Note
CAST AI will only create nodes in different fault domains when the `whenUnsatisfiable` property equals `DoNotSchedule`. The value `ScheduleAnyway` means that the spread is just a preference, so the autoscaler will keep bin-packing those pods, which might result in scheduling them all in the same fault domain.
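For comparison, a constraint sketched with `ScheduleAnyway` (shown here against a hypothetical `app: web` label) only expresses a preference, so all replicas may still end up in a single zone:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    # Soft constraint: the spread is only a preference,
    # so the autoscaler is not forced to add nodes in other zones
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web   # hypothetical label
```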
Important limitations
- CAST AI does not currently support multiple topology spread constraint terms in a single pod specification.
- CAST AI does not currently support the `minDomains` option for topology spread constraints.
The deployment below will be spread and scheduled in all availability zones supported by your cluster:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: az-topology-spread
  name: az-topology-spread
spec:
  replicas: 30
  selector:
    matchLabels:
      app: az-topology-spread
  template:
    metadata:
      labels:
        app: az-topology-spread
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - az-topology-spread
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - az-topology-spread
      containers:
        - image: nginx
          name: nginx
```
Scheduling on nodes with locally attached SSD
Storage-optimized nodes have local SSDs backed by NVMe, providing higher throughput and lower latency than standard disks, which makes them ideal for workloads that require efficient local storage.
Pods can request a storage-optimized node by defining a `nodeSelector` (or a required node affinity) and a toleration for the `scheduling.cast.ai/storage-optimized` label.
Furthermore, pods can control the amount of available storage by specifying ephemeral storage resource requests. If you don't specify resource requests, CAST AI will still provision a storage-optimized node, but the available storage amount will be the lowest possible based on the cloud offering.
Note
For AKS, the created nodes can differ from those listed in the Azure storage-optimized VMs documentation, as all instance types that support attached SSDs are considered storage-optimized.
This pod will be scheduled on a node with locally attached SSD disks:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  nodeSelector:
    scheduling.cast.ai/storage-optimized: "true"
  tolerations:
    - key: scheduling.cast.ai/storage-optimized
      operator: Exists
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          ephemeral-storage: "2Gi"
      volumeMounts:
        - name: ephemeral
          mountPath: "/tmp"
  volumes:
    - name: ephemeral
      emptyDir: {}
```
Scheduling on compute-optimized nodes
Compute-optimized instances are ideal for compute-bound applications that benefit from high-performance processors. They offer the highest consistent performance per core to support real-time application performance.
This pod will be scheduled on a compute-optimized instance:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  nodeSelector:
    scheduling.cast.ai/compute-optimized: "true"
  containers:
    - name: app
      image: nginx
```
Scheduling on ARM nodes
ARM processors are designed to deliver the best price-performance for your workloads.
This pod will be scheduled on an ARM instance:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-arm-pod
spec:
  nodeSelector:
    kubernetes.io/arch: "arm64"
  containers:
    - name: app
      image: nginx
```
When utilizing a multi-architecture Node template, also use the `kubernetes.io/arch: "arm64"` nodeSelector to ensure that the pod lands on an ARM node.
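If a workload can run on either architecture but should favor ARM when it is available, one possible sketch (standard Kubernetes node affinity, not a CAST AI-specific construct) looks like this:

```yaml
affinity:
  nodeAffinity:
    # Hard requirement: only amd64 or arm64 nodes
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - arm64
                - amd64
    # Soft preference: bias the scheduler toward ARM nodes
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - arm64
```

The required term restricts scheduling to the two architectures, while the preferred term is only a bias; treat it as a preference rather than a guarantee.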
Note
ARM nodes added by CAST AI do not have any taint applied by default.
Scheduling on Windows nodes
Note
Autoscaling with Windows nodes is supported only for AKS clusters and must first be enabled for your organization. Contact the support team for more details.
By default, CAST AI creates Windows Server 2019 nodes when a workload has the `kubernetes.io/os: windows` nodeSelector. However, if multiple versions of Windows need to be handled, an additional `node.kubernetes.io/windows-build` nodeSelector has to be provided to ensure pods are scheduled on nodes with the correct Windows Server version.
To schedule a pod on a Windows node with the required version, use the following nodeSelectors:
```yaml
# Windows Server 2022 (build 10.0.20348)
nodeSelector:
  kubernetes.io/os: windows
  node.kubernetes.io/windows-build: '10.0.20348'
```

```yaml
# Windows Server 2019 (build 10.0.17763)
nodeSelector:
  kubernetes.io/os: windows
  node.kubernetes.io/windows-build: '10.0.17763'
```
How to isolate specific workloads
It's a best practice to set workload requests and limits identically and to distribute the various workloads among all the nodes in the cluster so that the law of averages delivers the best performance and availability.
However, in some edge cases, it may be better to isolate volatile workloads on their own nodes and avoid mixing them with other workloads in the same cluster. That's where you can use `affinity.podAntiAffinity`:
```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
            - key: <any POD label>
              operator: DoesNotExist
```
Example deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workflow-jobs
  labels:
    app: workflows
spec:
  replicas: 2
  selector:
    matchLabels:
      app: workflows
  template:
    metadata:
      labels:
        app: workflows
        no-requests-workflows: "true"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              labelSelector:
                matchExpressions:
                  - key: no-requests-workflows
                    operator: DoesNotExist
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 300m
```