Node Templates
What is it?
Node Templates are part of the Autoscaler component. They allow you to define virtual buckets of constraints such as:
- instance types to be used;
- lifecycle of the nodes to add;
- node provisioning configurations;
- and other various properties.
CAST AI will respect all the above settings when creating nodes for your workloads to run on.
The UI is available in the CAST AI console under Cluster --> Autoscaler --> Node Templates.
Main attributes
| Attribute | Description | Availability per cloud provider |
| --- | --- | --- |
| Custom labels and taints | Choose this if custom labels or taints should be applied to the nodes created by CAST AI. Based on the selected properties, a matching `nodeSelector` and toleration must be applied to the workload to trigger autoscaling. | All |
| Node configuration link | By default, Node Templates mostly focus on the type of resources that need to be scheduled. To learn how to configure the scheduling of your resources and how to use other kinds of attributes on provisioned nodes, see Node configuration. | All |
| Lifecycle | Tells the Autoscaler which node lifecycle (spot or on-demand) it should create. You can also configure both lifecycles, in which case workloads targeted to run on spot nodes must be marked accordingly (check this guide and the example below the table). Additional settings are available when you want the Autoscaler to use spot nodes. Spot fallback: when spot capacity is unavailable in the cloud, CAST AI creates temporary on-demand fallback nodes. Interruption prediction model: CAST AI can react to AWS rebalancing notifications or use its own ML model to predict spot interruptions and proactively rebalance affected nodes; see this guide for more details. Diversified spot instances: by default, CAST AI seeks the most cost-effective instances without assessing your cluster's current composition; to limit the impact of a potential mass spot reclaim, you can instruct the Autoscaler to evaluate and enhance the diversity of spot nodes in the cluster, though this may increase your costs. Read more. | The interruption prediction model is only available on AWS |
| Processor architecture | You can have CAST AI create x86_64 nodes, ARM64 nodes, or both. When using a multi-architecture Node Template, also add the `nodeSelector` entry `kubernetes.io/arch: "arm64"` to ensure that the pod lands on an ARM node. | AWS, GCP |
| GPU-enabled instances | Choose this attribute to run workloads on GPU-enabled instances only. Once you select it, Instance constraints get enhanced with GPU-related properties. | AWS, GCP |
| Apply Instance constraints | Apply additional constraints on the instances to be selected, such as instance family, min/max CPU, min/max memory, compute-optimized, storage-optimized, and GPU manufacturer, name, and count. | All |
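For example, when a template is configured with both lifecycles, a workload intended for spot capacity must select spot nodes explicitly. The following is a minimal sketch assuming a template named `spark-jobs` with `shouldTaint` enabled; it relies on the `scheduling.cast.ai/spot` label and taint that CAST AI applies to spot nodes (see the spot instances guide for the authoritative details):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-worker  # hypothetical workload name
spec:
  nodeSelector:
    # Target the node template
    scheduling.cast.ai/node-template: spark-jobs
    # Request spot capacity when the template allows both lifecycles
    scheduling.cast.ai/spot: "true"
  tolerations:
    # Tolerate the template taint (present because shouldTaint is enabled)
    - key: scheduling.cast.ai/node-template
      value: spark-jobs
      operator: Equal
      effect: NoSchedule
    # Tolerate the taint CAST AI places on spot nodes
    - key: scheduling.cast.ai/spot
      operator: Exists
      effect: NoSchedule
  containers:
    - name: busybox
      image: busybox:1.28
      args: ["sleep", "1200"]
```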
Create a Node Template
You have the following options to create node templates:
- Create a Node Template through the API
- Create a Node Template through the UI (Cluster --> Autoscaler --> Node Templates)
- Create a Node Template through Terraform

Sometimes, when you create a Node Template, you may want to associate it with custom node configurations to be used when provisioning nodes. You can achieve this by linking the template to a Node configuration.
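As an illustration, a create-template request body might look like the sketch below (rendered as YAML with comments for readability; the API itself accepts JSON). Apart from `shouldTaint`, which is covered in the next section, the endpoint path and field names here are assumptions, so consult the CAST AI API reference for the exact schema:

```yaml
# Hypothetical request body for creating a node template, e.g. via
# POST /v1/kubernetes/clusters/{clusterId}/node-templates (verify the path in the API reference).
name: spark-jobs            # template name; used in the scheduling.cast.ai/node-template label and taint
shouldTaint: true           # taint provisioned nodes so only tolerating pods land on them
configurationId: <node-configuration-id>  # assumed field for linking a node configuration
customLabels:               # assumed field for custom node labels
  team: data-platform
constraints:                # assumed constraint fields
  spot: true
  onDemand: false
  minCpu: 4
  maxCpu: 32
```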
Using the `shouldTaint` flag
While creating a Node Template, you can choose whether the nodes created by the CAST AI Autoscaler should be tainted. This is controlled through the `shouldTaint` property in the API payload.
When `shouldTaint` is set to `false`, no taints are applied to the nodes created by CAST AI, so any pod deployed to the cluster, even one without a `nodeSelector`, might get scheduled on these nodes. This effect might not always be desirable.
When `shouldTaint` is set to `true`
The provisioned nodes carry the `scheduling.cast.ai/node-template` taint, so in addition to the `nodeSelector`, the pod needs a matching toleration:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  tolerations:
    - key: scheduling.cast.ai/node-template
      value: spark-jobs
      operator: Equal
      effect: NoSchedule
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
```
When `shouldTaint` is set to `false`
The nodes are untainted, so the `nodeSelector` alone is enough:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
```
Using `nodeSelector`
You can use a `nodeSelector` to schedule pods on the nodes created from a template. By default, you construct the `nodeSelector` using the template name. However, you may choose to use custom labels to better fit your use case.
Using the node template name in `nodeSelector`
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
```
Using multiple custom labels in `nodeSelector`
Suppose you have a node template with multiple custom labels, such as `custom-label-key-1=custom-label-value-1` and `custom-label-key-2=custom-label-value-2`. You can schedule your pods on a node created from that template by providing a `nodeSelector` with all the custom labels, as shown below:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    custom-label-key-1: custom-label-value-1
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
```
Using `nodeAffinity`
You can use `nodeAffinity` to schedule pods on the nodes created from a template. By default, you construct the `nodeAffinity` using the template name. However, you may choose to use custom labels to better fit your use case. The only supported `nodeAffinity` operator is `In`.
Using the node template name in `nodeAffinity`
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: scheduling.cast.ai/node-template
                operator: In
                values:
                  - "spark-jobs"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
```
Using a node template's custom labels in `nodeAffinity`
Suppose you have a node template with multiple custom labels, such as `custom-label-key-1=custom-label-value-1` and `custom-label-key-2=custom-label-value-2`. You can schedule your pods on a node created from that template by providing `nodeAffinity` with all the custom labels, as shown below:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: custom-label-key-1
                operator: In
                values:
                  - "custom-label-value-1"
              - key: custom-label-key-2
                operator: In
                values:
                  - "custom-label-value-2"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
```
Using a mix of `nodeAffinity` and `nodeSelector`
If you have a node template with multiple custom labels, such as `custom-label-key-1=custom-label-value-1` and `custom-label-key-2=custom-label-value-2`, you can schedule your pods on a node created from that template by splitting the custom labels between `nodeAffinity` and `nodeSelector`, as shown below:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: custom-label-key-1
                operator: In
                values:
                  - "custom-label-value-1"
  nodeSelector:
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
```
More Specific Requirements/Constraints
Templates also support further instance type selection refinement via additional affinities/nodeSelectors, as long as the additional constraints don't conflict with the template's constraints.
Here's an example with some assumptions:
- The template has no constraints;
- The template is named my-template;
- The template has custom labels enabled, as follows: `product: "my-product"` and `team: "my-team"`.
Here's an example deployment that would further specify what a pod supports/needs:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "team"
                    operator: In
                    values: ["my-team"]
                  # Pick only nodes that are compute-optimized
                  - key: "scheduling.cast.ai/compute-optimized"
                    operator: In
                    values: ["true"]
                  # Pick only nodes that are storage-optimized
                  - key: "scheduling.cast.ai/storage-optimized"
                    operator: In
                    values: ["true"]
      nodeSelector:
        # template selector (can also be in affinities)
        product: "my-product"
        team: "my-team"
      tolerations:
        # Storage-optimized nodes will also have a taint, so we need to tolerate it.
        - key: "scheduling.cast.ai/storage-optimized"
          operator: Exists
        # toleration for the template
        - key: "scheduling.cast.ai/node-template"
          value: "my-template"
          operator: "Equal"
          effect: "NoSchedule"
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 700m
              memory: 700Mi
            limits:
              memory: 700Mi
```
Using this example, you'd get a node that is both storage-optimized and compute-optimized from a template that doesn't impose those requirements on the whole pool.
Note: If a template itself has one of these constraints, there is currently no way to loosen the requirements for individual pods via their affinities/selectors.