Node Templates
What are node templates?
Node templates are a key part of the Autoscaler component. They allow users to define virtual buckets of constraints and properties for nodes to be added to the cluster during upscaling. Users can:
- Specify various constraints to limit the inventory of instance types to be used.
- Select a preferred instance resource offering (e.g., build a spot-only cluster without the need for specific tolerations or selectors).
- Define how CAST AI should manage spot instance interruptions.
- Specify labels and taints to be applied on the nodes.
- Link a Node configuration to be applied when creating new nodes.
When a user enables CAST AI, the default Node template is created in the cluster. This template is essential and serves as the default Autoscaler behavior. In addition to the default template, users can set up multiple templates to suit their specific use case.
Feature | Default Node Template | Custom Node Template |
---|---|---|
Name | default-by-castai. | Custom (user-defined). |
Creation | Mandatory. Created by CAST AI during cluster onboarding. | Optional. Created by the user. |
Deletion | Can't be deleted or otherwise removed. | Can be deleted. |
Activation | On by default. Can be disabled. | Can be disabled. |
Usage in Autoscaling | Used when workloads are not explicitly designated to trigger cluster upscaling through other Node templates. | Activated by workloads utilizing autoscaling labels that are also defined in the Node template. |
Renaming | Can't be renamed. | Can't be renamed after creation. |
Setting as default | Is the default by design. Cannot be changed to non-default. Disabling it is recommended if you no longer wish to use it. | Cannot be set as default. Attempts to set isDefault=true via API will not update the template. |
Note
For the CAST AI autoscaler to function, the Unscheduled Pods policy and at least one Node template (default or not) must be enabled.
Main attributes
The table below outlines the Node template attributes and their availability across different cloud providers.
Attribute | Description | Availability per Cloud provider |
---|---|---|
Custom labels and taints | Choose this if custom labels or taints should be applied on the nodes created by CAST AI. Based on the selected properties, nodeSelector and toleration must be applied to the workload to trigger the autoscaling. | All |
Node configuration link | By default, Node templates mostly focus on the type of resources that need to be scheduled. To learn how to configure the scheduling of your resources and how to use other kinds of attributes on provisioned nodes, see Node configuration. | All |
Resource offering | This attribute tells the Autoscaler which node resource offering (spot or on-demand) it should create. Both offerings can be configured, in which case workloads targeted to run on spot nodes must be marked accordingly. Check this guide. Additional configuration settings are available when you want the Autoscaler to use spot nodes: - Spot fallback: when spot capacity is unavailable in the cloud, CAST AI will create temporary on-demand fallback nodes. - Interruption prediction model (AWS only): CAST AI can react to AWS rebalancing notifications or use its own ML model to predict spot interruptions and proactively rebalance affected nodes. See this guide for more details. - Diversified spot instances: by default, CAST AI seeks the most cost-effective instances without assessing your cluster's current composition. To limit the impact of a potential mass spot reclaim, you can instruct the Autoscaler to evaluate and enhance the diversity of spot nodes in the cluster, but this may increase your costs. Read more. | The interruption prediction model is only available for AWS |
Processor architecture | You can select x86_64, ARM64, or both architectures for the nodes created by CAST AI. When using a multi-architecture Node template, also use the nodeSelector kubernetes.io/arch: "arm64" to ensure that the Pod lands on an ARM node. Azure only: to provision ARM nodes, ensure that they are supported in the region and that quota for Standard_DS2_v2 VMs is available. Contact the support team, as this feature has to be enabled for the organization. | All |
GPU-enabled instances | Choose this attribute to run workloads only on GPU-enabled instances. Once you select it, Instance constraints get enhanced with GPU-related properties. | AWS, GCP |
Apply Instance constraints | Apply additional constraints on instances to be selected, such as: - Instance Family. - Min/Max CPU. - Min/Max Memory. - Compute-optimized. - Storage-optimized. - GPU manufacturer, name, count. - Bare metal. AWS-only feature. - Burstable instances. AWS-only feature. - The list of AZ names to consider for the node template. If the list is empty or not set, all AZs are considered. When the subnets defined in the node configuration are zonal (e.g., in AWS or Azure), the effective AZs are determined as the intersection of the subnet AZs and the AZs specified in the node template. - OS selection (Windows 2019/2022 or Linux). Azure-only feature. Ensure that the quota for Standard_DS2_v2 VMs is available. Contact the support team, as this feature has to be enabled for the organization. - Customer-specific. Azure-only feature. When enabled, the inventory will include customer-specific (preview) instances; otherwise, they will be excluded. Contact the support team, as this feature must be enabled for the organization. | All |
Set up instance family prioritization | In cases where it is preferred for the Autoscaler to prioritize certain instance families when creating new nodes, a custom priority order can be defined. The priority queue is organized into tiers, each representing a set of instance families. The Autoscaler will attempt to find an instance type from the highest available tier before moving down to the next one. Instance families within the same priority tier are treated equally. When no instance families from the priority queue are available, the Autoscaler will fall back to its default behavior. | All |
Custom instances | The Autoscaler will consider GCP custom VM types in addition to predefined machines. The extended memory setting allows the Autoscaler to determine whether a custom VM with increased memory per vCPU would be the optimal choice. | GCP |
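Two of the rules in the table above, the effective AZ list and the instance family priority tiers, can be illustrated with a short Python sketch. This is not CAST AI's implementation: the function names, the input shapes, and the alphabetical tie-break inside a tier are assumptions made only for illustration.

```python
from typing import List, Optional, Set


def effective_azs(template_azs: Optional[List[str]], subnet_azs: Set[str]) -> Set[str]:
    """AZs a node may land in when the node configuration's subnets are zonal."""
    # An empty or unset template list means "consider all AZs",
    # so only the subnet AZs restrict the result.
    if not template_azs:
        return set(subnet_azs)
    return set(template_azs) & subnet_azs


def pick_family(tiers: List[List[str]], available: Set[str]) -> Optional[str]:
    """Return an available instance family from the highest tier that has one."""
    for tier in tiers:
        candidates = sorted(f for f in tier if f in available)
        if candidates:
            # Families in one tier are treated equally; sorting here only
            # makes the sketch deterministic (an assumption, not a documented rule).
            return candidates[0]
    # None stands for "fall back to the default Autoscaler behavior".
    return None
```

For example, with tiers `[["c5", "c6i"], ["m5"]]` and only `m5` and `c6i` available, the sketch picks `c6i` because it belongs to the highest tier that has any available family.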
Create a Node template
You have the following options to create Node templates:
- Create Node template through the API
- Create Node template through the UI (Cluster --> Autoscaler --> Node templates)
- Terraform
Sometimes, when you create a Node template, you may want to associate it with custom Node configurations for provisioning nodes. You can achieve this by linking the template with a custom Node configuration.
Using the shouldTaint flag
While creating a Node template, you can choose whether the nodes created by the CAST AI Autoscaler should be tainted. This is controlled through the shouldTaint property in the API payload.
When shouldTaint is set to false
Since no taints will be applied on the nodes created by CAST AI, any pods being deployed to the cluster, even the ones without nodeSelector, might get scheduled on these nodes. This effect might not always be desirable.
When shouldTaint is set to true
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  tolerations:
    - key: scheduling.cast.ai/node-template
      value: spark-jobs
      operator: Equal
      effect: NoSchedule
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
When shouldTaint is set to false
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
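The difference between the two manifests can be summarized in a small Python helper that builds the scheduling fields of a Pod spec. It is a minimal sketch: the function name `scheduling_fields` is hypothetical, and the label key, toleration operator, and effect are taken from the manifests in this section.

```python
from typing import Dict


def scheduling_fields(template_name: str, should_taint: bool) -> Dict:
    """Build the Pod spec fields that target nodes of one Node template."""
    key = "scheduling.cast.ai/node-template"
    fields: Dict = {"nodeSelector": {key: template_name}}
    if should_taint:
        # Tainted nodes reject pods that lack a matching toleration,
        # so the toleration is added only when shouldTaint is true.
        fields["tolerations"] = [
            {
                "key": key,
                "value": template_name,
                "operator": "Equal",
                "effect": "NoSchedule",
            }
        ]
    return fields
```

Calling `scheduling_fields("spark-jobs", True)` yields both the nodeSelector and the toleration, while `scheduling_fields("spark-jobs", False)` yields the nodeSelector only.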
Using nodeSelector
You can use nodeSelector to schedule pods on the nodes created using the template. By default, you construct nodeSelector using the template name. However, you can opt for a custom label if it suits your use case better.
Using a Node template name in nodeSelector
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
Using multiple custom labels in nodeSelector
If you have a Node template with multiple custom labels, custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2, you can schedule your pods on a node created using that Node template by providing a nodeSelector with all the custom labels, as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    custom-label-key-1: custom-label-value-1
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
Using nodeAffinity
You can use nodeAffinity to schedule pods on the nodes created using the template. By default, you construct nodeAffinity using the template name. However, you may use a custom label to better fit your use case. The only supported nodeAffinity operator is In.
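Because In is the only supported nodeAffinity operator, it can be useful to check a manifest before deploying it. The following Python sketch walks a Pod manifest that has already been loaded into a dictionary and collects any other operators it finds. The function name is hypothetical, and the check covers only the requiredDuringSchedulingIgnoredDuringExecution terms.

```python
from typing import Dict, List


def unsupported_affinity_operators(pod: Dict) -> List[str]:
    """Collect nodeAffinity operators other than In from a Pod manifest dict."""
    required = (
        pod.get("spec", {})
        .get("affinity", {})
        .get("nodeAffinity", {})
        .get("requiredDuringSchedulingIgnoredDuringExecution", {})
    )
    found: List[str] = []
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            operator = expr.get("operator")
            if operator != "In":
                found.append(operator)
    return found
```

An empty result means every match expression in the manifest uses In.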
Using the Node template name in nodeAffinity
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: scheduling.cast.ai/node-template
                operator: In
                values:
                  - "spark-jobs"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
Using a Node template's custom labels in nodeAffinity
If you have a Node template with multiple custom labels, custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2, you can schedule your pods on a node created using that Node template by providing nodeAffinity with all the custom labels, as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: custom-label-key-1
                operator: In
                values:
                  - "custom-label-value-1"
              - key: custom-label-key-2
                operator: In
                values:
                  - "custom-label-value-2"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
Using a mix of nodeAffinity and nodeSelector
If you have a Node template with multiple custom labels, custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2, you can schedule your pods on a node created using that Node template by providing nodeAffinity and nodeSelector that together cover all the custom labels, as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: custom-label-key-1
                operator: In
                values:
                  - "custom-label-value-1"
  nodeSelector:
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"
More Specific Requirements/Constraints
Templates also support further instance type selection refinement via additional nodeAffinity / nodeSelector values. This can be done if the additional constraints don't conflict with the template constraints.
Here's an example with some assumptions:
- The template has no constraints;
- The template is named my-template;
- The template has custom labels enabled, and they are as follows:
product: "my-product"
team: "my-team"
Here's an example deployment that would further specify what a pod supports/needs:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "team"
                    operator: In
                    values: ["my-team"]
                  # Pick only nodes that are compute-optimized
                  - key: "scheduling.cast.ai/compute-optimized"
                    operator: In
                    values: ["true"]
                  # Pick only nodes that are storage-optimized
                  - key: "scheduling.cast.ai/storage-optimized"
                    operator: In
                    values: ["true"]
      nodeSelector:
        # template selector (can also be in affinities)
        product: "my-product"
        team: "my-team"
      # Storage optimized nodes will also have a taint, so we need to tolerate it.
      tolerations:
        - key: "scheduling.cast.ai/storage-optimized"
          operator: Exists
        # toleration for the template
        - key: "scheduling.cast.ai/node-template"
          value: "my-template"
          operator: "Equal"
          effect: "NoSchedule"
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 700m
              memory: 700Mi
            limits:
              memory: 700Mi
Using this example, you'd get a storage-optimized, compute-optimized node on a template that doesn't have those requirements for the whole pool.
Note: If a template has one of those constraints (on the template itself), it is currently not possible to loosen the requirements for individual pods through their nodeAffinity / nodeSelector values.
Node template matching
By default, workloads must specify node selectors or affinity that match all terms defined in the Node template's Custom labels. If at least one term is not matched, then the Node template is not considered.
Multiple Node templates can be matched by a workload. In such cases, the templates are scored primarily by the number of terms that the Node template contains (there are additional scoring criteria). The greater the number of terms, the higher the score.
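The default (strict) matching rule can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the Autoscaler's code: templates and workloads are reduced to plain label dictionaries, the function names are hypothetical, and only the primary scoring criterion (the number of terms) is modeled.

```python
from typing import Dict, List


def matches_all_terms(template_labels: Dict[str, str], workload_terms: Dict[str, str]) -> bool:
    """True when the workload selects every custom label of the template."""
    return all(workload_terms.get(k) == v for k, v in template_labels.items())


def matching_templates(
    templates: Dict[str, Dict[str, str]], workload_terms: Dict[str, str]
) -> List[str]:
    """Names of the templates the workload matches, those with more terms first."""
    matched = [
        name
        for name, labels in templates.items()
        if matches_all_terms(labels, workload_terms)
    ]
    # More terms means a higher score; the additional criteria are not modeled.
    return sorted(matched, key=lambda name: -len(templates[name]))
```

A workload that selects `team=x` and `product=y` matches both a template labeled `team=x` and one labeled `team=x, product=y`; the second one ranks higher because it contains more terms.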
Partial Node template matching functionality
With partial matching, a workload needs to match only one or more of the Node template's terms, rather than all of them, for the template to be considered when provisioning a node for the workload. To enable partial Node template matching, go to Autoscaler settings, navigate to Unscheduled pods policy, and select Partial node template matching.
When partial matching is enabled, the Unscheduled Pods Policy Applied event will include information on how the Node template was selected. This can be used to troubleshoot autoscaler decisions.
Node template selection
When multiple Node templates are matched, they are sorted, and the first one is selected to create a node for the workload. The sorting criteria, in order of their priority, are:
- Node templates where all terms were matched are sorted higher.
- The count of matched terms, in descending order.
- The average price per CPU of instance types within the Node template, in descending order.
- The name of the template, in alphabetical order.
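The four sorting criteria above map naturally onto a tuple sort key. The Python sketch below assumes a hypothetical `Candidate` record that already holds the number of terms, the number of matched terms, and the average price per CPU for each matched template; it illustrates the ordering and is not the Autoscaler's implementation.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    total_terms: int
    matched_terms: int
    avg_price_per_cpu: float


def sort_key(c: Candidate) -> Tuple[bool, int, float, str]:
    fully_matched = c.matched_terms == c.total_terms
    return (
        not fully_matched,     # 1. fully matched templates sort first
        -c.matched_terms,      # 2. more matched terms first
        -c.avg_price_per_cpu,  # 3. average price per CPU, descending
        c.name,                # 4. template name, alphabetical
    )


def select_template(candidates: List[Candidate]) -> Candidate:
    """Return the template that would be first after sorting."""
    return min(candidates, key=sort_key)
```

Negating the numeric fields turns Python's ascending sort into the descending order listed above, while the name is left as-is so that ties resolve alphabetically.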
Support of Dedicated (Sole tenant) nodes
This is a new feature currently supported only on GKE clusters.
Please note that the sole tenancy node group must be shared with the GKE cluster's project or organization.
Dedicated nodes (also known as Sole tenant nodes) are dedicated hosts in the customer's cloud account that can host VMs that are used in the Kubernetes clusters. The setup for GKE clusters can be found in the Google Cloud documentation.
CAST AI Autoscaler can be instructed to prefer dedicated hosts for as long as they have capacity available when the following parameters are set in the Node template:
Attribute | Description |
---|---|
Affinity | The affinity rules required for choosing the dedicated node group. Each rule is constructed from a Key, Values, and an Operator. Supported operators: IN (in values), NotIn (not in values), Exists (just exists), DoesNotExist (values do not exist), Gt (greater than), Lt (lower than). |
azName | The availability zone of the dedicated node group |
InstanceTypes | Instance types of the dedicated node group |
Example of configuration setup in the API response:
"dedicatedNodeAffinity": [
  {
    "instanceTypes": [
      "c2-node-60-240"
    ],
    "name": "test-nodegroup",
    "affinity": [
      {
        "operator": "IN",
        "key": "stnodegroup",
        "values": [
          "test-group"
        ]
      },
      {
        "operator": "IN",
        "key": "compute.googleapis.com/project",
        "values": [
          "test-123"
        ]
      }
    ],
    "azName": "europe-west1-b"
  }
]
Please note that all constraints configured in the node template will be respected (e.g., min CPU set to 16).
Once the dedicated host capacity is saturated, CAST AI will not add additional dedicated hosts to the node group. Instead, it will return to multi-tenant instances configured in the Node template. During the fallback to multi-tenant instances, the same availability zone will be selected as configured in the dedicatedNodeAffinity.
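The fallback behavior can be pictured with a short Python sketch. It assumes a hypothetical `free_dedicated_slots` count that stands in for whatever capacity signal the Autoscaler actually uses, and it returns a plain dictionary rather than any real API object.

```python
from typing import Dict


def choose_placement(dedicated: Dict, free_dedicated_slots: int) -> Dict[str, str]:
    """Prefer the dedicated node group; otherwise fall back within the same AZ."""
    if free_dedicated_slots > 0:
        return {
            "tenancy": "dedicated",
            "nodeGroup": dedicated["name"],
            "az": dedicated["azName"],
        }
    # Capacity is saturated: no new dedicated hosts are added, and the
    # multi-tenant fallback node stays in the configured availability zone.
    return {"tenancy": "multi-tenant", "az": dedicated["azName"]}
```

With the example configuration above, a saturated node group results in a multi-tenant node in `europe-west1-b`.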
Currently, the Rebalancer does not simulate the deletion of instances running on dedicated nodes; therefore, rebalancing the whole cluster might produce some suboptimal results.