Node Templates

What is it?

Node templates are a key part of the Autoscaler component. They allow users to define virtual buckets of constraints and properties for nodes to be added to the cluster during upscaling. Users can:

  • Specify various constraints to limit the inventory of instance types to be used;
  • Select a preferred instance resource offering (e.g., build a spot-only cluster without the need for specific tolerations or selectors);
  • Define how CAST AI should manage spot instance interruptions;
  • Specify labels and taints to be applied on the nodes;
  • Link a Node configuration to be applied when creating new nodes.

When a user enables CAST AI, the default Node template is created in the cluster. This template is essential and serves as the default Autoscaler behavior. In addition to the default template, users can set up multiple templates to suit their specific use case.

Default Node template
name: default-by-castai
Properties:
- Mandatory - created by CAST AI during cluster onboarding and can't be removed
- Can be switched off
- Used in autoscaling when workloads are not explicitly designated to trigger cluster upscaling through other Node templates
- Can't be renamed

Custom Node template
name: user-defined
Properties:
- Optional - created by the user, can be deleted
- Can be switched off
- Activated by workloads using autoscaling labels that are also defined in the Node template
- Can't be renamed after creation

πŸ“˜

For the CAST AI autoscaler to function, the Unscheduled Pods policy and at least one Node template (default or not) must be enabled.

Main attributes

The list below outlines the Node template attributes and their availability across cloud providers.

Custom labels and taints (all cloud providers)
Choose this if custom labels or taints should be applied to the nodes created by CAST AI. Based on the selected properties, a matching nodeSelector and toleration must be applied to the workload to trigger autoscaling.

Node configuration link (all cloud providers)
By default, Node templates mostly focus on the type of resources that need to be scheduled. To learn how to configure the scheduling of your resources and how to use other kinds of attributes on provisioned nodes, see Node configuration.

Resource offering (all cloud providers; the interruption prediction model is AWS only)
This attribute tells the Autoscaler which node resource offering (spot or on-demand) it should create. There is an option to configure both offerings, in which case workloads targeted to run on spot nodes must be marked accordingly. Check this guide.

Additional configuration settings are available when you want the Autoscaler to use spot nodes:

- Spot fallback – when spot capacity is unavailable in the cloud, CAST AI will create temporary on-demand fallback nodes.
- Interruption prediction model – AWS only: CAST AI can react to AWS rebalancing notifications or use its own ML model to predict spot interruptions and proactively rebalance affected nodes. See this guide for more details.
- Diversified spot instances – by default, CAST AI seeks the most cost-effective instances without assessing your cluster's current composition. To limit the impact of a potential mass spot reclaim, you can instruct the Autoscaler to evaluate and enhance the diversity of spot nodes in the cluster, but this may increase your costs. Read more.

Processor architecture (all cloud providers)
You can select x86_64, ARM64, or both architectures for the nodes created by CAST AI. When using a multi-architecture Node template, also use the nodeSelector kubernetes.io/arch: "arm64" to ensure that the Pod lands on an ARM node (see the example after this list).

Azure only: to provision ARM nodes, ensure that they are supported in the region and that quota for Standard_DS2_v2 VMs is available. Contact the support team, as this feature has to be enabled for the organization.

GPU-enabled instances (AWS, GCP)
Choose this attribute to run workloads on GPU-enabled instances only. Once you select it, Instance constraints are extended with GPU-related properties.

Apply Instance constraints (all cloud providers)
Apply additional constraints on the instances to be selected, such as:

- Instance family;
- Min/Max CPU;
- Min/Max Memory;
- Compute-optimized;
- Storage-optimized;
- GPU manufacturer, name, count;
- Bare metal (AWS only);
- OS selection (Windows 2019 or Linux) – Azure only; ensure that quota for Standard_DS2_v2 VMs is available. Contact the support team, as this feature has to be enabled for the organization.

Custom instances (GCP)
The Autoscaler will consider GCP custom VM types in addition to predefined machines. The extended memory setting allows the Autoscaler to determine whether a custom VM with an increased amount of memory per vCPU would be the most optimal choice.
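
As an illustration of the multi-architecture note above, here is a sketch of a pod that must land on an ARM node. It assumes a multi-architecture template named spark-jobs (the name used in later examples) that taints its nodes; adjust the selector and toleration to your own template.

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    # Target nodes created from the assumed template named spark-jobs
    scheduling.cast.ai/node-template: spark-jobs
    # Ensure the Pod lands on an ARM64 node of the multi-architecture template
    kubernetes.io/arch: "arm64"
  tolerations:
    # Needed only when the template taints its nodes (shouldTaint: true)
    - key: scheduling.cast.ai/node-template
      value: spark-jobs
      operator: Equal
      effect: NoSchedule
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"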

Create a Node template

You can create Node templates in several ways, for example through the CAST AI console or the API.

Sometimes, when you create a Node template, you may want to associate it with a custom Node configuration to be used when provisioning nodes. You can achieve this by linking the template with a custom Node configuration.

Using the shouldTaint flag

While creating a Node template, you can choose whether the nodes created by the CAST AI Autoscaler should be tainted. This is controlled through the shouldTaint property in the API payload.
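
As a minimal, hedged sketch of where this flag lives (surrounding fields are omitted and the template name is illustrative), shouldTaint is a boolean in the Node template definition sent to the API:

{
  "name": "spark-jobs",
  "shouldTaint": true
}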

🚧

When shouldTaint is set to false

Since no taints will be applied on the nodes created by CAST AI, any pods deployed to the cluster, even those without a nodeSelector, might get scheduled on these nodes. This effect might not always be desirable.

When shouldTaint is set to true

The created nodes carry the scheduling.cast.ai/node-template taint, so workloads must both select the template and tolerate the taint:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  tolerations:
    - key: scheduling.cast.ai/node-template
      value: spark-jobs
      operator: Equal
      effect: NoSchedule
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

When shouldTaint is set to false

No taint is applied to the nodes, so a matching nodeSelector alone is enough:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using nodeSelector

You can use nodeSelector to schedule pods on the nodes created using the template. By default, you construct nodeSelector using the template name. However, you can opt for a custom label if it better suits your use case.

Using a Node template name in nodeSelector

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using multiple custom labels in nodeSelector

If you have a Node template with multiple custom labels, such as custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2, you can schedule your pods on a node created from that Node template by providing a nodeSelector with all the custom labels, as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    custom-label-key-1: custom-label-value-1
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using nodeAffinity

You can use nodeAffinity to schedule pods on the nodes created using the template. By default, you construct nodeAffinity using the template name. However, you may choose to use a custom label to fit your use case better. The only supported nodeAffinity operator is In.

Using the Node template name in nodeAffinity

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: scheduling.cast.ai/node-template
              operator: In
              values:
                - "spark-jobs"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using a Node template's custom labels in nodeAffinity

If you have a Node template with multiple custom labels, such as custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2, you can schedule your pods on a node created from that Node template by providing nodeAffinity with all the custom labels, as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: custom-label-key-1
              operator: In
              values:
                - "custom-label-value-1"
            - key: custom-label-key-2
              operator: In
              values:
                - "custom-label-value-2"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using a mix of nodeAffinity and nodeSelector

If you have a Node template with multiple custom labels, such as custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2, you can schedule your pods on a node created from that Node template by providing nodeAffinity and nodeSelector with all the custom labels, as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: custom-label-key-1
              operator: In
              values:
                - "custom-label-value-1"
  nodeSelector:
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

More Specific Requirements/Constraints

Templates also support further refinement of instance type selection via additional nodeAffinity/nodeSelector values. This works as long as the additional constraints don't conflict with the template constraints.

Here's an example with some assumptions:

  • The template has no constraints;
  • The template is named my-template;
  • The template has custom labels enabled and they are as follows:
product: "my-product"
team: "my-team"

Here's an example deployment that would further specify what a pod supports/needs:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "team"
                operator: In
                values: ["my-team"]
                # Pick only nodes that are compute-optimized
              - key: "scheduling.cast.ai/compute-optimized"
                operator: In
                values: ["true"]
                # Pick only nodes that are storage-optimized
              - key: "scheduling.cast.ai/storage-optimized"
                operator: In
                values: ["true"]
      nodeSelector:
        # template selector (can also be in affinities)
        product: "my-product"
        team: "my-team"
      tolerations:
        # Storage-optimized nodes will also have a taint, so we need to tolerate it.
        - key: "scheduling.cast.ai/storage-optimized"
          operator: Exists
        # Toleration for the template taint
        - key: "scheduling.cast.ai/node-template"
          value: "my-template"
          operator: "Equal"
          effect: "NoSchedule"
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 700m
            memory: 700Mi
          limits:
            memory: 700Mi

Using this example, you'd get a storage-optimized, compute-optimized node on a template that doesn't have those requirements for the whole pool.

Note: If the template itself has one of those constraints, there is currently no way to loosen the requirements for individual pods based on their nodeAffinity/nodeSelector values.

Node template matching

By default, workloads must specify node selectors or affinity that match all terms defined in the Node template's Custom labels. If at least one term is not matched, then the Node template is not considered.

Multiple Node templates can be matched by a workload. In such cases, the templates are scored primarily by the number of terms that the Node template contains (there are additional scoring criteria). The greater the number of terms, the higher the score.

Partial Node template matching functionality

Partial matching means a Node template is considered for a workload as long as the workload specifies at least one of the template's terms, rather than all of them. To enable partial Node template matching, go to Autoscaler settings, navigate to the Unscheduled pods policy, and select Partial node template matching.

When partial matching is enabled, the Unscheduled Pods Policy Applied event will include information on how the Node template was selected. This can be used to troubleshoot autoscaler decisions.
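
For illustration, assume a Node template with the two custom labels used in the earlier examples (custom-label-key-1 and custom-label-key-2) and partial matching enabled. The following sketch shows a pod that specifies only one of those labels yet can still be matched to the template; with partial matching disabled, both labels would be required:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    # Only one of the template's two custom labels is specified;
    # with partial matching enabled, the template is still considered.
    custom-label-key-1: custom-label-value-1
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"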

Node template selection

When multiple Node templates are matched, they are sorted, and the first one is selected to create a node for the workload. The sorting criteria, in order of their priority, are:

  • Node templates where all terms were matched are sorted higher.
  • The count of matched terms, in descending order.
  • The average price per CPU of instance types within the Node template, in descending order.
  • The name of the template, in alphabetical order.

Support for Dedicated (Sole tenant) nodes

πŸ“˜

This is a new feature currently supported only on GKE clusters

Please note that the sole tenancy node group must be shared with the GKE cluster's project or organization.

Dedicated nodes (also known as Sole tenant nodes) are dedicated hosts in the customer's cloud account that can host VMs that are used in the Kubernetes clusters. The setup for GKE clusters can be found in the Google Cloud documentation.

The CAST AI Autoscaler can be instructed to prefer dedicated hosts for as long as their capacity is available by setting the following parameters in the Node template:

Affinity
The affinity rules required for choosing the dedicated node group. Each rule is constructed using:
- Key
- Values
- Operator:
  - IN: the key's value is in the listed values
  - NotIn: the key's value is not in the listed values
  - Exists: the key exists
  - DoesNotExist: the key does not exist
  - Gt: greater than
  - Lt: less than

azName
The availability zone of the dedicated node group.

InstanceTypes
Instance types of the dedicated node group.

Example of configuration setup in the API response:

    "dedicatedNodeAffinity": [
      {
        "instanceTypes": [
          "c2-node-60-240"
        ],
        "name": "test-nodegroup",
        "affinity": [
          {
            "operator": "IN",
            "key": "stnodegroup",
            "values": [
              "test-group"
            ]
          },
          {
            "operator": "IN",
            "key": "compute.googleapis.com/project",
            "values": [
              "test-123"
            ]
          }
        ],
        "azName": "europe-west1-b"

Please note that all constraints configured in the node template will be respected (e.g. min CPU set to 16).

Once the dedicated host capacity is saturated, CAST AI will not add additional dedicated hosts to the node group. Instead, it will fall back to multi-tenant instances configured in the Node template. During the fallback to multi-tenant instances, the same availability zone will be selected as configured in the dedicatedNodeAffinity.

🚧

Currently, the Rebalancer does not simulate the deletion of instances running on dedicated nodes; therefore, rebalancing the whole cluster might produce some suboptimal results.