Node Templates

What is it?

Node Templates are part of the Autoscaler component. They allow you to define virtual buckets of constraints such as:

  • instance types to be used;
  • lifecycle of the nodes to add;
  • node provisioning configurations;
  • various other properties.

CAST AI will respect all the above settings when creating nodes for your workloads to run on.

The UI is available in the CAST AI console: Cluster --> Autoscaler --> Node Templates.

Main attributes

Custom labels and taints (availability: All)

Choose this option if custom labels or taints should be applied to the nodes created by CAST AI.

Based on the selected properties, a matching nodeSelector and toleration must be applied to the workload to trigger autoscaling.

Node configuration link (availability: All)

By default, node templates mostly focus on the type of resources that need to be scheduled. To learn how to configure the scheduling of your resources and how to use other kinds of attributes on provisioned nodes, see Node configuration.

Lifecycle (availability: All; the interruption prediction model is AWS-only)

This attribute tells the Autoscaler which node lifecycle (spot or on-demand) it should create. You can also configure both lifecycles, in which case workloads targeted to run on spot nodes must be marked accordingly; see this guide.

Additional configuration settings are available when you want the Autoscaler to use spot nodes:

  • Spot fallback – when spot capacity is unavailable in the cloud, CAST AI creates temporary on-demand fallback nodes.
  • Interruption prediction model – AWS only: CAST AI can react to AWS rebalance recommendations or use its own ML model to predict spot interruptions and proactively rebalance affected nodes. See this guide for more details.
  • Diversified spot instances – by default, CAST AI seeks the most cost-effective instances without assessing your cluster's current composition. To limit the impact of a potential mass spot reclaim, you can instruct the Autoscaler to evaluate and increase the diversity of spot nodes in the cluster, though this may increase your costs. Read more.

Processor architecture (availability: AWS, GCP)

You can have CAST AI create x86_64 nodes, ARM64 nodes, or both. When using a multi-architecture Node Template, also add the nodeSelector kubernetes.io/arch: "arm64" to ensure that the Pod lands on an ARM node.

GPU-enabled instances (availability: AWS, GCP)

Choose this attribute to run workloads on GPU-enabled instances only. Once you select it, Instance constraints are extended with GPU-related properties.

Apply Instance constraints (availability: All)

Apply additional constraints on the instances to be selected, such as:

  • Instance family;
  • Min/Max CPU;
  • Min/Max memory;
  • Compute-optimized;
  • Storage-optimized;
  • GPU manufacturer, name, and count.
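For example, with a multi-architecture template, a workload that must land on ARM64 nodes can combine the architecture selector with the template selector. This is a sketch; the template name spark-jobs is illustrative, and the toleration is only needed when the template taints its nodes:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-arm
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
    kubernetes.io/arch: "arm64"
  tolerations:
    - key: scheduling.cast.ai/node-template
      value: spark-jobs
      operator: Equal
      effect: NoSchedule
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"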

Create a Node Template

You can create node templates in several ways, for example through the CAST AI console or the API.

When you create a Node Template, you may want to associate it with a custom node configuration that will be used when provisioning nodes. You can achieve this by linking the template with a Node configuration.

Using the shouldTaint flag

While creating a Node Template, you can choose whether the nodes created by the CAST AI Autoscaler should be tainted. This is controlled through the shouldTaint property in the API payload.

🚧 When shouldTaint is set to false

Since no taints will be applied to the nodes created by CAST AI, any pods deployed to the cluster, even ones without a nodeSelector, might get scheduled on these nodes. This effect might not always be desirable.

When shouldTaint is set to true

The nodes carry the scheduling.cast.ai/node-template taint, so the workload needs a matching toleration in addition to the nodeSelector:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  tolerations:
    - key: scheduling.cast.ai/node-template
      value: spark-jobs
      operator: Equal
      effect: NoSchedule
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

When shouldTaint is set to false

No toleration is needed; the nodeSelector alone targets the template's nodes:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using nodeSelector

You can use nodeSelector to schedule pods on the nodes created using the template. By default, you construct nodeSelector using the template name. However, you may choose to use a custom label to fit your use case better.

Using a node template name in nodeSelector

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: spark-jobs
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using multiple custom labels in nodeSelector

Suppose you have a node template with two custom labels, custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2. You can schedule your pods on a node created from that template by providing a nodeSelector with all the custom labels, as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  nodeSelector:
    custom-label-key-1: custom-label-value-1
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using nodeAffinity

You can use nodeAffinity to schedule pods on the nodes created using the template. By default, you construct nodeAffinity using the template name. However, you may choose to use a custom label to fit your use case better. The only supported nodeAffinity operator is In.

Using the node template name in nodeAffinity

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: scheduling.cast.ai/node-template
              operator: In
              values:
                - "spark-jobs"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using a node template's custom labels in nodeAffinity

Suppose you have a node template with two custom labels, custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2. You can schedule your pods on a node created from that template by providing nodeAffinity with all the custom labels, as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: custom-label-key-1
              operator: In
              values:
                - "custom-label-value-1"
            - key: custom-label-key-2
              operator: In
              values:
                - "custom-label-value-2"
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

Using a mix of Affinity and NodeSelectors

If you have a node template with two custom labels, custom-label-key-1=custom-label-value-1 and custom-label-key-2=custom-label-value-2, you can also split the custom labels between nodeAffinity and nodeSelector, as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-sleep
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: custom-label-key-1
              operator: In
              values:
                - "custom-label-value-1"
  nodeSelector:
    custom-label-key-2: custom-label-value-2
  containers:
    - name: busybox
      image: busybox:1.28
      args:
        - sleep
        - "1200"

More Specific Requirements/Constraints

Templates also support further refinement of instance type selection via additional affinities/nodeSelectors on the workload, as long as the additional constraints don't conflict with the template's constraints.

Here's an example with some assumptions:

  • The template has no constraints;
  • The template is named my-template;
  • The template has custom labels enabled and they are as follows:
product: "my-product"
team: "my-team"

Here's an example deployment that would further specify what a pod supports/needs:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "team"
                operator: In
                values: ["my-team"]
                # Pick only nodes that are compute-optimized
              - key: "scheduling.cast.ai/compute-optimized"
                operator: In
                values: ["true"]
                # Pick only nodes that are storage-optimized
              - key: "scheduling.cast.ai/storage-optimized"
                operator: In
                values: ["true"]
      nodeSelector:
        # template selector (can also be in affinities)
        product: "my-product"
        team: "my-team"
      tolerations:
        # Storage-optimized nodes will also have a taint, so we need to tolerate it.
        - key: "scheduling.cast.ai/storage-optimized"
          operator: Exists
        # toleration for the template
        - key: "scheduling.cast.ai/node-template"
          value: "my-template"
          operator: "Equal"
          effect: "NoSchedule"
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 700m
            memory: 700Mi
          limits:
            memory: 700Mi

Using this example, you'd get a storage-optimized, compute-optimized node on a template that doesn't have those requirements for the whole pool.

Note: If a template itself has one of those constraints, there is currently no way to loosen the requirements for individual pods via their affinities/selectors.