Spot nodes, castai-spot-handler, and mutating webhook

Is there a way to make a cluster always use spot nodes without having to add the node selector to every deployment?

Yes, you can achieve that by using the mutating webhook. Check out this page for more information.

How and when does CAST AI add tolerations for spot instances?

The mutating webhook (castai-pod-node-lifecycle) adds tolerations on the fly.

However, when you pause the cluster, the pod-node-lifecycle pods may already be in a pending state. Because you can't annotate the other workloads while the nodes are being deleted, those workloads will miss the mutation. To avoid this, you can add the castai-pod-node-lifecycle namespace to:

- name: NAMESPACES_TO_KEEP
  value: ""

Set this value in the hibernate configuration. This should help keep the mutating webhook running and allow for better results.
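As a rough sketch, assuming the variable accepts the namespace name as a plain string, the filled-in entry might look like this. Only the variable name and the namespace come from the text above; how multiple namespaces are separated is not covered here.

```yaml
# Assumption: the value is the name of the namespace to keep running.
- name: NAMESPACES_TO_KEEP
  value: "castai-pod-node-lifecycle"
```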

How can I schedule a workload specifically on CAST AI-managed nodes?

CAST AI adds a taint to spot instances, so spot toleration would be needed in this case.

When it comes to on-demand instances, all it takes is modifying the nodeAffinity/nodeSelector for provisioner.cast.ai/managed-by: cast.ai.
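A minimal sketch of the nodeSelector approach is shown below. The label key and value come from the answer above; the Deployment name, labels, and image are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      # Restrict scheduling to nodes carrying the CAST AI label.
      nodeSelector:
        provisioner.cast.ai/managed-by: cast.ai
      containers:
        - name: app
          image: nginx:1.27    # placeholder image
```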

If you have a node pool and some workloads use nodeAffinity to go there, the only thing left to do is to send all other apps to CAST AI nodes by following the Spot only cluster configuration:

staticConfig:  
  defaultToSpot: false  
  spotPercentageOfReplicaSet: 30  
  IgnorePodsWithNodeSelectorsAffinities: true

With this configuration, any workload that has a custom (non-CAST AI) nodeAffinity or nodeSelector will not be mutated (it is left untouched). Everything else will be mutated and will receive the CAST AI nodeAffinity for spot plus the toleration, which automatically forces those pods onto CAST AI nodes.

Can I manually prevent a workload from being scheduled on spot instances if I disagree with CAST AI's determination that it's safe for spot?

Of course, you can do that. We require a specific nodeSelector to opt in to spot, so by default, everything will go to on-demand instances until you tell CAST AI which workloads you want on spot instances.
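For illustration, an opt-in pod template fragment might look like the sketch below. The `scheduling.cast.ai/spot` key used for both the nodeSelector and the toleration is an assumption that is not stated in this FAQ, so verify it against the CAST AI documentation before relying on it.

```yaml
# Fragment of a Deployment's spec.template.spec (sketch only).
# Assumption: scheduling.cast.ai/spot is the spot label and taint key.
nodeSelector:
  scheduling.cast.ai/spot: "true"
tolerations:
  - key: scheduling.cast.ai/spot
    operator: Exists
```

A workload without this selector stays on on-demand instances, which is how you keep it off spot.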

We are planning to use vClusters and CAST AI-managed clusters for our QA environments. But vClusters are behind a namespace on the host cluster. Can we keep them out of spot instances?

You will likely need to use label selectors instead of namespaces for this type of use case. Here's an example:

  forcePodsToSpot:  
    - labelSelector:  
        matchExpressions:  
          - key: app.kubernetes.io/name  
            operator: In  
            values:  
              - spot-pod-1  
              - spot-pod-2

This snippet goes in the values.yaml file for the CAST AI mutating webhook and is based on this example: Spot only cluster.
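To keep specific pods off spot rather than forcing them onto it, the inverse rule might be sketched as follows. Both the `forcePodsToOnDemand` key and the `app: vcluster` label are assumptions made for illustration; check the webhook chart's values.yaml for the exact key name and use whatever label your vCluster pods actually carry.

```yaml
staticConfig:
  # Assumption: counterpart of forcePodsToSpot; verify the key in the chart.
  forcePodsToOnDemand:
    - labelSelector:
        matchExpressions:
          - key: app            # hypothetical label on vCluster pods
            operator: In
            values:
              - vcluster
```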

I had an interrupted node, but it took 16 minutes from the time it was interrupted until it was dead, and only then was a new node up. Can we make this process shorter?

We recommend enabling the spot instance policy within the autoscaler policies to shorten this time.

When the spot instance policy is disabled, CAST AI doesn't send delete requests for interrupted nodes. In this case, the nodes were marked for deletion by the node deletion policy, which took more time.

Is there a way to have half of the replicas of a deployment run on spot instances and the other half on on-demand?

You can use a workload-level override to run certain workloads on spot and others on on-demand instances. You can use either the lifecycle annotation or the spot-percentage annotation.

Lifecycle annotation:

kubectl patch deployment resilient-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"spot"}}}}}'
kubectl patch deployment sensitive-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"on-demand"}}}}}'

Spot percentage annotation:

kubectl patch deployment conservative-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/spot-percentage":"50"}}}}}'

Check out this page for more information.
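If you prefer to keep the setting in the manifest instead of patching a live object, the same annotation can be declared on the pod template. This is a sketch: the annotation key and value come from the command above, while the replica count, labels, and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: conservative-app
spec:
  replicas: 4                  # placeholder replica count
  selector:
    matchLabels:
      app: conservative-app
  template:
    metadata:
      labels:
        app: conservative-app
      annotations:
        # Annotation values must be strings, hence the quotes.
        scheduling.cast.ai/spot-percentage: "50"
    spec:
      containers:
        - name: app
          image: nginx:1.27    # placeholder image
```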

What would happen if I have the annotation configured at 50% and only one replica?

It's a Kubernetes best practice to have more than one replica per application. If you have only one replica and the workload-level override asks for a 50% split, the result depends on how the mutating webhook is configured.

The override honors the webhook configuration and acts according to what was set at installation.

What is the expected result when using the `scheduling.cast.ai/spot-percentage` annotation for a 50/50 split without using the mutating webhook for a workload?

The provided annotation only works to override the configuration of the mutating webhook. CAST AI currently does not offer an alternative way to split 50/50 other than by using the mutating webhook.

If you don't use the mutating webhook in production, you'll need to decide which workloads should run on spot and which should run on-demand. Another option is to use a node template to control which workloads go on which types of nodes. Whether you use node templates or set the toleration explicitly on a workload, it will be honored.

We need to make sure that a pod always gets scheduled, even if that means running on an on-demand node. How can we do it?

If you enable spot fallback and no spot instances are available (or on-demand instances happen to be cheaper), the pod will be provisioned on an on-demand instance.

Note: The autoscaler will attempt to replace this fallback (on-demand) node as per the time mentioned in the node template for fallback replacement.

What is the maximum wait time for a pending pod when a spot instance is not available and the autoscaler decides to use fallback?

The pod is moved immediately, with less than one minute of wait time.

Where does the "No spot capacity" error originate from?

The error comes from AWS.

Is it possible to find out when CAST AI started using a spot instance and when it ended?

One way to do that is through metrics.

Another way to do that is by going through audit logs.

You can also check spot costs in CAST AI's cost monitoring section. You can check how much you've spent on spot, fallback, and on-demand nodes in a given time frame. This will give you an idea about your usage pattern.

Can you please describe how a fallback node is replaced with a spot node and whether the process takes Pod Disruption Budgets into account?

Yes, the autoscaler respects PDBs when replacing fallback instances with spot instances. PDBs mainly apply to node drain operations, which come after spot fall-forward starts.

If a PDB is breached during this process (meaning its disruption budget can't be met due to replacement), the affected pod stays put for 20 minutes. During this time, the autoscaler follows PDB rules and doesn't rush replacement.

If the PDB isn't met even after 20 minutes, the autoscaler enforces spot fallback replacement.
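For reference, a PDB of the kind the autoscaler takes into account during a node drain could look like the sketch below. It uses the standard Kubernetes `policy/v1` API; the name, label, and `minAvailable` value are hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb        # hypothetical name
spec:
  # Keep at least one matching pod running during voluntary disruptions.
  minAvailable: 1
  selector:
    matchLabels:
      app: example-app         # hypothetical label
```

With a single replica, `minAvailable: 1` can never be satisfied during a drain, which is the kind of situation where the 20-minute wait described above would apply.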

Does CAST AI add any taints on nodes when receiving spot interruptions?

Yes, during spot interruptions you would see the autoscaling.cast.ai/draining=spot_interuption taint.

Can a PodDisruptionBudget (PDB) guarantee availability of pods during spot node interruptions?

No. Spot node interruptions can lead to unexpected pod evictions, even with a PDB in place. When a spot node is reclaimed by the cloud provider, pods running on that node may be evicted, which affects the availability of the service.