Spot-only cluster

How it works

🚧

Notice

The Spot webhook is being phased out. Its functionality is moving to Node templates.

CAST AI gives customers the flexibility to run all or a portion of workloads on spot instances without having to modify manifest files. To achieve this, users must install and configure a Mutating Admission Webhook.

When there's a request to schedule a pod, the CAST AI Mutating Admission Webhook (in short, the mutating webhook) mutates the workload manifest, for example by adding a spot toleration, to influence where the Kubernetes scheduler places the pod.
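As an illustration of such a mutation, the fragment below sketches the fields a mutated pod spec might carry. The key scheduling.cast.ai/spot is an assumption based on CAST AI's scheduling label conventions and is not stated on this page; inspect a mutated pod in your own cluster to confirm the exact values.

```yaml
# Sketch of the scheduling fields a mutated pod might contain.
# The key scheduling.cast.ai/spot is assumed, not taken from this page.
spec:
  tolerations:
    - key: scheduling.cast.ai/spot   # lets the pod land on spot nodes
      operator: Exists
  nodeSelector:
    scheduling.cast.ai/spot: "true"  # restricts the pod to spot nodes
```

With only the toleration present, the pod is allowed on spot nodes but not restricted to them; the node selector is what forces spot placement.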

Mutating Admission Webhook

CAST AI Mutating Admission Webhook presets:

  • Spot-only
  • Spot-only except kube-system
  • Partial Spot
  • Custom

📘

Note

Pods that are already running will not be affected.

The webhook mutates pods only at scheduling time. Over time, all pods should be rescheduled and, in turn, mutated. For example, when application owners release a new version of a workload, all of its replicas are rescheduled, and when Evictor or Rebalancing removes older nodes, the pods on those nodes are rescheduled as well.

If you'd like to initiate mutation for a whole namespace immediately, run this command, which restarts every Deployment in the namespace and recreates its pods:

kubectl -n {NAMESPACE} rollout restart deploy

Spot-only

Please use Node template configuration.

Spot-only except kube-system

Preset allSpotExceptKubeSystem.

This mode works the same as the Spot-only mode, but it forces all pods in the kube-system namespace to be placed on on-demand nodes. This mode is recommended for clusters where the high-availability aspect of the control plane is vitally important, while other pods can tolerate spot interruptions.

Install Spot-only except kube-system

To run all pods except those in kube-system on spot instances, use:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --set staticConfig.preset=allSpotExceptKubeSystem

Partial Spot

Preset partialSpot.

When 100% of pods on spot instances is not desirable, you can use a ratio such as 60% on stable on-demand instances and the remaining 40% of pods in the same ReplicaSet (Deployment / StatefulSet) running on spot instances.

This conservative configuration ensures that there are enough pods on stable computing for the base load. Still, it allows achieving significant savings for pods above the base load by putting them on spot instances. This setup is recommended for all types of environments, from production to development.

Install Partial Spot

To run 40% of a workload's pods on spot instances and keep the remaining pods of the same ReplicaSet on on-demand instances, use:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --set staticConfig.preset=partialSpot

To set a custom ratio for Partial Spot, replace 70 with a percentage value in the range [1-99]:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --set staticConfig.defaultToSpot=false --set staticConfig.spotPercentageOfReplicaSet=70
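The same two settings can also be kept in a values file rather than passed as --set flags. This is a minimal sketch that mirrors the flags above; the file name values.yaml is only a convention, and the file would be passed to the Helm command with --values values.yaml.

```yaml
# values.yaml: equivalent of the two --set flags above
staticConfig:
  defaultToSpot: false             # do not send unmatched pods to spot by default
  spotPercentageOfReplicaSet: 70   # share of each ReplicaSet's pods placed on spot
```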

Custom

No preset.

This mode can be adjusted to match your cluster's needs and requirements. Instead of choosing a specific preset, you can configure the behavior yourself.

| Key | Type | Default | Description |
|---|---|---|---|
| staticConfig.defaultToSpot | boolean | true | If true, the webhook adds spot tolerations and node selectors to all pods that don't match other rules. |
| staticConfig.spotPercentageOfReplicaSet | int | 0 | The percentage of pods (per ReplicaSet) that should be put on spot instances. Acceptable values: [1-100]. 0 means the feature is turned off. |
| staticConfig.IgnorePodsWithNodeSelectorsAffinities | boolean | false | If true, the webhook skips mutating pods that contain a custom nodeSelector or NodeAffinity labels. The following well-known labels are not affected, regardless of whether IgnorePodsWithNodeSelectorsAffinities is true or false: scheduling.cast.ai/compute-optimized, scheduling.cast.ai/storage-optimized, scheduling.cast.ai/gpu, topology.cast.ai/subnet-id, provisioner.cast.ai/managed-by, network-tag.gcp.cast.ai/*, *kubernetes.io/*, *.k8s.io/* |
| staticConfig.SkipNodeSelector | boolean | false | When true, the webhook skips adding node selectors and only adds spot tolerations. This setting does not force spot usage but allows pods to run on spot instances. It is useful for maximizing RI/CUD utilization before spilling over to spot. |
| staticConfig.ignorePods | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be ignored by the webhook. |
| staticConfig.forcePodsToSpot | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be put on spot instances. |
| staticConfig.forcePodsToOnDemand | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be put on on-demand instances. |

The schema of the PodAffinityTerm object is described in the official Kubernetes API documentation. The property topologyKey is ignored, and the property namespaceSelector is not yet supported.

Install Custom

Here is an example of a values.yaml with custom rules defined:

staticConfig:
  defaultToSpot: false
  spotPercentageOfReplicaSet: 30
  IgnorePodsWithNodeSelectorsAffinities: false
  ignorePods:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: ignored-pod
  forcePodsToSpot:
    - labelSelector:
        matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
              - spot-pod-1
              - spot-pod-2
  forcePodsToOnDemand:
    - namespaces:
        - kube-system
        - default

To install the webhook with these custom rules, execute this command:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --values values.yaml

Workload level override

The mutating webhook is configured at the cluster level, but exceptions can be enforced per Deployment or StatefulSet with the following annotations.

| Annotation Name | Value | Location | Effect |
|---|---|---|---|
| scheduling.cast.ai/lifecycle | "on-demand" | Deployment or StatefulSet | All pods will be scheduled on on-demand instances |
| scheduling.cast.ai/lifecycle | "spot" | Deployment or StatefulSet | All pods will be scheduled on spot instances |
| scheduling.cast.ai/spot-percentage | "65" (any value in [1-99]) | Deployment or StatefulSet | Overrides the Partial Spot configuration: schedules up to 65% of pods on spot and the remaining pods (at least 35%) on on-demand |

kubectl patch deployment resilient-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"spot"}}}}}'
kubectl patch deployment sensitive-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"on-demand"}}}}}'
kubectl patch deployment conservative-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/spot-percentage":"50"}}}}}'
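
The same override can be kept declaratively in the workload manifest, so it is not lost when the Deployment is re-applied. The sketch below puts the annotation in the pod template, the same path the kubectl patch commands write to. The labels, replica count, container name, and image are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: conservative-app
spec:
  replicas: 4                       # placeholder replica count
  selector:
    matchLabels:
      app: conservative-app         # placeholder label
  template:
    metadata:
      labels:
        app: conservative-app
      annotations:
        # Annotation values must be strings, so the number is quoted.
        scheduling.cast.ai/spot-percentage: "50"
    spec:
      containers:
        - name: app                 # placeholder container
          image: nginx:1.27         # placeholder image
```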

Troubleshooting

The mutating webhook will ignore these types of pods:

  • Bare pods without a ReplicaSet controller
  • Pods in the castai-pod-node-lifecycle namespace
  • Pods with TopologySpreadConstraints with TopologyKey=Lifecycle

DaemonSet pods get the spot toleration by default, which ensures they can run on both spot and on-demand nodes.

The CAST AI mutating webhook pods write logs to stdout.

If the cluster has Deployments with 1000+ replicas, set higher memory requests and limits by appending these parameters to the Helm command:

--set resources.requests.memory=1G --set resources.limits.memory=1G
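
If the webhook is installed from a values file, the same memory settings can be expressed there. This is a minimal sketch that mirrors the two --set flags above; merge it into an existing values.yaml alongside any staticConfig block.

```yaml
# values.yaml: equivalent of the two --set flags above
resources:
  requests:
    memory: 1G
  limits:
    memory: 1G
```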