Spot-only cluster

How it works

CAST AI gives customers the flexibility to run all or a portion of workloads on spot instances without having to modify manifest files. To achieve this, users need to install and configure a Mutating Admission Webhook.

When a pod is submitted for scheduling, the CAST AI Mutating Admission Webhook (the mutating webhook, for short) mutates the workload manifest, for example by adding a Spot toleration, to influence where the Kubernetes scheduler places the pod.

Mutating Admission Webhook

CAST AI Mutating Admission Webhook presets:

  • Spot-only
  • Spot-only except kube-system
  • Partial Spot
  • Custom
  • [Coming soon] Intelligent placement on Rebalancing

Pods that are already running will not be affected.

The webhook only mutates pods when they are scheduled. Over time, all pods should be rescheduled and, in turn, mutated: application owners release a new version of the workload, which reschedules all its replicas; the Evictor or Rebalancing removes older nodes, which sends their pods back for scheduling; and so on.

To trigger mutation for a whole namespace immediately, run this command, which recreates all pods managed by Deployments in that namespace:
kubectl -n {NAMESPACE} rollout restart deploy
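
The command above restarts Deployments only. If the namespace also contains StatefulSets, a helper along the following lines could restart both kinds of workload. This is a minimal sketch: the function name restart_workloads is hypothetical and not part of any CAST AI tooling.

```shell
# Hypothetical helper (not part of CAST AI tooling): restart every Deployment
# and StatefulSet in a namespace so that their pods are recreated and pass
# through the mutating webhook again.
restart_workloads() {
  namespace="$1"
  kubectl -n "$namespace" rollout restart deployment
  kubectl -n "$namespace" rollout restart statefulset
}

# Usage: restart_workloads {NAMESPACE}
```

Restarting a workload replaces its pods gradually according to the workload's update strategy, so the mutation takes effect as each new pod is scheduled.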

Spot-only

Preset allSpot.

The Spot-only mutating webhook will mark all workloads in your cluster as suitable for spot instances, causing the autoscaler to prefer spot instances when scaling the cluster up. As this will make the cluster more cost-efficient, choosing this mode is recommended for Development and Staging environments, batch job processing clusters, etc.

The CAST AI autoscaler creates spot instances only if the pod has a "Spot toleration" (see Spot instances). The mutating webhook adds the Spot toleration and the Spot node selector to all workloads being scheduled.
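
To check whether a pod was mutated, you can print the tolerations and the node selector from its spec. The following is a sketch only: the helper name show_spot_settings is hypothetical, and the exact toleration and selector keys that appear in the output depend on your CAST AI setup.

```shell
# Hypothetical helper: print the tolerations and the node selector of a pod,
# one per line, so you can see what the webhook added.
show_spot_settings() {
  namespace="$1"
  pod="$2"
  kubectl -n "$namespace" get pod "$pod" \
    -o jsonpath='{.spec.tolerations}{"\n"}{.spec.nodeSelector}{"\n"}'
}

# Usage: show_spot_settings {NAMESPACE} {POD_NAME}
```

Pods that were already running before the webhook was installed will not show the Spot settings until they are rescheduled.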

Install Spot-only

To run all pods (including kube-system) on spot instances, use:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --set staticConfig.preset=allSpot

Spot-only except kube-system

Preset allSpotExceptKubeSystem.

This mode works the same as the Spot-only mode but it forces all pods in the kube-system namespace to be placed on on-demand nodes. This mode is recommended for clusters where the high-availability aspect of the control plane is vitally important while other pods can tolerate spot interruptions.

Install Spot-only except kube-system

To run all pods excluding kube-system on spot instances, use:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --set staticConfig.preset=allSpotExceptKubeSystem

Partial Spot

Preset partialSpot.

When running 100% of pods on spot instances is not desirable, you can use a ratio, such as 60% of pods on stable on-demand instances and the remaining 40% of pods in the same ReplicaSet (Deployment / StatefulSet) on spot instances.

This conservative configuration ensures that there are enough pods on stable compute for the base load, but still allows achieving significant savings for pods above the base load by putting them on spot instances. This setup is recommended for all types of environments, from production to development.
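
As a rough illustration of the ratio, the sketch below estimates how many replicas of one ReplicaSet would be placed on spot instances for a given percentage. It assumes the result is rounded down; that rounding is an assumption made for illustration, not confirmed behavior of the webhook, and the function name spot_replicas is hypothetical.

```shell
# Hypothetical helper: estimate the number of spot pods in a ReplicaSet.
# Shell integer division rounds down (assumed rounding, see the note above).
spot_replicas() {
  replicas="$1"
  spot_percentage="$2"
  echo $(( replicas * spot_percentage / 100 ))
}

# Example: 10 replicas with 40% on spot.
spot=$(spot_replicas 10 40)
echo "spot: $spot, on-demand: $(( 10 - spot ))"
# prints "spot: 4, on-demand: 6"
```

With small replica counts the split is coarse: under the same assumption, 3 replicas at 40% would put 1 pod on spot and 2 on on-demand.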

Install Partial Spot

To run 40% of workload pods on spot instances and keep the remaining pods of the same ReplicaSet on on-demand instances, use:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --set staticConfig.preset=partialSpot

To set a custom ratio for Partial Spot, replace 70 with a percentage value in the range [1-99]:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --set staticConfig.defaultToSpot=false --set staticConfig.spotPercentageOfReplicaSet=70

Custom

No preset.

This mode can be adjusted to match the needs and requirements of your cluster. Instead of choosing a specific preset, you can configure the behavior on your own.

Key | Type | Default | Description
staticConfig.defaultToSpot | boolean | true | If true, the webhook adds spot tolerations and node selectors to all pods that don't match other rules.
staticConfig.spotPercentageOfReplicaSet | int | 0 | The percentage of pods (per ReplicaSet) that should be put on spot instances. Acceptable values: [1-100]. 0 means the feature is turned off.
staticConfig.IgnorePodsWithNodeSelectorsAffinities | boolean | false | Whether the webhook should skip mutating pods that have a custom nodeSelector or nodeAffinity. The following well-known labels are not affected, whether this setting is true or false: scheduling.cast.ai/compute-optimized, scheduling.cast.ai/storage-optimized, scheduling.cast.ai/spot-reliability, scheduling.cast.ai/gpu, topology.cast.ai/subnet-id, provisioner.cast.ai/managed-by, network-tag.gcp.cast.ai/*, *kubernetes.io/*, *.k8s.io/*
staticConfig.ignorePods | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be ignored by the webhook.
staticConfig.forcePodsToSpot | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be put on spot instances.
staticConfig.forcePodsToOnDemand | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be put on on-demand instances.

The schema of the PodAffinityTerm object is described in the official Kubernetes API documentation. The property topologyKey is ignored, and the property namespaceSelector is not yet supported.

Install Custom

Here is an example of a values.yaml with custom rules defined:

staticConfig:
  defaultToSpot: false
  spotPercentageOfReplicaSet: 30
  IgnorePodsWithNodeSelectorsAffinities: false
  ignorePods:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: ignored-pod
  forcePodsToSpot:
    - labelSelector:
        matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
              - spot-pod-1
              - spot-pod-2
  forcePodsToOnDemand:
    - namespaces:
        - kube-system
        - default

To install the webhook with these custom rules, execute this command:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
    castai-helm/castai-pod-node-lifecycle \
    --values values.yaml
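
After the install, you may want to confirm that the release picked up your rules. The sketch below wraps three standard helm and kubectl commands in a helper; the function name verify_webhook_install is hypothetical, and the release and namespace names are taken from the install command above.

```shell
# Hypothetical helper: inspect the webhook release after installation.
verify_webhook_install() {
  # Print the user-supplied values stored with the Helm release.
  helm get values castai-pod-node-lifecycle -n castai-pod-node-lifecycle
  # Check that the webhook pods are running.
  kubectl -n castai-pod-node-lifecycle get pods
  # List the mutating webhook configurations registered in the cluster.
  kubectl get mutatingwebhookconfigurations
}

# Usage: verify_webhook_install
```

The output of helm get values should match the staticConfig section of your values.yaml.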

Workload level override

The mutating webhook is a cluster-level configuration, but you can define exceptions per Deployment or StatefulSet.

Annotation Name | Value | Location | Effect
scheduling.cast.ai/lifecycle | "on-demand" | Deployment or StatefulSet | All Pods will be scheduled on on-demand instances
scheduling.cast.ai/lifecycle | "spot" | Deployment or StatefulSet | All Pods will be scheduled on spot instances
scheduling.cast.ai/spot-percentage | "65" [1-99] | Deployment or StatefulSet | Override Partial Spot configuration, schedule up to 65% on spot and remaining (at least 35%) on on-demand

kubectl patch deployment resilient-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"spot"}}}}}'
kubectl patch deployment sensitive-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"on-demand"}}}}}'
kubectl patch deployment conservative-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/spot-percentage":"50"}}}}}'
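
The patch commands above write the annotation into the pod template of the Deployment. The sketch below shows one way to inspect the annotations and to remove the lifecycle override again; both function names are hypothetical. Removing the annotation presumably lets the cluster-level configuration apply to newly scheduled pods, but that effect is an assumption and is not stated in this guide.

```shell
# Hypothetical helper: print the pod template annotations of a Deployment.
show_override() {
  deployment="$1"
  kubectl get deployment "$deployment" \
    -o jsonpath='{.spec.template.metadata.annotations}{"\n"}'
}

# Hypothetical helper: delete the lifecycle annotation from the pod template.
# In a JSON merge patch, setting a key to null removes it.
remove_lifecycle_override() {
  deployment="$1"
  kubectl patch deployment "$deployment" --type merge \
    -p '{"spec":{"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":null}}}}}'
}

# Usage: show_override resilient-app
#        remove_lifecycle_override resilient-app
```

Changing pod template annotations triggers a new rollout, so the pods are recreated after either the patch or the removal.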

Troubleshooting

The mutating webhook will ignore these types of pods:

  • Bare pods that are not managed by a ReplicaSet controller
  • Pods in the "castai-pod-node-lifecycle" namespace
  • Pods with TopologySpreadConstraints with TopologyKey=Lifecycle

DaemonSets get the Spot toleration by default, which ensures that DaemonSet Pods can run on both spot and on-demand nodes.

The CAST AI mutating webhook pods write logs to stdout.
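
Because the logs go to stdout, they can be read with kubectl logs. The sketch below prints the most recent log lines of every pod in the webhook namespace; the function name webhook_logs is hypothetical, and the namespace is the one used by the install commands in this guide.

```shell
# Hypothetical helper: print the last 100 log lines of each webhook pod.
webhook_logs() {
  for pod in $(kubectl -n castai-pod-node-lifecycle get pods -o name); do
    echo "== $pod =="
    kubectl -n castai-pod-node-lifecycle logs "$pod" --tail=100
  done
}

# Usage: webhook_logs
```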

If the cluster has Deployments with 1000+ replicas, set higher memory requests and limits by appending these parameters to the Helm command:

--set resources.requests.memory=1G --set resources.limits.memory=1G
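
If you prefer to keep settings in a file, the same parameters can be expressed as Helm values. The following sketch writes them to a values file; the file name values-memory.yaml is arbitrary and chosen here for illustration.

```shell
# Write the memory settings to a values file (the file name is arbitrary).
cat > values-memory.yaml <<'EOF'
resources:
  requests:
    memory: 1G
  limits:
    memory: 1G
EOF

# Then append this to the Helm command: --values values-memory.yaml
```

Helm accepts several --values flags in one command, so this file can be combined with a values.yaml that holds your custom rules.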