Spot-only cluster
How it works
Notice
Spot-webhook is being phased out. Its functionality is moving to nodeTemplates.
CAST AI gives customers the flexibility to run all or a portion of workloads on spot instances without having to modify manifest files. To achieve this, users must install and configure a Mutating Admission Webhook.
When there's a request to schedule a pod, the CAST AI Mutating Admission Webhook (in short, the mutating webhook) mutates the workload manifest - for example, by adding a spot toleration - to influence the desired pod placement by the Kubernetes Scheduler.
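For illustration, a pod mutated for spot placement might come out with a node selector and toleration like the sketch below; the scheduling.cast.ai/spot label and taint key are assumptions used for this example:

# Sketch of a mutated pod spec fragment; assumes spot nodes carry the
# scheduling.cast.ai/spot label and taint.
spec:
  nodeSelector:
    scheduling.cast.ai/spot: "true"
  tolerations:
    - key: scheduling.cast.ai/spot
      operator: Exists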
Mutating Admission Webhook
CAST AI Mutating Admission Webhook presets:
- Spot-only
- Spot-only except kube-system
- Partial Spot
- Custom
Note
Pods that are already running will not be affected.
The webhook only mutates pods during scheduling. Over time, all pods should eventually be rescheduled and, in turn, mutated: application owners will release a new version of a workload, triggering all of its replicas to be rescheduled, or Evictor or Rebalancing will remove older nodes and put their pods up for rescheduling.
If you'd like to initiate mutation for the whole namespace immediately, run this command, which will recreate all pods:
kubectl -n {NAMESPACE} rollout restart deploy
Spot-only
Please use the Node template configuration instead.
Spot-only except kube-system
Preset: allSpotExceptKubeSystem.
This mode works the same as the Spot-only mode, but it forces all pods in the kube-system namespace to be placed on on-demand nodes. This mode is recommended for clusters where the high availability of the control plane is vitally important, while other pods can tolerate spot interruptions.
Install Spot-only except kube-system
To run all pods, excluding kube-system, on spot instances, use:
helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
castai-helm/castai-pod-node-lifecycle \
--set staticConfig.preset=allSpotExceptKubeSystem
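After installation, you can spot-check placement, for example by listing nodes with a spot label column and confirming that kube-system pods land on nodes without it; the scheduling.cast.ai/spot label key is an assumption in this sketch:

# Show nodes with the (assumed) spot label, then check kube-system pod placement.
kubectl get nodes -L scheduling.cast.ai/spot
kubectl get pods -n kube-system -o wide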
Partial Spot
Preset: partialSpot.
When running 100% of pods on spot instances is not desirable, you can use a ratio - for example, 60% of the pods in a ReplicaSet (Deployment / StatefulSet) on stable on-demand instances and the remaining 40% on spot instances.
This conservative configuration ensures that there are enough pods on stable compute for the base load, while still allowing significant savings by putting pods above the base load on spot instances. This setup is recommended for all types of environments, from production to development.
Install Partial Spot
To run 40% of a workload's pods on spot instances and keep the remaining pods of the same ReplicaSet on on-demand instances, use:
helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
castai-helm/castai-pod-node-lifecycle \
--set staticConfig.preset=partialSpot
To set a custom ratio for Partial Spot, replace 70 below with any percentage value in the range [1-99]:
helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
castai-helm/castai-pod-node-lifecycle \
--set staticConfig.defaultToSpot=false --set staticConfig.spotPercentageOfReplicaSet=70
Custom
No preset.
This mode can be adjusted to match your cluster's needs and requirements. Instead of choosing a specific preset, you can configure the behavior yourself.
Key | Type | Default | Description |
---|---|---|---|
staticConfig.defaultToSpot | boolean | true | If true, the webhook adds spot tolerations and node selectors to all pods that don't match other rules. |
staticConfig.spotPercentageOfReplicaSet | int | 0 | The percentage of pods (per ReplicaSet) to put on spot instances. Acceptable values: [1-100]; 0 means the feature is turned off. |
staticConfig.IgnorePodsWithNodeSelectorsAffinities | boolean | false | If true, the webhook skips mutating pods that contain a custom nodeSelector or NodeAffinity. The following well-known labels are not affected whether IgnorePodsWithNodeSelectorsAffinities is true or false: scheduling.cast.ai/compute-optimized, scheduling.cast.ai/storage-optimized, scheduling.cast.ai/gpu, topology.cast.ai/subnet-id, provisioner.cast.ai/managed-by, network-tag.gcp.cast.ai/*, *kubernetes.io/*, *.k8s.io/* |
staticConfig.SkipNodeSelector | boolean | false | When true, the webhook skips adding node selectors and only adds spot tolerations. This does not force spot usage but allows pods to run on spot instances. Useful for maximizing RI/CUD utilization before spilling over to spot. |
staticConfig.ignorePods | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that the webhook should ignore. |
staticConfig.forcePodsToSpot | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be put on spot instances. |
staticConfig.forcePodsToOnDemand | list of PodAffinityTerm | [] | Terms describing the label selectors for pods that should be put on on-demand instances. |
The schema description of the PodAffinityTerm object can be found in the official kubernetes-api documentation. The property topologyKey is ignored, and the property namespaceSelector is not yet supported.
Install Custom
Here is an example of a values.yaml with custom rules defined:
staticConfig:
  defaultToSpot: false
  spotPercentageOfReplicaSet: 30
  IgnorePodsWithNodeSelectorsAffinities: false
  ignorePods:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: ignored-pod
  forcePodsToSpot:
    - labelSelector:
        matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
              - spot-pod-1
              - spot-pod-2
  forcePodsToOnDemand:
    - namespaces:
        - kube-system
        - default
To install the webhook with these custom rules, execute this command:
helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade -i --create-namespace -n castai-pod-node-lifecycle castai-pod-node-lifecycle \
castai-helm/castai-pod-node-lifecycle \
--values values.yaml
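To confirm that the release is running and the webhook is registered with the cluster, you can run:

kubectl get pods -n castai-pod-node-lifecycle
kubectl get mutatingwebhookconfigurations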
Workload level override
The Mutating Webhook is a cluster-level configuration, but exceptions can be enforced per Deployment or StatefulSet.
Annotation Name | Value | Location | Effect |
---|---|---|---|
scheduling.cast.ai/lifecycle | "on-demand" | Deployment or StatefulSet | All Pods will be scheduled on on-demand instances |
scheduling.cast.ai/lifecycle | "spot" | Deployment or StatefulSet | All Pods will be scheduled on spot instances |
scheduling.cast.ai/spot-percentage | Any value in [1-99], e.g. "65" | Deployment or StatefulSet | Overrides the Partial Spot configuration; with "65", up to 65% of Pods are scheduled on spot and the remaining (at least 35%) on on-demand |
kubectl patch deployment resilient-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"spot"}}}}}'
kubectl patch deployment sensitive-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/lifecycle":"on-demand"}}}}}'
kubectl patch deployment conservative-app -p '{"spec": {"template":{"metadata":{"annotations":{"scheduling.cast.ai/spot-percentage":"50"}}}}}'
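To verify that an annotation landed on the pod template, you can inspect the Deployment (resilient-app is the example name from above):

kubectl get deployment resilient-app -o jsonpath='{.spec.template.metadata.annotations}'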
Troubleshooting
The mutating webhook will ignore these types of pods:
- Bare pods without a ReplicaSet controller
- Pods in the castai-pod-node-lifecycle namespace
- Pods with TopologySpreadConstraints with TopologyKey=Lifecycle (see the sketch below)
- DaemonSets, which get a spot toleration by default, ensuring DaemonSet Pods can run on both spot and on-demand nodes
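As an illustration, a pod carrying a constraint like the following would be skipped. This is a sketch only; the lowercase lifecycle topology key is an assumption based on the TopologyKey=Lifecycle rule above:

# Hypothetical fragment of a pod spec that the webhook would ignore.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: lifecycle   # assumed key, per TopologyKey=Lifecycle above
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app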
The CAST AI mutating webhook pods write logs to stdout.
If the cluster has Deployments with 1000+ replicas, set higher memory requests and limits by appending these parameters to the Helm command:
--set resources.requests.memory=1G --set resources.limits.memory=1G