Pausing a cluster
CAST AI provides a set of CronJobs that pause and resume a Kubernetes cluster on a defined schedule. While the cluster is paused, CAST AI components continue to run on a single designated node, and the rest of the cluster capacity is removed. When the cluster is resumed, CAST AI uses standard Autoscaler capabilities to provide the most cost-efficient nodes for the pending pods.
Two CronJobs are executed to pause and resume a cluster:
- Hibernate-pause Job
Disable the Unscheduled Pod Policy (to prevent the cluster from growing)
Prepare the hibernation node (the node that keeps hosting essential components)
Mark essential Deployments with the hibernation toleration
Delete all other nodes (only the hibernation node should stay running)
- Hibernate-resume Job
Re-enable the Unscheduled Pod Policy to allow the cluster to expand to the needed size
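Once the CronJobs are installed (see the installation sections that follow), either job can be started outside its schedule, which is useful for testing a pause and resume cycle. The following is a minimal sketch, assuming the CronJobs are named hibernate-pause and hibernate-resume and live in the castai-agent namespace; the helper name run_hibernate_now is hypothetical.

```shell
# Hypothetical helper: start a one-off Job from one of the hibernate CronJobs.
# Assumes CronJobs named hibernate-pause / hibernate-resume in the castai-agent namespace.
run_hibernate_now() {
  action="$1"   # "pause" or "resume"
  case "$action" in
    pause|resume) ;;
    *) echo "usage: run_hibernate_now pause|resume" >&2; return 1 ;;
  esac
  # The timestamp suffix keeps the Job name unique across repeated runs.
  kubectl create job "hibernate-${action}-manual-$(date +%s)" \
    --from="cronjob/hibernate-${action}" \
    -n castai-agent
}
```

Usage: run_hibernate_now pause, and later run_hibernate_now resume.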
- Override default hibernate-node size
Set the HIBERNATE_NODE environment variable to override the default node sizing selections. Make sure the size selected is appropriate for your cloud.
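One way to set the variable on an installed CronJob is kubectl set env. This is a sketch only: it assumes HIBERNATE_NODE is read by the hibernate-pause CronJob in the castai-agent namespace, and the helper name set_hibernate_node is hypothetical.

```shell
# Hypothetical helper: override the hibernation node size.
# Assumption: the hibernate-pause CronJob is the one that reads HIBERNATE_NODE.
set_hibernate_node() {
  node_size="$1"   # an instance type that exists in your cloud and region
  if [ -z "$node_size" ]; then
    echo "usage: set_hibernate_node <instance-type>" >&2
    return 1
  fi
  kubectl set env cronjob/hibernate-pause -n castai-agent "HIBERNATE_NODE=${node_size}"
}
```

Usage: set_hibernate_node <instance-type>, where the instance type is one offered by your cloud provider.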
- Install hibernate
Run this command to install the Hibernate CronJobs:
kubectl apply -f https://raw.githubusercontent.com/castai/hibernate/main/deploy.yaml
Change API key
Create an API token with Full Access permissions and encode it as base64:
echo -n "98349587234524jh523452435kj2h4k5h2k34j5h2kj34h5k23h5k2345jhk2" | base64
Use this value to update the Secret:
apiVersion: v1
kind: Secret
metadata:
  name: castai-hibernate
  namespace: castai-agent
type: Opaque
data:
  API_KEY: >-
    CASTAI-API-KEY-REPLACE-ME-WITH-ABOVE==
Or, for convenience, use a one-liner:
kubectl get secret castai-hibernate -n castai-agent -o json | jq --arg API_KEY "$(echo -n 9834958-CASTAI-API-KEY-REPLACE-ME-5k2345jhk2 | base64)" '.data["API_KEY"]=$API_KEY' | kubectl apply -f -
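Some base64 implementations (GNU coreutils, for example) wrap long output across several lines, and a stray trailing newline in the input changes the encoded value. The sketch below encodes a placeholder token on a single line and decodes it again to confirm it round-trips; the variable names are illustrative only.

```shell
# Placeholder token; substitute your real CAST AI API token.
api_key="CASTAI-API-KEY-REPLACE-ME"

# printf avoids a trailing newline in the input;
# tr removes any line wrapping that base64 adds to long output.
encoded=$(printf '%s' "$api_key" | base64 | tr -d '\n')

# Decode again to confirm the value round-trips unchanged.
decoded=$(printf '%s' "$encoded" | base64 --decode)

if [ "$decoded" = "$api_key" ]; then
  echo "encoded API key: $encoded"
else
  echo "round-trip mismatch, check the input token" >&2
fi
```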
Set Cloud env variable
AKS is set by default. For any other cloud, change the "Cloud" env variable in both CronJobs to the matching value: [EKS|GKE|AKS]
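Because the variable has to match in both CronJobs, a small loop avoids updating one and forgetting the other. This is a sketch, assuming the CronJobs are named hibernate-pause and hibernate-resume in the castai-agent namespace; the helper name set_hibernate_cloud is hypothetical.

```shell
# Hypothetical helper: set the "Cloud" env variable on both hibernate CronJobs.
set_hibernate_cloud() {
  cloud="$1"
  case "$cloud" in
    AKS|EKS|GKE) ;;
    *) echo "cloud must be one of AKS, EKS, GKE" >&2; return 1 ;;
  esac
  for cronjob in hibernate-pause hibernate-resume; do
    kubectl set env "cronjob/${cronjob}" -n castai-agent "Cloud=${cloud}"
  done
}
```

Usage: set_hibernate_cloud EKS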
- Install with Helm
Add the CAST AI Helm charts repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
Install hibernate
Install the chart, replacing the cloud and apiKey values with your own:
helm upgrade -i castai-hibernate castai-helm/castai-hibernate -n castai-agent --set cloud=<AKS|EKS|GKE> --set apiKey=<CASTAI-API-KEY-REPLACE-ME-WITH-BASE64_ENCODE>
Schedule hibernate cronjobs
Update the hibernate-pause and hibernate-resume CronJob schedules according to business needs.
Default examples:
# Pause at 22:00, Monday to Friday; update according to business needs.
pauseCronSchedule: "0 22 * * 1-5"
# Resume at 07:00, Monday to Friday; update according to business needs.
resumeCronSchedule: "0 7 * * 1-5"
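The schedules use standard five-field cron syntax (minute, hour, day of month, month, day of week). Kubernetes evaluates CronJob schedules in the time zone of its controller manager, which is commonly UTC, so convert local business hours accordingly. The sketch below applies new schedules through Helm; it assumes pauseCronSchedule and resumeCronSchedule are top-level values of the castai-hibernate chart, and the helper name set_hibernate_schedule is hypothetical.

```shell
# Hypothetical helper: update both schedules on an existing Helm release.
# Assumption: pauseCronSchedule / resumeCronSchedule are top-level chart values.
set_hibernate_schedule() {
  pause_schedule="$1"    # e.g. "0 22 * * 1-5"
  resume_schedule="$2"   # e.g. "0 7 * * 1-5"
  if [ -z "$pause_schedule" ] || [ -z "$resume_schedule" ]; then
    echo "usage: set_hibernate_schedule <pause-cron> <resume-cron>" >&2
    return 1
  fi
  # --reuse-values keeps the cloud and apiKey settings from the last release.
  # Note: commas inside a cron expression must be escaped as \, for --set-string.
  helm upgrade castai-hibernate castai-helm/castai-hibernate -n castai-agent \
    --reuse-values \
    --set-string "pauseCronSchedule=${pause_schedule}" \
    --set-string "resumeCronSchedule=${resume_schedule}"
}
```

Usage: set_hibernate_schedule "0 20 * * 1-5" "0 6 * * 1-5"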
- Upgrade hibernate
To upgrade this component to the latest version, run the following commands:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade castai-hibernate castai-helm/castai-hibernate --reuse-values -n castai-agent
- Helm values
The values for each CAST AI Helm chart are described in this GitHub repository.
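The values can also be inspected from the command line. The sketch below assumes the castai-helm repository has been added as shown earlier and that the release is named castai-hibernate in the castai-agent namespace; both helper names are hypothetical.

```shell
# Hypothetical helper: print the default values shipped with the chart.
show_hibernate_chart_values() {
  helm show values castai-helm/castai-hibernate
}

# Hypothetical helper: print the values currently set on the installed release.
show_hibernate_release_values() {
  helm get values castai-hibernate -n castai-agent
}
```

Usage: show_hibernate_chart_values to see what can be configured, then show_hibernate_release_values to see what is configured now.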