Pausing a cluster

CAST AI provides a set of CronJobs that pause and resume a Kubernetes cluster on a defined schedule. When the cluster is paused, CAST AI components keep running on a single designated node, while the rest of the cluster capacity is removed. When the cluster resumes, CAST AI uses its standard autoscaler capabilities to provision the most cost-efficient nodes for the pending pods.

To pause and resume a cluster, two CronJobs are run:

- Hibernate-pause Job

Disable the Unscheduled Pod Policy (to prevent the cluster from growing)
Prepare the hibernation node (the node that stays up to host essential components)
Mark essential Deployments with the hibernation toleration
Delete all other nodes (only the hibernation node keeps running)


- Hibernate-resume Job

Re-enable the Unscheduled Pod Policy so the cluster can scale back up to the required size
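To test either job without waiting for its schedule, you can start it by hand with the standard `kubectl create job --from=cronjob/...` command. This is a minimal sketch that assumes the CronJobs are named `hibernate-pause` and `hibernate-resume` in the `castai-agent` namespace; the helper name `run_hibernate_job` is hypothetical. Keep in mind that running the pause job deletes every node except the hibernation node.

```shell
# Hypothetical helper: trigger the pause or resume CronJob immediately.
# ASSUMPTION: CronJobs are named hibernate-pause / hibernate-resume in castai-agent.
run_hibernate_job() {
  action="$1"
  case "$action" in
    pause|resume) ;;
    *) echo "usage: run_hibernate_job pause|resume" >&2; return 1 ;;
  esac
  # Job names must be unique, so append the current Unix timestamp.
  kubectl create job --from="cronjob/hibernate-${action}" \
    "hibernate-${action}-manual-$(date +%s)" -n castai-agent
}
```

Usage: `run_hibernate_job resume` brings the cluster back outside the normal schedule.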


- Override default hibernate-node size

Set the HIBERNATE_NODE environment variable to override the default node size selection. Make sure the selected size is a valid instance type for your cloud provider.
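With the manifest install, one way to set the variable is `kubectl set env`, which works on CronJobs. This is a sketch under assumptions: this page does not say which CronJob reads HIBERNATE_NODE, so it sets the variable on both; the CronJob names, namespace, and helper name `set_hibernate_node` are assumptions. If you installed with Helm, a later `helm upgrade` may overwrite changes made this way.

```shell
# Hypothetical helper: override the hibernation node size on both CronJobs.
# ASSUMPTION: either CronJob may read HIBERNATE_NODE, so set it on both.
set_hibernate_node() {
  size="$1"
  if [ -z "$size" ]; then
    echo "usage: set_hibernate_node <instance-type>" >&2
    return 1
  fi
  for job in hibernate-pause hibernate-resume; do
    kubectl set env "cronjob/${job}" "HIBERNATE_NODE=${size}" -n castai-agent || return 1
  done
}
```

Usage: `set_hibernate_node <instance-type>`, passing an instance type that exists in your provider and region.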


- Install hibernate

Run this command to install the Hibernate CronJobs:

kubectl apply -f https://raw.githubusercontent.com/castai/hibernate/main/deploy.yaml
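After applying the manifest, you can check that both CronJobs exist and see their schedules. A sketch using standard `kubectl get` output flags; it assumes the CronJob names `hibernate-pause` and `hibernate-resume` in the `castai-agent` namespace, and the helper name is hypothetical.

```shell
# Hypothetical helper: confirm both hibernate CronJobs exist and show schedules.
# kubectl get exits non-zero if any named CronJob is missing.
verify_hibernate_install() {
  kubectl get cronjob hibernate-pause hibernate-resume -n castai-agent \
    -o custom-columns=NAME:.metadata.name,SCHEDULE:.spec.schedule,SUSPENDED:.spec.suspend,LAST_RUN:.status.lastScheduleTime
}
```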

Change API key

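Create an API token with Full Access permissions and base64-encode it (tr removes the line wrapping that GNU base64 adds to output longer than 76 characters):

echo -n "98349587234524jh523452435kj2h4k5h2k34j5h2kj34h5k23h5k2345jhk2" | base64 | tr -d '\n'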

Use this value to update the Secret:

apiVersion: v1
kind: Secret
metadata:
  name: castai-hibernate
  namespace: castai-agent
type: Opaque
data:
  API_KEY: >-
    CASTAI-API-KEY-REPLACE-ME-WITH-ABOVE==
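If you prefer not to paste the encoded value by hand, the sketch below renders the same Secret from a raw token and does the base64 step for you. `render_hibernate_secret` is a hypothetical helper; `tr -d '\n'` strips GNU base64's 76-column wrapping so the value stays on one line.

```shell
# Hypothetical helper: print the castai-hibernate Secret manifest for a raw token.
render_hibernate_secret() {
  token="$1"
  # printf avoids a trailing newline; tr removes base64 line wrapping.
  encoded="$(printf '%s' "$token" | base64 | tr -d '\n')"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: castai-hibernate
  namespace: castai-agent
type: Opaque
data:
  API_KEY: ${encoded}
EOF
}
```

Usage: `render_hibernate_secret "$CASTAI_API_TOKEN" | kubectl apply -f -`, where `CASTAI_API_TOKEN` is a hypothetical shell variable holding your raw token.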

Or, for convenience, use this one-liner:

kubectl get secret castai-hibernate -n castai-agent -o json | jq --arg API_KEY "$(echo -n 9834958-CASTAI-API-KEY-REPLACE-ME-5k2345jhk2 | base64 | tr -d '\n')" '.data["API_KEY"]=$API_KEY' | kubectl apply -f -
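An alternative that needs neither `jq` nor manual encoding is to let `kubectl create secret generic` build the object and pipe it into `kubectl apply`; kubectl base64-encodes `--from-literal` values itself. This sketch uses standard kubectl flags and assumes the Secret name and namespace shown above; the helper name is hypothetical.

```shell
# Hypothetical helper: replace the API key in the castai-hibernate Secret.
# Pass the raw token; kubectl handles the base64 encoding.
update_hibernate_api_key() {
  token="$1"
  [ -n "$token" ] || { echo "usage: update_hibernate_api_key <token>" >&2; return 1; }
  kubectl create secret generic castai-hibernate -n castai-agent \
    --from-literal=API_KEY="$token" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```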

Set the Cloud environment variable

The "Cloud" environment variable is set to AKS by default. For EKS or GKE clusters, change it in both CronJobs to the matching value (EKS, GKE, or AKS).
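With the manifest install, the variable can be changed with `kubectl set env`. A sketch under assumptions: it assumes the variable is literally named `CLOUD` (the text above calls it "Cloud"; check the exact name with `kubectl get cronjob hibernate-pause -n castai-agent -o yaml` first) and uses the CronJob names from this page. The helper name is hypothetical.

```shell
# Hypothetical helper: set the cloud provider on both hibernate CronJobs.
# ASSUMPTION: the env var is named CLOUD; verify it in the CronJob spec first.
set_hibernate_cloud() {
  cloud="$1"
  case "$cloud" in
    EKS|GKE|AKS) ;;
    *) echo "cloud must be one of EKS, GKE, AKS" >&2; return 1 ;;
  esac
  for job in hibernate-pause hibernate-resume; do
    kubectl set env "cronjob/${job}" "CLOUD=${cloud}" -n castai-agent || return 1
  done
}
```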




- Install with Helm

Add the CAST AI Helm chart repository:

helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update

Install hibernate

Now install it, replacing the cloud and apiKey values with your own:

helm upgrade -i castai-hibernate castai-helm/castai-hibernate -n castai-agent --set cloud=<AKS|EKS|GKE> --set apiKey=<CASTAI-API-KEY-BASE64-ENCODED>
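The install command can be wrapped so that the cloud value is checked and the API key is base64-encoded on one line, as the apiKey placeholder asks. This is a sketch; `install_hibernate` is a hypothetical helper and the chart and release names are the ones used above.

```shell
# Hypothetical helper: install or upgrade the chart with a validated cloud value
# and a base64-encoded API key.
install_hibernate() {
  cloud="$1"
  token="$2"
  case "$cloud" in
    EKS|GKE|AKS) ;;
    *) echo "usage: install_hibernate EKS|GKE|AKS <token>" >&2; return 1 ;;
  esac
  [ -n "$token" ] || { echo "missing API token" >&2; return 1; }
  # Encode without a trailing newline or line wrapping.
  api_key="$(printf '%s' "$token" | base64 | tr -d '\n')"
  helm upgrade -i castai-hibernate castai-helm/castai-hibernate -n castai-agent \
    --set cloud="$cloud" --set apiKey="$api_key"
}
```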

Schedule hibernate CronJobs

Update the hibernate-pause and hibernate-resume CronJob schedules to match your business needs.

Default examples:

# Pause at 22:00, Monday to Friday (update to match your business needs).
pauseCronSchedule: "0 22 * * 1-5"

# Resume at 07:00, Monday to Friday (update to match your business needs).
resumeCronSchedule: "0 7 * * 1-5"
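Each schedule is a standard five-field cron expression (minute, hour, day of month, month, day of week), so the defaults pause at 22:00 on weekdays and resume at 07:00 on weekdays, which leaves the cluster paused over the weekend. Kubernetes evaluates CronJob schedules in the kube-controller-manager's time zone unless a time zone is set on the CronJob; on managed control planes this is usually UTC. The sketch below changes both schedules on an existing release with `--reuse-values`; the helper name is hypothetical.

```shell
# Hypothetical helper: change both schedules on an existing Helm release.
set_hibernate_schedules() {
  pause="$1"
  resume="$2"
  for expr in "$pause" "$resume"; do
    # A cron expression needs exactly five whitespace-separated fields.
    n="$(printf '%s\n' "$expr" | awk '{print NF}')"
    if [ "$n" -ne 5 ]; then
      echo "not a 5-field cron expression: $expr" >&2
      return 1
    fi
  done
  helm upgrade castai-hibernate castai-helm/castai-hibernate -n castai-agent \
    --reuse-values \
    --set pauseCronSchedule="$pause" \
    --set resumeCronSchedule="$resume"
}
```

Usage: `set_hibernate_schedules "0 20 * * 1-5" "0 6 * * 1-5"` moves the window two hours earlier.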



- Upgrade hibernate

To upgrade this component to the latest version, run the following commands:

helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade castai-hibernate castai-helm/castai-hibernate --reuse-values -n castai-agent
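To confirm that the upgrade actually changed the chart version, you can list the release before and after. This sketch wraps the commands above; the helper name is hypothetical.

```shell
# Hypothetical helper: upgrade the release and show its chart version
# before and after the upgrade.
upgrade_hibernate() {
  helm list -n castai-agent --filter '^castai-hibernate$'
  helm repo add castai-helm https://castai.github.io/helm-charts
  helm repo update
  helm upgrade castai-hibernate castai-helm/castai-hibernate --reuse-values -n castai-agent || return 1
  helm list -n castai-agent --filter '^castai-hibernate$'
}
```

If the new version misbehaves, `helm rollback castai-hibernate -n castai-agent` returns the release to its previous revision.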



- Helm values

The values for each CAST AI Helm chart are described in the CAST AI Helm charts repository on GitHub.
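You can also inspect the values locally with standard Helm commands: `helm show values` prints the chart's defaults, and `helm get values` prints the values you supplied to the installed release (add `--all` to include computed defaults). The helper name below is hypothetical.

```shell
# Hypothetical helper: print chart defaults, then the installed release's values.
show_hibernate_values() {
  helm show values castai-helm/castai-hibernate
  helm get values castai-hibernate -n castai-agent
}
```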

