What is the impact of CAST AI on availability?

The CAST AI autoscaler continuously monitors the cluster and upscales or downscales it based on the real-time demand.

Does the CAST AI balancing function call the Kubernetes autoscaler function?

No, CAST AI has its own autoscaler, which replaces the default Kubernetes autoscaler.

When a cluster is connected to CAST AI, does it control autoscaling for the entire cluster or only for select node pools? For example, if I create a new node pool, is there a way to keep it out of the CAST AI autoscaler?

CAST AI will look for optimization opportunities everywhere by default. However, if you'd like to exclude certain nodes or node pools, you can do that easily by adding this label to them: "true

Is there a way to provision several nodes at the same time?

CAST AI provisions compute instances based on workload demands, which may result in several nodes being provisioned by our autoscaler automatically. The process of node provisioning takes 1 to 2 minutes.

Should we remove all the existing node groups if we are going for CAST AI's full cluster autoscaler?

CAST AI supports node groups. However, physical isolation may create waste, so if the existing node groups have no business value, we recommend removing them.

Should we remove the AWS autoscaler deployment?

Example scenario:

In our shared-staging cluster, we still have a deployment called aws-cluster-autoscaler-shared-staging-aws-cluster-autoscaler. If it is going to be overridden or ignored, I'd rather leave it there in case we need to roll back the CAST AI optimization deployment.

Please scale it down to zero replicas and let CAST AI take over autoscaling. You can keep it running if you prefer, but then both CAST AI and the Cluster Autoscaler will add capacity (the extra capacity will be removed later on).
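Scaling the existing deployment to zero replicas leaves its manifest in the cluster, so it can be scaled back up if you need to roll back. A minimal sketch, assuming the deployment lives in the `kube-system` namespace (an assumption; check where yours is installed). The helper name `scale_cluster_autoscaler` is hypothetical:

```shell
# Hypothetical helper: set the replica count of a Cluster Autoscaler deployment.
# Usage: scale_cluster_autoscaler <namespace> <deployment> <replicas>
scale_cluster_autoscaler() {
  kubectl -n "$1" scale deployment "$2" --replicas="$3"
}
```

For the scenario above, `scale_cluster_autoscaler kube-system aws-cluster-autoscaler-shared-staging-aws-cluster-autoscaler 0` hands autoscaling over to CAST AI, and the same call with `1` restores the previous setup.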

Can you please share some guidance on cluster headroom? I would like to add some buffer room so that pods have a place to run when nodes go down.

We recommend using placeholder pods: sleeping pods with the same resource requests as your critical pods. If you'd like to have room for 3 pods, set up 3 sleeping pods with a low-priority PriorityClass (a negative value).

That way, your high-priority workloads will preempt the low-priority pods during a spot interruption event. The evicted placeholder pods will then trigger the autoscaler, and the capacity will be replaced.

Under what conditions will the CAST AI controller stop scaling up new nodes? Can we see such an event in the CAST AI controller logs?

CAST AI will continue scaling as long as the autoscaler is active. It also replaces unhealthy nodes:

  • If a newly added node remains in a non-ready state for 15 minutes, it will be replaced.
  • If the kubelet on a node fails to send a heartbeat for 15 minutes, indicating a dead node, it will be replaced.

Currently, the audit log is the right place for alerting. You can also use the audit events API.

What is the implication to our service if CAST AI is down?

If CAST AI experiences an issue, this wouldn't impact anything running in the cluster. However, it would prevent CAST AI from scaling new instances within the cluster.

You can set up the Cluster Autoscaler as a delayed secondary autoscaler if desired. For example, if CAST AI doesn't schedule a pod within 3 minutes, the Cluster Autoscaler would step in as a fallback. CAST AI is highly available, and so far none of our customers have had to use this fallback method.
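One way to get such a delay is the upstream Cluster Autoscaler's `--new-pod-scale-up-delay` flag, which makes it ignore pending pods younger than the given age. A minimal sketch, assuming the flags are listed in the `command` array of the first container of the deployment (if yours uses `args` instead, adjust the path). The helper name and its namespace and deployment arguments are placeholders:

```shell
# Hypothetical helper: append --new-pod-scale-up-delay to a Cluster Autoscaler deployment.
# Usage: set_ca_scale_up_delay <namespace> <deployment> [delay]
set_ca_scale_up_delay() {
  ns="$1"
  deploy="$2"
  delay="${3:-3m}"  # defaults to the 3-minute example from the answer
  # JSON Patch: the trailing "/-" appends the flag to the container's command array.
  # Assumption: the flags live under containers[0].command, not containers[0].args.
  patch="[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/command/-\",\"value\":\"--new-pod-scale-up-delay=${delay}\"}]"
  kubectl -n "$ns" patch deployment "$deploy" --type=json -p "$patch"
}
```

Calling `set_ca_scale_up_delay <namespace> <deployment>` makes the Cluster Autoscaler wait 3 minutes before it reacts to a pending pod, which gives CAST AI the first chance to add capacity.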

What is the logic behind the 70/30 CPU/MEM cost split?

The CPU/MEM split is based on the pricing of different tiers of instances that have the same CPU but different amounts of memory. For instance, given shapes with 1:2, 1:4, and 1:8 CPU-to-memory ratios, we can work out the cost of each additional GiB of memory and decide what the correct allocation is across different shapes.
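As a rough illustration of that reasoning, the sketch below takes two shapes with the same vCPU count but different memory and solves for the per-GiB and per-vCPU price. The hourly prices are made-up numbers chosen for the example, not real provider pricing, and the function name `cpu_mem_split` is hypothetical:

```shell
# Hypothetical worked example: derive per-vCPU and per-GiB prices from two
# instance shapes that share a vCPU count but differ in memory.
# Args: vcpus  gib_small price_small  gib_large price_large
cpu_mem_split() {
  awk -v n="$1" -v g1="$2" -v p1="$3" -v g2="$4" -v p2="$5" 'BEGIN {
    per_gib   = (p2 - p1) / (g2 - g1)    # the extra memory explains the price gap
    per_vcpu  = (p1 - g1 * per_gib) / n  # the remainder is attributed to CPU
    cpu_share = (n * per_vcpu) / p2      # CPU share of the larger shape price
    printf "per_gib=%.4f per_vcpu=%.4f cpu_share=%.2f\n", per_gib, per_vcpu, cpu_share
  }'
}

# Made-up prices: 4 vCPU with 8 GiB at $0.136/h and 4 vCPU with 16 GiB at $0.160/h.
cpu_mem_split 4 8 0.136 16 0.160
```

With these made-up inputs, memory comes out at $0.003 per GiB-hour and CPU at $0.028 per vCPU-hour, so CPU accounts for 70% of the 1:4 shape's price and memory for the remaining 30%.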

How are costs counted for EC2? Do they include only instances, or traffic, volumes, and other cost factors as well?

CAST AI calculates costs from instance pricing only (on-demand or spot), based on the provider's public pricing, which is updated hourly on our side.

Does CAST AI support GCP Anthos or other on-prem solutions for Kubernetes?

CAST AI doesn't support such solutions at the moment.

Does CAST AI help to reduce costs for GKE?

Yes, CAST AI helps reduce costs in your Google Kubernetes Engine clusters. The amount of savings will vary based on the cluster configuration you choose. We have consistently helped our customers save between 30% and 50% on their GKE costs.

Can I set the autoscaler to give nodes a buffer zone so when resources hit 80% in either CPU or RAM, CAST AI can spin up a new node?

We typically recommend running a dummy pod for this kind of scenario. You can create a deployment or pod with the desired resource requests and give it a low-priority class to create room in the cluster. The low-priority class will allow the dummy pod to be preempted when you introduce a new workload that needs to be scheduled.

Here's an example you can adjust or use as a guide:

Create priority classes:

cat <<EOF | kubectl apply -f -
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: -1
globalDefault: false
description: "This priority class should be used for dummy service pods only."
EOF

Low-priority workload example:

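cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-app-dummy
  namespace: default
  labels:
    app: overprovisioning
spec:
  replicas: 5
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9  # assumed image; the original image line is missing
        resources:
          requests:
            cpu: "4"
            memory: "5Gi"
            ephemeral-storage: "150Gi"
      priorityClassName: low-priority
      # The label key for nodeSelector and tolerations is missing from the text; "" marks the gap.
      nodeSelector:
        "": "castai-storage-optimized"
      tolerations:
      - key: ""
        value: "castai-storage-optimized"
        operator: "Equal"
        effect: "NoSchedule"
EOF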

Can I make sure that there are always at least two on-demand instances (regardless of type) when autoscaling with CAST AI?

This use case makes sense if you have some essential services that you'd like to host on on-demand instances instead of spot instances to eliminate the chance of failure.

There are a few ways in which you can accomplish this in CAST AI:

  1. You can do it via the mutating webhook (for spot-only clusters).
  2. You could also set up a dedicated node template with lifecycle = on-demand and pin the workload to this node template via nodeSelector.
  3. Finally, if you'd like to keep two on-demand nodes running and prevent them from being bin-packed, you can launch a busybox workload with 2 replicas and a podAntiAffinity rule against itself. Place it on your on-demand node template, with the minimum node size set to the minimum on-demand capacity you want. For instance, if you need two 4-core on-demand nodes, set the minimum CPU to 4 cores and the busybox placeholder to 2 replicas:
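apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - busybox
              topologyKey: ""  # the topology key is missing from the text; "" marks the gap
      containers:
        - name: busybox
          image: busybox
          command:
            - "sleep"
            - "3600"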

Does CAST AI support the new sidecar containers feature?

This feature is not supported by CAST AI.