Rebalancing

What's the best practice with rebalancing clusters - should we do it now and then, or on a regular basis to keep costs down?

Typically, we advise a rebalance after the initial onboarding process to immediately achieve cost savings. Then periodic rebalancing is recommended, as it helps to reduce some of the fragmentation that occurs naturally due to autoscaling.


What is minNodes in the context of rebalancing?

The minimum count of worker nodes to be part of the rebalancing plan.
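It is configured in the node template's rebalancing configuration, for example (field names as used elsewhere in this document; the value 2 is illustrative):

```json
"rebalancingConfig": {
  "minNodes": 2
}
```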


Rebalancing is an operation performed at the node level, meaning it can only be carried out on a node if all the workloads on that node can be scheduled on the new node without any issues. However, if a workload has an affinity to a custom label, it will be considered problematic for rebalancing since the CAST AI autoscaler isn't aware of this custom node selector.

If you still want to proceed with rebalancing, you can add the following label to your workloads:

autoscaling.cast.ai/disposable="true"

This label will mark them as disposable, indicating that they can be safely relocated during the rebalancing process. For reference, this is what an affinity to a custom label looks like:

spec:  
  affinity:  
    nodeAffinity:  
      requiredDuringSchedulingIgnoredDuringExecution:  
        nodeSelectorTerms:  
          - matchExpressions:  
              - key: environment  
                operator: In  
                values:  
                  - production
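To mark such a workload as disposable, the label above goes on the pod template. A minimal sketch (the Deployment and app names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app   # illustrative name
spec:
  template:
    metadata:
      labels:
        app: my-app
        autoscaling.cast.ai/disposable: "true"
```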

If you're using a custom label, you have to let CAST AI know about it via a node template. Once you create the node template, CAST AI will support the custom labels it defines.

Learn more about it on this page.


Let's say I have two pods running on one spot node in the cluster. The CPU utilization is 80% and memory 80% (the node isn't underutilized). The scheduled rebalance mechanism will check to see if this spot instance type in this zone is the cheapest one - and if not, will it redeploy it on a new spot instance?

If savings match or exceed the target savings value, CAST AI will rebalance if it can find cheaper options.

You can tell it to only run if it will achieve NN% savings. For instance, suppose you have an 80% utilized node and a cost savings target of 10%. If switching to a different spot instance would reduce costs by 12%, the rebalance will run. If the new node would only be 3% cheaper, it won't run.


Why didn't CAST AI support Pod Disruption Budget on this pod?

Scenario:

The customer ran the rebalance operation - everything was working fine until the rebalance was stuck on the last node.

The team checked and saw that there was a pod with a PDB and 0 allowed disruptions. The rebalance was pending for a few minutes, and then the node was deleted, removing the pod that had a PDB.

Solution:

CAST AI honors PDBs for 20 minutes in the draining phase. After this time passes, it assumes these are invalid PDBs and force drains the node.

If you don't want that to happen, consider adding the removal-disabled annotation to the workload. That node will then be skipped completely.
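Assuming the annotation key follows the same autoscaling.cast.ai/ prefix as the disposable label (verify the exact key against the CAST AI documentation), it would be added to the workload's pod template like this:

```yaml
metadata:
  annotations:
    autoscaling.cast.ai/removal-disabled: "true"  # assumed key; confirm in CAST AI docs
```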


We have deployed a Spark application on production, it’s running now, nodes also scaled up, but we can see that it's not ready for rebalance. Why is that?

The customer gets the following error:

issues:
  - kind: RemovalDisabled
    description: annotated with removal disabled

If the node can be safely rebalanced, they can temporarily add autoscaling.cast.ai/disposable="true" and remove the label afterwards, or mark the node to be ignored.


Does CAST AI always drain the node first, or does it sometimes remove nodes without a graceful shutdown? I ask this because I see logs stating "Node was drained Initiated by: Rebalancer," but many times I also see "Node was deleted initiated by: autoscaler," which doesn't mention draining the node.

If the node was empty already, it would just be deleted. If the node has pods on it and is selected for eviction, it will be gracefully drained first. Any scenario where there are pods on the node will initiate a drain if the node is to be deleted.

The autoscaler only deletes empty nodes. A rebalance first creates the new nodes, and then drains and deletes the old nodes.


How can we specify minNodes to have high availability?

You can specify minNodes in the rebalancing configuration section when creating or updating a node template via the API.

After that, the rebalance should respect the minNodes count for that node template.

  "rebalancingConfig":{
     "minNodes":0
  },

Currently, this is available through API/Terraform only.


We are trying to run a rebalance with the minNodes count set to 3, but it seems this is not being respected. Why is that?

This appears to be an issue on our side related to the migration of this field to the node template. What is happening now is that the value passed in the UI/API is not considered when creating the green node setup; what is considered instead is the RebalanceMinNodes field of the node template.

This field is not exposed in the UI but can be changed through the API:

"rebalancingConfig": {  
    "minNodes": 0  
}

This is how you create a rebalancing plan with a default node template where RebalanceMinNodes=0 (the default value). Next, update the node template through the API and set RebalanceMinNodes=3. Finally, try rebalancing again; it should generate the expected number of green nodes.


Does CAST AI support rebalancing on spread key: kubernetes.io/hostname?

issues:
  - kind: "TopologySpread"
    description: "Unsupported topology spread key: kubernetes.io/hostname"

This is pretty much impossible to satisfy if you use a skew of 1 and have 5 nodes but 3 of them are 100% full: if a deployment scales to 10 replicas, you would need to create 8 new nodes with 1 pod each to satisfy the skew.

We recommend switching to podAntiAffinity. Playing around with replica count might help with rebalancing as well. Another approach is using soft affinity.
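A soft (preferred) pod anti-affinity on the hostname key is a common substitute for the topology spread constraint above. A sketch, with the app label selector being illustrative:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app   # illustrative selector
          topologyKey: kubernetes.io/hostname
```

Because this is a preferred rule rather than a required one, the scheduler spreads pods across nodes when it can but still schedules them when it can't, which leaves the rebalancer room to consolidate.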


We are trying to rebalance our development cluster, but we are seeing a not-ready status with this message. Why is that?

issues:
  - kind: PodNodeRequirements
    description: PersistentVolume "pvc-ad3c1269-8772-479f-b3a0-bf380309e67a" NodeAffinity topology label "topology.gke.io/zone" declares unsupported az "asia-south1-b"

CAST AI takes zones (the locations field in the API, i.e. "default node zones") from the cluster object, not the node pool. To add new zones to a zonal cluster, enable them on the cluster object and trigger a cluster reconcile; you should then see nodes in the specified zones.


Is it mandatory to have the default node template enabled? Rebalance is showing the below error:

issues: 
  - kind: PreGroupScalability
    description: Autoscaling for Node Template "default-by-castai" is disabled.

This is a new problematic pod kind. If a node has pods that use the default node template (DNT) and DNT is turned off, then the node will be considered problematic. It's mandatory to have the node template turned on to be able to create nodes for pods that use that template. This has always been the case with the rebalance feature, but it's now explicitly exposed via problematic pods.


I don't want to use the default node template and have all the workloads qualified for other node templates, what can I do?

In this case, you won't need to enable the default node template (DNT). As long as there aren't any DNT workloads, they won't run into this problematic pod kind.


Why am I seeing podAntiAffinity (zone) restricting nodes to rebalance?

Unfortunately, CAST AI doesn't support pod anti-affinity for rebalancing yet.


Running a rebalance on dev/staging shows no improvements or changes, but the Savings tab shows a configuration comparison with a much improved instance fleet. Why is that?

The Savings report doesn't take node groups or templates into account. In this case, there are 3 nodes in default and 2 in system-reserved; the rebalance feature cannot combine the two, and there may be pod anti-affinity forcing the additional nodes.


I noticed that the total pod count before rebalance is not the same as the total count after rebalance. Is this normal?

Yes, it is possible. Each node runs some DaemonSet pods, so having fewer nodes means having fewer pods: for example, going from 5 nodes to 3 removes two copies of every DaemonSet. Also, the workload could have changed during rebalancing.


When CAST AI performs a rebalance for the first time, would it go and set the min as well as desired sizes for the available ASGs to 0?

Yes, CAST AI sets the minimum and desired sizes of the available ASGs to 0.