Preparation

Rebalancing replaces your suboptimal nodes with new, optimal ones, but only nodes that aren't running problematic workloads are replaced. To get the most value out of a rebalancing operation, reduce the number of problematic pods as much as possible.

Problematic workloads

Problematic workloads are pods that use unsupported node selector criteria. For example, pods that declare a required node affinity on a custom label are considered problematic because the Rebalancer is unaware of the custom label:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: environment
                operator: In
                values:
                  - production

For a full list of supported node selector criteria, visit the Configure pod placement by topology section.
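For contrast, the same requirement expressed through a well-known Kubernetes label, such as the standard zone label, does not depend on any custom labeling. This is a sketch: the zone value is illustrative, and you should confirm zone-based affinity against the supported criteria referenced above.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # topology.kubernetes.io/zone is a well-known label set by the
              # cloud provider, not a custom label.
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - eu-central-1a  # illustrative zone
```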

Go to the Rebalancer page of your cluster to find the workloads preventing some nodes from being rebalanced, and check the Status column.

Resolving the problematic pod example

For the Rebalancer to take the custom labels into account, a node template must map to those custom labels.
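Assuming the node template applies the custom label environment=production from the affinity example above, you can verify that newly provisioned nodes carry it (a sketch; the selector mirrors the example label):

```shell
# List nodes that carry the custom label the pod's affinity requires.
kubectl get nodes -l environment=production
```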

Ignoring problematic workloads

If a workload can't be successfully rebalanced, it gets a Not ready label in the Rebalancer view. You can mark it as disposable to ignore all potential issues detected by rebalancing. To mark a workload as disposable, apply the following label or annotation:

kubectl label pod <POD_NAME> autoscaling.cast.ai/disposable="true"
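The command above applies a label; since an annotation works as well, the equivalent annotate form is (a sketch; <POD_NAME> is a placeholder for your pod):

```shell
# Same effect as the label, expressed as an annotation.
kubectl annotate pod <POD_NAME> autoscaling.cast.ai/disposable="true"
```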

Minimize disruption

During rebalancing, existing nodes are drained and workloads are migrated to the new nodes, so the process might be disruptive to some workloads. Rebalancing aims to minimize disruption by first creating new nodes and then draining the old ones.

You might need to take special care if workloads do not tolerate interruptions. You have multiple options:

  1. Execute the rebalancing during maintenance hours. This approach lets you achieve the most cost savings while avoiding disruption during peak times.
  2. Disable rebalancing for certain nodes. You can do this by labeling or annotating the nodes running critical pods, or by annotating the critical pods themselves. Note that Evictor will not rebalance or evict annotated nodes, which can reduce savings. It is recommended to annotate a few nodes dedicated to critical workloads rather than annotating many pods, which could be scheduled across multiple nodes and prevent their optimization.
| Name | Value | Type | Effect |
| --- | --- | --- | --- |
| autoscaling.cast.ai/removal-disabled | "true" | Annotation on a Pod; can be either a label or an annotation on a Node | Rebalancer or Evictor won't drain a Node with this annotation or a Node running a Pod with this annotation. |
| cluster-autoscaler.kubernetes.io/safe-to-evict | "false" | Annotation on a Pod | Rebalancer or Evictor won't drain a Node running a Pod with this annotation. |
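The safe-to-evict annotation from the table can be applied to a pod directly (a sketch; <POD_NAME> is a placeholder for your pod):

```shell
# Prevent Rebalancer or Evictor from draining the node running this pod.
kubectl annotate pod <POD_NAME> cluster-autoscaler.kubernetes.io/safe-to-evict="false"
```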

Example commands for finding and labeling a node running critical workloads:

Find the node that hosts your critical pod:

kubectl get pod critical-deployment-5ddb8f8995-94wdb -o jsonpath='{.spec.nodeName}'

Label the node that is running your critical pod:

kubectl label node ip-10-0-101-156.eu-central-1.compute.internal autoscaling.cast.ai/removal-disabled=true
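The two steps can also be combined by substituting the pod's node name directly (a sketch; the pod name is the illustrative one from above):

```shell
# Look up the node hosting the critical pod and label it in one step.
NODE="$(kubectl get pod critical-deployment-5ddb8f8995-94wdb -o jsonpath='{.spec.nodeName}')"
kubectl label node "$NODE" autoscaling.cast.ai/removal-disabled=true
```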

📘 Note

When rebalancing in Aggressive mode, the autoscaling.cast.ai/removal-disabled annotation is honored when applied to a node but is overridden if set on a workload.