Rebalancing replaces your suboptimal nodes with new ones. However, that is only done on nodes which aren't running any problematic workloads. To increase the value of the rebalancing operation, you should decrease the number of problematic pods as much as possible.

Problematic workloads

Problematic workloads are pods which have unsupported node selector criteria. For example, pods which have a declared required node affinity on a custom label are considered problematic:

          - matchExpressions:
              - key: environment
                operator: In
                  - production

For a full list of supported node selector criteria visit the Configure pod placement by topology section.

Go to the Rebalance page of your cluster to find the workloads which are preventing some nodes from being rebalanced. Check the Readiness column:


Minimize disruption

Existing nodes will be drained and the workloads will be migrated to the new nodes during the rebalancing. This mean that this process might be disruptive to your workloads. Rebalancing aims to minimize disruption by first creating new nodes and then draining the old ones.

If your workloads do not tolerate interruptions, you might need to take special care. You have multiple options:

  1. Execute the rebalancing during maintenance hours. This would help you achieve the most cost savings.

  2. Disable rebalancing for certain nodes. This can be achieved by labelling or annotating the nodes which are running the critical pods or by annotating the critical pods. Please note, that annotated nodes will not be rebalanced and evicted by Evictor and this can reduce savings. It is recommended to annotate some nodes dedicated for critical workloads, rather than annotating multiple pods, which could be scheduled on multiple nodes and prevent their optimization.

    NameValueTypeEffect"true"Annotationon a Pod, but can be bothlabelorannotationon a NodeRebalancer or Evictor won't drain a Node with this annotation or a Node running a Pod with this annotation.

    Example commands for finding and labelling a node running critical workloads:

    Find the node that hosts your critical pod:

    kubectl get pod -ojsonpath='{.spec.nodeName}' critical-deployment-5ddb8f8995-94wdb

    Label the node that is running your critical pod:

    kubectl label node