Scheduled rebalancing

How it works

Scheduled rebalancing runs a full or partial rebalancing process according to a user-defined schedule, scope, and trigger, which makes it possible to automate a range of use cases. Most commonly, scheduled rebalancing is used in the following scenarios:

  • identify the most expensive spot instances and replace them with cheaper alternatives,
  • perform full rebalancing of clusters on weekends,
  • roll old nodes,
  • periodically target and replace nodes with specific labels.

A rebalancing schedule is an organization-wide object that can be assigned to one or more clusters. Once a rebalancing schedule is triggered, it creates a rebalancing plan scoped by the parameters provided in the node selection preferences, while problematic workloads are excluded automatically.

To set up scheduled rebalancing, navigate to the organization-level Optimization menu, or open the Rebalancer view within the cluster.

Setup

There are two main components in the scheduled rebalancing functionality:

  • Rebalancing job - a job that triggers a rebalancing schedule on an associated cluster.
  • Rebalancing schedule - an organization-wide object that can be reused across multiple rebalancing jobs.
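The relationship between the two components can be pictured with a small data model. This is only an illustrative sketch, not the CAST AI API: the class names, field names, cluster IDs, and the assumption that the crontab expression uses the standard five-field format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RebalancingSchedule:
    """Organization-wide schedule (hypothetical model, not the real API)."""
    name: str
    cron: str      # assumed to be a standard five-field crontab expression
    timezone: str  # assumed to be a timezone name such as "Europe/Vilnius"

    def __post_init__(self):
        # Loose sanity check only: counts fields, does not validate their values.
        if len(self.cron.split()) != 5:
            raise ValueError(f"expected 5 crontab fields, got: {self.cron!r}")


@dataclass(frozen=True)
class RebalancingJob:
    """Binds one schedule to one cluster (hypothetical model)."""
    cluster_id: str
    schedule: RebalancingSchedule


# One schedule, "02:00 every Saturday", reused by jobs on two clusters.
weekend = RebalancingSchedule(
    name="weekend-full-rebalance",
    cron="0 2 * * 6",
    timezone="Europe/Vilnius",
)
jobs = [RebalancingJob(cluster_id=c, schedule=weekend)
        for c in ("cluster-a", "cluster-b")]
```

Keeping the schedule separate from the job means a change to the schedule applies to every cluster that references it.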

A rebalancing schedule consists of four parts:

  • Schedule - describes when to run the rebalancing (periodically, within a maintenance window, etc.).
  • Node selection preferences - rules for picking specific nodes, how many nodes to target, and other similar rules that mimic the decisions made when rebalancing a cluster manually.
  • Trigger requirements - decision rules for executing the rebalancing plan, such as the “Savings threshold”, which determine whether the generated plan should be executed.
  • Execution safeguard - stops rebalancing before the original nodes are drained if CAST AI finds that the user-specified minimum level of savings can't be achieved. This additional layer of protection is needed because the planned nodes might be temporarily unavailable in the cloud provider's inventory during rebalancing.
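The two savings checks described above happen at different points in a run, which a short sketch may make clearer. Everything here is a simplified assumption rather than CAST AI's implementation: the names `Schedule`, `Outcome`, and `run_rebalancing` are hypothetical, and savings are treated as plain numbers because the source does not state their units.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SKIPPED = "skipped"      # plan generated but never executed
    ABORTED = "aborted"      # new nodes deleted, original nodes left in place
    COMPLETED = "completed"  # original nodes drained and replaced


@dataclass
class Schedule:
    savings_threshold: float           # trigger requirement
    guaranteed_minimum_savings: float  # execution safeguard


def run_rebalancing(schedule: Schedule,
                    projected_savings: float,
                    achievable_savings: float) -> Outcome:
    """Hypothetical flow: `projected_savings` comes from the generated plan,
    `achievable_savings` is recalculated after the node creation phase, when
    some planned nodes may have turned out to be unavailable."""
    # Trigger requirement: do not execute a plan that projects too little.
    if projected_savings < schedule.savings_threshold:
        return Outcome.SKIPPED

    # (Node creation phase would happen here.)

    # Execution safeguard: checked before any original node is drained.
    if achievable_savings < schedule.guaranteed_minimum_savings:
        return Outcome.ABORTED

    return Outcome.COMPLETED


schedule = Schedule(savings_threshold=20.0, guaranteed_minimum_savings=15.0)
result = run_rebalancing(schedule, projected_savings=25.0, achievable_savings=10.0)
```

In this example the plan passes the trigger requirement, but the savings recalculated after node creation fall below the guaranteed minimum, so the run stops before draining.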

Settings

The following settings can be adjusted when setting up the scheduled rebalancing:

Setting | Value
Node life cycle | Spot, On-demand, or Any.
Target using labels | Key-value pairs provided as NodeSelector terms. If multiple terms are provided, they are combined using AND logic, i.e., only nodes that satisfy all listed selector terms are targeted.
Minimum node age | Amount of time that must pass since node creation before the node can be considered for rebalancing. '0' means a node of any age can be considered.
Evict nodes gracefully | Defines whether nodes that fail to drain within the predefined 20-minute timeout are kept with the rebalancing.cast.ai/status=drain-failed annotation instead of being forcefully drained.
Maximum batch size | Maximum number of nodes that will be selected for rebalancing. '0' means that all nodes in the cluster can be selected.
Savings threshold | Minimum projected savings to be achieved. Rebalancing is not executed unless the plan projects savings equal to or higher than the specified value.
Execution time | Set by providing a timezone and a crontab expression.
Guaranteed minimum savings | Minimum acceptable value of savings. The rebalancing plan proceeds beyond the node creation phase only if the guaranteed savings are equal to or higher than the specified value. If the minimum savings can't be achieved, the newly created nodes are deleted and rebalancing is aborted.
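To illustrate how the node selection settings combine, here is a minimal sketch of a filter that applies node life cycle, label terms (AND logic), minimum node age, and maximum batch size. It is an assumption-laden illustration, not CAST AI's selection algorithm: the `Node` class and `select_nodes` function are hypothetical, and when the batch size limits the result the sketch simply keeps input order, since the source does not say how candidates are prioritized.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Node:
    name: str
    lifecycle: str        # "spot" or "on-demand"
    created_at: datetime
    labels: dict = field(default_factory=dict)


def select_nodes(nodes, now, lifecycle="any", selector=None,
                 min_age=timedelta(0), max_batch_size=0):
    """Return the nodes eligible for rebalancing under the given settings."""
    selector = selector or {}
    selected = []
    for node in nodes:
        # Node life cycle: "any" disables this filter.
        if lifecycle != "any" and node.lifecycle != lifecycle:
            continue
        # Target using labels: every key-value term must match (AND logic).
        if any(node.labels.get(k) != v for k, v in selector.items()):
            continue
        # Minimum node age: a zero duration lets nodes of any age through.
        if now - node.created_at < min_age:
            continue
        selected.append(node)
    # Maximum batch size: 0 means no limit.
    if max_batch_size > 0:
        selected = selected[:max_batch_size]
    return selected


now = datetime(2024, 1, 10)
prod = {"team": "data", "env": "prod"}
nodes = [
    Node("a", "spot", datetime(2024, 1, 1), dict(prod)),
    Node("b", "spot", datetime(2024, 1, 9, 12), dict(prod)),   # only 12 hours old
    Node("c", "on-demand", datetime(2024, 1, 1), dict(prod)),  # wrong life cycle
    Node("d", "spot", datetime(2024, 1, 1), {"team": "data"}), # missing env label
    Node("e", "spot", datetime(2024, 1, 2), dict(prod)),
]
picked = select_nodes(nodes, now, lifecycle="spot", selector=prod,
                      min_age=timedelta(days=1))
```

With these inputs only nodes "a" and "e" satisfy every rule. Adding `max_batch_size=1` would narrow the result to "a".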