Scheduled rebalancing
How it works
Scheduled rebalancing runs a full or partial rebalancing on a user-defined schedule, scope, and trigger, so you can automate a range of use cases. It is most commonly used to:
- identify the most expensive spot instances and replace them with cheaper alternatives,
- perform full rebalancing of clusters on weekends,
- roll old nodes,
- periodically target and replace nodes with specific labels.
A rebalancing schedule is an organization-wide object that can be assigned to one or more clusters. Once a rebalancing schedule is triggered, it creates a rebalancing plan scoped by the parameters in its node selection preferences; problematic workloads are excluded automatically.
To set up scheduled rebalancing, open the Optimization menu at the organization level, or go to the Rebalancer view within a cluster.
Setup
Scheduled rebalancing has two main components:
- Rebalancing job - a job that triggers a rebalancing schedule on its associated cluster.
- Rebalancing schedule - an organization-wide schedule that can be used by multiple rebalancing jobs.
A rebalancing schedule consists of four parts:
- Schedule - when to run the rebalancing (periodically, within a maintenance window, etc.).
- Node selection preferences - rules for picking specific nodes, how many nodes to target, and other rules that mimic the decisions you would make when rebalancing a cluster manually.
- Trigger requirements - decision rules for executing the rebalancing plan, such as the Savings threshold, which determines whether the generated plan should be executed.
- Execution safeguard - stops rebalancing before the original nodes are drained if CAST AI finds that the user-specified minimum level of savings can't be achieved. This extra layer of protection is needed because nodes in the plan might become temporarily unavailable in the cloud provider's inventory while rebalancing is in progress.
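The two savings checks described above (the trigger requirement at plan time and the execution safeguard before draining) can be sketched as follows. This is a minimal illustration, not CAST AI's implementation: it assumes that costs are hourly prices of the affected nodes and that thresholds are percentages of the current cost, and the function names are hypothetical.

```python
# Hypothetical sketch of the savings threshold and execution safeguard checks.
# ASSUMPTIONS: costs are hourly prices for the nodes being replaced, and
# thresholds are percentages of the current cost. Not CAST AI's actual code.
from typing import Optional


def savings_percent(current_cost: float, new_cost: float) -> float:
    """Savings of new_cost relative to current_cost, in percent."""
    if current_cost <= 0:
        raise ValueError("current_cost must be positive")
    return (current_cost - new_cost) / current_cost * 100.0


def should_execute_plan(current_cost: float, projected_cost: float,
                        target_savings_pct: Optional[float]) -> bool:
    """Trigger requirement: run the plan only if projected savings meet the target.

    target_savings_pct=None models a disabled savings threshold: the plan runs
    regardless of cost impact (e.g. when nodes must be replaced for upgrades).
    """
    if target_savings_pct is None:
        return True
    return savings_percent(current_cost, projected_cost) >= target_savings_pct


def safeguard_allows_drain(current_cost: float, actual_new_cost: float,
                           guaranteed_min_savings_pct: Optional[float]) -> bool:
    """Execution safeguard: checked after new nodes exist, before draining old ones.

    If alternative nodes (created because planned capacity was unavailable) cost
    too much, return False: the new nodes are deleted and rebalancing is aborted.
    """
    if guaranteed_min_savings_pct is None:
        return True
    return savings_percent(current_cost, actual_new_cost) >= guaranteed_min_savings_pct


if __name__ == "__main__":
    # The plan projects 10.0 -> 7.5 per hour (25% savings) against a 20% target.
    print(should_execute_plan(10.0, 7.5, 20.0))
    # Alternative nodes cost 9.5 per hour (5% savings) against a 10% guarantee.
    print(safeguard_allows_drain(10.0, 9.5, 10.0))
```

Separating the two checks reflects the two moments described above: the target applies to the projected plan, while the guarantee applies to the nodes that were actually created.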
Settings
The following settings can be adjusted when setting up scheduled rebalancing:
Setting | Description |
---|---|
Specify resource offering | The type of nodes to target: Spot, On-demand, or Any. |
Target using labels | Key-value pairs provided as nodeSelector terms. In the UI, multiple terms are combined using AND logic, i.e., only nodes that satisfy all listed selector terms are targeted. For OR logic between label values, use the CAST AI API endpoint POST /v1/rebalancing-schedules: in the matchExpressions section of the request, use the In operator and specify multiple values for the same label key. This API-only feature targets nodes that match any of the specified label values. Note that when using this API method, the UI may not display all values correctly. |
Minimum node age | How long a node must have existed since its creation before it can be considered for rebalancing. '0' means nodes of any age can be considered. |
Evict nodes gracefully | Defines whether nodes that fail to drain within the predefined 20-minute timeout are kept with a rebalancing.cast.ai/status=drain-failed annotation instead of being forcefully drained. |
Maximum batch size | Maximum number of nodes selected for rebalancing. '0' means all nodes in the cluster can be selected. |
Sort selected nodes | The algorithm used to sort selected nodes: Highest normalized CPU price - most expensive nodes first, based on the price of normalized CPU (node cost / CPU count). Highest requested CPU price - most expensive nodes first, based on the price of requested CPU (node cost / requested CPU count). Least utilized - least utilized nodes first, regardless of price. |
Aggressive mode | Also rebalances problematic pods: pods without a controller, job pods, and pods with the removal-disabled annotation. |
Savings threshold | The savings threshold can be turned off to start a rebalance regardless of cost impact (e.g., when nodes need to be replaced due to upgrades). Target savings - the minimum projected savings to achieve; rebalancing is not executed unless the plan projects savings that meet or exceed this value. Guaranteed minimum savings - if capacity becomes unavailable between the time the plan is generated and the time nodes are created, the Rebalancer creates alternative nodes. If these nodes cost more and the required minimum savings can no longer be achieved, the newly created nodes are deleted and the rebalancing is aborted. |
Execution times | When the schedule runs, defined by a timezone and a crontab expression. |
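To show how the resource offering, minimum node age, sorting, and batch size settings could combine, here is a hypothetical Python sketch. The Node fields, setting names, and the choice to filter before sorting and capping are illustrative assumptions, not CAST AI API fields or its documented order of operations.

```python
# Hypothetical sketch of how node selection settings could combine.
# Node fields and setting names are illustrative, not CAST AI API fields.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    lifecycle: str        # "spot" or "on-demand" (assumed representation)
    created_at: datetime  # timezone-aware creation timestamp
    hourly_cost: float
    cpu_count: float
    requested_cpu: float  # sum of CPU requests of pods on the node
    utilization: float    # 0.0-1.0, assumed utilization metric


def _requested_cpu_price(node: Node) -> float:
    # Assumption: a node with no CPU requests ranks as the most expensive.
    if node.requested_cpu <= 0:
        return float("inf")
    return node.hourly_cost / node.requested_cpu


# list.sort is ascending, so prices are negated to put the highest first.
SORT_KEYS: Dict[str, Callable[[Node], float]] = {
    "highest_normalized_cpu_price": lambda n: -(n.hourly_cost / n.cpu_count),
    "highest_requested_cpu_price": lambda n: -_requested_cpu_price(n),
    "least_utilized": lambda n: n.utilization,
}


def select_nodes(nodes: List[Node], now: datetime, offering: str = "any",
                 min_age: timedelta = timedelta(0), max_batch: int = 0,
                 sort_by: str = "highest_normalized_cpu_price") -> List[Node]:
    """Filter by resource offering and minimum age, sort, then cap the batch.

    min_age=timedelta(0) and max_batch=0 mirror the '0' meanings in the table:
    nodes of any age are eligible, and there is no limit on the batch size.
    """
    candidates = [
        n for n in nodes
        if (offering == "any" or n.lifecycle == offering)
        and now - n.created_at >= min_age
    ]
    candidates.sort(key=SORT_KEYS[sort_by])
    return candidates if max_batch == 0 else candidates[:max_batch]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    day = timedelta(days=1)
    nodes = [
        Node("a", "spot", now - 3 * day, 0.40, 4, 2, 0.5),
        Node("b", "spot", now - 3 * day, 0.90, 4, 3, 0.7),
        Node("c", "on-demand", now - 3 * day, 1.20, 8, 1, 0.1),
        Node("d", "spot", now - timedelta(hours=2), 2.00, 4, 4, 0.9),
    ]
    picked = select_nodes(nodes, now, offering="spot", min_age=day, max_batch=1)
    # "d" is too young; "b" costs more per CPU than "a".
    print([n.name for n in picked])
```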
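When graceful eviction keeps nodes that failed to drain, you may want to list them. Because rebalancing.cast.ai/status=drain-failed is an annotation rather than a label, kubectl's -l selector can't filter on it, so this sketch parses the JSON output of kubectl get nodes -o json instead. It assumes kubectl is installed and pointed at the right cluster; the helper names are hypothetical.

```python
# Sketch: list nodes annotated rebalancing.cast.ai/status=drain-failed.
# Assumes kubectl is installed and its current context points at the cluster.
import json
import subprocess
from typing import Any, Dict, List

ANNOTATION_KEY = "rebalancing.cast.ai/status"
ANNOTATION_VALUE = "drain-failed"


def drain_failed_nodes(node_list: Dict[str, Any]) -> List[str]:
    """Return names of nodes carrying the drain-failed annotation.

    node_list is the parsed output of `kubectl get nodes -o json`,
    a Kubernetes List object whose nodes are under "items".
    """
    names = []
    for item in node_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(ANNOTATION_KEY) == ANNOTATION_VALUE:
            names.append(metadata.get("name", ""))
    return names


def fetch_nodes() -> Dict[str, Any]:
    """Run kubectl and parse its JSON output (raises if kubectl fails)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Usage: call drain_failed_nodes(fetch_nodes()) to get the names of the affected nodes, then inspect or drain them manually.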
Using OR Conditions for Node Labels in Scheduled Rebalancing
While the console UI supports AND logic for node labels, you can use OR conditions through the CAST AI API only. Here's how:
- Use the CAST AI API endpoint for creating rebalancing schedules: POST /v1/rebalancing-schedules
- In your API request, focus on the matchExpressions section within the nodeSelectorTerms.
- Use the In operator and specify multiple values for the same label key to create an OR condition.
Example JSON payload snippet:
```json
"matchExpressions": [
  {
    "key": "nodetemplate",
    "operator": "In",
    "values": [
      "customerA",
      "customerB",
      "customerC"
    ]
  }
]
```
This configuration targets nodes whose nodetemplate label matches any of the specified values.
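The OR behavior comes from the In operator's semantics in Kubernetes-style matchExpressions: a node matches an expression if its label value equals any listed value, and all expressions in the list must match. The sketch below models only those semantics for the In operator; it is not CAST AI's internal code, and the function names are hypothetical.

```python
# Sketch of Kubernetes-style matchExpressions evaluation with the In operator.
# Models label-matching semantics only; not CAST AI's internal code.
import json
from typing import Any, List, Mapping

# The matchExpressions array from the example snippet, as JSON.
EXAMPLE = """
[
  {"key": "nodetemplate", "operator": "In",
   "values": ["customerA", "customerB", "customerC"]}
]
"""


def expression_matches(labels: Mapping[str, str], expr: Mapping[str, Any]) -> bool:
    """In: the label must exist and its value must equal ANY listed value (OR)."""
    if expr["operator"] != "In":
        # Kubernetes also defines NotIn, Exists, DoesNotExist, Gt and Lt; omitted here.
        raise ValueError(f"operator not covered by this sketch: {expr['operator']}")
    key = expr["key"]
    return key in labels and labels[key] in expr.get("values", [])


def expressions_match(labels: Mapping[str, str], exprs: List[Mapping[str, Any]]) -> bool:
    """Every expression in the list must match (AND across expressions)."""
    return all(expression_matches(labels, e) for e in exprs)


if __name__ == "__main__":
    exprs = json.loads(EXAMPLE)
    print(expressions_match({"nodetemplate": "customerB"}, exprs))
    print(expressions_match({"nodetemplate": "customerD"}, exprs))
```

Listing several values under one key gives OR; adding a second expression with a different key would narrow the match again, because expressions are combined with AND.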
Warning
When you use this API method, the UI may not display all values correctly. You'll see only one label in the UI, and you'll need to assign the specific cluster to it.
For more details, refer to the CAST AI API reference documentation for creating rebalancing schedules.