How it works
The CAST AI Autoscaler removes nodes when it detects that they have not been utilized for a certain period of time. The goal of removing empty nodes is to save costs and reduce waste.
The Autoscaler only removes empty nodes. Nodes end up empty because all pods running on the node have been deleted, possibly due to ReplicaSet scale down or Job completion.
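The rule above can be illustrated with a minimal sketch. It is not CAST AI's implementation: the `Node` class, the `nodes_to_remove` function, and the `ttl` parameter are hypothetical names chosen for this example, and the 5-minute default is only an assumed value for the waiting period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified view of a cluster node."""
    name: str
    pod_count: int
    empty_since: Optional[datetime] = None  # when the last pod left, if known


def nodes_to_remove(nodes: List[Node], now: datetime,
                    ttl: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return names of nodes that have been empty for at least `ttl`."""
    return [
        n.name
        for n in nodes
        if n.pod_count == 0
        and n.empty_since is not None
        and now - n.empty_since >= ttl
    ]
```

In this sketch a node that still runs pods, or that emptied only recently, is never selected, which mirrors the statement that only empty nodes are removed.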
Downscaling options
Two levels of downscaling are available:
- Node deletion policy - this policy removes nodes that are empty and no longer running any workloads. For example, when a Job you're running completes, its node may become empty, and CAST AI will automatically remove it to avoid waste.
- Evictor - it continuously compacts pods onto fewer nodes. Evictor actively bin packs your cluster and moves pods around to achieve higher node utilization. Nodes that have been freed up this way are removed in accordance with the Node deletion policy, if you choose to enable it.
Bin packing flow
CAST AI implemented the Evictor component to solve the bin-packing problem.
- Evictor scans your cluster every 60 seconds (default cycle time).
- It identifies nodes that have been active for more than 5 minutes (default NODE_GRACE_PERIOD_MINUTES) as potential candidates for eviction.
- Evictor assesses if all current workloads on a candidate node can be rescheduled on existing capacity in the cluster, considering node selectors, affinity rules, and other constraints.
- If the workloads can be rescheduled, Evictor cordons the candidate node, drains it, and moves the workloads to other nodes, optimizing resource utilization.
- Once a node becomes empty, the Node deletion policy (which should be enabled) will remove the node after a 5-minute Time-To-Live (TTL) period.
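The flow above can be sketched as a small simulation. This is an illustration under stated assumptions, not Evictor's actual algorithm: the `Pod` and `Node` classes and the `plan_eviction` and `find_evictable` functions are hypothetical, only CPU requests are modelled (node selectors, affinity rules, and other constraints are ignored), and pods are placed with a first-fit decreasing heuristic chosen for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    name: str
    cpu: float  # requested CPU cores


@dataclass
class Node:
    name: str
    cpu_capacity: float
    age_minutes: float
    pods: List[Pod] = field(default_factory=list)

    def free_cpu(self) -> float:
        return self.cpu_capacity - sum(p.cpu for p in self.pods)


def plan_eviction(candidate: Node, others: List[Node]) -> Optional[Dict[str, str]]:
    """Try to place every pod of `candidate` on `others` (first-fit decreasing).

    Returns {pod name: target node name}, or None if any pod does not fit.
    """
    free = {n.name: n.free_cpu() for n in others}
    plan: Dict[str, str] = {}
    for pod in sorted(candidate.pods, key=lambda p: p.cpu, reverse=True):
        target = next((name for name, cpu in free.items() if cpu >= pod.cpu), None)
        if target is None:
            return None  # at least one workload cannot be rescheduled
        free[target] -= pod.cpu
        plan[pod.name] = target
    return plan


def find_evictable(nodes: List[Node],
                   grace_minutes: float = 5) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return the first node older than the grace period whose pods all fit elsewhere."""
    for candidate in nodes:
        if candidate.age_minutes <= grace_minutes or not candidate.pods:
            continue  # too young to be a candidate, or already empty
        others = [n for n in nodes if n is not candidate]
        plan = plan_eviction(candidate, others)
        if plan is not None:
            return candidate.name, plan
    return None
```

One call to `find_evictable` corresponds to a single scan cycle: a node is selected only when every one of its pods has a place on the remaining capacity, which is why the sketch returns nothing when even one pod cannot be moved.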
Note
Evictor does not use percentage-based utilization criteria. It considers overall cluster capacity and workload distribution when making eviction decisions.