In aggressive mode, Evictor will bin-pack single-replica workloads as well. That is why aggressive mode is recommended only for more fault-tolerant systems.
To prevent Evictor from evicting an annotated pod (or draining the node running it), annotate the workload using the following command; the same annotation can also be applied directly to a node:
kubectl annotate pods <pod-name> autoscaling.cast.ai/removal-disabled="true"
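For example, the same annotation can be applied to a node, and removed again later; the node name below is illustrative:

```shell
# Prevent Evictor from considering a specific node for eviction
kubectl annotate node <node-name> autoscaling.cast.ai/removal-disabled="true"

# Remove the annotation later by appending a trailing dash to the key
kubectl annotate node <node-name> autoscaling.cast.ai/removal-disabled-
```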
Note that this solution has one limitation: it also prevents other pods on the same node from being moved, which blocks rebalancing.
Aggressive Evictor might work if you have a lot of single replicas as long as they aren't mission-critical and can be moved around.
Policies configured through the API override these parameters, and you can update them via the API. The first step is to get the current policy configuration: Gets policies configuration for the target cluster. Then update the configuration accordingly via a PUT request: Upsert cluster's policies configuration.
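As a sketch, the fetch-and-upsert flow might look like this with curl. The endpoint path, the `X-API-Key` header, and the `$CASTAI_API_TOKEN`/`$CLUSTER_ID` placeholders are assumptions; check the CAST AI API reference for the exact details:

```shell
# Fetch the current policies configuration for the cluster
curl -s -H "X-API-Key: $CASTAI_API_TOKEN" \
  "https://api.cast.ai/v1/kubernetes/clusters/$CLUSTER_ID/policies" > policies.json

# Edit policies.json as needed, then upsert it back
curl -s -X PUT \
  -H "X-API-Key: $CASTAI_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d @policies.json \
  "https://api.cast.ai/v1/kubernetes/clusters/$CLUSTER_ID/policies"
```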
All CAST AI components can run in Active/Passive mode, and Evictor supports it.
Evictor in non-aggressive mode will only consider ReplicaSet-backed pods (usually created by Deployments). Pods managed directly by ReplicaSet controllers are treated the same way: they will get evicted if there are 2+ replicas.
The latest version of Evictor in non-aggressive mode will not touch parallel jobs.
ScopedMode means that Evictor will only consider evicting pods from nodes that were created by CAST AI (cast.ai/managedby=castai label on node).
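To see which nodes fall within Evictor's scope in this mode, you can filter nodes by that label (a simple illustrative check):

```shell
# List only CAST AI-managed nodes, i.e. the nodes Evictor considers in scoped mode
kubectl get nodes -l cast.ai/managedby=castai
```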
Evictor has been shown to run fine in a 30,000-CPU cluster with batches of 100 nodes being removed every minute. You can configure the batch size, that is, how many nodes Evictor can evict in a single batch. With a 10-second sleep between cycles, one node per batch is usually enough for sub-10,000-CPU clusters.
Check out this page to learn more about the permissions required by Evictor.
Rebalancing and Evictor are two different CAST AI features.
You can find detailed information on how each of them works at the following links:
We recommend using Pod Disruption Budgets to control the drain rate of critical applications, here's a helpful resource: Specifying a disruption budget for your application.
Setting minAvailable: 1 in a Pod Disruption Budget will ensure that at least one pod is always available.
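A minimal Pod Disruption Budget of that shape could look like the manifest below; the name and labels are illustrative and must match your own workload:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb            # illustrative name
spec:
  minAvailable: 1             # keep at least one pod running during voluntary disruptions
  selector:
    matchLabels:
      app: my-app             # must match your Deployment's pod labels
```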
Evictor doesn't work on the basis of thresholds; instead, it runs a simulation to detect where pods would go if a node were deleted. If the simulation shows that all pods can migrate gracefully (and without impact) to other nodes, Evictor will carry on with the eviction.
The Managed Evictor Advanced Configuration feature only works if the castai-agent version is at least v0.49.1. You can manually change the Evictor Advanced Configuration configmap with older versions. If you experience any issues with Evictor Advanced Configuration, please check the castai-agent version first.
Yes, Evictor will ignore them, as they are used at the Deployment level for rolling updates, while Evictor works at the pod scope. However, Evictor does take the Pod Disruption Budget into account.
Yes, CAST AI does that.
Init containers are containers that run to completion before the app containers start, and the same eviction rules apply to them.
Kubernetes will execute the preStop hooks for any running containers, if configured. Next, it sends the SIGTERM signal to all running containers, which applications running in Kubernetes should handle. It then waits 30 seconds by default, or the configured terminationGracePeriodSeconds, for all containers to exit. If any containers are still running after that time, Kubernetes sends the SIGKILL signal, which stops them abruptly.
If any of those mechanisms aren't respected by the app (e.g., it's not listening to SIGTERM), downtime is possible.
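A pod spec that cooperates with this shutdown sequence might look like the sketch below; the pod name, image, and sleep duration are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown-demo     # illustrative name
spec:
  terminationGracePeriodSeconds: 60   # overrides the 30-second default
  containers:
    - name: app
      image: nginx:1.25              # illustrative image
      lifecycle:
        preStop:
          exec:
            # Runs before SIGTERM is sent, e.g. to let a load balancer
            # stop routing traffic to this pod
            command: ["/bin/sh", "-c", "sleep 10"]
```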
level=debug msg="#569 node - ip-10-16-126-241.ec2.internal eliminated by PodsEvictable, reason: pod disruption budget" aggressive_mode=true level_int=5 scoped_mode=true
"Eliminated" means that the pod was spared from eviction to avoid breaking Pod Disruption Budgets (PDBs).
The NODE_GRACE_PERIOD determines the time we wait before considering a new node for eviction. While the timeout for a force-delete can be set in the API call, there might not be a specific setting to forcefully terminate a node after a certain time.
All nodes currently have a drain timeout of roughly 20 minutes. If a pod doesn't drain gracefully within that window, the node is returned to service. This means that if you have a deployment configured to sleep for 20 minutes before terminating, that grace period may not be fully honored, since it exceeds the autoscaler's drain timeout.