How Evictor avoids downtime during Bin Packing

Evictor follows certain rules to avoid downtime. For a node to be considered for removal during bin packing, every pod running on it must meet the following criteria:

  • The pod must be replicated: it is managed by a controller (e.g. ReplicaSet, ReplicationController, Deployment) that has more than one replica (see Overrides).
  • The pod is not part of a StatefulSet.
  • The pod is not marked as non-evictable (see Overrides).
  • Static pods (defined by manifests in the node's /etc/kubernetes/manifests directory by default) are considered evictable.
  • Pods controlled by a DaemonSet are considered evictable.
  • Pod disruption budgets (PDBs) are respected.

🚧 Aggressive Mode

In more fault-tolerant systems, you can turn on aggressive mode to reduce waste even further. In this mode, Evictor bin-packs single-replica applications as well, not just multi-replica pods.
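The criteria above, together with the aggressive-mode relaxation, can be modeled as a simple predicate over the pods on a node. This is a minimal sketch, not Evictor's actual implementation: the `Pod` fields (including `pdb_allows_disruption`) and the functions `pod_is_evictable` and `node_can_be_bin_packed` are hypothetical. It also assumes that a non-evictable marking takes priority over the static/DaemonSet rules, and that aggressive mode relaxes only the replication requirement, so StatefulSet pods stay protected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    """Hypothetical, simplified view of a pod (not a real Kubernetes API type)."""
    name: str
    controller_kind: Optional[str]        # e.g. "ReplicaSet", "StatefulSet", "DaemonSet", or None
    replicas: int = 1                     # replica count of the owning controller
    is_static: bool = False               # defined in /etc/kubernetes/manifests
    marked_non_evictable: bool = False    # set through an override (see Overrides)
    pdb_allows_disruption: bool = True    # whether the pod's PDB currently allows an eviction


REPLICATED_KINDS = {"ReplicaSet", "ReplicationController", "Deployment"}


def pod_is_evictable(pod: Pod, aggressive_mode: bool = False) -> bool:
    # Assumption: an explicit non-evictable marking always blocks the node.
    if pod.marked_non_evictable:
        return False
    # Static pods and DaemonSet pods never block bin packing.
    if pod.is_static or pod.controller_kind == "DaemonSet":
        return True
    # StatefulSet pods are not evicted (assumed to hold in aggressive mode too).
    if pod.controller_kind == "StatefulSet":
        return False
    # Pod disruption budgets must allow the eviction.
    if not pod.pdb_allows_disruption:
        return False
    # Assumption: aggressive mode only drops the "must be replicated" rule.
    if aggressive_mode:
        return True
    return pod.controller_kind in REPLICATED_KINDS and pod.replicas > 1


def node_can_be_bin_packed(pods: list[Pod], aggressive_mode: bool = False) -> bool:
    # Every pod on the node must be evictable for the node to be a candidate.
    return all(pod_is_evictable(p, aggressive_mode) for p in pods)


web = Pod("web-1", "ReplicaSet", replicas=3)
agent = Pod("log-agent", "DaemonSet")
db = Pod("db-0", "StatefulSet")
single = Pod("api-1", "ReplicaSet", replicas=1)
print(node_can_be_bin_packed([web, agent]))                          # True
print(node_can_be_bin_packed([web, db]))                             # False
print(node_can_be_bin_packed([web, single]))                         # False
print(node_can_be_bin_packed([web, single], aggressive_mode=True))   # True
```

In this model a single blocking pod is enough to keep the whole node out of bin packing, which is why one single-replica pod can pin an otherwise underused node unless aggressive mode or an override is used.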

Override Evictor rules for Pods & Nodes

  • autoscaling.cast.ai/removal-disabled="true"
    • Node: Annotation / Label
    • Pod: Annotation
    • Description: Evictor won't try to evict a Node with this Annotation or Label, or a Node running a Pod with this Annotation.
  • beta.evictor.cast.ai/disposable="true"
    • Pod: Annotation
    • Description: Evictor will treat this Pod as Evictable despite any of the other rules.
  • beta.evictor.cast.ai/eviction-disabled="true" (deprecated)
    • Node: Annotation / Label
    • Pod: Annotation
    • Description: Evictor won't try to evict a Node with this Annotation or Label, or a Node running a Pod with this Annotation.
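To make the interaction between these overrides concrete, here is a minimal sketch that classifies a pod and a node from their metadata. It is not Evictor's code: the function names are hypothetical, node labels and annotations are merged into one dictionary for simplicity, and it assumes that a disable annotation wins if the same pod also carries beta.evictor.cast.ai/disposable.

```python
# Hypothetical model of Evictor's override keys; not Evictor's actual code.
REMOVAL_DISABLED = "autoscaling.cast.ai/removal-disabled"
EVICTION_DISABLED = "beta.evictor.cast.ai/eviction-disabled"  # deprecated key
DISPOSABLE = "beta.evictor.cast.ai/disposable"


def is_true(metadata: dict[str, str], key: str) -> bool:
    # Label and annotation values are strings, so compare against "true".
    return metadata.get(key) == "true"


def pod_override(annotations: dict[str, str]) -> str:
    """Classify a pod as 'blocks-node', 'disposable', or 'default' (normal rules apply)."""
    # Assumption: a disable annotation wins over the disposable annotation.
    if is_true(annotations, REMOVAL_DISABLED) or is_true(annotations, EVICTION_DISABLED):
        return "blocks-node"
    if is_true(annotations, DISPOSABLE):
        return "disposable"
    return "default"


def node_protected(node_metadata: dict[str, str], pod_annotations: list[dict[str, str]]) -> bool:
    """Return True if the overrides alone keep Evictor away from this node.

    node_metadata merges the node's labels and annotations (a simplification).
    """
    if is_true(node_metadata, REMOVAL_DISABLED) or is_true(node_metadata, EVICTION_DISABLED):
        return True
    return any(pod_override(a) == "blocks-node" for a in pod_annotations)


print(pod_override({DISPOSABLE: "true"}))                       # disposable
print(node_protected({}, [{}, {REMOVAL_DISABLED: "true"}]))     # True
print(node_protected({}, [{DISPOSABLE: "true"}]))               # False
```

A pod classified as "default" still has to pass the bin-packing criteria listed at the top of this page, while a "disposable" pod skips them.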

Examples of override commands

Label or annotate a pod so that Evictor won't evict the node running it (this can be applied to a node as well). Note that the beta.evictor.cast.ai/eviction-disabled key used here is deprecated (see above).

kubectl label pods <pod-name> beta.evictor.cast.ai/eviction-disabled="true"
kubectl annotate pods <pod-name> beta.evictor.cast.ai/eviction-disabled="true"

Label or annotate a node to prevent both the eviction of its pods and the removal of the node itself (even when it is empty):

kubectl label nodes <node-name> autoscaling.cast.ai/removal-disabled="true"
kubectl annotate nodes <node-name> autoscaling.cast.ai/removal-disabled="true"

You can also annotate a pod to make it disposable, regardless of other criteria that would normally make the pod non-evictable. Here is an example of a disposable pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: disposable-pod
  annotations:
    beta.evictor.cast.ai/disposable: "true"
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
      resources:
        requests:
          cpu: '1'
        limits:
          cpu: '1'

Because of the annotation, Evictor will target this pod for eviction even though it is not replicated.

Advanced configuration options

A list of more advanced configuration options can be found at: helm-charts/castai-evictor

Troubleshooting

Evictor policy is not allowed to be turned on

  • Evictor is unavailable on the policies page when CAST AI detects an existing Evictor installation.
  • In this case, CAST AI will not try to manage Evictor settings and upgrades.
  • If you want CAST AI to manage the Evictor configuration and upgrade it to the most recent version, remove the current installation first.

How to check the logs

To check Evictor logs, run the following command:

kubectl logs -l app.kubernetes.io/name=castai-evictor -n castai-agent

Manually install Evictor

Evictor compacts your pods onto fewer nodes, creating empty nodes that are then removed by the Node deletion policy. To install Evictor, run the following commands:

helm repo add castai-helm https://castai.github.io/helm-charts
helm upgrade --install castai-evictor castai-helm/castai-evictor -n castai-agent --set dryRun=false

This process will take some time. By default, Evictor will not cause downtime to single-replica Deployments, StatefulSets, or pods without a ReplicaSet, which means that nodes running such pods can't be removed gracefully. Familiarize yourself with the rules and available overrides to set up Evictor to meet your needs.

To run Evictor in aggressive mode (so that it also considers single-replica applications), pass the following parameters:

--set dryRun=false,aggressiveMode=true

To run Evictor in scoped mode (removing only nodes created by CAST AI when using the scoped autoscaler), pass the following parameters:

--set dryRun=false,scopedMode=true

By default, Evictor only impacts nodes older than 5 minutes. To change the grace period before a node can be considered for eviction, set the nodeGracePeriodMinutes parameter to the desired time in minutes. This is useful for slow-starting nodes, as it prevents them from being marked for eviction before they can start taking workloads.

--set dryRun=false,nodeGracePeriodMinutes=8
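The two node-level filters described above, scoped mode and the grace period, can be sketched as a single predicate. The parameter names mirror the Helm values scopedMode and nodeGracePeriodMinutes, but the function itself, the created_by_castai flag, and the strict "older than" comparison are assumptions for illustration, not Evictor's code.

```python
from datetime import datetime, timedelta, timezone


def node_is_candidate(
    created_at: datetime,
    now: datetime,
    created_by_castai: bool,              # hypothetical flag: was the node created by CAST AI?
    node_grace_period_minutes: int = 5,   # mirrors the nodeGracePeriodMinutes Helm value
    scoped_mode: bool = False,            # mirrors the scopedMode Helm value
) -> bool:
    # Skip nodes that are not yet older than the grace period, so slow-starting
    # nodes get a chance to take workloads first.
    if now - created_at <= timedelta(minutes=node_grace_period_minutes):
        return False
    # In scoped mode, only nodes created by CAST AI are considered.
    if scoped_mode and not created_by_castai:
        return False
    return True


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
created = now - timedelta(minutes=6)  # a node that is 6 minutes old
print(node_is_candidate(created, now, created_by_castai=True))                               # True
print(node_is_candidate(created, now, created_by_castai=True, node_grace_period_minutes=8))  # False
print(node_is_candidate(created, now, created_by_castai=False, scoped_mode=True))            # False
```

With the default 5-minute grace period the 6-minute-old node qualifies, while nodeGracePeriodMinutes=8 keeps it out of consideration for two more minutes.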

Manually upgrade Evictor

  • Check the Evictor version you are currently using:

    helm ls -n castai-agent
    
  • Update the Helm chart repository so that Helm is aware of the latest charts:

    helm repo update
    
  • Install the latest Evictor version:

    helm upgrade --install castai-evictor castai-helm/castai-evictor -n castai-agent --set dryRun=false --set image.repository=us-docker.pkg.dev/castai-hub/library/castai-evictor
    
  • Check whether the Evictor version was changed:

    helm ls -n castai-agent