Pod Pinner

πŸ“˜ Beta release

A recently released feature for which we are actively gathering community feedback.

Pod Pinner addresses the misalignment between the actions of the CAST AI Autoscaler and the Kubernetes cluster scheduler.

For example, while the CAST AI Autoscaler efficiently bin-packs pods and creates nodes in the cluster in a cost-optimized manner, the Kubernetes cluster scheduler determines the actual placement of pods on nodes. This can lead to suboptimal pod placement, fragmentation, and unnecessary resource waste, as pods may end up on different nodes than those anticipated by the CAST AI Autoscaler.

Pod Pinner brings the CAST AI Autoscaler's placement decisions into your cluster, allowing them to override the decisions of the Kubernetes cluster scheduler. Installing Pod Pinner can directly increase savings in the cluster. Pod Pinner is a CAST AI in-cluster component, similar to the CAST AI agent, cluster controller, and others.

Installation

  1. Check whether your cluster already has the castai-pod-pinner deployment in the castai-agent namespace. If it does, make sure it is running 0 replicas. If it doesn't, rerun the Phase 2 onboarding script; more information on onboarding can be found here.
  2. Scale the castai-pod-pinner deployment up to exactly 1 replica (see the example below).
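
The same steps can be performed with kubectl; a minimal sketch, assuming the default castai-agent namespace and deployment name described above:

```shell
# Check whether the castai-pod-pinner deployment exists and how many replicas it runs
kubectl get deployment castai-pod-pinner -n castai-agent

# Enable Pod Pinner by scaling the deployment to exactly 1 replica
kubectl scale deployment castai-pod-pinner -n castai-agent --replicas=1
```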

It is suggested that you keep the Pod Pinner pod as stable as possible, especially during rebalancing. You can do so by applying the same approach you use for castai-agent, for instance by adding the autoscaling.cast.ai/removal-disabled: "true" label/annotation to the pod. If the Pod Pinner pod restarts during rebalancing, pods won't get pinned to nodes as the Rebalancer expects, which may result in suboptimal placement because the Kubernetes cluster scheduler will schedule the pods instead.
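
One way to do this (a sketch; adjust to however you manage the castai-agent components) is to add the label to the deployment's pod template so that new Pod Pinner pods carry it automatically:

```shell
# Add the removal-disabled label to the Pod Pinner pod template
kubectl patch deployment castai-pod-pinner -n castai-agent --type merge \
  -p '{"spec":{"template":{"metadata":{"labels":{"autoscaling.cast.ai/removal-disabled":"true"}}}}}'
```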

Note that you can scale the castai-pod-pinner deployment down to 0 replicas at any time. The cluster will keep behaving normally and will not be negatively impacted; the only difference is that the Kubernetes scheduler takes over pod scheduling.
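
For example, the following disables Pod Pinner without removing it from the cluster:

```shell
# Scale Pod Pinner down; the Kubernetes scheduler takes over pod placement
kubectl scale deployment castai-pod-pinner -n castai-agent --replicas=0
```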

Logs

You can access the Pod Pinner pod's logs to see what decisions are being made. Here is a list of important log messages:

| Example | Meaning |
| --- | --- |
| `node placeholder created` | A node placeholder has been created. The real node will use this placeholder when it joins the cluster. |
| `pod pinned` | A pod has been successfully bound to a node. Such logs always appear after the node placeholder is created. |
| `node placeholder not found` | This log appears when Pod Pinner tries to bind a pod to a non-existing node. This may occur if Pod Pinner fails to create the node placeholder. |
| `pinning pod` | This log occurs when Pod Pinner's webhook intercepts a pod creation and binds it to a node. This happens during rebalancing. |
| `node placeholder deleted` | A node placeholder has been deleted. This happens when a node fails to get created in the cloud, and Pod Pinner needs to clean up the created placeholder. |
| `failed streaming pod pinning actions, restarting...` | The connection between the Pod Pinner pod and CAST AI has been reset. This is expected to happen occasionally and will not negatively impact your cluster. |
| `http: TLS handshake error from 10.0.1.135:48024: EOF` | This log appears as part of the certificate rotation performed by the webhook. It is a non-issue and will not negatively impact the cluster. |
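
To view these messages, you can follow the Pod Pinner logs with kubectl, for example:

```shell
# Stream logs from the Pod Pinner deployment in the castai-agent namespace
kubectl logs -n castai-agent deployment/castai-pod-pinner -f
```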

Good to Know

Failed pod status reason: OutOf{resource}

OutOfcpu, OutOfmemory, and other OutOf{resource} pod statuses happen when the scheduler schedules a pod on a node, but the kubelet rejects it because the node lacks the required resource. These are Failed pods that CAST AI and the Kubernetes control plane know how to ignore.

This happens when many pods are upscaled at the same time. The scheduler has various optimizations to deal with large bursts of pods, so it makes scheduling decisions in parallel. Sometimes those decisions conflict, resulting in pods scheduled on nodes where they don't fit. This is especially common in GKE. If you see this status, there is no cause for concern: the control plane will eventually clean those pods up after a few days.

Pods might get this status when the Kubernetes scheduler takes over scheduling decisions due to a blip in Pod Pinner's availability. However, this does not negatively impact the cluster as Kubernetes recreates the pods.
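
If you prefer not to wait for that cleanup, one illustrative way to remove such pods yourself (not something Pod Pinner requires) is to filter by the Failed phase:

```shell
# List Failed pods across all namespaces (for example, with reason OutOfcpu or OutOfmemory)
kubectl get pods --all-namespaces --field-selector=status.phase=Failed

# Optionally delete them in a given namespace; their controllers have already created replacements
kubectl delete pods -n <namespace> --field-selector=status.phase=Failed
```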

Failed pod status reason: NodeAffinity

If you use spot-webhook, your cluster may encounter this issue, which puts pods in the Failed status. It occurs because Pod Pinner is unaware of changes that other webhooks apply to pods when binding them to nodes, so the node selectors Pod Pinner has in mind for a pod may differ from the ones the pod actually ends up with.

As with the OutOf{resource} pod status, this is simply a visual inconvenience, as Kubernetes will recreate the pod.