Pod Pinner

Pod Pinner is a critical component that coordinates between the Cast AI Autoscaler and the Kubernetes scheduler to ensure optimal pod placement in your Kubernetes cluster. Without Pod Pinner, the Kubernetes scheduler might place pods on unintended nodes, leading to suboptimal resource utilization and increased costs.

Pod Pinner enables the integration of the Cast AI Autoscaler's decisions into your cluster, allowing it to override the decisions of the Kubernetes cluster scheduler when necessary. For example, Pod Pinner assigns pending pods to a placeholder node during node rebalancing, ensuring they are directed to the correct node once it’s ready.

Installing Pod Pinner can directly enhance cluster savings. Pod Pinner is a Cast AI in-cluster component similar to the Cast AI agent, cluster controller, and others. There is no risk of downtime if Pod Pinner fails; the default Kubernetes scheduler will automatically take over pod placement decisions for rebalancing and autoscaling.

How Pod Pinner works

Pod Pinner works alongside the Cast AI Autoscaler to ensure pods are scheduled onto the specific nodes chosen by the Autoscaler. Here's how this process works:

  1. When the Cast AI Autoscaler determines new nodes are needed, it sends scheduling instructions to Pod Pinner through a continuous data stream.

  2. Pod Pinner creates a placeholder Node resource in Kubernetes using the API. This placeholder reserves the space for the incoming node before it physically joins the cluster, allowing Pod Pinner to coordinate scheduling decisions ahead of time.

  3. For pods that need scheduling, Pod Pinner uses the Kubernetes /pods/binding API to explicitly assign them to specific nodes by setting spec.nodeName. This direct binding bypasses the default Kubernetes scheduling process.

  4. When the actual node joins the cluster, it automatically connects to its corresponding placeholder node, ensuring continuity in the scheduling process.

This coordination prevents scheduling conflicts between Cast AI and Kubernetes by ensuring pods land on their intended nodes rather than letting the Kubernetes scheduler place them based on its own scoring mechanisms. This is especially important because the Kubernetes scheduler isn't aware of nodes that are about to join the cluster and may make placement decisions that are different from what the Cast AI Autoscaler intended.
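
For illustration, the binding described in step 3 looks roughly like the following when issued by hand. The pod name, namespace, and node name below are hypothetical placeholders, and Pod Pinner performs the equivalent API call programmatically rather than through kubectl:

# Sketch: bind a pending pod to a node via the pods/binding subresource.
# "demo-pod", "default", and "cast-placeholder-node" are made-up names.
cat <<'EOF' > binding.json
{
  "apiVersion": "v1",
  "kind": "Binding",
  "metadata": {"name": "demo-pod"},
  "target": {"apiVersion": "v1", "kind": "Node", "name": "cast-placeholder-node"}
}
EOF
kubectl create --raw /api/v1/namespaces/default/pods/demo-pod/binding -f binding.json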

Limitations

🚧

Notice

Pod Pinner may conflict with Spot-webhook; we do not recommend using them together at this time.

Using them together may result in some failed pods during scheduling. This is because Pod Pinner is unaware of changes applied by other webhooks when binding pods to nodes. While these failed pods are typically recreated by Kubernetes without negative impact, we are working on improving compatibility between Pod Pinner and Spot-webhook to fully address this issue.

Custom resources

Pod Pinner cannot pin pods that use custom resources (such as GPUs and ENIs). Custom resources are not immediately available when a node first boots up; they require initialization by their respective drivers and only become schedulable after the node has joined the cluster.
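
If you want to check which pods this limitation affects, one rough approach (assuming jq is installed, and using nvidia.com/gpu as an example extended resource) is:

# List pods that request the nvidia.com/gpu extended resource; adjust the
# resource name for other custom resources such as ENIs.
kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.containers[]; .resources.requests["nvidia.com/gpu"] != null)) | .metadata.namespace + "/" + .metadata.name'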

Installation and version upgrade

For newly onboarded clusters, the latest version of the Pod Pinner castware component, castai-pod-pinner, is installed automatically. To verify the installation:

  1. Check whether the castai-pod-pinner deployment is available in the castai-agent namespace:
$ kubectl get deployments.apps   -n castai-agent
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
castai-agent                1/1     1            1           15m
castai-agent-cpvpa          1/1     1            1           15m
castai-cluster-controller   2/2     2            2           15m
castai-evictor              0/0     0            0           15m
castai-kvisor               1/1     1            1           15m
castai-pod-pinner           2/2     2            2           15m

Helm

Option 1: Cast AI-Managed (default)

By default, Cast AI manages Pod Pinner, including automatic upgrades.

  1. Check the currently installed Pod Pinner chart version. If it's >= 1.0.0, an upgrade is not needed.
    You can check the version with the following command:
$ helm list -n castai-agent --filter castai-pod-pinner
NAME             	NAMESPACE   	REVISION	UPDATED                                	STATUS  	CHART                  	APP VERSION
castai-pod-pinner	castai-agent	11      	2024-09-26 11:40:00.245427517 +0000 UTC	deployed	castai-pod-pinner-1.0.2	v1.0.0
  2. If the version is < 1.0.0, run the following commands to install or upgrade Pod Pinner to the latest version.
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade --install castai-pod-pinner castai-helm/castai-pod-pinner -n castai-agent

After installation or upgrade to version >= 1.0.0, Pod Pinner will automatically be scaled to 2 replicas and will be managed by Cast AI, as indicated by the charts.cast.ai/managed=true label applied to the pods of the castai-pod-pinner deployment. All subsequent Pod Pinner versions will be updated automatically.
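
To check which mode your installation is running in, you can, for example, list the Pod Pinner pods that carry this label:

# If this returns the castai-pod-pinner pods, the component is Cast AI-managed;
# no output suggests the self-managed mode described below.
kubectl get pods -n castai-agent -l charts.cast.ai/managed=true | grep castai-pod-pinner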

Option 2: Self-Managed

To control the Pod Pinner version yourself:

helm upgrade -i castai-pod-pinner castai-helm/castai-pod-pinner -n castai-agent --set managedByCASTAI=false

This prevents Cast AI from automatically managing and upgrading Pod Pinner.

Upgrading Pod Pinner in Self-Managed mode:

If the installed version is < 1.0.0, run the following commands to upgrade Pod Pinner to the latest version.

helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
helm upgrade --install castai-pod-pinner castai-helm/castai-pod-pinner -n castai-agent --set managedByCASTAI=false

Re-running Onboarding Script

You can also install Pod Pinner by re-running the phase 2 onboarding script. For more information, see the cluster onboarding documentation.

Terraform Users

For Terraform users, you can manage Pod Pinner installation and configuration through your Terraform scripts. This allows for version control and infrastructure-as-code management of Pod Pinner settings.

Enabling/Disabling Pod Pinner

Pod Pinner is enabled by default but can be disabled in the Cast AI console.

If you disable Pod Pinner this way, the deployment will be scaled down to 0 replicas and not auto-upgraded by Cast AI.

Autoscaler Settings UI

To enable/disable Pod Pinner:

  1. Navigate to Autoscaler settings in the Cast AI console.
  2. You'll find the Pod Pinner option under the "Unscheduled pods policy" section.
  3. Check/uncheck the "Enable pod pinning" box to activate/deactivate the Pod Pinner. Disabling Pod Pinner will scale down the deployment to 0 replicas and turn off auto-upgrades.

📘

Note

When Pod Pinner is disabled through the console and the charts.cast.ai/managed=true label is present, Cast AI will keep the deployment scaled down to 0 replicas, regardless of any manual changes. To manually control Pod Pinner while keeping it active, use the self-managed installation option mentioned above.

Ensuring stability

We recommend keeping the Pod Pinner pod as stable as possible, especially during rebalancing. You can do so by applying the same approach you use for castai-agent.

For instance, you can add the autoscaling.cast.ai/removal-disabled: "true" label/annotation to the pod. If the Pod Pinner pod restarts during rebalancing, pods won't get pinned to the nodes the Rebalancer intended, and placement may be suboptimal because the default Kubernetes scheduler will schedule them instead.
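
As a minimal sketch, one way to apply the label is to patch the pod template of the deployment, assuming the default deployment name and namespace shown earlier:

# Add the removal-disabled label to the Pod Pinner pod template so the pod
# is not disrupted during rebalancing.
kubectl -n castai-agent patch deployment castai-pod-pinner --type=merge \
  -p '{"spec":{"template":{"metadata":{"labels":{"autoscaling.cast.ai/removal-disabled":"true"}}}}}'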

📘

Note

You can scale down the castai-pod-pinner deployment at any time. The cluster will continue to operate normally; the only effect is that the Kubernetes scheduler takes over pod scheduling.
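
For example:

# Hand pod scheduling back to the default Kubernetes scheduler
kubectl scale deployment castai-pod-pinner -n castai-agent --replicas=0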

Logs

You can access logs in the Pod Pinner pod to see what decisions are being made. Here is a list of the most important log entries:

Example | Meaning
--- | ---
node placeholder created | A node placeholder has been created. The real node will use this placeholder when it joins the cluster.
pod pinned | A pod has been successfully bound to a node. Such logs always appear after the node placeholder is created.
node placeholder not found | This log appears when Pod Pinner tries to bind a pod to a non-existing node. This may occur if Pod Pinner fails to create the node placeholder.
pinning pod | This log occurs when Pod Pinner's webhook intercepts a pod creation and binds it to a node. This happens during rebalancing.
node placeholder deleted | A node placeholder has been deleted. This happens when a node fails to be created in the cloud, and Pod Pinner must clean up the placeholder that was created.
failed streaming pod pinning actions, restarting... | The connection between the Pod Pinner pod and Cast AI has been reset. This is expected to happen occasionally and will not negatively impact your cluster.
http: TLS handshake error from 10.0.1.135:48024: EOF | This log appears as part of the certificate rotation performed by the webhook. This is a non-issue log and will not negatively impact the cluster.

Troubleshooting

Failed pod status reason: OutOf{resource}

OutOfcpu, OutOfmemory, and other OutOf{resource} pod statuses occur when the scheduler assigns a pod to a node, but the kubelet rejects it because the node lacks the requested resource. These are Failed pods that Cast AI and the Kubernetes control plane know how to ignore.

This happens when many pods are upscaled at the same time. The scheduler has various optimizations for dealing with large bursts of pods, so it makes scheduling decisions in parallel. Sometimes those decisions conflict, resulting in pods scheduled on nodes where they don't fit; this is especially common in GKE. If you see this status, there is no cause for concern: the control plane will eventually clean those pods up after a few days.

Pods might also get this status when the Kubernetes scheduler takes over scheduling decisions due to a brief interruption in Pod Pinner's availability. However, this does not negatively impact the cluster, as Kubernetes recreates the pods.
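
For example, one quick way to spot these pods is to list Failed pods and filter for the OutOf prefix:

# List Failed pods in all namespaces whose status starts with "OutOf"
kubectl get pods -A --field-selector=status.phase=Failed | grep OutOf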

Failed pod status reason: Node affinity

If you use spot-webhook, your cluster may encounter this issue, which puts pods in a Failed status. It occurs because Pod Pinner is unaware of changes other webhooks apply to pods when it binds them to nodes, so the node selectors Pod Pinner acts on may no longer match the pod's actual spec.

As with the OutOf{resource} pod status, this is simply a visual inconvenience as the pod will get recreated by Kubernetes.
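
If you want to confirm why a particular pod failed, you can, for instance, print its status reason and message (the pod and namespace names below are placeholders):

# Affected pods typically report a node affinity-related reason here
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.reason}: {.status.message}{"\n"}'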