In-Place Pod Resizing
Cast AI Workload Autoscaler supports Kubernetes in-place resource resizing for clusters running Kubernetes v1.33+, allowing resource adjustments that minimize pod disruption. In-place resizing attempts to apply resource changes without restarting pods, significantly reducing optimization overhead compared to traditional pod recreation. This feature allows workloads to have their CPU and memory requests dynamically adjusted while maintaining continuous operation in many scenarios.
Kubernetes v1.33+ Feature
In-place pod resizing requires Kubernetes v1.33 or later with the InPlacePodVerticalScaling feature gate enabled (enabled by default in v1.33+).
Key capabilities
- Resource requests are adjusted with minimal disruption—often without pod restarts—maintaining application availability during optimization
- When in-place resizing cannot be applied immediately, the system gracefully falls back to the configured apply mode (immediate or deferred pod restart)
- Automatic protection against memory limit reduction during downscaling to prevent potential application issues, following Kubernetes best practices
- In-place resizing complements existing immediate and deferred scaling modes with reduced disruption
- Leverages the standard Kubernetes in-place resizing API for compatibility across different cluster configurations
This is particularly valuable for production workloads where minimizing disruption is critical, while still achieving the cost and performance benefits of continuous resource optimization.
Learn more about the benefits of in-place rightsizing on our blog How In-Place Pod Resizing Works in Kubernetes and Why Cast AI Makes It Better.
Prerequisites
For in-place resizing to work with Workload Autoscaler:
- Kubernetes version: v1.33 or later
- Operating system: Linux nodes only (Windows pods not supported)
- Quality of Service: Pods must maintain their original QoS class (Guaranteed, Burstable, or BestEffort)
- Workload Autoscaler version: v0.53.0 (Helm chart v0.1.122) or later for full support
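A quick way to confirm that a cluster and a specific workload meet these requirements is to check the server version and a pod's QoS class directly. This is a minimal sketch; the pod name is a placeholder:
# Confirm the control plane reports v1.33 or later
kubectl version

# Confirm the pod's QoS class (placeholder pod name); in-place resizing preserves this class
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'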
How it works
When Workload Autoscaler generates a recommendation for a pod that supports in-place resizing:
- Compatibility check: The system verifies that the pod, containers, and cluster support in-place resizing
- In-place attempt: If compatible, attempts to adjust CPU and memory requests directly on the running pod
- Restart scenarios: Container restarts may occur when:
  - The container's resize policy requires a restart for the resource being changed (typically memory)
  - The resize enters a pending state due to resource constraints
  - The recommendation includes changes that cannot be applied in-place (such as removing CPU limits)
- Graceful fallback: If in-place resizing fails or isn't supported, the system falls back to the configured apply mode (immediate or deferred pod restart)
- Memory protection: Memory limits are not decreased during in-place resizing to prevent application issues
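Under the hood, these adjustments use the standard Kubernetes resize subresource rather than a rollout of the Pod spec. As a rough illustration of what such a request looks like at the Kubernetes level (the pod name web-0 and container name app are placeholders; Workload Autoscaler performs the equivalent API call for you):
# Illustration only: request an in-place CPU change via the pod's resize subresource
# (pod name "web-0" and container name "app" are placeholders)
kubectl patch pod web-0 --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'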
Pod statuses during resizing
During in-place resizing operations, you may observe certain Pod statuses that appear concerning but are part of normal Kubernetes behavior. These statuses occur due to the asynchronous nature of in-place resizing and standard Kubernetes scheduling.
Normal operating statuses
PodResizePending
The PodResizePending status indicates that a resize request has been submitted to Kubernetes but hasn't yet been applied. This is expected behavior for in-place resizing, which operates asynchronously.
When Workload Autoscaler submits a resize request:
- The resize request enters a pending state while Kubernetes evaluates Node capacity
- If the Node has sufficient resources, the resize completes and the Pod continues running
- If the resize remains pending due to resource constraints, Workload Autoscaler takes action based on your configured mode for applying changes:
  - Immediate mode: Triggers Pod eviction to complete the resize through a restart
  - Deferred mode: Leaves the pending resize until the next natural Pod restart
Workload Autoscaler doesn't check Node capacity before submitting resize requests. Instead, it submits the resize and handles the result. Node capacity can change between submission and application, so this approach allows resizes to succeed when resources become available.
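To observe this state on a pod yourself, look at the pod's conditions, where pending resizes are reported. A minimal check, with a placeholder pod name:
# Show the pod's conditions, including any PodResizePending entry (placeholder pod name)
kubectl get pod <pod-name> -o jsonpath='{.status.conditions}'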
OutOfCPU and OutOfMemory
OutOfCPU and OutOfMemory statuses occur during Pod admission when kubelet rejects a Pod due to insufficient Node resources. These statuses indicate a timing issue during Kubernetes scheduling, not a problem with your workload configuration.
These statuses typically appear when:
- Multiple Pods scale up simultaneously
- The Kubernetes scheduler makes parallel scheduling decisions
- Those scheduling decisions conflict, resulting in more Pods assigned to a Node than can fit
This is a normal race condition in Kubernetes scheduling. The scheduler includes optimizations for handling large Pod bursts, but occasionally these parallel decisions result in temporary admission failures.
Important: These are Failed Pod statuses that Kubernetes automatically handles through retry mechanisms. Provided your workloads have appropriate Pod Disruption Budgets (PDBs) configured, these transient failures won't impact application availability. Kubernetes will reschedule the affected Pods to suitable Nodes.
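If you want to confirm that such admission failures are only transient, you can list pods left in the Failed phase; Kubernetes reschedules the affected workloads onto suitable Nodes:
# List pods that failed admission; replacements are scheduled automatically
kubectl get pods --all-namespaces --field-selector=status.phase=Failed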
Note
OutOfCPU and OutOfMemory occur during admission for newly scheduled Pods, not for Pods already running on a Node during in-place resize operations.
For more information on Pod statuses during resize operations, see Kubernetes documentation.
Configuration
In-place resizing is automatically enabled for eligible workloads when running on Kubernetes v1.33+ clusters. No additional configuration is required at the workload level.
For clusters that support in-place resizing, you can control this behavior through Helm values:
# Enable in-place resizing (default: true)
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
  -n castai-agent \
  --reset-then-reuse-values \
  --set inPlaceResizeEnabled=true

# Disable deferred in-place resizing specifically (default: true when in-place is enabled)
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
  -n castai-agent \
  --reset-then-reuse-values \
  --set inPlaceResizeEnabled=true \
  --set inPlaceResizeDeferredEnabled=false
Limitations
In-place resizing has several limitations that Workload Autoscaler respects:
- Resource types: Only CPU and memory can be resized in-place
- Memory limits: Cannot be decreased without container restart
- QoS class preservation: Original QoS class must be maintained
- Container types: Non-restartable init containers and ephemeral containers cannot be resized in-place
- Swap memory: Pods using swap cannot resize memory without a restart
- Static policies: Pods with static CPU/memory manager policies cannot be resized in-place
For detailed information about Kubernetes in-place resizing limitations, see the official Kubernetes documentation.
When restarts occur
While in-place resizing reduces disruption, container restarts can still occur in the following scenarios:
- Resize policy configuration: Containers with a RestartContainer resize policy for specific resources will restart when those resources change. By default, Kubernetes applies NotRequired for both CPU and memory, but certain configurations or resource changes may require restarts (a configuration sketch follows this list).
- Resource constraints: When the node cannot immediately accommodate the resize request, the request enters a Pending state. If the resize remains pending, Workload Autoscaler triggers a controller-based update (rollout), which causes pod restarts.
- Incompatible changes: Some resource changes cannot be applied in-place due to Kubernetes limitations. For example, removing or significantly modifying CPU limits during the first recommendation application may require a restart.
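For the resize policy scenario above, per-resource behavior is controlled by the container's resizePolicy field. A minimal sketch of setting it on a Deployment (the name my-app and the single-container assumption are placeholders) so that CPU changes apply in place while memory changes restart the container:
# Sketch: set per-resource resize policies on a placeholder Deployment "my-app"
# (CPU resizes apply in place; memory resizes restart the container)
kubectl patch deployment my-app --type json -p '[
  {"op": "add", "path": "/spec/template/spec/containers/0/resizePolicy", "value": [
    {"resourceName": "cpu",    "restartPolicy": "NotRequired"},
    {"resourceName": "memory", "restartPolicy": "RestartContainer"}
  ]}
]'
Note that editing the pod template itself triggers a one-time rollout; the policy then governs how future in-place resizes are applied.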
These restart scenarios are handled gracefully by Workload Autoscaler, which respects Pod Disruption Budgets and follows the configured apply mode's rollout strategy.
