In-Place Pod Resizing
Cast AI Workload Autoscaler supports Kubernetes in-place resource resizing for clusters running Kubernetes v1.33+, allowing resource adjustments that minimize pod disruption. In-place resizing attempts to apply resource changes without restarting pods, significantly reducing optimization overhead compared to traditional pod recreation. This feature allows workloads to have their CPU and memory requests dynamically adjusted while maintaining continuous operation in many scenarios.
Kubernetes v1.33+ Feature
In-place pod resizing requires Kubernetes v1.33 or later with the InPlacePodVerticalScaling feature gate enabled (enabled by default in v1.33+).
Key capabilities
- Resource requests are adjusted with minimal disruption—often without pod restarts—maintaining application availability during optimization
- When in-place resizing cannot be applied immediately, the system gracefully falls back to the configured apply mode (immediate or deferred pod restart)
- Automatic protection against memory limit reduction during downscaling to prevent potential application issues, following Kubernetes best practices
- In-place resizing complements existing immediate and deferred scaling modes with reduced disruption
- Leverages the standard Kubernetes in-place resizing API for compatibility across different cluster configurations
This is particularly valuable for production workloads where minimizing disruption is critical, while still achieving the cost and performance benefits of continuous resource optimization.
Learn more about the benefits of in-place rightsizing on our blog How In-Place Pod Resizing Works in Kubernetes and Why Cast AI Makes It Better.
Prerequisites
For in-place resizing to work with Workload Autoscaler:
- Kubernetes version: v1.33 or later
- Operating system: Linux nodes only (Windows pods not supported)
- Quality of Service: Pods must maintain their original QoS class (Guaranteed, Burstable, or BestEffort)
- Workload Autoscaler version: v0.53.0 (Helm chart v0.1.122) or later for full support
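A quick way to confirm that a cluster and a specific workload meet these requirements is to check the server version and a pod's QoS class directly. This is a minimal sketch; the pod name is a placeholder:
# Confirm the control plane reports v1.33 or later
kubectl version

# Confirm the pod's QoS class (placeholder pod name); in-place resizing preserves this class
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'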
How it works
When Workload Autoscaler generates a recommendation for a pod that supports in-place resizing:
- Compatibility check: The system verifies that the pod, containers, and cluster support in-place resizing
- In-place attempt: If compatible, attempts to adjust CPU and memory requests directly on the running pod
- Restart scenarios: Container restarts may occur when:
  - The container's resize policy requires a restart for the resource being changed (typically memory)
  - The resize enters a pending state due to resource constraints
  - The recommendation includes changes that cannot be applied in-place (such as removing CPU limits)
- Graceful fallback: If in-place resizing fails or isn't supported, the system falls back to the configured apply mode (immediate or deferred pod restart)
- Memory protection: Memory limits are not decreased during in-place resizing to prevent application issues
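Under the hood, these adjustments use the standard Kubernetes resize subresource rather than a rollout of the Pod spec. As a rough illustration of what such a request looks like at the Kubernetes level (the pod name web-0 and container name app are placeholders; Workload Autoscaler performs the equivalent API call for you):
# Illustration only: request an in-place CPU change via the pod's resize subresource
# (pod name "web-0" and container name "app" are placeholders)
kubectl patch pod web-0 --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'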
Pod statuses during resizing
During in-place resizing operations, you may observe certain Pod statuses that appear concerning but are part of normal Kubernetes behavior. These statuses occur due to the asynchronous nature of in-place resizing and standard Kubernetes scheduling.
Normal operating statuses
PodResizePending
The PodResizePending status indicates that a resize request has been submitted to Kubernetes but hasn't yet been applied. This is expected behavior for in-place resizing, which operates asynchronously.
When Workload Autoscaler submits a resize request:
- The resize request enters a pending state while Kubernetes evaluates Node capacity
- If the Node has sufficient resources, the resize completes and the Pod continues running
- If the resize remains pending due to resource constraints, Workload Autoscaler takes action based on your configured mode for applying changes:
  - Immediate mode: Triggers Pod eviction to complete the resize through a restart
  - Deferred mode: Leaves the pending resize until the next natural Pod restart
Workload Autoscaler doesn't check Node capacity before submitting resize requests. Instead, it submits the resize and handles the result. Node capacity can change between submission and application, so this approach allows resizes to succeed when resources become available.
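To observe this state on a pod yourself, look at the pod's conditions, where pending resizes are reported. A minimal check, with a placeholder pod name:
# Show the pod's conditions, including any PodResizePending entry (placeholder pod name)
kubectl get pod <pod-name> -o jsonpath='{.status.conditions}'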
OutOfCPU and OutOfMemory
OutOfCPU and OutOfMemory statuses occur during Pod admission when kubelet rejects a Pod due to insufficient Node resources. These statuses indicate a timing issue during Kubernetes scheduling, not a problem with your workload configuration.
These statuses typically appear when:
- Multiple Pods scale up simultaneously
- The Kubernetes scheduler makes parallel scheduling decisions
- Those scheduling decisions conflict, resulting in more Pods assigned to a Node than can fit
This is a normal race condition in Kubernetes scheduling. The scheduler includes optimizations for handling large Pod bursts, but occasionally these parallel decisions result in temporary admission failures.
Important: These are Failed Pod statuses that Kubernetes automatically handles through retry mechanisms. Provided your workloads have appropriate Pod Disruption Budgets (PDBs) configured, these transient failures won't impact application availability. Kubernetes will reschedule the affected Pods to suitable Nodes.
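If you want to confirm that such admission failures are only transient, you can list pods left in the Failed phase; Kubernetes reschedules the affected workloads onto suitable Nodes:
# List pods that failed admission; replacements are scheduled automatically
kubectl get pods --all-namespaces --field-selector=status.phase=Failed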
Note
OutOfCPU and OutOfMemory occur during admission for newly scheduled Pods, not for Pods already running on a Node during in-place resize operations.
For more information on Pod statuses during resize operations, see Kubernetes documentation.
Configuration
In-place resizing is automatically enabled for eligible workloads when running on Kubernetes v1.33+ clusters. No additional configuration is required at the workload level.
For clusters that support in-place resizing, you can control this behavior through Helm values:
# Enable in-place resizing (default: true)
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
  -n castai-agent \
  --reset-then-reuse-values \
  --set inPlaceResizeEnabled=true

# Disable deferred in-place resizing specifically (default: true when in-place is enabled)
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
  -n castai-agent \
  --reset-then-reuse-values \
  --set inPlaceResizeEnabled=true \
  --set inPlaceResizeDeferredEnabled=false
Limitations
In-place resizing has several limitations that Workload Autoscaler respects:
- Resource types: Only CPU and memory can be resized in-place
- Memory limits: Cannot be decreased without container restart
- QoS class preservation: Original QoS class must be maintained
- Container types: Non-restartable init containers and ephemeral containers cannot be resized in-place
- Swap memory: Pods using swap cannot resize memory without a restart
- Static policies: Pods with static CPU/memory manager policies cannot be resized in-place
For detailed information about Kubernetes in-place resizing limitations, see the official Kubernetes documentation.
When restarts occur
While in-place resizing reduces disruption, container restarts can still occur in the following scenarios:
- Resize policy configuration: Containers with a RestartContainer resize policy for specific resources will restart when those resources change. By default, Kubernetes applies NotRequired for both CPU and memory, but certain configurations or resource changes may require restarts (a configuration sketch follows this list).
- Resource constraints: When the node cannot immediately accommodate the resize request, the request enters a Pending state. If the resize remains pending, Workload Autoscaler triggers a controller-based update (rollout), which causes pod restarts.
- Incompatible changes: Some resource changes cannot be applied in-place due to Kubernetes limitations. For example, removing or significantly modifying CPU limits during the first recommendation application may require a restart.
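For the resize policy scenario above, per-resource behavior is controlled by the container's resizePolicy field. A minimal sketch of setting it on a Deployment (the name my-app and the single-container assumption are placeholders) so that CPU changes apply in place while memory changes restart the container:
# Sketch: set per-resource resize policies on a placeholder Deployment "my-app"
# (CPU resizes apply in place; memory resizes restart the container)
kubectl patch deployment my-app --type json -p '[
  {"op": "add", "path": "/spec/template/spec/containers/0/resizePolicy", "value": [
    {"resourceName": "cpu",    "restartPolicy": "NotRequired"},
    {"resourceName": "memory", "restartPolicy": "RestartContainer"}
  ]}
]'
Note that editing the pod template itself triggers a one-time rollout; the policy then governs how future in-place resizes are applied.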
These restart scenarios are handled gracefully by Workload Autoscaler, which respects Pod Disruption Budgets and follows the configured apply mode's rollout strategy.
