Rightsizing recommendations and Workload Autoscaling

How long does it take to start generating enough data to create useful recommendations?

About 30 minutes of data is enough for CAST AI to start generating useful recommendations.



Could you please clarify whether pods will still need to restart after the optimization is applied?

Yes, pods currently still need to restart for the optimization to take effect. In-place resizing support is on its way.

The underlying Kubernetes feature is available in alpha from 1.27 and in beta from 1.28. Read more: Kubernetes 1.27: In-place Resource Resize for Kubernetes Pods (alpha).
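Once in-place resize is available in a cluster (behind the `InPlacePodVerticalScaling` feature gate), a pod can declare per-resource resize behavior. A minimal sketch using the alpha API shape from Kubernetes 1.27; field names may change as the feature matures, and the image and resource values are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
    - name: app
      image: nginx            # placeholder image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired       # resize CPU in place, no restart
        - resourceName: memory
          restartPolicy: RestartContainer  # memory changes restart the container
      resources:
        requests:
          cpu: "250m"
          memory: "128Mi"
```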



Does our automatic workload rightsizing take HPA into account?

No, CAST AI's automatic workload rightsizing doesn't currently take HPA into account.



With Workload Autoscaler, is the castai-agent VPA still needed?

Currently, it's still needed. The agent is a special case for the Workload Autoscaler because the agent is its source of metrics: if the agent dies, the Workload Autoscaler is 'blind'. Any other workload in the cluster would eventually be upscaled based on incoming metrics.



Is HPA-aware Workload Autoscaler support on the roadmap? Or is there a potential workaround?

Workload Autoscaler will soon implement a way to scale pods not only vertically based on resource usage but also horizontally.



Are there any plans for Workload Autoscaler to manage DaemonSets and StatefulSets?

Yes, that is part of our roadmap and one of our milestones.



If I turn on Workload Autoscaler for a deployment and set CPU overhead to 1% and memory overhead to 10%, will it automatically add more pods to the deployment?

No. Currently, the Workload Autoscaler works by modifying pod requests when pods are recreated (at scheduling time); it does not change the number of replicas in the deployment. For horizontal scaling on custom metrics, we recommend using KEDA.
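For horizontal scaling alongside the Workload Autoscaler's vertical adjustments, a KEDA `ScaledObject` can drive replica counts from a custom metric. A minimal sketch; the deployment name, Prometheus address, and query below are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app                 # placeholder deployment name
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder
        query: sum(rate(http_requests_total{app="my-app"}[2m]))
        threshold: "100"         # add a replica per 100 req/s
```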



Is there a way to configure the Workload Autoscaler declaratively at the workload and namespace level?

Currently, CAST AI doesn't support this.



Is it possible to adjust limits with Workload Autoscaler?

We currently set memory limits to 1.5 × requests. This gives our platform time to detect any memory increase in the workload and adjust recommendations before the workload is Out-Of-Memory killed.
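The relationship above is easy to sketch. A hypothetical helper, not CAST AI code, illustrating the 1.5× rule:

```python
def memory_limit_mib(request_mib: float, multiplier: float = 1.5) -> float:
    """Derive the memory limit from the request (limits = 1.5 * requests)."""
    return request_mib * multiplier


# A 512 MiB request yields a 768 MiB limit, leaving headroom for the
# platform to react to memory growth before an OOM kill.
print(memory_limit_mib(512))  # -> 768.0
```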



What does the red "!" icon on the Workload Autoscaler page mean?

The error usually occurs when the CAST AI workload autoscaler fails to perform an action during workload optimization, often due to an unresponsive cluster controller.



What time range does the workload autoscaler use for its recommendations?

The workload autoscaler bases its recommendations on metrics collected over the past 24 hours. It continuously observes pod metrics exposed by the metrics-server to generate recommendations.



Does the workload autoscaler allow configuring weights for specific days in its recommendations?

Currently, the workload autoscaler does not allow you to set weights for certain days or exclude weekends. It continuously gathers metrics and generates recommendations based on historical usage over the past 24 hours.



Can I exclude specific days, such as weekends, from the workload autoscaler to turn it off during those times?

No days are excluded from the workload autoscaler's observations. It gathers metrics continuously from the time it's installed and builds confidence in its recommendations over time, using historical usage data from the past 24 hours.



What happens when you turn off workload autoscaler on a workload?

When disabled, the workload autoscaler reverts the workload to the original requests set in the deployment manifest.



Which logs are available for the workload autoscaler, and how can we verify that it is functioning properly for an enabled application?

You can check the event logs for any recent entries related to WOOP operations or errors. Additionally, you can review the logs from the workload-autoscaler pod to observe WOOP’s performance and any issues it may be encountering during its operation.
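Assuming the default component names (the exact namespace and deployment name may differ in your cluster), those checks might look like:

```
# Recent events related to workload optimization
kubectl get events -n castai-agent --sort-by=.lastTimestamp

# Logs from the workload-autoscaler pod
kubectl logs -n castai-agent deployment/castai-workload-autoscaler --tail=100
```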



What is the impact of annotations on UI configuration settings?

When annotations are applied to a workload, the corresponding UI configuration settings are disabled. If someone attempts to update the UI after annotations have been applied, those UI changes will not take effect.



Does Woop process applications differently on spot instances versus on-demand instances?

No, WOOP processes applications the same way on spot and on-demand instances.



Can WOOP be applied to workloads running on Auto Scaling Group (ASG) nodes?

Yes, WOOP can be applied to applications running on ASG nodes.



How frequently can WOOP change a recommendation within an hour?

WOOP can change a recommendation up to 240 times per hour, or roughly once every 15 seconds.



What settings are available to configure the workload autoscaler from the console?

The workload autoscaler can be configured with the following settings:

  • Automation: Toggle on/off to apply recommendations automatically or just generate them.
  • Scaling Policy: Choose a policy from the available options.
  • Recommendation Percentile: Set the target percentile for a recommendation based on the last day’s usage.
  • Overhead: Set additional resource allocation, defaulting to 10% for memory and 0% for CPU.
  • Optimization Threshold: Define the minimum difference required between current and recommended requests for immediate application.
  • Workload Autoscaler Constraints: Establish minimum and maximum resource limits.
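The Overhead and Optimization Threshold settings compose as one might expect. A hypothetical sketch of the arithmetic, not CAST AI's actual implementation:

```python
def recommended_request(usage_percentile: float, overhead: float) -> float:
    """Add the configured overhead on top of the observed usage percentile."""
    return usage_percentile * (1 + overhead)


def should_apply(current: float, recommended: float, threshold: float) -> bool:
    """Only apply a new recommendation if the relative change is at least
    the configured Optimization Threshold."""
    return abs(recommended - current) / current >= threshold


# 10% memory overhead on top of a 1000 MiB usage percentile: ~1100 MiB.
rec = recommended_request(1000, 0.10)
# With a 5% threshold, a ~10% change is applied but a 3% change is not.
print(should_apply(1000, rec, 0.05))   # -> True
print(should_apply(1000, 1030, 0.05))  # -> False
```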


How long will a CPU run high before it is scaled up?

Under normal conditions, scaling up occurs every 30 minutes. During a surge (CPU usage above the configured percentile, an OOM event, or a settings change), scaling happens immediately, subject to the Optimization Threshold.



What happens if WOOP detects an Out Of Memory (OOM) condition?

When an OOM condition is detected, WOOP immediately increases the memory overhead by a fixed amount.



Does WOOP support sidecar pods?

WOOP does not currently support sidecar pods by default. This decision was made to avoid complexity, as sidecar containers can be dynamically added, and we do not always detect their addition in real time. However, support for sidecar pods can be added per customer request.