Node-aware DaemonSet sizing

Early Access Feature

This feature is in early access. It may undergo changes based on user feedback and continued development. We recommend testing in non-production environments first and welcome your feedback to help us improve.

Some workloads are hard to size with a single static request. DaemonSets are the most common case, but the same applies to any pod whose footprint naturally scales with the node it runs on (log shippers, metric agents, CSI drivers, service meshes). A request sized for a large node wastes capacity on a small one. A request sized for a small node starves the pod on a large one, where it can be throttled or evicted.

The node allocatable percentage resource strategy lets you express requests as a percentage of the node's allocatable CPU and memory, resolved at pod admission time. Workload Autoscaler reads the percentage from the workload's resource recommendation and looks up the node the pod is bound to. It then rewrites the pod's requests and limits to concrete values before the pod is admitted.

The strategy is currently only configurable via workload annotations. Scaling policy and UI support will follow.

How it works

When you enable the strategy on a workload, Workload Autoscaler:

  1. Picks up the configuration on the next reconciliation and stores the percentages alongside the workload's existing recommendation, together with a static fallback request and limit derived from observed usage.
  2. As each pod is admitted, reads the node assigned to the pod and computes the request as a percentage of that node's allocatable CPU and memory:
    • cpu request = node allocatable CPU × cpuPercent / 100
    • memory request = node allocatable memory × memoryPercent / 100
  3. Clamps the resolved request to the workload's min and max constraints (if set), then derives the limit from the clamped request using the workload's existing limit strategy (multiplier, noLimit, or maintainRatio in annotations).
  4. Stamps the resolved request and limit into the pod spec.
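
Steps 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not Workload Autoscaler's code. The names `resolve_resource` and `LimitStrategy` are hypothetical. Quantities are plain floats in base units (cores or bytes). The maintainRatio branch assumes that option preserves the manifest's original limit-to-request ratio, which this page does not spell out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LimitStrategy:
    # Hypothetical model of the annotation's limit options.
    type: str                    # "multiplier", "noLimit", or "maintainRatio"
    multiplier: float = 1.0      # used only when type == "multiplier"
    original_ratio: float = 1.0  # ASSUMED manifest limit/request ratio for "maintainRatio"


def resolve_resource(
    allocatable: float,
    percent: float,
    limit_strategy: LimitStrategy,
    min_request: Optional[float] = None,
    max_request: Optional[float] = None,
) -> Tuple[float, Optional[float]]:
    """Return (request, limit) for one resource; a limit of None means no limit."""
    # Step 2: take the configured percentage of the node's allocatable amount.
    request = allocatable * percent / 100
    # Step 3a: clamp the request to the workload's min/max constraints, if set.
    if min_request is not None:
        request = max(request, min_request)
    if max_request is not None:
        request = min(request, max_request)
    # Step 3b: derive the limit from the clamped request, not the raw one.
    if limit_strategy.type == "multiplier":
        limit = request * limit_strategy.multiplier
    elif limit_strategy.type == "noLimit":
        limit = None
    elif limit_strategy.type == "maintainRatio":
        limit = request * limit_strategy.original_ratio
    else:
        raise ValueError(f"unsupported limit strategy: {limit_strategy.type}")
    return request, limit
```

For example, `resolve_resource(4, 5, LimitStrategy("multiplier", 1.5), min_request=0.5)` clamps the raw 0.2 cores up to 0.5 before applying the multiplier. The derived limit is therefore 0.75 cores, not 0.3.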

If the target node cannot be determined at admission — for example, when an unbound pod that doesn't pin a node via nodeAffinity is admitted before scheduling — Workload Autoscaler applies the static fallback request and limit instead. Pods are never rejected.
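
The node-lookup rule can be sketched as a function over a pod object parsed into a Python dict. The function name `target_node` is hypothetical, and this is an illustration of the rule as this page states it, not the product's implementation. The function returns a node name only when `spec.nodeName` is set, or when every required nodeAffinity term pins the pod by `metadata.name` to the same single node. The latter is the shape Kubernetes' DaemonSet controller writes into each DaemonSet pod. Returning `None` corresponds to the static fallback.

```python
from typing import Optional


def target_node(pod: dict) -> Optional[str]:
    """Return the node a pod is pinned to, or None if it can't be determined."""
    spec = pod.get("spec") or {}
    # Already bound, or explicitly assigned, to a node.
    if spec.get("nodeName"):
        return spec["nodeName"]
    affinity = spec.get("affinity") or {}
    node_affinity = affinity.get("nodeAffinity") or {}
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution") or {}
    terms = required.get("nodeSelectorTerms") or []
    names = set()
    for term in terms:
        # Terms are ORed, so every term must restrict by node name.
        pinned = [
            f for f in term.get("matchFields") or []
            if f.get("key") == "metadata.name" and f.get("operator") == "In"
        ]
        if not pinned:
            return None  # this term could match other nodes
        for f in pinned:
            names.update(f.get("values") or [])
    # Exactly one candidate node means the pod is pinned.
    return names.pop() if len(names) == 1 else None
```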

The strategy fully overrides the requests and limits that Workload Autoscaler would otherwise apply for the configured resource.

Not compatible with keepLimits

If a workload's policy keeps existing limits (keepLimits in annotations) for CPU or memory, the percentage strategy is not applied to that resource — the workload falls back to the recommender's request and the manifest's original limit. Use multiplier, noLimit, or maintainRatio instead.
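
The per-resource decision can be sketched as follows. A resource is sized by percentage only when its percentage is set and its policy does not keep existing limits. The function name `percentage_resources` is hypothetical. The sketch assumes the annotation's YAML has already been parsed into a dict. `keep_limits` is a hypothetical set naming the resources whose policy uses keepLimits, however your policy expresses that.

```python
def percentage_resources(config: dict, keep_limits: set) -> dict:
    """Report, per resource, whether the percentage strategy would size it.

    `config` is the parsed workloads.cast.ai/configuration value; `keep_limits`
    is a hypothetical set such as {"memory"} naming resources that keep limits.
    """
    strategy = (config.get("vertical") or {}).get("resourceStrategy") or {}
    if strategy.get("type") != "nodeAllocatablePercentage":
        return {"cpu": False, "memory": False}
    percents = strategy.get("nodeAllocatablePercentage") or {}
    result = {}
    for resource, key in (("cpu", "cpuPercent"), ("memory", "memoryPercent")):
        # No percentage: the resource follows the rest of the policy.
        # keepLimits: the percentage is skipped, and the recommender's request
        # plus the manifest's original limit are used instead.
        result[resource] = key in percents and resource not in keep_limits
    return result


config = {
    "vertical": {
        "resourceStrategy": {
            "type": "nodeAllocatablePercentage",
            "nodeAllocatablePercentage": {"cpuPercent": 5, "memoryPercent": 2},
        }
    }
}
print(percentage_resources(config, keep_limits={"memory"}))
# {'cpu': True, 'memory': False}
```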

Compatibility

Minimum component versions required for node-aware DaemonSet sizing:

Component | Minimum version
castai-workload-autoscaler | v0.105.0
castai-agent | v0.123.1

Note

The percentage is resolved at pod admission. Existing pods keep whatever requests and limits they were admitted with — they pick up new values the next time they're recreated.

Configuration

Configure the strategy via the workloads.cast.ai/configuration annotation on the workload (Deployment, StatefulSet, DaemonSet, etc.):

metadata:
  annotations:
    workloads.cast.ai/configuration: |
      vertical:
        resourceStrategy:
          type: nodeAllocatablePercentage
          nodeAllocatablePercentage:
            cpuPercent: 5
            memoryPercent: 2

You can set cpuPercent only, memoryPercent only, or both. The resource that is not configured falls back to whatever the rest of the Workload Autoscaler policy dictates (target utilization, recommender output, and so on).

Limits are derived from the resolved request using the workload's existing CPU and memory limit strategy. See Annotations reference for the available limit options.

Settings reference

Field | Type | Required | Allowed values | Description
resourceStrategy.type | string | Yes | nodeAllocatablePercentage | Selects the strategy.
resourceStrategy.nodeAllocatablePercentage.cpuPercent | float | At least one of cpuPercent or memoryPercent | (0, 100] | Percentage of node-allocatable CPU to request.
resourceStrategy.nodeAllocatablePercentage.memoryPercent | float | At least one of cpuPercent or memoryPercent | (0, 100] | Percentage of node-allocatable memory to request.
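
The rules in the settings table can be checked with a small validator like the one below. The function `validate` and its error messages are hypothetical. The sketch mirrors the table only and does not reproduce how Workload Autoscaler itself reports invalid configurations. It takes the `resourceStrategy` block as a dict.

```python
def validate(strategy: dict) -> list:
    """Return a list of problems with a resourceStrategy block (empty if valid)."""
    errors = []
    if strategy.get("type") != "nodeAllocatablePercentage":
        errors.append("resourceStrategy.type must be nodeAllocatablePercentage")
    percents = strategy.get("nodeAllocatablePercentage") or {}
    # At least one of the two percentages must be present.
    present = [k for k in ("cpuPercent", "memoryPercent") if k in percents]
    if not present:
        errors.append("set at least one of cpuPercent or memoryPercent")
    for key in present:
        value = percents[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{key} must be a number")
        elif not 0 < value <= 100:
            errors.append(f"{key} must be in (0, 100], got {value}")
    return errors
```

The range is half-open: 0 is rejected because a zero request is meaningless, while 100 is allowed.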

Example

Consider a DaemonSet annotated to request 5% of node-allocatable CPU and 2% of node-allocatable memory, with limits set to 1.5× the resolved request:

metadata:
  annotations:
    workloads.cast.ai/configuration: |
      vertical:
        resourceStrategy:
          type: nodeAllocatablePercentage
          nodeAllocatablePercentage:
            cpuPercent: 5
            memoryPercent: 2
        cpu:
          limit:
            type: multiplier
            multiplier: 1.5
        memory:
          limit:
            type: multiplier
            multiplier: 1.5

On a node with 16 vCPU and 64 GiB allocatable, each pod is sized to:

Resource | Request | Limit
CPU | 800m | 1200m
Memory | ~1.28 GiB | ~1.92 GiB

On a node with 4 vCPU and 16 GiB allocatable, each pod is sized to:

Resource | Request | Limit
CPU | 200m | 300m
Memory | ~327 MiB | ~491 MiB
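
The arithmetic behind both tables can be reproduced with a few lines of Python. This is a worked check, not product code. The helper name `size_pod` is hypothetical, and it hard-codes this example's 5% CPU, 2% memory, and 1.5× multiplier, with no min/max constraints.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def size_pod(alloc_cores, alloc_bytes, cpu_pct=5.0, mem_pct=2.0, multiplier=1.5):
    """Return (CPU request m, CPU limit m, memory request MiB, memory limit MiB)."""
    cpu_req_m = alloc_cores * 1000 * cpu_pct / 100    # cores -> millicores
    mem_req_mib = alloc_bytes * mem_pct / 100 / MIB   # bytes -> MiB
    return cpu_req_m, cpu_req_m * multiplier, mem_req_mib, mem_req_mib * multiplier


for cores, gib in ((16, 64), (4, 16)):
    cpu_req, cpu_lim, mem_req, mem_lim = size_pod(cores, gib * GIB)
    print(f"{cores} vCPU / {gib} GiB allocatable: "
          f"CPU {cpu_req:.0f}m / {cpu_lim:.0f}m, "
          f"memory {mem_req:.2f} MiB / {mem_lim:.2f} MiB")
```

On the larger node this gives 1310.72 MiB and 1966.08 MiB, which are exactly 1.28 GiB and 1.92 GiB. On the smaller node it gives 327.68 MiB and 491.52 MiB, which the table above shows as ~327 MiB and ~491 MiB.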

The same workload definition produces different requests on different nodes — no extra configuration, no per-node policies.

Limitations

  • CPU and memory only. Ephemeral storage is not supported.
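  • Not compatible with keepLimits for the same resource. If a workload combines the percentage strategy with keepLimits on the same resource, Workload Autoscaler does not apply the percentage to that resource. Instead, it falls back to the recommender's request and the manifest's original limit. Use multiplier, noLimit, or maintainRatio instead.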
  • Target node must be known at admission. Workload Autoscaler only resolves the percentage when the pod has spec.nodeName set, or when a nodeAffinity rule pins it to a single node by metadata.name. DaemonSet pods always meet this requirement, because the DaemonSet controller adds a nodeAffinity term matching the target node by metadata.name when it creates each pod. Deployments, StatefulSets, and other workloads admit their pods before scheduling. Those pods get the static fallback request and limit unless they pin a node via nodeAffinity.
  • Resolved per pod, not retroactively. Resizing a node in place does not resize the pods already running on it — they pick up new values the next time they're recreated.
  • Annotations only. There is no UI or scaling policy support yet; configure the strategy on each workload directly.

Troubleshooting

  • Pod has the static fallback request, not the percentage-derived one. Workload Autoscaler couldn't determine the target node at admission. Confirm the pod has spec.nodeName set or a nodeAffinity rule that pins it to a single node by metadata.name. Check the Workload Autoscaler logs for fallback messages.
  • The percentage strategy isn't being applied at all. Check that the workload doesn't combine the strategy with keepLimits on the same resource — if it does, Workload Autoscaler falls back to the standard recommendation for that resource. Either remove keepLimits or remove the percentage for that resource.
  • Limits look higher than expected. Limits are derived from the resolved request — that is, after min/max constraints are applied. If you set a min of 500m and a limit multiplier of 1.5, a node where 5% would be 200m still produces a request of 500m and a limit of 750m.

See also