Subnet selection

Feature availability

Feature                                      | EKS | GKE | AKS
Random subnet selection (from available set) | +   | -   | -
Subnet selection by usage*                   | +** | -   | -

📘

Note

* Select the subnet with the greatest number of free IP addresses

** Available if the AWS Cloud CNI is used with specific settings

Available subnets detection

Cluster subnets, together with their IP CIDRs and availability zones, are periodically synced to Cast AI. Based on several rules, the autoscaler decides which subnets to use when constructing in-memory nodes for autoscaling. The selection is influenced by:

  • The pod's node selector for the topology.cast.ai/subnet-id label (see the example below).
  • The pod's node affinity for the topology.cast.ai/subnet-id label.
  • The availability zone of the in-memory node (only subnets in the same zone are chosen).
  • Zone constraints from other sources; for example, the zone of a Persistent Volume affects the zone of the in-memory node.
  • The least allocated zone (by CPU cores) is preferred. If IP allocation calculation is enabled, subnet IP availability is also considered.
  • Among equally allocated zones, one is chosen at random.
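
For example, a pod can be pinned to a specific subnet with a node selector on the topology.cast.ai/subnet-id label. A minimal sketch (the subnet ID below is a placeholder):

    apiVersion: v1
    kind: Pod
    metadata:
      name: subnet-pinned-pod
    spec:
      nodeSelector:
        # Placeholder value; use a subnet ID from your own cluster
        topology.cast.ai/subnet-id: subnet-0123456789abcdef0
      containers:
        - name: app
          image: nginx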

If subnet calculation is supported and all the available subnets are detected to be full, the pod receives an event stating that there is no subnet with enough available IP addresses.
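
To look for this event, describe the pod or list its events (pod name and namespace are placeholders):

    kubectl describe pod <pod-name> -n <namespace>
    kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>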

Zone selection and node distribution

How zone allocation works

When the autoscaler selects availability zones for new nodes, it follows this priority order:

  1. Primary factor: total CPU core allocation; zones with fewer allocated CPU cores are preferred
  2. IP availability check: for EKS clusters with subnet usage calculation enabled, Cast AI verifies that the target zone's subnets have enough available IP addresses for the expected pod density
  3. Fallback mechanism: if the least allocated zone lacks sufficient IP capacity, that zone is removed from the available pool, and the next least allocated zone is evaluated
  4. Final fallback: if no zone has adequate IP capacity, node provisioning is aborted

This means:

  • A zone with fewer CPU cores allocated will be preferred, but only if it has sufficient IP addresses
  • Zone selection automatically falls back through available zones based on CPU allocation until one with adequate IP capacity is found
  • The system prevents provisioning nodes that cannot support the required number of pods due to IP constraints

Understanding zone imbalances

Zone imbalances can occur due to several factors:

  1. Large node bias: Zones containing fewer but larger nodes may appear imbalanced by node count while being balanced by resource allocation
  2. StatefulSet volume constraints: Persistent volumes are zone-specific, which can force related pods (and their bin-packed companions) into the same availability zone
  3. IP allocation considerations: While CPU allocation is the primary factor, subnet IP availability also influences zone selection
  4. IP capacity constraints: Zones may be skipped if their subnets lack sufficient IP addresses, concentrating nodes in zones with higher IP availability

Achieving better zone balance

If you need more even node distribution across zones, consider:

  1. Topology Spread Constraints: Add topology spread constraints to your pod specifications to explicitly request zone distribution:
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: your-app
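
With maxSkew: 1 and whenUnsatisfiable: DoNotSchedule, the scheduler only places a pod where the zone-to-zone difference in matching pod counts stays within 1; a pod that would violate the constraint stays pending rather than landing in an already crowded zone.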
    

Subnet usage calculation

Subnet usage calculation is available only for EKS and when AWS Cloud CNI is used for networking.

Subnet usage is calculated based on CNI settings and the instance type's networking capabilities (maximum ENI count for the instance type and IPv4 address count per ENI).

CNI settings used to calculate used IP addresses:

Name              | Description                                              | Default
WARM_ENI_TARGET   | How many free ENIs should be attached as a reserve.      | 1
WARM_IP_TARGET    | How many free secondary IPs should be kept as a reserve. |
MINIMUM_IP_TARGET | Minimum IP count to be requested when adding a node.     |
MAX_ENI           | Additional cap on the instance type's max ENI count.     |
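
These settings are environment variables on the AWS VPC CNI DaemonSet (typically aws-node in kube-system). A sketch of the relevant fragment; the values below are illustrative, not recommendations:

    # Fragment of the aws-node container spec (kube-system namespace)
    env:
      - name: WARM_IP_TARGET
        value: "5"         # keep 5 free secondary IPs in reserve
      - name: MINIMUM_IP_TARGET
        value: "10"        # request at least 10 IPs when adding a node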

CNI settings that disable subnet usage calculation (the calculation works only while these are unset or false):

Name                               | Supported values
AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG | None or false
ENABLE_POD_ENI                     | None or false
ENABLE_PREFIX_DELEGATION           | None or false
ENABLE_IPv6                        | None or false
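
To check how these are set in your cluster, inspect the CNI DaemonSet environment (assuming the default aws-node name):

    kubectl -n kube-system describe daemonset aws-node | grep -E "WARM_|MINIMUM_IP_TARGET|MAX_ENI|AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG|ENABLE_POD_ENI|ENABLE_PREFIX_DELEGATION|ENABLE_IPv6"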

📘

Zone Selection Impact

When subnet usage calculation is enabled, IP availability directly affects zone selection. Zones with insufficient IP capacity are automatically removed from consideration during the autoscaling process, which may contribute to zone imbalances if some zones consistently have lower IP availability than others.

How the calculation works

The calculation is based on the upstream AWS VPC CNI documentation.

Each instance type in AWS has limits on how many ENIs can be attached and how many IPs each ENI can hold. Cast AI periodically synchronizes those numbers, which the calculation uses.

Some key points:

  • Each ENI uses 1 IP for itself; all other IPs are secondary and can be used for pods, so max IPs for pods per ENI = max IPs per ENI - 1 (see the worked example below).
  • Attaching an ENI always consumes 1 IP, regardless of CNI settings.
  • If WARM_IP_TARGET is specified, WARM_ENI_TARGET is not used.
  • If MAX_ENI is lower than the instance type's maximum ENI count, it overrides the instance setting; otherwise, the instance setting is used.
  • Pods with hostNetwork: true don't get a secondary IP; the host IP is used for communication (for example, the AWS CNI and kube-proxy pods). They are still counted as pods and are capped by the node's pod count constraint.
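
As a worked example, take an m5.large, which supports up to 3 ENIs with 10 IPv4 addresses each: every ENI reserves 1 IP for itself, so at most 3 × (10 - 1) = 27 secondary IPs are available for pods, and with the default WARM_ENI_TARGET=1 the CNI tries to keep one fully spare ENI attached, consuming subnet IPs before pods even need them.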

A detailed description of how WARM_ENI_TARGET, WARM_IP_TARGET, and MINIMUM_IP_TARGET work together can be found in the AWS VPC CNI documentation.

Troubleshooting

Useful commands for investigations

Command to get subnet IP allocation. The calculation assumes the subnet is used only by this Kubernetes cluster; using the same subnets for anything other than this cluster will make this feature work incorrectly. (Worker groups or security groups created with the subnet may hold a few IPs, causing a small difference between the calculated and actual allocation; in some edge cases, this results in a failed node creation instead of a pod event.)

    aws ec2 describe-network-interfaces --filters Name=subnet-id,Values=<subnet-id> > subnet_id.yaml
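
If you only need the counts, these variants may help (assuming AWS CLI v2; <subnet-id> is a placeholder):

    # Free IPs remaining in the subnet, as reported by AWS
    aws ec2 describe-subnets --subnet-ids <subnet-id> --query 'Subnets[0].AvailableIpAddressCount'

    # Number of private IPs currently allocated to ENIs in the subnet
    aws ec2 describe-network-interfaces --filters Name=subnet-id,Values=<subnet-id> \
      --query 'NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress' --output text | wc -w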

Command to print all pods with IP information, sorted by node.

    kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName,POD-IP:.status.podIP,HOST-IP:.status.hostIP --sort-by=.spec.nodeName --all-namespaces

Investigating zone imbalances

If you notice uneven node distribution across zones:

  1. Check CPU allocation per zone:
    kubectl get nodes -o wide --show-labels | grep topology.kubernetes.io/zone
    kubectl describe nodes | grep -E "(Name:|cpu:|zone)"
    
  2. Identify StatefulSets with persistent volumes:
    kubectl get pvc --all-namespaces -o wide
    kubectl get statefulsets --all-namespaces
    
  3. Review large nodes: Look for nodes with significantly higher CPU allocations that might be skewing zone selection.

Common causes of zone imbalance

  • Large instance types: A single large node can make a zone appear "more allocated" than zones with multiple smaller nodes
  • Volume constraints: StatefulSets with zone-specific volumes can concentrate workloads in particular zones
  • Bin-packing effects: When pods are packed together with volume-constrained pods, they inherit zone restrictions
  • IP exhaustion: Zones with insufficient subnet IP addresses are automatically excluded from consideration