Managing DaemonSets with CAST AI
Learn about the implications of changing DaemonSets and the effects those changes have on existing nodes.
Background
A DaemonSet in Kubernetes ensures that a specific pod runs on all or selected nodes (using Node Selectors and Node Affinity) in a cluster. It's typically used for background tasks like logging, monitoring, or networking.
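As a minimal illustration (the names, namespace, and image below are placeholders, not part of any specific setup), a log-collection DaemonSet restricted to Linux nodes via a node selector might look like:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector        # hypothetical name
  namespace: logging         # hypothetical namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: log-collector
  template:
    metadata:
      labels:
        app.kubernetes.io/name: log-collector
    spec:
      nodeSelector:          # limits the DaemonSet to matching nodes
        kubernetes.io/os: linux
      containers:
        - name: collector
          image: fluent/fluent-bit:2.2
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
```

Because the controller places one such pod on every matching node, the pod's requests are effectively multiplied across the cluster, which is why changing them has cluster-wide scheduling consequences.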
Generally, CAST AI aims to bin-pack pods as tightly as possible into as few nodes as possible, which can present challenges when increasing DaemonSet requests or adding new DaemonSets.
The problem
When you change a DaemonSet's container requests, the DaemonSet controller starts a rollout. Here's an example flow:
- Node identified: The DaemonSet controller identifies a node that needs an updated pod.
- Delete old pod: The existing pod on that node is deleted.
- Create a new pod: With the updated container requests, a new pod is created on the same node (using node affinity to ensure it is scheduled on the correct node).
- Repeat for each node: This process is repeated sequentially for all nodes where the DaemonSet is running.
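The pace of this delete-then-recreate cycle is governed by the DaemonSet's updateStrategy. A sketch of the default behavior, made explicit, which limits the rollout to one node's pod at a time:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # default: at most one node's pod is down during the rollout
```

On recent Kubernetes versions, you can instead set maxSurge (with maxUnavailable: 0) so the new pod is created before the old one is deleted, though that requires spare room on the node for both pods at once.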
Imagine your nodes have 99% CPU or memory utilization. There's a high chance that when you increase the requests, the new DaemonSet pods won't fit and will stay in the Pending state. If your DaemonSets are providing critical functionality, you might experience downtime.
The same applies to new DaemonSets. New pods might not fit into existing nodes if their resource utilization is high.
Prerequisites
- Basic understanding of Kubernetes DaemonSets and resource management
- Familiarity with CAST AI's rebalancing feature
- Access to modify cluster resources and CAST AI settings
Solution 1: Rebalancing
One possible solution is to rebalance your cluster or just the nodes where the DaemonSets don't fit. CAST AI takes DaemonSet requests into consideration and will create the right-sized nodes to accommodate the new or changed DaemonSet pods.
This solution is viable if you're dealing with a new DaemonSet or the DaemonSet isn't critical and you can tolerate some pods being unavailable temporarily.
Solution 2: Using priority classes
Another solution is more complex but suited to situations where you can't afford to have your DaemonSet pods go down: adding the system-cluster-critical priority class to your DaemonSet. If the updated DaemonSet pods don't fit when recreated, the scheduler evicts lower-priority pods to make room for them.
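If you'd rather not grant workloads the built-in system-cluster-critical class, a custom high-value PriorityClass works the same way for preemption (the name, value, and description below are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-critical                  # illustrative name
value: 1000000                              # user-defined classes must stay at or below 1 billion
globalDefault: false
preemptionPolicy: PreemptLowerPriority      # evict lower-priority pods when this pod can't fit
description: "Priority for DaemonSets that must not stay Pending"
```

A custom class referenced via priorityClassName doesn't require the ResourceQuota gate described below; that restriction applies to the built-in system-* classes.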
First, you have to define a ResourceQuota that allows pods in your namespace to use the priority class:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: critical-daemonsets
  namespace: your-namespace
spec:
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-cluster-critical
Then, you can add it to your DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: critical-daemonset
  namespace: your-namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: critical-daemonset
  template:
    metadata:
      labels:
        app.kubernetes.io/name: critical-daemonset
    spec:
      priorityClassName: system-cluster-critical
      containers:
        - image: nginx
          name: nginx
          resources:
            limits:
              cpu: 500m
              memory: 128Mi
            requests:
              cpu: 500m
              memory: 128Mi
Conclusion
Managing DaemonSet resources in a CAST AI-optimized cluster requires considering the impact on node utilization and overall cluster efficiency. Whether you choose to rebalance your cluster or use priority classes, monitoring the effects of these changes and adjusting your strategy as needed is crucial.
Whichever solution you choose, adding new DaemonSets or changing the resources of existing ones can lead to cluster inefficiencies. Rebalancing the cluster after such changes is always recommended to ensure that your nodes are right-sized.
Read more about our Rebalancing or Autoscaling features, or brush up on the Kubernetes concepts referenced in this article.