Pod mutations
What are pod mutations?
Pod mutations is a Cast AI feature that simplifies Kubernetes workload configuration and helps optimize cluster resource usage. It allows you to define templates that automatically modify pod specifications when they are created, reducing manual configuration overhead and ensuring consistent pod scheduling across your cluster.
Why use pod mutations?
Managing Kubernetes workloads at scale presents several challenges:
- Complex Configuration Requirements: As clusters grow, manually configuring pod specifications becomes increasingly time-consuming and error-prone. Each workload may need specific labels, tolerations, and node selectors to ensure proper scheduling and resource allocation.
- Legacy System Integration: When onboarding existing clusters to Cast AI, workloads sometimes need to be reconfigured to take full advantage of cost optimization features. This traditionally requires updating deployment manifests, which can be automated using pod mutations.
- Resource Fragmentation: Without standardized pod configurations, clusters can become fragmented with too many node groups, leading to inefficient resource utilization and increased costs.
Pod mutations address all of these challenges.
How it works
Pod mutations allow you to define templates that automatically modify pod specifications when they are created. These templates can:
- Apply labels and tolerations
- Configure node selectors and affinities
- Link pods to specific Node Templates
- Consolidate multiple Node Templates
- Set Spot Instance preferences
The pod mutations controller, called the pod mutator, runs in your cluster and monitors pod creation events. When a new pod matches a mutation's configured filters, the controller automatically applies that mutation. Note that only one mutation can be applied to any given pod - if multiple mutations match a pod's filters, the most specific filter match will be used.
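To make the "most specific filter wins" rule concrete, here is an illustrative sketch of how such a selection could work. The filter fields mirror the `objectFilter` shape used later in this page, but the scoring itself is an assumption for illustration, not the controller's actual implementation:

```python
def pick_mutation(pod, mutations):
    """Pick the single mutation to apply to a pod.

    Illustrative sketch only: assumes each mutation has an 'objectFilter'
    with optional 'names' and 'namespaces' lists, and ranks a pod-name
    match as more specific than a namespace-only match. The real
    controller's ranking may differ.
    """
    best, best_score = None, -1
    for m in mutations:
        f = m.get("objectFilter", {})
        names, namespaces = f.get("names"), f.get("namespaces")
        if namespaces and pod["namespace"] not in namespaces:
            continue
        if names and pod["name"] not in names:
            continue
        # A name filter is more specific than a namespace filter.
        score = (2 if names else 0) + (1 if namespaces else 0)
        if score > best_score:
            best, best_score = m, score
    return best

mutations = [
    {"name": "broad", "objectFilter": {"namespaces": ["production"]}},
    {"name": "narrow", "objectFilter": {"names": ["app1"], "namespaces": ["production"]}},
]
```

With both mutations matching a pod named `app1` in `production`, only the narrower one would be applied.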
Installation
Install using the console
- Upon selecting a cluster from the cluster list, head over to Autoscaler --> Pod mutations in the sidebar.
- If you have not installed the `pod-mutator` controller, you will be prompted with a script you need to run in your cluster's cloud shell or terminal.
Install using Helm
- Add the Cast AI Helm repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
- Install the pod mutations controller:
helm upgrade -i --create-namespace -n castai-agent pod-mutator \
castai-helm/castai-pod-mutator \
--set castai.apiUrl="https://api.cast.ai" \
--set castai.apiKey="${API_KEY}" \
--set castai.clusterID="${CLUSTER_ID}"
Advanced installation options
The pod mutator supports the configuration of its webhook reinvocation policy. This controls whether the pod mutator should be reinvoked if other admission plugins modify the pod after the initial mutation.
helm upgrade -i --create-namespace -n castai-agent pod-mutator \
castai-helm/castai-pod-mutator \
--set castai.apiUrl="https://api.cast.ai" \
--set castai.apiKey="${API_KEY}" \
--set castai.clusterID="${CLUSTER_ID}" \
--set webhook.reinvocationPolicy="IfNeeded" # Set to "Never" by default
The `reinvocationPolicy` can be set to:
- `Never` (default): The pod mutator will only be called once during pod admission
- `IfNeeded`: The pod mutator may be called again if other admission plugins modify the pod after the initial mutation
Setting `reinvocationPolicy` to `IfNeeded` is useful when you have multiple admission webhooks that may interact with each other. For example:
- Pod mutator adds its mutations
- Another webhook modifies the pod
- Pod mutator is invoked again to ensure its mutations are properly applied
⚠️ However, if you want changes made by other webhooks to persist, setting `reinvocationPolicy` to `IfNeeded` may be counterproductive, since the pod mutator will override any modifications that fall under its control when it's reinvoked. Consider your specific use case and the interaction between different webhooks in your cluster before changing this setting from its default value.
Creating pod mutations
Pod mutations are either defined through the PodMutations API or created using the Cast AI console. Each mutation consists of:
- A unique name
- Object filters to select targeted pods
- Mutation rules defining what changes to apply
- Node Template configurations (optional)
- Spot instance preferences (optional)
Console example
After installing the `pod-mutator` controller in your cluster, you'll have access to pod mutations in your console:
To create a new mutation template, click Add template in the top-right. This opens a drawer where you can define your mutation's configuration.
- Begin by giving your mutation template a name:

- Then, define the filters by which the mutation candidate pods ought to be discovered by the controller:

- Configure your desired mutation configuration:

- Finally, choose the spot settings most appropriate for this template before hitting Create:

The console UI offers a helping hand when creating mutations by means of tooltips and a live preview of what your configuration will look like:

API example
Here's an example pod mutation API request that applies labels and tolerations to specific workloads:
{
  "objectFilter": {
    "names": [
      "app1",
      "app2"
    ],
    "namespaces": [
      "production"
    ]
  },
  "labels": {
    "environment": "production"
  },
  "spotType": "UNSPECIFIED_SPOT_TYPE",
  "name": "production-mutation",
  "organizationId": "fhytif73-f95f-44de-ad4b-f7898ce5ee42",
  "clusterId": "11111111-1111-1111-1111-111111111111",
  "enabled": true,
  "tolerations": [
    {
      "key": "scheduling.cast.ai/node-template",
      "operator": "Equal",
      "value": "production-template",
      "effect": "NoSchedule"
    }
  ]
}
Use the CreatePodMutation endpoint to experiment with your own pod mutations via API.
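If you script mutation management, a request body like the one above can be built programmatically before sending it to the API. This sketch only constructs and serializes the payload; it deliberately omits the HTTP call, since the exact endpoint URL and authentication header should come from the Cast AI API reference:

```python
import json

# Programmatic construction of a CreatePodMutation request body.
# Field names mirror the example payload shown above; consult the
# Cast AI API reference before sending it.
payload = {
    "name": "production-mutation",
    "enabled": True,
    "objectFilter": {
        "names": ["app1", "app2"],
        "namespaces": ["production"],
    },
    "labels": {"environment": "production"},
    "spotType": "UNSPECIFIED_SPOT_TYPE",
    "tolerations": [{
        "key": "scheduling.cast.ai/node-template",
        "operator": "Equal",
        "value": "production-template",
        "effect": "NoSchedule",
    }],
}
body = json.dumps(payload, indent=2)
```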
Node Template consolidation
One powerful feature of pod mutations is the ability to consolidate multiple Node Templates. This helps reduce cluster fragmentation by allowing pods to schedule across multiple Node Template configurations.
When consolidating Node Templates:
- Specify the Node Templates to consolidate
- The controller converts individual node selectors and tolerations into node affinity rules
- Pods can then schedule on any node created by the specified templates
Example consolidation configuration:
{
  "objectFilter": {
    "namespaces": [
      "production"
    ]
  },
  "name": "production-mutation",
  "nodeTemplatesToConsolidate": [
    "template-1",
    "template-2"
  ]
}
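Conceptually, the generated affinity lets a pod land on nodes created by any of the consolidated templates. The structure below is an illustrative sketch of such a rule using Cast AI's `scheduling.cast.ai/node-template` label; the exact affinity the controller produces may differ:

```python
# Illustrative node affinity matching nodes from either consolidated
# template. This is a sketch of the concept, not necessarily the exact
# structure the controller emits.
affinity = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "scheduling.cast.ai/node-template",
                    "operator": "In",
                    "values": ["template-1", "template-2"],
                }]
            }]
        }
    }
}
```

A single `In` expression like this replaces two separate per-template node selectors, which is what allows pods to schedule across both template configurations.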
Spot Instance configuration
Pod mutations support three Spot Instance modes and can specify a percentage-based distribution between Spot and On-Demand instances.

Spot Distribution Percentage
When configuring spot settings for your pod mutations, you can specify what percentage of pods should receive spot-related configuration versus remaining on on-demand instances.
- A setting of 50% means that approximately half of your pods will receive the selected spot configuration (optional, preferred, or use-only), while the other half will be scheduled on on-demand instances
- The higher the percentage, the more pods will receive spot-related configurations
- The lower the percentage, the more pods will remain on on-demand instances
For example, with a 75% spot distribution setting:
- 75% of pods will be scheduled according to your chosen spot behavior (optional, preferred, or use-only)
- 25% of pods will be scheduled on on-demand instances
The mutation controller makes this determination when pods are created, applying spot-related mutations to the configured percentage of pods while leaving the remainder configured for on-demand instances.
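One way to picture how a controller could hold a split near the target as pods are created is a running-ratio assignment. This is an illustrative sketch only, not Cast AI's actual selection algorithm:

```python
def spot_assignments(n_pods: int, spot_percentage: int) -> list[bool]:
    """Assign each newly created pod to spot (True) or on-demand (False)
    so that the running spot ratio never exceeds the target percentage.
    Illustrative sketch; the real controller's selection may differ."""
    assigned, spot_so_far = [], 0
    for i in range(1, n_pods + 1):
        # Take spot only if doing so keeps spot_so_far/i <= percentage/100.
        take_spot = spot_so_far * 100 < spot_percentage * i
        assigned.append(take_spot)
        spot_so_far += take_spot
    return assigned

split = spot_assignments(100, 75)
```

At a 75% setting this yields a repeating spot-spot-spot-on-demand pattern, so 75 of every 100 pods receive the spot configuration.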
Spot Distribution Options
Combined with the distribution percentage, you can select one of three spot behavior options:
| Mode | Description |
|---|---|
| Spot Instances are optional | Allows scheduling on either Spot or On-Demand instances to fulfill the selected Spot percentage. There is no preference between instance types if both are available. |
| Use only Spot Instances | Strictly maintains the selected Spot/On-Demand ratio. If Spot Instances are unavailable, deployment will fail for the Spot portion. |
| Spot Instances are preferred | Targets the selected Spot percentage with Spot Instances but automatically falls back to On-Demand instances if Spot becomes unavailable. Will attempt to rebalance back to Spot when available. |
Note
The actual distribution may vary slightly from the configured percentage, especially with small pod counts or simultaneous pod creation. The distribution may also drift over time as pods are deleted and recreated through the normal application lifecycle, but the difference should be minimal.
Example Configuration
{
  "name": "production-spot-mutation",
  "organizationId": "org-12345",
  "clusterId": "cluster-67890",
  "enabled": true,
  "objectFilter": {
    "namespaces": [
      "production"
    ]
  },
  "spotType": "PREFERRED_SPOT",
  "spotDistributionPercentage": 75
}
This configuration creates a mutation that:
- Applies to all pods in the "production" namespace
- Sets 75% of pods to use Spot Instances with fallback to On-demand if unavailable
- Keeps 25% of pods on On-demand instances at all times
Combining percentage-based distribution with different spot behavior options allows you to create deployment strategies that balance cost savings with application reliability requirements.
Advanced Configuration with JSON Patch
The Pod Mutations feature supports advanced configuration using JSON Patch, allowing for precise control over pod specifications beyond what's available through the standard UI options.
What is JSON Patch?
JSON Patch is a format for describing changes to a JSON document, defined in RFC 6902. In Kubernetes, it allows for complex modifications to pod specifications through a series of operations such as add, remove, replace, move, copy, and test.
For more information about JSON Patch operations, refer to Kubernetes documentation.
When to Use JSON Patch
Consider using JSON Patch when:
- You need to modify parts of a pod specification not covered by the standard UI options
- You want to perform multiple transformations in a specific order
- You're implementing complex mutation logic that combines adding, removing, and modifying fields
- You need to remove specific elements from arrays or nested structures
Configuring JSON Patch
To configure a JSON Patch:
- In the pod mutation configuration, expand the "JSON Patch (advanced)" section
- Enter your JSON Patch operations in the drawer editor
- Review the patch for errors before applying
Warning
JSON Patch operations take precedence over UI-defined settings. If there's a conflict between your patch operations and UI configurations, the patch operations will be applied.
JSON Patch Structure
A JSON Patch consists of an array of operations, where each operation is an object with the following properties:
- `op`: The operation to perform (`add`, `remove`, `replace`, `move`, `copy`, or `test`)
- `path`: A JSON Pointer to the location in the document where the operation is performed
- `value`: The value to use for the operation (for `add` and `replace`)
- `from`: A JSON Pointer for the `move` and `copy` operations
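To build intuition for how these operations transform a pod spec, here is a minimal, illustrative applier covering `add`, `remove`, `replace`, and `test` on plain dict/list documents. It is a sketch only: real tooling (such as `kubectl patch --type=json` or a full RFC 6902 library) also handles pointer escaping, `move`/`copy`, and error reporting.

```python
import copy

def apply_patch(doc, patch):
    """Apply a minimal subset of RFC 6902 (add, remove, replace, test)
    to a nested dict/list document. Illustrative sketch only."""
    doc = copy.deepcopy(doc)  # never modify the caller's document
    for op in patch:
        parts = [p for p in op["path"].split("/") if p]
        parent = doc
        for key in parts[:-1]:  # walk to the parent container
            parent = parent[int(key)] if isinstance(parent, list) else parent[key]
        last = parts[-1]
        if isinstance(parent, list):
            idx = len(parent) if last == "-" else int(last)  # "-" appends
            if op["op"] == "add":
                parent.insert(idx, op["value"])
            elif op["op"] == "remove":
                del parent[idx]
            elif op["op"] == "replace":
                parent[idx] = op["value"]
            elif op["op"] == "test":
                assert parent[idx] == op["value"], "test op failed"
        else:
            if op["op"] in ("add", "replace"):
                parent[last] = op["value"]
            elif op["op"] == "remove":
                del parent[last]
            elif op["op"] == "test":
                assert parent[last] == op["value"], "test op failed"
    return doc

# First example from this section: add a node selector to a bare spec.
pod = {"spec": {"tolerations": []}}
patched = apply_patch(pod, [{
    "op": "add",
    "path": "/spec/nodeSelector",
    "value": {"scheduling.cast.ai/node-template": "high-performance"},
}])
```

Running the patches from the examples below through such an applier is a quick way to preview their effect before using them in a mutation.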
Common Examples
Add Node Selector
When you need to ensure pods are scheduled on specific nodes with certain characteristics, adding a node selector is the way to go:
[
  {
    "op": "add",
    "path": "/spec/nodeSelector",
    "value": {
      "scheduling.cast.ai/node-template": "high-performance"
    }
  }
]
This patch adds a node selector that directs pods to nodes using the "high-performance" node template, which might be optimized for CPU-intensive workloads.
Replace Toleration Effect
If you need to modify how pods tolerate node taints, you can replace specific fields within existing tolerations:
[
  {
    "op": "replace",
    "path": "/spec/tolerations/0/effect",
    "value": "NoSchedule"
  }
]
This patch changes the effect of the first toleration to "NoSchedule", ensuring pods won't be scheduled on nodes with matching taints rather than potentially being evicted later.
Remove a Specific Array Element
Sometimes, you need to remove specific configuration elements that are no longer needed or might conflict with your intended setup:
[
  {
    "op": "remove",
    "path": "/spec/tolerations/2"
  }
]
This patch removes the third toleration in the tolerations array (indices are zero-based: 0, 1, 2, ...), which might be necessary when transitioning workloads between different node types or environments.
Remove by Key-Value Match
When you need to clean up specific labels or selectors that are no longer relevant:
[
  {
    "op": "remove",
    "path": "/spec/nodeSelector/environment"
  }
]
This patch removes only the "environment" selector while preserving other `nodeSelector` entries.
Remove a Specific Value from an Array
For more complex scenarios where you need to target array elements based on their content rather than position:
[
  {
    "op": "test",
    "path": "/spec/tolerations/0/key",
    "value": "node-role.kubernetes.io/control-plane"
  },
  {
    "op": "remove",
    "path": "/spec/tolerations/0"
  }
]
This two-step patch first verifies that the first toleration matches a specific control plane role, then removes it if the test passes.
Complex Example: Replace Node Affinity and Add Toleration
For comprehensive pod scheduling adjustments that require multiple coordinated changes:
[
  {
    "op": "remove",
    "path": "/spec/affinity"
  },
  {
    "op": "add",
    "path": "/spec/nodeSelector",
    "value": {
      "scheduling.cast.ai/node-template": "custom-template"
    }
  },
  {
    "op": "add",
    "path": "/spec/tolerations/-",
    "value": {
      "key": "scheduling.cast.ai/node-template",
      "operator": "Equal",
      "value": "custom-template",
      "effect": "NoSchedule"
    }
  }
]
This multi-operation patch completely reconfigures pod scheduling by removing any existing node affinity rules, setting a node selector for a custom template, and adding a matching toleration.
JSON Patch Limitations
- JSON Patch operations apply to the pod template, not directly to the running pods
- Some Kubernetes fields are immutable and cannot be changed after creation
- Patches that would result in invalid pod specifications will be rejected
Best practices
- Use meaningful names: Give mutations descriptive names that indicate their purpose, so you can tell what a mutation does without inspecting its configuration.
- Design mutually exclusive filters: Since only one mutation can be applied to a pod, design your filters to clearly separate which pods should receive which mutations. Avoid overlapping filters that could match the same pod.
- Test in non-production: If possible, validate mutation behavior in a test environment.
- Monitor changes: Review the effects of mutations through the Cast AI console to ensure desired outcomes.
Limitations
- Mutations only apply to newly created pods. Similarly, mutation changes don't affect existing pods until they are recreated.
- Only one mutation can be applied to a pod. When multiple mutations have matching filters for a pod, Cast AI selects the mutation with the most specific filter (for example, a filter on pod name is more specific than a filter on namespace). We recommend using mutually exclusive filters rather than relying on this specificity ranking.
- Some pod configurations cannot be modified. Refer to the information above on what can be modified; anything not mentioned is currently beyond the scope of pod mutations.
- Scaling policies are evaluated every 30 seconds. As a result, changes to resource requests or limits may not be applied immediately.
Troubleshooting
Verify controller status
Check if the `pod-mutator` controller is running:
kubectl get pods -n castai-agent -l app=pod-mutator
Check controller logs
View logs for mutation activity:
kubectl logs -n castai-agent -l app=pod-mutator
Common issues
- Mutations not applying: Verify that the object filters match your pods and that the controller is running
- Configuration conflicts: Check for conflicting mutations targeting the same pods
- Invalid mutations: Ensure mutation specifications follow the correct format
- Mutations not applying correctly with multiple webhooks: If you have multiple admission webhooks in your cluster that modify pods, you may need to set `webhook.reinvocationPolicy="IfNeeded"` during installation to ensure the pod mutator can properly apply its mutations after other webhooks make changes. Check the pod mutator logs for any signs of mutation conflicts or ordering issues.
For additional help, contact Cast AI support or visit our community Slack channel.