Pod mutations
What are pod mutations?
Pod mutations are a Cast AI feature that simplifies Kubernetes workload configuration and helps optimize cluster resource usage. It allows you to define templates that automatically modify pod specifications when they are created, reducing manual configuration overhead and ensuring consistent pod scheduling across your cluster.
Why use pod mutations?
Managing Kubernetes workloads at scale presents several challenges:
- Complex Configuration Requirements: As clusters grow, manually configuring pod specifications becomes increasingly time-consuming and error-prone. Each workload may need specific labels, tolerations, and node selectors to ensure proper scheduling and resource allocation.
- Legacy System Integration: When onboarding existing clusters to Cast AI, workloads sometimes need to be reconfigured to take full advantage of cost optimization features. This traditionally requires updating deployment manifests, which can be automated using pod mutations.
- Resource Fragmentation: Without standardized pod configurations, clusters can become fragmented with too many node groups, leading to inefficient resource utilization and increased costs.
Pod mutations address all of these challenges.
How it works
Pod mutations allow you to define templates that automatically modify pod specifications when they are created. These templates can:
- Apply labels and tolerations
- Configure node selectors and affinities
- Link pods to specific Node Templates
- Consolidate multiple Node Templates
- Set Spot Instance preferences
The pod mutations controller, called the pod mutator, runs in your cluster and monitors pod creation events. When a new pod matches a mutation's configured filters, the controller automatically applies that mutation. Note that only one mutation can be applied to any given pod - if multiple mutations match a pod's filters, the most specific filter match will be used.
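For illustration, here is a minimal sketch of what a matching mutation can do to a pod spec at admission time, assuming a mutation that adds an environment label, a Node Template node selector, and a matching toleration (these mirror the API and node-selector examples later on this page). The workload names are hypothetical, and the exact changes depend on how the mutation is configured.

# Before admission: pod spec fragment as submitted (hypothetical workload)
metadata:
  labels:
    app: web
spec:
  containers:
    - name: web
      image: nginx:1.27
---
# After admission: the same fragment once the pod mutator applies a matching mutation
metadata:
  labels:
    app: web
    environment: production                                  # label added by the mutation
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: production-template    # links the pod to a Node Template
  tolerations:
    - key: scheduling.cast.ai/node-template
      operator: Equal
      value: production-template
      effect: NoSchedule                                      # toleration added by the mutation
  containers:
    - name: web
      image: nginx:1.27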
Installation
Install using the console
- Upon selecting a cluster from the cluster list, head over to Autoscaler --> Pod mutations in the sidebar.
- If you have not installed the pod-mutator controller, you will be prompted with a script you need to run in your cluster's cloud shell or terminal.
Install using Helm
- Add the Cast AI Helm repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
- Install the pod mutations controller:
helm upgrade -i --create-namespace -n castai-agent pod-mutator \
castai-helm/castai-pod-mutator \
--set castai.apiUrl="https://api.cast.ai" \
--set castai.apiKey="${API_KEY}" \
--set castai.clusterID="${CLUSTER_ID}"
Note
Prior to Pod Mutator version v0.0.26, an additional parameter --set castai.organizationID="${ORGANIZATION_ID}" was required. If you're pinned to a Pod Mutator version older than v0.0.26, you'll still need to include this parameter.
Advanced installation options
The pod mutator supports the configuration of its webhook reinvocation policy. This controls whether the pod mutator should be reinvoked if other admission plugins modify the pod after the initial mutation.
helm upgrade -i --create-namespace -n castai-agent pod-mutator \
castai-helm/castai-pod-mutator \
--set castai.apiUrl="https://api.cast.ai" \
--set castai.apiKey="${API_KEY}" \
--set castai.clusterID="${CLUSTER_ID}" \
--set webhook.reinvocationPolicy="IfNeeded" # Set to "Never" by default
Note
Prior to Pod Mutator version v0.0.26, an additional parameter --set castai.organizationID="${ORGANIZATION_ID}" was required. If you're pinned to a Pod Mutator version older than v0.0.26, you'll still need to include this parameter.
The reinvocationPolicy can be set to:
- Never (default): The pod mutator will only be called once during pod admission
- IfNeeded: The pod mutator may be called again if other admission plugins modify the pod after the initial mutation
Setting reinvocationPolicy to IfNeeded is useful when you have multiple admission webhooks that may interact with each other. For example:
- Pod mutator adds its mutations
- Another webhook modifies the pod
- Pod mutator is invoked again to ensure its mutations are properly applied
However, if you want changes made by other webhooks to persist, setting reinvocationPolicy to IfNeeded may be counterproductive since the pod mutator will override any modifications that fall under its control when it's reinvoked. Consider your specific use case and the interaction between different webhooks in your cluster before changing this setting from its default value.
Creating pod mutations
Pod mutations can be defined through multiple methods:
- Using the Cast AI console interface
- Via the PodMutations API
- As Kubernetes Custom Resources using Terraform or other Kubernetes management tools
Each mutation consists of:
- A unique name
- Object filters to select targeted pods
- Mutation rules defining what changes to apply
- Node Template configurations (optional)
- Spot Instance preferences (optional)
Object filters and targeting
Label vs annotation targeting
Pod mutations only work with labels, not annotations. When configuring object filters, ensure you use labels to target pods.
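For example, a pod carrying the key-value pair below as a label can be matched by a mutation's object filter, while the identical pair set only as an annotation would not be. The pod and key names here are illustrative.

apiVersion: v1
kind: Pod
metadata:
  name: billing-worker
  labels:
    team: billing            # object filters can match this label
  annotations:
    team: billing            # annotations are NOT considered by object filters
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]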
Workload type targeting
Pod mutations target workloads based on their Kubernetes kind. Use the following kinds for different workload types:
| Workload Type | Kind to Use | Description |
|---|---|---|
| Jobs | Job | For batch processing workloads |
| CronJobs | CronJob | For scheduled recurring jobs |
| Bare Pods | Pod | For standalone pods without a controller |
| Deployments | Deployment | For applications with multiple replicas |
| StatefulSets | StatefulSet | For stateful applications |
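For instance, a filter can restrict a mutation to batch workloads by kind. The sketch below uses the field names from the PodMutation Custom Resource shown in the Terraform example further down this page; the namespace is illustrative.

# Illustrative excerpt of a PodMutation filter targeting batch workloads by kind;
# field names follow the PodMutation Custom Resource shown in the Terraform example below.
spec:
  filter:
    workload:
      namespaces: ["batch"]       # hypothetical namespace
      kinds: ["Job", "CronJob"]   # match Jobs and CronJobs only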
Label placement for Deployments
When targeting Deployments with labels, place the label at the pod template level (spec.template.metadata.labels), not at the Deployment level (metadata.labels).
Correct placement:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-replica-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        single-replica: "true" # Place label here (pod template)

Multiple mutations for complex scenarios
Pod mutation selection works as an AND operation between namespaces and labels, not OR. For scenarios where you need to target workloads based on either namespace OR labels, create separate mutations.
Example: Target both namespace-based AND label-based workloads
Create two separate mutations:
Mutation 1 - Namespace-based:
{
"name": "system-namespaces",
"objectFilter": {
"namespaces": ["kube-system", "argocd"]
},
"nodeSelector": {
"scheduling.cast.ai/node-template": "system-workloads"
}
}

Mutation 2 - Label-based:
{
"name": "specific-labels",
"objectFilter": {
"labels": {
"app.kubernetes.io/name": "castai-agent"
}
},
"nodeSelector": {
"scheduling.cast.ai/node-template": "system-workloads"
}
}

Console example
After installing the pod-mutator controller in your cluster, the Pod mutations page becomes available in your console.
To create a new mutation template, click Add template in the top-right. This opens a drawer where you can define your mutation's configuration.
- Begin by giving your mutation template a name:
- Then, define the filters the controller uses to identify candidate pods for the mutation:
- Configure the desired mutation settings:
- Finally, choose the Spot settings most appropriate for this template before hitting Create:
The console UI assists you when creating mutations with tooltips and a live preview of what your configuration will look like.
API example
Here's an example pod mutation API request that applies labels and tolerations to specific workloads:
{
"objectFilter": {
"names": [
"app1",
"app2"
],
"namespaces": [
"production"
]
},
"labels": {
"environment": "production"
},
"spotType": "UNSPECIFIED_SPOT_TYPE",
"name": "production-mutation",
"organizationId": "fhytif73-f95f-44de-ad4b-f7898ce5ee42",
"clusterId": "11111111-1111-1111-1111-111111111111",
"enabled": true,
"tolerations": [
{
"key": "scheduling.cast.ai/node-template",
"operator": "Equal",
"value": "production-template",
"effect": "NoSchedule"
}
]
}

Use the CreatePodMutation endpoint to experiment with your own pod mutations via API.
Terraform example
The castai-pod-mutator Helm chart installs a Kubernetes Custom Resource Definition (CRD) for the PodMutation kind. Pod mutation rules can then be added to a cluster as Kubernetes objects using Terraform or other Kubernetes management tools.
For the latest Terraform examples, see our GitHub repository.
Note on UI sync
Pod mutations created as Custom Resources in Kubernetes using Terraform or other tools will sync to the Cast AI console with a slight delay (approximately 3 minutes). They will appear with a (Custom Resource in cluster) suffix in the name to indicate they are managed outside the console.

Important considerations:
- These synced mutations cannot be edited through the Cast AI console – any attempt to edit will fail with an error
- All modifications must be made through Terraform
- Changes to the Custom Resource will be reflected in the UI after the sync delay
Here's an example Terraform configuration that creates a pod mutation:
# The castai-pod-mutator helm chart installs Kubernetes Custom Resource Definition (CRD) for the kind 'PodMutation'.
# Pod mutation rules can then be added to a cluster as plain Kubernetes object of this kind.
#
# You should use a name that is *not* shared by a mutation created via Cast AI console.
# In such cases, custom resource mutation will be shadowed by the mutation created via the console.
resource "kubernetes_manifest" "test_pod_mutation" {
manifest = {
apiVersion = "pod-mutations.cast.ai/v1"
kind = "PodMutation"
metadata = {
name = "test-pod-mutation"
}
spec = {
filter = {
# Filter values can be plain strings or regexes.
workload = {
namespaces = ["production", "staging"]
names = ["^frontend-.*$", "^backend-.*$"]
kinds = ["Pod", "Deployment", "ReplicaSet"]
}
pod = {
# labelsOperator can be "and" or "or"
labelsOperator = "and"
labelsFilter = [
{
key = "app.kubernetes.io/part-of"
value = "platform"
},
{
key = "tier"
value = "frontend"
}
]
}
}
restartPolicy = "deferred"
patches = [
{
op = "add"
path = "/metadata/annotations/mutated-by-pod-mutator"
value = "true"
}
]
spotConfig = {
# mode can be "preferred-spot", "optional-spot", or "only-spot"
mode = "preferred-spot"
distributionPercentage = 50
}
}
}
}

Important considerations when using Terraform:
- Naming conflicts: Use unique names that don't conflict with mutations created via the Cast AI console
- Console visibility: Terraform-created mutations will appear in the Cast AI UI with a (Custom Resource in cluster) suffix after a 3-minute sync delay
- Read-only in UI: Console-synced Custom Resource mutations cannot be edited through the UI - all changes must be made via Terraform
- Management: Choose either Terraform or console management for each mutation to avoid conflicts
You can also manage pod mutations using other Kubernetes management tools like kubectl, Helm charts, or GitOps workflows by applying the same Custom Resource format.
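For example, the same mutation defined in the Terraform example above could be applied directly with kubectl as a Custom Resource. This sketch simply mirrors that manifest in YAML form.

# Illustrative PodMutation Custom Resource, mirroring the Terraform example above.
# Apply with: kubectl apply -f pod-mutation.yaml
apiVersion: pod-mutations.cast.ai/v1
kind: PodMutation
metadata:
  name: test-pod-mutation
spec:
  filter:
    workload:
      namespaces: ["production", "staging"]
      names: ["^frontend-.*$", "^backend-.*$"]
      kinds: ["Pod", "Deployment", "ReplicaSet"]
    pod:
      labelsOperator: "and"
      labelsFilter:
        - key: app.kubernetes.io/part-of
          value: platform
        - key: tier
          value: frontend
  restartPolicy: deferred
  patches:
    - op: add
      path: /metadata/annotations/mutated-by-pod-mutator
      value: "true"
  spotConfig:
    mode: preferred-spot
    distributionPercentage: 50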
Node Template consolidation
One powerful feature of pod mutations is the ability to consolidate multiple Node Templates. This helps reduce cluster fragmentation by allowing pods to schedule across multiple Node Template configurations.
When consolidating Node Templates:
- Specify the Node Templates to consolidate
- The controller converts individual node selectors and tolerations into node affinity rules, as sketched below
- Pods can then be scheduled on any node created by the specified templates
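As a rough illustration of that conversion (the exact rules are generated by the controller, so treat this as an assumed shape rather than its literal output), consolidated pods could end up with a node affinity along these lines, keyed on the scheduling.cast.ai/node-template node label:

# Assumed shape of the affinity rule the controller might inject in place of
# per-template node selectors when consolidating template-1 and template-2
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: scheduling.cast.ai/node-template
              operator: In
              values:
                - template-1
                - template-2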
Example consolidation configuration:
{
"objectFilter": {
"namespaces": [
"production"
]
},
"name": "production-mutation",
"nodeTemplatesToConsolidate": [
"template-1",
"template-2"
]
]
}

Spot Instance configuration
Notice
If you are using the deprecated Spot-webhook, make sure to remove it before using pod mutations for Spot Instance configuration management. The two solutions are incompatible.
Pod mutations support three Spot Instance modes, and they can specify a percentage-based distribution between Spot and On-Demand instances.
Interaction with other admission controllers
Pod mutations can interact with other admission controllers in your cluster. If you're using Cast AI's pod-node-lifecycle feature alongside pod mutations, they may conflict when both try to manage Spot Instance scheduling. Symptoms include unwanted spot tolerations being added to pods or node selectors for Spot Instances appearing when not configured.
To resolve conflicts, either migrate to pod mutations for Spot Instance management and disable pod-node-lifecycle, or configure pod-node-lifecycle to ignore pods managed by mutations using label selectors.
Example pod-node-lifecycle exclusion:
ignorePods:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- emissary-ext
- castai-agent
- castai-cluster-controller
Spot Distribution Percentage
When configuring Spot settings for your pod mutations, you can specify what percentage of pods should receive Spot-related configuration versus remaining on On-Demand instances.
- A setting of 50% means that approximately half of your pods will receive the selected Spot configuration (optional, preferred, or use-only), while the other half will be scheduled on On-Demand instances
- The higher the percentage, the more pods will receive Spot-related configurations
- The lower the percentage, the more pods will remain on On-Demand instances
For example, with a 75% Spot distribution setting:
- 75% of pods will be scheduled according to your chosen Spot behavior (optional, preferred, or use-only)
- 25% of pods will be scheduled on On-Demand instances
The mutation controller makes this determination when pods are created, applying Spot-related mutations to the configured percentage of pods while leaving the remainder configured for On-Demand instances.
Note on rapid scaling
When a deployment scales up instantaneously, for example, from 0 to 10 replicas at once, the pod mutator may not achieve the exact configured Spot/On-Demand distribution (e.g., 60/40) immediately. This happens because the controller must make placement decisions for each pod independently, without knowing the outcome of other pods being created at the same time. While the initial distribution might be skewed, the system is designed to self-correct and converge toward the configured ratio over time as pods are deleted, recreated, or scaled more gradually.
Spot Distribution Options
Combined with the distribution percentage, you can select one of three Spot behavior options:
| Mode | Description |
|---|---|
| Spot Instances are optional | Allows scheduling on either Spot or On-Demand instances to fulfill the selected Spot percentage. If both are available, there is no preference between instance types. |
| Use only Spot Instances | Strictly maintains the selected Spot/On-Demand ratio. If Spot Instances are unavailable, deployment will fail for the Spot portion. |
| Spot Instances are preferred | Targets the selected Spot percentage with Spot Instances but automatically falls back to On-Demand instances if Spot becomes unavailable. Will attempt to rebalance back to Spot when available. |
Note
The actual distribution may vary slightly from the configured percentage, especially with small pod counts or simultaneous pod creation. For very low replica counts, the system prioritizes maintaining the minimum On-Demand percentage. For example, with a single pod and any Spot distribution below 100%, the pod will be scheduled on On-Demand to ensure the minimum On-Demand percentage is maintained. The distribution may also drift over time as pods are deleted and recreated through the normal application lifecycle, but the system is designed to be self-healing, meaning it will attempt to restore the desired distribution whenever new pods are created.
Example Configuration
{
"name": "production-spot-mutation",
"organizationId": "org-12345",
"clusterId": "cluster-67890",
"enabled": true,
"objectFilter": {
"namespaces": [
"production"
]
},
"spotType": "PREFERRED_SPOT",
"spotDistributionPercentage": 75
}

This configuration creates a mutation that:
- Applies to all pods in the "production" namespace
- Sets 75% of pods to use Spot Instances with fallback to On-Demand if unavailable
- Keeps 25% of pods on On-Demand instances at all times
Combining percentage-based distribution with different Spot behavior options allows you to create deployment strategies that balance cost savings with application reliability requirements.
Advanced Configuration with JSON Patch
The Pod Mutations feature supports advanced configuration using JSON Patch, allowing for precise control over pod specifications beyond what's available through the standard UI options.
What is JSON Patch?
JSON Patch is a format for describing changes to a JSON document, defined in RFC 6902. In Kubernetes, it allows for complex modifications to pod specifications through a series of operations such as add, remove, replace, move, copy, and test.
For more information about JSON Patch operations, refer to Kubernetes documentation.
When to Use JSON Patch
Consider using JSON Patch when:
- You need to modify parts of a pod specification not covered by the standard UI options
- You want to perform multiple transformations in a specific order
- You're implementing complex mutation logic that combines adding, removing, and modifying fields
- You need to remove specific elements from arrays or nested structures
Configuring JSON Patch
To configure a JSON Patch:
- In the pod mutation configuration, expand the "JSON Patch (advanced)" section
- Enter your JSON Patch operations in the drawer editor
- Review the patch for errors before applying
Warning
JSON Patch operations take precedence over UI-defined settings. If there's a conflict between your patch operations and UI configurations, the patch operations will be applied.
JSON Patch Structure
A JSON Patch consists of an array of operations, where each operation is an object with the following properties:
- op: The operation to perform (add, remove, replace, move, copy, or test)
- path: A JSON pointer to the location in the document where the operation is performed
- value: The value to use for the operation (for add and replace)
- from: A JSON pointer for the move and copy operations
Common Examples
Add Node Selector
When you need to ensure pods are scheduled on specific nodes with certain characteristics, adding a node selector is the way to go:
[
{
"op": "add",
"path": "/spec/nodeSelector",
"value": {
"scheduling.cast.ai/node-template": "high-performance"
}
}
]

This patch adds a node selector that directs pods to nodes using the "high-performance" node template, which might be optimized for CPU-intensive workloads.
Warning
This patch will replace any existing nodeSelector entirely. If you want to preserve existing nodeSelectors, use the method below.
Add a Single Node Selector Key-Value
To add a single nodeSelector key-value pair while preserving existing ones:
[
{
"op": "add",
"path": "/spec/nodeSelector/scheduling.cast.ai~1node-template",
"value": "high-performance"
}
]

Note the special syntax with the tilde character (~1), which is used to escape the forward slash in the key name. However, this patch will fail if the nodeSelector object doesn't already exist in the pod specification.
Replace Toleration Effect
If you need to modify how pods tolerate node taints, you can replace specific fields within existing tolerations:
[
{
"op": "replace",
"path": "/spec/tolerations/0/effect",
"value": "NoSchedule"
}
]

This patch changes the effect of the first toleration to "NoSchedule", ensuring pods won't be scheduled on nodes with matching taints rather than potentially being evicted later.
Remove a Specific Array Element
Sometimes, you need to remove specific configuration elements that are no longer needed or might conflict with your intended setup:
[
{
"op": "remove",
"path": "/spec/tolerations/2"
}
]

This patch removes the third toleration (index 2, since array indices start at 0) from the tolerations array, which might be necessary when transitioning workloads between different node types or environments.
Remove by Key
When you need to remove a specific key from a map or object:
[
{
"op": "remove",
"path": "/spec/nodeSelector/environment"
}
]This patch removes the "environment" key from nodeSelector while preserving other nodeSelector entries.
Remove a Specific Value from an Array
For more complex scenarios where you need to target array elements based on their content rather than position:
[
{
"op": "test",
"path": "/spec/tolerations/0/key",
"value": "node-role.kubernetes.io/control-plane"
},
{
"op": "remove",
"path": "/spec/tolerations/0"
}
]

This two-step patch first verifies that the first toleration matches a specific control plane role, then removes it if the test passes.
Complex Example: Replace Node Affinity and Add Toleration
For comprehensive pod scheduling adjustments that require multiple coordinated changes:
[
{
"op": "remove",
"path": "/spec/affinity"
},
{
"op": "add",
"path": "/spec/nodeSelector",
"value": {
"scheduling.cast.ai/node-template": "custom-template"
}
},
{
"op": "add",
"path": "/spec/tolerations/-",
"value": {
"key": "scheduling.cast.ai/node-template",
"operator": "Equal",
"value": "custom-template",
"effect": "NoSchedule"
}
}
]

This multi-operation patch completely reconfigures pod scheduling by removing any existing affinity rules, setting a node selector for a custom template, and adding a matching toleration.
Example replacing values for the native Azure key: agentpool
The operations below use move and replace to remap pod scheduling keys from the Azure system node pool to a Cast AI Node Template. In this example, the key is changed from agentpool to dedicated. The new Node Template in Cast AI requires this label to schedule pods correctly.
[
{
"op": "move",
"from": "/metadata/labels/agentpool",
"path": "/metadata/labels/dedicated"
},
{
"op": "move",
"from": "/spec/nodeSelector/agentpool",
"path": "/spec/nodeSelector/dedicated"
},
{
"op": "replace",
"path": "/spec/tolerations/[key=agentpool]/key",
"value": "dedicated"
},
{
"op": "replace",
"path": "/spec/affinity/nodeAffinity/requiredDuringSchedulingIgnoredDuringExecution/nodeSelectorTerms/*/matchExpressions/[key=agentpool]/key",
"value": "dedicated"
},
{
"op": "replace",
"path": "/spec/affinity/nodeAffinity/preferredDuringSchedulingIgnoredDuringExecution/*/preference/matchExpressions/[key=agentpool]/key",
"value": "dedicated"
}
]

This multi-operation patch prevents pods from being scheduled onto the Azure system node pool.
JSON Patch Limitations
- JSON Patch operations apply to the pod template, not directly to the running pods
- Some Kubernetes fields are immutable and cannot be changed after creation
- Patches that would result in invalid pod specifications will be rejected
Limitations
- Mutations only apply to newly created pods. Similarly, mutation changes don't affect existing pods until they are recreated.
- Only one mutation can be applied to a pod. When multiple mutations have matching filters for a pod, Cast AI selects the mutation with the most specific filter (for example, a filter on pod name is more specific than a filter on namespace). We recommend using mutually exclusive filters rather than relying on this specificity ranking.
- Some pod configurations cannot be modified. Refer to the information above on what can be modified; anything not mentioned is currently beyond the scope of pod mutations.
- Scaling policies are evaluated every 30 seconds. As a result, changes to resource requests or limits may not be applied immediately.
Troubleshooting
Verify controller status
Check if the pod-mutator controller is running:
kubectl get pods -n castai-agent -l app=pod-mutator

Check controller logs
View logs for mutation activity:
kubectl logs -n castai-agent -l app=pod-mutator

Common issues
- Mutations not applying: Verify the object filters match your pods, and the controller is running
- Configuration conflicts: Check for conflicting mutations targeting the same pods
- Invalid mutations: Ensure mutation specifications follow the correct format
- Mutations not applying correctly with multiple webhooks: If you have multiple admission webhooks in your cluster that modify pods, you may need to set webhook.reinvocationPolicy="IfNeeded" during installation to ensure the pod mutator can properly apply its mutations after other webhooks make changes. Check the pod mutator logs for any signs of mutation conflicts or ordering issues.
- Mutations not applying to Deployments: Verify labels are placed at spec.template.metadata.labels, not at the Deployment level
- Unexpected Spot Instance scheduling: Check for conflicts with pod-node-lifecycle or other admission controllers
- Labels not working for targeting: Confirm you're using labels, not annotations, for pod selection
For additional help, contact Cast AI support or visit our community Slack channel.