Pod mutations

What are pod mutations?

Pod mutations is a Cast AI feature that simplifies Kubernetes workload configuration and helps optimize cluster resource usage. It allows you to define templates that automatically modify pod specifications when they are created, reducing manual configuration overhead and ensuring consistent pod scheduling across your cluster.

Why use pod mutations?

Managing Kubernetes workloads at scale presents several challenges:

  • Complex Configuration Requirements: As clusters grow, manually configuring pod specifications becomes increasingly time-consuming and error-prone. Each workload may need specific labels, tolerations, and node selectors to ensure proper scheduling and resource allocation.

  • Legacy System Integration: When onboarding existing clusters to Cast AI, workloads sometimes need to be reconfigured to take full advantage of cost optimization features. This traditionally requires updating deployment manifests, which can be automated using pod mutations.

  • Resource Fragmentation: Without standardized pod configurations, clusters can become fragmented with too many node groups, leading to inefficient resource utilization and increased costs.

Pod mutations address all of these challenges.

How it works

Pod mutations allow you to define templates that automatically modify pod specifications when they are created. These templates can:

  • Apply labels and tolerations
  • Configure node selectors and affinities
  • Link pods to specific Node Templates
  • Consolidate multiple Node Templates
  • Set spot instance preferences

The pod mutations controller, called the pod mutator, runs in your cluster and monitors pod creation events. When a new pod matches the configured filters, the controller automatically applies the defined mutations.
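
The match-then-mutate flow can be pictured with a short Python sketch. It is illustrative only and not the pod mutator's actual implementation: the function names, the dict-based pod shape, the separate workload_name argument, and the rule that an empty filter list matches everything are all assumptions.

```python
import copy

def matches(object_filter, namespace, workload_name):
    """Return True when the workload falls under the filter.

    Assumption: an empty or missing list places no restriction.
    """
    names = object_filter.get("names", [])
    namespaces = object_filter.get("namespaces", [])
    return (not names or workload_name in names) and (
        not namespaces or namespace in namespaces
    )

def apply_mutation(pod, mutation, workload_name):
    """Return a mutated copy of the pod, or the same pod if nothing applies."""
    if not mutation.get("enabled", True):
        return pod
    namespace = pod["metadata"]["namespace"]
    if not matches(mutation["objectFilter"], namespace, workload_name):
        return pod
    mutated = copy.deepcopy(pod)
    # Merge the template's labels and append its tolerations.
    mutated["metadata"].setdefault("labels", {}).update(mutation.get("labels", {}))
    mutated["spec"].setdefault("tolerations", []).extend(mutation.get("tolerations", []))
    return mutated

mutation = {
    "name": "production-mutation",
    "enabled": True,
    "objectFilter": {"names": ["app1", "app2"], "namespaces": ["production"]},
    "labels": {"environment": "production"},
    "tolerations": [{
        "key": "scheduling.cast.ai/node-template",
        "operator": "Equal",
        "value": "production-template",
        "effect": "NoSchedule",
    }],
}
pod = {
    "metadata": {"name": "app1-5d9c7", "namespace": "production", "labels": {"app": "app1"}},
    "spec": {"containers": []},
}
result = apply_mutation(pod, mutation, workload_name="app1")
```

Here result carries the extra label and toleration while the original pod dict is left untouched; a workload outside the filter is returned unchanged.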

Installation

Install using the console

  1. Select a cluster from the cluster list, then go to Autoscaler --> Pod mutations in the sidebar.
  2. If you have not installed the pod-mutator controller yet, the console displays a script to run in your cluster's cloud shell or terminal.

Install using Helm

  1. Add the Cast AI Helm repository:
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update
  2. Install the pod mutations controller:
helm upgrade -i --create-namespace -n castai-agent pod-mutator \
castai-helm/castai-pod-mutator \
--set castai.apiUrl="https://api.cast.ai" \
--set castai.apiKey="${API_KEY}" \
--set castai.clusterID="${CLUSTER_ID}"

Advanced installation options

The pod mutator supports the configuration of its webhook reinvocation policy. This controls whether the pod mutator should be reinvoked if other admission plugins modify the pod after the initial mutation.

helm upgrade -i --create-namespace -n castai-agent pod-mutator \
castai-helm/castai-pod-mutator \
--set castai.apiUrl="https://api.cast.ai" \
--set castai.apiKey="${API_KEY}" \
--set castai.clusterID="${CLUSTER_ID}" \
--set webhook.reinvocationPolicy="IfNeeded" # Set to "Never" by default

The reinvocationPolicy can be set to:

  • Never (default): The pod mutator will only be called once during pod admission
  • IfNeeded: The pod mutator may be called again if other admission plugins modify the pod after the initial mutation

Setting reinvocationPolicy to IfNeeded is useful when you have multiple admission webhooks that may interact with each other. For example:

  1. Pod mutator adds its mutations
  2. Another webhook modifies the pod
  3. Pod mutator is invoked again to ensure its mutations are properly applied

⚠️ However, if you want changes made by other webhooks to persist, setting reinvocationPolicy to IfNeeded may be counterproductive since the pod mutator will override any modifications that fall under its control when it's reinvoked. Consider your specific use case and the interaction between different webhooks in your cluster before changing this setting from its default value.
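
The trade-off can be illustrated with a toy model of an admission chain in Python. This is a deliberately simplified sketch, not how the Kubernetes API server is implemented: the flat dict standing in for a pod, the two webhook functions, and the "reinvoke when a later webhook changed the result" rule are assumptions chosen to mirror the description above.

```python
def run_admission(pod, webhooks):
    """Run each (mutate, policy) pair once, in order, then reinvoke every
    IfNeeded webhook whose result was changed by a later webhook."""
    snapshots = []
    for mutate, _policy in webhooks:
        pod = mutate(dict(pod))
        snapshots.append(dict(pod))  # state right after this webhook ran
    for (mutate, policy), snapshot in zip(webhooks, snapshots):
        if policy == "IfNeeded" and pod != snapshot:
            pod = mutate(dict(pod))
    return pod

def pod_mutator(pod):
    # Stand-in for a pod mutation that pins the pod to a Node Template.
    pod["nodeTemplate"] = "production-template"
    return pod

def other_webhook(pod):
    # Stand-in for a second webhook that touches the same field.
    pod["nodeTemplate"] = "legacy-pool"
    return pod

with_never = run_admission({}, [(pod_mutator, "Never"), (other_webhook, "Never")])
with_if_needed = run_admission({}, [(pod_mutator, "IfNeeded"), (other_webhook, "Never")])
```

In this model, with_never keeps the other webhook's value, while with_if_needed ends with the pod mutator's value because the reinvocation overrides the later change.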

Creating pod mutations

Pod mutations are either defined through the PodMutations API or created using the Cast AI console. Each mutation consists of:

  • A unique name
  • Object filters to select targeted pods
  • Mutation rules defining what changes to apply
  • Node Template configurations (optional)
  • Spot instance preferences (optional)

Console example

After installing the pod-mutator controller in your cluster, pod mutations become available in the console.

To create a new mutation template, click Add template in the top-right corner. This opens a drawer where you define the mutation's configuration:

  1. Give your mutation template a name.
  2. Define the filters the controller uses to find candidate pods.
  3. Configure the mutations to apply.
  4. Choose the spot settings most appropriate for this template, then click Create.

While you create a mutation, the console helps with tooltips and a live preview of the resulting configuration.

API example

Here's an example pod mutation API request that applies labels and tolerations to specific workloads:

{
  "objectFilter": {
    "names": [
      "app1",
      "app2"
    ],
    "namespaces": [
      "production"
    ]
  },
  "labels": {
    "environment": "production"
  },
  "spotType": "UNSPECIFIED_SPOT_TYPE",
  "name": "production-mutation",
  "organizationId": "fhytif73-f95f-44de-ad4b-f7898ce5ee42",
  "clusterId": "11111111-1111-1111-1111-111111111111",
  "enabled": true,
  "tolerations": [
    {
      "key": "scheduling.cast.ai/node-template",
      "operator": "Equal",
      "value": "production-template",
      "effect": "NoSchedule"
    }
  ]
}

Use the CreatePodMutation endpoint to experiment with your own pod mutations via API.
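
If you generate request bodies programmatically, a small helper keeps them consistent. The Python sketch below only assembles and serializes the JSON shown above; the helper name and its parameters are hypothetical, and the endpoint URL and authentication are intentionally left out, so consult the API reference for those.

```python
import json

def build_pod_mutation(name, cluster_id, namespaces, names=None, labels=None,
                       tolerations=None, spot_type="UNSPECIFIED_SPOT_TYPE",
                       enabled=True, organization_id=None):
    """Assemble a request body using only fields that appear in the example above."""
    object_filter = {"namespaces": list(namespaces)}
    if names:
        object_filter["names"] = list(names)
    body = {
        "name": name,
        "clusterId": cluster_id,
        "enabled": enabled,
        "objectFilter": object_filter,
        "spotType": spot_type,
    }
    # Optional fields are added only when a value is supplied.
    if organization_id:
        body["organizationId"] = organization_id
    if labels:
        body["labels"] = dict(labels)
    if tolerations:
        body["tolerations"] = list(tolerations)
    return body

body = build_pod_mutation(
    name="production-mutation",
    cluster_id="11111111-1111-1111-1111-111111111111",
    namespaces=["production"],
    names=["app1", "app2"],
    labels={"environment": "production"},
    tolerations=[{
        "key": "scheduling.cast.ai/node-template",
        "operator": "Equal",
        "value": "production-template",
        "effect": "NoSchedule",
    }],
)
payload = json.dumps(body, indent=2)  # string to send as the request body
print(payload)
```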

Node Template consolidation

One powerful feature of pod mutations is the ability to consolidate multiple Node Templates. This helps reduce cluster fragmentation by allowing pods to schedule across multiple Node Template configurations.

When consolidating Node Templates:

  1. Specify the Node Templates to consolidate
  2. The controller converts individual node selectors and tolerations into node affinity rules
  3. Pods can then schedule on any node created by the specified templates

Example consolidation configuration:

{
  "objectFilter": {
    "namespaces": [
      "production"
    ]
  },
  "name": "production-mutation",
  "nodeTemplatesToConsolidate": [
    "template-1",
    "template-2"
  ]
}
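
To make the conversion step concrete, here is a hedged Python sketch of what a consolidated pod spec might look like. It is not the controller's code: it assumes the scheduling.cast.ai/node-template key is used both as the node label and as the taint key, that one toleration is kept per template, and that a required node affinity rule with the In operator replaces the node selector.

```python
import copy

# Assumption: the same key serves as node label and taint key.
TEMPLATE_KEY = "scheduling.cast.ai/node-template"

def consolidate(pod_spec, templates):
    """Return a copy of pod_spec that can schedule on nodes from any listed template."""
    spec = copy.deepcopy(pod_spec)

    # Drop the single-template node selector; the affinity rule below replaces it.
    selector = spec.get("nodeSelector", {})
    selector.pop(TEMPLATE_KEY, None)
    if not selector:
        spec.pop("nodeSelector", None)

    # Tolerate the taint of every consolidated template (assumed behavior).
    others = [t for t in spec.get("tolerations", []) if t.get("key") != TEMPLATE_KEY]
    spec["tolerations"] = others + [
        {"key": TEMPLATE_KEY, "operator": "Equal", "value": name, "effect": "NoSchedule"}
        for name in templates
    ]

    # Require a node whose template label is one of the consolidated templates.
    # For brevity this overwrites any affinity already present on the pod.
    spec["affinity"] = {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": TEMPLATE_KEY,
                        "operator": "In",
                        "values": list(templates),
                    }]
                }]
            }
        }
    }
    return spec

original = {
    "nodeSelector": {TEMPLATE_KEY: "template-1"},
    "tolerations": [{"key": TEMPLATE_KEY, "operator": "Equal",
                     "value": "template-1", "effect": "NoSchedule"}],
}
consolidated = consolidate(original, ["template-1", "template-2"])
```

The In operator is what lets a single rule accept nodes from either template, which a plain node selector cannot express.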

Spot instance configuration

Pod mutations supports three spot instance modes:

Mode            Description
OPTIONAL_SPOT   Allows pods to run on spot instances by adding required tolerations.
USE_ONLY_SPOT   Forces pods to only run on spot instances via node selectors.
PREFERRED_SPOT  Prefers spot instances but allows fallback to on-demand via node affinity.

Example spot configuration:

{
  "objectFilter": {
    "namespaces": [
      "production"
    ]
  },
  "spotType": "USE_ONLY_SPOT",
  "name": "production-mutation",
  "enabled": true
}
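
The difference between the three modes is easiest to see in the pod spec fields each one would touch. The Python sketch below is an approximation under stated assumptions: it uses scheduling.cast.ai/spot as the label and taint key, adds the toleration in every mode, and picks a weight of 1 for the preference. The fields the controller actually writes may differ.

```python
import copy

SPOT_KEY = "scheduling.cast.ai/spot"  # assumed label and taint key for spot nodes
SPOT_MODES = ("OPTIONAL_SPOT", "USE_ONLY_SPOT", "PREFERRED_SPOT")

def apply_spot_mode(pod_spec, spot_type):
    """Return a copy of pod_spec adjusted for the given spot mode."""
    spec = copy.deepcopy(pod_spec)
    if spot_type == "UNSPECIFIED_SPOT_TYPE":
        return spec
    if spot_type not in SPOT_MODES:
        raise ValueError(f"unknown spot type: {spot_type}")

    # Assumption: every mode needs the toleration so the pod may land on spot nodes.
    spec.setdefault("tolerations", []).append({"key": SPOT_KEY, "operator": "Exists"})

    if spot_type == "USE_ONLY_SPOT":
        # A node selector is a hard requirement: only spot nodes qualify.
        spec.setdefault("nodeSelector", {})[SPOT_KEY] = "true"
    elif spot_type == "PREFERRED_SPOT":
        # A preferred affinity is a soft requirement: on-demand nodes remain eligible.
        node_affinity = spec.setdefault("affinity", {}).setdefault("nodeAffinity", {})
        node_affinity.setdefault("preferredDuringSchedulingIgnoredDuringExecution", []).append({
            "weight": 1,
            "preference": {"matchExpressions": [{"key": SPOT_KEY, "operator": "Exists"}]},
        })
    return spec

optional = apply_spot_mode({}, "OPTIONAL_SPOT")
only = apply_spot_mode({}, "USE_ONLY_SPOT")
preferred = apply_spot_mode({}, "PREFERRED_SPOT")
```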

Best practices

  1. Use meaningful names: Give mutations descriptive names that indicate their purpose, so you can tell what a mutation does without opening its configuration.

  2. Test in non-production: Validate mutation behavior in a test environment first, if possible.

  3. Monitor changes: Review the effects of mutations through the Cast AI console to ensure desired outcomes.

Limitations

  • Mutations only apply to newly created pods. Similarly, changes to mutations don't affect existing pods until they are recreated.
  • Only the pod settings described above can be modified. Anything not mentioned is outside the scope of pod mutations at this time.

Troubleshooting

Verify controller status

Check if the pod-mutator controller is running:

kubectl get pods -n castai-agent -l app=pod-mutator

Check controller logs

View logs for mutation activity:

kubectl logs -n castai-agent -l app=pod-mutator

Common issues

  1. Mutations not applying: Verify that the object filters match your pods and that the controller is running.

  2. Configuration conflicts: Check for conflicting mutations targeting the same pods.

  3. Invalid mutations: Ensure mutation specifications follow the correct format.

  4. Mutations not applying correctly with multiple webhooks: If you have multiple admission webhooks in your cluster that modify pods, you may need to set webhook.reinvocationPolicy="IfNeeded" during installation to ensure the pod mutator can properly apply its mutations after other webhooks make changes. Check the pod mutator logs for any signs of mutation conflicts or ordering issues.
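
When checking for configuration conflicts, it can help to list mutations whose filters overlap. The Python sketch below is a hypothetical offline helper, not a Cast AI tool: it works on mutation definitions you have exported as dicts and assumes that an empty filter list matches everything. An overlap only flags a pair worth reviewing; it does not prove the two mutations actually conflict.

```python
from itertools import combinations

def _overlap(first, second):
    """Two filter lists overlap when either is empty (assumed to match
    everything) or they share at least one value."""
    return not first or not second or bool(set(first) & set(second))

def find_overlapping_mutations(mutations):
    """Return name pairs of enabled mutations that could target the same pods."""
    enabled = [m for m in mutations if m.get("enabled", True)]
    pairs = []
    for first, second in combinations(enabled, 2):
        f1, f2 = first["objectFilter"], second["objectFilter"]
        same_namespace = _overlap(f1.get("namespaces", []), f2.get("namespaces", []))
        same_name = _overlap(f1.get("names", []), f2.get("names", []))
        if same_namespace and same_name:
            pairs.append((first["name"], second["name"]))
    return pairs

mutations = [
    {"name": "prod-labels", "objectFilter": {"namespaces": ["production"]}},
    {"name": "prod-spot", "objectFilter": {"namespaces": ["production"], "names": ["app1"]}},
    {"name": "staging-spot", "objectFilter": {"namespaces": ["staging"]}},
]
overlaps = find_overlapping_mutations(mutations)
```

In this example only prod-labels and prod-spot are reported, since both can select app1 in the production namespace.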

For additional help, contact Cast AI support or visit our community Slack channel.