Reference

This reference document covers the installation and configuration of the Pod Mutator component, including the PodMutation custom resource definition (CRD), all available configuration options, and troubleshooting guidance.

For an introduction to pod mutation concepts, see Pod mutations overview. For a step-by-step first setup, see the Pod mutations quickstart.

Installation

Prerequisites

The pod mutator requires the following Cast AI component Helm chart versions:

Component | Minimum chart version
castai-agent | 0.123.0
castai-cluster-controller | 0.85.0

Verify your installed versions:

helm list -n castai-agent --filter 'castai-agent|cluster-controller'

Install via console

  1. In the Cast AI console, select your cluster from the cluster list.
  2. Navigate to Autoscaler > Pod mutations in the sidebar.
  3. If the pod mutator is not installed, copy and run the provided installation script.

Install via Helm

  1. Add the Cast AI Helm repository:

    helm repo add castai-helm https://castai.github.io/helm-charts
    helm repo update
  2. Install the pod mutator:

    helm upgrade -i --create-namespace -n castai-agent pod-mutator \
      castai-helm/castai-pod-mutator \
      --set castai.apiKey="${CASTAI_API_KEY}" \
      --set castai.clusterID="${CLUSTER_ID}"

Replace ${CASTAI_API_KEY} and ${CLUSTER_ID} with your actual values. You can find these in the Cast AI console under User > API keys and in your cluster's settings.

Required Helm values

Value | Description
castai.apiKey | Your Cast AI API key. Find this in User > API keys in the console.
castai.clusterID | Your cluster's ID. Find this in your cluster's settings in the console.

Where to find your API key and Cluster ID

API Key:

  1. In the Cast AI console, click your profile in the top right corner.
  2. Select API keys from the dropdown menu.
  3. Use an existing key, or click Create access key to generate a new one.

Cluster ID:

  1. Navigate to your cluster list in the Cast AI console.
  2. Select the cluster you want to install the pod mutator on. It will take you to Cluster overview > Dashboard in the sidebar navigation.
  3. Copy the Cluster ID value from the cluster details section.

Optional Helm values

Value | Default | Description
castai.apiUrl | https://api.cast.ai | Cast AI API endpoint. Change to https://api.eu.cast.ai for EU deployments.
webhook.reinvocationPolicy | Never | Webhook reinvocation policy. See Webhook configuration.
webhook.failurePolicy | Ignore | Webhook failure policy. Ignore allows pods to proceed if the webhook fails.
replicas | 2 | Number of pod mutator replicas.
mutator.processingDelay | 30s | Delay before the pod mutator processes a pod.
resources.requests.cpu | 20m | CPU request for pod mutator pods.
resources.requests.memory | 512Mi | Memory request for pod mutator pods.
resources.limits.memory | 512Mi | Memory limit for pod mutator pods.
hostNetwork | false | Run pods with host networking.
dnsPolicy | "" | DNS policy override. Defaults to ClusterFirstWithHostNet if hostNetwork is true.

Installation example with custom values

helm upgrade -i --create-namespace -n castai-agent pod-mutator \
  castai-helm/castai-pod-mutator \
  --set castai.apiUrl="https://api.eu.cast.ai" \
  --set castai.apiKey="${CASTAI_API_KEY}" \
  --set castai.clusterID="${CLUSTER_ID}" \
  --set webhook.reinvocationPolicy="IfNeeded" \
  --set replicas=3

Webhook configuration

The pod mutator runs as a Kubernetes Mutating Admission Webhook. The reinvocationPolicy setting controls whether the webhook is called again if other admission plugins modify the pod after the initial mutation.

Policy | Behavior
Never (default) | The pod mutator is called only once during pod admission.
IfNeeded | The pod mutator may be called again if other admission plugins modify the pod.

When to use IfNeeded

Set reinvocationPolicy to IfNeeded when:

  • You have multiple admission webhooks that modify pods
  • Other webhooks run before the pod mutator, and their changes affect fields the mutator needs to read
  • You're experiencing issues where mutations aren't being applied correctly due to webhook ordering

To change the policy on an existing installation:

helm upgrade pod-mutator castai-helm/castai-pod-mutator -n castai-agent \
  --reset-then-reuse-values \
  --set webhook.reinvocationPolicy="IfNeeded"
⚠️

Warning

When reinvocationPolicy is set to IfNeeded, the pod mutator may override changes made by other webhooks if those changes conflict with mutation rules. Consider your webhook interaction patterns before enabling this setting.

Verify installation

Confirm the pod mutator is running:

kubectl get pods -n castai-agent -l app.kubernetes.io/name=castai-pod-mutator

Expected output:

NAME                                  READY   STATUS    RESTARTS       AGE
castai-pod-mutator-767d48f477-lshm2   1/1     Running   0              112s
castai-pod-mutator-767d48f477-wvprs   1/1     Running   1 (105s ago)   112s

Verify the webhook is registered:

kubectl get mutatingwebhookconfigurations | grep pod-mutator

Upgrading

Upgrade via console

The recommended way to keep the pod mutator up to date:

  1. In the Cast AI console, select Manage Organization in the top right.
  2. Navigate to Component control in the left menu.
  3. Find the pod mutator in the component list.
  4. For any cluster displaying a warning status, click on the component to view the details.
  5. Click Update and run the provided Helm command.

See Component Control for more information.

Upgrade via Helm

helm repo update castai-helm
helm upgrade pod-mutator castai-helm/castai-pod-mutator -n castai-agent --reset-then-reuse-values

The --reset-then-reuse-values flag preserves your existing configuration (API keys, cluster ID) while applying the latest chart defaults.

📘

Note

If you encounter errors like nil pointer evaluating interface {}.enabled during upgrades, use --reset-then-reuse-values instead of --reuse-values.

Check installed version

helm list -n castai-agent --filter pod-mutator

Uninstalling

To remove the pod mutator from your cluster:

helm uninstall pod-mutator -n castai-agent

This removes the pod mutator deployment and webhook configuration. Existing PodMutation custom resources remain in the cluster but are no longer applied to new pods.

To also remove all mutation definitions:

kubectl delete podmutations.pod-mutations.cast.ai --all

To remove the CRD entirely (this deletes all mutations):

kubectl delete crd podmutations.pod-mutations.cast.ai

PodMutation CRD specification

Pod mutations are defined as Kubernetes custom resources of kind PodMutation. Each resource specifies filters to match pods and patch operations or configurations to apply.

Resource definition

apiVersion: pod-mutations.cast.ai/v1
kind: PodMutation

Property | Value
API group | pod-mutations.cast.ai
API version | v1
Kind | PodMutation
Scope | Cluster
Short names | pomu, pomus

Spec fields

The spec object defines the mutation behavior.

Field | Type | Required | Description
filterV2 | object | No | Filter configuration to match pods.
filter | object | No | Legacy filter configuration (backward compatibility).
patchesV2 | array | No | Patch operations to apply to matched pods.
patches | array | No | Legacy patch operations (backward compatibility).
restartPolicy | string | No | When to apply changes. Enum: deferred, immediate.
spotConfig | object | No | Spot Instance configuration.
distributionGroups | array | No | Percentage-based distribution of configurations (preview).

filterV2

The recommended filter format using typed matcher objects.

filterV2.workload

Filters pods based on their parent workload properties.

Field | Type | Required | Description
namespaces | array | No | Matchers for namespace names.
excludeNamespaces | array | No | Matchers for namespaces to exclude.
names | array | No | Matchers for workload names.
excludeNames | array | No | Matchers for workload names to exclude.
kinds | array | No | Matchers for workload kinds.
excludeKinds | array | No | Matchers for workload kinds to exclude.

Each array element is a Matcher object:

Field | Type | Required | Description
type | string | Yes | Match type. Enum: exact, regex.
value | string | Yes | The value or regex pattern to match.

filterV2.pod

Filters pods based on their labels.

Field | Type | Required | Description
labels | object | No | Label matchers for pods to include.
excludeLabels | object | No | Label matchers for pods to exclude.

Each is a LabelMatcherGroup object:

Field | Type | Required | Description
matchers | array | No | List of label matchers.
operator | string | No | How to combine matchers. Enum: and, or.

Each element in matchers is a LabelMatcher object:

Field | Type | Required | Description
keyMatcher | object | Yes | Matcher for the label key.
valueMatcher | object | Yes | Matcher for the label value.

Both keyMatcher and valueMatcher are Matcher objects (same structure as workload matchers).

filter (legacy)

The legacy filter format using simple string arrays. Maintained for backward compatibility.

filter.workload

Field | Type | Required | Description
namespaces | []string | No | Namespace names or patterns to match.
excludeNamespaces | []string | No | Namespace names or patterns to exclude.
names | []string | No | Workload names or patterns to match.
excludeNames | []string | No | Workload names or patterns to exclude.
kinds | []string | No | Workload kinds to match.
excludeKinds | []string | No | Workload kinds to exclude.

filter.pod

Field | Type | Required | Description
labelsFilter | array | No | Label conditions for pods to include.
labelsOperator | string | No | How to combine label conditions. Enum: and, or.
excludeLabelsFilter | array | No | Label conditions for pods to exclude.
excludeLabelsOperator | string | No | How to combine exclude label conditions. Enum: and, or.

Each element in labelsFilter and excludeLabelsFilter is a LabelValue object:

Field | Type | Required | Description
key | string | Yes | The label key.
value | string | Yes | The label value.

patchesV2

The recommended patch format using grouped operations. Each group contains a sequence of JSON Patch operations (RFC 6902) applied in order.

Field | Type | Required | Description
operations | array | Yes | Sequence of JSON Patch operations to apply.

Each element in operations is a PatchOperation object:

Field | Type | Required | Description
op | string | Yes | Operation type. Enum: add, remove, replace, move, copy, test.
path | string | Yes | JSON Pointer (RFC 6901) to the target location.
value | any | Conditional | Value to use. Required for add, replace, test.
from | string | Conditional | Source path. Required for move, copy.

patches (legacy)

The legacy patch format as a flat array of operations. Maintained for backward compatibility.

Each element is a PatchOperation object (same structure as patchesV2.operations).

spotConfig

Configures Spot Instance scheduling behavior.

Field | Type | Required | Description
mode | string | No | Spot behavior mode. Enum: optional-spot, only-spot, preferred-spot.
distributionPercentage | integer | No | Percentage of pods to receive Spot configuration (0–100).

Spot mode values

Mode | Description
optional-spot | Schedule on Spot or On-Demand with no preference.
only-spot | Require Spot instances. Pods fail to schedule if Spot is unavailable.
preferred-spot | Prefer Spot, fall back to On-Demand. Rebalance to Spot when available.

distributionGroups

📘

Preview feature

Distribution groups are a preview feature that allows splitting a workload's replicas across multiple configurations.

Field | Type | Required | Description
name | string | Yes | Unique identifier for the distribution group.
percentage | integer | Yes | Percentage of pods for this group (0–100).
configuration | object | Yes | Configuration to apply to pods in this group.

distributionGroups[].configuration

Field | Type | Required | Description
patches | array | No | Legacy patch operations for this group.
patchesV2 | array | No | Patch operations for this group (same structure as spec.patchesV2).
spotMode | string | No | Spot mode for this group. Enum: optional-spot, only-spot, preferred-spot.
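
The following snippet is an illustrative sketch of a distributionGroups configuration that splits a workload's replicas between a Spot group and an On-Demand group. The group names, percentages, and label value are placeholders, not a prescribed setup.

distributionGroups:
  - name: spot-group              # placeholder group name
    percentage: 70
    configuration:
      spotMode: preferred-spot
  - name: on-demand-group         # placeholder group name
    percentage: 30
    configuration:
      patchesV2:
        - operations:
            - op: add
              path: /metadata/labels/capacity
              value: on-demand    # illustrative label only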

restartPolicy

Controls when mutation changes take effect on existing workloads.

Value | Description
deferred | Changes apply only when pods are naturally recreated (default).
immediate | Reserved for future use.

Resource annotations

When you create a mutation through the Cast AI console or API, the resulting Kubernetes resource includes metadata annotations:

Annotation | Description
pod-mutations.cast.ai/pod-mutation-id | Unique identifier for the mutation.
pod-mutations.cast.ai/pod-mutation-name | The friendly name provided when creating the mutation.
pod-mutations.cast.ai/pod-mutation-source | Creation source: api (console/API) or cluster (kubectl/GitOps).

Console-created mutations use the resource name pattern api-mutation-{uuid}. Cluster-created mutations use the name you specify in the manifest.

Example

The following manifest is an illustrative sketch of a cluster-created mutation that combines the fields described above; the resource name, namespace, and label values are placeholders.
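
apiVersion: pod-mutations.cast.ai/v1
kind: PodMutation
metadata:
  name: example-spot-mutation     # placeholder name
spec:
  filterV2:
    workload:
      namespaces:
        - type: exact
          value: production       # placeholder namespace
      kinds:
        - type: exact
          value: Deployment
  patchesV2:
    - operations:
        - op: add
          path: /metadata/labels/managed-by
          value: castai           # illustrative label
  spotConfig:
    mode: preferred-spot
    distributionPercentage: 80
  restartPolicy: deferred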

Filters

Filters determine which pods a mutation applies to. The CRD supports two filter formats: filterV2 (recommended) and filter (legacy, maintained for backward compatibility).

If multiple filter criteria are specified, they are combined with AND logic: a pod must match all specified criteria.

If no filters are specified, the mutation matches all pods in the cluster, which is rarely the intended use case.

FilterV2 (recommended)

The filterV2 object contains workload and pod sections with typed matcher objects.

Workload filters

Workload filters match pods based on their namespace, parent workload name, or workload kind.

Namespace filter

Match pods by the namespace they are created in.

filterV2:
  workload:
    namespaces:
      - type: exact
        value: production
      - type: exact
        value: staging

Each entry is a matcher object:

Field | Type | Description
type | string | Match type: exact for exact match, regex for regular expression.
value | string | The namespace name or regex pattern to match.

Examples:

# Match exact namespaces
filterV2:
  workload:
    namespaces:
      - type: exact
        value: production
      - type: exact
        value: staging

# Match namespaces using regex
filterV2:
  workload:
    namespaces:
      - type: regex
        value: "^team-.*$"
      - type: regex
        value: "^env-prod-.*$"

# Exclude specific namespaces
filterV2:
  workload:
    namespaces:
      - type: regex
        value: ".*"
    excludeNamespaces:
      - type: exact
        value: kube-system
      - type: exact
        value: castai-agent

Workload name filter

Match pods by their parent workload's name.

filterV2:
  workload:
    names:
      - type: exact
        value: frontend
      - type: regex
        value: "^backend-.*$"

Field | Type | Description
type | string | Match type: exact or regex.
value | string | The workload name or regex pattern to match.

Workload kind filter

Match pods by their parent workload's Kubernetes kind.

filterV2:
  workload:
    kinds:
      - type: exact
        value: Deployment
      - type: exact
        value: StatefulSet

Field | Type | Description
type | string | Match type: exact or regex.
value | string | The workload kind or regex pattern to match.

Supported workload kinds:

Kind | Description
Deployment | Standard deployment workloads
StatefulSet | Stateful application workloads
DaemonSet | Node-level daemon workloads
ReplicaSet | Replica set workloads (typically managed by Deployments)
Job | Batch processing workloads
CronJob | Scheduled recurring workloads
Pod | Standalone pods without a parent controller

Pod filters

Pod filters match pods based on their labels using a matcher structure.

Label filter

Match pods that have specific labels.

filterV2:
  pod:
    labels:
      matchers:
        - keyMatcher:
            type: exact
            value: environment
          valueMatcher:
            type: exact
            value: production
        - keyMatcher:
            type: exact
            value: tier
          valueMatcher:
            type: exact
            value: frontend
      operator: and

Field | Type | Description
matchers | array | List of label matchers, each with keyMatcher and valueMatcher objects.
operator | string | How to combine matchers: and (all must match) or or (any must match).

Each matcher has:

Field | Type | Description
keyMatcher | object | Matcher for the label key with type and value.
valueMatcher | object | Matcher for the label value with type and value.

Exclude labels filter

Exclude pods that have specific labels, even if they match other filters.

filterV2:
  pod:
    excludeLabels:
      matchers:
        - keyMatcher:
            type: exact
            value: skip-mutation
          valueMatcher:
            type: exact
            value: "true"
      operator: or

Complete filterV2 example

filterV2:
  workload:
    namespaces:
      - type: exact
        value: production
    names:
      - type: regex
        value: "^frontend-.*$"
    kinds:
      - type: exact
        value: Deployment
      - type: exact
        value: StatefulSet
    excludeNames:
      - type: exact
        value: frontend-canary
  pod:
    labels:
      matchers:
        - keyMatcher:
            type: exact
            value: tier
          valueMatcher:
            type: exact
            value: web
      operator: and
    excludeLabels:
      matchers:
        - keyMatcher:
            type: exact
            value: skip-mutation
          valueMatcher:
            type: exact
            value: "true"
      operator: or

This matches pods that:

  • Are in the production namespace, AND
  • Belong to workloads with names starting with frontend- (but not frontend-canary), AND
  • Are part of a Deployment or StatefulSet, AND
  • Have the label tier: web, AND
  • Do NOT have the label skip-mutation: "true"

Legacy filter format

The legacy filter format uses simpler structures without typed matchers:

filter:
  workload:
    namespaces:
      - production
      - staging
    names:
      - frontend
      - backend
    kinds:
      - Deployment
    excludeNamespaces:
      - kube-system
    excludeNames:
      - canary
    excludeKinds:
      - DaemonSet
  pod:
    labelsFilter:
      - key: environment
        value: production
      - key: tier
        value: frontend
    labelsOperator: and
    excludeLabelsFilter:
      - key: skip-mutation
        value: "true"
    excludeLabelsOperator: or
📘

Note

Pod mutation filters only work with labels, not annotations. When configuring filters, ensure you're targeting pod labels defined at spec.template.metadata.labels in your workload manifests.

Configuration options

Configuration options define what changes the mutation applies to matched pods. The primary method is through patchesV2 operations, with additional support for spotConfig.

Patch operations (patchesV2)

The patchesV2 field contains an array of patch groups, each with an operations array of JSON Patch operations (RFC 6902).

patchesV2:
  - operations:
      - op: add
        path: /metadata/labels/environment
        value: production
      - op: add
        path: /metadata/labels/managed-by
        value: castai

Operation fields

Field | Type | Required | Description
op | string | Yes | The operation: add, remove, replace, move, copy, or test.
path | string | Yes | JSON Pointer to the target location in the pod spec.
value | any | Conditional | The value to use. Required for add, replace, and test operations.
from | string | Conditional | Source path for move and copy operations.

Supported operations

Operation | Description
add | Add a value at the specified path. Creates intermediate objects/arrays as needed.
remove | Remove the value at the specified path.
replace | Replace the value at the specified path. Path must already exist.
move | Move a value from one path to another.
copy | Copy a value from one path to another.
test | Verify a value exists at the path. Mutation fails if the test fails.

Path syntax

Paths use JSON Pointer syntax (RFC 6901):

  • Paths start with /
  • Path segments are separated by /
  • Array indices are zero-based integers
  • Use /- to append to the end of an array
  • Escape ~ as ~0 and / as ~1 in key names (for example, /metadata/annotations/cast.ai~1mutation for key cast.ai/mutation)

Legacy patches format

The legacy patches field is a flat array of operations (without the grouping):

patches:
  - op: add
    path: /metadata/labels/environment
    value: production
  - op: add
    path: /metadata/labels/managed-by
    value: castai

Common patch examples

Add labels

patchesV2:
  - operations:
      - op: add
        path: /metadata/labels/environment
        value: production
      - op: add
        path: /metadata/labels/team
        value: platform

Add annotations

patchesV2:
  - operations:
      - op: add
        path: /metadata/annotations/prometheus.io~1scrape
        value: "true"
      - op: add
        path: /metadata/annotations/prometheus.io~1port
        value: "9090"
📘

Note

When adding annotations with / in the key (like prometheus.io/scrape), escape the / as ~1 in the path.

Add node selector

patchesV2:
  - operations:
      - op: add
        path: /spec/nodeSelector
        value:
          scheduling.cast.ai/node-template: production-template

Or add to an existing node selector:

patchesV2:
  - operations:
      - op: add
        path: /spec/nodeSelector/scheduling.cast.ai~1node-template
        value: production-template

Add tolerations

Add a toleration to the tolerations array:

patchesV2:
  - operations:
      - op: add
        path: /spec/tolerations/-
        value:
          key: scheduling.cast.ai/spot
          operator: Exists
          effect: NoSchedule

The /- syntax appends to the end of the array, which is safe when you don't know the current array length.

Remove a label

patchesV2:
  - operations:
      - op: remove
        path: /metadata/labels/deprecated-label

Replace a value

patchesV2:
  - operations:
      - op: replace
        path: /spec/nodeSelector/environment
        value: production
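
Guard a change with a test

A test operation can make the other operations in a group conditional: per the operation table above, the mutation fails if the test fails, so the replace below only takes effect when the expected value is present. The paths and values here are illustrative placeholders, not taken from a real workload.

patchesV2:
  - operations:
      - op: test
        path: /spec/nodeSelector/environment
        value: staging        # proceed only if the current value is staging
      - op: replace
        path: /spec/nodeSelector/environment
        value: production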

Azure agentpool migration

A common use case is migrating from Azure's native agentpool labels to Cast AI Node Templates:

patchesV2:
  - operations:
      - op: move
        from: /metadata/labels/agentpool
        path: /metadata/labels/dedicated
      - op: move
        from: /spec/nodeSelector/agentpool
        path: /spec/nodeSelector/dedicated
      - op: replace
        path: /spec/tolerations/0/key
        value: dedicated

Toleration reference

When adding tolerations, use these field values:

Field | Type | Required | Description
key | string | Yes | The taint key to tolerate.
operator | string | Yes | Equal (key and value must match) or Exists (only key must exist).
value | string | Conditional | The taint value. Required when operator is Equal.
effect | string | No | The taint effect: NoSchedule, PreferNoSchedule, or NoExecute. If empty, tolerates all effects.

Effect values:

Effect | Description
NoSchedule | Pods will not be scheduled on the node unless they tolerate the taint.
PreferNoSchedule | Kubernetes will try to avoid scheduling pods on the node, but it's not guaranteed.
NoExecute | Existing pods are evicted if they don't tolerate the taint. New pods won't be scheduled.

Common Cast AI tolerations:

# Tolerate spot instance nodes
- op: add
  path: /spec/tolerations/-
  value:
    key: scheduling.cast.ai/spot
    operator: Exists
    effect: NoSchedule

# Tolerate a specific node template
- op: add
  path: /spec/tolerations/-
  value:
    key: scheduling.cast.ai/node-template
    operator: Equal
    value: gpu-nodes
    effect: NoSchedule

Spot configuration

Configure Spot Instance scheduling behavior and distribution percentage for matched pods.

spotConfig:
  mode: preferred-spot
  distributionPercentage: 80

Field | Type | Description
mode | string | Spot Instance behavior mode.
distributionPercentage | integer | Percentage of pods to receive Spot configuration (0–100).

Spot modes

Mode | Description
optional-spot | Schedule on either Spot or On-Demand instances. No preference between instance types if both are available.
only-spot | Strictly require Spot instances. Pods will fail to schedule if Spot capacity is unavailable.
preferred-spot | Prefer Spot instances but automatically fall back to On-Demand if Spot is unavailable. Will attempt to rebalance back to Spot when capacity returns.

Distribution percentage

The distribution percentage determines what fraction of matched pods receive Spot-related configuration (the exact configuration itself being defined separately):

  • 80% distribution: 80% of pods get the configured Spot behavior; 20% are scheduled on On-Demand instances.
  • 100% distribution: All matched pods receive the Spot configuration.
  • 0% distribution: No pods receive Spot configuration (effectively disables Spot for matched pods).

The pod mutator makes this determination at pod creation time. For each new pod, it probabilistically assigns the pod to either the Spot or On-Demand group based on the configured percentage.

📘

Note on rapid scaling

When a deployment scales up instantaneously (for example, from 0 to 10 replicas at once), the actual distribution may not match the configured percentage immediately. This happens because placement decisions are made independently for each pod without knowledge of other pods being created simultaneously. The system self-corrects over time as pods are recreated through normal application lifecycle events.

📘

Note on small replica counts

For workloads with very few replicas, the distribution may not precisely match the percentage. For example, with a single pod and any Spot distribution below 100%, the pod will be scheduled on On-Demand to ensure minimum availability. The distribution becomes more accurate as replica counts increase.

Example configurations

# 80% Spot with fallback to On-Demand
spotConfig:
  mode: preferred-spot
  distributionPercentage: 80

# 100% Spot, strict (no fallback)
spotConfig:
  mode: only-spot
  distributionPercentage: 100

# 50/50 split, no preference when both available
spotConfig:
  mode: optional-spot
  distributionPercentage: 50

# No Spot configuration (empty object)
spotConfig: {}

Patch limitations

  • Patches apply to the pod template at creation time, not to running pods.
  • Some Kubernetes fields are immutable after pod creation; patches targeting these fields will be rejected.
  • Patches that result in invalid pod specifications will cause the mutation to fail.

Conflict resolution

When multiple mutations have filters that match the same pod, Cast AI uses a specificity scoring system to select only one mutation. The most specific mutation wins.

Specificity scoring

Each filter criterion contributes points to a mutation's specificity score:

Filter criterion | Points
Workload name specified | 4 (most specific)
Pod labels specified | 2
Namespace specified | 1 (least specific)

The mutation with the highest total score is selected.

Example scores:

Mutation filters | Score | Calculation
Workload name + pod labels + namespace | 7 | 4 + 2 + 1
Workload name + namespace | 5 | 4 + 0 + 1
Pod labels + namespace | 3 | 0 + 2 + 1
Namespace only | 1 | 0 + 0 + 1
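
As a concrete illustration (the names and values are placeholders), suppose both of the following filters match pods of a payments Deployment in the production namespace. The first scores 5 (workload name + namespace) and the second scores 1 (namespace only), so only the first mutation is applied.

# Scores 5: workload name (4) + namespace (1)
filterV2:
  workload:
    namespaces:
      - type: exact
        value: production
    names:
      - type: exact
        value: payments

# Scores 1: namespace (1) only
filterV2:
  workload:
    namespaces:
      - type: exact
        value: production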

Tie-breaking rules

If two mutations have the same specificity score, the following rules are applied in order until a winner is determined:

  1. Fewer workload names wins. A mutation targeting 1 workload is more specific than one targeting 3 workloads.
  2. Fewer namespaces wins. A mutation targeting 1 namespace is more specific than one targeting 3 namespaces.
  3. More label conditions wins. A mutation with 5 label conditions is more specific than one with 2 conditions.
  4. Alphabetical order by name. If all else is equal, the mutation whose name comes first alphabetically wins.

Best practices

To avoid unexpected behavior from conflict resolution:

  • Design mutually exclusive filters. Structure your mutations so that each pod can only match one mutation's filters.
  • Use specific filters. Prefer workload name filters over broad namespace filters when targeting specific applications.
  • Document your mutation strategy. Keep track of which mutations target which workloads to prevent unintended overlaps.
  • Test with the affected workloads preview. Before creating a mutation, review which workloads will be affected in the console's preview.

Restart policy

The restart policy controls when mutation changes take effect on existing workloads.

Value | Description
deferred | Changes apply only when pods are naturally recreated (default).
immediate | Reserved for future use.

Currently, only deferred behavior is active. This means:

  • New pods created after the mutation is defined will receive the mutation.
  • Existing pods are not affected until they are deleted and recreated.
  • To apply a mutation to existing pods, trigger a rollout (for example, kubectl rollout restart deployment/my-app).

UI-created vs. cluster-created mutations

Pod mutations can be created through the Cast AI console/API or applied directly to the cluster as custom resources.

Creation method | Resource name pattern | Editable in UI | Visible in UI
Cast AI console/API | api-mutation-{uuid} | Yes | Yes
kubectl / GitOps | Your chosen name | No | Yes (with suffix)

Console/API-created mutations

When you create a mutation through the Cast AI console or API:

  • The Kubernetes resource name follows the pattern api-mutation-{uuid}
  • The friendly name you provide is stored in the pod-mutations.cast.ai/pod-mutation-name annotation
  • The resource includes a pod-mutations.cast.ai/pod-mutation-source: api annotation
  • You can edit and delete these mutations through the console

Cluster-created mutations

Mutations applied directly to the cluster (via kubectl, Terraform, Helm, ArgoCD, etc.):

  • Use whatever resource name you specify in your manifest
  • Appear in the Cast AI console with a suffix
  • Cannot be edited or deleted through the console UI
  • Must be managed entirely through your chosen tools
  • Are synced to the console with a slight delay (approximately 3 minutes)

Viewing mutations in the cluster

List all pod mutations:

kubectl get podmutations.pod-mutations.cast.ai

View a specific mutation:

kubectl get podmutations.pod-mutations.cast.ai <name> -o yaml

To avoid conflicts, choose one management method per mutation. Do not create mutations with the same logical purpose through both the console and cluster tools.

Limitations

New pods only: Mutations apply only when pods are created. Existing pods are not affected until they are recreated.

One mutation per pod: When multiple mutations match a pod, only the most specific one is applied. All others are ignored.

Immutable fields: Some Kubernetes pod fields cannot be modified. Mutations targeting immutable fields will be rejected.

Distribution accuracy: Spot distribution percentages may not be exact, especially with small replica counts or rapid scaling events. The system self-corrects over time.

Label targeting only: Pod filters work with labels, not annotations. Ensure your targeting criteria use pod labels.

Evaluation frequency: Mutations are evaluated at pod creation time. Changes to mutation definitions don't retroactively affect existing pods.

API reference

For programmatic mutation management, see the PodMutations API documentation.

Troubleshooting

Verify controller status

Check if the pod mutator is running:

kubectl get pods -n castai-agent -l app.kubernetes.io/name=castai-pod-mutator

All pods should show Running status.

Check controller logs

View recent logs for mutation activity:

kubectl logs -n castai-agent -l app.kubernetes.io/name=castai-pod-mutator --tail=100

Common issues

Mutations not being applied

Symptoms: New pods don't receive expected labels, tolerations, or other configurations.

Troubleshooting steps:

  1. Verify the pod mutator is running:

    kubectl get pods -n castai-agent -l app.kubernetes.io/name=castai-pod-mutator
  2. Check that mutations exist in the cluster:

    kubectl get podmutations.pod-mutations.cast.ai
  3. Verify the mutation's filters match your target pods. Check namespace, workload name, workload kind, and labels.

  4. Check pod mutator logs for errors:

    kubectl logs -n castai-agent -l app.kubernetes.io/name=castai-pod-mutator --tail=50
  5. Verify the webhook is registered:

    kubectl get mutatingwebhookconfigurations | grep pod-mutator

Mutations not applying to Deployments

Cause: Labels are placed at the Deployment level instead of the pod template level.

Solution: Ensure labels for targeting are at spec.template.metadata.labels, not metadata.labels:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app  # NOT used for mutation targeting
spec:
  template:
    metadata:
      labels:
        app: my-app  # Used for mutation targeting

Conflicting mutations

Symptoms: A different mutation is applied than expected, or mutations seem inconsistent.

Cause: Multiple mutations match the same pods, and the specificity scoring selects a different mutation than intended.

Solution:

  1. Review which mutations match your target pods using the "Affected workloads" preview in the console.

  2. Design mutually exclusive filters so each pod matches only one mutation.

  3. Understand the specificity scoring: workload name (4 points) > pod labels (2 points) > namespace (1 point).

Mutations not applying with multiple webhooks

Symptoms: Pod mutator runs, but mutations aren't visible on pods, especially when other admission webhooks exist.

Cause: Webhook ordering issues where other webhooks modify pods after the mutator runs.

Solution: Set reinvocationPolicy to IfNeeded:

helm upgrade pod-mutator castai-helm/castai-pod-mutator -n castai-agent \
  --reset-then-reuse-values \
  --set webhook.reinvocationPolicy="IfNeeded"

Installation fails with dependency check error

Symptoms: Helm install fails with component version errors.

Cause: castai-agent or castai-cluster-controller versions are below minimum requirements.

Solution:

  1. Check current versions:

    helm list -n castai-agent --filter 'castai-agent|cluster-controller'
  2. Update components to meet minimum requirements:

    • castai-agent: 0.123.0 or higher
    • castai-cluster-controller: 0.85.0 or higher

Upgrade fails with template error

Symptoms: Helm upgrade fails with nil pointer evaluating interface {}.enabled or similar template errors.

Solution: Use --reset-then-reuse-values instead of --reuse-values:

helm upgrade pod-mutator castai-helm/castai-pod-mutator -n castai-agent --reset-then-reuse-values

Mutation synced to cluster but not visible in console

Symptoms: kubectl get podmutations shows the mutation, but it doesn't appear in the Cast AI console.

Cause: Sync delay between cluster and console (approximately 3 minutes).

Solution: Wait 3-5 minutes for the sync to complete. If the mutation still doesn't appear, check that the cluster is connected and the Cast AI agent is running.

Getting help

If you're unable to resolve an issue, contact Cast AI support or visit the community Slack channel.

Related resources