How-to: Create a scaling policy

Creating scaling policies

Creating custom scaling policies allows you to define specific optimization strategies tailored to your workload requirements. The policy creation process involves configuring settings and defining assignment rules to automatically assign workloads in your cluster to the appropriate policies.

Create a new scaling policy

To create a new scaling policy:

  1. From the Scaling policies page, click Create scaling policy
  2. Enter a unique name for your policy
  3. Configure your policy settings
  4. Configure assignment rules
  5. Review and save your configuration

Configure policy settings

The Settings step allows you to define how the policy will optimize workloads. Key configuration options include:

  • Automatic optimization – Whether to automatically apply recommendations to workloads assigned to this policy
  • Optimized resources – Which resources (CPU and/or Memory) to optimize
  • When to apply changes – Whether to use immediate or deferred scaling mode
  • Recommendation percentile – Which usage percentile to target for optimization
  • Overhead settings – Additional resource buffer to add to recommendations
  • Resource constraints – Minimum and maximum limits for scaled resources
  • Look-back period – Historical data window for generating recommendations
  • Startup metrics – Whether to ignore initial resource usage spikes

For detailed information about each configuration option and its impact on workload optimization, see Workload autoscaling settings.

Configure assignment rules

Assignment rules define which workloads are automatically assigned to your scaling policy. This feature allows you to create intelligent assignment criteria based on workload characteristics.

To configure assignment rules:

  1. In the Assignment rules step, click Add rule
  2. Define your assignment criteria using the available filters:
    • Namespaces - Target workloads in specific namespaces. Cast AI supports regex for namespace matching, allowing flexible namespace targeting. Examples:
      • ["production", "kube-system"] - Matches exact namespace names
      • ["dev-.*", "test-.*"] - Matches any namespace starting with "dev-" or "test-"
      • ["(test|sandbox)"] - Matches either "test" or "sandbox" namespaces
      • ["^app-[0-9]+$"] - Matches namespaces like "app-1", "app-42", etc.
        📘

        Regex syntax

        Cast AI uses Go regex syntax (RE2).

    • Workload type - Filter by workload kinds (Deployment, StatefulSet, DaemonSet, Job, CronJob)
    • Workload labels - Use label key-value pairs to identify workloads. For more details on configuring complex rules, see below.
  3. Add multiple rules if needed to capture different workload patterns
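The namespace patterns above appear to use full-match semantics: an unanchored pattern like "production" matches only that exact name, while "dev-.*" matches anything starting with "dev-". A quick sketch using Python's re module, whose syntax is compatible with RE2 for these patterns (the full-match assumption is inferred from the "exact namespace names" example above):

```python
import re

def namespace_matches(patterns, namespace):
    # Full-match semantics: "production" matches only "production",
    # while "dev-.*" matches any namespace starting with "dev-".
    return any(re.fullmatch(p, namespace) for p in patterns)

# namespace_matches(["dev-.*", "test-.*"], "dev-api")  -> True
# namespace_matches(["production"], "production-2")    -> False
```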

Policy assignment priority hierarchy

When multiple assignment rules could apply to the same workload, Workload Autoscaler follows a strict priority hierarchy, which is outlined below.

Assignment priority order

Workload Autoscaler evaluates workload assignments in the following priority order:

  1. Manual/Explicit Assignment

    • API assignments – Policies assigned via the Cast AI API
    • Annotation assignments – Policies specified using workloads.cast.ai/configuration with scalingPolicyName
    • Direct UI assignments – Policies manually assigned through the Optimization page in the Cast AI Console
  2. Assignment Rules

    • Evaluated based on policy priority order (as configured in the policies table)
    • The first matching policy wins
    • Matchers within a rule use AND logic; multiple rules within a policy use OR logic
  3. System Defaults

    If a workload has not been manually assigned to a policy and no assignment rules match it, it is moved to a default system scaling policy:

    • StatefulSets: resiliency
    • All other workloads: balanced
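An annotation assignment (priority 1 above) can be sketched as a Kubernetes manifest. The workload and policy names here are hypothetical, and the exact schema accepted inside the annotation is described in the Cast AI workload autoscaling documentation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout  # hypothetical workload name
  annotations:
    # Pins this workload to a named scaling policy, taking
    # precedence over any assignment rules.
    workloads.cast.ai/configuration: |
      scalingPolicyName: my-custom-policy
```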

How assignment rules are evaluated

When multiple policies have assignment rules that could match the same workload:

  • Rules are evaluated in policy priority order (as arranged in the policies table)
  • The first matching rule assigns the workload to that policy
  • Workloads that don't match any rules fall back to system defaults if they are not manually assigned to a policy

For workloads that match multiple rules within the same policy, all matching conditions are applied using AND logic within a rule and OR logic between rules.
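The evaluation order described above can be sketched in Python. Field names here are illustrative only, not the actual API schema:

```python
def rule_matches(rule, workload):
    # AND logic: every matcher present in the rule must match.
    checks = []
    if "gvk" in rule:
        checks.append(workload["kind"] in rule["gvk"])
    if "namespaces" in rule:
        checks.append(workload["namespace"] in rule["namespaces"])
    return bool(checks) and all(checks)

def assign_policy(workload, policies_by_priority):
    # Policies are evaluated in priority order; within a policy,
    # OR logic applies: any rule may match. First matching policy wins.
    for policy in policies_by_priority:
        if any(rule_matches(r, workload) for r in policy["rules"]):
            return policy["name"]
    # System defaults when nothing matches and no manual assignment exists.
    return "resiliency" if workload["kind"] == "StatefulSet" else "balanced"
```

For example, a Deployment in a namespace matched by two policies is assigned to whichever policy sits higher in the policies table.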

Important considerations

Assignment rule overrides: Even if you have carefully configured assignment rules, workloads may not follow them if they have been manually assigned via API, annotations, or the UI. Manual assignments always take precedence over assignment rules.

Asynchronous propagation: In large clusters, updates to policy order or assignment rules may take several seconds or more to propagate because reassignment is handled asynchronously. After making such changes, refresh the Console to confirm they have taken effect before making further changes.

Duplicate a system policy

To use a system policy as a starting point for customization, or to create a modified version of any existing policy, do the following:

  1. Navigate to Workload Autoscaler > Scaling policies
  2. Find the policy you wish to duplicate in the policies table
  3. Click the duplicate option for that policy
  4. The system creates a fully editable copy with all the original settings
  5. Modify any settings as needed in the configuration interface
  6. Configure assignment rules to define workload assignment criteria
  7. Save your customized policy
📘

Note

The readonly policy cannot be duplicated as it is reserved for Cast AI components.

Managing policy priority

You can adjust the priority order of your scaling policies by rearranging them in the policies table. Policy priority determines which policy a workload is assigned to when multiple policies have assignment rules that could match the same workload.

To change policy priority:

  1. Navigate to the Scaling policies page
  2. Use the drag handle (≡) on the left side of each policy row
  3. Drag and drop policies to reorder them according to your desired priority
  4. Click Save to apply the new priority order
📘

Note

This change is propagated asynchronously, so it may take some time to be reflected in the system. Workloads will be gradually reassigned to the newly ordered matching policy.

Policies higher in the list (lower order numbers) have higher priority and will be evaluated first when determining workload assignments. This allows you to create a hierarchy where more specific policies take precedence over general ones.

Alternative creation methods

While the Cast AI console provides a user-friendly interface for creating scaling policies, you can also create and manage policies programmatically using the Cast AI API or Terraform.

API integration

You can create scaling policies with assignment rules using the Workload Optimization API.

Basic API policy creation example

When creating a scaling policy via API, you define both the optimization settings and the assignment rules that determine which workloads the policy applies to. Here's a complete example:

{
  "name": "production-frontend-policy",
  "applyType": "IMMEDIATE",
  "managementPolicy": "MANAGED",
  "cpu": {
    "target": "p95",
    "overhead": 0.05
  },
  "memory": {
    "target": "max",
    "overhead": 0.10
  },
  "assignmentRules": [
    {
      "workload": {
        "gvk": ["Deployment", "StatefulSet"],
        "labelsExpressions": [
          {
            "key": "tier",
            "operator": "KUBERNETES_LABEL_SELECTOR_OP_IN",
            "values": ["frontend", "web"]
          }
        ]
      }
    },
    {
      "namespace": {
        "names": ["production", "prod-frontend"]
      }
    }
  ]
}

This example creates a scaling policy that:

  • Applies immediate scaling (recommendations applied as soon as thresholds are met)
  • Targets the 95th percentile for CPU usage with 5% overhead
  • Uses maximum observed memory usage with 10% overhead
  • Automatically assigns to Deployments or StatefulSets with tier=frontend or tier=web labels
  • OR assigns to any workloads in production or prod-frontend namespaces
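Sketching the HTTP call itself, the endpoint path and auth header below are assumptions; consult the Workload Optimization API reference for the authoritative route and schema:

```python
import json

API_BASE = "https://api.cast.ai"  # assumption: public API base URL

def build_create_policy_request(cluster_id, api_key, policy):
    # Illustrative request shape only -- the path and header names are
    # assumptions, not the documented Cast AI API contract.
    return {
        "method": "POST",
        "url": f"{API_BASE}/v1/workload-autoscaling/clusters/{cluster_id}/scaling-policies",
        "headers": {"X-API-Key": api_key, "Content-Type": "application/json"},
        "body": json.dumps(policy),
    }
```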

Assignment rule structure in API

Assignment rules in the API follow a specific structure that allows for complex workload matching. Each policy can contain multiple assignment rules, and each rule can match based on workload characteristics, namespace properties, or both.

Rule composition:

  • Each assignmentRule can contain a workload matcher, a namespace matcher, or both
  • Multiple matchers within a single rule are combined with AND logic
  • Multiple rules within a policy are combined with OR logic

Workload matching capabilities:

| Field | Type | Description | Example |
|---|---|---|---|
| gvk | Array | Kubernetes workload types to match | ["Deployment", "StatefulSet"] |
| labelsExpressions | Array | Label-based matching conditions | See label operators table below |

Namespace matching:

| Field | Type | Description | Example |
|---|---|---|---|
| names | Array | Specific namespace names to match | ["production", "staging"] |

Advanced assignment rules for policy creation

When creating scaling policies via API, you have access to sophisticated assignment rule matching capabilities. These capabilities allow you to create highly specific policies that automatically assign workloads based on complex criteria.

Label operators for workload matching:

| Operator | Requires values field | Description |
|---|---|---|
| KUBERNETES_LABEL_SELECTOR_OP_CONTAINS | Yes | Label value contains the specified substring. If no key is specified, checks all workload labels |
| KUBERNETES_LABEL_SELECTOR_OP_REGEX | Yes | Label value matches the specified regex pattern. If no key is specified, checks all workload labels |
| KUBERNETES_LABEL_SELECTOR_OP_IN | Yes | Label value must be in the specified list |
| KUBERNETES_LABEL_SELECTOR_OP_NOT_IN | Yes | Label value must not be in the specified list |
| KUBERNETES_LABEL_SELECTOR_OP_EXISTS | No | Label key must exist (regardless of value) |
| KUBERNETES_LABEL_SELECTOR_OP_DOES_NOT_EXIST | No | Label key must not exist |
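The operator semantics can be sketched in Python. The behavior here (including keyless CONTAINS/REGEX checking all labels) is inferred from the table above and is illustrative, not the service's actual implementation:

```python
import re

def eval_expression(expr, labels):
    """Evaluate one label expression against a workload's labels dict."""
    op = expr["operator"].removeprefix("KUBERNETES_LABEL_SELECTOR_OP_")
    key = expr.get("key")
    values = expr.get("values", [])
    if op == "EXISTS":
        return key in labels
    if op == "DOES_NOT_EXIST":
        return key not in labels
    # With no key, CONTAINS/REGEX check every label value.
    if key:
        candidates = [labels[key]] if key in labels else []
    else:
        candidates = list(labels.values())
    if op == "IN":
        return any(v in values for v in candidates)
    if op == "NOT_IN":
        return bool(candidates) and all(v not in values for v in candidates)
    if op == "CONTAINS":
        return any(sub in v for v in candidates for sub in values)
    if op == "REGEX":
        return any(re.search(p, v) for v in candidates for p in values)
    raise ValueError(f"unknown operator: {op}")
```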

Workload type matching with GVK format

When defining which workload types your scaling policy should target, you can specify them using the GVK (Group, Version, Kind) format. The gvk field uses Kubernetes' Group/Version/Kind format, following the same pattern as kubectl get commands.

GVK format options:

| Format | Syntax | Use case | Example |
|---|---|---|---|
| Kind only | kind | Match any version of the workload type | "Deployment" matches all Deployments |
| Group + Kind | kind.group | Match specific API group | "Deployment.apps" for apps/v1 Deployments |
| Full GVK | kind.version.group | Match exact API version | "Deployment.v1.apps" for specific version |

GVK follows the same naming pattern as kubectl get: anything that works with the kubectl get command on your cluster can be used as the gvk value.

Label expressions for precise workload targeting

Label expressions provide powerful filtering capabilities when creating scaling policies. They use Kubernetes label selector operators to match workloads based on their labels.

Complete label expression structure:

{
  "key": "environment",
  "operator": "KUBERNETES_LABEL_SELECTOR_OP_IN", 
  "values": ["production", "staging"]
}

Complex assignment rule example

This example demonstrates how to create a scaling policy with sophisticated assignment rules:

{
  "assignmentRules": [
    {
      "workload": {
        "gvk": ["Deployment", "CronJob"],
        "labelsExpressions": [
          {
            "key": "environment",
            "operator": "KUBERNETES_LABEL_SELECTOR_OP_IN",
            "values": ["production", "staging"]
          },
          {
            "key": "managed-by",
            "operator": "KUBERNETES_LABEL_SELECTOR_OP_EXISTS"
          },
          {
            "key": "experimental",
            "operator": "KUBERNETES_LABEL_SELECTOR_OP_DOES_NOT_EXIST"
          }
        ]
      }
    }
  ]
}

This example creates a scaling policy that targets Deployments and CronJobs that are in production or staging environments, have a managed-by label, and don't have an experimental label.

Regex pattern namespace matching example

{
  "assignmentRules": [
    {
      "namespace": {
        "names": ["dev-.*", "(test|sandbox)", "prod-frontend-[0-9]+"]
      }
    }
  ]
}

Assignment rule evaluation in policy creation

Understanding how assignment rules are evaluated is crucial when creating policies with multiple assignment rules:

| Evaluation type | Logic | Description |
|---|---|---|
| Within a rule | AND | All matchers in a single rule must match for the workload to be assigned |
| Between rules | OR | Any rule within the policy can match to assign the workload |
| Policy priority | First match wins | When multiple policies match the same workload, the highest-priority policy is applied |

Policy priority can be changed via the UI or Terraform. This change is propagated asynchronously, so it may take some time to be reflected in the system. Workloads will be gradually reassigned to the newly ordered matching policy.

Terraform provider

Use the Cast AI Terraform provider to define scaling policies as infrastructure-as-code. This approach is ideal for managing policies consistently across multiple clusters or environments.

For complete documentation and examples, see the Cast AI Terraform provider documentation.

📘

Note

Full assignment rules support, including regex patterns and advanced label operators, is available in Cast AI Terraform provider v7.58.3 and later.
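As a rough illustration, a policy defined through the provider might look like the sketch below. The resource and attribute names are assumptions based on common provider conventions; treat the provider documentation as the authoritative schema:

```hcl
# Illustrative sketch only -- verify resource and attribute names
# against the Cast AI Terraform provider documentation.
resource "castai_workload_scaling_policy" "services" {
  cluster_id        = var.cluster_id
  name              = "services"
  apply_type        = "IMMEDIATE"
  management_option = "MANAGED"

  cpu {
    function = "QUANTILE"
    args     = ["0.95"]
    overhead = 0.05
  }

  memory {
    function = "MAX"
    overhead = 0.10
  }
}
```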

Next steps

After creating your scaling policies: