Getting started
Early Access Feature: This feature is in early access. It may undergo changes based on user feedback and continued development. We recommend testing in non-production environments first and welcome your feedback to help us improve.
This guide walks you through setting up OMNI for your cluster, from initial onboarding through creating your first edge location, configuring node templates, and provisioning compute capacity at the edge.
Before you begin
Ensure you meet the prerequisites listed in the OMNI Overview.
You'll need:
- A Phase 2 EKS, GKE, or AKS cluster
- kubectl configured for your cluster
- Cloud CLI authenticated (aws or gcloud)
- curl and jq installed
- Sufficient cloud permissions to create networking resources and compute instances
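As a quick, optional sanity check, you can confirm these prerequisites from your terminal before proceeding:
# Confirm kubectl is pointed at the cluster you plan to onboard
kubectl config current-context
# Confirm the supporting tools are installed
curl --version | head -n 1
jq --version
aws --version || gcloud --version   # whichever cloud CLI applies to your cluster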
Step 1: Onboard your cluster to OMNI
Onboarding deploys the OMNI components to your cluster. You can onboard in two ways.
Option A: During Phase 2 onboarding (recommended for new clusters)
If you're onboarding a cluster from Phase 1 (Read-only) to Phase 2 (Automation), you can enable OMNI at the same time.
- In the Cast AI console, navigate to your cluster
- Follow the standard Phase 2 onboarding flow
- Select Extend cluster to other regions and cloud providers under Advanced settings. The script will be updated to include INSTALL_OMNI=true.
- Copy and run the script in your terminal
- Wait for the script to complete (typically 1-2 minutes)

Option B: Enable OMNI on an existing Phase 2 cluster
If you already have a Phase 2 cluster with cluster optimization enabled:
- Navigate to your cluster in the Cluster list
- Click on the ellipsis and choose Cast AI features
- Under Other features, check the box for Extend cluster to other regions and cloud providers
- Copy the updated script and run it in your terminal
- Wait for the script to complete (typically 1-2 minutes)

Verify onboarding (optional)
After the script completes, verify OMNI is enabled:
kubectl get pods -n castai-omni
You should see OMNI components running, including:
- liqo-* pods (controller manager, CRD replicator, fabric, IPAM, proxy, webhook)
- The omni-agent pod
Example output:
NAME READY STATUS RESTARTS AGE
liqo-controller-manager-7cf59bcc64-xxxxx 1/1 Running 0 2m
liqo-crd-replicator-687bdc6f66-xxxxx 1/1 Running 0 2m
liqo-fabric-xxxxx 1/1 Running 0 2m
liqo-ipam-8667dbccbb-xxxxx 1/1 Running 0 2m
liqo-metric-agent-55cd8748c5-xxxxx 1/1 Running 0 2m
liqo-proxy-77c66dfb88-xxxxx 1/1 Running 0 2m
liqo-webhook-6f648484cc-xxxxx 1/1 Running 0 2m
omni-agent-595c4b97d9-xxxxx 1/1 Running 0 2m
All pods should be in Running status.
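If you prefer a single command that blocks until the components are up, kubectl can wait for all pods in the namespace to report Ready (the five-minute timeout below is just a reasonable default):
# Wait for all OMNI pods to become Ready, or fail after 5 minutes
kubectl wait --for=condition=Ready pods --all -n castai-omni --timeout=300s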
Step 2: Create and onboard an edge location
Edge locations define the regions where edge nodes can be provisioned. Each edge location is cluster-specific and requires its own setup.
- In the Cast AI console, navigate to Autoscaler → Edge locations
- Click Create edge location to open up the creation and configuration drawer
- Configure the edge location:
  - Name: A descriptive name (e.g., aws-us-west-2 or gcp-europe-west4)
  - Cloud provider: Select AWS, GCP, or OCI
  - Region: Select the target region
  GCP: For GCP, providing the Project ID is also required.
- Click Next
- Copy and run the provided script in your terminal to establish the connection with the edge location
Note: Before running the script, ensure your cloud CLI is authenticated and configured for the correct account and region.
AWS
Set your AWS profile to ensure the script creates resources in the correct AWS account:
export AWS_PROFILE=<your-aws-profile>
# Then run the provided onboarding script
If you're already using your default AWS credentials, you can skip setting the profile.
GCP
Set your active GCP project to ensure the script creates resources in the correct project:
gcloud config set project <your-project-id>
# Then run the provided onboarding script
This should match the Project ID you provided when creating the edge location.
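If you are unsure which account or project your CLI is currently targeting, these read-only commands show the active identity and project (run whichever applies to your provider):
aws sts get-caller-identity        # AWS: shows the account and identity in use
gcloud config get-value project    # GCP: shows the active project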
The script will:
- Create a VPC/network and subnet (if needed)
- Configure firewall rules and security groups
- Create service accounts or IAM users with appropriate permissions
- Register the edge location with Cast AI
Wait for the script to complete (typically 2-3 minutes).
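If you want to spot-check the networking resources on the cloud side, read-only describe calls are a safe way to do it. The sketch below assumes an AWS edge location in us-west-2; the exact resources and tags the script creates may differ:
# List VPCs and subnets in the edge region (region is illustrative)
aws ec2 describe-vpcs --region us-west-2 --query 'Vpcs[].{VpcId:VpcId,Cidr:CidrBlock}' --output table
aws ec2 describe-subnets --region us-west-2 --query 'Subnets[].{SubnetId:SubnetId,AZ:AvailabilityZone,Cidr:CidrBlock}' --output table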
After successful completion, the edge location appears in the Edge locations list with an Incomplete setup status and a notification confirms creation.

A newly created edge location showing Incomplete setup status
Why "Incomplete setup"?An edge location shows Incomplete setup until it's added to at least one node template. This is expected behavior.
Skipping the script: If you skip running the script, the edge location is saved in a Pending state. You can return to complete this step later by accessing the edge location from the list.
Create additional edge locations (optional)
You can create multiple edge locations for the same cluster. Repeat the process above for each region where you want to provision edge nodes.
Step 3: Configure node templates for edge locations
Node templates control where the Autoscaler can provision nodes. To enable edge node provisioning, add edge locations to your node templates.
- Navigate to Autoscaler → Node templates
- Select an existing node template or create a new one
- In the node template editor, find the Edge locations section and check the box to Enable provisioning in edge locations
- Select one or more edge locations from the dropdown
- Click Save
When edge locations are selected:
- The Instance constraints section is updated to account for inventory from all selected edge locations
- The Available instances list includes instances from the main cluster region and all selected edge locations
- Autoscaler can now provision nodes in any of these locations based on cost and availability
Instance availability comparison
Before:

After:

After saving, the edge location status changes from Incomplete setup to In use.
Your cluster is now configured for edge node provisioning. The Autoscaler will automatically provision edge nodes as needed.
Edge node provisioning
Once configured, edge nodes are provisioned automatically by the Autoscaler based on:
- Cost optimization: Autoscaler compares Spot and On-Demand prices across the main cluster region and all edge locations configured in the node template
- Instance availability: Considers instances that are available in each region, including edge ones
- Node template constraints: Respects all CPU, memory, architecture, and other constraints defined in the node template
How edge nodes appear in your cluster
Cast AI Console
In the Cast AI Console, edge nodes are identified in the Nodes list by an additional External region label.
Using kubectl
Edge nodes appear as virtual nodes in your cluster:
kubectl get nodes
Example output:
NAME STATUS ROLE AGE VERSION
ip-192-168-56-192.eu-central-1.compute.internal Ready <none> 6h2m v1.30.14-eks-113cf36
cast-7f6821f2-b9fd-47e0-ab38-1f80c9c32dc0 Ready agent 6m20s v1.30.14-eks-b707fbb
# The 2nd node is an edge node with ROLE=agent
Edge nodes can be identified by several characteristics:
Node labels:
- liqo.io/type=virtual-node: Identifies the node as a Liqo virtual node
- kubernetes.io/role=agent: Role designation for edge nodes
- omni.cast.ai/edge-location-name: Name of the edge location
- omni.cast.ai/edge-id: Unique edge identifier
- omni.cast.ai/csp: Cloud provider of the edge (e.g., gcp, aws)
- topology.kubernetes.io/region: Region where the edge node is located
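Because these labels are applied to every edge node, you can use them to filter and inspect edge nodes directly with kubectl; for example:
# List only the virtual (edge) nodes and show their edge-related labels as columns
kubectl get nodes -l liqo.io/type=virtual-node \
  -L omni.cast.ai/edge-location-name,omni.cast.ai/csp,topology.kubernetes.io/region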
Node taints:
- virtual-node.omni.cast.ai/not-allowed=true:NoExecute: Applied to all edge nodes by default
ProviderID: Edge nodes have a special provider ID format:
castai-omni://<identifier-string>
You can inspect an edge node to see all these identifiers:
kubectl describe node <node-name>
Scheduling workloads on edge nodes
To enable workloads to run on edge nodes, label the namespace to allow offloading:
kubectl label ns <namespace-name> omni.cast.ai/enable-scheduling=true
When you deploy workloads to a labeled namespace, a mutating webhook automatically adds the required toleration to your pods, allowing them to be scheduled on edge nodes.
This label enables Liqo's offloading mechanism for the namespace.
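To see the flow end to end, a minimal test could look like the following; the namespace name and image are illustrative, and the pods will only land on edge nodes once the Autoscaler has provisioned edge capacity:
# Create and label a test namespace for offloading
kubectl create namespace edge-demo
kubectl label namespace edge-demo omni.cast.ai/enable-scheduling=true
# Deploy a simple stateless workload; the mutating webhook injects the edge toleration
kubectl -n edge-demo create deployment hello --image=nginx --replicas=2
# Check placement; edge nodes show up under NODE with cast-* names
kubectl -n edge-demo get pods -o wide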
Warning: Do not offload the default namespace. The default namespace exists in both the main cluster and edge clusters, and offloading it can cause unexpected behavior.
Manual toleration (optional)
While the toleration is added automatically for pods in labeled namespaces, you can also add it manually to your pod specs if needed:
tolerations:
- key: "virtual-node.omni.cast.ai/not-allowed"
operator: "Equal"
value: "true"
effect: "NoExecute"
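For context, here is a minimal sketch of a Pod that carries the toleration explicitly; the name, namespace, and image are illustrative:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: edge-toleration-demo   # illustrative name
  namespace: edge-demo         # a namespace labeled for offloading
spec:
  containers:
  - name: app
    image: nginx               # illustrative image
  tolerations:
  - key: "virtual-node.omni.cast.ai/not-allowed"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
EOF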
Custom taints: If your node template has additional custom taints beyond the default edge taint, you must manually add the corresponding tolerations to your pod specs. Only the default virtual-node.omni.cast.ai/not-allowed toleration is added automatically.
Workload compatibility
Not all workloads are suitable for running on edge nodes. Consider the following when deciding which workloads to offload:
Requirements (hard constraints):
- Linux x86_64 architecture only (ARM-based workloads are not supported)
- Stateless workloads or workloads that don't depend on persistent volumes (PVs cannot be offloaded to edge nodes)
Recommendations:
- Workloads that can tolerate some additional network latency (cross-region or cross-cloud communication adds latency)
- Workloads with minimal to no dependencies on other in-cluster services
Given the above, prime candidates for offloading to edge nodes include ML training jobs, workloads that benefit more from GPU availability than from low latency, and workloads where the cost savings from cheaper GPU or compute instances justify the operational trade-offs.
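To screen a namespace for workloads that depend on persistent volumes (and therefore cannot be offloaded), a jq filter over the pod specs can help; the namespace is a placeholder:
# Print pods that mount a PersistentVolumeClaim; these should stay in the main cluster
kubectl get pods -n <namespace-name> -o json \
  | jq -r '.items[] | select(any(.spec.volumes[]?; .persistentVolumeClaim != null)) | .metadata.name'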
Evictor behavior with edge nodes
The Evictor works with edge nodes but respects the edge node toleration requirement:
What Evictor can do:
- Evict workloads from edge nodes back to nodes in the main cluster when capacity is available
- Pack workloads across multiple edge nodes to optimize resource utilization
- Consider edge nodes in its bin-packing decisions
What Evictor cannot do:
- Place workloads on edge nodes unless they explicitly tolerate virtual-node.omni.cast.ai/not-allowed=true:NoExecute
This means:
- Workloads without the edge toleration will never be moved to edge nodes by Evictor
- Workloads with the edge toleration can be evicted from the main cluster to edges (and vice versa)
- You maintain control over which workloads can run on edge nodes through tolerations
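Before relying on Evictor to move a workload across locations, you can check whether its pods already carry the edge toleration; the pod and namespace names are placeholders:
# Inspect a pod's tolerations
kubectl get pod <pod-name> -n <namespace-name> -o json | jq '.spec.tolerations'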
Note: Only add the edge node toleration to workloads that are compatible with running in different regions or clouds. Consider all requirements when deciding which workloads to allow on edge nodes.
Edge node provisioning time
Edge nodes typically take the same amount of time to become ready as nodes in your main cluster region would.
For GPU instances, provisioning may take slightly longer due to driver installation.
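To watch edge nodes register and become Ready as the Autoscaler provisions them, you can follow the virtual nodes with kubectl:
# Watch edge (virtual) nodes appear and transition to Ready
kubectl get nodes -l liqo.io/type=virtual-node -w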
