Overview
> **Early Access Feature**
> This feature is in early access. It may undergo changes based on user feedback and continued development. We recommend testing in non-production environments first and welcome your feedback to help us improve.
What is OMNI?
OMNI extends your Kubernetes cluster to additional regions and cloud providers, enabling you to provision nodes beyond your cluster's primary region. With OMNI, Cast AI's Autoscaler can automatically select the most cost-effective location—whether in your main cluster region or configured edge locations—based on real-time pricing and instance availability.
Key benefits
- Expanded GPU access: Find GPU capacity across multiple regions and clouds when your main cluster region has limited availability
- Cost optimization across regions: Autoscaler compares Spot and On-Demand prices across your main cluster region and edge locations, provisioning nodes where costs are lowest
- Cross-region: Extend EKS clusters to additional AWS regions or GKE clusters to additional GCP regions
- Cross-cloud: Extend EKS clusters to GCP regions or GKE clusters to AWS regions
- Unified management: Edge nodes appear as standard nodes in your cluster and are managed through familiar Cast AI workflows
Supported configurations
Cluster requirements:
- Phase 2 cluster (Automation enabled)
- EKS, GKE, or AKS
Edge locations:
- Any AWS region (can be added to EKS, GKE, or AKS clusters)
- Any GCP region (can be added to EKS, GKE, or AKS clusters)
- Any OCI region (can be added to EKS, GKE, or AKS clusters)
Instance types:
- Spot instances
- On-Demand instances
GPU support:
- Automatic driver installation for GPU instances
- Support for NVIDIA GPUs across AWS, GCP, and OCI (see the example request below)
- Future support planned for additional cloud providers, extending the pool of available GPUs
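Requesting a GPU on an edge node works the same way as on any Kubernetes node. Below is a minimal sketch, assuming the standard NVIDIA device-plugin resource name `nvidia.com/gpu`; the pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test           # hypothetical example name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # example image
      command: ["nvidia-smi"]    # prints GPU info if the driver is present
      resources:
        limits:
          nvidia.com/gpu: 1      # standard NVIDIA device-plugin resource name
```

Note that scheduling onto edge nodes additionally requires the toleration described under Limitations below.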
How OMNI works
OMNI introduces two core concepts:
Edge location: A cloud region where edge nodes can be provisioned. Each edge location contains the networking configuration and credentials needed to create nodes in that region. Edge locations are cluster-specific.
Edge node: A Kubernetes node running outside your cluster's control plane region. Edge nodes appear as virtual nodes in your cluster and can run workloads just like standard nodes.
This is a high-level overview of the entire OMNI flow from onboarding to provisioning:
1. You onboard your Phase 2 cluster to OMNI by deploying the OMNI components
2. You create and configure edge locations in the regions where you want to provision nodes
3. You specify the edge locations in your node templates
4. Autoscaler provisions nodes in the most cost-effective location based on your node template constraints, comparing prices across the main cluster region and the selected edge locations
5. Workloads can run on edge nodes in any configured edge location
Architecture overview
The diagram below shows how edge nodes in different cloud regions appear as virtual nodes in your main cluster:
```mermaid
graph LR
    subgraph MainCluster["Main Cluster"]
        pn1[Physical Node]
        pn2[Physical Node]
        pn3[Physical Node]
        vn1[Virtual Node]
        vn2[Virtual Node]
        vn3[Virtual Node]
    end
    subgraph GCPEdge["GCP Edge Location"]
        gcp_node1[Edge Node]
        gcp_node2[Edge Node]
        gcp_node3[Edge Node]
    end
    subgraph AWSEdge["AWS Edge Location"]
        aws_node1[Edge Node]
    end
    vn1 --> gcp_node1
    vn1 --> gcp_node2
    vn2 --> gcp_node3
    vn3 --> aws_node1
```
Technical implementation
OMNI is powered by Liqo, an open-source project that enables Kubernetes multi-cluster topologies. Liqo establishes peering connections between your main cluster and edge nodes, allowing them to appear as virtual nodes. You'll see Liqo components deployed in the `castai-omni` namespace and Liqo-related labels on edge nodes (such as `liqo.io/type=virtual-node`).
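For example, you can inspect both with standard kubectl commands, using the namespace and label mentioned above:

```shell
# Liqo components deployed by OMNI
kubectl get pods -n castai-omni

# Edge nodes appear with the Liqo virtual-node label
kubectl get nodes -l liqo.io/type=virtual-node
```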
Connecting to workloads on edge nodes
To make a workload accessible from other workloads, you can expose it using a Kubernetes Service, as you would in any setup.
Using Services, you can establish connectivity between workloads that communicate:
- From the Main Cluster to an Edge Cluster
- From an Edge Cluster to the Main Cluster
- From an Edge Cluster to another Edge Cluster
In short, every workload can connect to the others just as it would in a cluster made up only of traditional (non-edge) nodes.
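As an illustration, a plain ClusterIP Service is enough; the names below are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app              # hypothetical Service name
spec:
  selector:
    app: my-app             # matches your workload's pods, on main or edge nodes
  ports:
    - port: 80              # port exposed to other workloads
      targetPort: 8080      # port your container listens on
```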
Connection limitations
Currently, direct connectivity between edge locations is not supported. If a workload on one edge location needs to communicate with a workload on a different edge location, the traffic is always forwarded through the main cluster. In multi-cloud scenarios where edge locations are on a different cloud provider than the main cluster, this can lead to increased cloud networking costs.
Prerequisites
Before using OMNI, ensure you have:
- A Phase 2 EKS, GKE, or AKS cluster
- Required tools installed: `kubectl`, a cloud CLI (`aws` or `gcloud`), `curl`, and `jq` (a quick check is shown below)
- Appropriate cloud provider permissions for the regions where you want to create edge locations
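A quick way to confirm the required tools are installed:

```shell
kubectl version --client
aws --version       # or: gcloud version, depending on your cloud provider
curl --version
jq --version
```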
Required cloud permissions
OMNI needs permissions to create and manage resources in each edge location. The onboarding script automates most of this setup, but your cloud account must have permissions to:
For AWS edge locations:
- Create VPCs, subnets, internet gateways, and route tables
- Create and manage security groups
- Create IAM users and policies
- Launch EC2 instances
AWS EC2 permissions
The following EC2 permissions are required (a sample policy sketch follows the permission lists below):
* `ec2:DescribeAccountAttributes`
* `ec2:DescribeAddresses`
* `ec2:DescribeAvailabilityZones`
* `ec2:DescribeImages`
* `ec2:DescribeInstanceStatus`
* `ec2:DescribeInstances`
* `ec2:DescribeKeyPairs`
* `ec2:DescribeNatGateways`
* `ec2:DescribeNetworkAcls`
* `ec2:DescribeNetworkInterfaces`
* `ec2:DescribeRegions`
* `ec2:DescribeRouteTables`
* `ec2:DescribeSecurityGroups`
* `ec2:DescribeSubnets`
* `ec2:DescribeTags`
* `ec2:DescribeVolumes`
* `ec2:DescribeVpcAttribute`
* `ec2:DescribeVpcClassicLink`
* `ec2:DescribeVpcEndpoints`
* `ec2:DescribeVpcPeeringConnections`
* `ec2:DescribeVpcs`
* `ec2:CreateVpc`
* `ec2:DeleteVpc`
* `ec2:ModifyVpcAttribute`
* `ec2:CreateSubnet`
* `ec2:DeleteSubnet`
* `ec2:ModifySubnetAttribute`
* `ec2:CreateRouteTable`
* `ec2:DeleteRouteTable`
* `ec2:AssociateRouteTable`
* `ec2:DisassociateRouteTable`
* `ec2:CreateRoute`
* `ec2:DeleteRoute`
* `ec2:ReplaceRoute`
* `ec2:CreateInternetGateway`
* `ec2:DeleteInternetGateway`
* `ec2:AttachInternetGateway`
* `ec2:DetachInternetGateway`
* `ec2:CreateNatGateway`
* `ec2:DeleteNatGateway`
* `ec2:CreateEgressOnlyInternetGateway`
* `ec2:DeleteEgressOnlyInternetGateway`
* `ec2:AllocateAddress`
* `ec2:ReleaseAddress`
* `ec2:AssociateAddress`
* `ec2:DisassociateAddress`

For GCP edge locations:
- Create networks, subnets, and firewall rules
- Create service accounts, keys, and role bindings
- Launch compute instances
GCP IAM Roles
The following IAM roles are required:
* `roles/compute.instanceAdmin.v1`
* `roles/iam.serviceAccountUser`

The edge location onboarding script will create the necessary cloud resources and configure appropriate permissions. You only need to ensure your cloud account has sufficient privileges to do so.
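For reference, the AWS EC2 actions listed above could be granted through a customer-managed IAM policy along these lines. This is a sketch, not the exact policy the onboarding script configures; `ec2:Describe*` stands in for the individual Describe actions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "OmniEdgeNetworking",
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:CreateVpc", "ec2:DeleteVpc", "ec2:ModifyVpcAttribute",
        "ec2:CreateSubnet", "ec2:DeleteSubnet", "ec2:ModifySubnetAttribute",
        "ec2:CreateRouteTable", "ec2:DeleteRouteTable",
        "ec2:AssociateRouteTable", "ec2:DisassociateRouteTable",
        "ec2:CreateRoute", "ec2:DeleteRoute", "ec2:ReplaceRoute",
        "ec2:CreateInternetGateway", "ec2:DeleteInternetGateway",
        "ec2:AttachInternetGateway", "ec2:DetachInternetGateway",
        "ec2:CreateNatGateway", "ec2:DeleteNatGateway",
        "ec2:CreateEgressOnlyInternetGateway", "ec2:DeleteEgressOnlyInternetGateway",
        "ec2:AllocateAddress", "ec2:ReleaseAddress",
        "ec2:AssociateAddress", "ec2:DisassociateAddress"
      ],
      "Resource": "*"
    }
  ]
}
```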
Limitations
- Architecture constraint: Only the `x86_64` architecture is supported for edge nodes. If edge locations are selected in a node template, the architecture must be set to `x86_64`.
- No hibernation support: Clusters with OMNI enabled cannot use Cast AI's hibernation feature.
- No Rebalancer support: OMNI edge nodes cannot be rebalanced at this time; support is planned for a future release.
- Evictor support: The Evictor can evict workloads from edge nodes to the main cluster when capacity is available, and can pack workloads across multiple edge nodes. However, it will not place workloads on edge nodes unless they have the required `virtual-node.omni.cast.ai/not-allowed=true:NoExecute` toleration (see the example after this list).
- Node Configuration is not supported: Node Configurations for OMNI (edge) nodes are created automatically for each edge location and cannot be edited. They are not visible in the Cast AI console and cannot be interacted with in any way.
- Persistent volume limitations: Workloads that depend on persistent volumes (PVs) cannot be offloaded to edge nodes.
- Limited observability: Kvisor is not deployed on edge nodes, so security and netflow monitoring are not available for edge workloads.
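To make a workload schedulable onto edge nodes, add the toleration mentioned above to its pod spec. A minimal fragment:

```yaml
# Pod spec fragment: tolerates the taint placed on OMNI virtual (edge) nodes
tolerations:
  - key: virtual-node.omni.cast.ai/not-allowed
    operator: Equal
    value: "true"
    effect: NoExecute
```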
Terminology
| Term | Definition |
|---|---|
| Main cluster | Your Phase 2 EKS, GKE, or AKS cluster |
| Edge cluster | A lightweight K3s cluster running in an edge location that hosts edge nodes |
| Cluster region | The cloud region where your cluster control plane is deployed |
| Edge node | A Kubernetes node provisioned outside the main cluster's control plane region |
| Edge location | A cloud region configured for edge node provisioning, containing networking setup and credentials |
| Edge configuration | An Edge Node Configuration specific to an edge location (auto-created, not visible in the Console) |
What's next
Ready to get started? See Getting started with OMNI for step-by-step setup instructions.
