Custom edge locations

📣 Early Access Feature

This feature is in early access. It may undergo changes based on user feedback and continued development. We recommend testing in non-production environments first and welcome your feedback to help us improve.

Custom edge locations let you add any Linux VM or bare metal instance to your OMNI-enabled cluster — regardless of cloud provider. Instead of Cast AI managing cloud resources on your behalf (as with AWS, GCP, or OCI edge locations), you run a single install script on each instance. The castai-edge-initd agent handles joining the instance to your cluster automatically.

This is useful when you want to use smaller cloud providers, on-premises hardware, or any compute that Cast AI does not natively integrate with.

How it works

  1. You create a Custom edge location in the Cast AI console or via API. This registers a logical edge location that skips cloud provider reconciliation.
  2. For each compute instance you want to add, you fetch the edge init script from the API and run it as root.
  3. The script installs the castai-edge-initd binary and starts it as a systemd service. The agent joins the instance to your cluster as an edge node.

Before you begin

Requirements

  • OMNI must be enabled on your cluster. See Getting started with OMNI.
  • Each compute instance must meet the following requirements:
| Requirement | Detail |
| --- | --- |
| OS | Linux |
| Kernel | 6.0 or later |
| Architecture | x86_64 (amd64) or aarch64 (arm64) |
| Access | Root (or sudo) access to run the install script |

Network requirements

Configure firewall rules on each instance to allow the following traffic.

Egress:

| Destination | Protocol | Port | Purpose |
| --- | --- | --- | --- |
| 0.0.0.0/0 | TCP | 443 | Cast AI API (api.cast.ai) and HTTPS services |
| 0.0.0.0/0 | TCP | 80 | HTTP services |
| 0.0.0.0/0 | TCP | 8443 | k0smotron control plane |
| 0.0.0.0/0 | TCP | 8132 | k0smotron konnectivity endpoint |
| 0.0.0.0/0 | UDP | 51840 | WireGuard VPN |

Ingress (between edge instances):

Allow all traffic between instances in the same custom edge location so they can communicate directly:

| Source | Protocol | Port | Purpose |
| --- | --- | --- | --- |
| Other OMNI VMs in the same network | All | All | Direct inter-node communication |

If tag-based filtering is not available, use private CIDR ranges: 10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12.

Cross-VM traffic (for k0s CNI):

The following ports must be open between all instances depending on your CNI encapsulation mode:

| Protocol | Port | Purpose |
| --- | --- | --- |
| TCP | 179 | BGP — always required for route exchange between nodes |
| UDP | 5555 | FOU encapsulation — required if FOU is selected |
| IP protocol 4 | N/A | IPIP encapsulation — required if IPIP is selected (not supported by all providers) |
📘 Note

Only one encapsulation protocol (FOU or IPIP) needs to be allowed, depending on which one k0s is configured to use. If no encapsulation is used, only BGP (TCP 179) is required.
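A quick way to sanity-check the egress rules above from a candidate instance is a plain TCP connect test. This is an optional sketch: only api.cast.ai is a known hostname from this page, so substitute your own control-plane endpoint for the commented-out checks, and note that UDP 51840 (WireGuard) cannot be verified with a TCP connect.

```shell
#!/bin/bash
# Optional egress spot-check before installing the agent.
# api.cast.ai:443 comes from the egress table above; the control-plane
# host for ports 8443 and 8132 is environment-specific, so those checks
# are left commented out as placeholders.
check_tcp() {
  local host=$1 port=$2
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "OK   $host:$port"
  else
    echo "FAIL $host:$port"
  fi
}

check_tcp api.cast.ai 443
# check_tcp <control-plane-host> 8443
# check_tcp <control-plane-host> 8132
```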

Step 1: Create a custom edge location

  1. Navigate to Automation → Node autoscaler → Configuration → Edge Locations
  2. Click Create edge location
  3. Set Provider to Custom
  4. Enter a Name for the location (for example, nebius-gpu or on-prem-dc1)
  5. Click Next and copy the edge location ID — you'll need it in the next step

After creation, the edge location appears in the list with Ready status.
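If you prefer automation over the console, the edge location can also be created via the API. The endpoint path and request body below are assumptions extrapolated from the edge-location endpoints shown elsewhere on this page; verify them against the Cast AI API reference before relying on them.

```shell
#!/bin/bash
set -e
API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"

# NOTE: path and payload are assumptions modeled on the edge-location
# endpoints used in Step 2 and in the removal section; confirm the exact
# shape in the Cast AI API reference.
curl --request POST \
     --url "$API_HOST/omni-provisioner/v1beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations" \
     --header "X-API-Key: $API_KEY" \
     --header "content-type: application/json" \
     --data '{"name": "on-prem-dc1", "provider": "custom"}'
```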

Step 2: Run the init script on each instance

For each compute instance you want to join to the edge location, fetch and run the init script as root. The script installs castai-edge-initd and registers the instance as an edge node.

#!/bin/bash
set -e
API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"

curl --url "$API_HOST/omni-provisioner/v1beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID:edgeInitdScript" \
     --header "X-API-Key: $API_KEY" \
     --header "content-type: application/json" | bash

⚠️ Warning

The script must be run as root on the instance. It installs a binary to /usr/local/bin/castai-edge-initd and creates a systemd service.

Repeat this for every instance you want to add to the edge location.
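To onboard many instances at once, the fetch-and-run step can be wrapped in a loop over SSH. The host list and the assumption of direct root SSH access are illustrative; adapt them to your environment.

```shell
#!/bin/bash
set -e
API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"

# Illustrative host list -- replace with your own instances.
HOSTS="10.0.1.10 10.0.1.11 10.0.1.12"

# Fetch the init script once, then stream it to each instance over SSH
# and execute it as root.
SCRIPT=$(curl --silent --fail \
     --url "$API_HOST/omni-provisioner/v1beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID:edgeInitdScript" \
     --header "X-API-Key: $API_KEY")

for host in $HOSTS; do
  echo "Installing on $host"
  echo "$SCRIPT" | ssh "root@$host" bash
done
```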

Optional: labels, taints, and GPU configuration

To configure node labels, taints, and GPU settings, create an env file on the instance and pass its path via INITD_EXTRA_ENV_FILE before piping to bash:

#!/bin/bash
set -e
API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"

cat > ./edge.env <<EOF
INITD_KUBERNETES_LABELS=env=prod,team=infra
INITD_KUBERNETES_TAINTS=dedicated=gpu:NoSchedule,spot:NoExecute
INITD_GPU_CONFIG={"mig": {"partitionSizes": ["1g.5gb"]}}
EOF

curl --url "$API_HOST/omni-provisioner/v1beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID:edgeInitdScript" \
     --header "X-API-Key: $API_KEY" \
     --header "content-type: application/json" | \
     INITD_EXTRA_ENV_FILE=$(pwd)/edge.env \
     bash

The env file supports the following variables, among others:

| Variable | Description | Example |
| --- | --- | --- |
| INITD_KUBERNETES_LABELS | Comma-separated key=value labels applied to the node | env=prod,team=infra |
| INITD_KUBERNETES_TAINTS | Comma-separated key=value:effect taints applied to the node | dedicated=gpu:NoSchedule,spot:NoExecute |
| INITD_GPU_CONFIG | JSON GPU configuration (for MIG partitioning) | {"mig": {"partitionSizes": ["1g.5gb"]}} |
| HTTP_PROXY | HTTP proxy for outbound connections | http://proxy.example.com:3128 |
| HTTPS_PROXY | HTTPS proxy for outbound connections | https://proxy.example.com:3128 |
| NVME_DEVICES | Space-separated list of NVMe block devices to use as ephemeral storage for k0s and kubelet. If unset, all non-root NVMe devices are used automatically. Multiple devices are striped together using LVM. | /dev/nvme0n1 /dev/nvme1n1 |

📘 Note

INITD_GPU_CONFIG is for NVIDIA MIG partitioning. Partition sizes must match the physical GPU in the instance. See the NVIDIA MIG documentation for valid partition sizes per GPU model.

Verify the agent is running

After running the script, verify the castai-edge-initd service started successfully:

systemctl status castai-edge-initd

View agent logs:

journalctl -f -u castai-edge-initd

View the init script that was downloaded and executed:

cat /var/lib/castai-edge-initd/init_script

View the systemd service file:

cat /etc/systemd/system/castai-edge-initd.service

View k0sworker logs:

journalctl -f -u k0sworker
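From a machine with cluster access, you can also confirm the node actually registered with Kubernetes. The label key below is the same one used as a nodeSelector in the scheduling example later on this page; replace my-custom-location with your edge location's name.

```shell
# List nodes that joined the given edge location.
kubectl get nodes -l omni.cast.ai/edge-location-name=my-custom-location

# If a node is missing or NotReady, inspect its labels, taints, and
# recent events (replace <node-name> with the actual node name).
kubectl describe node <node-name>
```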

Scheduling workloads on edge nodes

To schedule workloads on a custom edge node, label the namespace with omni.cast.ai/enable-scheduling=true and use a nodeSelector targeting your edge location by name.

apiVersion: v1
kind: Namespace
metadata:
  labels:
    omni.cast.ai/enable-scheduling: "true"
  name: omni-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: omni-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        omni.cast.ai/edge-location-name: "my-custom-location"
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
              name: http

The namespace label enables Liqo's offloading mechanism and causes a mutating webhook to automatically inject the required edge node toleration into pods. Replace my-custom-location with the name you gave your edge location in Step 1.
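After applying the manifest, you can confirm where the pod landed. The filename edge-nginx.yaml is an assumption for the manifest above; use whatever name you saved it under.

```shell
# Apply the example manifest, then check pod placement: the NODE column
# should show one of your edge nodes. A Pending pod usually means the
# namespace label or nodeSelector value is wrong.
kubectl apply -f edge-nginx.yaml
kubectl get pods -n omni-test -o wide
```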

Removing an edge node

To remove a specific edge node from the cluster, delete it via the API:

#!/bin/bash
set -e
API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"
EDGE_ID="<edge-id>"

curl --request DELETE \
     --url "$API_HOST/omni-provisioner/v2beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID/edges/$EDGE_ID" \
     --header "X-API-Key: $API_KEY" \
     --header "content-type: application/json"

To find EDGE_ID, list the edges for your edge location in the Cast AI console or via the API.
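For scripted lookups, the list endpoint below is an assumption inferred from the DELETE call above (the same path minus the trailing edge ID); confirm it against the Cast AI API reference.

```shell
#!/bin/bash
set -e
API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"

# NOTE: assumed to mirror the DELETE endpoint without the edge ID --
# verify the exact path in the Cast AI API reference.
curl --url "$API_HOST/omni-provisioner/v2beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID/edges" \
     --header "X-API-Key: $API_KEY" \
     --header "content-type: application/json"
```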

Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| castai-edge-initd service fails to start | Script not run as root | Re-run the script with sudo or as root |
| Service starts but node doesn't appear in cluster | Network connectivity issue | Verify egress ports TCP 443, 8443, 8132, and UDP 51840 are open |
| Node appears but pods won't schedule | Namespace not labeled for offloading | Label the namespace: kubectl label ns <namespace> omni.cast.ai/enable-scheduling=true |
| MIG partitioning not applied | Invalid partition size | Check the GPU model and use a valid partition size from the NVIDIA documentation |
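The on-instance checks from this page can be collected into a quick, read-only triage script that reports service state and the presence of the installed artifacts in one pass:

```shell
#!/bin/bash
# Read-only triage for an instance that fails to join the cluster.
# Checks the units and files described in the verification section.
report() {
  local label=$1 status=$2
  printf '%-45s %s\n' "$label" "$status"
}

# Service state for the agent and the k0s worker.
for unit in castai-edge-initd k0sworker; do
  if systemctl is-active --quiet "$unit" 2>/dev/null; then
    report "$unit service" "active"
  else
    report "$unit service" "NOT active"
  fi
done

# Artifacts the install script is expected to have created.
for path in /usr/local/bin/castai-edge-initd \
            /etc/systemd/system/castai-edge-initd.service \
            /var/lib/castai-edge-initd/init_script; do
  if [ -e "$path" ]; then
    report "$path" "present"
  else
    report "$path" "MISSING"
  fi
done
```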