Custom edge locations
**Early Access Feature:** This feature is in early access. It may undergo changes based on user feedback and continued development. We recommend testing in non-production environments first and welcome your feedback to help us improve.
Custom edge locations let you add any Linux VM or bare metal instance to your OMNI-enabled cluster — regardless of cloud provider. Instead of Cast AI managing cloud resources on your behalf (as with AWS, GCP, or OCI edge locations), you run a single install script on each instance. The castai-edge-initd agent handles joining the instance to your cluster automatically.
This is useful when you want to use smaller cloud providers, on-premise hardware, or any compute that Cast AI does not natively integrate with.
How it works
- You create a Custom edge location in the Cast AI console or via API. This registers a logical edge location that skips cloud provider reconciliation.
- For each compute instance you want to add, you fetch the edge init script from the API and run it as root.
- The script installs the `castai-edge-initd` binary and starts it as a systemd service. The agent joins the instance to your cluster as an edge node.
Before you begin
Requirements
- OMNI must be enabled on your cluster. See Getting started with OMNI.
- Each compute instance must meet the following requirements:
| Requirement | Detail |
|---|---|
| OS | Linux |
| Kernel | 6.0 or later |
| Architecture | x86_64 (amd64) or aarch64 (arm64) |
| Access | Root (or sudo) access to run the install script |
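To quickly confirm an instance meets the kernel and architecture requirements before installing anything, you can run a short preflight check like the following (a minimal sketch; adjust for your environment):

```bash
#!/bin/bash
# Preflight check: kernel version and CPU architecture.
set -e

# Kernel must be 6.0 or later.
kernel_major="$(uname -r | cut -d. -f1)"
if [ "$kernel_major" -lt 6 ]; then
  echo "Kernel $(uname -r) is older than 6.0" >&2
  exit 1
fi

# Architecture must be x86_64 (amd64) or aarch64 (arm64).
arch="$(uname -m)"
case "$arch" in
  x86_64|aarch64) echo "OK: kernel $(uname -r), arch $arch" ;;
  *) echo "Unsupported architecture: $arch" >&2; exit 1 ;;
esac
```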
Network requirements
Configure firewall rules on each instance to allow the following traffic.
Egress:
| Destination | Protocol | Port | Purpose |
|---|---|---|---|
| 0.0.0.0/0 | TCP | 443 | Cast AI API (api.cast.ai) and HTTPS services |
| 0.0.0.0/0 | TCP | 80 | HTTP services |
| 0.0.0.0/0 | TCP | 8443 | k0smotron control plane |
| 0.0.0.0/0 | TCP | 8132 | k0smotron konnectivity endpoint |
| 0.0.0.0/0 | UDP | 51840 | WireGuard VPN |
Ingress (between edge instances):
Allow all traffic between instances in the same custom edge location so they can communicate directly:
| Source | Protocol | Port | Purpose |
|---|---|---|---|
| Other OMNI VMs in the same network | All | All | Direct inter-node communication |
If tag-based filtering is not available, use private CIDR ranges: 10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12.
Cross-VM traffic (for k0s CNI):
The following ports must be open between all instances depending on your CNI encapsulation mode:
| Protocol | Port | Purpose |
|---|---|---|
| TCP | 179 | BGP — always required for route exchange between nodes |
| UDP | 5555 | FOU encapsulation — required if FOU is selected |
| IP protocol 4 | — | IPIP encapsulation — required if IPIP is selected (not supported by all providers) |
**Note:** Only one encapsulation protocol (FOU or IPIP) needs to be allowed, depending on which one k0s is configured to use. If no encapsulation is used, only BGP (TCP 179) is required.
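As an illustration, the rules below show how these requirements might translate to `ufw` on Ubuntu, assuming your instances reach each other over 10.0.0.0/8 and k0s is configured for FOU. This is a sketch, not a definitive ruleset; adapt the tool and CIDR ranges to your environment:

```bash
#!/bin/bash
# Example ufw rules for a custom edge instance (sketch; assumes
# instances share the 10.0.0.0/8 range and k0s uses FOU).
set -e

# Egress: Cast AI API/HTTPS (443), HTTP (80), k0smotron control
# plane (8443) and konnectivity (8132), plus WireGuard (UDP 51840).
ufw allow out proto tcp to any port 80,443,8443,8132
ufw allow out proto udp to any port 51840

# Ingress: allow all traffic from other edge instances in the same
# location so nodes can communicate directly.
ufw allow from 10.0.0.0/8

# Cross-VM CNI traffic: BGP is always required; UDP 5555 only if
# k0s uses FOU. IPIP (IP protocol 4) cannot be expressed as a ufw
# port rule; if k0s uses IPIP, add an iptables/nftables rule for
# protocol 4 instead.
ufw allow proto tcp from 10.0.0.0/8 to any port 179
ufw allow proto udp from 10.0.0.0/8 to any port 5555
```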
Step 1: Create a custom edge location
- Navigate to Automation → Node autoscaler → Configuration → Edge Locations
- Click Create edge location
- Set Provider to Custom
- Enter a Name for the location (for example, `nebius-gpu` or `on-prem-dc1`)
- Click Next and copy the edge location ID; you'll need it in the next step
After creation, the edge location appears in the list with Ready status.
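If you prefer to script this check, the edge location may also be readable over the API. The list endpoint is not shown in this guide; the sketch below assumes a collection endpoint exists at the same path prefix used by the `edgeInitdScript` and delete calls elsewhere on this page, so verify it against the Cast AI API reference before relying on it:

```bash
#!/bin/bash
# List edge locations for a cluster (assumed endpoint; the path
# prefix mirrors the edgeInitdScript call in Step 2 — unverified).
set -e

API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"

curl --url "$API_HOST/omni-provisioner/v1beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations" \
  --header "X-API-Key: $API_KEY" \
  --header "content-type: application/json"
```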
Step 2: Run the init script on each instance
For each compute instance you want to join to the edge location, fetch and run the init script as root. The script installs castai-edge-initd and registers the instance as an edge node.
```bash
#!/bin/bash
set -e

API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"

curl --url "$API_HOST/omni-provisioner/v1beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID:edgeInitdScript" \
  --header "X-API-Key: $API_KEY" \
  --header "content-type: application/json" | bash
```
**Warning:** The script must be run as root on the instance. It installs a binary to `/usr/local/bin/castai-edge-initd` and creates a systemd service.
Repeat this for every instance you want to add to the edge location.
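If you are adding many instances at once, you can wrap the script in a loop over your hosts. A minimal sketch, assuming you saved the Step 2 script locally as `edge-init.sh`, have passwordless root SSH access, and keep an inventory in `hosts.txt` (one hostname per line; both filenames are illustrative, not part of the product):

```bash
#!/bin/bash
# Run the edge init script on every host listed in hosts.txt.
# Assumes root SSH access; edge-init.sh is the Step 2 script and
# hosts.txt is a hypothetical inventory file.
set -e

while read -r host; do
  echo "Joining $host..."
  scp ./edge-init.sh "root@$host:/tmp/edge-init.sh"
  ssh "root@$host" 'bash /tmp/edge-init.sh'
done < hosts.txt
```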
Optional: labels, taints, and GPU configuration
To configure node labels, taints, and GPU settings, create an env file on the instance and pass its path via INITD_EXTRA_ENV_FILE before piping to bash:
```bash
#!/bin/bash
set -e

API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"

cat > ./edge.env <<EOF
INITD_KUBERNETES_LABELS=env=prod,team=infra
INITD_KUBERNETES_TAINTS=dedicated=gpu:NoSchedule,spot:NoExecute
INITD_GPU_CONFIG={"mig": {"partitionSizes": ["1g.5gb"]}}
EOF

curl --url "$API_HOST/omni-provisioner/v1beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID:edgeInitdScript" \
  --header "X-API-Key: $API_KEY" \
  --header "content-type: application/json" | \
  INITD_EXTRA_ENV_FILE=$(pwd)/edge.env \
  bash
```

The env file supports the following variables, among others:
| Variable | Description | Example |
|---|---|---|
| `INITD_KUBERNETES_LABELS` | Comma-separated key=value labels applied to the node | `env=prod,team=infra` |
| `INITD_KUBERNETES_TAINTS` | Comma-separated key=value:effect taints applied to the node | `dedicated=gpu:NoSchedule,spot:NoExecute` |
| `INITD_GPU_CONFIG` | JSON GPU configuration (for MIG partitioning) | `{"mig": {"partitionSizes": ["1g.5gb"]}}` |
| `HTTP_PROXY` | HTTP proxy for outbound connections | `http://proxy.example.com:3128` |
| `HTTPS_PROXY` | HTTPS proxy for outbound connections | `https://proxy.example.com:3128` |
| `NVME_DEVICES` | Space-separated list of NVMe block devices to use as ephemeral storage for k0s and kubelet. If unset, all non-root NVMe devices are used automatically. Multiple devices are striped together using LVM. | `/dev/nvme0n1 /dev/nvme1n1` |
**Note:** `INITD_GPU_CONFIG` is for NVIDIA MIG partitioning. Partition sizes must match the physical GPU in the instance. See the NVIDIA MIG documentation for valid partition sizes per GPU model.
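Once the node has joined, you can confirm that the labels and taints from the env file were applied. These are standard `kubectl` commands run from any machine with cluster access; replace the node name placeholder with a name from `kubectl get nodes`:

```bash
# Confirm the labels from edge.env landed on the node.
kubectl get node <node-name> --show-labels

# Confirm the taints were applied.
kubectl describe node <node-name> | grep -A2 Taints
```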
Verify the agent is running
After running the script, verify the `castai-edge-initd` service started successfully:

```bash
systemctl status castai-edge-initd
```

View agent logs:

```bash
journalctl -f -u castai-edge-initd
```

View the init script that was downloaded and executed:

```bash
cat /var/lib/castai-edge-initd/init_script
```

View the systemd service file:

```bash
cat /etc/systemd/system/castai-edge-initd.service
```

View k0sworker logs:
```bash
journalctl -f -u k0sworker
```
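You can also confirm from the cluster side that the instance registered and is Ready. The `omni.cast.ai/edge-location-name` node label is the same one used as a `nodeSelector` in the scheduling example below:

```bash
# List the edge nodes belonging to your custom edge location.
kubectl get nodes -l omni.cast.ai/edge-location-name=<your-location-name>
```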
Scheduling workloads on edge nodes
To schedule workloads on a custom edge node, label the namespace with `omni.cast.ai/enable-scheduling=true` and use a `nodeSelector` targeting your edge location by name.
```yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    omni.cast.ai/enable-scheduling: "true"
  name: omni-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: omni-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        omni.cast.ai/edge-location-name: "my-custom-location"
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
              name: http
```

The namespace label enables Liqo's offloading mechanism and causes a mutating webhook to automatically inject the required edge node toleration into pods. Replace `my-custom-location` with the name you gave your edge location in Step 1.
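To try it out, apply the manifest and check where the pod landed. These are standard `kubectl` commands; the manifest filename is illustrative:

```bash
# Apply the namespace and deployment from the example above.
kubectl apply -f edge-nginx.yaml

# Confirm the pod was scheduled onto one of your edge nodes.
kubectl get pods -n omni-test -o wide
```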
Removing an edge node
To remove a specific edge node from the cluster, delete it via the API:
```bash
#!/bin/bash
set -e

API_HOST="https://api.cast.ai"
API_KEY="<key>"
ORG_ID="<org>"
CLUSTER_ID="<cluster>"
EDGE_LOCATION_ID="<edge-location-id>"
EDGE_ID="<edge-id>"

curl --request DELETE \
  --url "$API_HOST/omni-provisioner/v2beta/organizations/$ORG_ID/clusters/$CLUSTER_ID/edge-locations/$EDGE_LOCATION_ID/edges/$EDGE_ID" \
  --header "X-API-Key: $API_KEY" \
  --header "content-type: application/json"
```

To find `EDGE_ID`, list the edges for your edge location in the Cast AI console or via the API.
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| `castai-edge-initd` service fails to start | Script not run as root | Re-run the script with sudo or as root |
| Service starts but node doesn't appear in cluster | Network connectivity issue | Verify egress ports 443, 8443, 8132, and UDP 51840 are open |
| Node appears but pods won't schedule | Namespace not labeled for offloading | Label the namespace: `kubectl label ns <namespace> omni.cast.ai/enable-scheduling=true` |
| MIG partitioning not applied | Invalid partition size | Check the GPU model and use a valid partition size from the NVIDIA documentation |