Container Live Migration

🚧

Early Access Feature

Container live migration is currently in early access for AWS EKS clusters. This feature works seamlessly with Cast AI's Evictor to provide zero-downtime workload optimization.

What is container live migration?

Container live migration enables seamless relocation of running workloads between nodes without any downtime or service interruption. Unlike traditional Kubernetes pod eviction, which terminates and restarts applications, live migration preserves the complete runtime state, including memory contents, process state, and active network connections.

This capability transforms how you can manage stateful workloads in Kubernetes, allowing you to perform node maintenance, optimize resource utilization, and handle infrastructure changes without impacting critical applications that cannot tolerate restarts.

Key benefits

Zero downtime for critical workloads
Move stateful applications like databases and long-running processes without interruption or data loss. Learn more about the business impact in our blog post.

Intelligent node optimization
Enable Cast AI's Evictor to optimize cluster utilization by moving workloads off underutilized nodes, creating opportunities for significant cost savings without service disruption. Traditional bin-packing is limited by stateful workloads that cannot tolerate restarts; live migration removes this constraint.

Preserved network connections
Maintain active TCP connections and session state during migration, ensuring minimal interruption for client applications and ongoing transactions. Applications with strict timeout requirements may need appropriate timeout configurations to handle the brief migration window.
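
As a rough illustration of the timeout point above, the following Python sketch simulates a peer that stalls for a short time, standing in for the migration pause, and shows that a read succeeds only when the client's timeout is longer than the stall. The function name and the durations are hypothetical; actual pause lengths depend on the workload and are not specified here.

```python
import socket
import threading
import time


def read_survives_pause(pause_seconds: float, timeout_seconds: float) -> bool:
    """Return True if a blocking read outlasts a peer that stalls for pause_seconds."""
    client, server = socket.socketpair()

    def stalled_peer() -> None:
        # Stand-in for the brief pause while a workload is migrated.
        time.sleep(pause_seconds)
        try:
            server.sendall(b"ok")
        except OSError:
            pass  # the client may already have given up

    peer = threading.Thread(target=stalled_peer)
    peer.start()
    client.settimeout(timeout_seconds)
    try:
        return client.recv(2) == b"ok"
    except socket.timeout:
        return False
    finally:
        peer.join()
        client.close()
        server.close()
```

A client whose read timeout is shorter than the pause reports a failure even though the connection itself was never dropped, which is why such timeouts may need to be raised.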

How it works

Container live migration uses advanced checkpoint and restore technology to seamlessly transfer running pods between nodes:

  1. Workload Assessment: Cast AI's live controller automatically scans your cluster and identifies workloads eligible for live migration, applying appropriate labels based on workload characteristics.

  2. Evictor: When Cast AI's Evictor identifies a bin-packing opportunity, it decides for each workload whether to live-migrate or evict it.

  3. State Transfer: The system transfers memory pages and process state to the destination node while the application continues running, minimizing freeze time.

  4. Network Preservation: A forked version of the AWS VPC CNI ensures pods maintain their IP addresses and TCP connections remain intact throughout the migration.

  5. Seamless Handover: The application is briefly paused while the final state is transferred, and the workload resumes on the new node with full continuity.*
    *Migration duration varies based on application memory usage, instance type, and network throughput.

The entire process leverages CRIU (Checkpoint/Restore In Userspace) technology with Cast AI's orchestration layer to ensure reliable, efficient migrations with minimal application impact.

*Incremental transfer is limited to the x86 processor architecture.
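
The state transfer and handover steps above resemble the general pre-copy approach to live migration: memory is copied in rounds while the application keeps running, and only the pages dirtied during the last round are sent during the final pause. The Python sketch below models that general technique under simplified assumptions (constant dirty rate and bandwidth). It is not Cast AI's implementation, and every name, threshold, and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrecopyResult:
    rounds: int            # copy rounds performed while the app kept running
    total_seconds: float   # all copy rounds plus the final pause
    freeze_seconds: float  # final pause while the last dirty pages are sent


def simulate_precopy(
    memory_mb: float,
    dirty_rate_mb_s: float,
    bandwidth_mb_s: float,
    freeze_threshold_mb: float = 64.0,  # hypothetical cut-over threshold
    max_rounds: int = 10,               # hypothetical cap on copy rounds
) -> PrecopyResult:
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth_mb_s must be positive")

    remaining = float(memory_mb)
    elapsed = 0.0
    rounds = 0
    while remaining > freeze_threshold_mb and rounds < max_rounds:
        transfer_time = remaining / bandwidth_mb_s
        elapsed += transfer_time
        # Pages dirtied during this round must be sent again in the next one;
        # the dirty set can never exceed the total memory size.
        remaining = min(float(memory_mb), dirty_rate_mb_s * transfer_time)
        rounds += 1

    # The application is paused only while the last dirty pages are sent.
    freeze = remaining / bandwidth_mb_s
    return PrecopyResult(rounds, elapsed + freeze, freeze)
```

In this model, the pause shrinks as long as the application dirties memory more slowly than the network can copy it. When the dirty rate meets or exceeds the bandwidth, the rounds stop converging and the final pause approaches the time needed to copy the whole memory, which is consistent with the note that migration duration depends on memory usage and network throughput.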

Integration with Cast AI's Evictor

Container live migration is designed to work with Cast AI's Evictor. The integration identifies which workloads can be live-migrated and which should be evicted in the traditional way, chooses the method for each application based on workload labels and characteristics, and falls back to a traditional eviction of the pod if a live migration fails.
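
The choose-then-fall-back behavior described above can be sketched in Python as follows. This is an illustrative model only, not the Evictor's actual logic: the label key, the function names, and the callback signatures are all hypothetical.

```python
from enum import Enum
from typing import Callable, Mapping


class Action(Enum):
    LIVE_MIGRATE = "live-migrate"
    EVICT = "evict"


# Hypothetical label key for illustration; not a documented Cast AI label.
ELIGIBLE_LABEL = "example.com/live-migration-eligible"


def choose_action(pod_labels: Mapping[str, str]) -> Action:
    """Pick live migration only for pods labeled as eligible."""
    if pod_labels.get(ELIGIBLE_LABEL) == "true":
        return Action.LIVE_MIGRATE
    return Action.EVICT


def relocate(
    pod_labels: Mapping[str, str],
    live_migrate: Callable[[], bool],  # returns True when migration succeeds
    evict: Callable[[], None],
) -> Action:
    """Relocate a pod and return the action that was actually carried out."""
    if choose_action(pod_labels) is Action.LIVE_MIGRATE:
        try:
            if live_migrate():
                return Action.LIVE_MIGRATE
        except Exception:
            pass  # treat any migration error as a failure and fall back
    evict()
    return Action.EVICT
```

Returning the action that was actually performed, rather than the one that was planned, makes a fallback visible to the caller, for example for logging or metrics.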

This combination enables continuous cluster optimization without the typical constraints of stateful workloads, allowing you to achieve higher cluster utilization and cost savings while maintaining the reliability requirements of your most critical applications.

Requirements and compatibility

Container live migration requires specific infrastructure configurations and has compatibility constraints that you should understand before implementation.

For complete details on supported workload types, storage requirements, instance type compatibility, and current limitations, see Container Live Migration Requirements and Limitations.

Getting started

Ready to enable zero-downtime workload optimization? Container live migration is available as an early access feature:

  1. One-click configuration: Enable the feature in your node templates with a single checkbox.
  2. Let automation handle the rest: Cast AI automatically identifies eligible workloads and uses live migration when optimizing your cluster.

Get Started with Container Live Migration

Technical resources

External resources