GPU sharing lets multiple workloads run on the same physical GPU, which improves utilization when no single workload needs the whole device. Cast AI supports two primary GPU sharing methods, each suited to different use cases and requirements.

GPU sharing methods

Time-slicing

Time-slicing allows multiple workloads to share a single physical GPU through rapid context switching. This approach enables better GPU utilization for workloads that don't continuously require GPU resources.

Best for:

Development and testing environments
Workloads with intermittent GPU usage
Cost optimization when workloads don't need dedicated GPU access
Scenarios where hardware isolation is not required

Key characteristics:

Software-based sharing through context switching
GPU memory shared between all processes, with no per-workload isolation
Equal time allocation across workloads
Simple configuration through node templates
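
As a rough illustration of the "equal time allocation" characteristic above, the following toy model hands out GPU time slices in round-robin order. It is only a sketch: the function name and workload names are made up for this example, and a real GPU scheduler also pays a context-switch cost that is not modeled here.

```python
from collections import Counter
from itertools import cycle, islice


def simulate_time_slicing(workloads, total_slices):
    """Assign each GPU time slice to a workload in round-robin order.

    Toy model only: context-switch overhead and shared GPU memory
    pressure are ignored.
    """
    if not workloads:
        raise ValueError("at least one workload is required")
    # cycle() repeats the workload list; islice() cuts it at total_slices.
    schedule = list(islice(cycle(workloads), total_slices))
    return Counter(schedule)


# Hypothetical workload names, used only for illustration.
shares = simulate_time_slicing(["train", "notebook", "inference"], total_slices=12)
for name, slices in shares.items():
    print(f"{name}: {slices} of 12 slices")
```

With three workloads and twelve slices, each workload receives four slices in this model, regardless of how much GPU time it actually needs.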

Learn more about time-slicing →

Multi-Instance GPU (MIG)

MIG partitions a supported GPU into smaller, hardware-isolated instances. Each MIG instance provides dedicated memory, cache, and compute resources with guaranteed performance.

Best for:

Production workloads requiring guaranteed resources
Multi-tenant environments needing isolation
Workloads with consistent GPU requirements
Scenarios requiring fault tolerance between workloads

Key characteristics:

Hardware-level isolation
Dedicated resources per instance
Quality of service guarantees
Available on select NVIDIA GPUs (Ampere architecture and newer)

Learn more about MIG →

Combining sharing methods

Time-slicing and MIG can be combined to increase resource utilization further. When combined, multiple workloads time-share each MIG partition, so the number of workloads that can run per physical GPU is the number of MIG partitions multiplied by the time-slicing factor.

Example: A single A100 GPU with 7 MIG partitions and 4× time-slicing can support 28 concurrent workloads (7 × 4 = 28).
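
The arithmetic in the example above can be sketched as a small helper. The function and parameter names are illustrative and not part of any Cast AI API.

```python
def max_concurrent_workloads(mig_partitions: int, time_slice_replicas: int) -> int:
    """Return how many workloads fit on one physical GPU.

    Each MIG partition is shared by `time_slice_replicas` workloads,
    so the total is the product of the two values.
    """
    if mig_partitions < 1 or time_slice_replicas < 1:
        raise ValueError("both values must be at least 1")
    return mig_partitions * time_slice_replicas


# The A100 example from the text: 7 MIG partitions, 4x time-slicing.
print(max_concurrent_workloads(mig_partitions=7, time_slice_replicas=4))  # prints 28
```

Setting either value to 1 models using only one of the two methods, for example 7 partitions with no time-slicing gives 7 workloads.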

Choosing the right sharing method

Consideration	Time-slicing	MIG
Isolation	Software-based	Hardware-based
Resource guarantees	Shared, no guarantees	Dedicated, guaranteed
Setup complexity	Simple	Moderate
GPU requirements	Any NVIDIA GPU	Select NVIDIA GPUs (Ampere architecture or newer)
Best use case	Development, testing, variable workloads	Production, multi-tenant, consistent workloads
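
The comparison above can be read as a simple decision rule. The sketch below encodes one possible reading of it. The class, field, and function names are hypothetical, and real decisions will usually weigh more factors, such as cost and setup effort.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical summary of what a workload needs from GPU sharing."""
    needs_hardware_isolation: bool
    needs_guaranteed_resources: bool
    gpu_supports_mig: bool


def recommend_sharing_method(profile: WorkloadProfile) -> str:
    """Map the comparison table onto a recommendation.

    Returns "mig" or "time-slicing". Raises ValueError when the
    requirements call for MIG but the GPU cannot provide it.
    """
    wants_mig = profile.needs_hardware_isolation or profile.needs_guaranteed_resources
    if not wants_mig:
        return "time-slicing"
    if not profile.gpu_supports_mig:
        raise ValueError("requirements call for MIG, but this GPU does not support it")
    return "mig"


# A development workload with no isolation needs.
dev = WorkloadProfile(False, False, gpu_supports_mig=False)
print(recommend_sharing_method(dev))  # prints time-slicing

# A multi-tenant production workload on a MIG-capable GPU.
prod = WorkloadProfile(True, True, gpu_supports_mig=True)
print(recommend_sharing_method(prod))  # prints mig
```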

Getting started

Review your workload requirements and choose the appropriate sharing method
Configure GPU sharing in your node templates
Deploy your workloads with the appropriate node selectors and tolerations
Monitor GPU utilization with GPU metrics
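
As a minimal sketch of the deployment step above, the following builds a Kubernetes Pod manifest that requests a GPU and carries a node selector and a toleration. It assumes the cluster exposes GPUs under the `nvidia.com/gpu` resource name used by the NVIDIA device plugin. The image, label, and toleration values are placeholders invented for this example; take the actual values from your node template configuration.

```python
import json


def build_gpu_pod(name, image, node_selector, tolerations,
                  gpu_resource="nvidia.com/gpu", gpu_count=1):
    """Return a Pod manifest (as a dict) that requests GPU resources.

    `gpu_resource` defaults to the resource name exposed by the NVIDIA
    device plugin; a cluster may expose a different name.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": dict(node_selector),
            "tolerations": list(tolerations),
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"limits": {gpu_resource: gpu_count}},
                }
            ],
        },
    }


pod = build_gpu_pod(
    name="gpu-demo",
    image="example.com/gpu-demo:latest",  # hypothetical image
    node_selector={"example.com/gpu-sharing": "time-slicing"},  # hypothetical label
    tolerations=[
        # Hypothetical taint key; replace with the taint on your GPU nodes.
        {"key": "example.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
    ],
)
print(json.dumps(pod, indent=2))
```

The printed JSON can be saved to a file and reviewed before it is applied to a cluster.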

For general GPU setup and driver installation, see the GPU instances documentation.