GPU sharing
GPU sharing lets multiple workloads run on the same physical GPU, improving utilization and reducing cost. Cast AI supports two primary methods for GPU sharing, each optimized for different use cases and requirements.
GPU sharing methods
Time-slicing
Time-slicing allows multiple workloads to share a single physical GPU through rapid context switching. This approach enables better GPU utilization for workloads that don't continuously require GPU resources.
Best for:
- Development and testing environments
- Workloads with intermittent GPU usage
- Cost optimization when workloads don't need dedicated GPU access
- Scenarios where hardware isolation is not required
Key characteristics:
- Software-based sharing through context switching
- Memory shared between all processes
- Equal time allocation across workloads
- Simple configuration through node templates
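As an illustration of how software-based time-slicing is typically expressed in Kubernetes, the NVIDIA device plugin accepts a sharing configuration like the one below, which advertises each physical GPU as several schedulable replicas. This is a sketch of the underlying mechanism; the exact configuration Cast AI applies through node templates may differ.

```yaml
# NVIDIA k8s-device-plugin sharing config (illustrative).
# Each physical GPU is advertised as 4 schedulable replicas,
# so up to 4 pods can time-share one GPU. Memory is NOT
# partitioned: all 4 workloads see the full GPU memory.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```

With this in place, a pod that requests `nvidia.com/gpu: 1` consumes one of the four replicas rather than a whole physical GPU.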
Multi-Instance GPU (MIG)
MIG partitions a supported NVIDIA GPU into smaller, hardware-isolated instances. Each MIG instance provides dedicated memory, cache, and compute resources with guaranteed performance.
Best for:
- Production workloads requiring guaranteed resources
- Multi-tenant environments needing isolation
- Workloads with consistent GPU requirements
- Scenarios requiring fault tolerance between workloads
Key characteristics:
- Hardware-level isolation
- Dedicated resources per instance
- Quality of service guarantees
- Available on select NVIDIA GPUs (Ampere architecture and newer)
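To make the resource model concrete, here is a sketch of a pod requesting a single MIG slice. The `1g.5gb` profile and the container image are illustrative; the profiles actually available depend on the GPU model and how its MIG partitions are configured.

```yaml
# Illustrative pod requesting one 1g.5gb MIG slice of an A100.
# The workload gets dedicated memory and compute for that slice,
# isolated at the hardware level from other slices on the same GPU.
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  containers:
    - name: cuda-workload
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # example image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1
```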
Combining sharing methods
Time-slicing and MIG can be combined for maximum resource utilization: multiple workloads time-share each MIG partition, multiplying the number of workloads that can run per physical GPU.
Example: A single A100 GPU with 7 MIG partitions and 4× time-sharing can support 28 concurrent workloads (7 × 4 = 28).
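The combination above can be sketched in the NVIDIA device plugin's sharing config by time-slicing the MIG resource itself rather than the whole GPU. The `1g.5gb` profile name is an assumption; substitute whatever profile your MIG configuration exposes.

```yaml
# Illustrative config: time-slice each 1g.5gb MIG partition 4 ways.
# With 7 such partitions on an A100, the node advertises
# 7 x 4 = 28 schedulable nvidia.com/mig-1g.5gb replicas.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/mig-1g.5gb
        replicas: 4
```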
Choosing the right sharing method
| Consideration | Time-slicing | MIG |
|---|---|---|
| Isolation | Software-based | Hardware-based |
| Resource guarantees | Shared, no guarantees | Dedicated, guaranteed |
| Setup complexity | Simple | Moderate |
| GPU requirements | Any NVIDIA GPU | Ampere architecture or newer |
| Best use case | Development, testing, variable workloads | Production, multi-tenant, consistent workloads |
Getting started
- Review your workload requirements and choose the appropriate sharing method
- Configure GPU sharing in your node templates
- Deploy your workloads with the appropriate node selectors and tolerations
- Monitor GPU utilization with GPU metrics
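Putting the deployment step together, a workload typically targets GPU-sharing nodes with a node selector and a matching toleration. The sketch below uses a placeholder node template name (`gpu-shared`) and an assumed `scheduling.cast.ai/node-template` label/taint key; check your actual node template for the labels and taints it applies.

```yaml
# Illustrative pod targeting a GPU-sharing node template.
# "gpu-shared" and the label/taint key are placeholders;
# they must match what your node template actually sets.
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-workload
spec:
  nodeSelector:
    scheduling.cast.ai/node-template: gpu-shared
  tolerations:
    - key: scheduling.cast.ai/node-template
      operator: Equal
      value: gpu-shared
      effect: NoSchedule
  containers:
    - name: app
      image: my-gpu-app:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1  # one time-sliced replica or MIG slice
```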
For general GPU setup and driver installation, see the GPU instances documentation.