GPU sharing

GPU sharing lets multiple workloads run on the same physical GPU, improving utilization of GPU resources that would otherwise sit idle. Cast AI supports two primary methods for GPU sharing, each suited to different use cases and requirements.

GPU sharing methods

Time-slicing

Time-slicing allows multiple workloads to share a single physical GPU through rapid context switching. This approach enables better GPU utilization for workloads that don't continuously require GPU resources.

Best for:

  • Development and testing environments
  • Workloads with intermittent GPU usage
  • Cost optimization when workloads don't need dedicated GPU access
  • Scenarios where hardware isolation is not required

Key characteristics:

  • Software-based sharing through context switching
  • Memory shared between all processes
  • Equal time allocation across workloads
  • Simple configuration through node templates
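Once time-slicing is enabled on a node template, workloads request a GPU the usual way and the scheduler packs several such pods onto the same physical GPU. The sketch below assumes a standard NVIDIA device plugin setup; the pod name and CUDA image are placeholders.

```yaml
# Minimal sketch: a pod that requests one time-sliced GPU replica.
# When the node advertises time-sliced replicas, "nvidia.com/gpu: 1"
# maps to a shared slice of a GPU, not a whole physical device.
apiVersion: v1
kind: Pod
metadata:
  name: timeslice-demo                             # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda-workload
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any CUDA-capable image
      command: ["nvidia-smi", "-L"]                # list the GPU visible to the container
      resources:
        limits:
          nvidia.com/gpu: 1
```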

Learn more about time-slicing →

Multi-Instance GPU (MIG)

MIG partitions a single physical GPU into smaller, hardware-isolated instances. Each MIG instance provides dedicated memory, cache, and compute resources with guaranteed performance.

Best for:

  • Production workloads requiring guaranteed resources
  • Multi-tenant environments needing isolation
  • Workloads with consistent GPU requirements
  • Scenarios requiring fault tolerance between workloads

Key characteristics:

  • Hardware-level isolation
  • Dedicated resources per instance
  • Quality of service guarantees
  • Available on select NVIDIA GPUs (Ampere architecture and newer)
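A workload targets a MIG instance by requesting the corresponding extended resource. The sketch below assumes the node exposes MIG devices under NVIDIA's mixed strategy with a 1g.5gb profile; the actual resource name depends on the profiles configured in your node template.

```yaml
# Minimal sketch: a pod that requests one hardware-isolated MIG instance.
apiVersion: v1
kind: Pod
metadata:
  name: mig-demo                                   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: inference
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # any CUDA-capable image
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1   # resource name depends on the configured MIG profile
```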

Learn more about MIG →

Combining sharing methods

Time-slicing and MIG can be combined to push utilization further: multiple workloads time-share each MIG partition, multiplying the number of workloads that can run per physical GPU.

Example: A single A100 GPU with 7 MIG partitions and 4× time-slicing per partition can support 28 concurrent workloads (7 × 4 = 28).
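Under the hood, this corresponds to the NVIDIA device plugin advertising multiple time-sliced replicas for each MIG resource. The snippet below is a minimal sketch of that plugin configuration, assuming the mixed MIG strategy and a 1g.5gb profile; when sharing is configured through node templates, you typically don't manage this file directly.

```yaml
# Sketch of an NVIDIA device plugin sharing config: each 1g.5gb MIG
# instance is advertised as 4 schedulable replicas, so 7 instances on
# one A100 yield 7 × 4 = 28 allocatable GPU resources.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/mig-1g.5gb
        replicas: 4
```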

Choosing the right sharing method

| Consideration | Time-slicing | MIG |
| --- | --- | --- |
| Isolation | Software-based | Hardware-based |
| Resource guarantees | Shared, no guarantees | Dedicated, guaranteed |
| Setup complexity | Simple | Moderate |
| GPU requirements | Any NVIDIA GPU | Ampere architecture or newer |
| Best use case | Development, testing, variable workloads | Production, multi-tenant, consistent workloads |

Getting started

  1. Review your workload requirements and choose the appropriate sharing method
  2. Configure GPU sharing in your node templates
  3. Deploy your workloads with the appropriate node selectors and tolerations (see the example after this list)
  4. Monitor GPU utilization with GPU metrics
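As a sketch of steps 2 and 3, the Deployment below pins a workload to a GPU-sharing node template and tolerates its taint. The template name gpu-shared and the scheduling.cast.ai/node-template label/taint key are assumptions for illustration; use whatever labels and taints your node template actually applies.

```yaml
# Minimal sketch: a Deployment scheduled onto a GPU-sharing node template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-gpu-app                                # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: shared-gpu-app
  template:
    metadata:
      labels:
        app: shared-gpu-app
    spec:
      nodeSelector:
        scheduling.cast.ai/node-template: gpu-shared  # assumed template label and name
      tolerations:
        - key: scheduling.cast.ai/node-template       # assumed taint key
          operator: Equal
          value: gpu-shared
          effect: NoSchedule
      containers:
        - name: worker
          image: nvidia/cuda:12.4.1-base-ubuntu22.04
          command: ["sleep", "infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1                       # or a MIG resource such as nvidia.com/mig-1g.5gb
```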

For general GPU setup and driver installation, see the GPU instances documentation.