GPU utilization

GPU utilization monitoring helps you track and optimize the usage of GPU resources across your Kubernetes workloads. This feature, which is part of Cast AI's Cost Monitoring suite, provides detailed insights into GPU usage patterns, memory waste, and associated costs at both cluster and workload levels.

Prerequisites

Before you can access GPU utilization metrics, ensure that:

  1. Your cluster has GPU-enabled nodes
  2. You have workloads running on GPU nodes
  3. The GPU metrics exporter is installed and running in your cluster

Accessing GPU metrics

Once the prerequisites are met, GPU utilization metrics become available in the Cast AI console:

  1. The cluster dashboard - shows aggregate GPU metrics
  2. The workloads report under cost monitoring - provides detailed per-workload GPU metrics

Cluster dashboard metrics

The cluster dashboard displays key aggregate GPU metrics. It also includes a GPU utilization graph that shows average GPU and GPU memory utilization across your selected time window.

Workload-level metrics

When you select a specific workload from the workloads report, you'll see detailed GPU metrics organized into three tabs: utilization, memory waste, and cost waste. Each tab provides a different view of how your workload uses its GPU resources.

Utilization

The utilization view shows how effectively your workload uses its allocated GPU compute capacity. An interactive graph tracks the average GPU utilization over time, helping you identify patterns in resource usage.

Utilization tab
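
To make the idea of "average GPU utilization over a time window" concrete, here is a minimal Python sketch. It is not the console's actual calculation: the sample data, the `average_utilization` helper, and the choice of a plain arithmetic mean are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical samples: (timestamp, GPU utilization in percent).
# These values are invented for illustration, not exported from Cast AI.
samples = [
    (datetime(2024, 1, 1, 12, 0), 40.0),
    (datetime(2024, 1, 1, 12, 5), 60.0),
    (datetime(2024, 1, 1, 12, 10), 80.0),
    (datetime(2024, 1, 1, 12, 15), 20.0),
]


def average_utilization(samples, start, end):
    """Return the mean utilization of samples with start <= timestamp < end.

    Returns None when no sample falls inside the window.
    """
    values = [pct for ts, pct in samples if start <= ts < end]
    return mean(values) if values else None


window_start = datetime(2024, 1, 1, 12, 0)
window_end = window_start + timedelta(minutes=15)
avg = average_utilization(samples, window_start, window_end)
print(avg)  # 60.0 (the 12:15 sample falls outside the window)
```

A consistently low average over a long window may suggest that the workload has more GPU compute allocated than it needs.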

Memory waste

Switch to the memory waste tab to understand how your workload uses GPU memory. The view compares available memory with used memory and plots both over time. This helps you identify whether you're over-provisioning GPU memory, which directly increases your costs, and makes it easy to spot opportunities for optimization.

Memory waste tab
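
The relationship between available and used memory can be sketched as simple arithmetic. The example below assumes that waste is the available memory minus the used memory. That definition, the `memory_waste` helper, and the sample numbers are illustrative assumptions, not the formula the console is confirmed to use.

```python
def memory_waste(available_gib, used_gib):
    """Return (wasted GiB, wasted percent of available memory).

    Assumes waste = available - used, clamped at zero.
    """
    if available_gib <= 0:
        raise ValueError("available_gib must be positive")
    wasted = max(available_gib - used_gib, 0.0)
    return wasted, wasted / available_gib * 100


# Hypothetical workload: 80 GiB of GPU memory available, 20 GiB in use.
wasted_gib, wasted_pct = memory_waste(80.0, 20.0)
print(wasted_gib, wasted_pct)  # 60.0 75.0
```

Under these assumptions, a workload that leaves 75% of its GPU memory unused may fit on a smaller or shared GPU.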

Cost waste

The cost waste tab translates your GPU usage into financial terms. It breaks down costs by instance lifecycle (on-demand, spot, and fallback) and shows how these costs accumulate over time.

Cost waste tab
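
As a rough illustration of a per-lifecycle breakdown that accumulates over time, the sketch below computes running totals from daily costs. The daily figures and variable names are hypothetical, and the console may aggregate costs differently.

```python
from itertools import accumulate

# Hypothetical daily GPU cost in USD for each instance lifecycle.
daily_costs = {
    "on-demand": [10.0, 12.0, 8.0],
    "spot": [4.0, 5.0, 3.0],
    "fallback": [0.0, 2.0, 1.0],
}

# Running total per lifecycle: the cost accumulated up to each day.
cumulative = {name: list(accumulate(costs)) for name, costs in daily_costs.items()}

# Overall spend across all lifecycles at the end of the period.
total = sum(series[-1] for series in cumulative.values())

for name, series in cumulative.items():
    print(name, series)
print("total", total)  # total 45.0
```

Comparing the final values per lifecycle shows which instance type accounts for most of the spend in the period.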