Profiling GPU workloads

The zymtrace GPU profiler extends the zymtrace eBPF profiler to provide deep visibility into CUDA-enabled applications. It traces GPU performance issues—such as kernel stalls and memory bottlenecks—directly to the PyTorch code, CUDA kernels, native code, or scheduler threads that launched them.

It requires the zymtrace profiler to be installed and running first.

important

The GPU profiler is only supported on AMD64/x86_64 architecture and supports CUDA 12.x and higher.

Configuring GPU Workloads

Choose the configuration method that matches your deployment:

The zymtrace approach to GPU profiling is similar to OpenTelemetry's automatic instrumentation, but with a key difference: instead of using Kubernetes Custom Resource Definitions (CRDs) to mutate pods, zymtrace delivers the GPU profiler libraries via a DaemonSet and extracts them to a shared host path. Workloads access these libraries by mounting the shared volume.

important

Ensure you have the zymtrace profiler installed and running before enabling the GPU profiler.

After installing the zymtrace profiler with GPU support via Helm, you can configure your application containers to use the GPU profiler:

  1. Mount the GPU profiler library into your container:

volumes:
  - name: zymtrace-gpu-profiler
    hostPath:
      path: /var/lib/zymtrace/profiler
      type: Directory
containers:
  - name: your-container
    volumeMounts:
      - name: zymtrace-gpu-profiler
        mountPath: /opt/zymtrace/profiler
        readOnly: true
  2. Set the required environment variables:

env:
  - name: RUST_LOG
    value: "zymtracecudaprofiler=info"
  - name: ZYMTRACE_CUDAPROFILER__QUIET
    value: "false"
  - name: CUDA_INJECTION64_PATH
    value: "/opt/zymtrace/profiler/libzymtracecudaprofiler.so"
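Before relying on injection at runtime, you can sanity-check the mount from inside the container. A minimal sketch, assuming the default paths from the snippets above (this check is not part of zymtrace itself):

```shell
# Verify the injected library is visible inside the container before the
# workload starts. The default path matches the volumeMount shown above;
# CUDA_INJECTION64_PATH, if set, takes precedence.
LIB="${CUDA_INJECTION64_PATH:-/opt/zymtrace/profiler/libzymtracecudaprofiler.so}"
if [ -f "$LIB" ]; then
  echo "found: $LIB"
else
  echo "missing: $LIB" >&2
fi
```

If the library is reported missing, recheck the volume and volumeMount definitions before debugging the profiler itself.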

Example Kubernetes Job

Below is an example of a Kubernetes Job that uses the GPU profiler with a PyTorch application:

apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-cifar-training
  labels:
    app: pytorch-cifar
spec:
  parallelism: 1
  backoffLimit: 5
  template:
    metadata:
      labels:
        app: pytorch-cifar
    spec:
      restartPolicy: OnFailure
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      volumes:
        - name: zymtrace-gpu-profiler
          hostPath:
            path: /var/lib/zymtrace/profiler
            type: Directory
      containers:
        - name: pytorch-cifar
          image: pytorch/pytorch:latest
          volumeMounts:
            - name: zymtrace-gpu-profiler
              mountPath: /opt/zymtrace/profiler
              readOnly: true
          env:
            - name: RUST_LOG
              value: "zymtracecudaprofiler=info"
            - name: ZYMTRACE_CUDAPROFILER__PRINT_STATS
              value: "true"
            - name: ZYMTRACE_CUDAPROFILER__QUIET
              value: "false"
            - name: CUDA_INJECTION64_PATH
              value: "/opt/zymtrace/profiler/libzymtracecudaprofiler.so"
          command:
            - "python"
            - "-c"
            - |
              # PyTorch CIFAR training code
              import torch
              # ... rest of training code ...
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              memory: "4Gi"
              cpu: "2"

Verification

You can verify that the GPU profiler is working by checking for specific log entries:

# Check the application logs for GPU profiler status
kubectl logs -n your-namespace <pod-name>

You should also see periodic statistics like this in the logs:

2025-04-28T15:27:46.401606Z  INFO zymtracecudaprofiler: buffers_processed: 0.0/s (total: 0) | bytes_sent: 0.0/s (total: 0) | cubins_loaded: 0.0/s (total: 19) | cubins_size: 0.0/s (total: 44567632) | io_errors: 0.0/s (total: 0) | lru_misses: 15856.5/s (total: 261865856) |
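Each stats line is a series of pipe-delimited `name: rate/s (total: N)` fields, so individual totals can be pulled out with standard tools; a nonzero `cubins_loaded` total, for instance, indicates the profiler has observed CUDA binaries. A small sketch (the sample line is abridged from the output above):

```shell
# Extract the cubins_loaded running total from a profiler stats line.
line='2025-04-28T15:27:46.401606Z  INFO zymtracecudaprofiler: cubins_loaded: 0.0/s (total: 19) | io_errors: 0.0/s (total: 0)'
total=$(printf '%s\n' "$line" | sed -n 's/.*cubins_loaded: [^(]*(total: \([0-9]*\)).*/\1/p')
echo "$total"  # prints 19
```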

Troubleshooting

If you encounter issues with the GPU profiler, check the following:

  1. Ensure the zymtrace profiler is running and has successfully extracted libraries to the shared volume
  2. Verify the volume mount paths are correct
  3. Check that the CUDA_INJECTION64_PATH points to the correct library path
  4. Make sure your application is running on NVIDIA GPUs
  5. Check the contents of /var/lib/zymtrace/profiler to verify the libraries are present
  6. Review the application logs for any errors or warnings from the GPU profiler
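Steps 1 and 5 can be checked quickly on the node itself. A minimal sketch, assuming the default extraction path from the Helm install (override `HOST_DIR` if your chart is configured differently):

```shell
# Check that the zymtrace DaemonSet has extracted the profiler libraries
# to the shared host path. Run this on the GPU node.
HOST_DIR="${HOST_DIR:-/var/lib/zymtrace/profiler}"
if [ -d "$HOST_DIR" ] && ls "$HOST_DIR"/*.so >/dev/null 2>&1; then
  echo "profiler libraries present in $HOST_DIR"
else
  echo "no profiler libraries in $HOST_DIR" >&2
fi
```

If no libraries are present, the zymtrace profiler DaemonSet has likely not run (or not completed extraction) on that node.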