# Profiling GPU workloads
The zymtrace GPU profiler extends the zymtrace eBPF profiler to provide deep visibility into CUDA-enabled applications. It traces GPU performance issues—such as kernel stalls and memory bottlenecks—directly to the PyTorch code, CUDA kernels, native code, or scheduler threads that launched them.
It requires the zymtrace profiler to be installed and running first.
The GPU profiler is only supported on the AMD64/x86_64 architecture and requires CUDA 12.x or later.
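As a quick sanity check, you can confirm both prerequisites from a shell on the host; `uname -m` reports the architecture, and the `nvidia-smi` header shows the highest CUDA version the installed driver supports:

```bash
# Architecture must be x86_64 (AMD64)
uname -m

# The nvidia-smi header reports the driver's supported CUDA version (12.x or later required)
nvidia-smi
```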
## Configuring GPU Workloads
Choose the configuration method that matches your deployment: Kubectl or Docker.

### Kubectl

The zymtrace approach to GPU profiling is similar to OpenTelemetry's automatic instrumentation, but with a key difference: instead of using Kubernetes Custom Resource Definitions (CRDs) to mutate pods, zymtrace delivers the GPU profiler libraries via a DaemonSet, extracting them to a shared host path. Workloads access these libraries by mounting the shared volume.
Ensure the zymtrace profiler is installed and running before enabling the GPU profiler.
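To confirm that the DaemonSet has extracted the libraries on a given GPU node, one option is an ephemeral debug pod, which mounts the node's root filesystem under `/host` (the node name is a placeholder):

```bash
# Run a throwaway pod on the node; the host filesystem is mounted at /host
kubectl debug node/<node-name> -it --image=busybox -- \
  ls -la /host/var/lib/zymtrace/profiler
```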
After installing the zymtrace profiler with GPU support via Helm, you can configure your application containers to use the GPU profiler:
- Mount the GPU profiler library into your container:

  ```yaml
  volumes:
    - name: zymtrace-gpu-profiler
      hostPath:
        path: /var/lib/zymtrace/profiler
        type: Directory
  containers:
    - name: your-container
      volumeMounts:
        - name: zymtrace-gpu-profiler
          mountPath: /opt/zymtrace/profiler
          readOnly: true
  ```
- Set the required environment variables (you can confirm they are set in the running container with the check after this list):

  ```yaml
  env:
    - name: RUST_LOG
      value: "zymtracecudaprofiler=info"
    - name: ZYMTRACE_CUDAPROFILER__QUIET
      value: "false"
    - name: CUDA_INJECTION64_PATH
      value: "/opt/zymtrace/profiler/libzymtracecudaprofiler.so"
  ```
### Example Kubernetes Job
Below is an example of a Kubernetes Job that uses the GPU profiler with a PyTorch application:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-cifar-training
  labels:
    app: pytorch-cifar
spec:
  parallelism: 1
  backoffLimit: 5
  template:
    metadata:
      labels:
        app: pytorch-cifar
    spec:
      restartPolicy: OnFailure
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      volumes:
        - name: zymtrace-gpu-profiler
          hostPath:
            path: /var/lib/zymtrace/profiler
            type: Directory
      containers:
        - name: pytorch-cifar
          image: pytorch/pytorch:latest
          volumeMounts:
            - name: zymtrace-gpu-profiler
              mountPath: /opt/zymtrace/profiler
              readOnly: true
          env:
            - name: RUST_LOG
              value: "zymtracecudaprofiler=info"
            - name: ZYMTRACE_CUDAPROFILER__PRINT_STATS
              value: "true"
            - name: ZYMTRACE_CUDAPROFILER__QUIET
              value: "false"
            - name: CUDA_INJECTION64_PATH
              value: "/opt/zymtrace/profiler/libzymtracecudaprofiler.so"
          command:
            - "python"
            - "-c"
            - |
              # PyTorch CIFAR training code
              import torch
              # ... rest of training code ...
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              memory: "4Gi"
              cpu: "2"
```
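To run the example, apply the manifest and follow the job's logs (the manifest filename here is arbitrary):

```bash
kubectl apply -f pytorch-cifar-job.yaml

# Follow the training output, including the profiler's periodic statistics
kubectl logs -f job/pytorch-cifar-training
```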
### Docker

For Docker deployments, you need to:
- Start the zymtrace profiler:

  ```bash
  sudo docker run --pid=host --privileged --net=host \
    -v /etc/machine-id:/etc/machine-id:ro \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /sys/kernel/debug:/sys/kernel/debug:ro \
    -v /var/lib/zymtrace/profiler:/opt/zymtrace-cuda-profiler \
    --rm -d --name zymtrace-profiler ghcr.io/zystem-io/zymtrace-pub-profiler:latest \
    --disable-tls --collection-agent <host>:80 # Point to your UI/gateway service
  ```

  The volume mount `-v /var/lib/zymtrace/profiler:/opt/zymtrace-cuda-profiler` is critical. When the profiler container starts, it extracts the necessary CUDA profiler libraries to this directory, making them available to your GPU workloads.
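Before starting a workload, it is worth checking the profiler container for startup errors (the container name comes from the `--name` flag above):

```bash
docker logs zymtrace-profiler
```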
- Once the profiler has successfully extracted the libraries to the shared volume, run your GPU application, mounting the same host volume and pointing `CUDA_INJECTION64_PATH` at the mounted library to enable GPU profiling:

  ```bash
  docker run --gpus all \
    -v /var/lib/zymtrace/profiler:/opt/zymtrace/profiler:ro \
    -e RUST_LOG="zymtracecudaprofiler=info" \
    -e CUDA_INJECTION64_PATH="/opt/zymtrace/profiler/libzymtracecudaprofiler.so" \
    your-gpu-image
  ```
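If you manage the workload with Docker Compose instead, the same mount and environment variables translate directly. Below is a minimal sketch, assuming the Compose device-reservation syntax for GPUs and a placeholder image name:

```yaml
# docker-compose.yaml (sketch): GPU workload with the zymtrace injection library
services:
  gpu-app:
    image: your-gpu-image  # placeholder: your CUDA workload image
    volumes:
      - /var/lib/zymtrace/profiler:/opt/zymtrace/profiler:ro
    environment:
      RUST_LOG: "zymtracecudaprofiler=info"
      CUDA_INJECTION64_PATH: "/opt/zymtrace/profiler/libzymtracecudaprofiler.so"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
```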
To verify that the CUDA profiler libraries were successfully extracted to the shared volume, check the contents of the `/var/lib/zymtrace/profiler` directory on your host:

```bash
ls -la /var/lib/zymtrace/profiler
```

You should see files including `libzymtracecudaprofiler.so`.
## Verification
You can verify that the GPU profiler is working by checking for specific log entries.

### Kubectl

```bash
# Check the application logs for GPU profiler status
kubectl logs -n your-namespace <pod-name>
```
You should also see periodic statistics like this in the logs:

```
2025-04-28T15:27:46.401606Z INFO zymtracecudaprofiler: buffers_processed: 0.0/s (total: 0) | bytes_sent: 0.0/s (total: 0) | cubins_loaded: 0.0/s (total: 19) | cubins_size: 0.0/s (total: 44567632) | io_errors: 0.0/s (total: 0) | lru_misses: 15856.5/s (total: 261865856) |
```
### Docker

```bash
# Check the container logs
docker logs <container-id>
```

You should also see periodic statistics like this in the logs:

```
2025-04-28T15:27:46.401606Z INFO zymtracecudaprofiler: buffers_processed: 0.0/s (total: 0) | bytes_sent: 0.0/s (total: 0) | cubins_loaded: 0.0/s (total: 19) | cubins_size: 0.0/s (total: 44567632) | io_errors: 0.0/s (total: 0) | lru_misses: 15856.5/s (total: 261865856) |
```
You can also verify that the profiler libraries were correctly extracted to the host volume:

```bash
# Check the contents of the shared volume directory
ls -la /var/lib/zymtrace/profiler

# You should see
# libzymtracecudaprofiler.so
```
## Troubleshooting
If you encounter issues with the GPU profiler, check the following (a scripted version of these checks appears after the list):
- Ensure the zymtrace profiler is running and has successfully extracted the libraries to the shared volume
- Verify the volume mount paths are correct
- Check that `CUDA_INJECTION64_PATH` points to the correct library path
- Make sure your application is running on NVIDIA GPUs
- Check the contents of `/var/lib/zymtrace/profiler` to verify the libraries are present
- Review the application logs for any errors or warnings from the GPU profiler
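The host-level checks above can be combined into one pass; a minimal sketch for a Docker host, assuming the default paths used on this page:

```bash
#!/usr/bin/env bash
# Quick diagnostic for common zymtrace GPU profiler issues (Docker host).
set -u

LIB=/var/lib/zymtrace/profiler/libzymtracecudaprofiler.so

# 1. Injection library extracted to the shared volume?
if [ -f "$LIB" ]; then
  echo "OK: injection library present at $LIB"
else
  echo "FAIL: $LIB missing - is the zymtrace profiler container running?"
fi

# 2. NVIDIA GPU and driver visible on the host?
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  echo "OK: NVIDIA driver responds"
else
  echo "FAIL: nvidia-smi unavailable - no NVIDIA GPU or driver on this host?"
fi

# 3. Supported architecture?
if [ "$(uname -m)" = "x86_64" ]; then
  echo "OK: x86_64 host"
else
  echo "FAIL: unsupported architecture $(uname -m)"
fi
```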