Profiling GPU workloads

The zymtrace GPU profiler extends the zymtrace eBPF profiler to provide deep visibility into CUDA-enabled applications. It traces GPU performance issues—such as kernel stalls and memory bottlenecks—directly to the PyTorch code, CUDA kernels, native code, or scheduler threads that launched them.

It requires the zymtrace profiler to be installed and running first.

important

The GPU profiler is only supported on AMD64/x86_64 architecture and supports CUDA 12.x and higher.

Configuring GPU Workloads

Choose the configuration method that matches your deployment:

The zymtrace approach to GPU profiling is similar to OpenTelemetry's automatic instrumentation, but with a key difference: instead of using Kubernetes Custom Resource Definitions (CRDs) to mutate pods, zymtrace delivers the GPU profiler libraries via a DaemonSet and extracts them to a shared host path. Workloads access these libraries by mounting the shared volume.

important

Ensure you have the zymtrace profiler installed and running before enabling the GPU profiler.

After installing the zymtrace profiler with GPU support via Helm, you can configure your application containers to use the GPU profiler:

  1. Mount the GPU profiler library into your container:

volumes:
  - name: zymtrace-gpu-profiler
    hostPath:
      path: /var/lib/zymtrace/profiler
      type: Directory
containers:
  - name: your-container
    volumeMounts:
      - name: zymtrace-gpu-profiler
        mountPath: /opt/zymtrace/profiler
        readOnly: true
  2. Set the required environment variables:

env:
  - name: RUST_LOG
    value: "zymtracecudaprofiler=info"
  - name: ZYMTRACE_CUDAPROFILER__QUIET
    value: "false"
  - name: CUDA_INJECTION64_PATH
    value: "/opt/zymtrace/profiler/libzymtracecudaprofiler.so"
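Before relying on injection at runtime, you can sanity-check the mount from inside the container. A minimal sketch, assuming the default paths from the snippets above (this check is not part of zymtrace itself):

```shell
# Verify the injected library is visible inside the container before the
# workload starts. The default path matches the volumeMount shown above;
# CUDA_INJECTION64_PATH, if set, takes precedence.
LIB="${CUDA_INJECTION64_PATH:-/opt/zymtrace/profiler/libzymtracecudaprofiler.so}"
if [ -f "$LIB" ]; then
  echo "found: $LIB"
else
  echo "missing: $LIB" >&2
fi
```

If the library is reported missing, recheck the volume and volumeMount definitions before debugging the profiler itself.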

Example Kubernetes Job

Below is an example of a Kubernetes Job that uses the GPU profiler with a PyTorch application:

apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-cifar-training
  labels:
    app: pytorch-cifar
spec:
  parallelism: 1
  backoffLimit: 5
  template:
    metadata:
      labels:
        app: pytorch-cifar
    spec:
      restartPolicy: OnFailure
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      volumes:
        - name: zymtrace-gpu-profiler
          hostPath:
            path: /var/lib/zymtrace/profiler
            type: Directory
      containers:
        - name: pytorch-cifar
          image: pytorch/pytorch:latest
          volumeMounts:
            - name: zymtrace-gpu-profiler
              mountPath: /opt/zymtrace/profiler
              readOnly: true
          env:
            - name: RUST_LOG
              value: "zymtracecudaprofiler=info"
            - name: ZYMTRACE_CUDAPROFILER__PRINT_STATS
              value: "true"
            - name: ZYMTRACE_CUDAPROFILER__QUIET
              value: "false"
            - name: CUDA_INJECTION64_PATH
              value: "/opt/zymtrace/profiler/libzymtracecudaprofiler.so"
          command:
            - "python"
            - "-c"
            - |
              # PyTorch CIFAR training code
              import torch
              # ... rest of training code ...
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              memory: "4Gi"
              cpu: "2"

Verification

You can verify that the GPU profiler is working by checking for specific log entries:

# Check the application logs for GPU profiler status
kubectl logs -n your-namespace <pod-name>

You should also see periodic statistics like this in the logs:

2025-04-28T15:27:46.401606Z  INFO zymtracecudaprofiler: buffers_processed: 0.0/s (total: 0) | bytes_sent: 0.0/s (total: 0) | cubins_loaded: 0.0/s (total: 19) | cubins_size: 0.0/s (total: 44567632) | io_errors: 0.0/s (total: 0) | lru_misses: 15856.5/s (total: 261865856) |
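Each stats line is a series of pipe-delimited `name: rate/s (total: N)` fields, so individual totals can be pulled out with standard tools; a nonzero `cubins_loaded` total, for instance, indicates the profiler has observed CUDA binaries. A small sketch (the sample line is abridged from the output above):

```shell
# Extract the cubins_loaded running total from a profiler stats line.
line='2025-04-28T15:27:46.401606Z  INFO zymtracecudaprofiler: cubins_loaded: 0.0/s (total: 19) | io_errors: 0.0/s (total: 0)'
total=$(printf '%s\n' "$line" | sed -n 's/.*cubins_loaded: [^(]*(total: \([0-9]*\)).*/\1/p')
echo "$total"  # prints 19
```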

Troubleshooting

If you encounter issues with the GPU profiler, check the following:

  1. Ensure the zymtrace profiler is running and has successfully extracted libraries to the shared volume
  2. Verify the volume mount paths are correct
  3. Check that the CUDA_INJECTION64_PATH points to the correct library path
  4. Make sure your application is running on NVIDIA GPUs
  5. Check the contents of /var/lib/zymtrace/profiler to verify the libraries are present
  6. Review the application logs for any errors or warnings from the GPU profiler
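Steps 1 and 5 can be checked quickly on the node itself. A minimal sketch, assuming the default extraction path from the Helm install (override `HOST_DIR` if your chart is configured differently):

```shell
# Check that the zymtrace DaemonSet has extracted the profiler libraries
# to the shared host path. Run this on the GPU node.
HOST_DIR="${HOST_DIR:-/var/lib/zymtrace/profiler}"
if [ -d "$HOST_DIR" ] && ls "$HOST_DIR"/*.so >/dev/null 2>&1; then
  echo "profiler libraries present in $HOST_DIR"
else
  echo "no profiler libraries in $HOST_DIR" >&2
fi
```

If no libraries are present, the zymtrace profiler DaemonSet has likely not run (or not completed extraction) on that node.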