
What is zymtrace?

zymtrace is a zero-friction continuous profiler for both general-purpose and GPU-accelerated workloads. It helps you optimize performance without code changes, recompilation, or system restarts.

Deep Dive

Read our GPU profiling launch blog post to see how zymtrace helps organizations unlock efficient AI by identifying performance bottlenecks in CUDA kernels, optimizing inference batch sizes, and eliminating idle GPU cycles—all with zero friction.


Key benefits

  • Zero-Friction Deployment: No code changes, no recompilation
  • Complete Visibility: Unified view across CPU and GPU boundaries
  • Performance Optimization: Increase throughput by up to 300% with targeted improvements
  • Cost Reduction: Lower cloud spend by maximizing GPU & CPU utilization
  • Energy Efficiency: Reduce power consumption and environmental impact

How it works

zymtrace builds complete stack traces across heterogeneous computing environments—from high-level application code through native functions and CUDA kernels, down to the Linux kernel and GPU instructions. What sets it apart is the ability to correlate GPU traces with exact CPU code paths, bridging the gap that makes optimization of heterogeneous workloads challenging.

Refer to the architecture page for details on how it works.
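
To make the idea of correlating GPU traces with CPU code paths concrete, here is a minimal conceptual sketch in Python. It is not zymtrace's implementation or API. It assumes each kernel launch carries a hypothetical `correlation_id` that also appears on the CPU stack sample taken at the launch site; the types `CpuSample` and `GpuSample`, the functions `unify` and `folded`, and all frame and kernel names are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSample:
    correlation_id: int  # hypothetical id shared with the kernel launch
    frames: tuple        # CPU frames, outermost first


@dataclass(frozen=True)
class GpuSample:
    correlation_id: int  # same hypothetical id, reported on the GPU side
    kernel: str          # name of the kernel that ran
    duration_us: int     # time spent in the kernel, in microseconds


def unify(cpu_samples, gpu_samples):
    """Attach each GPU kernel to the CPU stack that launched it and
    sum kernel time per unified stack."""
    launch_stacks = {s.correlation_id: s.frames for s in cpu_samples}
    totals = Counter()
    for g in gpu_samples:
        # Kernels with no matching launch site get a placeholder root frame.
        cpu_frames = launch_stacks.get(g.correlation_id, ("[unknown launch site]",))
        totals[cpu_frames + (g.kernel,)] += g.duration_us
    return totals


def folded(totals):
    """Render unified stacks as 'frame;frame;kernel <microseconds>' lines."""
    return [f"{';'.join(stack)} {us}" for stack, us in sorted(totals.items())]


# Made-up sample data for illustration only.
cpu = [
    CpuSample(1, ("train.py:main", "model.forward", "torch::matmul")),
    CpuSample(2, ("train.py:main", "optimizer.step")),
]
gpu = [
    GpuSample(1, "sgemm_kernel", 120),
    GpuSample(1, "sgemm_kernel", 80),
    GpuSample(2, "adam_kernel", 40),
]

for line in folded(unify(cpu, gpu)):
    print(line)
```

Each output line is a single stack that starts in application code and ends in a GPU kernel, which is the kind of unified view described above. A real profiler would also have to unwind native and kernel frames and handle asynchronous launches, which this sketch ignores.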

Supported stack

  • Acceleration: NVIDIA CUDA 12.x
  • ML Frameworks: PyTorch, JAX
  • CPU profiling supported languages: Python, C/C++, Java, Go, Rust, Node.js, Ruby, PHP, .NET, Perl, and Zig
  • Environments: Cloud, containers, Kubernetes, on-premises