What is zymtrace?
zymtrace is a zero-friction continuous profiler for both general-purpose and GPU-accelerated workloads. It helps you optimize performance without code changes, recompilation, or system restarts.
Read our GPU profiling launch blog post to see how zymtrace helps organizations unlock efficient AI by identifying performance bottlenecks in CUDA kernels, optimizing inference batch sizes, and eliminating idle GPU cycles, all with zero friction.

Key benefits
- Zero-Friction Deployment: No code changes, no recompilation
- Complete Visibility: Unified view across CPU and GPU boundaries
- Performance Optimization: Increase throughput by up to 300% with targeted improvements
- Cost Reduction: Lower cloud spend by maximizing GPU & CPU utilization
- Energy Efficiency: Reduce power consumption and environmental impact

How it works
zymtrace builds complete stack traces across heterogeneous computing environments, from high-level application code through native functions and CUDA kernels, down to the Linux kernel and GPU instructions. What sets it apart is the ability to correlate GPU traces with exact CPU code paths, bridging the gap that makes optimization of heterogeneous workloads challenging.
Refer to the architecture page for details on how it works.
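
To make the idea of a single stack trace spanning CPU and GPU layers concrete, here is a minimal Python sketch. It is an illustration only, not zymtrace's data model or API: the `Frame` type, the layer labels, and the sample symbols are all hypothetical. It folds identical mixed stacks into counts (the usual input to a flame graph) and attributes a CUDA kernel back to the application function that led to it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    layer: str   # hypothetical labels: "python", "native", "cuda"
    symbol: str  # function or kernel name


# Hypothetical samples, root frame first. Symbols are made up for illustration.
SAMPLES = [
    (Frame("python", "train_step"), Frame("native", "at::matmul"), Frame("cuda", "sgemm_kernel")),
    (Frame("python", "train_step"), Frame("native", "at::matmul"), Frame("cuda", "sgemm_kernel")),
    (Frame("python", "evaluate"), Frame("native", "at::matmul"), Frame("cuda", "sgemm_kernel")),
]


def fold(samples):
    """Count identical stacks, keyed in folded 'frame;frame;frame' form."""
    counts = Counter()
    for stack in samples:
        key = ";".join(f"{frame.layer}:{frame.symbol}" for frame in stack)
        counts[key] += 1
    return counts


def cpu_callers_of(samples, kernel_symbol):
    """Count the nearest 'python' frame above each occurrence of a CUDA kernel."""
    callers = Counter()
    for stack in samples:
        for i, frame in enumerate(stack):
            if frame.layer == "cuda" and frame.symbol == kernel_symbol:
                # Walk from the kernel back toward the root.
                for prev in reversed(stack[:i]):
                    if prev.layer == "python":
                        callers[prev.symbol] += 1
                        break
                break
    return callers
```

Calling `cpu_callers_of(SAMPLES, "sgemm_kernel")` on the sample data attributes two kernel samples to `train_step` and one to `evaluate`, which is the kind of CPU-to-GPU correlation described above.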

Supported stack
- Acceleration: NVIDIA CUDA 12.x and above. Support for AWS Neuron, AMD ROCm, and TPUs is in progress.
- ML Frameworks: PyTorch, JAX
- CPU Profiling: Python, C/C++, Java, Go, Rust, Node.js, Ruby, PHP, .NET, Perl
- Environments: Cloud, containers, Kubernetes, on-premises
For detailed language versions and minimum requirements, see Supported Languages.

Open standards, no lock-in
zymtrace is self-hosted. Your profiling data lives in your infrastructure, in databases you control. You can bring existing instances for all three storage backends:
| Storage | Purpose | Guide |
|---|---|---|
| ClickHouse | Profiling events and analytics | Guide |
| PostgreSQL | User data, metadata, project configuration | Guide |
| S3 / MinIO | Native debug symbols | Guide |
Because the data is in your ClickHouse, you can query it with SQL at any time, independently of the zymtrace UI.
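
As a rough sketch of what querying independently of the UI can look like, the Python snippet below sends SQL to ClickHouse's standard HTTP interface using only the standard library. The host, port, and credentials are placeholders, and the zymtrace database and table names are not documented here, so the suggested first query is `SHOW DATABASES` to discover what your instance contains.

```python
import urllib.parse
import urllib.request


def build_query_url(base_url, sql, database=None):
    """Build a ClickHouse HTTP-interface URL carrying the SQL in the `query` parameter."""
    params = {"query": sql}
    if database:
        params["database"] = database
    # quote_via=quote encodes spaces as %20 rather than '+'.
    encoded = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{base_url.rstrip('/')}/?{encoded}"


def run_query(base_url, sql, user, password, database=None):
    """Run a read-only query over HTTP GET and return the raw text response."""
    request = urllib.request.Request(
        build_query_url(base_url, sql, database),
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Example call (placeholder host and credentials; needs a reachable ClickHouse):
#   print(run_query("http://localhost:8123", "SHOW DATABASES", "default", ""))
```

ClickHouse treats GET requests as read-only, which suits exploratory queries against profiling data. For anything beyond a quick check, a dedicated ClickHouse client library is likely a better fit.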
Every piece of data visible in the UI is also accessible via a REST API. Click API Explorer inside your zymtrace instance to browse and test every endpoint interactively. zymtrace also implements the Model Context Protocol, letting AI agents in your IDE query profiling data using natural language. Learn more.
OpenTelemetry: zymtrace is fully OTel-compliant for resource attributes and metadata. The zymtrace founders were core contributors to the OTel Profiles specification, which recently reached alpha. They were part of the team that donated the profiler to the OpenTelemetry project and helped define the specification there. By default, the profiler uses a custom wire protocol that is 6x more compute-efficient than the OTel Profiles proto. You can switch between the zymtrace and OTel Profiles protocols at any time without losing data. See Profiler ENV & CLI Args.