Symbolization
What is symbolization?
Symbolization is particularly critical for zymtrace because we operate as a whole-system profiler. Unlike application performance monitoring (APM) tools and language-specific profilers that require code changes, runtime instrumentation, or language-specific agents, zymtrace reads raw memory addresses directly from the system without touching your application's runtime or requiring any modifications to your code.
This non-intrusive approach means we collect instruction addresses exactly as they appear in memory during execution. Without symbolization, these profiles would show only hexadecimal addresses that are impossible to interpret or act upon. Symbolization transforms these raw instruction addresses into meaningful source code locations, bridging the gap between low-level system data and high-level developer understanding.
Native frames collected during software profiling require symbols to be useful. A symbol for a particular frame is a list of <file name, function name, source line> triples. It is a list rather than a single entry because compiler inlining can associate multiple functions and source lines with a single machine frame.
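The lookup described above can be sketched as follows. This is a minimal illustration, not zymtrace's actual implementation: the `SourceLocation` type, the `SYMBOL_TABLE` contents, and the `symbolize` function are all hypothetical names and data chosen for the example. It shows why one address can resolve to several triples when a function has been inlined.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    """One <file name, function name, source line> triple (hypothetical type)."""
    file_name: str
    function_name: str
    line: int


# Hypothetical symbol table, sorted by start address.
# Each entry is (start_address, end_address_exclusive, locations).
# Locations are ordered innermost (inlined) function first.
SYMBOL_TABLE = [
    (0x1000, 0x1040, [SourceLocation("main.c", "main", 12)]),
    (0x1040, 0x1080, [
        SourceLocation("vec.h", "dot", 7),        # inlined into compute
        SourceLocation("math.c", "compute", 31),  # the enclosing function
    ]),
]
STARTS = [entry[0] for entry in SYMBOL_TABLE]


def symbolize(address: int) -> list[SourceLocation]:
    """Return the source locations covering address, or [] if none is known."""
    # Find the last range whose start is <= address.
    index = bisect_right(STARTS, address) - 1
    if index < 0:
        return []
    _start, end, locations = SYMBOL_TABLE[index]
    return locations if address < end else []


if __name__ == "__main__":
    for loc in symbolize(0x1044):
        print(f"{loc.function_name} ({loc.file_name}:{loc.line})")
```

In this sketch, address `0x1044` resolves to two entries, `dot` and `compute`, because the inlined call shares machine code with its caller. An address outside every known range returns an empty list, which corresponds to a frame that stays unsymbolized.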
Native frames collected by the zymtrace whole-system profiler may originate from:
- CUDA runtime and NVIDIA GPU kernels
- System libraries (part of container images or the host OS)
- Operating system daemons
- Third-party software
- Native libraries used by in-house software
The symbolization challenge
Most production workloads strip debug symbols from binaries to reduce container image size and improve load times. Even when symbols are present, their quality varies significantly: many binaries carry only basic function names, without file names or line numbers. This creates a gap between the raw addresses collected during profiling and the detailed source mappings developers need for effective optimization.
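To make the quality gap concrete, here is a small sketch of how the same frame might be rendered at three levels of symbol availability. The `describe_frame` function and its parameters are hypothetical and exist only for illustration; they are not part of zymtrace.

```python
from typing import Optional


def describe_frame(address: int,
                   function: Optional[str] = None,
                   file_name: Optional[str] = None,
                   line: Optional[int] = None) -> str:
    """Render a frame as precisely as the available symbol data allows."""
    if function is None:
        # Stripped binary: only the raw address is known.
        return f"0x{address:x}"
    if file_name is None or line is None:
        # Basic symbols: function name without a source location.
        return function
    # Full debug information: function plus file and line.
    return f"{function} ({file_name}:{line})"


if __name__ == "__main__":
    print(describe_frame(0x7F3A10))
    print(describe_frame(0x7F3A10, "compute"))
    print(describe_frame(0x7F3A10, "compute", "math.c", 31))
```

Only the last form tells a developer exactly which line to look at; the first two correspond to stripped binaries and to binaries that keep function names only.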