Symbolization
What is symbolization?
Symbolization is particularly critical for zymtrace because we operate as a whole-system profiler. Unlike application performance monitoring (APM) tools and language-specific profilers that require code changes, runtime instrumentation, or language-specific agents, zymtrace reads raw memory addresses directly from the system without touching your application's runtime or requiring any modifications to your code.
This non-intrusive approach means we collect instruction addresses exactly as they appear in memory during execution. Without symbolization, these profiles would show only hexadecimal addresses that are impossible to interpret or act upon. Symbolization transforms these raw instruction addresses into meaningful source code locations, bridging the gap between low-level system data and high-level developer understanding.
Native frames collected during software profiling require symbols to be useful. A symbol for a particular frame is a list of <file name, function name, source line> triplets. It is a list because compiler inlining may associate multiple functions and source lines with a single machine frame.
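To make the list-of-triplets idea concrete, here is a minimal sketch of address lookup against a small in-memory symbol table. The table contents, the `SourceLocation` type, and the `symbolize` function are hypothetical and exist only for illustration; they do not describe zymtrace's internal data structures or the GSYM layout.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    file_name: str
    function_name: str
    line: int


# Hypothetical symbol table, sorted by start address.
# Each entry is (start_address, size_in_bytes, locations).
# Locations are ordered innermost (inlined) function first.
SYMBOL_TABLE = [
    (0x1000, 0x40, [SourceLocation("vec.c", "dot", 12)]),
    (0x1040, 0x80, [
        SourceLocation("math.h", "clamp", 7),   # inlined into scale()
        SourceLocation("vec.c", "scale", 31),
    ]),
]
STARTS = [entry[0] for entry in SYMBOL_TABLE]


def symbolize(address):
    """Return the list of source locations covering `address`, or []."""
    # Find the last entry whose start address is <= address.
    idx = bisect_right(STARTS, address) - 1
    if idx < 0:
        return []
    start, size, locations = SYMBOL_TABLE[idx]
    # Reject addresses that fall past the end of that entry.
    if address >= start + size:
        return []
    return locations


for loc in symbolize(0x1050):
    print(f"{loc.function_name} ({loc.file_name}:{loc.line})")
```

A single address inside `scale` resolves to two triplets here because `clamp` was inlined at that point, which is why a frame maps to a list rather than to one location.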
Native frames collected by the zymtrace whole-system profiler may originate from:
- CUDA runtime and NVIDIA GPU kernels
- System libraries (part of container images or the host OS)
- Operating system daemons
- Third-party software
- Native libraries used by in-house software
The symbolization challenge
Most production workloads strip debug symbols from binaries to reduce container image size and improve load times. Even when symbols are present, their quality varies significantly: many contain only basic function names without file locations or line numbers. This creates a gap between the raw addresses collected during profiling and the detailed source mappings developers need for effective optimization.
Our symbolization approach
zymtrace addresses missing or low-quality symbols through a robust three-tier approach:
- Automatic symbol upload – The profiler automatically uploads symbols if available on the local system
- Global symbolization service – Falls back to pre-processed symbols for public libraries when local symbols are missing or insufficient
- Manual symbol upload – CLI tool for uploading proprietary symbols manually or via CI/CD pipeline for maximum quality
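One way to picture how the three tiers interact is as a choice of the best available symbol source for a given binary. The sketch below is an assumption-laden illustration, not zymtrace's actual selection logic: the `Quality` levels, the source names, and the tie-breaking order are all hypothetical.

```python
from enum import IntEnum
from typing import Dict, Optional


class Quality(IntEnum):
    NONE = 0             # no symbols available from this source
    FUNCTION_NAMES = 1   # function names only
    FULL = 2             # file names, function names, and line numbers


# Hypothetical source names; on a tie, the earlier entry wins (an assumption).
SOURCES = ("manual_upload", "automatic_upload", "global_service")


def pick_symbol_source(available: Dict[str, Quality]) -> Optional[str]:
    """Return the source offering the highest symbol quality, or None."""
    best = None
    for name in SOURCES:
        quality = available.get(name, Quality.NONE)
        if quality == Quality.NONE:
            continue
        if best is None or quality > available[best]:
            best = name
    return best


# Local symbols exist but carry function names only, so the
# pre-processed symbols from the global service are preferred.
print(pick_symbol_source({
    "automatic_upload": Quality.FUNCTION_NAMES,
    "global_service": Quality.FULL,
}))
```

The example mirrors the case described above, where local symbols are present but insufficient and the global service fills the gap.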
Symbol types and quality levels
All symbols processed and stored by zymtrace, whether uploaded automatically, uploaded manually, or obtained from the global symbolization service, are saved in the GSYM format. GSYM (Generic Symbol Format) is designed for compactness and fast lookups, making it ideal for large-scale symbolization databases and high-performance profiling workflows.
The quality and completeness of symbolization depend on the type of application and the available debug information:
Interpreted languages
Applications written in interpreted languages (Python, JavaScript, Ruby, etc.) have file names and line numbers extracted directly from process memory. The exception is .NET applications, which require additional symbol processing.
Native applications
Native applications present varying levels of symbol quality depending on available debug information. Understanding these different symbol sources helps explain why some profiles show complete source locations while others only display function names.