zymtrace Changelog

v26.2.6

profiler: support dynamically linked NodeJS
profiler: support Perl 5.42
profiler: refactor kernel symbolization
- slight memory use reduction
- symbol information is now updated upon BPF program loads, similar to kernel modules
profiler: handle failures to place uprobes more gracefully
profiler: support Go 1.26.0
profiler: fix bug where script name was not propagated to all GPU traces
profiler: check PCI devices to determine if NVIDIA GPUs are present
cudaprofiler: fix rare bug that could cause profiles to not be exported after a long time, even though they were captured by the implant
cudaprofiler: fix rare segfault due to an Nvidia bug in CUPTI 12/13
cudaprofiler: update CUPTI version used for CUDA 13 workloads to 13.1
web+profiler: allow grouping the flamegraph by GPU
web+profiler: add -cluster-name option and ability to filter by it in the UI
ui: add new metrics dashboard pages
ui: fix an issue where a custom font size could cause parts of the UI to not be rendered correctly
ui: fix an issue that could cause the last value of a series to not be shown correctly in the metrics chart
ui: fix an issue that could cause the query bar to show different percentage values between basic and advanced query modes
ui: add indicator for active event kind in query bar
ui: add project scoping for agent config rules
ui: add MIG metrics dashboard to the GPU metrics page
ui: add support for value autocomplete inside query input in agent config
ui: add a description for each of the available agent config settings
ui: fixed an issue that could cause display options to not be persisted correctly when accepting divergent options from the URL
assistant: fix MCP access from the assistant
assistant: add Gemini 3+ preview models

v26.2.5

profiler: correctly ensure periodic resend of frame metadata

v26.2.4

profiler: retry loads of NVML to support environments where the Nvidia driver is installed after profiler startup

v26.2.3

ui: fix an issue that could cause the core count inside the host stats to show with decimal digits
profiler: performance and stability improvements for alloc profiling

v26.2.2

backend: implement new metrics schema
mcp: support for allocation profiles
profiler: fix bug in Java alloc profiling if multiple different JDKs are used simultaneously
profiler: fix duplicate frames in native unwinder on ARM64
profiler: improve performance of zing (alloc) symbolization by ~10-20%
profiler: improve export and retry behavior when running into size limits
profiler: support JVMTI-based alloc profiling for HotSpot JDKs (Zulu, Amazon builds, etc.) on x86
ui: fix an issue that could cause the host stats to not be shown correctly in the host stats overview
ui: fix an issue where entity actions could potentially not include all relevant filters when they were executed
web: add endpoint to store/load assistant config

v26.2.1

ui: fixed an issue that could cause the wrong URL params to be applied when using the actions inside the top entities table
profiler: make errors encountered during metrics startup non-fatal
profiler: periodically resend frame / executable data to cover function records that have expired via TTL in ClickHouse
web: improve default NodeJS frame filters
profiler: improve behavior when frame and executable reporting fails due to network errors
- retry requests and invalidate internal caches to accelerate resends

v26.2.0

profiler: support getting GPU utilization for MIG devices
profiler: support .NET 9.0 and 10.0
profiler: support Node v24.11.1 and up
profiler: support Node on ARM64
profiler: don't load network probes on unsupported systems
profiler: significantly improve Java alloc profiling for Zing
web: fix OpenAPI schema for web metrics filter
web: allow changing settings and secrets when auth is disabled
assistant: limit the flamegraph data to below 200k tokens
profiler/web: support dynamic agent configuration via the web UI (beta)
- add support for per-host and per-process settings (via the same CEL expressions used for search)
- add support for updating config settings without restarting the agent
- only some settings are supported so far; the remaining ones will be added incrementally

v26.1.2

assistant: add suggested prompts separated by CPU and GPU context
profiler: add support for metrics reporting for AWS Neuron (Inferentia/Trainium) devices
- requires a supported EC2 instance host with one or more Neuron devices
- enabled via the -enable-neuron-metrics flag
profiler: add (per process) network throughput metrics
- these are enabled by default, and can be disabled via -enable-network-metrics=false
- by default, they exclude local traffic (127.0.0.1, etc.), which can be included with -track-local-traffic
profiler: make I/O throughput metrics per process
profiler: add support for SGLang as application type for automated metrics detection and extraction
- enabled (or disabled) via the -enable-sglang-metrics flag
all: replace jemalloc with mimalloc
- allows deploying the backend on machines with page sizes other than 4K

v26.1.1

web: allow kernel functions to be viewed in top functions page
web: add CEL array type support
web: add ability to group by deployment name
web: improve metrics performance using pre-filtering by meta_id
ui: fix an issue that could cause the sandwich graph on the top-function page to not show up correctly after re-loading the data
ui: fix handling of non-ascii chars in metric attributes
ui: fix query bar autocomplete suggestions not showing up correctly in Firefox
ui: fix an issue that caused the middle truncation of text to not work correctly in Firefox
profiler/symdb: improve handling of file size limit if -dwarf is passed
profiler: add allocation profiling for Java (Zing + OpenJDK)
- enabled via -alloc-profile <size> flag, instructing the profiler to profile allocations every <size> bytes
- for example, -alloc-profile 8mib
profiler: reduce likelihood of incorrect executable paths (e.g. just /) showing up
- this could happen due to the kernel reporting incorrect paths directly after execve

v26.1.0

assistant: add support for PDF uploads
assistant: show consumed tokens at the bottom of each answer
assistant: add support for image uploads
profiler: fix edge cases in metrics collection
- processes that were fully idle for a full collection cycle would sometimes report incorrect metrics data
- systems that were launching a lot of processes might have reported incorrect metrics data
- both of those bugs are fixed now
profiler: fix GPU metrics collection in the presence of more than one GPU
profiler: automatically detect and extract metrics from supported applications
- currently vLLM is supported
- enabled (or disabled) via the -enable-vllm-metrics flag
web: add metric activity aggregation endpoint
cudaprofiler: fix several bugs
- added better handling of CUDA graphs
- fix stack trace correlation when PC offsets are disabled for GPU profiling
- fix segfault (within CUDA's code) on old, buggy CUDA driver versions
cudaprofiler: use (and ship) CUPTI 13 if runtime CUDA version supports it
ingest: fix unbalanced metrics sharding
helm: adjust service config to route by request, not connection
- ensures that MCP requests are always routed to the same web replica
ui: integrate API explorer into the web app
ui: sort "[other]" entry to the bottom in metric charts/tooltips

v25.12.6

profiler: add module names to Python <module> frames
profiler: rework stack unwinding mechanism
- can now unwind much longer stacks than before
- previously, some Python / Java stack traces were cut off in the middle
profiler: fix an issue where in rare cases the profiler could crash during shutdown when GPU metrics collection is enabled
cudaprofiler: significant performance improvements
cudaprofiler: fix possible segfault on certain CUDA versions
cudaprofiler: print version on load if logging is enabled to ease troubleshooting
web: fix off-by-one error in /events endpoint in the public API (beta)
web: add pagination support to the top field values endpoint
web: fix bug that could cause simplified function names to not appear on the top function page
web: reduce histogram query memory consumption
backend: fix decoding of older license keys with zing-jdk feature
assistant: improve user-facing error messages
assistant: simplify custom LLM configuration
assistant: call tree + improved default prompts
ui: hide Python runtime frames by default
ui: add fullscreen mode for diff top functions view
ui: add search support for flamegraph on top functions view
ui: initial AI Assistant for CPU flamegraph
ui: add buttons to export CPU / GPU events, and profiler diagnostic data in the Support dialog
- these can be sent to the zymtrace team for support
ui: add documentation links to agent installation steps
ui: add service and kernel version to host details page
ui: add Slack support option to Support dialog

v25.12.5

migrate: fix migration checksum checks for manually managed distributed ClickHouse deployments

v25.12.4

profiler: add ARM64 support for cudaprofiler
profiler: don't export PC samples by default
- these are still captured, and used to disassemble instructions
- but because they are high cardinality, exporting the addresses is now turned off by default to reduce the resource costs of ingest and ClickHouse
- this can still be enabled by running the workload with env ZYMTRACE_CUDAPROFILER__ENABLE_PC_OFFSETS=true
profiler: fix unwinding for PGO builds of interpreters
- the previous fix for this had a small bug which is fixed now
ui: add differential functions view for GPU functions
ui: fix a bug that could cause wrong GPUs to show up on container/pod details pages
ui: fix an issue that could cause the wrong filter to be applied when using the actions inside the chart tooltip

v25.12.3

ui: fixed an issue that could prevent navigation from the "Top Entities" section on the Efficiency IQ page
profiler: remove special coloring of CUDA launch frames

v25.12.2

profiler: fix bug that could cause CUDA launch function frames to appear on CPU flamegraph
ui: adjust truncation of legend entries in charts to use end truncation instead of middle truncation
ui: apply sorting to tooltip entries in metric charts
ui: add select all and deselect all buttons to language filter dropdown
gateway: update Envoy to v1.36.3
- versions ≥ v1.50.0 support automatically raising ulimits to their hard limit, which is important in Kubernetes clusters using containerd ≥ 2.0, where default limits are now very conservative
storage: don't use EXCHANGE TABLE ClickHouse DDL
- allows deploying ClickHouse on filesystems without support for atomic file renames (renameat2)
ui: display CPU utilization metrics in cores instead of percentages

v25.12.1

ui: add light mode support
ui: show description of SASS instruction on hover for GPU profiles
ui: fixed an issue that could cause chart legend entries to not be truncated correctly in Firefox
ui: enable transport compression for flamegraph WASM blob
backend: replace gRPC health checks with HTTP health checks
symdb: retry auto upload for broken executables after a day
- interval can be configured via SYMDB__BROKEN_EXECUTABLE_RETRY_AFTER env variable (in seconds)
profiler: make auto upload size limits configurable
- these limits only apply if -dwarf is given
- ZYMTRACE_MAX_SYMBFILE_SIZE (for the symbol file) and ZYMTRACE_MAX_INPUT_FILE_SIZE (for the size of the binary file that we extract symbols from) can be used to configure this
- both are in bytes
profiler: add support for NVIDIA MIG devices (for metrics)
profiler: fix bug that could cause available compute graph (and idle time of a machine) to drop significantly, if GPU profiling was active and GPU was heavily used
cudaprofiler: fix rare segfaults in CUDA caused by incompatible version of CUPTI already being loaded in some processes
- PyTorch based processes sometimes would load an incompatible libcupti.so, which could lead to segfaults
cudaprofiler: enable sampling of kernel launches by default
cudaprofiler: expose stack trace sampling config via env vars

v25.11.6

mcp: improve time ranges and limit retries if no data found
profiler: fix unwinding for LTO/PGO builds of interpreters
- .cold parts of a split interpreter loop are now supported
profiler: add source location mapping for GPU profiling with PC sampling
profiler: add arguments to control how NVML is located
- -nvml-path allows the NVML path to be specified explicitly
- -nvml-auto-scan allows opting into an automatic scan for the NVML library
profiler: fixed an issue where in rare cases the profiler could crash during symbol extraction
cudaprofiler: flush PC samples more frequently, and track when the hardware buffer is full
cudaprofiler: support disassembling more SASS instructions
web/profiler: allow top GPU stall reasons to be viewed in top functions list
ui: show description of stall reason in tooltip for GPU flamegraph
- makes the stall reasons more descriptive, and offers possible solutions for reducing their impact
ui: improve click-to-copy handling for text elements
ui: add script/container name support for GPU consumer metrics
ui: persist aggregation and language settings in both URL and local storage
- allows sharing links without overwriting local settings; conflicting settings can be accepted, discarded, or kept diverged

v25.11.5

profiler: fix bug that caused GPU implant instrumentation to not work when the profiler runs in a Docker container

v25.11.4

profiler: support Zing 25.01 with JDK 11 and 17 (21 was already supported) (https://github.com/zystem-io/zymtrace/pull/1441)
symblib: fix Rust function name demangling which failed in some cases
mcp: suggest using local time instead of UTC
profiler: reduce allocations in parseFDE()
- brings parseFDE down from 60% of all allocs to 0.3%
profiler: place uprobes on GPU implant dynamically (https://github.com/zystem-io/zymtrace/pull/1453)
- implant no longer needs to be mapped into the Kubernetes container (for the profiler)
- implant is now detected and instrumented regardless of its path
ui: fix rendering bug in flamegraph where child node could be wider than its parent
cudaprofiler: several improvements (https://github.com/zystem-io/zymtrace/pull/1448)
- fix bug with force flushing incomplete kernels
- improve flushing of CUPTI activity records
- don't delay process synchronizations during startup for CUDA processes, leading to better CPU stack traces for the first few frames
- support building with CUDA 13
- allow bypassing GPU presence checks in the profiler
mcp: enable collapse_go_system_frames, collapse_jvm_threads, filter_error_events and filter_unreported to reduce the number of tokens in the flamegraph response
ui: add new display option collapse_go_system_frames
- enabled by default and inverted in the UI: "Show Go system frames"
- aggregates GC frames into "Garbage Collector"
- aggregates scheduler frames into "Scheduler"
ui: fix frame filter for Java standard library functions
- it was previously filtering out functions too aggressively
profiler: improve Zing symbolization
- adds support for the GC in Zing, leading to more debug information being resolved, and thus deeper / longer stack traces

v25.11.3

mcp: change flamegraph culling to root-based culling as in the UI
ui: add new display option filter_error_frames
- filters error frames by default, which allows the profiler to send error frames by default and thus improve CPU usage accuracy
all: add OIDC/local auth support along with service tokens

v25.11.2

profiler: increase the maximum number of unwound frames from 128 to 256
- this avoids unwind errors with long stack traces and thus improves CPU attribution
profiler: improve log message when falling back from BTF to binary analysis
profiler: print out system info if attaching to tracepoints fails
mcp: add topentities as tool, resource and resource template

v25.11.0

profiler: reworked PID reporting mechanism, significantly reducing CPU usage
- Especially high impact on systems that spawn many short-lived processes
profiler: prefer DWARF symbols over Go symbols during automatic symbol upload
- Improves symbol quality for Go executables with DWARF debug info when running the profiler with -dwarf argument
profiler: more efficient stack delta extraction for native executables
- Significantly reduces the peak memory usage of the profiler in the presence of large native executables
ui: add new display option collapse_jvm_threads
- aggregates GC threads into "Garbage Collector" (visible when grouping by "Thread Name" in the flamegraph)
- aggregates all GC frames into a single one called "Garbage Collector"
- also aggregates JVM JIT frames and threads in the same fashion
- turned on by default

v25.10.12

profiler: optimize performance

v25.10.11

ui: switch to relative mode if matches is used on a column that is incompatible with absolute mode

v25.10.10

profiler: fixed an issue that prevented the script name attribute from being set for GPU traces
ui: support matches for regex matching in advanced query mode

v25.10.9

web: show better errors if regex syntax is invalid in a matches CEL query
backend: add local login and RBAC support with CRUDs
profiler: fix version matching logic for zing offsets
profiler: add support for more zing versions
- we now additionally support these versions
  - JDK 1.8.X with zing 24.02.X
  - JDK 1.8.X with zing 24.08.X
  - JDK 1.8.X with zing 25.02.X
  - JDK 11.0.X with zing 23.02.X
  - JDK 11.0.X with zing 23.08.X
  - JDK 11.0.X with zing 24.02.X
  - JDK 11.0.X with zing 24.08.X
  - JDK 11.0.X with zing 25.02.X
  - JDK 17.0.X with zing 24.02.X
  - JDK 21.0.X with zing 24.01.X
  - JDK 21.0.X with zing 24.03.X
  - JDK 21.0.X with zing 24.04.X
  - JDK 21.0.X with zing 24.05.X
  - JDK 21.0.X with zing 24.07.X
  - JDK 21.0.X with zing 24.09.X
  - JDK 21.0.X with zing 24.10.X
  - JDK 21.0.X with zing 24.12.X
  - JDK 21.0.X with zing 25.01.X

v25.10.8

ui: fix possible crash during flamegraph rendering

v25.10.7

profiler: add support for OpenJDK 25
profiler: fix retry logic for fetching container name(s)
gpu: support CUkernels properly in addition to CUfuncs
ui: add detail pages for namespace, pod and deployment
profiler: fix aggregations for metrics (script names were wrongly merged)
