zymtrace Changelog

v26.1.0

  • assistant: add support for PDF uploads
  • assistant: show consumed tokens at the bottom of each answer
  • assistant: add support for image uploads
  • profiler: fix edge cases in metrics collection
    • processes that were idle for an entire collection cycle could sometimes report incorrect metrics
    • systems launching many processes could report incorrect metrics
    • both issues are now fixed
  • profiler: fix GPU metrics collection on systems with more than one GPU
  • profiler: automatically detect and extract metrics from supported applications
    • currently vLLM is supported
    • can be enabled or disabled via the -enable-vllm-metrics flag
  • web: add metric activity aggregation endpoint
  • cudaprofiler: fix several bugs
    • improve handling of CUDA graphs
    • fix stack trace correlation when PC offsets are disabled for GPU profiling
    • fix a segfault (inside CUDA's own code) with old, buggy CUDA driver versions
  • cudaprofiler: use (and ship) CUPTI 13 if the runtime CUDA version supports it
  • ingest: fix unbalanced metrics sharding
  • helm: adjust service config to route by request, not connection
    • ensures that MCP requests are always routed to the same web replica
  • ui: integrate API explorer into the web app
  • ui: sort "[other]" entry to the bottom in metric charts/tooltips

v25.12.6

  • profiler: add module names to Python <module> frames
  • profiler: rework stack unwinding mechanism
    • can now unwind much longer stacks than before
    • previously, some Python / Java stack traces were cut off in the middle
  • profiler: fix a rare crash during shutdown when GPU metrics collection is enabled
  • cudaprofiler: significant performance improvements
  • cudaprofiler: fix possible segfault on certain CUDA versions
  • cudaprofiler: print version on load if logging is enabled to ease troubleshooting
  • web: fix off-by-one error in the /events endpoint in the public API (beta)
  • web: add pagination support to the top field values endpoint
  • web: fix a bug that could prevent simplified function names from appearing on the top functions page
  • web: reduce histogram query memory consumption
  • backend: fix decoding of older license keys with zing-jdk feature
  • assistant: improve user-facing error messages
  • assistant: simplify custom LLM configuration
  • assistant: call tree + improved default prompts
  • ui: hide Python runtime frames by default
  • ui: add fullscreen mode for diff top functions view
  • ui: add search support for flamegraph on top functions view
  • ui: initial AI Assistant for CPU flamegraph
  • ui: add buttons to the Support dialog to export CPU / GPU events and profiler diagnostic data
    • these can be sent to the zymtrace team for support
  • ui: add documentation links to agent installation steps
  • ui: add service and kernel version to host details page
  • ui: add Slack support option to Support dialog

v25.12.5

  • migrate: fix migration checksum checks for manually managed distributed ClickHouse deployments

v25.12.4

  • profiler: add ARM64 support for cudaprofiler
  • profiler: don't export PC samples by default
    • they are still captured and used to disassemble instructions
    • however, because they are high-cardinality, exporting the addresses is now disabled by default to reduce the resource costs of ingest and ClickHouse
    • export can be re-enabled by running the workload with the environment variable ZYMTRACE_CUDAPROFILER__ENABLE_PC_OFFSETS=true
  • profiler: fix unwinding for PGO builds of interpreters
    • the previous fix had a small bug, which is now fixed
  • ui: add differential functions view for GPU functions
  • ui: fix a bug that could cause wrong GPUs to show up on container/pod details pages
  • ui: fix an issue that could cause the wrong filter to be applied when using the actions inside the chart tooltip
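
A minimal sketch of re-enabling PC offset export, assuming a bash shell. The environment variable name comes from the entry above; the `sh -c` child process is only a stand-in for your actual CUDA workload (for example a hypothetical `python train.py`), which inherits the exported variable the same way.

```shell
# Re-enable export of PC offsets. Per the entry above, this is set in the
# environment of the CUDA workload, not of the profiler.
export ZYMTRACE_CUDAPROFILER__ENABLE_PC_OFFSETS=true

# Stand-in for the real workload: any child process started from this
# shell inherits the exported variable.
sh -c 'echo "ZYMTRACE_CUDAPROFILER__ENABLE_PC_OFFSETS=$ZYMTRACE_CUDAPROFILER__ENABLE_PC_OFFSETS"'
```

Prefixing a single command instead (`ZYMTRACE_CUDAPROFILER__ENABLE_PC_OFFSETS=true <command>`) keeps the higher-cardinality export limited to that one workload.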

v25.12.3

  • ui: fix an issue that could prevent navigation from the "Top Entities" section on the Efficiency IQ page
  • profiler: remove special coloring of CUDA launch frames

v25.12.2

  • profiler: fix a bug that could cause CUDA launch function frames to appear on the CPU flamegraph
  • ui: adjust truncation of legend entries in charts to use end truncation instead of middle truncation
  • ui: apply sorting to tooltip entries in metric charts
  • ui: add select all and deselect all buttons to language filter dropdown
  • gateway: update Envoy to v1.36.3
    • this Envoy version supports automatically raising ulimits to their hard limit, which is important in Kubernetes clusters using containerd ≥ 2.0, where the default limits are now very conservative
  • storage: stop using the EXCHANGE TABLES ClickHouse DDL statement
    • allows deploying ClickHouse on filesystems without support for atomic file renames (renameat2)

v25.12.1

  • ui: add light mode support
  • ui: show description of SASS instruction on hover for GPU profiles
  • ui: fix an issue that could prevent chart legend entries from being truncated correctly in Firefox
  • ui: enable transport compression for the flamegraph WASM blob
  • backend: replace gRPC health checks with HTTP health checks
  • symdb: retry auto upload for broken executables after a day
    • the interval can be configured via the SYMDB__BROKEN_EXECUTABLE_RETRY_AFTER environment variable (in seconds)
  • profiler: make auto upload size limits configurable
    • these limits apply only when -dwarf is passed
    • they can be configured via ZYMTRACE_MAX_SYMBFILE_SIZE (for the symbol file) and ZYMTRACE_MAX_INPUT_FILE_SIZE (for the binary that symbols are extracted from)
    • both values are in bytes
  • profiler: add support for NVIDIA MIG devices (for metrics)
  • profiler: fix a bug that could cause the available compute graph (and a machine's idle time) to drop significantly if GPU profiling was active and the GPU was heavily used
  • cudaprofiler: fix rare segfaults in CUDA caused by an incompatible version of CUPTI already being loaded in some processes
    • PyTorch-based processes would sometimes load an incompatible libcupti.so, which could lead to segfaults
  • cudaprofiler: enable sampling of kernel launches by default
  • cudaprofiler: expose stack trace sampling config via env vars
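
The settings above take plain integers (seconds for the retry interval, bytes for the size limits), so computing them with shell arithmetic can help avoid unit mistakes. A minimal sketch, assuming a bash shell; the concrete values are arbitrary examples, not recommendations or defaults. In a real deployment SYMDB__BROKEN_EXECUTABLE_RETRY_AFTER belongs to the symdb backend and the two size limits to the profiler (which must run with -dwarf for them to apply); they are grouped here only for brevity.

```shell
# symdb: retry auto upload of broken executables every 12 hours
# instead of once a day. The value is in seconds.
export SYMDB__BROKEN_EXECUTABLE_RETRY_AFTER=$((12 * 60 * 60))

# profiler: auto upload size limits, in bytes (only used with -dwarf).
# Example values only.
export ZYMTRACE_MAX_SYMBFILE_SIZE=$((512 * 1024 * 1024))        # 512 MiB
export ZYMTRACE_MAX_INPUT_FILE_SIZE=$((2 * 1024 * 1024 * 1024)) # 2 GiB

# Print the resulting raw integers.
echo "$SYMDB__BROKEN_EXECUTABLE_RETRY_AFTER $ZYMTRACE_MAX_SYMBFILE_SIZE $ZYMTRACE_MAX_INPUT_FILE_SIZE"
# prints "43200 536870912 2147483648"
```

For comparison, the one-day retry interval mentioned above corresponds to 24 × 60 × 60 = 86400 seconds.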

v25.11.6

  • mcp: improve time ranges and limit retries if no data is found
  • profiler: fix unwinding for LTO/PGO builds of interpreters
    • .cold parts of a split interpreter loop are now supported
  • profiler: add source location mapping for GPU profiling with PC sampling
  • profiler: add arguments to control how NVML is located
    • -nvml-path allows the NVML path to be specified explicitly
    • -nvml-auto-scan allows opting into an automatic scan for the NVML library
  • profiler: fix a rare crash during symbol extraction
  • cudaprofiler: flush PC samples more frequently, and track when the hardware buffer is full
  • cudaprofiler: support disassembling more SASS instructions
  • web/profiler: allow top GPU stall reasons to be viewed in top functions list
  • ui: show description of stall reason in tooltip for GPU flamegraph
    • makes the stall reasons more descriptive, and offers possible solutions for reducing their impact
  • ui: improve click-to-copy handling for text elements
  • ui: add script/container name support for GPU consumer metrics
  • ui: persist aggregation and language settings in both URL and local storage
    • allows sharing links without overwriting local settings; conflicting settings can be accepted, discarded, or kept diverged

v25.11.5

  • profiler: fix a bug that prevented GPU implant instrumentation from working when the profiler runs in a Docker container

v25.11.4

  • profiler: support zing 25.01 with JDK 11 and 17 (21 was already supported) (https://github.com/zystem-io/zymtrace/pull/1441)
  • symblib: fix Rust function name demangling, which failed in some cases
  • mcp: suggest using local time instead of UTC
  • profiler: reduce allocations in parseFDE()
    • brings parseFDE down from 60% of all allocs to 0.3%
  • profiler: place uprobes on GPU implant dynamically (https://github.com/zystem-io/zymtrace/pull/1453)
    • implant is now detected and instrumented regardless of its path
    • implant no longer needs to be mapped into the Kubernetes container (for the profiler)
  • ui: fix rendering bug in flamegraph where a child node could be wider than its parent
  • cudaprofiler: several improvements (https://github.com/zystem-io/zymtrace/pull/1448)
    • fix bug with force flushing incomplete kernels
    • improve flushing of CUPTI activity records
    • don't delay process synchronizations during startup for CUDA processes, leading to better CPU stack traces for the first few frames
    • support building with CUDA 13
    • allow bypassing GPU presence checks in the profiler
  • mcp: enable collapse_go_system_frames, collapse_jvm_threads, filter_error_events, filter_unreported to reduce the number of tokens in the flamegraph response
  • ui: add new display option collapse_go_system_frames
    • enabled by default; shown inverted in the UI as "Show Go system frames"
    • aggregates GC frames into "Garbage Collector"
    • aggregates scheduler frames into "Scheduler"
  • ui: fix frame filter for Java standard library functions
    • it was previously filtering out functions too aggressively
  • profiler: improve zing symbolization
    • adds support for the GC in zing, leading to more debug information being resolved, and thus deeper / longer stack traces

v25.11.3

  • mcp: change flamegraph culling to root-based culling as in the UI
  • ui: add new display option filter_error_frames
    • filters error frames by default, which allows the profiler to send error frames by default and thus improves CPU usage accuracy
  • all: add OIDC/local auth support, along with service tokens

v25.11.2

  • profiler: increase the maximum number of unwound frames from 128 to 256
    • this avoids unwind errors with long stack traces and thus improves CPU attribution
  • profiler: improve log message when falling back from BTF to binary analysis
  • profiler: print out system info if attaching to tracepoints fails
  • mcp: add topentities as a tool, resource, and resource template

v25.11.0

  • profiler: rework PID reporting mechanism, significantly reducing CPU usage
    • Especially impactful on systems that spawn many short-lived processes
  • profiler: prefer DWARF symbols over Go symbols during automatic symbol upload
    • Improves symbol quality for Go executables with DWARF debug info when running the profiler with -dwarf argument
  • profiler: more efficient stack delta extraction for native executables
    • Significantly reduces the peak memory usage of the profiler in the presence of large native executables
  • ui: add new display option collapse_jvm_threads
    • aggregates GC threads into "Garbage Collector" (visible when grouping by "Thread Name" in the flamegraph)
    • aggregates all GC frames into a single one called "Garbage Collector"
    • also aggregates JVM JIT frames and threads in the same fashion
    • turned on by default

v25.10.12

  • profiler: optimize performance

v25.10.11

  • ui: switch to relative mode if matches is used on a column that is incompatible with absolute mode

v25.10.10

  • profiler: fix an issue that prevented the script name attribute from being set for GPU traces
  • ui: support matches for regex matching in advanced query mode

v25.10.9

  • web: improve error messages when the regex syntax in a matches CEL query is invalid
  • backend: add local login and RBAC support with CRUDs
  • profiler: fix version matching logic for zing offsets
  • profiler: add support for more zing versions
    • we now additionally support these versions
      • JDK 1.8.X with zing 24.02.X
      • JDK 1.8.X with zing 24.08.X
      • JDK 1.8.X with zing 25.02.X
      • JDK 11.0.X with zing 23.02.X
      • JDK 11.0.X with zing 23.08.X
      • JDK 11.0.X with zing 24.02.X
      • JDK 11.0.X with zing 24.08.X
      • JDK 11.0.X with zing 25.02.X
      • JDK 17.0.X with zing 24.02.X
      • JDK 21.0.X with zing 24.01.X
      • JDK 21.0.X with zing 24.03.X
      • JDK 21.0.X with zing 24.04.X
      • JDK 21.0.X with zing 24.05.X
      • JDK 21.0.X with zing 24.07.X
      • JDK 21.0.X with zing 24.09.X
      • JDK 21.0.X with zing 24.10.X
      • JDK 21.0.X with zing 24.12.X
      • JDK 21.0.X with zing 25.01.X

v25.10.8

  • ui: fix possible crash during flamegraph rendering

v25.10.7

  • profiler: add support for OpenJDK 25
  • profiler: fix retry logic for fetching container name(s)
  • gpu: support CUkernels properly in addition to CUfuncs
  • ui: add detail pages for namespace, pod and deployment
  • profiler: fix aggregations for metrics (script names were wrongly merged)