# Multi-GPU Profiling
zymtrace is designed from the ground up for multi-GPU, multi-host, multi-workload profiling. There is nothing special you need to do: simply deploy the profiler on every node you want to observe and all data flows into the same backend, ready to be explored as a unified view or sliced by any dimension you care about.
## How It Works
The zymtrace profiler runs as a lightweight, zero-friction agent on each host. When deployed as a Kubernetes DaemonSet it is placed on every node in the cluster automatically. For bare-metal or VM deployments you install one profiler binary per host. Either way, every agent sends its data to the same backend ingest service, which stores and indexes it in ClickHouse together with all the OTEL resource attributes that identify where the data came from (host name, pod name, namespace, container, etc.).
There are no extra flags to enable multi-GPU support. The same command you use for a single node works across an entire fleet.
Because every profiler agent reports to the same backend, adding more GPU nodes never requires changes to your backend setup. Just deploy the agent on the new host and data appears automatically.
## Filtering and Grouping Data
Once data is flowing in from multiple nodes, use the filter system to slice it any way you need. The filter bar is available on every page (Efficiency IQ, Top Functions, Top Entities, Flamegraphs, and Diff) and anything you set persists as you navigate between views.
### Key Attributes for Multi-GPU Workflows
| Attribute | Example Value | Use Case |
|---|---|---|
| `host.name` | `gpu-node-42` | Isolate a single physical or virtual machine |
| `cluster.name` | `training-cluster-prod` | Scope to an entire cluster |
| `gpu.name` | `NVIDIA H100 80GB HBM3` | Filter by GPU model across all nodes |
| `k8s.namespace.name` | `ml-production` | Scope to a whole Kubernetes namespace |
| `k8s.deployment.name` | `vllm-deployment` | Compare all replicas of a deployment |
| `k8s.pod.name` | `vllm-server-0` | Focus on one pod across any node it runs on |
| `container.name` | `inference-worker` | Drill into a specific container |
| `user.tag` | `gpu_type:h100` | Filter by your own custom tags |
For the full list of supported filter attributes see Filtering Data.
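Conceptually, each attribute/value pair narrows the result set to samples whose resource attributes match. The sketch below is illustrative only: the sample records, `weight` field, and `apply_filter` helper are hypothetical stand-ins, not zymtrace APIs.

```python
# Hypothetical sketch of attribute-based filtering. Each profiling sample
# carries OTEL-style resource attributes; a filter is a set of
# attribute/value pairs that every matching sample must satisfy.
samples = [
    {"host.name": "gpu-node-42", "k8s.namespace.name": "ml-production", "weight": 120},
    {"host.name": "gpu-node-07", "k8s.namespace.name": "ml-production", "weight": 80},
    {"host.name": "gpu-node-42", "k8s.namespace.name": "staging", "weight": 30},
]

def apply_filter(samples, criteria):
    """Keep only samples whose attributes match every criterion."""
    return [s for s in samples if all(s.get(k) == v for k, v in criteria.items())]

# Slice the same data along different dimensions, as the filter bar does.
prod_samples = apply_filter(samples, {"k8s.namespace.name": "ml-production"})
node_samples = apply_filter(samples, {"host.name": "gpu-node-42"})
```

Because every sample carries all of these attributes, the same data can be sliced by namespace in one view and by host in the next without re-collecting anything.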
Fleet-Wide Analysis with Top Entities​
The Top Entities view is the natural starting point for fleet-wide GPU analysis. Switch to the Hosts grouping to see every GPU node ranked by CPU or GPU resource consumption in a single list. From there you can:
- Spot outliers: a node consuming far more or far less than its peers is worth investigating.
- Drill into a specific node: click any host to see its GPU metrics (utilization, memory, power, NVLink throughput) and navigate directly to flamegraphs or top functions for that node.
- Group by Namespace, Deployment, or Pod: understand how GPU resources are distributed across teams, services, or replicas.
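The "rank hosts, then spot outliers" step above boils down to a simple aggregation. Here is a minimal sketch of that idea; the host names, GPU-seconds figures, and 50%-from-mean threshold are made-up illustration values, not zymtrace internals.

```python
# Hypothetical per-host aggregate GPU time over some query window.
gpu_seconds = {
    "gpu-node-01": 3400.0,
    "gpu-node-02": 3550.0,
    "gpu-node-03": 3480.0,
    "gpu-node-04": 900.0,  # suspiciously idle compared to its peers
}

# Rank hosts by consumption, as the Hosts grouping does.
ranked = sorted(gpu_seconds.items(), key=lambda kv: kv[1], reverse=True)

# Flag hosts far from the fleet mean as worth investigating.
mean = sum(gpu_seconds.values()) / len(gpu_seconds)
outliers = [h for h, v in gpu_seconds.items() if abs(v - mean) > 0.5 * mean]
```

An underutilized node like `gpu-node-04` surfaces immediately at the bottom of the ranking, which is usually the cue to drill into its flamegraphs.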
Comparing GPU Nodes with Diff​
If you want to compare the performance of two GPU nodes, for example an H100 versus an A100 running the same workload, use the Diff view. Apply a filter for each node to its respective time window to get a function-level breakdown of what differs between the two.
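As a rough mental model, a function-level diff reduces to comparing per-function time shares between two profiles. The sketch below is illustrative only: the function names, percentages, and `diff` helper are hypothetical, not zymtrace output or API.

```python
# Hypothetical per-function fraction of total profile time on each node.
h100 = {"attention_kernel": 0.42, "layer_norm": 0.08, "memcpy_h2d": 0.05}
a100 = {"attention_kernel": 0.55, "layer_norm": 0.09, "memcpy_h2d": 0.12}

def diff(base, other):
    """Per-function delta (other - base), largest increases first."""
    funcs = set(base) | set(other)
    deltas = {f: other.get(f, 0.0) - base.get(f, 0.0) for f in funcs}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Functions at the top of this list take relatively more time on the A100.
regressions = diff(h100, a100)
```

Reading the result top-down shows where the slower node spends disproportionately more time, which is exactly the question the Diff view answers visually.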