Install zymtrace profiler
The zymtrace profiler is a lightweight, OpenTelemetry-compliant agent that collects performance profiles from your applications and systems with minimal overhead (<1% CPU and ~256MB RAM). It can be deployed in several ways to suit your infrastructure. This guide covers installation using Kubernetes manifests, Helm charts, Docker containers, or a direct binary installation.
Please review the prerequisites before beginning, particularly if you are working in an airgapped environment.
Installation methods​
Choose the installation method that best suits your environment. Each method provides the same functionality with different deployment characteristics.
- Kubectl
- Helm
- Docker
- Binary
Install with Kubectl​
The profiler agent is deployed as a DaemonSet.
- Create a namespace:
kubectl create namespace zymtrace
- Deploy the profiler:
kubectl apply -n zymtrace -f https://helm.zystem.io/k8s-manifests/profiler/zymtrace-profiler.yaml
Important: If you need GPU profiling capabilities, we recommend using the Helm installation method instead, which provides more configuration options for enabling GPU profiling.
Collection Agent Configuration
By default, the collection agent is set to zymtrace-gateway.zymtrace.svc.cluster.local:80.
The collection agent must point to the zymtrace gateway service. If you're installing the profiling agent on a different cluster than the one hosting the backend services, you'll need to change this setting. In that case, we recommend downloading the manifest first, so you can edit the address before applying it:
curl -O https://helm.zystem.io/k8s-manifests/profiler/zymtrace-profiler.yaml
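Once the manifest is downloaded, the gateway address can be changed by hand or with a small helper. The following is a minimal sketch, assuming the default address appears as a literal string in the manifest; the function name `set_collection_agent` and the example address `gateway.example.internal:443` are hypothetical.

```shell
# Sketch: rewrite the default gateway address in a downloaded manifest.
# Assumption: the default address appears as a literal string in the file.
set_collection_agent() {
  manifest="$1"   # path to the downloaded manifest
  new_addr="$2"   # host:port of your gateway (must not contain '|' or '&')
  default_addr='zymtrace-gateway\.zymtrace\.svc\.cluster\.local:80'
  tmp="$(mktemp)" || return 1
  # Write to a temp file first so a failed sed never truncates the manifest.
  sed "s|${default_addr}|${new_addr}|g" "$manifest" > "$tmp" \
    && cat "$tmp" > "$manifest"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, run `set_collection_agent zymtrace-profiler.yaml gateway.example.internal:443`, review the file, and then apply it with `kubectl apply -n zymtrace -f zymtrace-profiler.yaml`.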
Install with Helm​
# Add the zystem repository
helm repo add zymtrace https://helm.zystem.io
# List available charts and versions
helm search repo zymtrace --versions
# Install
helm install profiler zymtrace/profiler \
--create-namespace \
--namespace zymtrace \
--set "profiler.args[0]=--collection-agent=zymtrace-gateway.zymtrace.svc.cluster.local:80" \
--set "profiler.args[1]=--disable-tls" \
--set "profiler.args[2]=--project=colossus" \
--set "profiler.args[3]=--tags=prod;us-east" \
--set "profiler.env.HTTPS_PROXY=http://username:password@proxy:port"
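The install command above numbers each profiler flag by hand (`profiler.args[0]`, `profiler.args[1]`, and so on). As a minimal sketch, the hypothetical helper below generates those indexed `--set` pairs from a plain list of flags, which avoids renumbering when a flag is added or removed. It prints one argument per line and does not call Helm itself.

```shell
# Sketch: print indexed --set arguments for a list of profiler flags.
# The helper name is hypothetical; it only prints and never runs helm.
profiler_set_args() {
  i=0
  for flag in "$@"; do
    # Each flag becomes the pair: --set profiler.args[<index>]=<flag>
    printf '%s\n' "--set" "profiler.args[$i]=$flag"
    i=$((i + 1))
  done
}
```

Calling `profiler_set_args --disable-tls --project=colossus` prints the two `--set` pairs so you can review them before pasting them into the `helm install` command. Flags that contain commas would need escaping, because Helm treats commas in `--set` values as separators.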
Enabling GPU Profiling​
GPU profiling is only available on AMD64/x86_64 architecture.
To enable GPU profiling capabilities with Helm, set profiler.cudaProfiler.enabled to true. Optionally, include the --enable-gpu-metrics flag to also collect GPU metrics, as shown below:
# Install with GPU profiling and metrics enabled
helm install profiler zymtrace/profiler \
--create-namespace \
--namespace zymtrace \
--set profiler.cudaProfiler.enabled=true \
--set "profiler.args[0]=--collection-agent=zymtrace-gateway.zymtrace.svc.cluster.local:80" \
--set "profiler.args[1]=--disable-tls" \
--set "profiler.args[2]=--enable-gpu-metrics" # Optional: remove to profile CUDA without metrics
This will automatically:
- Deploy the necessary GPU profiling libraries
- Configure the volume mounts for sharing libraries between containers
- Extract the profiling libraries to the shared host path
- Make GPU profiling available to all containers that mount the shared volume
- If defined, enable GPU metrics collection (power usage, memory utilization, temperature, and performance metrics)
The --enable-gpu-metrics flag is recommended for comprehensive GPU monitoring, but you can remove it if you only want CUDA profiling without metrics collection. You can also collect only GPU metrics without profiling.
Using custom-values.yaml​
You can create a custom values file with your configurations:
profiler:
  args:
    - "--collection-agent=zymtrace-gateway.zymtrace.svc.cluster.local:80" # Point to your gateway service
    - "--disable-tls"
    - "--enable-gpu-metrics"
    - "--project=my-project-123"
    - "--tags=prod;ha100;fp16"
  # Enable GPU profiling
  cudaProfiler:
    enabled: true
    hostMountPath: "/var/lib/zymtrace/profiler" # Default path
  env:
    HTTPS_PROXY: "http://user:<password>@<proxy-host>:8080"
Then install using:
helm install profiler zymtrace/profiler \
--create-namespace \
--namespace zymtrace \
-f custom-values.yaml
Next Step: Hook up your CUDA application​
After enabling the GPU profiling module in the profiler, connect it to your CUDA application by referring to the GPU Profiler documentation.
Install with Docker​
docker run --cgroupns=host --pid=host --privileged --net=host \
-v /etc/machine-id:/etc/machine-id:ro \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /sys/kernel/debug:/sys/kernel/debug:ro \
--rm -d --name zymtrace-profiler ghcr.io/zystem-io/zymtrace-pub-profiler:25.9.4 \
--disable-tls --collection-agent <host>:8080
Be sure to update the collection agent value to point to your gateway service.
Enabling GPU Profiling​
GPU profiling is only available on AMD64/x86_64 architecture.
To enable GPU profiling capabilities with Docker, mount an additional volume.
Optionally, include the --enable-gpu-metrics flag to also collect GPU metrics, as shown below.
docker run --cgroupns=host --pid=host --privileged --net=host \
-v /etc/machine-id:/etc/machine-id:ro \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /sys/kernel/debug:/sys/kernel/debug:ro \
-v /var/lib/zymtrace/profiler:/opt/zymtrace-cuda-profiler \
--rm -d --name zymtrace-profiler ghcr.io/zystem-io/zymtrace-pub-profiler:25.9.4 \
--disable-tls \
--collection-agent <host>:8080 \
--enable-gpu-metrics # Optional: remove to profile CUDA without metrics
This will automatically:
- Extract the necessary CUDA profiler libraries to the host path (/var/lib/zymtrace/profiler)
- Make GPU profiling available to your GPU workloads
- Enable GPU metrics collection when the flag is included
The --enable-gpu-metrics flag is recommended for comprehensive GPU monitoring, but you can remove it if you only want CUDA profiling without metrics collection. You can also collect only GPU metrics without profiling.
To verify the profiler libraries were extracted correctly:
# Check the contents of the shared volume directory
ls -la /var/lib/zymtrace/profiler
# You should see
# libzymtracecudaprofiler.so
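The manual check above can also be scripted, for example as a pre-flight step before starting a GPU workload. This is a minimal sketch; the function name `check_cuda_profiler_lib` is hypothetical, and it only tests that the library file named above exists in the shared directory.

```shell
# Sketch: verify that the CUDA profiler library was extracted.
# Defaults to the host path used in the docker run command above.
check_cuda_profiler_lib() {
  dir="${1:-/var/lib/zymtrace/profiler}"
  lib="$dir/libzymtracecudaprofiler.so"
  if [ -f "$lib" ]; then
    echo "found: $lib"
  else
    echo "missing: $lib" >&2
    return 1
  fi
}
```

Run `check_cuda_profiler_lib` with no arguments to check the default path, or pass a directory if you mounted a different host path.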
Next Step: Hook up your CUDA application​
After enabling the GPU profiling module in the profiler, connect it to your CUDA application by referring to the GPU Profiler documentation.
Install with Binary​
- x86_64
- ARM64
x86_64 Installation​
# Download the binaries
curl -LO https://dl.zystem.io/zymtrace/25.9.4/amd64/zymtrace-profiler.tar.gz
# Extract them
sudo tar -xzvf zymtrace-profiler.tar.gz -C / --no-same-owner
# Run the agent with optional GPU metrics
sudo /opt/zymtrace/profiler/zymtrace-profiler --collection-agent localhost:8080 --disable-tls --enable-gpu-metrics # Optional: remove to disable GPU metrics
Enabling GPU Profiling​
Unlike Helm and Docker installations, there are no explicit steps to enable GPU profiling with the binary installation. You can directly start profiling your CUDA application:
env RUST_LOG="zymtracecudaprofiler=info" \
CUDA_INJECTION64_PATH="/opt/zymtrace/profiler/libzymtracecudaprofiler.so" \
python -u matmul.py
The --enable-gpu-metrics flag is recommended for comprehensive GPU monitoring, but you can remove it if you only want CUDA profiling without metrics collection. You can also collect only GPU metrics without profiling.
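If you launch several CUDA applications this way, the two environment variables from the command above can be wrapped in a small helper. The sketch below is one way to do it; the function name `with_cuda_profiler` and the override variable `ZYMTRACE_CUDA_LIB` are hypothetical and not part of zymtrace.

```shell
# Sketch: run any command with the CUDA profiler environment set.
# ZYMTRACE_CUDA_LIB is a hypothetical override for the library path.
with_cuda_profiler() {
  lib="${ZYMTRACE_CUDA_LIB:-/opt/zymtrace/profiler/libzymtracecudaprofiler.so}"
  # Same variables as the manual command: log level and injection library.
  env RUST_LOG="zymtracecudaprofiler=info" \
      CUDA_INJECTION64_PATH="$lib" \
      "$@"
}
```

For example, `with_cuda_profiler python -u matmul.py` is equivalent to the manual command shown earlier in this section.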
ARM64 Installation​
You can only profile CPU workloads and collect GPU metrics on ARM64. GPU profiling is only available on AMD64/x86_64 architecture.
# Download the binaries
curl -LO https://dl.zystem.io/zymtrace/25.9.4/arm64/zymtrace-profiler.tar.gz
# Extract them
sudo tar -xzvf zymtrace-profiler.tar.gz -C / --no-same-owner
# Run the agent with optional GPU metrics
sudo /opt/zymtrace/profiler/zymtrace-profiler --collection-agent localhost:8080 --disable-tls --enable-gpu-metrics # Optional: remove to disable GPU metrics
The --enable-gpu-metrics flag enables GPU monitoring (power, memory, temperature, performance) on systems with GPUs. Remove it if you don't need GPU metrics collection.
Be sure to update the --collection-agent value to point to your zymtrace gateway service.
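The x86_64 and ARM64 download URLs above differ only in the architecture segment, so a provisioning script can pick the right one from `uname -m`. The following is a minimal sketch; the function name `zymtrace_download_url` is hypothetical, and it assumes that other versions follow the same URL pattern as 25.9.4, which this guide does not confirm.

```shell
# Sketch: build the tarball URL for the current machine architecture.
# Assumption: other versions use the same URL layout as 25.9.4.
zymtrace_download_url() {
  version="${1:-25.9.4}"
  machine="${2:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  arch="amd64" ;;
    aarch64|arm64) arch="arm64" ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://dl.zystem.io/zymtrace/${version}/${arch}/zymtrace-profiler.tar.gz"
}
```

A script could then download the archive with `curl -LO "$(zymtrace_download_url)"` and continue with the extraction step shown above.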
Management​
These commands help you monitor and maintain your zymtrace profiler installation. Use them to check the agent's status and logs.
- Kubectl
- Helm
- Docker
- Systemd
Kubectl Management​
# Check agent status
kubectl get pods -n zymtrace -l app=zymtrace,component=profiler
# View agent logs
kubectl logs -f -n zymtrace -l app=zymtrace,component=profiler
Helm Management​
# Check profiler status (not agent)
helm status profiler -n zymtrace
# Upgrade charts, force a new image pull
helm upgrade profiler zymtrace/profiler -n zymtrace --set global.imagePullPolicy=Always --reuse-values
# Get pod status
kubectl get pods -n zymtrace -l app=zymtrace,component=profiler
# Upgrade profiler (not agent)
helm upgrade profiler zymtrace/profiler -n zymtrace
# View profiler values (not agent)
helm get values profiler -n zymtrace
# Uninstall profiler (not agent)
helm uninstall profiler -n zymtrace
Docker Management​
# View running agents
docker ps --filter name=zymtrace-profiler
# View agent logs
docker logs <container-id>
# Stop agent
docker stop <container-id>
# Remove agent container
docker rm <container-id>
Setting up systemd service​
To manage the profiler as a systemd service:
- Create the zymtrace directory:
sudo mkdir -p /opt/zymtrace
- Move the downloaded agent binary to the installation directory:
sudo mv zymtrace-profiler /opt/zymtrace/
- Make the binary executable:
sudo chmod +x /opt/zymtrace/zymtrace-profiler
- Create a systemd service file:
sudo vi /etc/systemd/system/zymtrace.service
- Copy and paste the following configuration into the file:
[Unit]
Description=zymtrace profiler service
After=network.target
[Service]
Type=simple
ExecStart=/opt/zymtrace/zymtrace-profiler --collection-agent localhost:8080 --disable-tls
Restart=always
RestartSec=10
WorkingDirectory=/opt/zymtrace
[Install]
WantedBy=multi-user.target
Enabling GPU Metrics Collection​
To enable GPU metrics collection, add the --enable-gpu-metrics flag to the ExecStart line:
[Unit]
Description=zymtrace profiler service
After=network.target
[Service]
Type=simple
ExecStart=/opt/zymtrace/zymtrace-profiler --collection-agent localhost:8080 --disable-tls --enable-gpu-metrics
Restart=always
RestartSec=10
WorkingDirectory=/opt/zymtrace
[Install]
WantedBy=multi-user.target
This enables GPU metrics collection including power usage, memory utilization, temperature, and performance metrics. For more details, see the GPU Metrics documentation.
Be sure to update the --collection-agent value in the systemd service file to point to your zymtrace gateway service.
- Enable the service:
# Reload daemon
sudo systemctl daemon-reload
# Start agent
sudo systemctl start zymtrace
# Enable agent start on boot
sudo systemctl enable zymtrace
# Check agent status
sudo systemctl status zymtrace
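When the same service has to be set up on several hosts, the unit file from the steps above can be generated instead of pasted by hand. This is a minimal sketch that reproduces the unit shown earlier; the function name `write_zymtrace_unit` and its argument order are hypothetical.

```shell
# Sketch: write the zymtrace unit file with a chosen gateway address.
# Usage: write_zymtrace_unit <host:port> [output-path] [extra-flag]
write_zymtrace_unit() {
  agent="$1"
  out="${2:-/etc/systemd/system/zymtrace.service}"
  extra="${3:-}"   # for example: --enable-gpu-metrics
  exec_start="/opt/zymtrace/zymtrace-profiler --collection-agent ${agent} --disable-tls"
  if [ -n "$extra" ]; then
    exec_start="${exec_start} ${extra}"
  fi
  cat > "$out" <<EOF
[Unit]
Description=zymtrace profiler service
After=network.target

[Service]
Type=simple
ExecStart=${exec_start}
Restart=always
RestartSec=10
WorkingDirectory=/opt/zymtrace

[Install]
WantedBy=multi-user.target
EOF
}
```

Writing to /etc/systemd/system requires root, so one option is to write the file to a temporary path, review it, and copy it into place with sudo before running `sudo systemctl daemon-reload`.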
Management commands​
Once the systemd service is set up, use these commands to manage the profiler:
# Check systemd service status
sudo systemctl status zymtrace
# View service logs
sudo journalctl -u zymtrace -f
# Restart service
sudo systemctl restart zymtrace
# Stop service
sudo systemctl stop zymtrace