On-premise symbolization
Overview
The zymtrace global symbolization service mirrors and processes debuginfo from Linux distro repos and serves the processed symbols via our CDN at https://symbols.zymtrace.com. The backend services automatically look up native symbols through this CDN, enabling efficient symbolization with zero configuration required from users.
For customers who do not have direct access to the internet, you can either:
- Allow firewall access to https://symbols.zymtrace.com (the easiest approach), or
- Clone the entire bucket locally into S3-compatible storage.
Since zymtrace already ships with MinIO, you can use it as your local S3-compatible blob storage. The majority of our customers operate in air-gapped environments, so we provide several methods to clone the symbol bucket.
Configuration
To use on-premise symbolization, update the Helm configuration:
custom-values.yaml
globalSymbolization:
  enabled: false
  config:
    bucketName: ""
    accessKey: ""
    secretKey: ""
    region: ""
    endpoint: ""
region applies only to AWS S3; when it is set, we construct the endpoint URL automatically, so endpoint can be left empty.
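As an example, a deployment that points the backend at the bundled MinIO instance might look like the following sketch. The bucket name matches the recommended destination below; the credentials and endpoint are illustrative and must be replaced with the values from your own deployment:
custom-values.yaml
globalSymbolization:
  enabled: false
  config:
    bucketName: "zymtrace-symbols"   # the bucket you clone below
    accessKey: "minio-access-key"    # illustrative; use your MinIO credentials
    secretKey: "minio-secret-key"    # illustrative; use your MinIO credentials
    region: ""                       # only needed for AWS S3
    endpoint: "http://minio:9000"    # illustrative in-cluster MinIO endpoint
But first, you need to clone the bucket.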
Clone the bucket
This guide provides commands and procedures for cloning the zystem-symbolblobs bucket from Google Cloud Storage. The bucket contains the symbol files zymtrace uses to provide function names and line numbers in application traces.
Source bucket: https://zystem-symbolblobs.storage.googleapis.com
Recommended destination bucket: zymtrace-symbols
The bucket contains ~1 TB of data and is growing. Ensure your destination storage has sufficient capacity (2-3 TB) and expect the transfer to take several hours, depending on your network bandwidth.
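Before picking a method, you can verify that the public source bucket is reachable from the machine that will run the transfer. This sketch uses the bucket's XML listing endpoint and assumes anonymous listing is permitted:
# Request a single key to confirm anonymous read access
curl -s "https://zystem-symbolblobs.storage.googleapis.com/?max-keys=1"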
Choose one of the following methods:
- gsutil
- rclone
- AWS DataSync
- Google Cloud Storage Transfer Service
Requirements
- Google Cloud SDK installed
- Sufficient storage space
Overview
gsutil is Google's command-line tool for working with Cloud Storage. This method is particularly efficient for GCS-to-GCS transfers.
Installation
# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
# Initialize and authenticate
gcloud init
Command
# Create destination bucket if needed
gsutil mb -c standard -l us-central1 gs://zymtrace-symbols
# Clone with maximum performance
gsutil -o "GSUtil:parallel_process_count=2" \
-o "GSUtil:parallel_thread_count=5" \
-m rsync -r gs://zystem-symbolblobs gs://zymtrace-symbols
Performance Considerations
- The -m flag enables multi-threading, critical for TB-scale transfers
- parallel_process_count spawns multiple processes for parallel operations
- parallel_thread_count uses multiple threads per process
- Adjust these values based on your system's CPU cores and available memory
- For memory-constrained systems, reduce these values to prevent out-of-memory errors (see the sketch below)
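As a sketch, a memory-constrained host might start with more conservative settings than the command above; the exact values are illustrative and should be tuned to your hardware:
# Conservative parallelism for a memory-constrained host
gsutil -o "GSUtil:parallel_process_count=1" \
       -o "GSUtil:parallel_thread_count=2" \
       -m rsync -r gs://zystem-symbolblobs gs://zymtrace-symbols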
Monitoring Progress
During the transfer, gsutil will display progress statistics:
Copying gs://zystem-symbolblobs/file1.pb [1/1000 files][ 10.5 MiB/ 10.5 MiB]
...
You can also use a second terminal to check the destination bucket size:
gsutil du -sh gs://zymtrace-symbols
Error Handling
If the transfer is interrupted, simply run the same command again. The rsync operation resumes where it left off, copying only files that weren't successfully transferred.
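Once the sync reports completion, a quick sanity check is to compare object counts between source and destination; this sketch compares counts only, not content:
# Compare object counts between source and destination
gsutil ls -r gs://zystem-symbolblobs | wc -l
gsutil ls -r gs://zymtrace-symbols | wc -l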
Requirements
- The latest rclone is installed
- Sufficient storage space
- Destination storage system configured in rclone
Overview
Rclone syncs files and directories between a wide range of cloud storage providers. It's particularly useful for:
- Transferring between different storage providers (e.g., GCS to AWS, Azure, or MinIO)
- Environments where gsutil isn't available or preferred
- Greater control over transfer parameters
Installation
# Install rclone on Linux
curl https://rclone.org/install.sh | sudo bash
# Verify installation
rclone --version
Configuration
First, configure an rclone remote that reads the source GCS bucket anonymously:
# Setup source config
cat >> ~/.config/rclone/rclone.conf << EOF
[gcs-public]
type = s3
provider = GCS
anonymous = true
endpoint = storage.googleapis.com
EOF
# Configure your destination if not already done
# Example: rclone config
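Before launching a multi-hour sync, it's worth confirming that the anonymous source remote works. Both commands below are read-only; rclone size may take a while on a bucket this large:
# Verify read access to the source bucket
rclone lsd gcs-public:zystem-symbolblobs
# Optionally report total object count and size
rclone size gcs-public:zystem-symbolblobs --fast-list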
Basic Command (Same Storage Provider)
# Create destination bucket if needed
rclone mkdir destination:zymtrace-symbols
# Fast clone command
rclone sync gcs-public:zystem-symbolblobs destination:zymtrace-symbols \
--transfers=64 \
--checkers=128 \
--buffer-size=128M \
--s3-chunk-size=64M \
--s3-upload-concurrency=16 \
--fast-list \
--progress
Cross-Cloud Transfer Commands
GCS to AWS S3
# Ensure AWS remote is configured
# rclone config
rclone sync gcs-public:zystem-symbolblobs aws:zymtrace-symbols \
--transfers=64 \
--checkers=128 \
--buffer-size=128M \
--s3-chunk-size=64M \
--s3-upload-concurrency=16 \
--fast-list \
--progress
GCS to Azure Blob Storage
# Ensure Azure remote is configured
# rclone config
rclone sync gcs-public:zystem-symbolblobs azureblob:zymtrace-symbols \
--transfers=64 \
--checkers=128 \
--buffer-size=128M \
--fast-list \
--progress
GCS to MinIO
# Ensure MinIO remote is configured
# rclone config
rclone sync gcs-public:zystem-symbolblobs minio:zymtrace-symbols \
--transfers=64 \
--checkers=128 \
--buffer-size=128M \
--s3-chunk-size=64M \
--s3-upload-concurrency=16 \
--fast-list \
--progress
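If the minio remote referenced above is not configured yet, a minimal entry might look like this sketch; the endpoint and credentials are illustrative and must match your MinIO deployment (for the bundled MinIO, use its in-cluster service address):
# Example MinIO destination remote (illustrative values)
cat >> ~/.config/rclone/rclone.conf << EOF
[minio]
type = s3
provider = Minio
access_key_id = minio-access-key
secret_access_key = minio-secret-key
endpoint = http://minio:9000
EOF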
Parameter Explanation
- --transfers=64: Number of files to transfer in parallel (adjust based on your system)
- --checkers=128: Number of checkers to run in parallel (for listing directories)
- --buffer-size=128M: Size of the in-memory buffer for each transfer
- --s3-chunk-size=64M: Size of multipart chunks for S3 uploads
- --s3-upload-concurrency=16: Number of parts to upload in parallel
- --fast-list: Use recursive listing (faster for buckets with many objects)
- --progress: Display real-time transfer statistics
Monitoring and Resuming
- The progress display shows transfer rate, ETA, and percentage complete
- If interrupted, running the same command will resume where it left off
- For additional logging, add --log-file=rclone.log --log-level=INFO
Troubleshooting
- Memory issues: Reduce --transfers and --buffer-size
- Timeouts: Add --timeout=2h for a longer operation timeout
- Network errors: Add --retries=10 to automatically retry failed transfers (combined in the sketch below)
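Putting those flags together, a slower but more resilient invocation might look like this sketch (values are illustrative):
# Conservative transfer for constrained or flaky environments
rclone sync gcs-public:zystem-symbolblobs destination:zymtrace-symbols \
  --transfers=16 \
  --buffer-size=32M \
  --timeout=2h \
  --retries=10 \
  --log-file=rclone.log --log-level=INFO \
  --progress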
AWS DataSync is a managed data transfer service that simplifies, automates, and accelerates moving data between storage systems and AWS storage services. Refer to the official AWS DataSync guide for setup instructions.
Google Cloud Storage Transfer Service is a managed service specifically designed for large-scale data transfers. You can use either the Google Cloud Console (UI) or command line to set up and manage transfers.
Refer to the GCP docs.
Console Setup Steps
- Navigate to Google Cloud Console and open Storage Transfer Service
- Click "Create Transfer Job"
- Configure source:
  - Select "Cloud Storage bucket" as source type
  - Enter zystem-symbolblobs as the source bucket
- Configure destination:
  - Select "Cloud Storage bucket" as destination type
  - Enter zymtrace-symbols as the destination bucket
- Set schedule options (Run now or schedule recurring transfers)
- Configure advanced settings if needed
- Click "Create" to start the transfer
Monitoring
- Go to Transfer Jobs page to view progress
- Click on job ID to see detailed logs and statistics
- Real-time metrics are available in Cloud Monitoring
Command
# Using gcloud CLI (source and destination are positional arguments)
gcloud transfer jobs create \
  gs://zystem-symbolblobs gs://zymtrace-symbols
Advanced Options
# Schedule a daily transfer
gcloud transfer jobs create \
  gs://zystem-symbolblobs gs://zymtrace-symbols \
  --schedule-starts=2023-01-01T00:00:00Z \
  --schedule-repeats-every=24h
Monitoring via CLI
# List all transfer jobs
gcloud transfer jobs list
# Check status of a specific job
gcloud transfer jobs describe JOB_NAME
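To narrow the view to the operations of a single job, the listing can be filtered; this assumes a gcloud version that supports the --job-names flag:
# List operations belonging to one job
gcloud transfer operations list --job-names=JOB_NAME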
After Cloning
Once you've successfully cloned the symbol bucket, update your custom-values.yaml with the appropriate bucket information and deploy your zymtrace instance. The system will now use your local symbol storage instead of the internet-hosted CDN.
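Applying the updated values is a standard Helm upgrade; the release name, chart reference, and namespace below are illustrative and depend on how zymtrace was installed:
# Redeploy with the updated values (names are illustrative)
helm upgrade zymtrace <zymtrace-chart> -n zymtrace -f custom-values.yaml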