Storage configuration
zymtrace uses ClickHouse, Postgres and S3-compatible Object Storage as its primary storage backends. For a deeper understanding of how these components interact, refer to the architecture section.
To accommodate different operational needs, zymtrace provides two deployment modes:
- Create Mode (create): Deploys and manages the storage service within your cluster. This is ideal for quick setup and seamless integration with zymtrace.
- Existing Mode (use_existing): Connects to your existing ClickHouse, Postgres, or S3 storage, eliminating the need to manage additional infrastructure.
This guide walks you through configuring ClickHouse, Postgres, and S3-compatible storage in both modes.
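Each backend carries its own mode key, so the choice is made per backend. The skeleton below is only a sketch: it assumes modes can be mixed freely (for example, an in-cluster ClickHouse next to an existing Postgres), which this guide does not state explicitly. The nested settings for each mode are omitted here and are covered in the sections that follow.

```yaml
clickhouse:
  mode: "create"        # deployed and managed inside the cluster
postgres:
  mode: "use_existing"  # connects to a database you already operate
storage:
  mode: "create"        # in-cluster MinIO
```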
ClickHouse Configuration
- Create Mode
- Use Existing
Deploy new ClickHouse instance
This mode deploys and manages ClickHouse within your cluster.
ClickHouse Create Mode Configuration
clickhouse:
  mode: "create"
  create:
    image:
      repository: clickhouse/clickhouse-server
      tag: "25.3.2.39"
    config:
      user: "clickhouse"
      password: "clickhouse123"
      database: "zymtrace"
    service:
      http:
        port: 8123
      native:
        port: 9000
    replicas: 1
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "2000m"
        memory: "4Gi"
    storage:
      type: "persistent" # "persistent" or "empty_dir"
      size: 30Gi
      className: ""
Connect to existing ClickHouse
This mode connects to your existing ClickHouse cluster.
Database Setup Options
You have two options for setting up the required databases:
- Manual Setup: Create databases and users manually using SQL commands.
- Automatic Setup: Enable autoCreateDBs to let zymtrace create databases during migration.
Option 1: Manual Database Setup
When connecting to an existing ClickHouse cluster, we strongly recommend that you create a dedicated user, assign appropriate permissions, and create a new database for zymtrace.
Warning: the script below drops the zymtrace_profiling database if it exists. Run it only if this is your first time setting up zymtrace with ClickHouse, or if you intentionally want to reset the database. Dropping the database permanently deletes any existing data.
-- Clean slate: remove any existing profiling database
DROP DATABASE IF EXISTS zymtrace_profiling;
CREATE DATABASE zymtrace_profiling;
CREATE USER IF NOT EXISTS zymtrace_user
  IDENTIFIED WITH sha256_password BY 'YOUR NEW PASSWORD HERE';
GRANT
  SELECT,
  INSERT,
  UPDATE,
  ALTER,
  DELETE,
  CREATE,
  DROP,
  SHOW,
  OPTIMIZE,
  TRUNCATE
ON zymtrace_profiling.* TO zymtrace_user;
Make sure to replace YOUR NEW PASSWORD HERE with a secure password.
Reference - https://clickhouse.com/docs/sql-reference/statements/grant
Option 2: Automatic Database Setup
Alternatively, you can enable autoCreateDBs: true in your configuration to let zymtrace automatically create the required databases during migration. This option requires that your ClickHouse user has CREATE permissions on the server.
When to use automatic setup:
- Development environments where quick setup is preferred
- When you have administrative access to grant broad CREATE permissions
- Testing scenarios where database recreation is acceptable
When to use manual setup:
- Production environments requiring strict permission control
- When following security best practices with minimal required permissions
- Enterprise environments with database administration policies
When using ClickHouse Cloud, use port 9440 and enable the secure connection (secure: true).
ClickHouse Use Existing Configuration
clickhouse:
  mode: "use_existing"
  use_existing:
    host: "" # host:nativePort
    user: ""
    password: ""
    database: "zymtrace"
    secure: false # Enable TLS/secure connection
    autoCreateDBs: false # When true, zymtrace migration will automatically create the required databases.
    # NOTE: For autoCreateDBs to work, the database user must have CREATE permission.
    # Grant with: GRANT CREATE ON *.* TO zymtrace_user;
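For ClickHouse Cloud, the port and TLS requirements would translate into values roughly like the ones below. This is a sketch under assumptions: the hostname is a placeholder, and the user and password are the ones from the manual setup script, so substitute your own.

```yaml
clickhouse:
  mode: "use_existing"
  use_existing:
    host: "your-instance.clickhouse.cloud:9440" # placeholder hostname; 9440 is the TLS native port
    user: "zymtrace_user"                        # user from the manual setup script
    password: "YOUR NEW PASSWORD HERE"           # replace with the password you set
    database: "zymtrace"
    secure: true                                 # TLS must be on for ClickHouse Cloud
    autoCreateDBs: false                         # databases were created manually
```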
Postgres Configuration
- Create Mode
- Use Existing
Deploy new Postgres instance
This mode deploys and manages Postgres within your cluster.
Postgres Create Mode Configuration
postgres:
  mode: "create"
  create:
    config:
      user: "postgres"
      password: "postgres123"
    service:
      port: 5432
    resources:
      requests:
        cpu: "200m"
        memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "1024Mi"
    storage:
      type: "persistent"
      size: 20Gi
      className: ""
Connect to existing Postgres
This mode connects to your existing Postgres database or GCP Cloud SQL.
- Standard Postgres
- GCP Cloud SQL
Standard Postgres Configuration
postgres:
  mode: "use_existing"
  use_existing:
    host: "" # host:port
    user: ""
    password: ""
    database: "zymtrace" # Database name
    secure: false # Enable TLS/secure connection
    autoCreateDBs: false # When true, zymtrace migration will automatically create the required databases.
    # NOTE: For autoCreateDBs to work, the database user must have CREATEDB permission.
    # Grant with: ALTER USER "your-user" CREATEDB;
Setting up the Postgres use_existing mode
Postgres setups can vary based on security needs like data classification and role usage. The example below is a simplified guideline.
Note that unlike typical applications, which require only INSERT and SELECT permissions, the zymtrace database migration job requires DDL access.
If you're setting up zymtrace for the first time, the most straightforward approach is to create one role with both DDL and DML permissions.
The commands below assume you're connected to the database using a role with superuser privileges.
CREATE ROLE zystem LOGIN PASSWORD 'metsyz';
CREATE DATABASE zymtrace_identity OWNER zystem;
CREATE DATABASE zymtrace_symdb OWNER zystem;
\c zymtrace_identity
ALTER SCHEMA public OWNER TO zystem;
\c zymtrace_symdb
ALTER SCHEMA public OWNER TO zystem;
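With the role and databases above in place, the matching use_existing values might look like the sketch below. The hostname is a placeholder. Keeping database: "zymtrace" assumes that this value acts as a prefix for the zymtrace_identity and zymtrace_symdb databases, which is suggested by the GCP Cloud SQL comments but not stated for standard Postgres.

```yaml
postgres:
  mode: "use_existing"
  use_existing:
    host: "postgres.example.internal:5432" # placeholder host:port
    user: "zystem"                          # role created by the SQL commands above
    password: "metsyz"                      # example password; use a strong one in practice
    database: "zymtrace"                    # assumed prefix for zymtrace_identity / zymtrace_symdb
    secure: false                           # set to true if your server requires TLS
    autoCreateDBs: false                    # databases were created manually
```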
GCP Cloud SQL Configuration
postgres:
  mode: "gcp_cloudsql"
  gcp_cloudsql:
    instance: "" # PROJECT:REGION:INSTANCE format, e.g. zymtrace-cloudsql-psql-1
    user: "" # IAM account, e.g [email protected] (without gserviceaccount.com suffix)
    database: "zymtrace" # Database prefix for zymtrace_identity and zymtrace_profiling databases
    autoCreateDBs: false # When true, zymtrace migration will automatically create the required databases.
    # NOTE: For autoCreateDBs to work, the IAM database user must have CREATEDB permission.
    # Grant with: ALTER USER "[email protected]" CREATEDB;
    workloadIdentity:
      enabled: true # Enable Workload Identity for authentication
    proxy:
      image:
        repository: gcr.io/cloud-sql-connectors/cloud-sql-proxy
        tag: "2.15.0"
      # Use nodeSelector if you created a dedicated node pool with cloud-platform scope
      nodeSelector:
        cloud.google.com/gke-nodepool: cloudsql-pool
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "556Mi"
      port: 5432
    serviceAccount: "zymtrace-cloudsql-sa" # Kubernetes service account bound to GCP service account via Workload Identity
Prerequisites for GCP Cloud SQL:
- Set up Workload Identity between your Kubernetes service account and GCP service account
- Grant the GCP service account Cloud SQL Client role
- Ensure the IAM database user has appropriate permissions in your Cloud SQL instance
For detailed setup instructions including creating the Cloud SQL instance, configuring Workload Identity, and setting up IAM authentication, refer to the GCP Cloud SQL setup guide.
S3-Compatible Object Storage Configuration
- Create Mode
- Use Existing
Deploy new MinIO instance
This mode deploys and manages MinIO within your cluster.
MinIO Create Mode Configuration
storage:
  mode: "create"
  create:
    image:
      repository: minio/minio
      tag: "RELEASE.2024-12-18T13-15-44Z"
    config:
      user: "minio"
      password: "minio123"
    service:
      api:
        port: 9000
      console:
        port: 9001
    replicas: 1
    resources:
      requests:
        cpu: "200m"
        memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "1Gi"
    storage:
      type: "persistent"
      size: 20Gi
      className: ""
  buckets:
    symbols: "zymtrace-symdb"
Connect to existing S3-compatible storage
This mode connects to your existing MinIO, AWS S3, or Google Cloud Storage.
- MinIO
- AWS S3
- Google Cloud Storage
MinIO Configuration
MinIO is a high-performance, S3-compatible object storage solution that can be deployed on-premises or in the cloud. Configure your existing MinIO instance with the following settings:
storage:
  mode: "use_existing"
  use_existing:
    type: "minio"
    minio:
      endpoint: "" # must be a URL (http or https)
      user: ""
      password: ""
  buckets:
    symbols: "zymtrace-symdb"
Required fields:
- endpoint: Complete URL to your MinIO server (e.g., https://minio.example.com or http://192.168.1.100:9000)
- user: MinIO access key
- password: MinIO secret key
AWS S3 Configuration
storage:
  mode: "use_existing"
  use_existing:
    type: "s3"
    s3:
      region: ""
      accessKey: ""
      secretKey: ""
  buckets:
    symbols: "zymtrace-symdb"
Google Cloud Storage Configuration
storage:
  mode: "use_existing"
  use_existing:
    type: "gcs"
    gcs:
      endpoint: "https://storage.googleapis.com" # GCS endpoint, defaults to https://storage.googleapis.com
      accessKey: ""
      secretKey: ""
  buckets:
    symbols: "zymtrace-symdb"
Applying the configuration
Once you've updated custom_values.yaml with the appropriate configuration, deploy the backend using Helm:
helm upgrade backend zymtrace/backend -f custom_values.yaml