Storage configuration

zymtrace uses ClickHouse, Postgres and S3-compatible Object Storage as its primary storage backends. For a deeper understanding of how these components interact, refer to the architecture section.

To accommodate different operational needs, zymtrace provides two deployment modes:

Create Mode (create): Deploys and manages the storage service within your cluster. This is ideal for quick setup and seamless integration with zymtrace.
Existing Mode (use_existing): Connects to your existing ClickHouse, Postgres, or S3 storage, eliminating the need to manage additional infrastructure.

This guide walks you through configuring ClickHouse, Postgres, and S3-compatible storage in both modes.

ClickHouse Configuration

Deploy new ClickHouse instance

This mode deploys and manages ClickHouse within your cluster.

ClickHouse Create Mode Configuration

clickhouse:
  mode: "create"
  create:
    image:
      repository: clickhouse/clickhouse-server
      tag: "25.3.2.39"
    config:
      user: "clickhouse"
      password: "clickhouse123"
      database: "zymtrace"
    service:
      http:
        port: 8123
      native:
        port: 9000
    replicas: 1
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "2000m"
        memory: "4Gi"
    storage:
      type: "persistent"  # "persistent" or "empty_dir"
      size: 30Gi
      className: ""

Connect to existing ClickHouse

This mode connects to your existing ClickHouse cluster.

Database Setup Options

You have two options for setting up the required databases:

Manual Setup: Create databases and users manually using SQL commands.
Automatic Setup: Enable autoCreateDBs to let zymtrace create the databases during migration.

Option 1: Manual Database Setup

When connecting to an existing ClickHouse cluster, we strongly recommend that you create a dedicated user, assign appropriate permissions, and create a new database for zymtrace.

Caution:

This script drops the zymtrace_profiling database if it exists, permanently deleting any data it holds. Run it only when setting up zymtrace with ClickHouse for the first time, or when you intentionally want to reset the database.

-- Clean slate: remove any existing profiling database
DROP DATABASE IF EXISTS zymtrace_profiling;

CREATE DATABASE zymtrace_profiling;

CREATE USER IF NOT EXISTS zymtrace_user
IDENTIFIED WITH sha256_password BY 'YOUR NEW PASSWORD HERE';

GRANT 
    SELECT,
    INSERT,
    UPDATE,
    ALTER,
    DELETE,
    CREATE,
    DROP,
    SHOW,
    OPTIMIZE,
    TRUNCATE
ON zymtrace_profiling.* TO zymtrace_user;

Make sure to replace YOUR NEW PASSWORD HERE with a secure password.

Reference: https://clickhouse.com/docs/sql-reference/statements/grant
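
Once the user and database exist, the matching Helm values might look like the sketch below. The host is a hypothetical placeholder, and keeping database set to "zymtrace" assumes the value acts as a prefix for zymtrace_profiling, which this guide does not state explicitly for ClickHouse.

```yaml
clickhouse:
  mode: "use_existing"
  use_existing:
    host: "clickhouse.example.internal:9000"  # hypothetical host:nativePort
    user: "zymtrace_user"                     # user created by the SQL script above
    password: "YOUR NEW PASSWORD HERE"        # same password used in CREATE USER
    database: "zymtrace"                      # assumption: prefix for zymtrace_profiling
    secure: false                             # set to true if your cluster requires TLS
    autoCreateDBs: false                      # databases were created manually
```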

Option 2: Automatic Database Setup

Alternatively, you can enable autoCreateDBs: true in your configuration to let zymtrace automatically create the required databases during migration. This option requires that your ClickHouse user has CREATE permissions on the server.

When to use automatic setup:

Development environments where quick setup is preferred
When you have administrative access to grant broad CREATE permissions
Testing scenarios where database recreation is acceptable

When to use manual setup:

Production environments requiring strict permission control
When following security best practices with minimal required permissions
Enterprise environments with database administration policies

Info:

When using ClickHouse Cloud, use port 9440 and enable the secure connection (secure: true).

ClickHouse Use Existing Configuration

clickhouse:
  mode: "use_existing"
  use_existing:
    host: ""  # host:nativePort
    user: ""
    password: ""
    database: "zymtrace"
    secure: false  # Enable TLS/secure connection
    autoCreateDBs: false  # When true, zymtrace migration will automatically create the required databases. 
    # NOTE: For autoCreateDBs to work, the database user must have CREATE permission. 
    # Grant with: GRANT CREATE ON *.* TO zymtrace_user;
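
For ClickHouse Cloud, the note about port 9440 and secure connections could translate into values along these lines. The hostname and credentials are placeholders, not real endpoints, and the remaining fields simply mirror the template above.

```yaml
clickhouse:
  mode: "use_existing"
  use_existing:
    host: "<your-clickhouse-cloud-host>:9440"  # placeholder; 9440 is the secure native port
    user: "<clickhouse-user>"                  # placeholder
    password: "<clickhouse-password>"          # placeholder
    database: "zymtrace"
    secure: true                               # required for the TLS port
    autoCreateDBs: false
```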

Postgres Configuration

Deploy new Postgres instance

This mode deploys and manages Postgres within your cluster.

Postgres Create Mode Configuration

postgres:
  mode: "create"
  create:
    config:
      user: "postgres"
      password: "postgres123"
    service:
      port: 5432
    resources:
      requests:
        cpu: "200m"
        memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "1024Mi"
    storage:
      type: "persistent"
      size: 20Gi
      className: ""

Connect to existing Postgres

This mode connects to your existing Postgres database or GCP Cloud SQL instance.

Standard Postgres Configuration

postgres:
  mode: "use_existing"
  use_existing:
    host: "" # host:port
    user: ""
    password: ""
    database: "zymtrace"  # Database name
    secure: false  # Enable TLS/secure connection
    autoCreateDBs: false  # When true, zymtrace migration will automatically create the required databases. 
    # NOTE: For autoCreateDBs to work, the database user must have CREATEDB permission. 
    # Grant with: ALTER USER "your-user" CREATEDB;

Setting up the Postgres use_existing mode

Postgres setups can vary based on security needs like data classification and role usage. The example below is a simplified guideline.

Note that unlike typical applications, which require only INSERT and SELECT permissions, the zymtrace database migration job requires DDL access.

If you're setting up zymtrace for the first time, the most straightforward approach is to create one role with both DDL and DML permissions.

The commands below assume you're connected to the database using a role with superuser privileges.

CREATE ROLE zystem LOGIN PASSWORD 'metsyz';

CREATE DATABASE zymtrace_identity OWNER zystem;
CREATE DATABASE zymtrace_symdb OWNER zystem;

\c zymtrace_identity
ALTER SCHEMA public OWNER TO zystem;

\c zymtrace_symdb
ALTER SCHEMA public OWNER TO zystem;
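
With the role and databases from the example in place, the corresponding values could be sketched as follows. The host is a hypothetical placeholder, and leaving database as "zymtrace" assumes it serves as the prefix for zymtrace_identity and zymtrace_symdb.

```yaml
postgres:
  mode: "use_existing"
  use_existing:
    host: "postgres.example.internal:5432"  # hypothetical host:port
    user: "zystem"                          # role created in the SQL example
    password: "metsyz"                      # example password; use a strong one in practice
    database: "zymtrace"                    # assumption: prefix for the two databases
    secure: false                           # set to true if your server requires TLS
    autoCreateDBs: false                    # databases were created manually
```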

GCP Cloud SQL Configuration

postgres:
  mode: "gcp_cloudsql"
  gcp_cloudsql:
    instance: "" # PROJECT:REGION:INSTANCE format, e.g. zymtrace-cloudsql-psql-1
    user: "" # IAM account, e.g. <sa-name>@<project-id>.iam (without the gserviceaccount.com suffix)
    database: "zymtrace" # Database prefix for zymtrace_identity and zymtrace_profiling databases
    autoCreateDBs: false  # When true, zymtrace migration will automatically create the required databases. 
    # NOTE: For autoCreateDBs to work, the IAM database user must have CREATEDB permission.
    # Grant with: ALTER USER "<sa-name>@<project-id>.iam" CREATEDB;
    workloadIdentity:
      enabled: true  # Enable Workload Identity for authentication
    proxy:
      image:
        repository: gcr.io/cloud-sql-connectors/cloud-sql-proxy
        tag: "2.15.0"
      # Use nodeSelector if you created a dedicated node pool with cloud-platform scope
      nodeSelector:
        cloud.google.com/gke-nodepool: cloudsql-pool
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "556Mi"
      port: 5432
    serviceAccount: "zymtrace-cloudsql-sa" # Kubernetes service account bound to GCP service account via Workload Identity

Prerequisites for GCP Cloud SQL:

Set up Workload Identity between your Kubernetes service account and GCP service account
Grant the GCP service account the Cloud SQL Client role
Ensure the IAM database user has appropriate permissions in your Cloud SQL instance

For detailed setup instructions including creating the Cloud SQL instance, configuring Workload Identity, and setting up IAM authentication, refer to the GCP Cloud SQL setup guide.

S3-Compatible Object Storage Configuration

Deploy new MinIO instance

This mode deploys and manages MinIO within your cluster.

MinIO Create Mode Configuration

storage:
  mode: "create"
  create:
    image:
      repository: minio/minio
      tag: "RELEASE.2024-12-18T13-15-44Z"
    config:
      user: "minio"
      password: "minio123"
    service:
      api:
        port: 9000
      console:
        port: 9001
    replicas: 1
    resources:
      requests:
        cpu: "200m"
        memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "1Gi"
    storage:
      type: "persistent"
      size: 20Gi
      className: ""
  buckets:
    symbols: "zymtrace-symdb"

Connect to existing S3-compatible storage

This mode connects to your existing MinIO, AWS S3, or Google Cloud Storage.

MinIO Configuration

MinIO is a high-performance, S3-compatible object storage solution that can be deployed on-premises or in the cloud. Configure your existing MinIO instance with the following settings:

storage:
  mode: "use_existing"
  use_existing:
    type: "minio"
    minio:
      endpoint: "" # Must be a full URL, including the http:// or https:// scheme
      user: ""
      password: ""
  buckets:
    symbols: "zymtrace-symdb"

Required fields:

endpoint: Complete URL to your MinIO server (e.g., https://minio.example.com or http://192.168.1.100:9000)
user: MinIO access key
password: MinIO secret key
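
A filled-in version might look like the sketch below. It reuses the example endpoint from the field description; the key values are placeholders.

```yaml
storage:
  mode: "use_existing"
  use_existing:
    type: "minio"
    minio:
      endpoint: "https://minio.example.com"  # full URL, scheme included
      user: "<minio-access-key>"             # placeholder access key
      password: "<minio-secret-key>"         # placeholder secret key
  buckets:
    symbols: "zymtrace-symdb"
```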

AWS S3 Configuration

storage:
  mode: "use_existing"
  use_existing:
    type: "s3"
    s3:
      region: ""
      accessKey: ""
      secretKey: ""
  buckets:
    symbols: "zymtrace-symdb"

Google Cloud Storage Configuration

storage:
  mode: "use_existing"
  use_existing:
    type: "gcs"
    gcs:
      endpoint: "https://storage.googleapis.com" # GCS endpoint, defaults to https://storage.googleapis.com
      accessKey: ""
      secretKey: ""
  buckets:
    symbols: "zymtrace-symdb"

Applying the configuration

Once you've updated custom_values.yaml with the appropriate configuration, deploy the backend using Helm:

helm upgrade backend zymtrace/backend -f custom_values.yaml
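
As a rough illustration, a custom_values.yaml that mixes modes could look like the following. It assumes that the clickhouse, postgres, and storage blocks sit side by side at the top level of the file, that each component's mode can be chosen independently, and that omitted create settings fall back to chart defaults. Hosts, region, and credentials are hypothetical placeholders.

```yaml
clickhouse:
  mode: "create"  # in-cluster ClickHouse; other create settings assumed to use chart defaults

postgres:
  mode: "use_existing"
  use_existing:
    host: "postgres.example.internal:5432"  # hypothetical host:port
    user: "zystem"
    password: "<postgres-password>"         # placeholder
    database: "zymtrace"
    secure: true
    autoCreateDBs: false

storage:
  mode: "use_existing"
  use_existing:
    type: "s3"
    s3:
      region: "us-east-1"                   # example region
      accessKey: "<aws-access-key-id>"      # placeholder
      secretKey: "<aws-secret-access-key>"  # placeholder
  buckets:
    symbols: "zymtrace-symdb"
```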
