Skip to main content
pgstream is a command-line tool for streaming PostgreSQL data changes to various targets. This document covers all available commands and their usage.

Installation

curl -L https://github.com/xataio/pgstream/releases/latest/download/pgstream.linux.amd64 -o pgstream
chmod +x pgstream
sudo mv pgstream /usr/local/bin/

go install github.com/xataio/pgstream@latest

git clone https://github.com/xataio/pgstream.git
cd pgstream
go build -o pgstream ./cmd

brew tap xataio/pgstream
brew install pgstream

Global Flags

These flags are available for all commands:
Description: .env or .yaml config file to use with pgstream if any
Default: -
Description: Log level for the application. One of trace, debug, info, warn, error, fatal, panic
Default: debug
Description: Show help information
Default: -

Commands

init

Initialises pgstream, creating the replication slot and the relevant tables/functions/triggers under the configured internal pgstream schema. It performs the same operations as the --init flag on the run command.
pgstream init [flags]
Description: The init command prepares your PostgreSQL database for streaming by:
  • Creating a logical replication slot with the specified name
  • Creating the internal pgstream schema for tracking changes
  • Setting up necessary functions and triggers for change data capture
  • Configuring the database objects required for logical replication
Prerequisites:
  • PostgreSQL must have wal_level = logical
  • User must have replication privileges (REPLICATION role)
  • max_replication_slots must allow for additional slots
  • User must have privileges to create schemas, functions, and triggers
Flags:
  • --postgres-url - Source postgres URL where pgstream setup will be run
  • --replication-slot - Name of the postgres replication slot to be created by pgstream on the source url
Examples:
pgstream init -c config.yaml
pgstream init -c config.env

run

Run starts a continuous data stream from the configured source to the configured target.
pgstream run [flags]
Description: The run command is the main operation mode for pgstream. It:
  • Establishes a connection to the source database
  • Connects to the existing replication slot (created by init)
  • Continuously reads WAL events from the replication stream
  • Processes and transforms data according to configuration
  • Streams changes to configured targets (Kafka, PostgreSQL, Elasticsearch, OpenSearch)
  • Runs continuously until interrupted (Ctrl+C) or receives a termination signal
  • Gracefully shuts down on SIGTERM/SIGINT
  • Resumes from the last confirmed WAL position
Prerequisites:
  • Database must be initialized with pgstream init
  • Replication slot must exist and be available
  • Target systems must be accessible and properly configured
  • Source database must have logical replication enabled
Flags:
  • --source - Source type. One of postgres, kafka
  • --source-url - Source URL
  • --target - Target type. One of postgres, opensearch, elasticsearch, kafka
  • --target-url - Target URL
  • --replication-slot - Name of the postgres replication slot for pgstream to connect to
  • --snapshot-tables - List of tables to snapshot if initial snapshot is required, in the format <schema>.<table>. If not specified, the schema public will be assumed. Wildcards are supported
  • --reset - Whether to reset the target before snapshotting (only for postgres target)
  • --profile - Whether to expose a /debug/pprof endpoint on localhost:6060
  • --init - Whether to initialize pgstream before starting replication
  • --dump-file - File where the pg_dump output will be written if initial snapshot is enabled when using pgdump/restore
Examples:
pgstream run --source kafka --source-url <kafka-url> --target elasticsearch --target-url <elasticsearch-url>
pgstream run --config config.yaml --log-level info
pgstream run --config config.env
Output Files (when --profile is enabled):
  • cpu.prof - CPU profiling data for performance analysis
  • mem.prof - Memory allocation profiling data

snapshot

Snapshot performs a snapshot of the configured source Postgres database into the configured target.
pgstream snapshot [flags]
Description: The snapshot command creates a point-in-time copy of database tables. It:
  • Connects to the source PostgreSQL database
  • Reads all existing data from specified tables/schemas
  • Transforms and streams the data to configured targets
  • Exits after completing the snapshot operation
Prerequisites:
  • Source PostgreSQL database must be accessible
  • Target system must be accessible and properly configured
  • User must have SELECT privileges on tables to be snapshotted
  • For PostgreSQL targets: user must have write privileges
Flags:
  • --postgres-url - Source postgres database to perform the snapshot from
  • --target - Target type. One of postgres, opensearch, elasticsearch, kafka
  • --target-url - Target URL
  • --tables - List of tables to snapshot, in the format <schema>.<table>. If not specified, the schema public will be assumed. Wildcards are supported
  • --reset - Whether to reset the target before snapshotting (only for postgres target)
  • --profile - Whether to produce CPU and memory profile files, as well as exposing a /debug/pprof endpoint on localhost:6060
  • --dump-file - File where the pg_dump output will be written
Examples:
pgstream snapshot --config config.yaml --log-level info
pgstream snapshot --config config.env
Use Cases:
  • Bulk data export for analytics
  • Creating test datasets
  • Backfilling data after system setup
Output Files (when --profile is enabled):
  • cpu.prof - CPU profiling data for performance analysis
  • mem.prof - Memory allocation profiling data

status

Checks the status of pgstream initialisation and provided configuration.
pgstream status [flags]
Description: The status command provides information about:
  • Replication slot status
  • Internal pgstream schema and objects status
  • Overall streaming infrastructure health
  • Configuration validation results
Prerequisites:
  • Access to the source PostgreSQL database
  • Replication slot should exist
Flags:
  • --postgres-url - Source postgres URL where pgstream has been initialised
  • --replication-slot - Name of the postgres replication slot created by pgstream on the source url
  • --json - Output the status in JSON format
Examples:
pgstream status -c pg2pg.env
pgstream status -c pg2pg.yaml --json
Sample Output:
✅ SUCCESS  pgstream status check encountered no issues
Initialisation status:
 - Pgstream schema exists: true
 - Pgstream schema_log table exists: true
 - Migration current version: 8
 - Migration status: success
 - Replication slot name: pgstream_postgres_slot
 - Replication slot plugin: wal2json
 - Replication slot database: postgres
Config status:
 - Valid: true
Transformation rules status:
 - Valid: true
Source status:
 - Reachable: true

validate

Validate different parts of the pgstream configuration.
pgstream validate <subcommand> [flags]
Description: The validate command allows you to validate specific aspects of your pgstream configuration before running it. Currently supports validating transformation rules.

validate rules

Validates transformation rules against the provided Postgres database schema.
pgstream validate rules [flags]
Description: The validate rules command checks your transformation rules for:
  • Column existence and type compatibility
  • Table and schema references
  • Rule syntax and structure
  • Compatibility with the source database schema
  • Overall validity before applying them in production
Prerequisites:
  • Access to the source PostgreSQL database
  • Transformation rules defined in configuration or separate rules file
Flags:
  • --postgres-url - Source postgres URL to validate the rules against
  • --rules-file, -f - Path to a YAML file containing the transformation rules to validate
  • --json - Output the validation status in JSON format
Examples:
pgstream validate rules -c pg2pg.env
pgstream validate rules -c pg2pg.yaml --json
Sample Output:
✅ SUCCESS  transformation rules are valid
Transformation rules status:
 - Valid: true
 - Errors: []
Use Cases:
  • Pre-deployment validation of transformation rules
  • Testing rule changes against production schema
  • CI/CD pipeline integration for rule validation
  • Debugging transformation rule issues

destroy

It destroys any pgstream setup, removing the replication slot and all the relevant tables/functions/triggers, along with the internal pgstream schema.
pgstream destroy [flags]
Description: The destroy command cleans up all resources created by pgstream init:
  • Drops the replication slot
  • Removes the internal pgstream schema and all its objects
  • Removes all pgstream-related functions and triggers
  • ⚠️ Warning: This is destructive and will lose replication position
Prerequisites:
  • Access to the source PostgreSQL database
  • User must have privileges to drop schemas, functions, and replication slots
  • pgstream should be initialized (objects should exist to be destroyed)
Flags:
  • --postgres-url - Source postgres URL where pgstream destroy will be run
  • --replication-slot - Name of the postgres replication slot to be deleted by pgstream from the source url
Examples:
pgstream destroy -c config.yaml
pgstream destroy -c config.env
⚠️ Important Notes:
  • This will stop any running pgstream instances using these resources
  • You will lose the current replication position
  • All pgstream tracking data will be permanently removed
  • You can recreate resources later with pgstream init

version

Displays version information for pgstream.
pgstream version
Description: Shows the current version of the pgstream binary. Example:
$ pgstream version
pgstream version v0.8.1

Configuration

pgstream uses YAML or .env configuration files. The configuration can be specified via:
  1. Command-line flag: --config /path/to/config.yaml or -c /path/to/config.env
  2. Environment variable: PGSTREAM_CONFIG=/path/to/config.yaml
For more information about the configuration options, check out the configuration documentation.

Examples

Complete Setup Workflow

pgstream run --source postgres --source-url "postgres://user:pass@localhost:5432/source_db" --target postgres --target-url "postgres://user:pass@localhost:5432/target_db" --replication-slot "pgstream_slot" --snapshot-tables "public.*" --init

Configuration File Workflow

cat > config.yaml <<EOF
source:
  postgres:
    url: "postgres://user:pass@localhost:5432/source_db"
    replication:
      replication_slot: "pgstream_slot"

target:
  postgres:
    url: "postgres://user:pass@localhost:5432/target_db"
EOF

pgstream init -c config.yaml

pgstream status -c config.yaml

pgstream run -c config.yaml

Development with Profiling

pgstream snapshot -c config.yaml --profile

pgstream run -c config.yaml --profile

go tool pprof http://localhost:6060/debug/pprof/profile
go tool pprof http://localhost:6060/debug/pprof/heap

Multi-target Streaming

pgstream run --source postgres --source-url "postgres://user:pass@localhost:5432/source_db" --target kafka --target-url "localhost:9092" --replication-slot "pgstream_slot"

pgstream run --source kafka --source-url "localhost:9092" --target elasticsearch --target-url "http://localhost:9200" --replication-slot "pgstream_slot"

pgstream run --source kafka --source-url "localhost:9092" --target opensearch --target-url "http://localhost:9200" --replication-slot "pgstream_slot"

Environment Variable Configuration

export PGSTREAM_POSTGRES_LISTENER_URL="postgres://user:pass@localhost:5432/source_db"
export PGSTREAM_POSTGRES_REPLICATION_SLOT_NAME="pgstream_slot"
export PGSTREAM_LOG_LEVEL="info"

pgstream init
pgstream status
pgstream run --source postgres --target kafka --target-url "localhost:9092"

Troubleshooting

Common Issues

1. Initialization Failures
psql "postgres://user:password@host:port/database" -c "SHOW wal_level;"
psql "postgres://user:password@host:port/database" -c "SHOW max_replication_slots;"

psql "postgres://user:password@host:port/database" -c "SELECT rolreplication FROM pg_roles WHERE rolname = 'your_user';"
2. Replication Slot Conflicts
pgstream status --postgres-url "postgres://user:pass@host:port/db" --replication-slot "slot_name"

pgstream destroy --postgres-url "postgres://user:pass@host:port/db" --replication-slot "slot_name"
3. Status Check Issues
pgstream status -c config.yaml --json

pgstream status --postgres-url "postgres://user:pass@host:port/db" --replication-slot "slot_name"

Command-specific Troubleshooting

Init Command:
pgstream init --postgres-url "postgres://user:pass@host:port/db" --replication-slot "slot_name" --log-level debug
Run Command:
pgstream run -c config.yaml --init

pgstream run -c config.yaml --snapshot-tables "public.users,public.orders"
Snapshot Command:
pgstream snapshot -c config.yaml --reset

pgstream snapshot -c config.yaml --profile --dump-file custom_dump.sql

Best Practices

  • Use run with the --init flag to ensure pgstream can properly replicate schema changes
  • snapshot only requires read access on your Postgres source, so it’s a good alternative for non invasive syncs
  • If you need a snapshot and replication, use run with initial snapshot to prevent data loss
  • Use the status command to validate your transformer configuration and your pgstream replication setup
  • Use destroy carefully, since it will remove everything used by pgstream, including the replication slot

Getting Help

For immediate help with any command:
pgstream --help
pgstream init --help
pgstream run --help
pgstream snapshot --help
pgstream status --help
pgstream destroy --help