pgstream v0.4.0: Postgres-to-Postgres replication, snapshots & transformations

Learn how the latest features in pgstream refine Postgres replication with near real-time data capture, consistent snapshots, and column-level transformations.

Written by Esther Minano Sanz
Published on March 27, 2025

We're proud to announce the release of v0.4 of pgstream, our open-source CDC tool for Postgres! 🚀 In this blog post, we'll dive into some of the key features packed into this latest release, and look at what the future holds!

You can find the complete release notes on the GitHub v0.4.0 release page.

What is pgstream?

pgstream is an open-source CDC (Change Data Capture) tool and library that offers Postgres replication support with DDL changes. Some of its key features include:

  • Replication of DDL changes: schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
  • Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges, while still integrating easily with Kafka for more complex use cases.
  • Out-of-the-box replication outputs:
    • Elasticsearch/OpenSearch: replication to search stores with special handling of field IDs to minimize re-indexing.
    • Webhooks: subscribe and receive webhook notifications whenever your source data changes.

For more details on how pgstream works under the hood, check out the full documentation.

The new release brings three powerful features to pgstream: Postgres to Postgres replication, snapshots, and transformers. Let's dive into each of them!

If you want a sneak peek, this demo shows all three in action 💪.

Postgres to Postgres

This release adds a new pgstream output to support replication to Postgres databases. The Postgres processor translates WAL events into SQL queries, which are then sent to the target Postgres database, ensuring efficient, near real-time replication of both data and schema changes. The implementation includes:

  • Batch processing: queries are grouped into batches to optimize performance and reduce I/O overhead.
  • Schema change handling: schema changes (e.g., adding or renaming columns) are automatically replicated.
  • Memory guarding: prevents replication lag by using configurable memory guards.

In contrast to existing Postgres tools, such as pg_dump/pg_restore and Postgres logical replication, pgstream keeps track of schema changes as well as data, making the replication pipeline more robust.

Imagine you're replicating data from a production Postgres database to a staging environment for testing. By using the following configuration, pgstream will start listening on the replication slot on the source database and replicate changes to the target staging database in near real time.

# Listener config
PGSTREAM_POSTGRES_LISTENER_URL="postgres://prod_user:password@prod-db:5432/prod"

# Processor config
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://staging_user:password@staging-db:5432/staging"
PGSTREAM_POSTGRES_WRITER_BATCH_SIZE=100
PGSTREAM_POSTGRES_WRITER_BATCH_TIMEOUT=1s
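
Assuming the configuration above is saved to a file (called pg2pg.env here; the name is arbitrary), replication can be started with the pgstream CLI. A minimal sketch, following the pattern from the pgstream tutorials:

# Prepare the source database: creates the replication slot and the
# pgstream.schema_log table used to track schema changes
pgstream init --pgurl "postgres://prod_user:password@prod-db:5432/prod"

# Start replicating using the configuration above
pgstream run -c pg2pg.env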

Check out the Postgres to Postgres tutorial for more details on how to set up and run replication with pgstream.

Snapshot to Postgres

With the addition of snapshots to pgstream, it's now possible to capture a consistent view of your Postgres database at a specific point in time, whether as an initial snapshot of the database before starting replication, or as a standalone process when replication is not needed.

The snapshot listener handles the schema and the data differently, in order to optimize the implementation for each output:

  • Schema snapshots: Capture the database schema using the pgstream.schema_log table, or the pg_dump/pg_restore Postgres utilities for Postgres to Postgres replication.
  • Data snapshots: Use transaction snapshot IDs to obtain a stable view of the data, allowing tables and rows to be processed in parallel. When combined with replication, the current LSN is stored before starting the snapshot so that any WAL events that occur during the snapshot can be replayed; the sketch after the diagram below illustrates the underlying Postgres primitives.
Snapshot sequence
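
This parallel-yet-consistent behaviour builds on standard Postgres snapshot primitives. As a rough sketch of the mechanism (this illustrates the Postgres features pgstream builds on, not pgstream's own code; the connection URL and snapshot ID are placeholders):

# Session 1 exports a snapshot; the exporting transaction must stay
# open while the workers import it.
psql "postgres://prod_user:password@prod-db:5432/prod" <<'SQL'
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_current_wal_lsn();  -- record the LSN so WAL events during the snapshot can be replayed
SELECT pg_export_snapshot();  -- e.g. returns '00000003-0000001B-1'
SQL

# Each parallel worker attaches to the exported snapshot and sees
# exactly the same view of the data as every other worker.
psql "postgres://prod_user:password@prod-db:5432/prod" <<'SQL'
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
SELECT count(*) FROM users;
SQL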

For more details on how the snapshots are implemented, check out the snapshot section in the pgstream documentation.

Say you're setting up a new analytics pipeline and need to initialize the target Postgres database with existing data. Snapshots ensure that all data is captured before replication begins.

# Listener config
# URL of the PostgreSQL database we want to snapshot
PGSTREAM_POSTGRES_LISTENER_URL="postgres://prod_user:password@prod-db:5432/prod"
PGSTREAM_POSTGRES_LISTENER_INITIAL_SNAPSHOT_ENABLED=true
# List of tables we want to snapshot. If the tables are not schema qualified, the public schema will be assumed.
# Wildcards are supported.
PGSTREAM_POSTGRES_INITIAL_SNAPSHOT_TABLES="*"

# Processor config
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://analytics_user:password@analytics-db:5432/analytics"

Alternatively, you might want a realistic staging database without the need for continuous replication, so you perform snapshots weekly.

# Listener config
PGSTREAM_POSTGRES_SNAPSHOT_LISTENER_URL="postgres://prod_user:password@prod-db:5432/prod"
PGSTREAM_POSTGRES_SNAPSHOT_TABLES="*"

# Processor config
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://staging_user:password@staging-db:5432/staging"
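
Because this configuration performs a one-off snapshot rather than continuous replication, scheduling it is enough. For example, with a crontab entry (the file name and schedule are illustrative):

# Run a snapshot every Sunday at 03:00, assuming the config above
# is saved as weekly_snapshot.env
0 3 * * 0 pgstream run -c weekly_snapshot.env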

Check out the snapshot tutorial for more details on how to set up and perform snapshots with pgstream.

Transformers

Transformer diagram

Transformers in pgstream allow you to modify column values during replication or snapshots. This is particularly useful for anonymizing sensitive data, which is critical for:

  • Compliance: Meeting legal and regulatory requirements by masking or modifying PII before it reaches unauthorized systems.
  • Security: Preventing sensitive data leaks by ensuring that only de-identified information is replicated.
  • Testing & development: Providing developers with realistic datasets while safeguarding personal information.
  • Analytics & reporting: Enabling insights from anonymized data without exposing private user details.

The pgstream implementation of transformers acts as a wrapper around the existing processors, which means they can be applied to any of the available outputs (Elasticsearch/OpenSearch, webhooks, Postgres). It leverages open-source libraries such as Greenmask and Neosync, and can also integrate with custom transformers.

The column transformation rules are defined in a YAML configuration file.

transformations:
  - schema: public
    table: users
    column_transformers:
      email:
        name: neosync_email
        parameters:
          preserve_length: true # Ensures the transformed email has the same length as the original.
          preserve_domain: true # Keeps the domain of the original email intact.
          email_type: fullname # Specifies the type of email transformation.

To see the list of supported transformers, check out the transformers section of the pgstream documentation.

Let's say we need to anonymize email addresses before replicating user data to a staging database. We can configure a pgstream transformer to modify the email field while keeping other information intact, by pointing the pgstream configuration at the transformation rules from the example above.

# Listener config
PGSTREAM_POSTGRES_LISTENER_URL="postgres://prod_user:password@prod-db:5432/prod"

# Processor config
# Path to the transformer rules yaml file
PGSTREAM_TRANSFORMER_RULES_FILE="transformer_rules.yaml"
PGSTREAM_POSTGRES_WRITER_TARGET_URL="postgres://staging_user:password@staging-db:5432/staging"
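
Once the data has been replicated, a quick query against the target shows the rule in effect. The output below is hypothetical, but with preserve_length and preserve_domain enabled the local part is replaced while the length and domain are kept:

# Spot-check the anonymization on the target (output is hypothetical)
psql "postgres://staging_user:password@staging-db:5432/staging" \
  -c "SELECT email FROM users LIMIT 1;"
# A source value like john.doe@example.com would come back with the same
# length and domain but a different local part, e.g. wqpxkfjz@example.com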

Check out the transformers tutorial for more details on how to set up and use transformers with pgstream.

With the latest features discussed in this blog post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.

If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter. We welcome feedback in issues and contributions via pull requests! 💜

Ready to get started? Check out the pgstream documentation for more details.
