pgstream v0.8.1: hstore transformer, roles snapshotting, CLI improvements and more

Learn how pgstream v0.8.1 transforms hstore data and improves snapshot experience with roles snapshotting and excluded tables option

Author

Ahmet Gedemenli

Date published

We're proud to announce the release of v0.8.1 of pgstream, our open-source CDC (Change Data Capture) tool for Postgres! šŸš€ Let’s explore some of the standout features we’ve rolled out since the last update!

You can find changelog details on the pgstream releases Github page.

What is pgstream?

pgstream is an open source CDC tool and library that offers Postgres replication support with DDL changes. Some of its key features include:

  • Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
  • Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. However, it can also easily integrate with Kafka for more complex use cases.
  • Out of the box supported targets:
    • Postgres: Replication to Postgres databases with support for schema changes and batch processing.
    • Elasticsearch/Opensearch: Replication to search stores with special handling of field IDs to minimize re-indexing.
    • Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
  • Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
  • Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.

For more details on how pgstream works under the hood, check out the full documentation.

What's new?

This release enhances pgstream’s usability with a new hstore transformer for greater flexibility and role snapshotting powered by pg_dumpall for a smoother snapshot experience. It also includes numerous fixes and improvements inspired by feedback from the pgstream community šŸ’œ.

šŸ” hstore transformer

Building on the many transformers introduced in earlier releases, this update expands pgstream’s transformation capabilities with the new hstore transformer. Requested by members of the pgstream community, it applies a given list of set and delete operations to the hstore data being transformed.

The hstore transformer is compatible with Postgres data type hstore, and like our template and json transformers, it supports a variety of useful template functions:

  • Use .GetValue to refer to the value for the specified key.
  • Use .GetDynamicValue "<column_name>" to refer to another column's value.
  • Standard Go template functions.
  • greenmask's large set of core functions, including the masking function provided by go-masker and various random data generator functions powered by the open source library faker.
  • sprig's many useful helper functions.

Let's take a closer look with an example transformation.

Example configuration
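As a sketch, a transformer setup matching the steps described below might look like this. The top-level keys and exact field names are illustrative, so consult the pgstream transformers documentation for the precise schema:

```yaml
# Hypothetical sketch of an hstore transformer configuration for the
# attributes column of the users table. Field names mirror the
# operations described below; check the docs for the exact schema.
modifiers:
  transformations:
    table_transformers:
      - schema: public
        table: users
        column_transformers:
          attributes:
            name: hstore
            parameters:
              operations:
                - operation: set
                  key: email
                  value_template: '{{ masking "email" .GetValue }}'
                - operation: delete
                  key: public_key
                  error_not_exist: true
                - operation: set
                  key: private_key
                  value_template: '{{ masking "default" .GetValue }}'
                - operation: set
                  key: newKey
                  value: newValue
```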

With the configuration shown above, pgstream transforms the hstore values in the attributes column of the users table as follows:

  1. Email masking: If the key "email" exists, its value is replaced with a masked version using the email masking function. If the key is missing, it's ignored, since the error_not_exist option is not set and defaults to false.
  2. Key deletion: The key-value pair with key "public_key" is deleted. If the key doesn't exist, an error is thrown, because error_not_exist is explicitly set to true.
  3. Value masking: The value for "private_key" is fully masked using the default masking function provided by go-masker, which is integrated into pgstream's templating engine.
  4. New key insertion: The key "newKey" is set to "newValue". Since error_not_exist defaults to false and the key doesn't exist in the example, a new key-value pair is added instead of raising an error.

Example input
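A hypothetical value for the attributes column before transformation, with keys matching the operations described above:

```
"email"=>"john.doe@example.com", "public_key"=>"pk-12345", "private_key"=>"secret-67890"
```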

Example output
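For an input containing the email, public_key, and private_key keys, the transformed value would look roughly like this: the email is masked, public_key is removed, private_key is fully masked, and newKey is added. The exact masked forms below are illustrative:

```
"email"=>"joh****e@example.com", "private_key"=>"************", "newKey"=>"newValue"
```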

See the docs for more details.

šŸ‘„ Roles snapshotting

As mentioned at the start of this article, this release brings several improvements to the snapshot process in pgstream. One of the key enhancements is the introduction of roles snapshotting for Postgres targets, which leverages pg_dumpall to capture and recreate relevant roles.

With this update, pgstream will now automatically create all necessary roles on the target, preserving their privileges. This applies only to roles that are directly associated with the schemas or tables being snapshotted, e.g. through ownership or granted privileges.

Roles snapshotting is enabled by default, but it can be configured in the snapshot settings. If needed, you can disable it entirely, or set it to no_passwords mode. In the latter case, pg_dumpall will be invoked with the --no-role-passwords flag, ensuring that role passwords are excluded from the dump.

Example config:
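A sketch of what the snapshot configuration might look like. The surrounding keys and the exact option name for the roles mode are illustrative, so refer to the snapshot documentation for the precise schema:

```yaml
# Hypothetical sketch of a snapshot configuration enabling the
# no_passwords roles mode; option names are illustrative.
source:
  postgres:
    url: "postgres://user:password@localhost:5432/sourcedb"
    mode: snapshot
    snapshot:
      schema:
        mode: pgdump_pgrestore
        pgdump_pgrestore:
          # Roles snapshotting is enabled by default; set this to
          # "disabled" to skip roles entirely, or "no_passwords" to
          # invoke pg_dumpall with --no-role-passwords.
          role_snapshot_mode: no_passwords
```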

šŸ“ø excluded_tables option for snapshots

Another snapshot improvement that comes with this release is the excluded_tables option. When using the excluded_tables option alongside tables, the two act like a set difference: all tables and schemas specified in tables will be included in the snapshot except those listed in excluded_tables. This improvement makes snapshot configuration more concise and manageable, as you no longer need to explicitly list every table to be included, just define the broader set in tables and exclude the few you don't want.

Example config:
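A sketch of a snapshot configuration using excluded_tables. The tables and excluded_tables options come from this release; the wildcard syntax and surrounding keys are illustrative, so check the snapshot documentation for the exact schema:

```yaml
# Hypothetical sketch: snapshot every table in the public schema
# except public.test2.
source:
  postgres:
    url: "postgres://user:password@localhost:5432/sourcedb"
    mode: snapshot
    snapshot:
      # The broad set of tables to include...
      tables: ["public.*"]
      # ...minus the ones to skip.
      excluded_tables: ["public.test2"]
```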

With this config, pgstream will snapshot all the tables in the public schema, except for test2.

āŒØļø CLI improvements

We’ve also made several improvements to the pgstream CLI in this release:

  • The pgstream run command now supports the --init flag, allowing you to initialize and run pgstream in a single step.
  • The init command is now idempotent: if pgstream is already initialized, re-running it will have no effect and won’t result in errors.
  • The old tear-down command has been deprecated in favor of destroy. The new pgstream destroy performs the same cleanup actions as tear-down previously did.
  • Lastly, we've added a --dump-file option to both the snapshot and run commands. This lets you specify a file path where the pg_dump output will be saved for debugging purposes.

šŸ’œ Community requests

We appreciate community suggestions and requests, and we look forward to receiving even more of them!

Conclusion

With the latest features discussed in this blog post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.

If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests! šŸ’œ

Ready to get started? Check out the pgstream documentation for more details.
