pgstream v0.7.1: JSON transformers, progress tracking and wildcard support for snapshots
Learn how pgstream v0.7.1 transforms JSON data and improves the snapshot experience with progress tracking and wildcard support.
Author
Ahmet Gedemenli
We're proud to announce the release of v0.7.1 of pgstream, our open-source CDC (Change Data Capture) tool for Postgres! 🚀 In this blog post, we'll discuss some of the key features packed into this latest release, and look at what the future holds!
You can see the latest updates on the pgstream releases page on GitHub.
What is pgstream?
pgstream is an open source CDC tool and library that offers Postgres replication support with DDL changes. Some of its key features include:
- Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
- Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. However, it can also easily integrate with Kafka for more complex use cases.
- Out of the box supported targets:
  - Postgres: Replication to Postgres databases with support for schema changes and batch processing.
  - Elasticsearch/Opensearch: Replication to search stores with special handling of field IDs to minimize re-indexing.
  - Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
- Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
- Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.
For more details on how pgstream works under the hood, check out the full documentation.
What's new?
This update focuses on improving the usability of pgstream by adding a JSON transformer for increased flexibility, along with progress tracking and wildcard support for a better snapshot experience.
🔐 JSON transformer
After introducing many useful transformers in previous releases, we're continuing to expand pgstream's transformation capabilities in this release by adding the JSON transformer, which executes a list of given operations on the JSON data to be transformed.
The JSON transformer is compatible with the Postgres data types json and jsonb. It is powered by the open source library sjson, and all set and delete operations must follow sjson's syntax rules.
The JSON transformer supports a variety of useful functions:
- Use .GetValue to refer to the value at the specified path.
- Use .GetDynamicValue "<column_name>" to refer to some other column value.
- Standard Go template functions.
- greenmask's huge set of core functions, including the masking function by go-masker and various random data generator functions powered by the open source library faker.
- sprig's many useful helper functions.
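To make the two accessors more concrete, here is a minimal sketch of how templates that call .GetValue and .GetDynamicValue can be evaluated with Go's standard text/template package. The templateContext type and the render helper are hypothetical names introduced for illustration only; they are not pgstream's actual types, and the greenmask and sprig functions are not registered here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// templateContext is a hypothetical stand-in for the data a value template
// is executed against. It is NOT pgstream's real type.
type templateContext struct {
	value   any            // value found at the path being transformed
	columns map[string]any // other column values of the same row
}

// GetValue returns the value currently being transformed.
func (c templateContext) GetValue() any { return c.value }

// GetDynamicValue returns the value of another column in the same row.
func (c templateContext) GetDynamicValue(name string) any { return c.columns[name] }

// render parses a template and executes it against the given context.
func render(tmpl string, ctx templateContext) (string, error) {
	t, err := template.New("value").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, ctx); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	ctx := templateContext{
		value:   "alice",
		columns: map[string]any{"lastname": "Smith"},
	}
	templates := []string{
		`{{ .GetValue }}`,
		`{{ .GetDynamicValue "lastname" }}`,
		// printf is one of the standard Go template functions.
		`{{ printf "%s-%s" .GetValue (.GetDynamicValue "lastname") }}`,
	}
	for _, tmpl := range templates {
		out, err := render(tmpl, ctx)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Exposing the accessors as methods on the template data means a template can combine the current value with other columns without any extra function registration.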
Let's take a closer look with an example transformation.
Example configuration
With the above config, pgstream transforms the JSON values in the column user_info_json of the table users by:
- First, traversing all the items in the array named "purchases" and, for each element, setting the value to "-" for the key "item".
- Then, deleting the object named "country" under the top-level object "address".
- Completely masking the "city" value under the object "address", using go-masker's default masking function, supported by pgstream's templating.
- Finally, setting the user's "lastname" after fetching it from another column named "lastname", using dynamic values support. This assumes such a column exists and holds the lastname for each user.
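The four steps can also be sketched in plain Go. This is an illustration of the effect of the operations, not pgstream's implementation: it swaps sjson for the standard encoding/json package, uses a hypothetical transformUserInfo function, replaces go-masker's default masking with a simple stand-in that turns every character into "*", and runs on a made-up input document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// transformUserInfo is a hypothetical helper that applies the four steps to a
// JSON document. lastname stands in for the value of the separate "lastname"
// column that the dynamic value would be fetched from.
func transformUserInfo(raw []byte, lastname string) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}

	// 1. Set "item" to "-" for every element of the "purchases" array.
	if purchases, ok := doc["purchases"].([]any); ok {
		for _, p := range purchases {
			if obj, ok := p.(map[string]any); ok {
				obj["item"] = "-"
			}
		}
	}

	if address, ok := doc["address"].(map[string]any); ok {
		// 2. Delete "country" under the top-level "address" object.
		delete(address, "country")

		// 3. Mask "city". Assumption: one "*" per character; go-masker's
		// default masking may produce different output.
		if city, ok := address["city"].(string); ok {
			address["city"] = strings.Repeat("*", len([]rune(city)))
		}
	}

	// 4. Set "lastname" from the other column's value.
	doc["lastname"] = lastname

	return json.Marshal(doc)
}

func main() {
	// Made-up input, not the example input from the original post.
	input := []byte(`{
		"firstname": "john",
		"lastname": "unknown",
		"address": {"city": "Some City", "country": "Some Country"},
		"purchases": [{"item": "book", "price": 10}, {"item": "pen", "price": 2}]
	}`)

	out, err := transformUserInfo(input, "doe")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

One difference worth noting: re-marshalling a Go map writes keys in alphabetical order, so this sketch does not preserve the original key order of the document.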
Example input
Example output
See the docs for more details.
🔭 Snapshot progress tracking
As mentioned at the beginning of this article, this release includes some improvements to the snapshot process. To make snapshotting with pgstream a better experience, we have introduced a progress bar for snapshots, using the open source library progressbar. We love open source projects 💜

*️⃣ Wildcard support for snapshots
Another snapshot improvement that comes with pgstream v0.7.1 is wildcard support for snapshot schema names. We already had support for wildcard table names, e.g. public.*, but with this release we now have support for *.*.
With *.* configured, pgstream snapshots all the tables in all schemas, except for the pg_ schemas.
Please note that * actually means public.*, and that *.* should be used to refer to all tables in all schemas. For now, we do not support a wildcard schema name with an actual table name, e.g. *.table_1.
Conclusion
With the latest features discussed in this blog post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.
If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests! 💜
Ready to get started? Check out the pgstream documentation for more details.