pgstream v0.7.1: JSON transformers, progress tracking and wildcard support for snapshots

Learn how pgstream v0.7.1 transforms JSON data and improves the snapshot experience with progress tracking and wildcard support.

Author

Ahmet Gedemenli

We're proud to announce the release of v0.7.1 of pgstream, our open-source CDC (Change Data Capture) tool for Postgres! 🚀 In this blog post, we'll discuss some of the key features packed into this latest release, and look at what the future holds!

You can see the latest updates on the pgstream releases page on GitHub.

What is pgstream?

pgstream is an open source CDC tool and library that offers Postgres replication support with DDL changes. Some of its key features include:

  • Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
  • Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. It can also integrate with Kafka for more complex use cases.
  • Out of the box supported targets:
    • Postgres: Replication to Postgres databases with support for schema changes and batch processing.
    • Elasticsearch/OpenSearch: Replication to search stores with special handling of field IDs to minimize re-indexing.
    • Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
  • Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
  • Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.

For more details on how pgstream works under the hood, check out the full documentation.

What's new?

This update focuses on improving the usability of pgstream. It adds a JSON transformer for increased flexibility, along with progress tracking and wildcard support for a better snapshot experience.

🔐 JSON transformer

After introducing many useful transformers in previous releases, we're continuing to expand the transformation capabilities of pgstream in this release by adding the JSON transformer, which executes a list of given operations on the JSON data to be transformed.

The JSON transformer is compatible with the Postgres data types json and jsonb. It is powered by the open source library sjson, so all set and delete operations must follow the syntax rules of sjson.

JSON transformer supports a variety of useful functions:

  • Use .GetValue to refer to the value at the specified path.
  • Use .GetDynamicValue "<column_name>" to refer to some other column value.
  • Standard Go template functions
  • greenmask's huge set of core functions, including masking functions by go-masker and various random data generator functions powered by the open source library faker.
  • sprig's many useful helper functions.

Let's take a closer look with an example transformation.

Example configuration

With the above config, pgstream transforms the JSON values in the column user_info_json of the table users by:

  • First, traversing all the items in the array named "purchases" and, for each element, setting the value of the key "item" to "-".
  • Then, deleting the object named "country" under the top-level object "address".
  • Next, completely masking the "city" value under the object "address", using go-masker's default masking function, which is supported by pgstream's templating.
  • Finally, setting the user's "lastname" to the value fetched from another column named "lastname", using dynamic values support. This assumes the table has such a column holding the user's last name.
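
The four steps above can be sketched in plain Python to make the order of operations concrete. This is only an illustration of the described behavior, not pgstream's implementation: pgstream is configured declaratively and relies on sjson. The sample row, the `mask` helper (a stand-in whose output may differ from go-masker's), and the top-level placement of "lastname" are all assumptions made for this example.

```python
import copy


def mask(value: str) -> str:
    """Hypothetical stand-in for a masking function: hide every character."""
    return "*" * len(value)


def transform_user_info(user_info: dict, row: dict) -> dict:
    """Apply the four described operations, in order, to a copy of user_info."""
    doc = copy.deepcopy(user_info)

    # 1. Set "item" to "-" for every element of the "purchases" array.
    for purchase in doc.get("purchases", []):
        purchase["item"] = "-"

    # 2. Delete "country" under the top-level "address" object.
    address = doc.get("address", {})
    address.pop("country", None)

    # 3. Mask the "city" value under "address".
    if "city" in address:
        address["city"] = mask(address["city"])

    # 4. Set "lastname" from another column of the same row
    #    (placing it at the top level is an assumption).
    doc["lastname"] = row["lastname"]
    return doc


# Hypothetical row: the JSON column plus a separate "lastname" column.
row = {
    "lastname": "Doe",
    "user_info_json": {
        "purchases": [{"item": "book", "price": 10}, {"item": "pen", "price": 2}],
        "address": {"city": "Berlin", "country": "Germany"},
    },
}
result = transform_user_info(row["user_info_json"], row)
```

The sketch works on a deep copy, so the source row is left untouched, which mirrors the idea that transformations affect only what is replicated or snapshotted downstream.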

Example input

Example output

See the docs for more details.

🔭 Snapshot progress tracking

As we mentioned at the beginning of this article, this release includes some improvements for the snapshot process. In order to make snapshotting with pgstream a better experience, we have introduced a progress tracking bar for snapshots, making use of the open source library progressbar. We love open source projects 💜

[Image: snapshot progress demo]

*️⃣ Wildcard support for snapshots

Another snapshot improvement that comes with pgstream v0.7.1 is wildcard support for snapshot schema names. We already supported wildcard table names, e.g. public.*, and with this release we now support *.* as well.

With this config, pgstream will snapshot all the tables in all schemas, except for the pg_ schemas.

Please note that * on its own means public.*; to refer to all tables in all schemas, use *.* instead. For now, we do not support a wildcard schema name combined with an actual table name, e.g. *.table_1.
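
To make these rules concrete, here is a minimal Python sketch of how such patterns could be expanded against a list of schemas and tables. It is not pgstream's code: the function name `resolve_snapshot_tables`, the catalog layout, and the choice to default any schema-less pattern to public are assumptions for illustration.

```python
def resolve_snapshot_tables(pattern: str, catalog: dict[str, list[str]]) -> list[str]:
    """Expand a snapshot table pattern against a {schema: [tables]} catalog."""
    # "*" alone means "public.*"; this sketch assumes any pattern without
    # a schema part defaults to the public schema.
    schema, _, table = pattern.rpartition(".")
    if not schema:
        schema = "public"

    if schema == "*" and table != "*":
        # A wildcard schema with an actual table name is not supported.
        raise ValueError(f"unsupported pattern: {pattern}")

    if schema == "*":
        # All schemas except the pg_ ones.
        schemas = [s for s in catalog if not s.startswith("pg_")]
    else:
        schemas = [schema] if schema in catalog else []

    tables = []
    for s in schemas:
        for t in catalog[s]:
            if table == "*" or table == t:
                tables.append(f"{s}.{t}")
    return tables


# Hypothetical database layout used only for this example.
catalog = {
    "public": ["users", "orders"],
    "sales": ["invoices"],
    "pg_catalog": ["pg_class"],
}
all_tables = resolve_snapshot_tables("*.*", catalog)
public_tables = resolve_snapshot_tables("*", catalog)
```

Here `all_tables` covers the public and sales schemas but skips pg_catalog, while `public_tables` is limited to the public schema, matching the distinction between *.* and * described above.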

Conclusion

With the latest features discussed in this blog post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.

If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests! 💜

Ready to get started? Check out the pgstream documentation for more details.
