pgstream v0.8.1: hstore transformer, roles snapshotting, CLI improvements and more

Learn how pgstream v0.8.1 transforms hstore data and improves snapshot experience with roles snapshotting and excluded tables option

Author

Ahmet Gedemenli

Date published

We're proud to announce the release of v0.8.1 of pgstream, our open-source CDC (Change Data Capture) tool for Postgres! šŸš€ Let’s explore some of the standout features we’ve rolled out since the last update!

You can find changelog details on the pgstream releases Github page.

What is pgstream?

pgstream is an open source CDC tool and library that offers Postgres replication support with DDL changes. Some of its key features include:

  • Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
  • Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. However, it can also easily integrate with Kafka for more complex use cases.
  • Out of the box supported targets:
    • Postgres: Replication to Postgres databases with support for schema changes and batch processing.
    • Elasticsearch/Opensearch: Replication to search stores with special handling of field IDs to minimize re-indexing.
    • Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
  • Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
  • Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.

For more details on how pgstream works under the hood, check out the full documentation.

What's new?

This release enhances pgstream’s usability with a new hstore transformer for greater flexibility and role snapshotting powered by pg_dumpall for a smoother snapshot experience. It also includes numerous fixes and improvements inspired by feedback from the pgstream community šŸ’œ.

šŸ” hstore transformer

Building on the many transformers introduced in earlier releases, this update expands pgstream’s transformation capabilities with the new hstore transformer. Requested by members of the pgstream community, it applies a given list of set and delete operations to the hstore data being transformed.

The hstore transformer is compatible with Postgres data type hstore, and like our template and json transformers, it supports a variety of useful template functions:

  • Use .GetValue to refer to the value for the specified key.
  • Use .GetDynamicValue "<column_name>" to refer to another column's value.
  • Standard Go template functions.
  • greenmask's large set of core functions, including the masking function provided by go-masker and various random data generator functions powered by the open source library faker.
  • sprig's many useful helper functions.

Let's take a closer look with an example transformation.

Example configuration
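As a sketch, a transformer setup matching the steps described below might look like this. The top-level keys and exact field names are illustrative, so consult the pgstream transformers documentation for the precise schema:

```yaml
# Hypothetical sketch of an hstore transformer configuration for the
# attributes column of the users table. Field names mirror the
# operations described below; check the docs for the exact schema.
modifiers:
  transformations:
    table_transformers:
      - schema: public
        table: users
        column_transformers:
          attributes:
            name: hstore
            parameters:
              operations:
                - operation: set
                  key: email
                  value_template: '{{ masking "email" .GetValue }}'
                - operation: delete
                  key: public_key
                  error_not_exist: true
                - operation: set
                  key: private_key
                  value_template: '{{ masking "default" .GetValue }}'
                - operation: set
                  key: newKey
                  value: newValue
```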

With the configuration shown above, pgstream transforms the hstore values in the attributes column of the users table as follows:

  1. Email masking: If the key "email" exists, its value is replaced with a masked version using the email masking function. If the key is missing, it's ignored, since the error_not_exist option is not set and defaults to false.
  2. Key deletion: The key-value pair with key "public_key" is deleted. If the key doesn't exist, an error is thrown, because error_not_exist is explicitly set to true.
  3. Value masking: The value for "private_key" is fully masked using the default masking function provided by go-masker, which is integrated into pgstream's templating engine.
  4. New key insertion: The key "newKey" is set to "newValue". Since error_not_exist defaults to false and the key doesn't exist in the example, a new key-value pair is added instead of raising an error.

Example input
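A hypothetical value for the attributes column before transformation, with keys matching the operations described above:

```
"email"=>"john.doe@example.com", "public_key"=>"pk-12345", "private_key"=>"secret-67890"
```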

Example output
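For an input containing the email, public_key, and private_key keys, the transformed value would look roughly like this: the email is masked, public_key is removed, private_key is fully masked, and newKey is added. The exact masked forms below are illustrative:

```
"email"=>"joh****e@example.com", "private_key"=>"************", "newKey"=>"newValue"
```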

See the docs for more details.

šŸ‘„ Roles snapshotting

As mentioned at the start of this article, this release brings several improvements to the snapshot process in pgstream. One of the key enhancements is the introduction of roles snapshotting for Postgres targets, which leverages pg_dumpall to capture and recreate relevant roles.

With this update, pgstream will now automatically create all necessary roles on the target, preserving their privileges. This applies only to roles that are directly associated with the schemas or tables being snapshotted, e.g. through ownership or granted privileges.

Roles snapshotting is enabled by default, but it can be configured in the snapshot settings. If needed, you can disable it entirely, or set it to no_passwords mode. In the latter case, pg_dumpall will be invoked with the --no-role-passwords flag, ensuring that role passwords are excluded from the dump.

Example config:
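A sketch of what the snapshot configuration might look like. The surrounding keys and the exact option name for the roles mode are illustrative, so refer to the snapshot documentation for the precise schema:

```yaml
# Hypothetical sketch of a snapshot configuration enabling the
# no_passwords roles mode; option names are illustrative.
source:
  postgres:
    url: "postgres://user:password@localhost:5432/sourcedb"
    mode: snapshot
    snapshot:
      schema:
        mode: pgdump_pgrestore
        pgdump_pgrestore:
          # Roles snapshotting is enabled by default; set this to
          # "disabled" to skip roles entirely, or "no_passwords" to
          # invoke pg_dumpall with --no-role-passwords.
          role_snapshot_mode: no_passwords
```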

šŸ“ø excluded_tables option for snapshots

Another snapshot improvement that comes with this release is the excluded_tables option. When using the excluded_tables option alongside tables, the two act like a set difference: all tables and schemas specified in tables will be included in the snapshot except those listed in excluded_tables. This improvement makes snapshot configuration more concise and manageable, as you no longer need to explicitly list every table to be included, just define the broader set in tables and exclude the few you don't want.

Example config:
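A sketch of a snapshot configuration using excluded_tables. The tables and excluded_tables options come from this release; the wildcard syntax and surrounding keys are illustrative, so check the snapshot documentation for the exact schema:

```yaml
# Hypothetical sketch: snapshot every table in the public schema
# except public.test2.
source:
  postgres:
    url: "postgres://user:password@localhost:5432/sourcedb"
    mode: snapshot
    snapshot:
      # The broad set of tables to include...
      tables: ["public.*"]
      # ...minus the ones to skip.
      excluded_tables: ["public.test2"]
```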

With this config, pgstream will snapshot all the tables in the public schema, except for test2.

āŒØļø CLI improvements

We’ve also made several improvements to the pgstream CLI in this release:

  • The pgstream run command now supports the --init flag, allowing you to initialize and run pgstream in a single step.
  • The init command is now idempotent: if pgstream is already initialized, re-running it will have no effect and won’t result in errors.
  • The old tear-down command has been deprecated in favor of destroy. The new pgstream destroy performs the same cleanup actions as tear-down previously did.
  • Lastly, we've added a --dump-file option to both the snapshot and run commands. This lets you specify a file path where the pg_dump output will be saved for debugging purposes.

šŸ’œ Community requests

We appreciate community suggestions and requests, and we look forward to receiving even more of them!

Conclusion

With the latest features discussed in this blog post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.

If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests! šŸ’œ

Ready to get started? Check out the pgstream documentation for more details.
