pgstream v0.8.1: hstore transformer, roles snapshotting, CLI improvements and more
Learn how pgstream v0.8.1 transforms hstore data and improves the snapshot experience with roles snapshotting and an excluded tables option
Author
Ahmet Gedemenli
We're proud to announce the release of v0.8.1 of pgstream, our open-source CDC (Change Data Capture) tool for Postgres! Let's explore some of the standout features we've rolled out since the last update!
You can find the changelog details on the pgstream releases page on GitHub.
What is pgstream?
pgstream is an open source CDC tool and library that offers Postgres replication support with DDL changes. Some of its key features include:
- Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
- Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. However, it can also easily integrate with Kafka for more complex use cases.
- Out of the box supported targets:
  - Postgres: Replication to Postgres databases with support for schema changes and batch processing.
  - Elasticsearch/Opensearch: Replication to search stores with special handling of field IDs to minimize re-indexing.
  - Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
- Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
- Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.
For more details on how pgstream works under the hood, check out the full documentation.
What's new?
This release enhances pgstream's usability with a new hstore transformer for greater flexibility and role snapshotting powered by pg_dumpall for a smoother snapshot experience. It also includes numerous fixes and improvements inspired by feedback from the pgstream community.
hstore transformer
Building on the many transformers introduced in earlier releases, this update expands pgstream's transformation capabilities with the new hstore transformer. Requested by members of the pgstream community, it applies a given list of set and delete operations to the hstore data to be transformed.
The hstore transformer is compatible with the Postgres data type hstore, and like our template and json transformers, it supports a variety of useful template functions:
- Use .GetValue to refer to the value for the specified key.
- Use .GetDynamicValue "<column_name>" to refer to some other column value.
- Standard Go template functions
- greenmask's large set of core functions, including the masking function by go-masker and various random data generator functions powered by the open source library faker.
- sprig's many useful helper functions.
Let's take a closer look with an example transformation.
Example configuration
With the configuration shown above, pgstream transforms the hstore values in the attributes column of the users table as follows:
- Email masking: If the key "email" exists, its value is replaced with a masked version using the email masking function. If the key is missing, it's ignored, since the error_not_exist option is not given and it defaults to false.
- Key deletion: The key-value pair with key "public_key" is deleted. If the key doesn't exist, an error is thrown, because error_not_exist is explicitly set to true.
- Value masking: The value for "private_key" is fully masked using the default masking function provided by go-masker, which is integrated into pgstream's templating engine.
- New key insertion: The key "newKey" is updated to "newValue". Since error_not_exist is false by default, and the key doesn't exist in the example, a new key-value pair is added.
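The four behaviours listed above can be illustrated with a small Python sketch. This is not pgstream's implementation (pgstream is written in Go and evaluates Go templates): the function name `apply_hstore_operations`, the operation dictionaries, and the two masking helpers are hypothetical stand-ins, and a Python callable plays the role of a template that reads the current value via .GetValue. The sample row is made up for illustration.

```python
def mask_all(value):
    """Hypothetical stand-in for a default masking function: hide every character."""
    return "*" * len(value)


def mask_email(value):
    """Hypothetical stand-in for an email masking function: keep first letter and domain."""
    local, _, domain = value.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def apply_hstore_operations(hstore, operations):
    """Apply set/delete operations to a copy of an hstore-like dict."""
    result = dict(hstore)
    for op in operations:
        key = op["key"]
        # Assumption taken from the text: error_not_exist defaults to False.
        if key not in result and op.get("error_not_exist", False):
            raise KeyError(f"key {key!r} does not exist")
        if op["operation"] == "delete":
            result.pop(key, None)
        elif op["operation"] == "set":
            value = op["value"]
            if callable(value):
                # Value derived from the existing one: skipped if the key is missing.
                if key in result:
                    result[key] = value(result[key])
            else:
                # Literal value: updates the key, or inserts it if missing.
                result[key] = value
        else:
            raise ValueError(f"unsupported operation: {op['operation']!r}")
    return result


OPERATIONS = [
    {"operation": "set", "key": "email", "value": mask_email},
    {"operation": "delete", "key": "public_key", "error_not_exist": True},
    {"operation": "set", "key": "private_key", "value": mask_all},
    {"operation": "set", "key": "newKey", "value": "newValue"},
]

row = {"email": "jane@example.com", "public_key": "abc", "private_key": "secret"}
print(apply_hstore_operations(row, OPERATIONS))
```

Treating a missing key as "skip" for derived values and "insert" for literal values is one reading of the behaviour described above; the docs remain the authority on the exact semantics.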
Example input
Example output
See the docs for more details.
Roles snapshotting
As mentioned at the start of this article, this release brings several improvements to the snapshot process in pgstream. One of the key enhancements is the introduction of roles snapshotting for Postgres targets, which leverages pg_dumpall to capture and recreate relevant roles.
With this update, pgstream will now automatically create all necessary roles on the target, preserving their privileges. This applies only to roles that are directly associated with the schemas or tables being snapshotted, e.g. through ownership or granted privileges.
Roles snapshotting is enabled by default, but it can be configured in the snapshot settings. If needed, you can disable it entirely, or set it to no_passwords mode. In the latter case, pg_dumpall will be invoked with the --no-role-passwords flag, ensuring that role passwords are excluded from the dump.
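To make the three modes concrete, here is a rough Python sketch of how a roles mode could map onto a pg_dumpall invocation. It is an illustration only, not how pgstream builds its command. The function name and the mode names "enabled" and "disabled" are hypothetical (only no_passwords is named above), and the use of --roles-only is an assumption. All three flags shown are real pg_dumpall options.

```python
from typing import List, Optional


def roles_dump_command(mode: str, connection_url: str) -> Optional[List[str]]:
    """Build a pg_dumpall argument list for dumping roles, or None when disabled."""
    if mode == "disabled":
        # Roles snapshotting turned off: nothing to run.
        return None
    if mode not in ("enabled", "no_passwords"):
        raise ValueError(f"unknown roles snapshot mode: {mode!r}")
    # Assumption: only role definitions are dumped, hence --roles-only.
    command = ["pg_dumpall", "--roles-only", f"--dbname={connection_url}"]
    if mode == "no_passwords":
        # Exclude role passwords from the dump, as described in the text.
        command.append("--no-role-passwords")
    return command


print(roles_dump_command("no_passwords", "postgres://localhost:5432/source"))
```

The sketch does not model the filtering step that limits the dump to roles associated with the snapshotted schemas and tables.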
Example config:
excluded_tables option for snapshots
Another snapshot improvement that comes with this release is the excluded_tables option. When using the excluded_tables option alongside tables, the two act like a set difference: all tables and schemas specified in tables will be included in the snapshot except those listed in excluded_tables. This improvement makes snapshot configuration more concise and manageable, as you no longer need to explicitly list every table to be included, just define the broader set in tables and exclude the few you don't want.
Example config:
With this config, pgstream will snapshot all the tables in the public schema, except for test2.
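The set-difference behaviour can be sketched in a few lines of Python. This is an illustration under stated assumptions, not pgstream's code. The function name `select_tables`, the schema-qualified naming, the "public.*" wildcard syntax, and the table names other than test2 are all hypothetical.

```python
from fnmatch import fnmatchcase


def select_tables(available, tables, excluded_tables=()):
    """Return the tables matching `tables` minus those matching `excluded_tables`."""

    def matches(name, patterns):
        # Shell-style matching, so "public.*" covers every table in the schema.
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    return [
        name
        for name in available
        if matches(name, tables) and not matches(name, excluded_tables)
    ]


available = ["public.test1", "public.test2", "public.users", "audit.events"]
print(select_tables(available, ["public.*"], ["public.test2"]))
```

With these hypothetical inputs, the broad public.* pattern selects the schema and the single exclusion removes test2, so there is no need to list the remaining tables one by one.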
CLI improvements
We've also made several improvements to the pgstream CLI in this release:
- The pgstream run command now supports the --init flag, allowing you to initialize and run pgstream in a single step.
- The init command is now idempotent: if pgstream is already initialized, re-running it will have no effect and won't result in errors.
- The old tear-down command has been deprecated in favor of destroy. The new pgstream destroy performs the same cleanup actions as tear-down previously did.
- Lastly, we've added a --dump-file option to both the snapshot and run commands. This lets you specify a file path where the pg_dump output will be saved for debugging purposes.
Community requests
- We've added support for the citext type in the neosync_email transformer, as requested by a pgstream community member. You can now seamlessly transform email columns that use the citext data type! See the docs for more details!
- We've addressed many fix requests from the pgstream community, such as the replication tables fix and a fix for snapshotting timestamps with infinity values.
We appreciate community suggestions and requests, and we look forward to receiving even more of them!
Conclusion
With the latest features discussed in this blog post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.
If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests!
Ready to get started? Check out the pgstream documentation for more details.