pgstream v0.5.0 update
Improved user experience with new transformers, YAML configuration, CLI refactoring and table filtering.
Author: Esther Minano Sanz
We're proud to announce the release of v0.5 of pgstream, our open-source CDC tool for Postgres! 🚀 In this blog post, we'll dive into some of the key features packed into this latest release, and look at what the future holds!
You can find the complete release notes on the GitHub v0.5.0 release page.
What is pgstream?
pgstream is an open source CDC (Change Data Capture) tool and library that offers Postgres replication support with DDL changes. Some of its key features include:
- Replication of DDL changes: schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
- Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. However, it can also easily integrate with Kafka for more complex use cases.
- Out of the box supported targets:
- Postgres: replication to Postgres databases with support for schema changes and batch processing.
- Elasticsearch/Opensearch: replication to search stores with special handling of field IDs to minimise re-indexing.
- Webhooks: subscribe and receive webhook notifications whenever your source data changes.
- Snapshots: capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
- Column transformations: modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.
For more details on how pgstream works under the hood, check out the full documentation.
What's new?
This update focuses on improving the usability of pgstream, from adding new column transformers for added flexibility, to simplifying configuration management by introducing YAML support, and refining the CLI experience. Also, table filtering is finally here! Let's take a look at the main new features in detail.
🔐 Advanced data transformations
After the introduction of transformers in v0.4, in this release we continue the work towards improving the transformation capabilities of pgstream. Masking, phone number and literal transformers, dynamic parameter support, and transformation rules validation are now available. Let's dive a bit deeper into some of these new transformation features!
Masking
Instead of producing random or realistic data to anonymize sensitive information, you can now simply mask the data, or parts of it. Powered by the go-masker library, it comes with a predefined set of masking functions (password, name, address, email, mobile, telephone, id, credit_card, url), while also offering a custom function in which the user can define the level of masking/unmasking by providing either indexes or percentages (useful when fields are variable in length).
Example masking rules:
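For illustration, masking rules embedded in the transformation config might look like the sketch below. The key names are assumptions based on the transformer names above, not pgstream's exact schema; check the documentation for the real layout.

```yaml
# Illustrative only: key names are assumptions, not pgstream's exact schema.
transformations:
  table_transformers:
    - schema: public
      table: users
      column_transformers:
        email:
          name: masking
          parameters:
            type: email        # predefined masking function
        password:
          name: masking
          parameters:
            type: password     # predefined masking function
        api_token:
          name: masking
          parameters:
            type: custom       # custom masking; hypothetical bound parameters
            mask_begin: "25%"
            mask_end: "75%"
```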
Dynamic parameter support
Supported transformers can now use dynamic parameters, which allows them to define the transformation rules based on the values of different columns in the same row. This is particularly useful for complex transformations that depend on multiple fields.
In the following example, we have a users table with mobile_number and country_code columns. The phone number transformer will use the value of the country_code column to determine the prefix for the randomly generated mobile phone number.
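A sketch of what that rule could look like; the key names are illustrative assumptions, with the real schema documented in the pgstream docs:

```yaml
# Illustrative only: the phone number transformer takes its prefix
# per row from the country_code column of the same table.
transformations:
  table_transformers:
    - schema: public
      table: users
      column_transformers:
        mobile_number:
          name: phone_number
          parameters:
            prefix:
              dynamic_column: country_code   # resolved row by row
```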
Transformation rules validation
In order to ensure you don't accidentally forget to add a transformation rule for a column, which could lead to sensitive data leaks, pgstream transformation rules now expose a validation mode setting. The validation mode can be set to strict, relaxed or table_level.
- relaxed mode, which is the default, only validates the provided transformations, ensuring the configured transformers are compatible with the table column data types.
- strict mode checks the transformation rules against the source table schema and enforces the explicit mention of all columns. Not every column needs a transformation applied (this can be bypassed by using a noop transformer or leaving it unset), but each one must be explicitly mentioned in the configuration.
- table_level mode means validation is evaluated on a per-table basis, allowing you to use different validation modes for different tables.
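As a sketch (key names are illustrative assumptions), strict mode might be configured like this, with every column of the table listed even when left untransformed:

```yaml
# Illustrative only: strict validation requires every column to be mentioned.
transformations:
  validation_mode: strict
  table_transformers:
    - schema: public
      table: users
      column_transformers:
        email:
          name: masking
          parameters:
            type: email
        id: {}          # explicitly listed, no transformer applied (noop)
        created_at: {}  # same: mentioned so strict validation passes
```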
For more details on the new transformers, check out the supported transformers section in the pgstream documentation.
For more details on how to set up and use transformers with pgstream, check out the transformers tutorial.
📜 YAML configuration
In this release, we have added support for YAML configuration files. This allows you to define the pgstream configuration in a more human-readable format, making it easier to manage and share your configurations. The transformation rules are embedded into the same configuration file, simplifying the configuration setup. Environment variables are still supported, but cannot be combined with the YAML configuration.
For more details on how to set up and use YAML configuration files with pgstream, check out the configuration documentation.
🧰 Command-Line Interface (CLI) Refactoring
We decided to spend a bit of time on the CLI, and refactor it to improve the user experience. The new CLI is more intuitive and user-friendly, making it easier to configure and run pgstream.
- Flags have been added to all commands, removing the need to provide a configuration file. This allows you to quickly set up pgstream without needing to create a configuration file, making it easier to get started. It relies on default values for most of the configuration.
- The snapshot command is now separate from the run replication command. This allows you to run snapshots independently of replication, making it more straightforward to manage your snapshot workflows (e.g. running a snapshot as a nightly job).
- A status command has been added to validate the pgstream configuration and initialisation.
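Put together, a session with the refactored CLI could look roughly like the following. The flag names are assumptions for illustration, so consult `pgstream --help` for the real ones:

```sh
# Illustrative only: flag names are assumptions.
# One-off snapshot, e.g. as a nightly job:
pgstream snapshot --source "postgres://localhost:5432/source_db" \
                  --target "postgres://localhost:5432/target_db"

# Validate the configuration and initialisation:
pgstream status -c pgstream.yaml

# Start replication:
pgstream run -c pgstream.yaml
```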
For more details on how to use the new CLI, check out the usage documentation or our tutorials section.
🔍 Table level filtering
We recently received some community feedback requesting table level filtering. Up until now, the only way of achieving this was to use pgstream as a library. In this release we finally added this feature, allowing you to specify which tables to include or exclude from the replication process, giving you more control over the data that is replicated when using the CLI. You can provide the configuration as part of the modifiers section in the new YAML configuration file, or as part of the environment variables.
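In YAML, the filter could be sketched like this (key names are illustrative assumptions; see the configuration documentation for the exact schema):

```yaml
# Illustrative only: include/exclude tables under the modifiers section.
modifiers:
  filter:
    include_tables:
      - public.users
      - public.orders
    exclude_tables:
      - public.audit_log   # never replicated
```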
For more details about the new table filtering configuration, check out the configuration documentation.
Conclusion
With the latest features discussed in this blogpost, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.
If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests! 💜
Ready to get started? Check out the pgstream documentation for more details.
Related Posts
pgstream v0.4.0: Postgres-to-Postgres replication, snapshots & transformations
Learn how the latest features in pgstream refine Postgres replication with near real-time data capture, consistent snapshots, and column-level transformations.
Introducing pgstream: Postgres replication with DDL changes
Today we’re excited to expand our open source Postgres platform with pgstream, a CDC command line tool and library for PostgreSQL with replication support for DDL changes to any provided output.
Postgres Cafe: Solving schema replication gaps with pgstream
In this episode of Postgres Café, we discuss pgstream, an open-source tool for capturing and replicating schema and data changes in PostgreSQL. Learn how it solves schema replication challenges and enhances data pipelines.
Postgres webhooks with pgstream
A simple tutorial for calling webhooks on Postgres data and schema changes using pgstream.