pgstream v0.6.0: Template transformers, observability, and performance improvements
Learn how pgstream v0.6 simplifies complex data transformations with custom templates, enhances observability and improves snapshot performance.
Author: Ahmet Gedemenli
We're proud to announce the release of v0.6 of pgstream, our open-source CDC (Change Data Capture) tool for Postgres! 🚀 In this blog post, we'll discuss some of the key features packed into this latest release and look at what the future holds!
You can find the complete release notes on the GitHub v0.6.0 release page.
What is pgstream?
pgstream is an open source CDC tool and library that offers Postgres replication support with DDL changes. Some of its key features include:
- Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
- Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases without unnecessary complexity or deployment challenges. It can also easily integrate with Kafka for more complex use cases.
- Out-of-the-box supported targets:
  - Postgres: Replication to Postgres databases with support for schema changes and batch processing.
  - Elasticsearch/OpenSearch: Replication to search stores with special handling of field IDs to minimize re-indexing.
  - Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
- Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
- Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.
For more details on how pgstream works under the hood, check out the full documentation.
What's new?
This update focuses on improving the usability of pgstream by adding new column transformers, most notably template transformers for increased flexibility, and by adding instrumentation for better observability. This release also includes some performance improvements for the snapshot process, which we'll cover in more detail in a follow-up blog post. Stay tuned! 🏎️
🔐 More data transformations & template support
After introducing advanced transformers in v0.5, we're continuing to expand the transformation capabilities of pgstream in this release. New last name and full name transformers powered by Neosync are now available, along with template transformers built on open-source libraries. Let's take a closer look at some of these new transformation features.
Template support
Instead of directly using a transformation function to anonymize sensitive information, you can now write your own Go template with more complex logic. The template transformer is highly flexible and can be used with any Postgres type, as long as the template produces a value with the correct syntax for the column type's string representation. For example, the output can be "5-10-2021" for a date column, or "3.14159265" for a double precision one.
The template transformer supports a variety of useful functions:
- Use .GetValue to refer to the value being transformed.
- Use .GetDynamicValue "<column_name>" to refer to the value of another column.
- Standard Go template functions.
- greenmask's large set of core functions, including the masking function from go-masker and various random data generator functions powered by the open-source library faker.
- sprig's many useful helper functions.
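To make the two accessors concrete, here is a minimal Go sketch that uses only the standard library's text/template package. The Row type and its GetValue and GetDynamicValue methods are hypothetical stand-ins for whatever data pgstream actually passes to a template; the greenmask and sprig functions are not included.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Row is a hypothetical stand-in for the data pgstream hands to a template:
// the value being transformed plus the other column values of the same row.
type Row struct {
	value   any
	columns map[string]any
}

// GetValue returns the value of the column being transformed.
func (r Row) GetValue() any { return r.value }

// GetDynamicValue returns the value of another column in the same row.
func (r Row) GetDynamicValue(column string) any { return r.columns[column] }

// render parses a template and executes it against a single row.
func render(tmpl string, row Row) (string, error) {
	t, err := template.New("transformer").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, row); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	row := Row{
		value:   "Ada",
		columns: map[string]any{"last_name": "Lovelace"},
	}
	// A method call with an argument, written like .GetDynamicValue "<column_name>".
	out, err := render(`{{ .GetValue }} {{ .GetDynamicValue "last_name" }}`, row)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out) // Ada Lovelace
}
```

Because the template only produces text, the same mechanism works for any column type, provided the rendered string is valid input for that type.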
Consider a config where pgstream masks values in the email column of the users table using go-masker's email masking function. First, the template checks whether the email column has a non-empty value; if not, it falls back to another column named secondary_email. It then uses sprig's contains function to check whether the address is an @xata email. Finally, it masks the value only if it is not an @xata email, and passes it through unmasked otherwise.
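The following Go sketch approximates the template logic described above with the standard library's text/template. Several assumptions apply: the Row type is a hypothetical stand-in for pgstream's template data, contains is re-implemented locally with sprig's argument order (substring first), and maskEmail is a simplified, hypothetical replacement for go-masker's email masking, so its output format may differ from the real function.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Row is a hypothetical stand-in for the row data exposed to the template.
type Row struct {
	Value   string
	Columns map[string]string
}

func (r Row) GetValue() string                  { return r.Value }
func (r Row) GetDynamicValue(col string) string { return r.Columns[col] }

// contains mirrors sprig's argument order: contains SUBSTRING STRING.
func contains(substr, s string) bool { return strings.Contains(s, substr) }

// maskEmail is a hypothetical, simplified mask that keeps up to three
// characters of the local part; it is not go-masker's exact algorithm.
func maskEmail(email string) string {
	at := strings.LastIndex(email, "@")
	if at < 0 {
		return "****"
	}
	local, domain := []rune(email[:at]), email[at:]
	if len(local) > 3 {
		local = local[:3]
	}
	return string(local) + "****" + domain
}

// Fall back to secondary_email when email is empty, keep @xata addresses
// unchanged, and mask everything else.
const emailTemplate = `{{- $email := .GetValue -}}
{{- if not $email -}}{{- $email = .GetDynamicValue "secondary_email" -}}{{- end -}}
{{- if contains "@xata" $email -}}{{ $email }}{{- else -}}{{ maskEmail $email }}{{- end -}}`

var emailTmpl = template.Must(template.New("email").
	Funcs(template.FuncMap{"contains": contains, "maskEmail": maskEmail}).
	Parse(emailTemplate))

// transformEmail renders the template for one row.
func transformEmail(r Row) (string, error) {
	var sb strings.Builder
	if err := emailTmpl.Execute(&sb, r); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	rows := []Row{
		{Value: "alice@example.com"},
		{Value: "bob@xata.io"},
		{Value: "", Columns: map[string]string{"secondary_email": "carol@example.com"}},
	}
	for _, r := range rows {
		out, err := transformEmail(r)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(out)
	}
	// Output:
	// ali****@example.com
	// bob@xata.io
	// car****@example.com
}
```

The $email variable is declared once and reassigned inside the if block, so the fallback and the @xata check operate on whichever address was finally selected.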
We highly recommend exploring the open-source libraries and functions supported by the template transformer. For details, check out the template transformer docs.
Last name & full name transformers
With this release, we are expanding our large set of supported transformers with last name and full name transformers, powered by Neosync under the hood. Both transformers optionally let you configure the length and randomness of the generated values.
For example, given a users table with a full_name column, the full name transformer generates a random full name with the same length as the original when its preserve_length parameter is set to true.
For more details on the new transformers, check out the supported transformers section in the pgstream documentation.
To learn how to set up and use transformers with pgstream, check out the transformers tutorial.
🔭 Improved observability
As mentioned at the beginning of this post, this release includes performance improvements for the snapshot process. To find the bottlenecks and decide which parts to optimize, we invested some time in improving the existing instrumentation. Some of those improvements include:
- Observability can now be enabled from the CLI (previously this was only possible when using pgstream as a library)
- Any OpenTelemetry provider can be easily integrated into pgstream
- A SigNoz pgstream dashboard (because we love open source projects 💜) covering all the metrics collected internally
  - Sections for all relevant pgstream modules: snapshot, replication, target, modifiers, and runtime
Conclusion
With the features discussed in this post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.
If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests! 💜
Ready to get started? Check out the pgstream documentation for more details.