pgstream v0.6.0: Template transformers, observability, and performance improvements

Learn how pgstream v0.6 simplifies complex data transformations with custom templates, enhances observability and improves snapshot performance.

Author

Ahmet Gedemenli

We're proud to announce the release of v0.6 of pgstream, our open-source CDC (Change Data Capture) tool for Postgres! 🚀 In this blog post, we'll discuss some of the key features packed into this latest release, and look at what the future holds!

You can find the complete release notes on the GitHub v0.6.0 release page.

What is pgstream?

pgstream is an open source CDC tool and library that offers Postgres replication support with DDL changes. Some of its key features include:

  • Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
  • Modular deployment configuration: pgstream's modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. However, it can also easily integrate with Kafka for more complex use cases.
  • Out-of-the-box supported targets:
    • Postgres: Replication to Postgres databases with support for schema changes and batch processing.
    • Elasticsearch/OpenSearch: Replication to search stores with special handling of field IDs to minimize re-indexing.
    • Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
  • Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
  • Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.

For more details on how pgstream works under the hood, check out the full documentation.

What's new?

This update focuses on improving the usability of pgstream in two ways: new column transformers, most notably template transformers for increased flexibility, and added instrumentation for better observability. This release also includes some performance improvements for the snapshot process, which we'll go over in more detail in a follow-up blog post. Stay tuned! 🏎️

🔐 More data transformations & template support

After introducing advanced transformers in v0.5, we're continuing to expand pgstream's transformation capabilities in this release. New transformers for last name and full name, powered by Neosync, are now available, along with template transformers built on open source libraries. Let's take a closer look at some of these new transformation features.

Template support

Instead of directly using a transformation function to anonymize sensitive information, you can now create your own Go template with more complex logic. The template transformer is highly flexible and can be used for any Postgres type, as long as the given template produces a value with the correct syntax for the string representation of the column type. For example, the output can be "5-10-2021" for a date column, or "3.14159265" for a double precision one.

The template transformer supports a variety of useful functions.

  • Use .GetValue to refer to the value to be transformed.
  • Use .GetDynamicValue "<column_name>" to refer to some other column value.
  • Standard Go template functions
  • greenmask's large set of core functions, including masking functions from go-masker and various random data generator functions powered by the open source library faker.
  • sprig's many useful helper functions.

Consider a config in which pgstream masks values in the email column of the users table, using go-masker's email masking function. The template first checks whether the email column holds a non-empty value. If it does not, the template falls back to another column named secondary_email and uses that instead. It then checks whether the value is a @xata email, using sprig's contains function. Finally, it masks the value only if it is not a @xata email, and passes it through unmasked otherwise.

We highly recommend taking a look at the open source libraries and functions supported by the template transformer. Check out the template transformer docs.

Last name & full name transformers

With this release, we are expanding our large set of supported transformers by adding last name and full name transformers, powered by Neosync under the hood. Both transformers accept optional configuration for length and randomness.

For example, consider a users table with a full_name column. With the preserve_length parameter set to true, the full name transformer generates a random full name that has the same length as the original one.

For more details on the new transformers, check out the supported transformers section in the pgstream documentation.

For more details on how to set up and use transformers with pgstream, check out the transformers tutorial.

🔭 Improved observability

As we mentioned at the beginning of this article, this release includes some performance improvements for the snapshot process. In order to figure out where the bottlenecks were and what parts to optimize, we invested some time in improving the existing instrumentation. Some of those improvements include:

  • Enable observability from the CLI (previously only possible when pgstream was used as a library)
  • Easily integrate any OpenTelemetry provider into pgstream
  • SigNoz pgstream dashboard (because we love open source projects 💜) available with all metrics collected internally
    • Sections for all relevant pgstream modules, including snapshot, replication, target, modifiers and runtime sections

Conclusion

With the latest features discussed in this blog post, you can build robust, compliant, and efficient data workflows. Whether you're replicating data to downstream systems, anonymizing sensitive information, or creating snapshots, pgstream has the tools you need.

If you have any suggestions or questions, you can reach out to us on Discord or follow us on X / Twitter or Bluesky. We welcome any feedback in issues, or contributions via pull requests! 💜

Ready to get started? Check out the pgstream documentation for more details.
