pgstream v0.9.0: Better schema replication, snapshots and cloud support
Bringing connection retries, anonymizer features, memory improvements and solid community input.
Author: Esther Minano Sanz
We just shipped pgstream v0.9.0, and it comes packed with new features, memory improvements and some much appreciated contributions from the community. In this post we’ll go over the key things that landed since the last minor release. If you’re still on an older version, it’s time to upgrade!
ℹ️ You can find the complete release notes on the GitHub v0.9.0 release page.
Before we jump into the new stuff, here’s a quick reminder of what pgstream is.
What is pgstream?
pgstream is an open source change data capture (CDC) tool and library that offers Postgres replication support with DDL changes. Some of its key features include:
- Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
- Modular deployment configuration: pgstream’s modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. It can also integrate easily with Kafka for more complex workflows.
- Out-of-the-box supported targets:
  - Postgres: Replicate to Postgres databases with support for schema changes and batch processing.
  - Elasticsearch / OpenSearch: Replicate to search stores with special handling of field IDs to minimize reindexing.
  - Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
- Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
- Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.
For more details on how pgstream works under the hood, check out the full documentation.
What is new?
With this release, we focused on making pgstream easier to adopt and more production ready. Let’s look at the key changes driving that effort!
☁️ Cloud provider support
pgstream now integrates better with managed PostgreSQL services.
You can also find onboarding documentation that shows how to configure pgstream with the restricted privileges common in cloud environments.
🌳 Better schema replication
Schema replication got a big upgrade in this release! pgstream has always aimed to make it easy to track both data changes and schema changes in PostgreSQL, and it can now track more of the structure of your database, including indexes, constraints and foreign keys.
In addition to the enriched schema tracking, we also added support for capturing the schema when SELECT INTO and CREATE TABLE AS statements are used, which were ignored before. That meant tables created with those statements weren’t being replicated downstream… until now!
And for those of you using IDENTITY or generated columns, pgstream now supports replicating them as well, including changes such as switching from ALWAYS to BY DEFAULT, or dropping generated column expressions.
All of this makes downstream schemas more faithful to the source and reduces cases where the target doesn’t behave quite like the original database.
📷 Snapshot improvements
Snapshots are often the first pain point when adopting CDC, especially on large datasets. One of the biggest issues during large snapshots is the lack of observability into progress. pgstream already tracks data progress, but schema operations such as index creation rely on pg_dump / pg_restore under the hood, which historically gave very little insight into what was happening.
In this release we added index creation progress tracking by leveraging Postgres’ pg_stat_progress_create_index view and checking the progress automatically for you. This makes the difference between a process that seems stuck and one that clearly tells you what it’s doing without overwhelming you with debug logs.
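To give a feel for what this view exposes, here is a minimal Go sketch. It is not pgstream's actual implementation: the `indexProgress` type, its methods and the sample values are hypothetical, and the query is only one way to read `pg_stat_progress_create_index`. The column names (`relid`, `phase`, `blocks_done`, `blocks_total`) come from the standard Postgres view. The sketch formats a row without connecting to a database, so it can run on its own.

```go
package main

import "fmt"

// progressQuery reads the standard pg_stat_progress_create_index view.
// Assumption: pgstream's real query is not shown in this post and may differ.
const progressQuery = `
SELECT relid::regclass::text AS table_name, phase, blocks_done, blocks_total
FROM pg_stat_progress_create_index`

// indexProgress is a hypothetical struct holding one row returned by progressQuery.
type indexProgress struct {
	TableName   string
	Phase       string
	BlocksDone  int64
	BlocksTotal int64
}

// percent returns the share of blocks processed in the current phase,
// or 0 when the total is not known yet.
func (p indexProgress) percent() float64 {
	if p.BlocksTotal <= 0 {
		return 0
	}
	return 100 * float64(p.BlocksDone) / float64(p.BlocksTotal)
}

// String renders the row as a single human readable log line.
func (p indexProgress) String() string {
	return fmt.Sprintf("index build on %s: phase=%q, %.1f%% of blocks done",
		p.TableName, p.Phase, p.percent())
}

func main() {
	// Sample values for illustration only; a real row would come from progressQuery.
	row := indexProgress{
		TableName:   "users",
		Phase:       "building index: scanning table",
		BlocksDone:  250,
		BlocksTotal: 1000,
	}
	fmt.Println(row)
}
```

Polling a query like this on an interval and logging one line per index is enough to tell a slow index build apart from a stuck one.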
In addition to the added visibility into index creation progress, it is now also possible to snapshot only the schema or only the data. This offers much more flexibility, especially if you have a large database.
🔌 Reconnection retries
Until now, Postgres connections would be closed whenever an error occurred. This meant that database restarts, upgrades, or transient network issues could stop the pgstream process. In this release we introduced connection retries for both the replication connection and the target Postgres writer, so pgstream can recover gracefully when something goes wrong. The retry strategy is configurable, giving you full control over how pgstream behaves when things go south.
With retries comes improved error mapping, with more granular Postgres error parsing. This gives us better control over which errors are retriable and which aren’t. And the best part is that the improved error mapping benefits all workflows, not just retries.
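To make the idea of retriable versus non-retriable errors concrete, here is a hedged Go sketch that classifies errors by their Postgres SQLSTATE code. The `isRetriable` function and the selection of codes are assumptions made for illustration, not pgstream's actual error mapping. The codes themselves are standard Postgres error codes.

```go
package main

import (
	"fmt"
	"strings"
)

// isRetriable reports whether an error with the given SQLSTATE is worth
// retrying. The selection below is illustrative, not pgstream's mapping.
func isRetriable(sqlState string) bool {
	// Class 08 groups connection exceptions (for example 08006 connection_failure).
	if strings.HasPrefix(sqlState, "08") {
		return true
	}
	switch sqlState {
	case "57P01", // admin_shutdown
		"57P02", // crash_shutdown
		"57P03", // cannot_connect_now
		"53300", // too_many_connections
		"40001", // serialization_failure
		"40P01": // deadlock_detected
		return true
	}
	return false
}

func main() {
	// 23505 (unique_violation) and 42P01 (undefined_table) will not fix
	// themselves on a retry, so they are treated as permanent.
	for _, code := range []string{"08006", "57P01", "23505", "42P01"} {
		fmt.Printf("%s retriable=%t\n", code, isRetriable(code))
	}
}
```

Keeping the classification in one place means every caller, with or without retries, gets the same answer about whether an error is transient.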
🔀 PostgreSQL Anonymizer Transformer
pgstream now integrates with the PostgreSQL Anonymizer extension (also known as anon). This means you can use supported masking rules directly in pgstream transformation rules.
Even better, if you have masking rules defined in your source Postgres database already, pgstream can now infer the transformation rules from your SECURITY LABELS and apply them automatically during snapshot and replication flows without the need to configure them manually.
Check out more about this integration here.
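For context, PostgreSQL Anonymizer masking rules are declared as security labels such as `MASKED WITH FUNCTION anon.fake_email()` or `MASKED WITH VALUE NULL`, which Postgres exposes through the `pg_seclabels` view. The Go sketch below shows roughly how such a label could be turned into a transformation rule. The `maskingRule` type and the `parseMaskingLabel` and `cutPrefixFold` helpers are hypothetical, and pgstream's real inference logic may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// maskingRule is a hypothetical, simplified view of an anon masking rule.
type maskingRule struct {
	Kind       string // "function" or "value"
	Expression string // for example "anon.fake_email()"
}

// cutPrefixFold strips prefix from s, ignoring case, and reports whether it matched.
func cutPrefixFold(s, prefix string) (string, bool) {
	if len(s) < len(prefix) || !strings.EqualFold(s[:len(prefix)], prefix) {
		return s, false
	}
	return s[len(prefix):], true
}

// parseMaskingLabel extracts the masking expression from a security label.
// It returns false when the label is not a column masking rule.
func parseMaskingLabel(label string) (maskingRule, bool) {
	label = strings.TrimSpace(label)
	if expr, ok := cutPrefixFold(label, "MASKED WITH FUNCTION "); ok {
		return maskingRule{Kind: "function", Expression: strings.TrimSpace(expr)}, true
	}
	if expr, ok := cutPrefixFold(label, "MASKED WITH VALUE "); ok {
		return maskingRule{Kind: "value", Expression: strings.TrimSpace(expr)}, true
	}
	return maskingRule{}, false
}

func main() {
	// Sample labels for illustration; real ones would be read from pg_seclabels.
	labels := []struct{ column, label string }{
		{"users.email", "MASKED WITH FUNCTION anon.fake_email()"},
		{"users.notes", "MASKED WITH VALUE NULL"},
		{"users.id", "unrelated label"},
	}
	for _, l := range labels {
		if rule, ok := parseMaskingLabel(l.label); ok {
			fmt.Printf("%s -> %s: %s\n", l.column, rule.Kind, rule.Expression)
		} else {
			fmt.Printf("%s -> no masking rule\n", l.column)
		}
	}
}
```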
💾 Memory improvements
We also made a series of changes to reduce memory usage, especially for long running jobs or setups that process a lot of data. pgstream now holds onto fewer in-memory structures during streaming and snapshots, and it cleans up internal buffers more efficiently. This keeps memory usage stable over time and makes pgstream more dependable when running continuously in production.
⭐ Community Spotlight
We’d like to shout out two contributors who helped make this release stronger:
- @Arochka improved how pgstream handles schema changes (you can find more about that in the better schema replication section above).
- @spongenee added a new email transformer for data anonymization, giving you more control over email masking in streaming and snapshot workflows.
Thanks so much to both of you for rolling up your sleeves and making real improvements that benefit everyone using pgstream 💟.
Conclusion
With everything packed into this release, pgstream is now even better suited for building reliable, flexible and production ready data workflows. Whether you’re streaming changes to downstream systems, keeping schemas in sync, taking snapshots at scale or applying transformations along the way, pgstream gives you the tools to do it with confidence.
If you have suggestions, questions, or ideas for what we should tackle next, you can reach us on Discord or follow us on X / Twitter or Bluesky. We’re always happy to hear from you. Feedback is welcome in issues, and contributions via pull requests are even better! 💜
Ready to try it out? Head to the pgstream documentation to get started!