pgstream v0.9.0: Better schema replication, snapshots and cloud support

We just shipped pgstream v0.9.0, and it comes packed with new features, memory improvements and some much-appreciated contributions from the community. In this post we’ll go over the key changes that landed since the last minor release. If you’re still on an older version, it’s time to upgrade!

ℹ️ You can find the complete release notes on the GitHub v0.9.0 release page.

Before we jump into the new stuff, here’s a quick reminder of what pgstream is.

What is pgstream?

pgstream is an open source change data capture (CDC) tool and library that offers Postgres replication support with DDL changes. Some of its key features include:

- Replication of DDL changes: Schema changes are tracked and seamlessly replicated downstream alongside the data, avoiding manual intervention and data loss.
- Modular deployment configuration: pgstream’s modular implementation allows it to be configured for simple use cases, removing unnecessary complexity and deployment challenges. It can also integrate easily with Kafka for more complex workflows.
- Out-of-the-box supported targets:
  - Postgres: Replicate to Postgres databases with support for schema changes and batch processing.
  - Elasticsearch / OpenSearch: Replicate to search stores with special handling of field IDs to minimize reindexing.
  - Webhooks: Subscribe and receive webhook notifications whenever your source data changes.
- Snapshots: Capture a consistent view of your Postgres database at a specific point in time, either as an initial snapshot before starting replication or as a standalone process when replication is not needed.
- Column transformations: Modify column values during replication or snapshots, which is particularly useful for anonymizing sensitive data.

For more details on how pgstream works under the hood, check out the full documentation.

What is new?

With this release, we focused on making pgstream easier to adopt and more production-ready. Let’s look at the key changes driving that effort!

☁️ Cloud provider support

pgstream now integrates better with managed PostgreSQL services.

You can also find onboarding documentation that shows how to configure pgstream with the restricted privileges common in cloud environments.

🌳 Better schema replication

Schema replication got a big upgrade in this release! pgstream has always aimed to make it easy to track both data changes and schema changes in PostgreSQL, and it can now track more of the structure of your database, including indexes, constraints and foreign keys.

In addition to the enriched schema tracking, we also added support for capturing the schema when SELECT INTO and CREATE TABLE AS statements are used, which were ignored before. That meant tables created with those statements weren’t being replicated downstream… until now!

And for those of you using IDENTITY or generated columns, pgstream now supports replicating them as well, including changes such as switching from ALWAYS to BY DEFAULT, or dropping generated column expressions.

All of this makes downstream schemas more faithful to the source and reduces the cases where the target doesn’t behave quite like the original database.

📷 Snapshot improvements

Snapshots are often the first pain point when adopting CDC, especially on large datasets. One of the biggest issues during large snapshots is the lack of observability into progress. pgstream already tracks data progress, but schema operations such as index creation rely on pg_dump / pg_restore under the hood, which historically gave very little insight into what was happening.

In this release we added index creation progress tracking by leveraging Postgres’ pg_stat_progress_create_index view and checking the progress automatically for you. This makes the difference between a process that seems stuck and one that clearly tells you what it’s doing without overwhelming you with debug logs.
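To give a sense of what that view exposes, here is a minimal sketch in Python. It is not pgstream's implementation: the query text, the `percent` and `format_progress` helpers and the sample row are illustrative assumptions. Only the view and its column names come from Postgres.

```python
# Illustrative only, not pgstream's code. The column names below are the ones
# exposed by Postgres' pg_stat_progress_create_index view.
PROGRESS_QUERY = """
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index
"""


def percent(done: int, total: int) -> float:
    """Return done/total as a percentage, or 0.0 when the total is not known yet."""
    if total <= 0:
        return 0.0
    return round(100.0 * done / total, 1)


def format_progress(row: dict) -> str:
    """Turn one row of the view into a single human-readable log line."""
    # Some phases report blocks, others tuples; prefer whichever has a total.
    if row["blocks_total"] > 0:
        pct = percent(row["blocks_done"], row["blocks_total"])
    else:
        pct = percent(row["tuples_done"], row["tuples_total"])
    return f"pid={row['pid']} phase={row['phase']!r} progress={pct}%"


# Hypothetical row, shaped like one result of PROGRESS_QUERY.
sample = {
    "pid": 4242,
    "phase": "building index: scanning table",
    "blocks_done": 250,
    "blocks_total": 1000,
    "tuples_done": 0,
    "tuples_total": 0,
}
print(format_progress(sample))
```

Polling a query like this on an interval and logging one line per active index build is enough to tell a slow build apart from a stuck one.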

In addition to the added visibility into index creation progress, it is now also possible to snapshot only the schema or only the data. This offers much more flexibility, especially if you have a large database.

🔌 Reconnection retries

Until now, Postgres connections would be closed whenever an error occurred. This meant that database restarts, upgrades, or transient network issues could stop the pgstream process. In this release we introduced connection retries for both the replication connection and the target Postgres writer, so pgstream can recover gracefully when something goes wrong. The retry strategy is configurable, giving you full control over how pgstream behaves when things go south.
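As a rough illustration of what a configurable retry strategy can look like, the sketch below implements exponential backoff in Python. It is an assumption-laden example and not pgstream's code or configuration: `retry_with_backoff` and its parameters (`max_attempts`, `initial_delay`, `max_delay`, `multiplier`) are hypothetical names chosen for this post.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 0.5,
    max_delay: float = 30.0,
    multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    All parameter names are hypothetical; they only illustrate the knobs a
    retry strategy usually offers.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            # Grow the wait after each failure, but never beyond max_delay.
            delay = min(delay * multiplier, max_delay)
    raise ValueError("max_attempts must be at least 1")


# Usage: a fake connection that fails twice before succeeding.
attempts = []


def flaky_connect() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("connection reset")
    return "connected"


waits = []  # record the delays instead of really sleeping
result = retry_with_backoff(flaky_connect, sleep=waits.append)
print(result, waits)
```

Capping the delay keeps recovery time bounded after a long outage, while the growing wait avoids hammering a database that is still restarting.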

With retries comes improved error mapping, with more granular Postgres error parsing. This gives us better control over which errors are retriable and which aren’t. And the best part is that the improved error mapping benefits all workflows, not just retries.
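One common way to decide whether an error is retriable is to look at its SQLSTATE code. The Python sketch below shows the idea. The selection of codes is our assumption for illustration and may not match pgstream's actual mapping; the codes themselves and their names are standard Postgres error codes.

```python
# Assumed set of retriable SQLSTATE codes for this sketch; pgstream's own
# mapping may differ.
RETRIABLE_CODES = {
    "57P01",  # admin_shutdown
    "57P02",  # crash_shutdown
    "57P03",  # cannot_connect_now
    "53300",  # too_many_connections
    "40001",  # serialization_failure
    "40P01",  # deadlock_detected
}
RETRIABLE_CLASSES = {"08"}  # class 08: connection exceptions


def is_retriable(sqlstate: str) -> bool:
    """Report whether an error with this SQLSTATE is worth retrying."""
    code = sqlstate.upper()
    return code in RETRIABLE_CODES or code[:2] in RETRIABLE_CLASSES


# 08006 = connection_failure, 23505 = unique_violation, 42P01 = undefined_table
for code in ("08006", "57P01", "23505", "42P01"):
    print(code, "retry" if is_retriable(code) else "give up")
```

Errors such as a unique violation or a missing table will fail the same way on every attempt, so retrying them only delays the real error message.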

🔀 PostgreSQL Anonymizer Transformer

pgstream now integrates with the PostgreSQL Anonymizer extension (also known as anon). This means you can use supported masking rules directly in pgstream transformation rules.

Even better, if you have masking rules defined in your source Postgres database already, pgstream can now infer the transformation rules from your SECURITY LABELS and apply them automatically during snapshot and replication flows without the need to configure them manually.
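For context, PostgreSQL Anonymizer masking rules are declared as security labels whose text looks like `MASKED WITH FUNCTION anon.fake_email()` or `MASKED WITH VALUE $$CONFIDENTIAL$$`. The Python sketch below shows how such label text could be turned into a per-column rule. It is a simplified assumption of the idea, not pgstream's implementation: `parse_mask_label`, the `rules` structure and the sample labels are hypothetical.

```python
import re
from typing import Optional

# Matches labels such as "MASKED WITH FUNCTION anon.fake_email()" or
# "MASKED WITH VALUE $$CONFIDENTIAL$$".
MASK_LABEL = re.compile(
    r"^\s*MASKED\s+WITH\s+(FUNCTION|VALUE)\s+(.+?)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_mask_label(label: str) -> Optional[dict]:
    """Split a masking label into its kind and expression, or return None."""
    match = MASK_LABEL.match(label)
    if match is None:
        return None
    return {"kind": match.group(1).lower(), "expression": match.group(2)}


# Hypothetical labels keyed by (table, column), as they might be read from the
# source database.
labels = {
    ("users", "email"): "MASKED WITH FUNCTION anon.fake_email()",
    ("users", "notes"): "MASKED WITH VALUE $$CONFIDENTIAL$$",
    ("users", "id"): "some unrelated label",
}

# Keep only the columns whose label is a recognizable masking rule.
rules = {}
for column, label in labels.items():
    parsed = parse_mask_label(label)
    if parsed is not None:
        rules[column] = parsed
print(rules)
```

Columns without a recognizable masking label are simply left out, so they would pass through unchanged.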

Check out more about this integration here.

💾 Memory improvements

We also made a series of changes to reduce memory usage, especially for long running jobs or setups that process a lot of data. pgstream now holds onto fewer in-memory structures during streaming and snapshots, and it cleans up internal buffers more efficiently. This keeps memory usage stable over time and makes pgstream more dependable when running continuously in production.

⭐ Community Spotlight

We’d like to shout out two contributors who helped make this release stronger:

@Arochka improved how pgstream handles schema changes (you can find more about that in the better schema replication section above).
@spongenee added a new email transformer for data anonymization, giving you more control over email masking in streaming and snapshot workflows.

Thanks so much to both of you for rolling up your sleeves and making real improvements that benefit everyone using pgstream 💟.

Conclusion

With everything packed into this release, pgstream is now even better suited for building reliable, flexible and production-ready data workflows. Whether you’re streaming changes to downstream systems, keeping schemas in sync, taking snapshots at scale or applying transformations along the way, pgstream gives you the tools to do it with confidence.

If you have suggestions, questions, or ideas for what we should tackle next, you can reach us on Discord or follow us on X / Twitter or Bluesky. We’re always happy to hear from you. Feedback is welcome in issues, and contributions via pull requests are even better! 💜

Ready to try it out? Head to the pgstream documentation to get started!
