Snapshots

pgstream can generate snapshots of a PostgreSQL schema and its data. A snapshot can run as an initial step before replication starts, or in standalone mode, where the database is snapshotted without any replication. The snapshot behaviour is the same in both cases. The only difference is that when pgstream is listening on the replication slot, it stores the current LSN before performing the snapshot, so that any operations that happened while the snapshot was in progress can be replayed afterwards. The snapshot implementation differs for schema and data.

Schema: it relies on pg_dump to produce the dump of the schema to be snapshotted. For Postgres targets it relies on pg_restore for restoring the schema, while for other targets it emits DDL events into the WAL pipeline to be processed.
Data: it relies on transaction snapshot ids to obtain a stable view of the database tables, and parallelises reading the rows by dividing each table into ranges using the ctid.

For more details on the snapshot implementation and performance benchmarking, check out this blog post. For details on how to use and configure the snapshot mode, check the snapshot tutorial.
