pgstream supports the generation of PostgreSQL schema and data snapshots. A snapshot can be taken as an initial step before replication starts, or in standalone mode, where the database is snapshotted without any replication.
The snapshot behaviour is the same in both cases. The only difference is that, if we’re listening on the replication slot, we store the current LSN before performing the snapshot, so that we can replay any operations that happened while the snapshot was ongoing.
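As a rough illustration of why the LSN is stored first, the sketch below parses PostgreSQL's textual LSN form (`high/low`, both hexadecimal) and keeps only the events at or after the stored position. It is not pgstream's implementation: `events_to_replay` and the `(lsn, payload)` event shape are hypothetical names chosen for this example, and in a real deployment the replication slot itself retains the WAL to be replayed.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def events_to_replay(events, snapshot_start_lsn: str):
    """Keep the events at or after the LSN stored before the snapshot.

    `events` is a hypothetical iterable of (lsn, payload) tuples.
    """
    start = parse_lsn(snapshot_start_lsn)
    return [(lsn, payload) for lsn, payload in events if parse_lsn(lsn) >= start]
```

LSNs are compared as integers rather than as strings, because a plain string comparison would order `9/0` after `10/0`.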
The snapshot implementation is different for schema and data.
- Schema: it relies on `pg_dump` to produce the dump of the schema to be snapshotted. For Postgres targets it relies on `pg_restore` for restoring the schema, while for other targets it emits DDL events into the WAL pipeline to be processed.
- Data: it relies on transaction snapshot ids to obtain a stable view of the database tables, and parallelises the read of all the rows by dividing them into ranges using the `ctid`.
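The `ctid` range split for the data snapshot can be sketched as follows. This is a minimal illustration and not pgstream's code: `ctid_page_ranges` and `range_query` are hypothetical helpers, and the number of pages per range is an arbitrary parameter. It assumes the table's page count is known (for example from `pg_relation_size` divided by the block size), and that each worker runs its query in a `REPEATABLE READ` transaction that has adopted the shared snapshot with `SET TRANSACTION SNAPSHOT`, using an id obtained from `pg_export_snapshot()`.

```python
def ctid_page_ranges(total_pages: int, pages_per_range: int):
    """Split pages [0, total_pages) into half-open (start, end) page ranges."""
    if pages_per_range <= 0:
        raise ValueError("pages_per_range must be positive")
    return [
        (start, min(start + pages_per_range, total_pages))
        for start in range(0, total_pages, pages_per_range)
    ]


def range_query(table: str, start_page: int, end_page: int) -> str:
    """Build the query for one range; a ctid is a (page, tuple index) pair.

    `table` is interpolated as-is, so it must be a trusted, already-quoted name.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE ctid >= '({start_page},0)'::tid AND ctid < '({end_page},0)'::tid"
    )
```

Because the ranges are half-open and contiguous, every row falls into exactly one range, so the queries can run in parallel without overlapping or skipping rows.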