Introduction
This tutorial will showcase the use of pgstream to replicate data from PostgreSQL using transformers. We’ll use a target PostgreSQL database as target, but the same would apply to any of the supported targets.Requirements
- A source PostgreSQL database
- A target PostgreSQL database
- pgstream (see installation instructions for more details)
Demo
Environment setup
The first step is to start the two PostgreSQL databases that will be used as source and target for replication. Thepgstream repository provides a docker installation that will be used for the purposes of this tutorial, but can be replaced by any available PostgreSQL servers, as long as they have wal2json installed.
To start the docker provided PostgreSQL servers, run the following command:
5432 and 7654.
Database initialisation
Once both PostgreSQL servers are up and running, the next step is to initialise pgstream on the source database. This will create thepgstream schema in the configured Postgres database, along with the tables/functions/triggers required to keep track of the schema changes. See Tracking schema changes section for more details. This step will also create a replication slot on the source database which will be used by the pgstream service.
The initialisation step allows to provide both the URL of the PostgreSQL database and the name of the replication slot to be created. The PostgreSQL URL is required, but the replication slot name is optional. If not provided, it will default to pgstream_<dbname>_slot, where <dbname> is the name of the PostgreSQL database. The configuration can be provided either by using the CLI supported parameters, or using the environment variables.
For this tutorial, we’ll create a replication slot with the name pgstream_tutorial_slot.
-
Using the
--initflag in theruncommand - Using CLI flags:
- Using environment variables:
destroy CLI command.
Prepare pgstream configuration
Listener
The listener configuration will be the same as the one in the PostgreSQL to PostgreSQL replication tutorial. Check the details here.- Without initial snapshot
- With initial snapshot
Processor
The processor configuration will be the same as the one in the PostgreSQL to PostgreSQL replication tutorial. Check the details here.pg2pg_transformer_tutorial.env file to be used in the next step. An equivalent pg2pg_transformer_tutorial.yaml configuration can be found below the environment one, and can be used interchangeably.
- Without initial snapshot
- With initial snapshot
Validate pgstream status
We can validate that the initialisation and the configuration are valid by running the status command before starting pgstream.
Run pgstream
With the configuration ready, we can now run pgstream. In this case we set the log level as trace to provide more context for debugging and have more visibility into what pgstream is doing under the hood.
Important: Ensure that the source and target databases are running before proceeding.
Verify data transformation
When we check the target PostgreSQL database, we should see the three rows have been inserted, but both the name and the emails have been anonymised.Troubleshooting
-
Error:
Connection refused- Ensure the Docker containers for the source and target databases are running.
- Verify the database URLs in the configuration.
-
Error:
Replication slot not found- Check that the replication slot was created successfully during initialization.
- Run the
pgstream statuscommand to validate the initialisation was successful. - Run the following query on the source database to verify:
-
Error:
Transformation rules not applied- Ensure the
PGSTREAM_TRANSFORMER_RULES_FILEpoints to the correct YAML file. - Validate the syntax of the transformer rules using the pgstream documentation.
- Run the
pgstream statuscommand to validate the transformation rules are correct.
- Ensure the
Summary
In this tutorial, we successfully configuredpgstream to replicate data from a source PostgreSQL database to a target PostgreSQL database, applying transformations to anonymize sensitive data during replication. For more use cases, refer to the pgstream documentation.