Set up a logical streaming replica
Use Xata's streaming replication to keep your database continuously synchronized with real-time changes.
This guide shows you how to set up continuous logical streaming replication from your production PostgreSQL database to Xata, enabling real-time data synchronization with optional anonymization.
1. Prerequisites
- A Xata account (sign up here)
- The Xata CLI installed:
curl -fsSL https://xata.io/install.sh | bash
- A PostgreSQL database with:
- Logical replication enabled
- Role with permissions to create a replication slow (
xata clone stream
command does that automatically) - Network connectivity from Xata to your database
2. Enable logical replication on source database
First, ensure your source PostgreSQL database has logical replication enabled. You'll need to set these parameters:
-- Check current settings
SHOW wal_level;
SHOW max_replication_slots;
SHOW max_wal_senders;
If not already configured, update your PostgreSQL configuration:
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
Restart your PostgreSQL instance for the changes to take effect.
3. Create a Xata project and branch
In the Console, create a new project and then click the Create main branch button to create the PostgreSQL instance.
For streaming replication, consider using at least 1 replica to ensure high availability during continuous synchronization. Select an instance size that can handle your expected write throughput.
Note: Streaming replication maintains a persistent connection to your source database. Ensure your network allows stable, long-lived connections between Xata and your PostgreSQL instance.
4. Configure the Xata CLI
Authenticate the CLI by running:
xata auth login
Initialize the project by running:
xata init
5. Configure streaming replication
Generate a configuration for the streaming process:
xata clone config --source-url $CONN_STRING
Where CONN_STRING
is your PostgreSQL connection string with replication permissions.
The configuration prompt will ask you to:
- Select tables to replicate
- Set up transformation pipelines i.e. anonymization rules
This creates a configuration file at .xata/clone.yaml
that you can further customize.
6. Initialize and start streaming
xata clone stream --source-url $CONN_STRING
This command will:
- Create an initial snapshot of your specified tables
- Set up the streaming pipeline
- Begin continuous replication
7. Advanced configuration
Filtering specific tables
To stream only specific tables, use the --filter-tables
flag:
xata clone stream --source-url $CONN_STRING \
--filter-tables "users.*,orders.*,products.*"
If this option is not specified it defaults to *.*
Custom transformations
Edit your .xata/clone.yaml
file to add custom transformations:
transforms:
- table: users
columns:
- name: email
transformer: mask_email
- name: phone
transformer: redact
- table: orders
columns:
- name: credit_card
transformer: mask_credit_card
Running with Docker
For production deployments, consider running the streaming process in a containerized environment:
docker run -d \
--name xata-stream \
--restart unless-stopped \
-v $(pwd)/.xata:/config \
xata/cli clone stream \
--source-url $CONN_STRING
10. Handling failures and recovery
If the streaming connection is interrupted, the replication slot ensures no data is lost. Simply restart the streaming command:
xata clone stream --source-url $CONN_STRING
The process will resume from where it left off, catching up with any changes that occurred during the downtime. However, if the too much lag accumulates then the Postgres server might slow down as it has to do both catching up on the lag and its normal operations.
If you terminate the xata clone stream
process and do not wish to run streaming replication again, clean up the replication slot and
other pgstream
objects using xata stream destroy
command.
Not cleaning up the replication slot will cause the WAL to be aggregated continuously and that would lead to full disk space. Use options like max_slot_wal_keep_size
to keep the max WAL size in check.
Summary
- You now have real-time streaming replication (Postgres's logical replication) from your PostgreSQL database to Xata
- Changes in your source database are automatically synchronized
- Your data can be anonymized in transit using configurable transformers
- The replication slot ensures no data loss during network interruptions
For more details on advanced streaming configurations and monitoring, see the clone command documentation.