Data masking and anonymization
for PostgreSQL
Protect privacy
Your business requires you to handle sensitive data like PII (Personally Identifiable Information) and PHI (Protected Health Information). We help you keep it safe.
Stay compliant
Regulations like GDPR, HIPAA, and PCI require that staging and development environments don't contain any PII or PHI.
Avoid costly mistakes
Real email addresses or phone numbers in your test data? This can lead to embarrassing mistakes and costly data breaches.
Open source
Data anonymization is implemented in our open source pgstream project, which gives you flexibility in terms of deployment options, transformers, and configuration.
Deterministic transformers
Deterministic transformers ensure that the same data will be masked the same way every time. This means that your relational data stays consistent.
Any transformation
Data anonymization requirements are complex. You can use a wide range of transformers from open source projects like Greenmask and NeoSync, as well as implement custom transformers.
Testing for companies that handle sensitive data
Testing with real data is too risky. Testing with seeded data is often not enough to catch bugs.
The challenges you are facing
Using raw production data in staging or testing risks exposing sensitive information.
Seeded data is too small in size. This means performance issues or bugs are not caught before production.
Creating a realistic dataset for testing is hard. Synthetic data is difficult to generate in a way that is representative of production data.
How Xata solves them
Mask sensitive data. Xata anonymizes sensitive data while keeping it realistic and structurally intact.
Use anonymized data from production. This means your testing dataset is as large as your production dataset.
Automate creating a realistic dataset. A production snapshot is taken nightly and automatically anonymized according to your rules.
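As a rough illustration of masking that keeps data "realistic and structurally intact", the sketch below replaces every digit of a phone number while preserving its length and punctuation. It is not pgstream's or Xata's implementation: the function name `mask_phone` and the key `MASKING_KEY` are hypothetical, and a real setup would load the key from a secret store.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; never hard-code a real one.
MASKING_KEY = b"example-masking-key"


def mask_phone(value: str) -> str:
    """Replace each digit with a keyed pseudo-random digit, keeping the format."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in value:
        if ch.isdigit():
            # Derive the replacement digit from the next byte of the digest.
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            # Keep separators such as "+", spaces, parentheses, and dashes.
            out.append(ch)
    return "".join(out)


masked = mask_phone("+1 (555) 010-9999")
```

The masked value still looks like a phone number, so input validation and UI code in a staging environment can be exercised without the original number being present.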
Powered by pgstream
pgstream is an open source project for PostgreSQL data replication and transformation. It uses PostgreSQL logical replication (including DDL statements) and parallel snapshotting to copy data at maximum throughput.
You can schedule pgstream via your CI/CD pipeline to execute nightly and create an anonymized snapshot of your production database, which is then written to your staging database.
The snapshots can be created from a read replica to minimize the impact on the production database.
Visit pgstream on GitHub
Deterministic and realistic transformers
Anonymization is accomplished by applying a set of column value transformations. Deterministic transformers ensure that the same data will be masked the same way every time. This means that your relational data stays consistent.
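To make the idea of deterministic column transformations concrete, here is a minimal sketch using a keyed hash (HMAC-SHA256). It does not reflect pgstream's configuration format or transformer API: the `RULES` mapping, the table and column names, and `SECRET_KEY` are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secrets manager.
SECRET_KEY = b"example-masking-key"


def mask_email(value: str) -> str:
    """Map an email to a stable pseudonym: same input, same output."""
    digest = hmac.new(
        SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"user_{digest[:12]}@example.com"


# Hypothetical rule set: (table, column) -> transformer function.
RULES = {
    ("users", "email"): mask_email,
    ("orders", "customer_email"): mask_email,
}


def anonymize_row(table: str, row: dict) -> dict:
    """Apply the configured transformer to each column that has a rule."""
    result = {}
    for col, val in row.items():
        transformer = RULES.get((table, col))
        # Leave NULLs and columns without a rule untouched.
        result[col] = transformer(val) if transformer and val is not None else val
    return result


user = anonymize_row("users", {"id": 1, "email": "Alice@corp.com"})
order = anonymize_row("orders", {"id": 10, "customer_email": "alice@corp.com"})
```

Because the same address always maps to the same pseudonym, `user["email"]` and `order["customer_email"]` still match after masking, so joins across the two tables keep working.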
pgstream integrates with existing open source transformer libraries, such as Greenmask, NeoSync, and go-masker, to provide a wide range of transformation capabilities. It also supports custom transformations written in Go. This way even the most complex anonymization requirements can be met.
Read transformers documentation in pgstream
Data subsetting (coming soon)
If your production data is measured in terabytes, it might be more practical to create a smaller staging dataset that is still large enough to catch performance issues.
Creating this subset is challenging, because it requires keeping track of the relations between tables and subsetting each table in such a way that the data is consistent and interconnected.
In the example diagram, we request that the 'orders' table is subsetted to 5% of the total size. The subsetting logic automatically follows the foreign key relations to the 'users' and 'products' tables and filters the data accordingly.
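The subsetting idea described above can be sketched in a few lines. Since the feature is marked as coming soon, this is not its actual implementation: the in-memory row dictionaries, the `user_id` and `product_id` foreign key columns, and the `subset_orders` function are assumptions chosen to mirror the 'orders', 'users', and 'products' example.

```python
import random


def subset_orders(users, products, orders, fraction, seed=0):
    """Sample a fraction of orders, then keep only the rows they reference."""
    rng = random.Random(seed)
    # Keep at least one order when any exist, and never more than are available.
    k = min(len(orders), max(1, round(len(orders) * fraction)))
    kept_orders = rng.sample(orders, k)

    # Follow the foreign keys from the sampled orders to their parent tables.
    user_ids = {o["user_id"] for o in kept_orders}
    product_ids = {o["product_id"] for o in kept_orders}
    kept_users = [u for u in users if u["id"] in user_ids]
    kept_products = [p for p in products if p["id"] in product_ids]
    return kept_users, kept_products, kept_orders


users = [{"id": i} for i in range(100)]
products = [{"id": i} for i in range(50)]
orders = [
    {"id": i, "user_id": i % 100, "product_id": i % 50} for i in range(1000)
]

sub_users, sub_products, sub_orders = subset_orders(users, products, orders, 0.05)
```

Every order in the subset still points at a user and a product that exist in the subset, which is the consistency property the paragraph above describes. A database-backed version would express the same steps as filtered queries rather than in-memory lists.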
Part of a full solution for PostgreSQL
Xata is the only solution on the market that combines advanced data anonymization with instant Copy-on-Write branches for PostgreSQL.
Connect to your production database
Keep your production database where it is, whether it is AWS RDS, Aurora, GCP Cloud SQL, Azure Database, or even self-hosted.
Anonymize PII or other sensitive data
Mask sensitive data using configurable transformers that maintain referential integrity.
Staging replica with realistic data
Get a nightly synced replica of production with sensitive data removed. Ready for testing.
Instant dev branches
From the staging environment, instantly create Copy-on-Write branches for each pull request to accelerate development, testing, and collaboration.
Deploy to production without downtime
Apply database changes confidently with pgroll, serving old and new schema versions in parallel for smooth, lock-free migrations.