Data masking and anonymization for PostgreSQL

Dev and testing environments for companies that have sensitive data and PII.

Protect privacy

Your business requires you to handle sensitive data like PII (Personally Identifiable Information) and PHI (Protected Health Information). We help you keep it safe.

Stay compliant

Regulations like GDPR, HIPAA, and PCI DSS require that staging and development environments don't contain any PII or PHI.

Avoid costly mistakes

Real email addresses or phone numbers in your test data? This can lead to embarrassing mistakes and costly data breaches.

Open source

Data anonymization is implemented in our open source pgstream project, which gives you flexibility in terms of deployment options, transformers, and configuration.

Deterministic transformers

Deterministic transformers ensure that the same data will be masked the same way every time. This means that your relational data stays consistent.

Any transformation

Data anonymization requirements are complex. You can use a wide range of transformers from open source projects like Greenmask and NeoSync, as well as implement custom transformers.

Testing for companies that handle sensitive data

Testing with real data is too risky. Testing with seeded data is often not enough to catch bugs.

The challenges you are facing

  • Using raw production data in staging or testing risks exposing sensitive information.

  • Seeded data is too small. This means performance issues and bugs are not caught before production.

  • Creating a realistic dataset for testing is hard. Synthetic data is difficult to generate in a way that is representative of the production data.

How Xata solves them

  • Mask sensitive data. Xata anonymizes sensitive data while keeping it realistic and structurally intact.

  • Use anonymized data from production. This means your testing dataset is as large as your production dataset.

  • Automate creating a realistic dataset. A production snapshot is taken nightly and automatically anonymized according to your rules.
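To make "realistic and structurally intact" concrete, here is a minimal sketch of rule-driven masking in Python. It is not how Xata or pgstream implement it. The `RULES` mapping, the function names, and the sample row are all hypothetical and only illustrate the idea that masked values keep the shape of the originals.

```python
import random
import string

def mask_email(value: str, rng: random.Random) -> str:
    """Replace the local part with random letters and use a reserved test domain."""
    local, _, _domain = value.partition("@")
    fake_local = "".join(
        rng.choice(string.ascii_lowercase) for _ in range(max(len(local), 1))
    )
    return f"{fake_local}@example.com"

def mask_phone(value: str, rng: random.Random) -> str:
    """Replace every digit with a random digit, keeping punctuation and length."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)

# Hypothetical rule set: column name -> masking function.
RULES = {"email": mask_email, "phone": mask_phone}

def anonymize_row(row: dict, rules: dict = RULES, seed: int = 0) -> dict:
    """Apply the matching rule to each column; leave other columns untouched."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return {
        col: rules[col](val, rng) if col in rules and val is not None else val
        for col, val in row.items()
    }

row = {"id": 7, "email": "jane.doe@corp.io", "phone": "+1 (555) 010-9999"}
masked = anonymize_row(row)
```

The masked row still has a valid-looking email address and a phone number with the same punctuation and length, so application code that validates formats keeps working against the anonymized data.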

Powered by pgstream

pgstream is an open source project for PostgreSQL data replication and transformation. It uses PostgreSQL logical replication (including DDL statements) together with parallel snapshotting to copy data at maximum throughput.

You can schedule pgstream via your CI/CD pipeline to execute nightly and create an anonymized snapshot of your production database, which is then written to your staging database.

The snapshots can be created from a read replica to minimize the impact on the production database.

Visit pgstream on GitHub

Deterministic and realistic transformers

Anonymization is accomplished by applying a set of column value transformations. Deterministic transformers ensure that the same data will be masked the same way every time. This means that your relational data stays consistent.
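One common way to get deterministic masking is a keyed hash. The sketch below uses Python's standard `hmac` module. It is an illustration under assumptions, not pgstream's transformer code: `SECRET_KEY`, `deterministic_token`, `mask_email_deterministic`, and the sample tables are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret key kept outside the database. The value here is a placeholder.
SECRET_KEY = b"replace-with-a-real-secret"

def deterministic_token(value: str, key: bytes = SECRET_KEY, length: int = 12) -> str:
    """Return a stable pseudonym: the same input and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

def mask_email_deterministic(email: str) -> str:
    """Mask an email address; lowercasing first so case variants map to one value."""
    return f"user_{deterministic_token(email.lower())}@example.com"

# Two tables that reference the same person by email address.
users = [{"id": 1, "email": "ana@corp.io"}]
orders = [
    {"order_id": 10, "customer_email": "ana@corp.io"},
    {"order_id": 11, "customer_email": "Ana@corp.io"},
]

masked_users = [{**u, "email": mask_email_deterministic(u["email"])} for u in users]
masked_orders = [
    {**o, "customer_email": mask_email_deterministic(o["customer_email"])}
    for o in orders
]
```

Because both tables are masked with the same function and key, a join on the masked email still matches the same rows, which is what keeps relational data consistent after anonymization.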

pgstream integrates with existing open source transformer libraries, such as Greenmask, NeoSync and go-masker, which gives it a wide range of transformation capabilities. It also supports custom transformations written in Go. This way even the most complex anonymization requirements can be met.

Read the transformers documentation in pgstream

Data subsetting (coming soon)

If your production data is measured in terabytes, it might be more practical to create a smaller staging dataset that is still large enough to catch performance issues.

Creating this subset is challenging, because it requires keeping track of the relations between tables and subsetting each table in such a way that the data is consistent and interconnected.

In the example diagram, we request that the 'orders' table be subset to 5% of its total size. The subsetting logic automatically follows the foreign key relations to the 'users' and 'products' tables and filters the data accordingly.
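The idea of following foreign keys can be sketched with in-memory tables. Subsetting is listed as coming soon, so this is not its implementation. The table contents, the `subset_orders` function, and the random sampling strategy are assumptions made purely for illustration.

```python
import random

# Hypothetical tables: each order references one user and one product.
users = [{"id": i} for i in range(1, 21)]
products = [{"id": i} for i in range(1, 11)]
orders = [
    {"id": i, "user_id": (i % 20) + 1, "product_id": (i % 10) + 1}
    for i in range(1, 101)
]

def subset_orders(orders, users, products, fraction, seed=0):
    """Sample a fraction of orders, then keep only the rows they reference."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    keep_count = max(1, round(len(orders) * fraction))
    kept_orders = rng.sample(orders, keep_count)

    # Follow the foreign keys from the sampled orders to the parent tables.
    user_ids = {o["user_id"] for o in kept_orders}
    product_ids = {o["product_id"] for o in kept_orders}
    kept_users = [u for u in users if u["id"] in user_ids]
    kept_products = [p for p in products if p["id"] in product_ids]
    return kept_orders, kept_users, kept_products

sub_orders, sub_users, sub_products = subset_orders(orders, users, products, 0.05)
```

Every order in the subset still points at a user and a product that exist in the subset, so foreign key constraints hold even though the parent tables shrank.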

Part of a full solution for PostgreSQL

Xata is the only solution on the market that combines advanced data anonymization with instant Copy-on-Write branches for PostgreSQL.

1. Connect to your production database

Keep your production database where it is, whether it is AWS RDS, Aurora, GCP Cloud SQL, Azure Database, or even self-hosted.

2. Anonymize PII or other sensitive data

Mask sensitive data using configurable transformers that maintain referential integrity.

3. Staging replica with realistic data

Get a nightly synced replica of production with sensitive data removed. Ready for testing.

Learn more

4. Instant dev branches

From the staging environment, instantly create Copy-on-Write branches for each pull request to accelerate development, testing, and collaboration.

Learn more

5. Deploy to production without downtime

Apply database changes confidently with pgroll, serving old and new schema versions in parallel for smooth, lock-free migrations.

Learn more

Get started with Xata

Sign up to access the Xata platform or book a demo to learn how Xata works from the engineers who built it.