Copy-on-write storage for Postgres

Xata uses copy-on-write storage to create full Postgres branches without duplicating data. Branches share the same underlying data and store only what changes, which makes branching fast, efficient, and scalable.

Copy-on-write branching

Create full Postgres branches instantly, with schema and data, without copying your database.

No duplicated storage

Branches share the same underlying data and only store what changes.

Separation of storage and compute

Scale storage and compute independently, with clear and efficient pricing.

High-performance storage

Built on NVMe/TCP for low latency and high throughput.

Bottomless storage

Scale storage as needed and pay only for what you use, with no over-provisioning.

100% vanilla PostgreSQL

No forks or modifications. Use standard Postgres tools, extensions, and ecosystem.

How copy-on-write storage works

Xata implements data branching at the storage layer. This enables very fast branching operations, regardless of how large the data is. It also reduces the amount of disk space used. Let's look at how it works.

1

Initial state

We start with a single branch, the parent branch. The volume associated with this branch is mounted on the PostgreSQL instance. The data is organized into blocks (numbered 1 to 8 in the diagram), and an index keeps track of them.

2

Creating a new branch

When a new database branch is created, only the index is copied at first. The new child branch references the same data blocks as the parent branch. Because no significant amount of data is copied at this stage, branching is fast regardless of the size of the branch.

3

Copy only modified blocks

As writes are received, by either the parent branch or the child branch, the blocks that are written to are copied. From then on, each branch references its own copy of those blocks. In the diagram, this has happened for blocks 3 and 6. As long as few blocks are written after the branch operation, this results in significant disk space savings.
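
The three steps above can be sketched in a few lines of Python. This is a toy model for illustration only, not Xata's implementation: the `BlockStore` and `Branch` classes, the reference counting, and the 4-byte block contents are all assumptions made for this example.

```python
class BlockStore:
    """Physical blocks shared by all branches (hypothetical model)."""

    def __init__(self):
        self.blocks = {}     # physical block id -> bytes
        self.refcount = {}   # physical block id -> number of branches using it
        self.next_id = 0

    def allocate(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id


class Branch:
    """A branch is just an index: logical block number -> physical block id."""

    def __init__(self, store, index=None):
        self.store = store
        self.index = dict(index or {})

    def fork(self):
        # Step 2: copy only the index; every block is now shared.
        for block_id in self.index.values():
            self.store.refcount[block_id] += 1
        return Branch(self.store, self.index)

    def read(self, n):
        return self.store.blocks[self.index[n]]

    def write(self, n, data, offset=0):
        old = self.index.get(n)
        if old is None:
            # First write to this logical block: nothing to copy.
            self.index[n] = self.store.allocate(data)
            return
        # Build the new contents from a copy of the old block.
        buf = bytearray(self.store.blocks[old])
        buf[offset:offset + len(data)] = data
        if self.store.refcount[old] == 1:
            # Not shared with any other branch: update in place.
            self.store.blocks[old] = bytes(buf)
        else:
            # Step 3: shared block, so this branch gets its own copy.
            self.store.refcount[old] -= 1
            self.index[n] = self.store.allocate(bytes(buf))


# Step 1: a parent branch with blocks 1 to 8.
store = BlockStore()
parent = Branch(store)
for n in range(1, 9):
    parent.write(n, f"blk{n}".encode())

child = parent.fork()
blocks_after_fork = len(store.blocks)   # still 8: no data was copied

child.write(3, b"NEW3")    # child gets a private copy of block 3
parent.write(6, b"NEW6")   # parent gets a private copy of block 6

shared = [n for n in parent.index if parent.index[n] == child.index[n]]
print(blocks_after_fork, len(store.blocks), shared)
```

In this model, two full branches of eight blocks each occupy ten physical blocks rather than sixteen, and the cost of `fork` depends on the size of the index, not on the amount of data.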

Separated storage and compute architecture

The Xata platform combines well-known, proven technologies with an innovative storage system to offer a high-performance, scalable, and secure PostgreSQL service.

1

Compute on top of Kubernetes

The PostgreSQL instances run inside a Kubernetes cluster and are managed by the CloudNativePG operator, one of the most stable, feature-rich, and popular operators for PostgreSQL on Kubernetes.

2

Logical storage volumes mounted via CSI

A Kubernetes CSI driver mounts the logical storage volumes to the PostgreSQL pods. This allows the volumes to be used by multiple PostgreSQL pods and to be resized as needed.

3

NVMe over Fabrics (NVMe/TCP)

The storage volumes are connected to the PostgreSQL pods via NVMe/TCP, which is a high-performance network protocol for block storage.

4

Distributed storage cluster

The storage cluster uses multiple storage nodes (e.g. EC2 instances) and the data blocks are automatically distributed across them, using parity to protect against node failures.
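
To show how parity can protect against a node failure, here is a minimal sketch using single XOR parity, in the style of RAID 5. The text above only says that parity is used; the actual scheme, stripe width, and number of parity chunks are not specified, so everything in this example (the `xor_blocks` helper, three data nodes, 4-byte blocks) is an assumption.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each stored on a different node (assumed layout).
data = [b"node", b"fail", b"safe"]

# The parity block is stored on a fourth node.
parity = xor_blocks(data)

# Simulate losing the node that held data[1].
survivors = [data[0], data[2], parity]

# XOR of the surviving blocks and the parity block rebuilds the lost block.
recovered = xor_blocks(survivors)
print(recovered == data[1])
```

Single XOR parity tolerates the loss of one block per stripe. Tolerating more simultaneous failures requires additional parity blocks and a more general erasure code.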

NVMe over Fabrics (NVMe/TCP) and SPDK

Xata uses NVMe over Fabrics (NVMe/TCP) and SPDK to deliver high throughput and low latency.

NVMe is a high-speed storage protocol that maximizes storage performance through low latency and parallel processing. NVMe over Fabrics, including NVMe/TCP, extends these benefits over network connections, enabling scalable and efficient remote storage with minimal performance loss.

  • High-performance parallel processing
  • Polling in user-space via SPDK
  • Reduced context switching and CPU overhead
  • Zero-copy data transfer

SPDK (Storage Performance Development Kit) is an open source set of tools and libraries designed to maximize storage performance. It gives applications user-space, zero-copy, asynchronous access to storage devices, bypassing the traditional kernel-based I/O stack. Originally developed by Intel, SPDK is optimized for NVMe and NVMe over Fabrics and builds on technologies like DPDK (Data Plane Development Kit) to achieve extremely low latency and high throughput.

By running entirely in user space and utilizing polling rather than interrupts, SPDK minimizes context switching and CPU overhead, making it ideal for high-performance storage applications such as databases, hyper-converged infrastructure, and software-defined storage systems.
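
The polling model can be illustrated with a toy queue pair. This sketch is conceptual only and does not use or mirror the SPDK API: `FakeQueuePair`, `device_tick`, and `run` are hypothetical names, and the "device" is simulated in the same thread. The point is the shape of the loop: the application submits requests without blocking, then repeatedly checks a completion queue instead of sleeping until an interrupt arrives.

```python
from collections import deque


class FakeQueuePair:
    """Toy submission/completion queue pair (hypothetical, not the SPDK API)."""

    def __init__(self):
        self.submitted = deque()
        self.completions = deque()

    def submit(self, request_id):
        # Non-blocking: enqueue the request and return immediately.
        self.submitted.append(request_id)

    def device_tick(self, max_ops=2):
        # Stand-in for the device completing up to max_ops requests.
        for _ in range(min(max_ops, len(self.submitted))):
            self.completions.append(self.submitted.popleft())

    def poll(self, max_completions=32):
        # Drain whatever has completed so far; never sleeps or waits.
        done = []
        while self.completions and len(done) < max_completions:
            done.append(self.completions.popleft())
        return done


def run(num_requests):
    qp = FakeQueuePair()
    for request_id in range(num_requests):
        qp.submit(request_id)

    finished, polls = [], 0
    while len(finished) < num_requests:
        qp.device_tick()              # simulated device progress
        finished.extend(qp.poll())    # application polls for completions
        polls += 1
    return finished, polls


print(run(5))
```

A real polled-mode driver dedicates a CPU core to this loop, trading CPU cycles for the latency saved by avoiding interrupts and context switches.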

Postgres for agent scale.

Use your existing Postgres. Run it better with Xata.