High-performance storage for PostgreSQL

Our PostgreSQL service separates storage from compute, offers instant data branching, and still delivers very high performance, all without forking or modifying PostgreSQL. This is possible thanks to our unique NVMe-over-Fabrics storage technology.

Data branching

Copy-on-write snapshot and restore create instant PostgreSQL database branches with both schema and data!
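
To make the copy-on-write idea concrete, here is a toy block store in Python: a branch shares all of the parent's blocks instantly and copies a block only when it is written. This is a conceptual sketch, not Xata's implementation.

```python
# Toy copy-on-write block store illustrating instant branching.
class Volume:
    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})  # block number -> bytes

    def branch(self):
        # Instant: the child copies the block map (metadata only);
        # the data blocks themselves stay shared with the parent.
        return Volume(self.blocks)

    def write(self, block_no, data):
        # Copy-on-write: only the written block diverges from the parent.
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks[block_no]

main = Volume({0: b"schema", 1: b"rows-v1"})
dev = main.branch()        # instant branch with schema and data
dev.write(1, b"rows-v2")   # the branch diverges only where it writes

print(main.read(1))  # b'rows-v1'  (parent unchanged)
print(dev.read(1))   # b'rows-v2'  (branch sees its own copy)
print(dev.read(0))   # b'schema'   (unmodified block still shared)
```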

Separation of storage and compute

On the Xata platform, storage and compute are separated: you can scale them independently, and each is priced separately.

Ultra-high performance

The Xata architecture uses NVMe/TCP and SPDK to offer ultra-low-latency storage and very high throughput.

Bottomless storage

Xata storage is billed per GB and can grow as much as you need. You only pay for the storage you use; there is no need to over-provision.

100% vanilla PostgreSQL

Since the performance improvements and branching are implemented at the storage layer, we don't fork or modify PostgreSQL. You get all features and extensions.

Storage tiering (coming soon)

Keep the most frequently accessed data on very fast storage (hot tier), occasionally accessed data on block storage (warm tier), and rarely accessed data on S3 (cold tier).

Separated storage and compute architecture

The Xata platform combines well-known, battle-tested technologies with an innovative storage system to offer a high-performance, scalable, and secure PostgreSQL service.

1. Compute on top of Kubernetes

The PostgreSQL instances run inside a Kubernetes cluster and are managed by the CloudNativePG operator, one of the most stable, feature-rich, and popular operators for PostgreSQL on Kubernetes.
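
For illustration, here is how a CloudNativePG Cluster resource might be created with the official Kubernetes Python client. The namespace, cluster name, storage class, and sizes are placeholders, not Xata's actual configuration.

```python
# Sketch: create a CloudNativePG Cluster via the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

cluster = {
    "apiVersion": "postgresql.cnpg.io/v1",
    "kind": "Cluster",
    "metadata": {"name": "example-pg", "namespace": "databases"},
    "spec": {
        "instances": 3,  # one primary plus two replicas
        "storage": {
            "size": "100Gi",
            "storageClass": "xata-nvme",  # hypothetical CSI storage class
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="postgresql.cnpg.io",
    version="v1",
    namespace="databases",
    plural="clusters",
    body=cluster,
)
```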

2. Logical storage volumes mounted via CSI

A Kubernetes CSI driver mounts the logical storage volumes into the PostgreSQL pods. This allows the storage volumes to be used by multiple PostgreSQL pods and to be resized as needed.
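
As an illustration of the CSI path, a logical volume can be requested through a PersistentVolumeClaim that references a CSI-backed storage class. The names, namespace, access mode, and size below are placeholders rather than Xata's configuration.

```python
# Sketch: request a CSI-provisioned volume via a PersistentVolumeClaim.
from kubernetes import client, config

config.load_kube_config()

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "pgdata-example"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],        # placeholder access mode
        "storageClassName": "xata-nvme",         # hypothetical CSI-backed class
        "resources": {"requests": {"storage": "100Gi"}},
    },
}

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="databases", body=pvc
)
```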

3. NVMe over Fabrics (NVMe/TCP)

The storage volumes are connected to the PostgreSQL pods via NVMe/TCP, which is a high-performance network protocol for block storage.
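
To make this concrete, the sketch below attaches a remote NVMe/TCP namespace using nvme-cli, wrapped in Python. The address, port, and NVMe Qualified Name are placeholders, and the commands require the nvme-tcp kernel module, nvme-cli, and root privileges.

```python
# Sketch: attach a remote NVMe/TCP namespace with nvme-cli (placeholder target).
import subprocess

traddr = "10.0.0.42"                        # storage node IP (placeholder)
trsvcid = "4420"                            # conventional NVMe/TCP port
nqn = "nqn.2024-01.example:volume-pgdata"   # NVMe Qualified Name (placeholder)

subprocess.run(
    [
        "nvme", "connect",
        "--transport=tcp",
        f"--traddr={traddr}",
        f"--trsvcid={trsvcid}",
        f"--nqn={nqn}",
    ],
    check=True,
)

# The namespace now appears as a local block device (e.g. /dev/nvme1n1)
# and can be formatted and mounted like any other disk.
subprocess.run(["nvme", "list"], check=True)
```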

4. Distributed storage cluster

The storage cluster spans multiple storage nodes (e.g., EC2 instances), and data blocks are automatically distributed across them, with parity protecting against node failures.

NVMe over Fabrics (NVMe/TCP) and SPDK

Xata uses NVMe over Fabrics (NVMe/TCP) together with SPDK to deliver ultra-high performance and low latency.

NVMe is a high-speed storage protocol that maximizes storage performance through low latency and parallel processing. NVMe over Fabrics, including NVMe/TCP, extends these benefits over network connections, enabling scalable and efficient remote storage with minimal performance loss.

  • High-performance parallel processing
  • Polling in user-space via SPDK
  • Reduced context switching and CPU overhead
  • Zero-copy data transfer

SPDK (Storage Performance Development Kit) is an open source set of tools and libraries designed to maximize storage performance by enabling user-space, zero-copy, and asynchronous access to storage devices—bypassing traditional kernel-based I/O stacks. Developed by Intel, SPDK is optimized for NVMe and NVMe over Fabrics and leverages technologies like DPDK (Data Plane Development Kit) to achieve extremely low latency and high throughput.

By running entirely in user space and utilizing polling rather than interrupts, SPDK minimizes context switching and CPU overhead, making it ideal for high-performance storage applications such as databases, hyper-converged infrastructure, and software-defined storage systems.
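
As a toy illustration of the polling idea (not SPDK itself), the snippet below busy-polls a non-blocking file descriptor for a "completion" instead of sleeping in a blocking call, which is the same trade-off SPDK makes to avoid interrupts and context switches.

```python
# Toy illustration of polling vs. blocking: check for completed work in a
# loop from user space rather than waiting for the kernel to wake us up.
import os
import time

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)  # make reads non-blocking so we can poll

if os.fork() == 0:
    # Child: simulate a device completing an I/O after a short delay.
    time.sleep(0.01)
    os.write(write_fd, b"completion")
    os._exit(0)

polls = 0
while True:
    try:
        data = os.read(read_fd, 16)  # poll for the "completion"
        if data:
            break
    except BlockingIOError:
        polls += 1  # nothing ready yet; keep polling instead of blocking

print(f"got {data!r} after {polls} polls, with no blocking wait")
```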

Redundancy with distributed parity

The storage layer protects data using distributed erasure coding, which means that data is striped across multiple storage nodes along with parity fragments.

The storage engine implements erasure coding, a RAID-like scheme that uses parity information to protect data and reconstruct it after a failure. Unlike traditional RAID, however, the multi-node system applies this scheme at a higher layer, across storage nodes.

While most erasure coding implementations express their failure tolerance as the number of disks that can fail, our storage engine defines it as the number of nodes that can fail. The erasure coding scheme provides redundancy with minimal storage overhead.

  • Higher durability with less storage overhead than replication.
  • Parity blocks are calculated and stored on a separate storage node.
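
To illustrate the principle with the simplest possible case, the sketch below uses single-parity XOR across a stripe of blocks spread over nodes and rebuilds a lost block from the survivors. The real storage engine uses a more general erasure coding scheme and node counts; this is only a conceptual example.

```python
# Minimal single-parity (XOR) example of rebuilding a lost block.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-sized byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# A stripe: three data blocks, each placed on a different storage node.
data_nodes = [b"blk-aaaa", b"blk-bbbb", b"blk-cccc"]
parity_node = xor_blocks(data_nodes)  # parity stored on a fourth node

# Simulate losing one node and rebuilding its block from the survivors.
lost = 1
survivors = [blk for i, blk in enumerate(data_nodes) if i != lost]
rebuilt = xor_blocks(survivors + [parity_node])

assert rebuilt == data_nodes[lost]
print("rebuilt block:", rebuilt)
```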

Storage tiering (coming soon)

Hot/warm/cold storage tiering allows you to optimize your PostgreSQL storage costs by moving data between tiers based on usage patterns.

Optimize performance and cost without modifying the data layout or resorting to partitioning or namespaces. You configure the tiering policy and the storage classes, and the rest is handled automatically.

  • Hot tier: frequently accessed data (e.g., recent transactions, active sessions)
  • Warm tier: occasionally accessed data (e.g., last month's reports)
  • Cold tier: rarely accessed or archival data (e.g., historical logs, audit data)
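
As a purely hypothetical illustration of such a policy, the sketch below classifies data into tiers by the age of its last access; the thresholds and tier names are made up for the example and are not Xata configuration.

```python
# Hypothetical hot/warm/cold classification by last-access age.
from datetime import datetime, timedelta

TIER_THRESHOLDS = [
    ("hot", timedelta(days=7)),    # touched within a week -> fast NVMe tier
    ("warm", timedelta(days=90)),  # touched within ~3 months -> block storage
]

def pick_tier(last_access: datetime, now: datetime) -> str:
    """Classify a piece of data into hot/warm/cold by last-access age."""
    age = now - last_access
    for tier, limit in TIER_THRESHOLDS:
        if age <= limit:
            return tier
    return "cold"  # everything older goes to object storage (e.g., S3)

now = datetime.now()
print(pick_tier(now - timedelta(days=2), now))    # -> hot
print(pick_tier(now - timedelta(days=30), now))   # -> warm
print(pick_tier(now - timedelta(days=400), now))  # -> cold
```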