Postgres major version upgrades with minimal downtime

With the beta release of dedicated clusters, we've added the ability to move branches between clusters. Xata customers can now perform Postgres major version upgrades with minimal downtime.

Written by

Carlos Pérez-Aradros Herce

Published on

July 18, 2024

One of the challenges of running a Postgres database is upgrading to a new major version. Major version upgrades can introduce new features, performance improvements, and bug fixes, but they can also be complex and time-consuming.

The most accepted methods for upgrading Postgres to a new major are:

  • pg_upgrade for in-place upgrade, which involves stopping the server while the upgrade is being performed. Even if this usually happens very fast, it means taking the service down for a few minutes, plus leading with issues in the process during that downtime.

  • blue-green deployments, which leverages logical replication to create a standby replica (running a newer version) of the primary server. Once replication is done, a switch over is performed, involving cutting writes to the primary, and switching traffic to the replica. Details on this approach later.

Common issues with major Postgres upgrades are:

  • Downtime: Planned (and unplanned) downtime is not uncommon when a major version upgrade happens, impacting users and business.
  • Complex orchestration: Orchestrating a migration is a multi-step process that requires careful planning, with little room for errors.
  • Adapting clients: Not only the infra part, in many cases also client apps also need to be aware of the migration and update their database connection details to point to a new host once a migration is done (DNS or Proxies can help here).

For all these reasons, major version upgrades are often postponed and become a major event across multiple engineering teams.

#

Moving Database Branches in Xata

The most interesting feature from our dedicated clusters public beta announcement this week is the ability to move databases and branches between clusters. This allows Xata customers to move a database between two Postgres clusters while it's still being accessed. You can use this as a way to move a database from a shared cluster in our free tier to your own dedicated cluster, for better performance & isolation.

This feature also allows to perform near zero-downtime major Postgres version upgrades by moving a database between two clusters running on different versions.

We implemented a blue-green approach in order to offer the ability to move databases, hiding all the complexity behind a simple user experience.

In order to understand the process of moving a database, it is useful to explain some key parts of our infrastructure that contribute to the process: All Xata databases can be accessed from the same proxy service, that handles authentication and routes traffic to the cluster hosting the target database:

Xata clusters and proxy
Xata clusters and proxy

This allows us to have fine grained control over connections, ensure security and many features to come. In the case of moving databases, it guarantees that we are able to redirect traffic to the correct cluster as needed.

This is the process to move a database branch when requested:

Moving a database branch between clusters
Moving a database branch between clusters

We can divide it in four main steps:

In this process we recreate the database in the target cluster and configure replication to start syncing it.

  • Block schema changes in the source cluster. This allows us to replicate the database consistently, without missing any change.
  • Replicate the database schema in the target cluster.
  • Create a publication in the source cluster.
  • Subscribe to it in the target.

With the databases now replicating, we need to wait for the sync process to be up to date, so the replica is following close enough to the primary.

  • Check the replication slots created by the subscription, wait for their LSN distance to be low enough.

Target cluster (replica) is close enough to be promoted as the new primary. We start the switch process. It's critical that this step is performed as quickly as possible, as we want to keep downtime to a minimum:

  • Block writes to the source cell. We do this by revoking write permissions from the accessing roles.
  • Wait for the replication slot. With writes stopped, we need to ensure that all the ones that went through make it to the target cluster.
  • Switch traffic to the target cluster (new primary). All new connections will go to it.
  • Kill previously existing connections to the source cluster, ensuring they reconnect to the new one.

Cleanup all created slots + garbage in the source cell.

With these steps the process is done, the database is now accessible in the newer version. It's important to note that this process offers reasonable guarantees: reads work during the whole process, and writes are only blocked for a brief period of time (normally under 100ms).

We are excited about this new feature, and we have many ideas about how we could evolve it in the future, for example:

  • Providing more detailed information about the move progress.
  • Achieve real zero-downtime by holding, then redirecting writes in the proxy, effectively avoiding killing any connection.
  • Allow for schema changes while move happens
  • Provide more fine grained control over the process. For instance, allowing users to decide when to switch, and allow to send traffic to the target cell before the switch, for instance to perform custom tests.

We are excited to share this with you and look forward to hearing your feedback! Want to see major version upgrades in action? Check out this quick demo video.

If you have any suggestions or questions, please open an feature request in Canny, reach out to us on Discord or follow us on X / Twitter.

You can follow along this week with a new announcement every day on our launch week page 🦋

Start free,
pay as you grow

Xata provides the best free plan in the industry. It is production ready by default and doesn't pause or cool-down. Take your time to build your business and upgrade when you're ready to scale.

Free plan includes
  • Single team member
  • 10 database branches
  • High availability
  • 15 GB data storage
  • 15 GB search engine storage
  • 2 GB file attachments
  • 250 AI queries per month
Start freeExplore all plans
Free plan includes
  • Single team member
  • 10 database branches
  • High availability
  • 15 GB data storage
  • 15 GB search engine storage
  • 2 GB file attachments
  • 250 AI queries per month

Sign up to our newsletter

By subscribing, you agree with Xata’s Terms of Service and Privacy Policy.

Copyright © 2024 Xatabase Inc.
All rights reserved.

Product

RoadmapFeature requestsPricingStatusAI solutionsFile attachments