pgroll 0.10.0 update

pgroll v0.10 includes one breaking change, a new subcommand that converts SQL statements into pgroll operations, and improvements to adding columns with default values to existing tables.

Written by Andrew Farries, Noémi Ványi

Published on February 28, 2025

We've just released v0.10 of pgroll, our open-source schema migration tool for Postgres. As we've done for previous releases, we'll take a look at some of the headline features that made it into the release and look ahead to what's coming next. Past entries in the release blog series are available on the Xata blog.

For the complete notes for this release, see the GitHub v0.10.0 release page.

What is pgroll?

pgroll is a schema migration tool for Postgres. Designed for application developers working on applications that require frequent schema changes but also need to maintain zero downtime around those schema changes, pgroll takes a different approach compared to most other migration tools on the market. There are two aspects that characterize pgroll's approach to migrations:

  • Multi-version migrations: Making a schema change with pgroll results in two versions of the schema: the one before the change and the one after it. This allows applications to select which version of the schema they want to work with, and allows side-by-side rollout of applications that require the new schema change alongside older applications that may be incompatible with it (see the sketch after this list).
  • Lock-safe migrations: Migrations using pgroll are expressed declaratively, rather than using SQL directly. This allows pgroll to implement the steps required to perform the schema change in a safe manner, ensuring that any locks required on the affected objects are held for the shortest possible time.
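
Each version of the schema is exposed as a separate Postgres schema of views, so an application pins itself to a version by setting its search_path. A minimal sketch, assuming a public schema and migrations named 01_create_table and 02_add_column (both hypothetical names):

-- Old application instances keep reading the previous version of the schema
SET search_path TO public_01_create_table;

-- Newly deployed instances read the version created by the latest migration
SET search_path TO public_02_add_column;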

Full documentation for pgroll is in the documentation section of this site.

pgroll v0.10 includes a new subcommand, several improvements and one breaking change. Let's take a look at what went into the v0.10 release in more detail:

One of the first community requests was to be able to use pgroll with Object Relational Mappers (ORMs). We looked at several ways of integrating pgroll into existing ORM workflows. As most ORMs can generate SQL migration scripts, we decided to translate SQL statements into pgroll migrations. The generated pgroll migration can be saved to a file and run using the start command as usual.

We added a new subcommand called convert, which translates SQL migration scripts generated by ORMs into pgroll migrations. It can read SQL statements from stdin or from a specified file, and has a single flag, --name, to configure the name of the migration. If the flag is unset, the name defaults to the current timestamp.

$ pgroll convert --help
Convert SQL statements to a pgroll migration. The command can read SQL statements from stdin or a file

Usage:
  pgroll convert <path to file with migrations> [flags]

Flags:
  -h, --help          help for convert
  -n, --name string   Name of the migration (default "{current_timestamp}")
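
Putting the pieces together, the end-to-end workflow looks roughly like this (a minimal sketch; the file and migration names are hypothetical):

# Convert an ORM-generated SQL script into a pgroll migration and save it to a file
pgroll convert migrations.sql --name 01_create_employees > 01_create_employees.json

# Then apply it with the start command as usual
pgroll start 01_create_employees.json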

You can use pgroll with any ORM that can generate raw SQL statements for its migrations. Let's look at a few examples.

Alembic is a database migration tool used in SQLAlchemy projects. You can generate SQL statements from its migrations using its offline mode: if you add the --sql flag to alembic's upgrade command, it prints the SQL statements to stdout. You can pipe this output into pgroll convert:

$ alembic upgrade {revision} --sql | pgroll convert --name {revision}
{
  "name": "{revision}",
  "operations": [
    {
      "create_table": {
        "name": "employees",
        "colunms:" [
          {
            "name": "name",
            "type": "varchar(100)"
          },
          {
            "name": "joined",
            "type": "timestamp with time zone"
          },
          {
            "name": "email",
            "type": "varchar(254)"
          }
        ]
      }
    }
  ]
}

Django is the go-to tool for Python web development. It can connect to several databases, including PostgreSQL. If your web application uses PostgreSQL storage, you can leverage the convert subcommand.

After you've defined your migration in Python, you can use manage.py to extract the SQL script from Django. The subcommand sqlmigrate prints the SQL statements to stdout.

Create a new Employee model with the following simplified definition:

from django.db import models

class Employee(models.Model):
    name = models.CharField(max_length=100)
    joined = models.DateTimeField()
    email = models.EmailField()

Then run sqlmigrate to generate the SQL statements and pipe the output into pgroll convert:

python manage.py sqlmigrate my_app 0000 | pgroll convert --name 0000_init
{
  "name": "0000_init",
  "operations": [
    {
      "create_table": {
        "name": "employees",
        "colunms:" [
          {
            "name": "name",
            "type": "varchar(100)"
          },
          {
            "name": "joined",
            "type": "timestamp with time zone"
          },
          {
            "name": "email",
            "type": "varchar(254)"
          }
        ]
      }
    }
  ]
}

Drizzle is a popular ORM for TypeScript projects. You can extract SQL statements using its generate command.

The following example schema in Drizzle can be translated using pgroll:

import { pgTable, timestamp, varchar } from 'drizzle-orm/pg-core';

export const employees = pgTable('employees', {
  name: varchar({ length: 100 }),
  joined: timestamp({ withTimezone: true }),
  email: varchar({ length: 254 })
});

Run drizzle-kit generate to produce the SQL file, then pgroll convert to get the corresponding pgroll migration.

drizzle-kit generate --dialect postgresql --schema=./src/schema.ts --name=init
pgroll convert 0000_init.sql --name 0000_init
{
  "name": "0000_init",
  "operations": [
    {
      "create_table": {
        "name": "employees",
        "columns": [
          {
            "name": "name",
            "type": "varchar(100)"
          },
          {
            "name": "joined",
            "type": "timestamp with time zone"
          },
          {
            "name": "email",
            "type": "varchar(254)"
          }
        ]
      }
    }
  ]
}

We are releasing this feature early so we can get your feedback. The functionality still has some limitations: the generated pgroll migrations must be edited manually to provide up and down data migrations.
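
For instance, a converted alter_column operation that makes a column NOT NULL is emitted with placeholder up and down expressions; you would replace them with real data migrations along these lines (the COALESCE expression is illustrative only):

{
  "alter_column": {
    "table": "adults",
    "column": "age",
    "nullable": false,
    "up": "COALESCE(age, 18)",
    "down": "age"
  }
}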

Furthermore, the SQL statements are not aggregated into single pgroll operations. Some ORMs add unique constraints in a separate statement when creating a table with a unique column, which leads to more pgroll operations than necessary. You can resolve this manually by removing the unique constraint operation from the pgroll migration and adding it to the list of constraints of create_table. For example, this:

{
  "create_table": {
    "name": "employees",
    "columns": [
      {
        "name": "email",
        "varchar(254)"
      }
    ]
  }
},
{
  "create_constraint": {
    "name": "my_unique_email",
    "type": "unique",
    "columns": ["email"]
    "up": {
      "email": "TODO"
    }
  }
}

can be simplified to

{
  "create_table": {
    "name": "employees",
    "columns": [
      {
        "name": "email",
        "varchar(254)"
      }
    ],
    "constraints": [
      {
        "name": "my_unique_email",
        "type": "unique",
        "columns": ["email"]
      }
    ]
  }
}

The same applies in some cases where the ORM handles backfilling for new columns. For example, when you add a new column with a default value, an ORM may produce SQL statements like these:

ALTER TABLE "adults" ADD COLUMN "age" smallint DEFAULT 18 NULL CHECK ("age" >= 18);
ALTER TABLE "adults" ALTER COLUMN "level" DROP DEFAULT;

UPDATE adults SET age = 18;

ALTER TABLE adults ALTER COLUMN age SET NOT NULL;

UPDATE "adults" SET "age" = 18 WHERE "age" IS NULL;
SET CONSTRAINTS ALL IMMEDIATE;

ALTER TABLE "adults" ALTER COLUMN "age" SET NOT NULL;

The convert subcommand turns these statements into the following pgroll migration:

{
  "add_column": {
    "column": {
      "check": {
        "constraint": "age >= 18",
        "name": "age_check"
      },
      "default": "18",
      "name": "age",
      "nullable": true,
      "type": "smallint"
    },
    "table": "adults",
    "up": "TODO: Implement SQL data migration"
  }
},
{
  "alter_column": {
    "column": "age",
    "default": null,
    "table": "adults",
    "down": "TODO: Implement SQL data migration",
    "up": "TODO: Implement SQL data migration"
  }
},
{
  "sql": {
    "up": "update adults set age = 18"
  }
},
{
  "alter_column": {
    "column": "age",
    "nullable": false,
    "table": "adults",
    "up": "TODO: Implement SQL data migration"
    "down": "TODO: Implement SQL data migration",
  }
},
{
  "sql": {
    "up": "UPDATE \"adults\" SET \"age\" = 18 WHERE \"age\" IS NULL"
  }
},
{
  "sql": {
    "up": "SET CONSTRAINTS ALL IMMEDIATE"
  }
},
{
  "alter_column": {
    "column": "age",
    "nullable": false,
    "table": "adults",
    "up": "TODO: Implement SQL data migration"
    "down": "TODO: Implement SQL data migration",
  }
}

This can be collapsed into a single pgroll operation:

{
  "add_column": {
    "column": {
      "check": {
        "constraint": "age \u003e= 0",
        "name": "age_check"
      },
      "default": "18",
      "name": "age",
      "nullable": true,
      "type": "smallint"
    },
    "table": "adults",
    "up": "18"
  }
}

SQL migration scripts usually begin with BEGIN and end with COMMIT because the ORM runs the DDL statements in a single transaction. These statements show up in the list of operations in the generated migration and are safe to delete.
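
For example, the converted migration may contain raw sql operations like the following, which can simply be removed (an illustration only; the exact output depends on the input script):

{
  "sql": {
    "up": "BEGIN"
  }
},
{
  "sql": {
    "up": "COMMIT"
  }
}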

pgroll v0.10 includes one breaking change to how columns are configured when creating new indexes. Previously, the columns list of create_index operations expected a list of strings:

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": ["salary"]
      }
    }
  ]
}

Now you need to specify columns as map keys:

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": {
          "salary": {}
        }
      }
    }
  ]
}

pgroll now uses a map so that create_index can support several new per-column options for configuring the index elements.

Using the sort attribute, you can change the sort order from ascending, which is the default, to descending.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "salary": {
             "sort": "desc"
           }
        ]
      }
    }
  ]
}

Another option for changing the sort ordering of the index is nulls. You can specify whether null values come before (first) or after (last) non-null values.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "position": {
             "nulls": "last"
           }
        ]
      }
    }
  ]
}

The collate attribute changes the collation used for the index.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "bio": {
             "collate": "fr_FR"
           }
        ]
      }
    }
  ]
}

You can also change the operator class for the indexed column and pass configuration parameters to the operator class.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "bio": {
             "nulls": "first"
             "opclass": {
               "name": "opc_name",
               "params": {
                 "param1": "val1"
               }
             }
           }
        ]
      }
    }
  ]
}

pgroll v0.10 improves how columns with DEFAULT values are added to existing tables, avoiding long-lived ACCESS_EXCLUSIVE locks when adding columns with volatile DEFAULT values.

In Postgres, adding a new column with a DEFAULT value is an operation that comes with a pitfall: Postgres has different behaviours depending on the volatility of the DEFAULT. Consider this ALTER TABLE statement:

ALTER TABLE items ADD COLUMN value INTEGER DEFAULT 100;

The DEFAULT expression here is non-volatile, meaning that Postgres can apply an optimization when adding the column - rather than rewriting the table to populate every row with the new DEFAULT, Postgres applies the DEFAULT lazily. The entry in the system catalog pg_attribute for the new column is updated to record that there is a default value for the column that should be populated on-the-fly whenever the column is read.
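
You can see this catalog-only behaviour directly. A quick check, assuming the items table from the statement above (in Postgres 11 and later, atthasmissing and attmissingval record the lazily applied default):

-- The default is recorded in the catalog instead of being written to every row
SELECT attname, atthasmissing, attmissingval
FROM pg_attribute
WHERE attrelid = 'items'::regclass
  AND attname = 'value';

--  attname | atthasmissing | attmissingval
-- ---------+---------------+---------------
--  value   | t             | {100}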

By contrast, consider this ALTER TABLE operation:

ALTER TABLE items ADD COLUMN value INTEGER DEFAULT (random()*100)::integer;

This time, the DEFAULT expression is volatile, meaning that Postgres cannot apply the same optimization as before. Instead, Postgres must rewrite the table to populate the new field with its DEFAULT in every row. The table rewrite requires an ACCESS_EXCLUSIVE lock for the duration of the rewrite.

Previously, pgroll did not differentiate between these two cases when adding a new column with a DEFAULT value, meaning that adding a new column with a volatile DEFAULT would take an ACCESS_EXCLUSIVE lock, locking out reads and writes from the affected table for the duration of the table rewrite.

Starting with pgroll v0.10, pgroll now detects whether the DEFAULT value for a new column is volatile or non-volatile, and applies the DEFAULT differently in each case:

  • For non-volatile DEFAULT values, pgroll applies the DEFAULT to the new column directly as before. The operation is a catalog-only update and requires only a very brief ACCESS_EXCLUSIVE lock.
  • For a volatile DEFAULT, pgroll creates the column without a DEFAULT, avoiding the table rewrite. The new column is then backfilled with the DEFAULT value using pgroll's backfill mechanism. The DEFAULT is added to the column on migration completion.

A migration to add a new column with a volatile DEFAULT is written like this:

{
  "name": "29_add_column_with_default",
  "operations": [
    {
      "add_column": {
        "table": "items",
        "up": "(random()*100)::integer",
        "column": {
          "name": "value",
          "type": "integer",
          "default": "(random()*100)::integer"
        }
      }
    }
  ]
}

The up expression used as the backfill must be the same as the default expression for the new column. pgroll will backfill the new column with the new DEFAULT value, and then apply the DEFAULT to the column on completion of the migration.

This means that adding columns with volatile DEFAULT values no longer requires a potentially long-lived ACCESS_EXCLUSIVE lock, making such operations safe to perform on tables with large row counts.

pgroll is open-source; check out the repository to get involved.

Many of the features that have gone into this and previous releases have been directly influenced by how users tell us they are using the tool and where it's currently lacking. We welcome any feedback in issues, or contributions via pull requests!
