pgroll 0.10.0 update

pgroll v0.10 includes one breaking change, a new subcommand that converts SQL statements into pgroll operations, and improvements to adding columns with default values to existing tables.

Written by Andrew Farries, Noémi Ványi

Published on February 28, 2025

We've just released v0.10 of pgroll, our open-source schema migration tool for Postgres. As we've done for previous releases, we'll take a look at some of the headline features that made it into the release and look ahead to what's coming next. Past entries in the release blog series are available on the Xata blog.

For the complete notes for this release, see the GitHub v0.10.0 release page.

What is pgroll?

pgroll is a schema migration tool for Postgres. Designed for application developers working on applications that require frequent schema changes but also need to maintain zero downtime around those schema changes, pgroll takes a different approach compared to most other migration tools on the market. There are two aspects that characterize pgroll's approach to migrations:

  • Multi-version migrations: Making a schema change with pgroll results in two versions of the schema: the one before the change and the one after it. This allows applications to select which version of the schema they want to work with, and allows side-by-side rollout of applications that require the new schema change alongside older applications that may be incompatible with it (see the sketch after this list).
  • Lock-safe migrations: Migrations using pgroll are expressed declaratively, rather than using SQL directly. This allows pgroll to implement the steps required to perform the schema change in a safe manner, ensuring that any locks required on the affected objects are held for the shortest possible time.
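
Each version of the schema is exposed as a separate Postgres schema of views, so an application pins itself to a version by setting its search_path. A minimal sketch, assuming a public schema and migrations named 01_create_table and 02_add_column (both hypothetical names):

-- Old application instances keep reading the previous version of the schema
SET search_path TO public_01_create_table;

-- Newly deployed instances read the version created by the latest migration
SET search_path TO public_02_add_column;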

Full documentation for pgroll is in the documentation section of this site.

pgroll v0.10 includes a new subcommand, several improvements and one breaking change. Let's take a look at what went into the v0.10 release in more detail:

One of the first community requests was to be able to use pgroll with Object Relational Mappers (ORMs). We looked at several ways of integrating pgroll into existing ORM workflows. As most ORMs can generate SQL migration scripts, we decided to translate SQL statements into pgroll migrations. The generated pgroll migration can be saved to a file and run using the start command as usual.

We added a new subcommand called convert, which translates SQL migration scripts generated by ORMs into pgroll migrations. It can read SQL statements from stdin or from a specified file, and has a single flag, --name, to configure the name of the migration. If the flag is unset, the name defaults to the current timestamp.

$ pgroll convert --help
Convert SQL statements to a pgroll migration. The command can read SQL statements from stdin or a file

Usage:
  pgroll convert <path to file with migrations> [flags]

Flags:
  -h, --help          help for convert
  -n, --name string   Name of the migration (default "{current_timestamp}")
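
Putting the pieces together, the end-to-end workflow looks roughly like this (a minimal sketch; the file and migration names are hypothetical):

# Convert an ORM-generated SQL script into a pgroll migration and save it to a file
pgroll convert migrations.sql --name 01_create_employees > 01_create_employees.json

# Then apply it with the start command as usual
pgroll start 01_create_employees.json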

You can use pgroll with any ORM that can generate raw SQL statements for its migrations. Let's look at a few examples.

Alembic is a database migration tool used in SQLAlchemy projects. You can generate SQL statements from its migrations using its offline mode: if you add the --sql flag to alembic's upgrade command, it prints the SQL statements to stdout. You can pipe this output into pgroll convert:

$ alembic upgrade {revision} --sql | pgroll convert --name {revision}
{
  "name": "{revision}",
  "operations": [
    {
      "create_table": {
        "name": "employees",
        "colunms:" [
          {
            "name": "name",
            "type": "varchar(100)"
          },
          {
            "name": "joined",
            "type": "timestamp with time zone"
          },
          {
            "name": "email",
            "type": "varchar(254)"
          }
        ]
      }
    }
  ]
}

Django is the go-to tool for Python web development. It can connect to several databases, including PostgreSQL. If your web application uses PostgreSQL storage, you can leverage the convert subcommand.

After you've defined your migration in Python, you can use manage.py to extract the SQL script from Django. The subcommand sqlmigrate prints the SQL statements to stdout.

Create a new Employee model with the following simplified definition:

from django.db import models

class Employee(models.Model):
    name = models.CharField(max_length=100)
    joined = models.DateTimeField()
    email = models.EmailField()

Then run sqlmigrate to generate the SQL statements and pipe the output into pgroll convert:

python manage.py sqlmigrate my_app 0000 | pgroll convert --name 0000_init
{
  "name": "0000_init",
  "operations": [
    {
      "create_table": {
        "name": "employees",
        "colunms:" [
          {
            "name": "name",
            "type": "varchar(100)"
          },
          {
            "name": "joined",
            "type": "timestamp with time zone"
          },
          {
            "name": "email",
            "type": "varchar(254)"
          }
        ]
      }
    }
  ]
}

Drizzle is a popular ORM for TypeScript projects. You can extract SQL statements using its generate command.

The following example schema in Drizzle can be translated using pgroll:

import { pgTable, timestamp, varchar } from 'drizzle-orm/pg-core';

export const employees = pgTable('employees', {
  name: varchar({ length: 100 }),
  joined: timestamp({ withTimezone: true }),
  email: varchar({ length: 254 })
});

Run drizzle-kit generate to produce the SQL file, then pgroll convert to get the corresponding pgroll migration.

drizzle-kit generate --dialect postgresql --schema=./src/schema.ts --name=init
pgroll convert 0000_init.sql --name 0000_init
{
  "name": "0000_init",
  "operations": [
    {
      "create_table": {
        "name": "employees",
        "columns": [
          {
            "name": "name",
            "type": "varchar(100)"
          },
          {
            "name": "joined",
            "type": "timestamp with time zone"
          },
          {
            "name": "email",
            "type": "varchar(254)"
          }
        ]
      }
    }
  ]
}

We are releasing this feature early so we can get your feedback. The functionality still has some limitations: the generated pgroll migrations must be edited manually to provide up and down data migrations.
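
For instance, a converted alter_column operation that makes a column NOT NULL is emitted with placeholder up and down expressions; you would replace them with real data migrations along these lines (the COALESCE expression is illustrative only):

{
  "alter_column": {
    "table": "adults",
    "column": "age",
    "nullable": false,
    "up": "COALESCE(age, 18)",
    "down": "age"
  }
}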

Furthermore, the SQL statements are not aggregated into single pgroll operations. Some ORMs add unique constraints in a separate statement when creating a table with a unique column, which leads to more pgroll operations than necessary. You can resolve this manually by removing the unique constraint operation from the pgroll migration and adding it to the list of constraints of create_table. For example, this:

{
  "create_table": {
    "name": "employees",
    "columns": [
      {
        "name": "email",
        "varchar(254)"
      }
    ]
  }
},
{
  "create_constraint": {
    "name": "my_unique_email",
    "type": "unique",
    "columns": ["email"]
    "up": {
      "email": "TODO"
    }
  }
}

can be simplified to

{
  "create_table": {
    "name": "employees",
    "columns": [
      {
        "name": "email",
        "varchar(254)"
      }
    ],
    "constraints": [
      {
        "name": "my_unique_email",
        "type": "unique",
        "columns": ["email"]
      }
    ]
  }
}

The same applies in some cases where the ORM handles backfilling for new columns. For example, when you add a new column with a default value, an ORM may produce SQL statements like these:

ALTER TABLE "adults" ADD COLUMN "age" smallint DEFAULT 18 NULL CHECK ("age" >= 18);
ALTER TABLE "adults" ALTER COLUMN "level" DROP DEFAULT;

UPDATE adults SET age = 18;

ALTER TABLE adults ALTER COLUMN age SET NOT NULL;

UPDATE "adults" SET "age" = 18 WHERE "age" IS NULL;
SET CONSTRAINTS ALL IMMEDIATE;

ALTER TABLE "adults" ALTER COLUMN "age" SET NOT NULL;

The convert subcommand turns these statements into the following pgroll migration:

{
  "add_column": {
    "column": {
      "check": {
        "constraint": "age >= 18",
        "name": "age_check"
      },
      "default": "18",
      "name": "age",
      "nullable": true,
      "type": "smallint"
    },
    "table": "adults",
    "up": "TODO: Implement SQL data migration"
  }
},
{
  "alter_column": {
    "column": "age",
    "default": null,
    "table": "adults",
    "down": "TODO: Implement SQL data migration",
    "up": "TODO: Implement SQL data migration"
  }
},
{
  "sql": {
    "up": "update adults set age = 18"
  }
},
{
  "alter_column": {
    "column": "age",
    "nullable": false,
    "table": "adults",
    "up": "TODO: Implement SQL data migration"
    "down": "TODO: Implement SQL data migration",
  }
},
{
  "sql": {
    "up": "UPDATE \"adults\" SET \"age\" = 18 WHERE \"age\" IS NULL"
  }
},
{
  "sql": {
    "up": "SET CONSTRAINTS ALL IMMEDIATE"
  }
},
{
  "alter_column": {
    "column": "age",
    "nullable": false,
    "table": "adults",
    "up": "TODO: Implement SQL data migration"
    "down": "TODO: Implement SQL data migration",
  }
}

This can be collapsed into a single pgroll operation:

{
  "add_column": {
    "column": {
      "check": {
        "constraint": "age \u003e= 0",
        "name": "age_check"
      },
      "default": "18",
      "name": "age",
      "nullable": true,
      "type": "smallint"
    },
    "table": "adults",
    "up": "18"
  }
}

SQL migration scripts usually begin with BEGIN and end with COMMIT because the ORM runs the DDL statements in a single transaction. These statements show up in the list of operations in the generated migration and are safe to delete.
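
For example, the converted migration may contain raw sql operations like the following, which can simply be removed (an illustration only; the exact output depends on the input script):

{
  "sql": {
    "up": "BEGIN"
  }
},
{
  "sql": {
    "up": "COMMIT"
  }
}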

pgroll v0.10 includes one breaking change to how columns are configured when creating new indexes. Previously, the columns list of create_index operations expected a list of strings:

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": ["salary"]
      }
    }
  ]
}

Now you need to specify columns as map keys:

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": {
          "salary": {}
        }
      }
    }
  ]
}

pgroll now uses a map so that create_index can support several new per-column options for configuring the index elements.

Using the sort attribute, you can change the sort order from ascending, which is the default, to descending.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "salary": {
             "sort": "desc"
           }
        ]
      }
    }
  ]
}

Another option for changing the sort ordering of the index is nulls. You can specify whether null values come before (first) or after (last) non-null values.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "position": {
             "nulls": "last"
           }
        ]
      }
    }
  ]
}

The collate attribute changes the collation used for the index.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "bio": {
             "collate": "fr_FR"
           }
        ]
      }
    }
  ]
}

You can also change the operator class for the indexed column and pass configuration parameters to the operator class.

{
  "name": "10_create_index",
  "operations": [
    {
      "create_index": {
        "table": "employees",
        "columns": [
           "bio": {
             "nulls": "first"
             "opclass": {
               "name": "opc_name",
               "params": {
                 "param1": "val1"
               }
             }
           }
        ]
      }
    }
  ]
}

pgroll v0.10 improves how columns with DEFAULT values are added to existing tables, avoiding long-lived ACCESS_EXCLUSIVE locks when adding columns with volatile DEFAULT values.

In Postgres, adding a new column with a DEFAULT value is an operation that comes with a pitfall: Postgres has different behaviours depending on the volatility of the DEFAULT. Consider this ALTER TABLE statement:

ALTER TABLE items ADD COLUMN value INTEGER DEFAULT 100;

The DEFAULT expression here is non-volatile, meaning that Postgres can apply an optimization when adding the column - rather than rewriting the table to populate every row with the new DEFAULT, Postgres applies the DEFAULT lazily. The entry in the system catalog pg_attribute for the new column is updated to record that there is a default value for the column that should be populated on-the-fly whenever the column is read.
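
You can see this catalog-only behaviour directly. A quick check, assuming the items table from the statement above (in Postgres 11 and later, atthasmissing and attmissingval record the lazily applied default):

-- The default is recorded in the catalog instead of being written to every row
SELECT attname, atthasmissing, attmissingval
FROM pg_attribute
WHERE attrelid = 'items'::regclass
  AND attname = 'value';

--  attname | atthasmissing | attmissingval
-- ---------+---------------+---------------
--  value   | t             | {100}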

By contrast, consider this ALTER TABLE operation:

ALTER TABLE items ADD COLUMN value INTEGER DEFAULT (random()*100)::integer;

This time, the DEFAULT expression is volatile, meaning that Postgres cannot apply the same optimization as before. Instead, Postgres must rewrite the table to populate the new field with its DEFAULT in every row. The table rewrite requires an ACCESS_EXCLUSIVE lock for the duration of the rewrite.

Previously, pgroll did not differentiate between these two cases when adding a new column with a DEFAULT value, meaning that adding a new column with a volatile DEFAULT would take an ACCESS_EXCLUSIVE lock, locking out reads and writes from the affected table for the duration of the table rewrite.

Starting with pgroll v0.10, pgroll now detects whether the DEFAULT value for a new column is volatile or non-volatile, and applies the DEFAULT differently in each case:

  • For non-volatile DEFAULT values, pgroll applies the DEFAULT to the new column directly as before. The operation is a catalog-only update and requires only a very brief ACCESS_EXCLUSIVE lock.
  • For a volatile DEFAULT, pgroll creates the column without a DEFAULT, avoiding the table rewrite. The new column is then backfilled with the DEFAULT value using pgroll's backfill mechanism. The DEFAULT is added to the column on migration completion.

A migration to add a new column with a volatile DEFAULT is written like this:

{
  "name": "29_add_column_with_default",
  "operations": [
    {
      "add_column": {
        "table": "items",
        "up": "(random()*100)::integer",
        "column": {
          "name": "value",
          "type": "integer",
          "default": "(random()*100)::integer"
        }
      }
    }
  ]
}

The up expression used as the backfill must be the same as the default expression for the new column. pgroll will backfill the new column with the new DEFAULT value, and then apply the DEFAULT to the column on completion of the migration.

This means that adding columns with volatile DEFAULT values no longer requires a potentially long-lived ACCESS_EXCLUSIVE lock, making such operations safe to perform on tables with large row counts.

pgroll is open-source; check out the repository to get involved.

Many of the features that have gone into this and previous releases have been directly influenced by how users tell us they are using the tool and where it's currently lacking. We welcome any feedback in issues, or contributions via pull requests!
