PostgreSQL major version upgrade by using Logical Replication

Posted on 2019-10-30 by CYBERTEC Guest

By Kaarel Moppel – Some weeks ago, in light of the PostgreSQL v12 release, I wrote a general overview of the various major version upgrade methods and of the benefits of upgrading in general, so if upgrading is a new thing for you, I’d recommend reading that posting first. This time, though, I’m concentrating on the newest (available since v10) and most complex upgrade method, called “Logical Replication”, or LR for short. For demonstration purposes I’ll be migrating from v10 to the freshly released v12, as this is probably the most likely scenario, but it should work the same way from v11 to v12. Do read on for the details.

Benefits of LR upgrades

First, a bit of a recap from the previous post on why you would use LR for upgrading at all. Well, in short: because it’s the safest option with the shortest possible downtime! With that last point I’m already sold… but here again is the list of “pros” / “cons”:

PROS

Minimal downtime required

After the initial setup burden, one just needs to wait (and verify) until the new instance has all the data from the old one… and then shut down the old instance and point applications to the new one. Couldn’t be easier!

Also, before the switchover one can make sure that statistics are up to date, to minimize the “degraded performance” period typically seen after “pg_upgrade” for more complex queries (on bigger databases). For high-load applications one could be even more careful and pull the most popular relations into shared buffers by using the (relatively unknown) “pg_prewarm” contrib extension, or by just running common SELECT-s in a loop, to counter the “cold cache” effect.
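In SQL, such a pre-switchover warm-up on the new DB could look like the following sketch. The table and index names are hypothetical, and pg_prewarm has to be installed as part of the contrib package.

```sql
-- On the new (subscriber) DB, shortly before the switchover
ANALYZE;  -- refresh planner statistics for all tables of the current database

-- pg_prewarm ships with the contrib package
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load a hypothetical hot table and its PK index into shared buffers;
-- each call returns the number of blocks read
SELECT pg_prewarm('public.orders');
SELECT pg_prewarm('public.orders_pkey');
```

Prewarming only pays off if the instance is not restarted afterwards, since shared buffers start empty again after a restart.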

Flexible

One can, for example, already make some changes on the target DB: add columns / indexes, change data types, leave out some old archive tables, etc. The general idea is that LR does not work on the binary, 1-to-1 level as “pg_upgrade” does; rather, row-level changes are sent over to another master / primary instance as logical, JSON-like data objects, which leaves quite some freedom on the details.

Safe

Before the final switchover you can abort the process at any time and retry if something seems fishy. The old instance’s data is not changed in any way, even after the final switchover! This means you can easily roll back to the old version (typically at the cost of some data loss, though) if unforeseen issues arise. One should only watch out for the replication slot on the source / publisher DB if the target server is suddenly taken down.

CONS

Quite a few steps to take and possibly one needs to modify the schema a bit.
Always per DB, i.e. each database needs its own publication / subscription pair.
Could take a long time for big databases.
Large objects, if in use (should be a thing of the past really), need to be exported / imported manually.

Preparing for LR

As LR has some prerequisites on the configuration and schema, you’d first need to check whether it’s possible to start the migration process at all, or whether some changes are needed on the old master node, also called the “publisher” in the LR context.

Action points:

1) Enable LR on the old master (aka publisher, aka source DB) if not done already. This means setting “wal_level” to “logical” in postgresql.conf and making sure that “replication” connections from the new host (also called the “subscriber” in the LR context) are allowed in “pg_hba.conf”. FYI: changing “wal_level” requires a server restart! Any kind of streaming replication also needs a few other parameters, such as “max_wal_senders” and “max_replication_slots”, but as of v10 their defaults (10 each) are already sufficient out of the box, so this shouldn’t be a problem.

2) Check that all tables have a Primary Key (which is good database design anyway) or alternatively have REPLICA IDENTITY set. Primary Keys probably don’t need much explaining, but what is this REPLICA IDENTITY thing? A bit simplified: it lets you declare which columns identify a row uniquely within a table, and PK-s are automatically counted as such.

3) If there’s no PK for a particular table, you should create one if possible. If you can’t, set a unique index (non-partial, on NOT NULL columns) to serve as the REPLICA IDENTITY. If even that isn’t possible, you can set the whole row as the REPLICA IDENTITY, a.k.a. REPLICA IDENTITY FULL, meaning all columns together serve as the key in an LR context, at the price of very slow updates / deletes on the subscriber (new DB) side; the whole process could then take days, or never catch up at all! It’s OK not to define a PK for a table as long as it’s a write-only logging table that only gets inserts.

Sample code:


psql -c "ALTER SYSTEM SET wal_level TO logical;"
sudo systemctl restart postgresql@10-main

-- find problematic tables (assuming we want to migrate everything "as is");
-- the pg_index check also covers primary keys, so this works on v10 and v11
-- alike (the pg_class.relhaspkey column was removed in v11)
SELECT
    quote_ident(nspname) || '.' || quote_ident(relname) AS tbl
FROM
    pg_class c
    JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE
    relkind = 'r'
    AND NOT nspname LIKE ANY (ARRAY[E'pg\\_%', 'information_schema'])
    AND NOT EXISTS (SELECT * FROM pg_index WHERE indrelid = c.oid
            AND indisunique AND indisvalid AND indisready AND indislive)
ORDER BY
    1;

-- set replica identities on tables highlighted by the previous query
ALTER TABLE some_bigger_table REPLICA IDENTITY USING INDEX unique_idx;
ALTER TABLE some_table_with_no_updates_deletes REPLICA IDENTITY FULL;
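After the restart, a quick sanity check on the old instance could look like the following sketch. It only uses settings and views that exist as of v10; pg_hba_file_rules is readable by superusers only by default.

```sql
-- On the old (publisher) instance, after the restart
SHOW wal_level;              -- expected: logical
SHOW max_wal_senders;        -- default is 10 as of v10
SHOW max_replication_slots;  -- default is 10 as of v10

-- pg_hba.conf as currently parsed from disk:
-- look for a rule matching the new host, and make sure "error" is empty
SELECT line_number, type, database, user_name, address, auth_method, error
FROM pg_hba_file_rules;

-- pg_hba.conf changes only need a reload, not a restart
SELECT pg_reload_conf();
```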

Fresh setup of the new “subscriber” DB

The next important step is to set up a new, totally independent instance with a newer Postgres version, or at least to create a new database on an existing instance running the latest major version. As a side note, same-version LR migrations are also possible, but you’d be solving some other problem in that case.

This step is actually very simple: just a standard install of PostgreSQL, no special steps needed! One important addition though: to make sure everything works exactly the same way as before for applications, the same encoding and locale settings (collation and character classification) should be used!

-- on old
SELECT pg_catalog.pg_encoding_to_char(d.encoding) AS "Encoding", d.datcollate AS "Collate", d.datctype AS "Ctype" FROM pg_database d WHERE datname = current_database();
-- on new
CREATE DATABASE appdb TEMPLATE template0 ENCODING 'UTF8' LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8';

NB! Before the final switchover it’s important that no normal users have access to the new DB, as they might alter table data or structures and thereby inadvertently produce replication conflicts, which mostly means starting from scratch (or a costly investigation / fix), as “replay” is a sequential process.
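Besides keeping pg_hba.conf strict, one way to keep normal users out of the new DB until the switchover is to revoke the default CONNECT privilege from PUBLIC. This is a sketch assuming the database is called appdb, as above. Superusers (such as the role owning the subscription) and the database owner are not affected.

```sql
-- On the new instance: roles relying on the default PUBLIC grant can no longer connect
REVOKE CONNECT ON DATABASE appdb FROM PUBLIC;

-- Check: the PUBLIC entry in datacl (the one starting with "=") should no longer contain "c"
SELECT datname, datacl FROM pg_database WHERE datname = 'appdb';
```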

Schema / roles synchronization

Next we need to synchronize the old schema onto the new DB, as logical replication does not replicate schema changes (DDL), at least as of yet. The simplest way is to use the official PostgreSQL backup tool “pg_dump”, but if you have your schema initialization scripts in Git or similar and they’re up to date, that works as well. For syncing roles, “pg_dumpall” can be used.

NB! After this point it’s not recommended to introduce any schema changes, or at least be very careful when doing so, e.g. create new tables / columns on the subscriber first, and refresh the subscription when introducing new tables; otherwise data synchronization will break! Tip: a good way to prevent unwanted schema changes is to use DDL (event) triggers! An approximate example of that is here. Adding new tables only on the new DB is no issue, though during an upgrade it’s not a good idea anyway; my recommendation is to upgrade first and evolve the schema afterwards.

pg_dumpall -h $old_instance --globals-only | psql -h $new_instance
pg_dump -h $old_instance --schema-only appdb | psql -h $new_instance appdb
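The DDL trigger tip could look roughly like this on the old DB. It is a minimal sketch, and the function and trigger names are hypothetical. Create it only after the schema dump (pg_dump would otherwise copy the trigger to the new DB) and after the publication from the next step exists, since it blocks nearly all DDL, CREATE PUBLICATION and ALTER TABLE … REPLICA IDENTITY included. Creating event triggers requires superuser rights.

```sql
-- Hypothetical names; run as a superuser on the old (publisher) DB
CREATE OR REPLACE FUNCTION freeze_ddl_during_upgrade()
RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- TG_TAG holds the command tag, e.g. 'ALTER TABLE'
    RAISE EXCEPTION 'schema changes are frozen during the LR upgrade (attempted: %)', TG_TAG;
END;
$$;

-- Fires before (almost) every DDL command and aborts it
CREATE EVENT TRIGGER freeze_ddl ON ddl_command_start
    EXECUTE PROCEDURE freeze_ddl_during_upgrade();

-- Once the freeze is no longer needed, e.g. when aborting the migration
-- (commands on event triggers themselves do not fire event triggers)
DROP EVENT TRIGGER freeze_ddl;
```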

Create a “publication” on the old DB

If the preparations on the old DB have been finished (all tables having PK-s or replica identities), then this is a one-liner:

CREATE PUBLICATION upgrade FOR ALL TABLES;

Here we added all tables (current ones and those added in the future) to a publication (a replication set) named “upgrade”. Technically we could also leave out some tables, or choose to replicate only some operations like UPDATE-s, but for a pure version upgrade you typically want all of them.
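For comparison, a more selective publication might look like the following sketch (the table names are hypothetical). The pg_publication_tables view shows what a publication actually covers.

```sql
-- Hypothetical alternative: only two tables, and DELETE-s are not replicated
CREATE PUBLICATION upgrade_partial FOR TABLE public.orders, public.customers
    WITH (publish = 'insert, update');

-- List the tables a publication currently covers
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'upgrade';
```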

NB! From this moment on the replica identities become important, and you might run into trouble on the old master if they are not in place on all tables that get changes! In such a case you might see errors like this:

UPDATE pgbench_history SET delta = delta WHERE aid = 1;
ERROR:  cannot update table "pgbench_history" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
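To find the tables that would raise this error before applications hit them, one can look at pg_class.relreplident on the old DB ('d' = default, i.e. the PK; 'n' = nothing; 'f' = full; 'i' = a chosen index). In the following sketch, tables with identity 'n', or with the default identity but no primary key, are the ones that cannot publish UPDATE-s / DELETE-s.

```sql
-- On the old DB: tables without a usable replica identity
SELECT c.oid::regclass AS tbl, c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND (c.relreplident = 'n'
       OR (c.relreplident = 'd'
           AND NOT EXISTS (SELECT 1 FROM pg_index i
                           WHERE i.indrelid = c.oid AND i.indisprimary)));

-- pgbench_history is insert-only under pgbench, so FULL costs little there
ALTER TABLE pgbench_history REPLICA IDENTITY FULL;
```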

Create a “subscription” on the target DB

Next step: create a “subscription” on the new DB. This is also a one-liner; it creates a logical replication slot on the old instance, pulls initial table snapshots and then starts to stream and apply all table changes as they happen on the source, eventually resulting in a mirrored dataset! Note that currently superuser rights are needed for creating the subscription, and using a superuser on the publisher side also makes life a lot easier.

CREATE SUBSCRIPTION upgrade_sub CONNECTION 'port=5432 dbname=appdb user=postgres' PUBLICATION upgrade;
NOTICE:  created replication slot "upgrade_sub" on publisher
CREATE SUBSCRIPTION
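If a table does have to be added after the subscription exists (despite the advice above), the order matters: the subscriber has to know the table before changes for it arrive. A sketch with a hypothetical table:

```sql
-- 1) On the new (subscriber) DB: create the table first
CREATE TABLE public.new_table (id bigint PRIMARY KEY, payload text);

-- 2) On the old (publisher) DB: the same DDL; a FOR ALL TABLES publication
--    includes the new table automatically
CREATE TABLE public.new_table (id bigint PRIMARY KEY, payload text);

-- 3) On the new DB: let the subscription pick up the table and copy its initial data
ALTER SUBSCRIPTION upgrade_sub REFRESH PUBLICATION;
```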

WARNING! From this step on, the two DB-s are “coupled” via a replication slot, which carries some dangers if the process is aborted abruptly and the old DB is not “notified” of that. If this sounds new, please see the details in the documentation.
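To see that coupling from the publisher side, and how much WAL the slot is holding back, something like the following sketch can be run on the old DB (function names as of v10). Dropping the slot by hand is only meant for the case where the subscriber has been thrown away for good; an active slot cannot be dropped anyway.

```sql
-- On the old (publisher) DB: the slot created by CREATE SUBSCRIPTION
SELECT slot_name, plugin, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- Only if the subscriber is gone for good: drop the orphaned slot so WAL can be recycled
SELECT pg_drop_replication_slot('upgrade_sub');
```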

Check replication progress

Depending on the amount of data, it can take anything from minutes to days until everything is moved over and “live” synchronization is working.

Things to inspect for making sure there are no issues:

No errors in server logs on both sides
There’s an active “pg_replication_slots” entry on the master with the name that we used to create the “subscription” on the new DB
All tables are actively replicating on the subscriber side, i.e. “pg_subscription_rel.srsubstate” should be ‘r’ for all tables (ready – normal replication)
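The checks above translate roughly into the following queries. This is a sketch using the v10+ catalog and statistics views; upgrade_sub is the subscription name from the example.

```sql
-- On the new (subscriber) DB: tables not yet in state 'r' (ready); empty result = all done
SELECT srrelid::regclass AS tbl, srsubstate
FROM pg_subscription_rel
WHERE srsubstate <> 'r';

-- On the new DB: status of the apply worker
SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;

-- On the old (publisher) DB: how far behind the subscriber is, in bytes of WAL
SELECT application_name, state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS lag
FROM pg_stat_replication;
```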

Basic data verification / switchover preparation

Although not a mandatory step, when it comes to data consistency / correctness it always makes sense to go the extra mile and run some queries validating that the source and target have the same data. For a running DB this is of course a bit difficult, as there’s always some replication lag, but for “office hours” applications it should make a lot of sense. My sample script for comparing row counts (in a non-threaded way) is for example here, but using some slightly more “costly” aggregation / hashing functions that really look at all the data would be even better.
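As an example of such a “costly” check, the following sketch hashes a table’s full contents in primary key order; public.orders and its id column are hypothetical. Run it identically on both sides, ideally once replication has caught up, and compare the results. Caveats: the text form of some types can differ between versions (v12, for example, changed the default output of floating-point values), a different column order yields a different hash, and for very big tables the aggregation is better done in ID ranges to limit memory use.

```sql
-- Run on both the old and the new DB and compare the two result rows
SELECT count(*) AS row_count,
       md5(string_agg(t::text, '|' ORDER BY t.id)) AS content_md5
FROM public.orders AS t;
```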

Also important to note if you’re using sequences (which you most probably are): sequence state is not synchronized by LR and needs some manual work / scripting! The easiest option, I think, is to leave the old DB running in read-only mode during the switchover, so that you can quickly read the last sequence values there, without having to look up maximum IDs in the subscriber’s tables.

Switchover time!

We’re almost there with our little undertaking…with the sweaty part remaining – the actual switchover to start using the new DB! Needed steps are simple though and somewhat similar to switching over to a standard, “streaming replication” replica.

1) Re-check the system catalog views on replication status.
2) Stop the old instance. Make sure it’s a clean shutdown: the last log line should state “database system is shut down”, meaning all recent changes were delivered to connected replication clients, including our new DB. Start of downtime! PS: Another way to make absolutely sure that all data is received is to configure the new instance as a “synchronous replication” client, by adding the subscription name to “synchronous_standby_names” on the old instance. This has the usual synchronous replication implications, of course, so I’d avoid it for bigger / busier DBs.
3) Start the old DB in read-only mode by creating a recovery.conf file containing “standby_mode = 'on'” in its data directory (from v12 on, this is achieved by creating a “standby.signal” file instead).
4) Optionally make some more quick “health checks” if time constraints allow it – verify table sizes, row counts, your last transactions, etc. For “live” comparisons it makes sense to restart the old DB under a new, random port so that no-one else connects to it.
5) Synchronize the sequences. Given that we’ll leave the old DB in read-only mode, the easiest way is something like this:

psql -h "$old_instance" -XAtq appdb <<'EOF' | psql -h "$new_instance" appdb
SELECT format('SELECT setval(%L, %s);', quote_ident(schemaname) || '.' || quote_ident(sequencename), last_value) FROM pg_sequences WHERE last_value IS NOT NULL;
EOF

6) Reconfigure your pg_hba.conf to allow access for all “mortal” users, then reconfigure your application, connection pooler, DNS or proxy to start using the new DB! If the two DB-s were on the same machine then it’s even easier – just change the ports and restart. End of downtime!
7) Basically we’re done here, but it would of course be nice to clean up and remove the no-longer-needed subscription, so that it doesn’t keep accumulating errors in the server log.

DROP SUBSCRIPTION upgrade_sub;

Note, though, that if you don’t keep the old “publisher” accessible (in read-only mode, or in normal primary mode, the latter being dangerous!), DROP SUBSCRIPTION cannot remove the replication slot on the publisher, so some extra steps are needed before dropping:

ALTER SUBSCRIPTION upgrade_sub DISABLE;
ALTER SUBSCRIPTION upgrade_sub SET (slot_name = NONE);
DROP SUBSCRIPTION upgrade_sub;

8) Time for some bubbly drinks 🙂
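A few of the switchover steps above can be double-checked with plain SQL. This is a sketch, again assuming the database is called appdb.

```sql
-- On the old instance after step 3: true means it runs as a read-only standby
SELECT pg_is_in_recovery();

-- On both instances after step 5: run and compare; the values should match
SELECT schemaname, sequencename, last_value
FROM pg_sequences
ORDER BY schemaname, sequencename;

-- On the new instance in step 6, only if CONNECT was revoked from PUBLIC
-- earlier to keep normal users out
GRANT CONNECT ON DATABASE appdb TO PUBLIC;
```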

Summary

Although there are quite a few steps and nuances involved, LR is worth adding to the standard upgrade toolbox for time-critical applications, as it’s basically the best way to do major version upgrades nowadays: minimal dangers, minimal downtime!

FYI: if you’re planning to migrate dozens of DB-s, the LR upgrade process can be fully automated, even starting from version 9.4, with the help of the “pglogical” extension. So feel free to contact us if you need something like that and don’t particularly enjoy the details. Thanks for reading!
