Upgrade to a new major version of PostgreSQL, by Kaarel Moppel - Soon it's that time of the year again - basically a 2nd Christmas for followers of the “blue elephant cult” if you will :). I'm, of course, referring to the upcoming release of the next PostgreSQL major version, v12. So I thought it's about time to go over some basics on upgrading to newer major versions! Database upgrades, not only Postgres, are quite a rare event for most people (at least for those running only a couple of DB-s). Since upgrades are so rare, it’s quite easy to forget how easy upgrading actually is. Yes, it is easy – so easy that I can barely wrap my head around why a lot of people still run some very old versions. Hopefully, this piece will help to convince you to upgrade faster in the future :). For the TLDR; in table version please scroll to the bottom of the article.
Table of Contents
Some main points that come to my mind:
And to make it even more easy for you - there’s a really cool website (much kudos to Depesz!) where you can directly get a highlighted listing of all the changes between two target Postgres versions! Link here!
Minor version changes a.k.a. bugfix releases happen regularly and can be used as a warmp-up here. There is a minimum of 1 minor release per quarter, if no security-critical stuff is found. It is, of course, highly recommended to apply them as soon as they are released.
The good thing about minor releases is that they’re as fast as restarting your server! I recommend to “prefix” the restart with a hand-initiated “checkpoint” so that effective downtime for applications is minimized since “pg_ctl stop / restart” already does checkpointing in exclusive mode, after active sessions have been kicked from the server.
Caution! When using some distribution-specific Postgres packages some extra attention is needed. For example running “apt upgrade postgresql-X” might mean an imminent restart to the running instance, after new binaries have been pulled in! For RedHat / CentOS it’s usually a bit better though and there you need to restart explicitly.
Also when running a clustered streaming replication setup where downtime on the primary node is scarce and restarts are to be avoided, my recommendation is to immediately update all replicas to the latest minor version (minor versions don’t need to match, only major). That way, when some accidental server issue appears (e.g.reboot, hardware maintenance etc) you’ll already be promoted to the latest version with a sort of “free downtime”.
Let's now move on to the real thing - major upgrade options.
This is the original way of doing major version upgrades and it's the only option up until version 9.0 which introduced the binary / in-place option. Logical dumps use “pg_dumpall” or “pg_dump” / “pg_restore” tools, that basically “reverse engineer” the database state to normal SQL statements. When those statements are executed sequentially on a target DB, they re-create exactly the same state for all user-level objects as it was on the source cluster / DB. NB! “Pg_dumpall” works on the instance level and “pg_dump” on the database level.
One more recommendation - when dumping out the DB contents it’s usually best to use the tools from the latest Postgres version. Note that this might not be officially supported, if you fall out of the “5 latest releases supported” window. By the way, on the topic of dump / restore and Postgres versions, I’d recommend to read this nice FAQ by our colleague Pavlo. It’s a few years old, but still correct.
Binary in-place upgrades are quite different from the logical ones as they happen on the filesystem level (i.e. always on the same machine if no shared storage is in play) and are a multi-step process with some additional complexity. However, things are still quite straightforward - after installing the new binaries and following a bit of preparation, it basically boils down to running a single command that typically finishes in 10 to 20 seconds in “--link” mode!! That's also why it's the most common upgrade option nowadays: nobody wants extended downtimes.
High-level steps to take (see documentation for details):
1. Install new major version binaries.
2. Initialize the new instance with new binaries (with the same major settings as on the old instance).
3. Do a test run with “pg_upgrade”. FYI - I tend to test first on a completely spare replica or a P.I.T.R. restored instance.
4. Stop the old instance.
5. Run “pg_upgrade” with or without file linking. Using hard-linking of files from the old data directory into the new one is the fastest way to upgrade! Basically only system catalogs are dumped and reloaded. Without linking all data files are copied over and it boils down to filesystem speed vs instance size, with the benefit that the old cluster is not modified.
6. Start the instance (it is usually also necessary to change the port back to 5432).
7. Start the “analyze” script generated and hinted at by the “pg_upgrade” tool. Performance for complex queries might suffer until it finishes.
NB! Also note that some distributions provide wrapper scripts (like “pg_upgradecluster” on Debian-based systems) to assist with in-place upgrades. They might be worth a look.
Logical Replication (LR) is the latest addition to the family of Postgres major version upgrade options and is available from v10 and upwards. By the way, unofficially it’s also already possible from 9.4 and upwards so feel free to contact us on that if you can’t afford extended downtime for your older instances. But OK, how does this new thing work? I won’t go into technical details here, but check out this blogpost or the sample code here. In general, it works on the DB level: you’re going to be feeding all old data and data changes from the old primary into a newly bootstrapped and schema-synchronized fully independent master / primary server! Basically LR is about decoupling from the binary “on-disk” format and sending over data objects that could be thought of as kind of JSON. Again, to highlight the major benefit here: the servers are for the most part fully decoupled and the syncing process happens in near realtime, so one can take some time to test and validate the results before flipping the final switch!
It looks awesome and relatively simple, doesn’t it? And it is mostly! To spoil the party a bit - there's also a lot of "small print" to read, and an out-of-the box LR process might not always be possible - it depends a bit on the data model being used. Some tweaking / changing might be needed.
Upgrade method | Pro | Contra |
---|---|---|
Dump / restore |
|
|
Binary in-place |
|
|
Logical Replication |
|
|
I guess besides the obvious pre-upgrade DBA activities (testing, backups, thoroughly analyzing the release notes) for all upgrade methods, it’s also good to discuss some other things like application compatibility and UI or monitoring tools compatibility with end users / application teams. But truth be told - there is actually a very low risk factor involved in upgrading, main SQL features are backwards compatible.
I can't of course 100% recommend doing "lone ranger" types of covert database migrations (although I’ve done my fair share of these) where the DB disappears for a minute to then reappear in a shiny new “coat” without anyone raising an eyebrow. However, if your schema consists only of a handful of plain tables with the most common data types (no stored procedures or other PostgreSQL specific features like LISTEN / NOTIFY) and standard ANSI SQL is performed then most likely everything will be just fine. This is due to the fact that the Postgres wire protocol has not been changed since v7-point-something. Changing it for basic SQL operations would be a massive problem for all users globally. That's why changing it has been avoided.
If you're not running an absolutely time-critical 24/7 operation (something like a space-shuttle) in my opinion there are no excuses not to upgrade at least once every couple of years. Keep in mind – if you ignore the topic for too long you’ll stop receiving security patches at some point. Then, when it is most needed, the upgrade process becomes slightly more tricky, since the PostgreSQL Global Development Group only supports releases from the previous 5 years in it's tooling. If you've waited too long, you might need to run "pg_upgrade" twice in a row and there's some additional risk of stumbling into some corner case backwards compatibility topics, as there's no testing for such scenarios. In short, better upgrade sooner than later!
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Are there clearly documented steps to be followed for a minor upgrade... say from 10.10 to 10.12 on a centos 7 vm? I'm an oracle/mysql guy new to postgresql.
You might want to mention these other CONs of pglogical/logical replication. No support for:
1. large objects
2. TRUNCATE
3. sequence changes
I think TRUNCATE got fixed with PG11 and LO-s are mostly a non-topic...but true, there are quite some details with LR - that's why I labelled it as the most complex method. Will add a note but planning a 2nd more detailed post on LR also.