Small Datum: Checking Postgres for perf regressions from 11.21 to 16.0 with sysbench and a small server

This has results for Sysbench, a small server and Postgres 11, 12, 13, 14, 15 and 16. The goal is to see if there are perf regressions over time. This is a common problem as code gets added over time the cumulative impact is using more CPU to do the same amount of work. MySQL is an example where there are significant regressions over time.

Context - the workload here is a cached database with 1 client. I am searching for CPU regressions. I will repeat this on a medium server with more concurrency (8 clients).

tl;dr

Postgres avoids perf regressions over time. This is starting to get boring.
In many cases Postgres 16.0 is ~5% faster than 11.21
For full scans Postgres 16.0 is ~20% faster than 11.21
Postgres 16.0 is also ~20% faster than 11.21 on several of the update-only benchmark steps
See the results for range query, part 2 below. Postgres 16.0 does better than 11.21 for aggregation + short range scan, but then worse than 11.21 for aggregation + long range scan. This benchmark step is a read-only variant of the traditional sysbench workload. This might be worth investigating. I am sure whether it is a regression, but it might be one.

Builds

I compiled Postgres 11.21, 12.16, 13.12, 14.9, 15.4 and 16.0 from source. The builds are named o3_native_lto which is shorthand for using: -O3 -march=native -mtune=native -flto.

Benchmark

I used sysbench and my usage is explained here. Postgres was configured to cache all tables.

This benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04.

The benchmark is run with 1 client and 1 table with 20M rows. The read-only tests ran for 600 seconds each and the other tests ran for 1200 seconds each. The command line for my wrapper scripts is:

bash r.sh 1 20000000 600 1200 nvme0n1 1 1 1

The benchmark used the a2 config and I commented out the usage of autovacuum_vacuum_insert_scale_factor for Postgres 11.21 and 12.16 as they do not support that option.

Results

A spreadsheet with all results is here. It has two sheets: one with absolute QPS for each version tested, the other with the relative QPS per version. The relative QPS is: (QPS for me) / (QPS for 11.21). With relative QPS it is easy to quantify improvements and regressions.

There are ~42 tests and each can be called a benchmark step or microbenchmark. I will call them benchmark steps. These are put into one of 5 groups based on the workload:

point query, part 1 - point queries that don't use the random-points benchmark step
point query, part 2 - point query variants that use the random-points benchmark step
range query, part 1 - range queries without aggregation
range query, part 2 - read-only variants of the original sysbench workload that does range queries with aggregation
writes - benchmark steps that do insert, update and delete

The y-axis starts at 0.9 rather than 0 to make it easier to see the changes.

The results below for range queries, part 2 have an interesting pattern. There are two sets of results there - the first 3 groups of bars from the left are run prior to subjecting Postgres to random writes. The other 3 groups bars are the same tests but run after subjecting Postgres to random writes. All of the tests do range scans with some aggregation -- these are read-only variants of the traditional sysbench workload. The difference is that the length of the range scans vary. So below, some