Small Datum: Postgres versions 11, 12, 13, 14, 15, and 16 vs sysbench with a medium server

This provides additional results for Postgres versions 11 through 16 vs Sysbench on a medium server. My previous post is here. The goal is to document how performance changes over time with a focus on CPU overheads.

tl;dr

Postgres avoids performance regressions from old to new versions. This is excellent.
Postgres 16.0 is mostly faster than 11.21 on read-only benchmark steps
Postgres 16.0 and 11.21 have similar throughput on read-write and write-only benchmark steps

Builds

I compiled Postgres 11.21, 12.16, 13.12, 14.9, 15.2, 15.3, 15.4 and 16.0. The builds are named o3_native_lto which is shorthand for using: -O3 -march=native -mtune=native -flto.

Benchmark

I used sysbench and my usage is explained here. Postgres was configured to cache all tables.

This benchmark used a c2-standard-30 server from GCP with 15 cores, hyperthreads disabled, 120G of RAM, Ubuntu 22.04 and 1.5TB of NVMe SSD with XFS (SW RAID 0 over 4 local devices).

The benchmark is run with 8 clients and 4 table with 20M rows per table. The read-only tests ran for 600 seconds each and the other tests ran for 1200 seconds each. The command line for my wrapper scripts is:

bash r.sh 4 20000000 600 1200 md127 1 1 8

The benchmark used the x7a28 config and for Postgres versions 11.21 and 12.16 I had to remove the usage of autovacuum_vacuum_insert_scale_factor as that arrives in version 14.

Results

There were two rounds of the benchmark. The first has results for 11.21, 12.16, 13.12, 14.9, 15.4 and then two sets of results for 16.0. The second has results for 11.21, 12.16, 13.12, 14.9, 15.2, 15.3 and 15.4.

A spreadsheet with results is here. The spreadsheet has 4 sheets, 2 for each round of the benchmark. And for each round there is one sheet with the absolute values (QPS) per benchmark step while the other has relative QPS -- (QPS for my version) / (QPS for Postgres 11.21).

There are ~42 tests and each can be called a benchmark step or microbenchmark. I will call them benchmark steps. These are put into one of 5 groups based on the workload:

point query, part 1 - point queries that don't use the random-points benchmark step
point query, part 2 - point query variants that use the random-points benchmark step
range query, part 1 - range queries without aggregation
range query, part 2 - read-only variants of the original sysbench workload that does range queries with aggregation
writes - benchmark steps that do insert, update and delete

The y-axis starts at 0.8 or 0.76 rather than 0 to make it easier to see the changes.

A summary

There are two benchmark steps that have too much variance which can look like a regression:

update-one_range=100 - this does updates, but all updates are to the same row
scan_range=100 - this does a full scan with a filter that no rows satisfy

Point queries, part 1 - Postgres 16.0 is as fast or faster than 11.21
Point queries, part 2 - Postgress 16.0 is as fast or faster than 11.21
Range queries, part 1 - Postgres 16.0 is as fast or faster than 11.21, ignoring scan_range
Range queries, part 2 - Postgress 16.0 is as fast or faster than 11.21
Writes - Postgres 16.0 and 11.21 have similar throughput, ignoring update-one

Point queries, part 1