Saturday, August 26, 2023

Postgres 16 beta3 and the Insert Benchmark on a small server

This post has results for Postgres 16 beta vs the Insert Benchmark on a small server. I am searching for performance regressions. A previous post using a medium server is here. The tests here used Postgres versions 15.3, 15.4, 16 beta1, 16 beta2 and 16 beta3.

tl;dr

  • I don't see regressions in Postgres 16 beta3 for the workload that is cached by Postgres
  • Results for the IO-bound workloads have too much variance. I don't know whether the issue is Postgres or my HW. It is summer time and the temperature in my home datacenter (aka upstairs) can vary.
  • I am repeating the tests using the o3_native_lto build
Builds

I compiled Postgres 15.3, 15.4, 16 beta1, 16 beta2 and 16 beta3 from source. The builds are named o2_nofp which is shorthand for using: -O2 -no-omit-frame-pointer.

Benchmark

The insert benchmark was run in two setups.

  • cached by Postgres - all tables are cached by Postgres
  • IO-bound - the database is larger than memory

This benchmark used the Beelink server explained here that has 8 cores, 16G RAM and 1TB of NVMe SSD with XFS and Ubuntu 22.04. 

The benchmark is run with 1 client and 1 table. The benchmark is a sequence of steps.

  • l.i0
    • insert X million rows per table where X is 20 for cached and 800 for IO-bound
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another X million rows per table with secondary index maintenance where X is 200 for Cached and 20 for IO-bound. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail. This step took ~20,000 seconds.
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
Configurations

I used the a1 and a2 configurations. The difference between them is that a1 has wal_compression set to lz4 while a2 has it set to off.

Results

Reports are here for Cached (a1 config, a2 config) and IO-bound (a1 config, a2 config).

The results for average throughput are interesting and confusing. The tables linked below use absolute and relative throughput where relative throughput is (QPS for my version / QPS for base version) and the base version is Postgres 15.3.
  • Cached, a1 config (see here)
    • Postgres 16 beta2 and beta3 always get more throughput. Relative throughput is between 1.01 and 1.07 for them on every benchmark step.
    • From the metrics for the write-only benchmark steps (l.i0, l.x, l.i1) the CPU overhead (cpupq column) and write IO overhead (wkbpi column) are similar from 15.3 to 16 beta3
    • From the metrics for the read+write benchmark steps (q100, q500, q1000) there might be a slight reduction in CPU overhead (cpupq column)
  • Cached, a2 config (see here)
    • Postgres 16 beta2 and beta3 were faster than 15.3 on the write-only benchmark steps (l.i0, l.x, l.i1) where relative QPS was between 1.00 and 1.03 but almost always slower on the read-write benchmark steps (q100, q500, q1000) where relative QPS was between 0.91 and 1.03. In the read+write steps 16 beta2 had a relative QPS of 1.03 on q1000, otherwise these versions were slower than 15.3.
    • From the metrics for the write-only benchmark steps (l.i0, l.x, l.i1) the CPU overhead (cpupq column) and write overhead (wkbpi column) are similar from 15.3 to 16 beta3
    • From the metrics for the read+write benchmark steps (q100, q500, q1000) the CPU overhead (cpupq column) is frequently larger in 16 beta versions and this might explain the reduction in QPS.
  • IO-bound, a1 config (see here)
    • I am reluctant to characterize these because there is too much variance
  • IO-bound, a2 config (see here)
    • I am reluctant to characterize these because there is too much variance

No comments:

Post a Comment