Tuesday, September 26, 2023

Checking MyRocks 5.6 for regressions with the Insert Benchmark and a large server, revisited

I previously shared results for MyRocks 5.6 and claimed there was a perf regression in the Insert Benchmark. I then shared a follow-up post as I searched for the source of the regression. Those results were bogus: there is no regression, and this post explains what happened.

The mistake I made is that all builds used FDO (feedback driven optimization), and while the builds tested span ~18 months (from early 2022 until mid 2023), they all used the same profile input for FDO. While I don't know much about FDO, I assume that reusing one profile across that much code change is a bad idea. So I recompiled everything with FDO disabled and repeated the tests.

The results here use MyRocks builds from Feb 2022 to June 2023 which use RocksDB versions from 6.28 to 8.6. The goal is to determine whether there are perf regressions from the old to new versions. While MyRocks 5.6 did not change much in that time, RocksDB has changed significantly. So this is mostly a search for perf regressions in RocksDB.

tl;dr

  • There are no perf regressions for MyRocks from RocksDB 6.28.2 to 8.6.3
  • Throughput for the initial load (l.i0 benchmark step) is ~10% better starting with RocksDB 7.10.0

Builds

All builds used the Release build type with LTO (link time optimization) enabled and FDO (feedback driven optimization) disabled.

The builds are:

  • fbmy5635_20220307_6282_fdo0_lto1 - FB MySQL 5.6.35 at git hash e7d976ee (7 Mar 2022 tag) with RocksDB 6.28.2
  • fbmy5635_20220519_722_fdo0_lto1 - FB MySQL 5.6.35 at git hash d503bd77 (19 May 2022 tag) with RocksDB 7.2.2
  • fbmy5635_20220809_731_fdo0_lto1 - FB MySQL 5.6.35 at git hash 877a0e58 (9 Aug 2022 tag) with RocksDB 7.3.1
  • fbmy5635_20221011_731_fdo0_lto1 - FB MySQL 5.6.35 at git hash c691c716 (11 Oct 2022 tag) with RocksDB 7.3.1
  • fbmy5635_20230216_7100_fdo0_lto1 - FB MySQL 5.6.35 at git hash 21a2b0aa (16 Feb 2023 tag) with RocksDB 7.10.0
  • fbmy5635_20230412_7102_fdo0_lto1 - FB MySQL 5.6.35 at git hash 205c31dd (12 Apr 2023 tag) with RocksDB 7.10.2
  • fbmy5635_20230529_821_fdo0_lto1 - FB MySQL 5.6.35 at git hash b739eac1 (29 May 2023 tag) with RocksDB 8.2.1
  • fbmy5635_20230628_821_fdo0_lto1 - FB MySQL 5.6.35 at git hash 7e40af67 (28 Jun 2023 tag) with RocksDB 8.2.1
  • fbmy5635_20230628_833_fdo0_lto1 - FB MySQL 5.6.35 at git hash 7e40af67 (28 Jun 2023 tag) upgraded to RocksDB 8.3.3
  • fbmy5635_20230628_844_fdo0_lto1 - FB MySQL 5.6.35 at git hash 7e40af67 (28 Jun 2023 tag) upgraded to RocksDB 8.4.4
  • fbmy5635_20230628_853_fdo0_lto1 - FB MySQL 5.6.35 at git hash 7e40af67 (28 Jun 2023 tag) upgraded to RocksDB 8.5.3
  • fbmy5635_20230628_863_fdo0_lto1 - FB MySQL 5.6.35 at git hash 7e40af67 (28 Jun 2023 tag) upgraded to RocksDB 8.6.3

The c5 configuration file (my.cnf) was used.
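
The build names above pack the MySQL version, tag date, RocksDB version and the FDO/LTO flags into one string. As a minimal sketch of how such a name might be decoded, the Python below assumes the RocksDB field is one digit for the major version, one trailing digit for the patch version, and whatever sits between them for the minor version. That rule is an assumption inferred from the list, it matches every build shown, and it is not a documented naming convention. The `Build` type and `parse_build_name` helper are hypothetical names used only for illustration.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class Build(NamedTuple):
    mysql_version: str
    tag_date: date
    rocksdb_version: str
    fdo: bool
    lto: bool


# Pattern inferred from names such as fbmy5635_20230216_7100_fdo0_lto1.
_NAME_RE = re.compile(r"^fbmy(\d)(\d)(\d+)_(\d{8})_(\d{3,})_fdo([01])_lto([01])$")


def parse_build_name(name: str) -> Build:
    """Decode a build name into its parts (hypothetical helper)."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized build name: {name!r}")
    major, minor, patch, ymd, rocks, fdo, lto = m.groups()
    # Assumption: first digit is the RocksDB major version, last digit is
    # the patch version, and the digits in between are the minor version.
    rocksdb = f"{rocks[0]}.{rocks[1:-1]}.{rocks[-1]}"
    return Build(
        mysql_version=f"{major}.{minor}.{patch}",
        tag_date=datetime.strptime(ymd, "%Y%m%d").date(),
        rocksdb_version=rocksdb,
        fdo=(fdo == "1"),
        lto=(lto == "1"),
    )


if __name__ == "__main__":
    print(parse_build_name("fbmy5635_20230216_7100_fdo0_lto1"))
```

A helper like this can be handy for sorting results by tag date or grouping them by RocksDB version, but the decoding rule would break for a patch version of 10 or more.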

Benchmark

The Insert Benchmark was run in two setups:

  • cached by RocksDB - all tables fit in the RocksDB block cache
  • IO-bound - the database is larger than memory

The server has 80 HW threads, 40 cores, 256G of RAM and fast NVMe storage with XFS.

The benchmark is run with 24 clients, 24 tables and a client per table. The benchmark is a sequence of steps.

  • l.i0
    • insert X million rows across all tables without secondary indexes where X is 20 for cached and 500 for IO-bound
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another 50 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail.
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 3600 seconds.
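
To give a sense of the background write load during the q100, q500 and q1000 steps, the sketch below multiplies the per-client rate by the 24 clients and the 3600-second run time described above. It assumes every client sustains its target rate for the whole step, which is an assumption and not something measured here. The `background_writes` function is a hypothetical helper.

```python
CLIENTS = 24          # one client per table, from the benchmark description
STEP_SECONDS = 3600   # each query step runs for 3600 seconds

# Target background insert rate per client (deletes run at the same rate).
RATES = {"q100": 100, "q500": 500, "q1000": 1000}


def background_writes(rate_per_client: int,
                      clients: int = CLIENTS,
                      seconds: int = STEP_SECONDS) -> int:
    """Rows inserted (and, equally, deleted) during one query step,
    assuming all clients sustain the target rate."""
    return rate_per_client * clients * seconds


if __name__ == "__main__":
    for step, rate in RATES.items():
        print(f"{step}: {background_writes(rate):,} inserts and as many deletes")
    # q100: 8,640,000 inserts and as many deletes
    # q500: 43,200,000 inserts and as many deletes
    # q1000: 86,400,000 inserts and as many deletes
```

These are upper bounds on the background writes, since a client that falls behind its target rate will do fewer.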

Results

Performance reports are here for Cached by RocksDB and for IO-bound.

The Summary sections for cached and for IO-bound have tables with absolute and relative QPS per benchmark step. The relative QPS is (QPS for a given build / QPS for fbmy5635_20220307_6282_fdo0_lto1), so a value above 1.0 means the build is faster than the base case.
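
The relative QPS calculation can be sketched as follows. The QPS values in the example are placeholders chosen to make the arithmetic easy to follow; they are not results from the performance reports. The `relative_qps` function is a hypothetical helper.

```python
BASE_BUILD = "fbmy5635_20220307_6282_fdo0_lto1"


def relative_qps(qps_by_build: dict[str, float],
                 base: str = BASE_BUILD) -> dict[str, float]:
    """Divide each build's QPS by the QPS of the base build."""
    base_qps = qps_by_build[base]
    if base_qps <= 0:
        raise ValueError("base QPS must be positive")
    return {build: qps / base_qps for build, qps in qps_by_build.items()}


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured results.
    example = {
        BASE_BUILD: 1000.0,
        "fbmy5635_20230628_863_fdo0_lto1": 1100.0,
    }
    for build, rel in relative_qps(example).items():
        print(f"{build}: {rel:.2f}")
```

With these placeholder inputs the base build gets 1.00 and the newer build gets 1.10, which would read as 10% more QPS than the base case.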

For the cached setup, QPS for the l.i0 benchmark step (initial load) in modern MyRocks is up to 10% better than the base case. There are no regressions in the other benchmark steps.

For the IO-bound setup, QPS for the l.i0 benchmark step (initial load) in modern MyRocks is up to 10% better than the base case. Most other benchmark steps get 1% or 2% more QPS in modern MyRocks than in the base case.

