Friday, August 25, 2023

Checking MyRocks 5.6 for regressions with the Insert Benchmark and a large server

This post documents how MyRocks performance changes from old to new releases, using the Insert Benchmark on a large server. I use MyRocks 5.6 rather than 8.0 because the 5.6 releases go back further in time. Previous posts are here for a small server and medium server.

Update - the builds I used were bad so the results here are bogus. I fixed the builds, repeated the tests and share results here. There are no regressions and the initial load of the benchmark is ~10% faster in modern MyRocks.

Comparing old MyRocks with modern MyRocks where old means fbmy5635_202203072101 with RocksDB 6.28.2 and modern means fbmy5635_rel_20230529_850 with RocksDB 8.5.0:

  • Write throughput drops by ~15%. Most of the regressions are confined to a few releases between fbmy5635_rel_202210112144 and fbmy5635_202302162102.
  • Read throughput drops by ~10%. The regressions occur in many releases.
  • In both cases, new CPU overhead explains the performance regression.

Builds

The builds are explained in a previous post. The oldest build used here is fbmy5635_202203072101, which has source as of 2022/03/07, while the other posts have builds going back to 2021/04/07.

The configuration files (my.cnf) are here: base and c5. The difference between them is that c5 adds rocksdb_max_subcompactions=4.

Benchmark

The Insert Benchmark was run in two setups:

  • cached by RocksDB - all tables fit in the RocksDB block cache
  • IO-bound - the database is larger than memory

The server has 80 HW threads, 40 cores, 256G of RAM and fast NVMe storage with XFS.

The benchmark is run with 24 clients, 24 tables and a client per table. The benchmark is a sequence of steps.

  • l.i0
    • insert X million rows across all tables without secondary indexes where X is 20 for cached and 500 for IO-bound
  • l.x
    • create 3 secondary indexes. I usually ignore performance from this step.
  • l.i1
    • insert and delete another 50 million rows per table with secondary index maintenance. The number of rows/table at the end of the benchmark step matches the number at the start with inserts done to the table head and the deletes done from the tail.
  • q100, q500, q1000
    • do queries as fast as possible with 100, 500 and 1000 inserts/s/client and the same rate for deletes/s done in the background. Run for 7200 seconds.
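The write volume implied by these steps is simple arithmetic, sketched below. The function and constant names are made up for illustration and are not part of the benchmark client. The totals for the query steps assume every client sustains its target rate for the full 7200 seconds.

```python
# Illustrative arithmetic only; names here are hypothetical, not from the benchmark client.
CLIENTS = 24          # one client per table, from the post
TABLES = 24
STEP_SECONDS = 7200   # duration of each q100/q500/q1000 step


def background_inserts(rate_per_client: int,
                       clients: int = CLIENTS,
                       seconds: int = STEP_SECONDS) -> int:
    """Inserts issued during one query step, assuming the target rate is sustained.

    The same number of deletes is issued in the background.
    """
    return rate_per_client * clients * seconds


def li1_inserts(rows_per_table: int = 50_000_000, tables: int = TABLES) -> int:
    """Total rows inserted (and deleted) across all tables during l.i1."""
    return rows_per_table * tables


if __name__ == "__main__":
    for step, rate in (("q100", 100), ("q500", 500), ("q1000", 1000)):
        print(f"{step}: {background_inserts(rate):,} inserts")
    print(f"l.i1: {li1_inserts():,} inserts")
```

Under those assumptions q1000 issues about 172.8 million inserts (and as many deletes), while l.i1 inserts 1.2 billion rows in total.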
Results

Performance reports are here for Cached by RocksDB (base config and c5 config) and IO-bound (base config and c5 config).

Results: average throughput

This section explains the average throughput tables in the Summary section. I use relative throughput to save on typing, where relative throughput is (throughput for some version / throughput for base case). When relative throughput is > 1, that version is faster than the base case. The base case is fbmy5635_202203072101 with source from 2022/03/07 and it uses RocksDB 6.28.2.
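As a small sketch of that definition: the helper names and the absolute throughput numbers below are invented for illustration, and only the formula comes from the text above.

```python
def relative_throughput(version_tput: float, base_tput: float) -> float:
    """Throughput for some version divided by throughput for the base case."""
    if base_tput <= 0:
        raise ValueError("base throughput must be positive")
    return version_tput / base_tput


def percent_change(rel: float) -> float:
    """Convert a relative throughput into a percent change vs the base case."""
    return (rel - 1.0) * 100.0


if __name__ == "__main__":
    base = 100_000.0     # hypothetical operations/s for the base case
    version = 83_000.0   # hypothetical operations/s for a newer build
    rel = relative_throughput(version, base)
    print(f"relative throughput = {rel:.2f}, change = {percent_change(rel):+.0f}%")
```

Read this way, a relative throughput of 0.83 corresponds to a drop of about 17% from the base case, and any value above 1 is an improvement.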

Comparing old MyRocks with modern MyRocks where old means fbmy5635_202203072101 with RocksDB 6.28.2 and modern means fbmy5635_rel_20230529_850 with RocksDB 8.5.0:
  • Write throughput drops by ~15%. Most of the regressions are confined to a few releases between fbmy5635_rel_202210112144 and fbmy5635_202302162102. The issue appears to be new CPU overhead -- see the cpupq (CPU/operation) column here.
  • Read throughput drops by ~10%. The regressions occur in many releases.
  • The issue appears to be new CPU overhead -- see the cpupq (CPU/operation) column here for writes and here for reads.
Cached by RocksDB, base config (see here)
  • For fbmy5635_202205192101 with RocksDB 7.2.2
    • Relative throughput is (0.96, 0.98, 0.95) for (l.i0, l.x, l.i1), so write throughput dropped by ~5% vs the base case
    • Relative throughput is (0.98, 0.95, 0.95) for (q100, q500, q1000) so read throughput dropped by ~5% vs the base case
  • For fbmy5635_202302162102 with RocksDB 7.10.0
    • Relative throughput is (0.98, 0.99, 0.83) for (l.i0, l.x, l.i1). The result for l.i1 is much worse than it was on the previous build (fbmy5635_rel_202210112144 with RocksDB 7.3.1) and I am tracking that down.
  • For fbmy5635_rel_20230529_850 with RocksDB 8.5.0
    • Relative throughput is (0.94, 0.97, 0.83) for (l.i0, l.x, l.i1) so the regression to write perf for l.i1 hasn't gotten worse compared to fbmy5635_202302162102
    • Relative throughput is (0.91, 0.87, 0.89) for (q100, q500, q1000) and query performance has small regressions in every release tested
Cached by RocksDB, c5 config (see here)
  • Results are similar to Cached by RocksDB with the base config
IO-bound, base config (see here)
  • Results are similar to Cached by RocksDB with the base config
IO-bound, c5 config (see here)
  • Results are similar to Cached by RocksDB with the base config













