Monday, January 1, 2024

Updated Insert benchmark: MyRocks 5.6 and 8.0, medium server, cached database

This has results for the Insert Benchmark with MyRocks 5.6 and 8.0 on a medium server with a cached workload. This is my first report that includes MyRocks 8.0.32.

For old MyRocks 5.6.35 vs latest 5.6.35
  • Throughput is similar except for range queries where there might be a small regression of ~7%
For latest MyRocks 8.0.28 vs latest MyRocks 8.0.32
  • Throughput is similar but there might be a small regression for point queries of ~5%
For latest MyRocks 5.6.35 vs latest MyRocks 8.0.32
  • Throughput in 8.0.32 is worse for write-heavy and better for read-heavy
  • For write-heavy the difference is <= 3% for l.x, l.i1, l.i2 and ~18% for l.i0
  • For read-heavy the difference is between 5% and 9%, except for qr500 where it is ~16%
Build + Configuration

I tested MyRocks 5.6.35, 8.0.28 and 8.0.32 using the latest code as of December 2023. I also repeated tests for older builds for MyRocks 5.6. These were compiled from source. All builds use CMAKE_BUILD_TYPE=Release.

For the builds with the latest version of MyRocks I used
  • MyRocks 5.6.35 (fbmy5635_rel_221222)
    • compiled from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b
  • MyRocks 8.0.28 (fbmy8028_rel_221222)
    • compiled from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b
  • MyRocks 8.0.32 (fbmy8032_rel_221222)
    • compiled from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b
The older MyRocks 5.6 builds are
  • fbmy5635_rel_202104072149
    • compiled from code as of 2021-04-07 at git hash f896415f with RocksDB 6.19.0
  • fbmy5635_rel_202203072101
    • compiled from code as of 2022-03-07 at git hash e7d976ee with RocksDB 6.28.2
  • fbmy5635_rel_202205192101
    • compiled from code as of 2022-05-19 at git hash d503bd77 with RocksDB 7.2.2
  • fbmy5635_rel_202208092101
    • compiled from code as of 2022-08-09 at git hash 877a0e58 with RocksDB 7.3.1
  • fbmy5635_rel_202210112144
    • compiled from code as of 2022-10-11 at git hash c691c716 with RocksDB 7.3.1
  • fbmy5635_rel_202302162102
    • compiled from code as of 2023-02-16 at git hash 21a2b0aa with RocksDB 7.10.0
  • fbmy5635_rel_202304122154
    • compiled from code as of 2023-04-12 at git hash 205c31dd with RocksDB 7.10.2
  • fbmy5635_rel_202305292102
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.2.1
  • fbmy5635_rel_20230529_832
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.3.2
  • fbmy5635_rel_20230529_843
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.4.3
  • fbmy5635_rel_20230529_850
    • compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.5.0
Most tests used the cza1_gcp_c2s30 my.cnf files that are here for 5.6.35 and for 8.0. Some 8.0 tests used the cza1ps0_gcp_c2s30 my.cnf file that disables the perf schema and is here.

Benchmark
 
The test server is a c2-standard-30 from GCP with 15 cores, hyperthreads disabled, 128G of RAM, Ubuntu 22.04 and XFS on SW RAID 0 over 4 local SSD. The benchmark is run with 8 clients to avoid over-subscribing the CPU.

I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:

  • l.i0
    • insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions).
  • qr100
    • use 3 connections/client. One does range queries for 1200 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
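
The arithmetic behind "the same number of inserts for all systems tested" in the read-write steps can be sketched as follows. This is only an illustration, not code from the benchmark: it assumes the background rates are per client and hold for the full 1200 seconds, and the names expected_inserts, sustained_target and the tolerance parameter are hypothetical.

```python
# Background insert (and delete) rate per client, in rows/s, for each
# read-write step, taken from the step descriptions above.
BACKGROUND_RATE = {
    "qr100": 100, "qp100": 100,
    "qr500": 500, "qp500": 500,
    "qr1000": 1000, "qp1000": 1000,
}

STEP_SECONDS = 1200  # each read-write step runs for a fixed time
NUM_CLIENTS = 8      # number of clients used on this server


def expected_inserts(step, clients=NUM_CLIENTS, seconds=STEP_SECONDS):
    """Total inserts for a step if every client sustains the target rate."""
    return BACKGROUND_RATE[step] * seconds * clients


def sustained_target(step, measured_inserts, tolerance=0.0):
    """Return True when the measured insert count meets the target.

    tolerance is a hypothetical knob (fraction of the target that may be
    missed); the benchmark's real SLA check may be defined differently.
    """
    target = expected_inserts(step)
    return measured_inserts >= target * (1.0 - tolerance)


if __name__ == "__main__":
    for step in BACKGROUND_RATE:
        print(step, expected_inserts(step))
```

Under these assumptions qr100 and qp100 each do 960,000 background inserts in total, which is why a system that sustains the target rate does the same amount of write work as every other system.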
Results

The performance reports are here for
The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.

Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: 
  • insert/s for l.i0, l.i1, l.i2
  • indexed rows/s for l.x
  • range queries/s for qr100, qr500, qr1000
  • point queries/s for qp100, qp500, qp1000
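
A minimal sketch of the relative QPS calculation described above, plus the conversion to a percent change that is used when describing a result as a regression of ~N%. The function names are illustrative and the throughput values in the usage lines are made up, not taken from the reports.

```python
def relative_qps(qps_me, qps_base):
    """Return (QPS for $me / QPS for $base).

    A value > 1.0 means $me is faster than the base case and a value
    < 1.0 means there is a regression.
    """
    if qps_base <= 0:
        raise ValueError("base QPS must be positive")
    return qps_me / qps_base


def percent_change(rel):
    """Convert relative QPS to a percent change, e.g. 0.93 -> about -7."""
    return (rel - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical throughput numbers, only to show the calculation.
    rel = relative_qps(qps_me=9300.0, qps_base=10000.0)
    print(f"relative QPS = {rel:.2f}, change = {percent_change(rel):+.0f}%")
```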
From the summary for 5.6
  • The base case is fbmy5635_rel_202104072149
  • Throughput in fbmy5635_rel_221222 is similar to the base case, except for range queries where there might be a small regression of ~7%
    • l.i0 - relative QPS is 1.01
    • l.x - relative QPS is 0.96
    • l.i1, l.i2 - relative QPS is 0.96, 1.01
    • qr100, qr500, qr1000 - relative QPS is 0.93, 0.92, 0.99 
    • qp100, qp500, qp1000 - relative QPS is 0.98, 1.03, 1.01
From the summary for 8.0
  • The base case is fbmy8028_rel_221222
  • Results in MyRocks 8.0.32 with the performance schema disabled are mixed
  • Throughput in fbmy8032_rel_221222 is mostly similar to the base case. There might be a small regression for point queries.
    • l.i0 - relative QPS is 0.94
    • l.x - relative QPS is 1.02
    • l.i1, l.i2 - relative QPS is 0.99, 0.97
    • qr100, qr500, qr1000 - relative QPS is 1.05, 1.00, 1.02
    • qp100, qp500, qp1000 - relative QPS is 0.96, 0.96, 0.95
From the summary for 5.6, 8.0 with many versions
  • The base case is fbmy5635_rel_202104072149
  • Throughput in fbmy8032_rel_221222 relative to the base case is worse for write-heavy and better for read-heavy
    • l.i0 - relative QPS is 0.83
    • l.x - relative QPS is 0.93
    • l.i1, l.i2 - relative QPS is 0.94, 0.97
    • qr100, qr500, qr1000 - relative QPS is 0.98, 1.07, 1.08 
    • qp100, qp500, qp1000 - relative QPS is 1.05, 1.10, 1.08
From the summary for 5.6, 8.0 with latest versions
  • The base case is fbmy5635_rel_221222
  • Throughput in fbmy8032_rel_221222 relative to the base case is worse for write-heavy and better for read-heavy
    • l.i0 - relative QPS is 0.82
    • l.x - relative QPS is 0.97
    • l.i1, l.i2 - relative QPS is 0.98, 0.97
    • qr100, qr500, qr1000 - relative QPS is 1.05, 1.16, 1.09 
    • qp100, qp500, qp1000 - relative QPS is 1.07, 1.07, 1.07
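
The relative QPS values in the last list (fbmy8032_rel_221222 vs fbmy5635_rel_221222) can be grouped into write-heavy and read-heavy steps to get the range of percent changes for each group. This is a small sketch that uses only the numbers listed above. Treating steps whose names start with "l." as write-heavy is an assumption made for the example.

```python
# Relative QPS for fbmy8032_rel_221222 with fbmy5635_rel_221222 as the base
# case, copied from the list above.
RELATIVE_QPS = {
    "l.i0": 0.82, "l.x": 0.97, "l.i1": 0.98, "l.i2": 0.97,
    "qr100": 1.05, "qr500": 1.16, "qr1000": 1.09,
    "qp100": 1.07, "qp500": 1.07, "qp1000": 1.07,
}


def is_write_heavy(step):
    """Assumption for this sketch: the l.* steps are the write-heavy ones."""
    return step.startswith("l.")


def percent_range(values):
    """Return (min, max) percent change, rounded to whole percents."""
    pcts = [round((v - 1.0) * 100) for v in values]
    return min(pcts), max(pcts)


write_heavy = [v for s, v in RELATIVE_QPS.items() if is_write_heavy(s)]
read_heavy = [v for s, v in RELATIVE_QPS.items() if not is_write_heavy(s)]

if __name__ == "__main__":
    print("write-heavy percent change range:", percent_range(write_heavy))
    print("read-heavy percent change range:", percent_range(read_heavy))
```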
