Monday, January 1, 2024

Updated Insert benchmark: MyRocks 5.6 and 8.0, medium/large server, cached database

This has results for the Insert Benchmark with MyRocks 5.6.35, 8.0.28 and 8.0.32, a medium/large server and a cached workload. 

tl;dr

  • For read-heavy benchmark steps disabling the perf schema improves performance by ~5%
  • There might be a small regression (~3%) for point queries from 8.0.28 to 8.0.32
  • Throughput in MyRocks 8.0.32 relative to 5.6.35 by benchmark step
    • l.i0 - MyRocks 8.0.32 is ~16% slower
    • l.x - MyRocks 8.0.32 is ~3% faster
    • l.i1, l.i2 - MyRocks 8.0.32 is 3%, 26% faster
    • range queries - MyRocks 8.0.32 is ~15% faster 
    • point queries - MyRocks 8.0.32 is ~4% slower

Small, medium, medium/large and large

I have been describing my test servers as small, medium and large and now I am using medium/large. What does this mean? I will wave my hand and make up definitions:

  • small -  fewer than 10 CPU cores
  • medium - fewer than 20 CPU cores
  • medium/large - fewer than 30 CPU cores
  • large - at least 30 CPU cores
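
A minimal sketch of those definitions as code, to make the boundaries explicit. The function name server_size is hypothetical and is not part of any benchmark tooling; the thresholds are the hand-wavy ones listed above.

```python
def server_size(cpu_cores: int) -> str:
    """Map a CPU core count to the informal size labels listed above."""
    if cpu_cores < 10:
        return "small"
    if cpu_cores < 20:
        return "medium"
    if cpu_cores < 30:
        return "medium/large"
    return "large"


# The server used here has 2 sockets with 12 cores/socket and
# hyperthreads disabled, so 24 cores.
print(server_size(2 * 12))
```

With 24 cores the server used for this post lands in the medium/large bucket.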

Build + Configuration

I tested MyRocks 5.6.35, 8.0.28 and 8.0.32 using the latest code as of December 2023. These were compiled from source. All builds use CMAKE_BUILD_TYPE=Release.

The versions tested were:
  • MyRocks 5.6.35 (fbmy5635_rel_221222)
    • compiled from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b
    • used the cza1_c24r64 my.cnf file
  • MyRocks 8.0.28 (fbmy8028_rel_221222)
  • MyRocks 8.0.32 (fbmy8032_rel_221222)

The cza1_c24r64 and cza1ps0_c24r64 my.cnf files differ in one way: cza1_c24r64 enables the perf schema while cza1ps0_c24r64 disables it.

Benchmark
 
The test server is a SuperMicro SuperWorkstation (Sys-7049A-T) with 2 sockets, 12 cores/socket, hyperthreads disabled, 64G RAM, Ubuntu 22.04 and XFS using a 2TB NVMe M.2 device. The benchmark is run with 12 clients to avoid over-subscribing the CPU. Next time I might use 16.

I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:

  • l.i0
    • insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
  • l.x
    • create 3 secondary indexes per table. There is one connection per client.
  • l.i1
    • use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
  • l.i2
    • like l.i1 but each transaction modifies 5 rows (small transactions).
  • qr100
    • use 3 connections/client. One does range queries for 1200 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.
  • qp100
    • like qr100 except uses point queries on the PK index
  • qr500
    • like qr100 but the insert and delete rates are increased from 100/s to 500/s
  • qp500
    • like qp100 but the insert and delete rates are increased from 100/s to 500/s
  • qr1000
    • like qr100 but the insert and delete rates are increased from 100/s to 1000/s
  • qp1000
    • like qp100 but the insert and delete rates are increased from 100/s to 1000/s
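
The arithmetic behind the two kinds of steps can be sketched as below. This is an illustration under assumptions, not code from the benchmark client: the function names are hypothetical, and the insert rate passed to fixed_work_run_time is a made-up placeholder rather than a measured result. The sketch assumes every query step runs for the same 1200 seconds as qr100.

```python
QUERY_STEP_SECONDS = 1200  # query steps run for a fixed amount of time


def background_inserts(rate_per_second: int, seconds: int = QUERY_STEP_SECONDS) -> int:
    """Inserts done by one rate-limited connection if the target rate is sustained."""
    return rate_per_second * seconds


def fixed_work_run_time(total_inserts: int, inserts_per_second: float) -> float:
    """Run time in seconds for a step that does a fixed number of inserts (l.i1, l.i2)."""
    return total_inserts / inserts_per_second


# Fixed-time steps: the insert count depends only on the target rate,
# so it is the same for every system that sustains that rate.
for step, rate in [("qr100/qp100", 100), ("qr500/qp500", 500), ("qr1000/qp1000", 1000)]:
    print(step, background_inserts(rate))

# Fixed-work steps: run time varies with the insert rate.
# 100_000 inserts/s is a placeholder, not a number from this benchmark.
print(fixed_work_run_time(50_000_000, 100_000))
```

This is why the query steps can be compared directly across systems when the SLA is met, while l.i1 and l.i2 have to be compared by insert rate rather than by run time.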

Results

The performance report is here.

The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table, which makes it easy to see how performance changes over time. The third shows the background insert rate for benchmark steps with background inserts; all systems sustained the target rates.

Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: 
  • insert/s for l.i0, l.i1, l.i2
  • indexed rows/s for l.x
  • range queries/s for qr100, qr500, qr1000
  • point queries/s for qp100, qp500, qp1000

  • The base case is fbmy5635_rel_221222
  • For the read-heavy benchmark steps disabling the perf schema improves performance by ~5%
  • There might be a small regression (~3%) for point queries from 8.0.28 to 8.0.32
  • Throughput in fbmy8032_rel_221222 relative to the base case
    • l.i0 - relative QPS is 0.84
    • l.x - relative QPS is 1.03
    • l.i1, l.i2 - relative QPS is 1.03, 1.26
    • qr100, qr500, qr1000 - relative QPS is 1.16, 1.13, 1.18 
    • qp100, qp500, qp1000 - relative QPS is 0.96, 0.96, 0.97
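
A small sketch of how relative QPS maps to the "faster" and "slower" percentages in the tl;dr. The helper names are hypothetical; the relative QPS values are the ones listed above for fbmy8032_rel_221222 with fbmy5635_rel_221222 as the base case.

```python
def relative_qps(qps_me: float, qps_base: float) -> float:
    """(QPS for $me / QPS for $base); > 1.0 is an improvement, < 1.0 a regression."""
    return qps_me / qps_base


def percent_change(rel: float) -> int:
    """Convert relative QPS to a rounded percent change versus the base case."""
    return round((rel - 1.0) * 100)


# Relative QPS for fbmy8032_rel_221222 as reported above.
reported = {
    "l.i0": 0.84, "l.x": 1.03, "l.i1": 1.03, "l.i2": 1.26,
    "qr100": 1.16, "qr500": 1.13, "qr1000": 1.18,
    "qp100": 0.96, "qp500": 0.96, "qp1000": 0.97,
}

for step, rel in reported.items():
    pct = percent_change(rel)
    direction = "faster" if pct > 0 else "slower" if pct < 0 else "unchanged"
    print(f"{step}: relative QPS {rel:.2f} -> {abs(pct)}% {direction}")
```

For example, a relative QPS of 0.84 for l.i0 is the ~16% slowdown in the tl;dr, and 1.26 for l.i2 is the 26% speedup.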

