Monday, January 10, 2022

Differential Flame Graphs to Highlight Performance Schema Waits Impact

Yet another type of flame graphs that I had not discussed yet is a differential flame graph (again invented by Brendan Gregg). It shows the difference of two flame graphs in a clear way (assuming they are comparable - it's on you to make sure comparison makes sense and interpret the output properly). The flame graph is drawn using the "after" profile (such that the frame widths show values for the second graph), and then colorized by the delta to show how we got there. If the metric for the specific frame in the same stack is larger then it is shown in red. If it's smaller the frame is blue (hence the name red/blue differential flame graphs). The saturation is relative to the delta, so dark red is for the frame that has much bigger value in the second graph.While and very light frames can be ignored.

Let me apply this approach to the flame graphs showing waits reported by MySQL's performance schema (built as described in this blog post). As a proof of concept I'll use an easy to interpret case where the same oltp_write_only.lua sysbench test is run with different values of innodb_flush_log_at_trx_commit and otherwise the same settings like 16 concurrent threads on my old 2 cores Ubuntu 20.04 "home server" with slow HDD.

So, here is the first run:

sysbench --table-size=1000000 --threads=1 --report-interval=5 --mysql-socket=/tmp/mysql8.sock --mysql-user=root --mysql-db=sbtest /usr/share/sysbench/oltp_write_only.lua cleanup
sysbench --table-size=1000000 --threads=1 --report-interval=5 --mysql-socket=/tmp/mysql8.sock --mysql-user=root --mysql-db=sbtest /usr/share/sysbench/oltp_write_only.lua prepare
bin/mysql -uroot --socket=/tmp/mysql8.sock performance_schema -B -e'truncate table events_waits_history_long'
bin/mysql -uroot --socket=/tmp/mysql8.sock performance_schema -B -e'set global innodb_flush_log_at_trx_commit=0'
sysbench --table-size=1000000 --threads=16 --time=120 --report-interval=5 --mysql-socket=/tmp/mysql8.sock --mysql-user=root --mysql-db=sbtest /usr/share/sysbench/oltp_write_only.lua run
bin/mysql -uroot --socket=/tmp/mysql8.sock performance_schema -B -e'select event_name, timer_wait from events_waits_history_long' >/tmp/waits64_0.txt
cat /tmp/waits64_0.txt | awk '{ printf("%s %d\n", $1, $2); }' | sed 's/\//;/g' > /tmp/w64_0.out

That ended up with this sysbench statistics:

SQL statistics:
    queries performed:
        read:                            0
        write:                           26252
        other:                           13126
        total:                           39378
    transactions:                        6563   (54.60 per sec.)
    queries:                             39378  (327.59 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          120.2017s
    total number of events:              6563

Latency (ms):
         min:                                   66.37
         avg:                                  292.94
         max:                                 1222.92
         95th percentile:                      530.08
         sum:                              1922587.72

Threads fairness:
    events (avg/stddev):           410.1875/4.69
    execution time (avg/stddev):   120.1617/0.05

and then the second:

sysbench --table-size=1000000 --threads=1 --report-interval=5 --mysql-socket=/tmp/mysql8.sock --mysql-user=root --mysql-db=sbtest /usr/share/sysbench/oltp_write_only.lua cleanup
sysbench --table-size=1000000 --threads=1 --report-interval=5 --mysql-socket=/tmp/mysql8.sock --mysql-user=root --mysql-db=sbtest /usr/share/sysbench/oltp_write_only.lua prepare
bin/mysql -uroot --socket=/tmp/mysql8.sock performance_schema -B -e'truncate table events_waits_history_long'
bin/mysql -uroot --socket=/tmp/mysql8.sock performance_schema -B -e'set global innodb_flush_log_at_trx_commit=1'
sysbench --table-size=1000000 --threads=16 --time=120 --report-interval=5 --mysql-socket=/tmp/mysql8.sock --mysql-user=root --mysql-db=sbtest /usr/share/sysbench/oltp_write_only.lua run
bin/mysql -uroot --socket=/tmp/mysql8.sock performance_schema -B -e'select event_name, timer_wait from events_waits_history_long' >/tmp/waits64_1.txt
cat /tmp/waits64_1.txt | awk '{ printf("%s %d\n", $1, $2); }' | sed 's/\//;/g' > /tmp/w64_1.out

that produced the following:

SQL statistics:
    queries performed:
        read:                            0
        write:                           16238
        other:                           8119
        total:                           24357
    transactions:                        4059   (33.71 per sec.)
    queries:                             24357  (202.27 per sec.)
    ignored errors:                      1      (0.01 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          120.4139s
    total number of events:              4059

Latency (ms):
         min:                                  151.87
         avg:                                  474.29
         max:                                 1316.65
         95th percentile:                      773.68
         sum:                              1925150.61

Threads fairness:
    events (avg/stddev):           253.6875/3.51
    execution time (avg/stddev):   120.3219/0.13

Now with resulting "collapsed stacks" in /tmp/w64_?.out files, we can build a simple differential flame graph 

openxs@ao756:~/dbs/8.0$ ~/git/FlameGraph/difffolded.pl /tmp/w64_0.out /tmp/w64_1.out | ~/git/FlameGraph/flamegraph.pl --count picoseconds --title Waits > /tmp/w64_01_diff.svg

that, when captured as a .png file for this blog post looks as follows:

Differential flame graph visualizing the impact of innodb_flush_log_at_trx_commit value (the difference 1 makes vs 0) on waits reported by performance_schema for the same oltp_write_only.lua test

We clearly see the negative impact on performance (54 TPS vs 33 TPS). We clearly see that time spent on InnoDB log I/O increased, same as idle time somewhat increased. The highest relative increase was for /wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done condition variable (5+%) and that was probably related to binlog group commit implementation where we could not write to the binary log until redo log is flushed. Wild guess, surely.

I deliberately used as simple flame graphs as possible to make interpretation of the difference based on just a screenshot easier. This lame test demonstrated that we really can see what we expected - somewhat increased redo log I/O waits, and they are highlighted as red. Good enough for the proof of concept and way easier to speculate about than usual Off-CPU differential flame graph like this:

that I built while attempting to reproduce one real life MariaDB performance problem...

No comments:

Post a Comment