Sunday, May 31, 2020

Dynamic Tracing of Memory Allocations in MySQL: First Steps with perf

I often have to deal with customers claiming there is a memory leak in MySQL affecting their production servers. To prove the leak is real and show where it happens usually running with tcmalloc and relying to it's heap profiler, or even more intrusive, running under Valgrind Massif is needed. This is normal when the test case demonstrating the leak is isolated and we prove a leak for some bug report. But hardly anyone in production in a normal situation would agree to do this, as performance impact is very serious.

So, recently I decided to check if dynamic tracing (along the lines of this page by Brendan Gregg, with Flame Graphs to show suspicious code path) is a more reasonable alternative. As I have to work with different Linux versions, often as old as CentOS 6, I can not rely on bcc tools for this. I mentioned more than once that in internal discussion that adding dynamic probe for malloc() (and other memory allocating functions, if you prefer) with perf may already be a good start and give some data to study. So, time to try this approach myself (so that I am ready to suggest it to customers with all the details at hand) and continue my series of posts on dynamic tracing. It can not be all the time about bugs, readers needs solutions as well :)

For this post I decided to work with latest and greatest MySQL 8.0.20 that I have built from source on my good old almost irrelevant Ubuntu 16.04 netbook:
openxs@ao756:~$ uname -a
Linux ao756 4.4.0-178-generic #208-Ubuntu SMP Sun Apr 5 23:45:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
openxs@ao756:~$ cd dbs/8.0
openxs@ao756:~/dbs/8.0$ bin/mysql -uroot --socket=/tmp/mysql8.sock -e "show variables like '%version%'"
+--------------------------+-----------------------+
| Variable_name            | Value                 |
+--------------------------+-----------------------+
| immediate_server_version | 999999                |
| innodb_version           | 8.0.20                |
| original_server_version  | 999999                |
| protocol_version         | 10                    |
| slave_type_conversions   |                       |
| tls_version              | TLSv1,TLSv1.1,TLSv1.2 |
| version                  | 8.0.20                |
| version_comment          | Source distribution   |
| version_compile_machine  | x86_64                |
| version_compile_os       | Linux                 |
| version_compile_zlib     | 1.2.11                |
+--------------------------+-----------------------+
openxs@ao756:~/dbs/8.0$
The reason is simple. Looks like I have more readers while writing about MySQL, not some fork of it :) I hope this tend will change soon.

First of all, I have to check what memory allocation library is used (the details of adding probe and even some features available to probe may be different for jemalloc, for example):
openxs@ao756:~/dbs/8.0$ ldd bin/mysqld
        linux-vdso.so.1 =>  (0x00007ffe8afb6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc9bdfe2000)
        libprotobuf-lite.so.3.6.1 => /home/openxs/dbs/8.0/bin/../lib/private/libprotobuf-lite.so.3.6.1 (0x00007fc9bdd8d000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc9bdb85000)
        libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007fc9bd91c000)
        libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007fc9bd4d7000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc9bd2d3000)
        libaio.so.1 => /lib/x86_64-linux-gnu/libaio.so.1 (0x00007fc9bd0d1000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007fc9bcec6000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc9bcb44000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc9bc83b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc9bc625000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc9bc25b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc9be1ff000)
openxs@ao756:~/dbs/8.0$
I do not see tcmalloc or jemalloc in the output above, but libc is linked dynamically. So, malloc() implementation would likely come from it (/lib/x86_64-linux-gnu/libc.so.6 binary, to be specific). Let me add the probe (see this older post for more details, some details about probes syntax presented here are also useful):
openxs@ao756:~$ sudo perf probe -x /lib/x86_64-linux-gnu/libc.so.6 'malloc size=%di:s64'
[sudo] password for openxs:
Added new events:
  probe_libc:malloc    (on malloc in /lib/x86_64-linux-gnu/libc-2.23.so with size=%di:s64)
  probe_libc:malloc_1  (on malloc in /lib/x86_64-linux-gnu/libc-2.23.so with size=%di:s64)

You can now use it in all perf tools, such as:

        perf record -e probe_libc:malloc_1 -aR sleep 1

One of the first things I want to be able to find out is how many bytes are allocated by malloc(). In case of libc malloc() on this Ubuntu I could just use the fist and only argument name (you'd be surprised what it really is, try to find out!) or just $params (see man perf-probe for details).  But in a general case (like with jemalloc on the same system) I should better refer to it based on the way it is passed - via %DI (not short form, NOT %EAI) register on x86 systems. This is why you see size=%di:s64 in the probe definition. Note also that to get real value as integer I had to use s64 type (for signed 64-bit integer, even though I;d expect the argument to be unsigned). By default and as unsigned we get hex value, and one day I'd like to be able to count also bytes allocated, not just calls.

I wanted to add return probe for the same function, in a hope to store (one day) the pointer and then maybe check (with other probe) if it is ever explicitly freed. But in my case it is not possible to get the return value:
openxs@ao756:~$ sudo perf probe -x  /lib/x86_64-linux-gnu/libc.so.6 malloc_ret='malloc%return $retval'
Failed to find "__libc_malloc%return",
 because __libc_malloc is an inlined function and has no return point.
Added new event:
  probe_libc:malloc_ret (on malloc%return in /lib/x86_64-linux-gnu/libc-2.23.so with $retval)

You can now use it in all perf tools, such as:

        perf record -e probe_libc:malloc_ret -aR sleep 1
I am yet to check the reason, but according to the above the function is inlined and thus does not have a return point. Sounds weird. Note that for jemalloc I was able to add similar probe successfully and get the pointer value traced.

So, what do I get when I try to trace this probe in perf? Let me show. If I run some stupid test surely allocating memory in one shell:
openxs@ao756:~/dbs/8.0$ bin/mysqlslap -uroot --socket=/tmp/mysql8.sock --concurrency=12 --create-schema=test --no-drop --number-of_queries=300000 --iterations=2 --query='select @a := repeat("a", 1000);'
While in the other I'll try to record some samples:
openxs@ao756:~$ sudo perf record -e 'probe_libc:malloc' -aRg sleep 10
[sudo] password for openxs:
Lowering default frequency rate to 3250.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 710 times to write data ]
[ perf record: Captured and wrote 179.217 MB perf.data (1017163 samples) ]
Then I'll get the output like this from perf script:
openxs@ao756:~$ sudo perf script | grep mysqld | more
...
mysqld   632 [001] 218853.089892: probe_libc:malloc: (7fc7312b7130) size=120
                  a18de2 std::__detail::_Hashtable_alloc<Malloc_allocator<std::_
_detail::_Hash_node<TABLE_LIST*, true> > >::_M_allocate_buckets (/home/openxs/db
s/8.0/bin/mysqld)
                  a19061 lock_table_names (/home/openxs/dbs/8.0/bin/mysqld)
                  a1adc8 open_tables (/home/openxs/dbs/8.0/bin/mysqld)
                  a1b576 open_tables_for_query (/home/openxs/dbs/8.0/bin/mysqld)
                  ae7313 Sql_cmd_dml::prepare (/home/openxs/dbs/8.0/bin/mysqld)
                  aef86f Sql_cmd_dml::execute (/home/openxs/dbs/8.0/bin/mysqld)
                  a92b0a mysql_execute_command (/home/openxs/dbs/8.0/bin/mysqld)
                  a97824 mysql_parse (/home/openxs/dbs/8.0/bin/mysqld)
                  a99605 dispatch_command (/home/openxs/dbs/8.0/bin/mysqld)
                  a9abd0 do_command (/home/openxs/dbs/8.0/bin/mysqld)
                  bba6b0 handle_connection (/home/openxs/dbs/8.0/bin/mysqld)
                 20d0355 pfs_spawn_thread (/home/openxs/dbs/8.0/bin/mysqld)
...
mysqld  2795 [000] 218853.091016: probe_libc:malloc: (7fc7312b7130) size=1040                  bb7ed3 String::real_alloc (/home/openxs/dbs/8.0/bin/mysqld)
                  dda12f Item_func_repeat::val_str (/home/openxs/dbs/8.0/bin/mys
qld)
                  da5272 Item_func_set_user_var::check (/home/openxs/dbs/8.0/bin
/mysqld)
                  da5874 Item_func_set_user_var::val_str (/home/openxs/dbs/8.0/b
in/mysqld)
                  d1d1cf Item::send (/home/openxs/dbs/8.0/bin/mysqld)
                  a23d29 THD::send_result_set_row (/home/openxs/dbs/8.0/bin/mysq
ld)
                  eb86c1 Query_result_send::send_data (/home/openxs/dbs/8.0/bin/
mysqld)
                  b55524 SELECT_LEX_UNIT::ExecuteIteratorQuery (/home/openxs/dbs
/8.0/bin/mysqld)
                  b5566e SELECT_LEX_UNIT::execute (/home/openxs/dbs/8.0/bin/mysq
ld)
                  ae8f52 Sql_cmd_dml::execute_inner (/home/openxs/dbs/8.0/bin/my
sqld)
                  aefc37 Sql_cmd_dml::execute (/home/openxs/dbs/8.0/bin/mysqld)
                  a92b0a mysql_execute_command (/home/openxs/dbs/8.0/bin/mysqld)
                  a97824 mysql_parse (/home/openxs/dbs/8.0/bin/mysqld)
                  a99605 dispatch_command (/home/openxs/dbs/8.0/bin/mysqld)
...
Note the following:
  • I tried to trace all calls to malloc() from glibc, for all processes, with stack traces. As a result I've got 179 MB of raw data in perf.data over just 10 seconds, with default frequency of sampling! This is too much for any pracrtical purpose, so make sure to set smaller frequency (like -F99) or performance impact may be too seriois. With newer kernels and eBPF this impact may be notably reduced.
  • I've got both allocation size reported (and may get a good use for it one day), and stack traces, showing where the allocation is coming from! This is exactly what developers need to check if it is really a possible leak.
Now what I can I do with the above? I can use perf report (sudo perf report > /tmp/report.txt) and get the call graphs in text:
openxs@ao756:~$ cat /tmp/report.txt | more                                     
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 556  of event 'probe_libc:malloc'
# Event count (approx.): 4935574
#
# Children      Self  Command          Shared Object        Symbol

# ........  ........  ...............  ...................  ....................
................................................................................
................................................................................
..........
#
...
   87.56%    87.56%  mysqlslap        libc-2.23.so         [.] malloc
                  |
                  ---start_thread
                     run_task
                     |
                     |--86.77%-- mysql_real_query
                     |          cli_read_query_result
                     |          read_com_query_metadata
                     |          cli_read_metadata_ex
                     |          MEM_ROOT::AllocSlow
                     |          MEM_ROOT::AllocBlock
                     |          malloc
                     |
                      --0.79%-- mysql_store_result
                                cli_read_rows
                                MEM_ROOT::AllocSlow
                                MEM_ROOT::AllocBlock
                                malloc
...
    10.18%     0.00%  mysqld           mysqld               [.] do_command
                     |
                     ---do_command
                        dispatch_command
                        |
                        |--9.00%-- mysql_parse
                        |          |
                        |          |--7.88%-- mysql_execute_command
                        |          |          Sql_cmd_dml::execute
                        |          |          |
                        |          |          |--6.64%-- Sql_cmd_dml::prepare
                        |          |          |          open_tables_for_query
                        |          |          |          open_tables
                        |          |          |          lock_table_names
                        |          |          |          |
                        |          |          |          |--6.04%-- std::__detai
l::_Hashtable_alloc<Malloc_allocator<std::__detail::_Hash_node<TABLE_LIST*, true
> > >::_M_allocate_buckets
                        |          |          |          |          malloc
                        |          |          |          |
                        |          |          |           --0.59%-- get_and_lock
_tablespace_names
                        |          |          |                     std::__detai
l::_Hashtable_alloc<Malloc_allocator<std::__detail::_Hash_node<std::__cxx11::bas
ic_string<char, std::char_traits<char>, std::allocator<char> >, true> > >::_M_al
locate_buckets
                        |          |          |                     malloc
                        |          |          |
                        |          |           --1.24%-- Sql_cmd_dml::execute_inner
                        |          |                     SELECT_LEX_UNIT::execute
                        |          |                     SELECT_LEX_UNIT::ExecuteIteratorQuery
                        |          |                     Query_result_send::send_data
                        |          |                     THD::send_result_set_row
                        |          |                     Item::send
                        |          |                     |
                        |          |                     |--0.66%-- Item_func_set_user_var::val_str
                        |          |                     |          Item_func_set_user_var::check
                        |          |                     |          Item_func_repeat::val_str
                        |          |                     |          String::real_alloc
                        |          |                     |          malloc
                        |          |                     |
                        |          |                      --0.58%-- user_var_entry::val_str
                        |          |                                String::copy
                        |          |                                String::real_alloc
                        |          |                                malloc
                        |          |
                        |           --1.12%-- parse_sql
                        |                     |
                        |                     |--0.66%-- THD::sql_parser
                        |                     |          MYSQLparse
                        |                     |          THD::raise_condition
                        |                     |          Diagnostics_area::push_warning
                        |                     |          MEM_ROOT::AllocSlow
                        |                     |          MEM_ROOT::AllocBlock
                        |                     |          malloc
                        |                     |
                        |                      --0.46%-- Diagnostics_area::copy_sql_conditions_from_da
                        |                                Diagnostics_area::push_warning
                        |                                Diagnostics_area::push_warning
                        |                                MEM_ROOT::AllocSlow
                        |                                MEM_ROOT::AllocBlock
                        |                                malloc
                        |
                         --1.18%-- THD::swap_rewritten_query
                                   String::mem_realloc
                                   malloc
...
I can also create a flame graph of number of malloc() calls per function/sequence of calls. I had to modify steps from the original source a big, to come up with the following:
openxs@ao756:~$ sudo perf script > out.stack
openxs@ao756:~$ git/FlameGraph/stackcollapse-perf.pl < out.stack | git/FlameGraph/flamegraph.pl --color=mem --title='malloc( Fale Graph' --countname="calls" -- > malloc.svg
For this test I've included allocations for all the processes and had to use stackcollapse-perf.pl script, as stack traces are coming in perf format.

The result (actually a part of it, just for the mysqld process, copy/pasted as .png static file) may look as follows:

Sample malloc() calls flame graph build from one of tests.
Now you can see where most malloc() calls happen, and this is the first step in checking where leaks may come from if we really suspect them. Quite a lot of fun for a simple probe added with perf, on any Linux system younger than 10 years!

Next time I am going to apply this approach to some real life use case that looks like a leak and see if anything useful, not just fun, can be derived from it. For now I know it's doable and I am sure about the steps (documented here and more in my background notes) to trace malloc() with perf dynamic probe.

No comments:

Post a Comment