Thursday, January 7, 2021

Linux /proc Filesystem for MySQL DBAs - Part II, Threads of the mysqld Process

It's common knowledge that MySQL server (mysqld process) is multi-threaded. Let me quote:

"The MySQL Server (mysqld) executes as a single OS process, with multiple threads executing concurrent activities. MySQL does not have its own thread implementation, but relies on the thread implementation of the underlying OS."

In the previous post we've seen that on Linux the /proc/[pid]/task/ subdirectory contains subdirectories of the form [tid]/, one for each thread of the process (pid). This is what I have for the Percona Server 5.7.32-35 I use on Ubuntu 16.04 as an example in this series of blog posts:

mysql> select connection_id(), version(), @@pid_file;
+-----------------+---------------+----------------------------+
| connection_id() | version()     | @@pid_file                 |
+-----------------+---------------+----------------------------+
|             167 | 5.7.32-35-log | /var/run/mysqld/mysqld.pid |
+-----------------+---------------+----------------------------+
1 row in set (0.00 sec)

mysql> \! sudo cat /var/run/mysqld/mysqld.pid
30580
mysql> \! ls -F /proc/30580/task
2389/  2394/   30582/  30586/  30590/  30594/  30600/  30604/  30608/  3620/
2390/  2488/   30583/  30587/  30591/  30597/  30601/  30605/  30609/
2391/  30580/  30584/  30588/  30592/  30598/  30602/  30606/  31000/
2393/  30581/  30585/  30589/  30593/  30599/  30603/  30607/  31002/
mysql>

I've also shown a poor man's monitoring shell script idea to study some /proc information for each thread. Let me present it nicely structured for the case of wchan check to see where thread waits (I started sysbench test with 16 threads to put some load and see more different wait channels):

openxs@ao756:~$ for dir in `ls /proc/30580/task`
> do
>   echo -n $dir': '
>   2>/dev/null sudo strings /proc/$dir/wchan
> done
2389: futex_wait_queue_me
2390: futex_wait_queue_me
2391: futex_wait_queue_me
2393: 2394: futex_wait_queue_me
2488: futex_wait_queue_me
30580: poll_schedule_timeout
30581: do_sigtimedwait
30582: read_events
...
30591: read_events
30592: futex_wait_queue_me
30593: hrtimer_nanosleep
30594: futex_wait_queue_me
30597: futex_wait_queue_me
30598: futex_wait_queue_me
30599: hrtimer_nanosleep
30600: futex_wait_queue_me
...
30607: futex_wait_queue_me
30608: do_sigtimedwait
30609: futex_wait_queue_me
31000: jbd2_log_wait_commit
31002: futex_wait_queue_me
3620: futex_wait_queue_me
...
6514: futex_wait_queue_me
openxs@ao756:~$

Note that wchan was empty for the thread with tid 2393 based on the output, so it was running at the moment.

But as comm and cmdline files content for MySQL threads is not changed, how do we know what each thread is doing or what kind of thread is it? Probably if we attach gdb... but the topic of this series is different. Luckily, in MySQL 5.7 we have a nice insight in the performance_schema.threads table, the thread_os_id column contains exactly what we need:

mysql> select thread_id, thread_os_id, name from performance_schema.threads where type = 'BACKGROUND';
+-----------+--------------+----------------------------------------+
| thread_id | thread_os_id | name                                   |
+-----------+--------------+----------------------------------------+
|         1 |        30580 | thread/sql/main                        |
|         2 |        30581 | thread/sql/thread_timer_notifier       |
|         3 |        30582 | thread/innodb/io_ibuf_thread           |
|         4 |        30584 | thread/innodb/io_read_thread           |
|         5 |        30583 | thread/innodb/io_log_thread            |
|         6 |        30585 | thread/innodb/io_read_thread           |
|         7 |        30586 | thread/innodb/io_read_thread           |
|         8 |        30587 | thread/innodb/io_read_thread           |
|         9 |        30588 | thread/innodb/io_write_thread          |
|        10 |        30589 | thread/innodb/io_write_thread          |
|        11 |        30590 | thread/innodb/io_write_thread          |
|        12 |        30591 | thread/innodb/io_write_thread          |
|        13 |        30592 | thread/innodb/page_cleaner_thread      |
|        14 |        30593 | thread/innodb/buf_lru_manager_thread   |
|        15 |        30594 | thread/innodb/srv_monitor_thread       |
|        17 |        30603 | thread/innodb/srv_worker_thread        |
|        18 |        30602 | thread/innodb/srv_worker_thread        |
|        19 |        30601 | thread/innodb/srv_worker_thread        |
|        20 |        30600 | thread/innodb/srv_purge_thread         |
|        21 |        30599 | thread/innodb/srv_master_thread        |
|        22 |        30598 | thread/innodb/srv_error_monitor_thread |
|        23 |        30597 | thread/innodb/srv_lock_timeout_thread  |
|        24 |        30605 | thread/innodb/dict_stats_thread        |
|        25 |        30604 | thread/innodb/buf_dump_thread          |
|        26 |        30608 | thread/sql/signal_handler              |
+-----------+--------------+----------------------------------------+
25 rows in set (0.05 sec)

This is what I've used to state in the previous post that 30592 was a page cleaner thread. Here we can see that threads waiting for timer are InnoDB's master thread and LRU manager thread that are woken up periodically... What was that thread 31000 waiting on something related to commit at the filesystem level? It was a user connection thread, obviously:

mysql> select thread_id, thread_os_id, type, name from performance_schema.threads where thread_os_id = 31000;
+-----------+--------------+------------+---------------------------+
| thread_id | thread_os_id | type       | name                      |
+-----------+--------------+------------+---------------------------+
|       217 |        31000 | FOREGROUND | thread/sql/one_connection |
+-----------+--------------+------------+---------------------------+
1 row in set (0.00 sec)

But are we really forced to run scripts and access /proc directly? No, and fine MySQL manual gives some hints, for example, about ps -L option:

"For Linux, THREAD_OS_ID corresponds to the value of the gettid() function. This value is exposed, for example, using the perf or ps -L commands, or in the proc file system (/proc/[pid]/task/[tid]). For more information, see the perf-stat(1), ps(1), and proc(5) man pages. "

So, with ps -L we can see the threads of the mysqld process:

openxs@ao756:~$ ps -L aux | grep -e PID -e `pidof mysqld` | more
USER       PID   LWP %CPU NLWP %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
...
mysql    30580 30580  0.0   38  7.7 746308 299448 ?       Sl   Jan02   0:00 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
mysql    30580 30581  0.0   38  7.7 746308 299448 ?       Sl   Jan02   0:00 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
mysql    30580 30582  0.0   38  7.7 746308 299448 ?       Sl   Jan02   0:06 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
mysql    30580 30583  0.0   38  7.7 746308 299448 ?       Sl   Jan02   0:07 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
mysql    30580 30584  0.0   38  7.7 746308 299448 ?       Sl   Jan02   0:06 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
mysql    30580 30585  0.0   38  7.7 746308 299448 ?       Sl   Jan02   0:06 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
...

 We can clearly see tid in the column titled LWP. grep allows us to apply conditions to the rows we see in the command line output. It is similar to the WHERE clause of SELECT statement. But as the rest of the information in the output looks the same for all threads we need something better in ps command line options to show the list of columns with useful values. Note that grep over all threads is actually not mandatory, we can use -p option to show threads only for the given pid:

openxs@ao756:~$ ps -L -p `pidof mysqld` | more
  PID   LWP TTY          TIME CMD
30580 30580 ?        00:00:00 mysqld
30580 30581 ?        00:00:00 mysqld
30580 30582 ?        00:00:06 mysqld
30580 30583 ?        00:00:07 mysqld
30580 30584 ?        00:00:06 mysqld
30580 30585 ?        00:00:06 mysqld
30580 30586 ?        00:00:07 mysqld
30580 30587 ?        00:00:06 mysqld
...

The real problem is the list of columns - the default one we see above is even less useful. Time to read man ps where we can find the -o option:

 -o format

              User-defined format.  format is a single argument in the
              form of a blank-separated or comma-separated list, which
              offers a way to specify individual output columns.  The
              recognized keywords are described in the STANDARD FORMAT
              SPECIFIERS section below. ...

So, after reading the details we are ready to proceed to listing all columns we need explicitly, like this:

openxs@ao756:~$ ps -o lwp,pcpu,state,wchan -L -p `pidof mysqld`
  LWP %CPU S WCHAN
30580  0.0 S -
30581  0.0 S -
...
30592  0.0 S -
...
30609  0.0 S -
 7192  0.6 D -
 7194  0.6 S -
 7195  0.6 S -
 7377  0.6 S -
 7379  0.6 S -
 9112  0.7 S -
 9115  0.7 S -
 9119  0.7 S -
 9120  0.7 D -
 9436  1.5 S -
...

We can also sort the output by different criteria either with ps options or with external sort as a kind of ORDER BY, do GROUP BY with sort + uniq etc. For example, we can summarize thread states in the process like this:

openxs@ao756:~$ ps -o state -L -p `pidof mysqld` | sort | uniq -c | sort -nbr
     37 S
      7 R
      1 D

The idea is clear, hopefully. For quick checks and summarizing it may be enough to use ps with options to show threads (-L) and specific fields (-o) and process the output by standard Linux command line utilities.

As a side note, you can monitor threads of the mysqld process with top -H option as well:

openxs@ao756:~$ top -version
  procps-ng version 3.3.10

Usage:

  top -hv | -bcHiOSs -d secs -n max -u|U user -p pid(s) -o field -w [cols]

openxs@ao756:~$ top -H -b -n 1 -o +%CPU -p `pidof mysqld`
top - 19:40:20 up 30 days,  7:47,  4 users,  load average: 4.08, 2.87, 2.06
Threads:  44 total,   9 running,  35 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.3 us,  6.3 sy,  5.2 ni, 59.7 id,  3.5 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  3859384 total,   188328 free,  1052156 used,  2618900 buff/cache
KiB Swap:  4001788 total,  2464336 free,  1537452 used.  2008620 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
10002 mysql     20   0  746308 307432   8872 D  6.2  8.0   0:15.70 mysqld
10977 mysql     20   0  746308 307432   8872 S  6.2  8.0   0:01.97 mysqld
10978 mysql     20   0  746308 307432   8872 R  6.2  8.0   0:01.95 mysqld
10981 mysql     20   0  746308 307432   8872 R  6.2  8.0   0:01.92 mysqld

30580 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.35 mysqld
30581 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.00 mysqld
30582 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.42 mysqld
30583 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.77 mysqld
30584 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.46 mysqld
30585 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.52 mysqld
30586 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.79 mysqld
30587 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.50 mysqld
30588 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.72 mysqld
30589 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:08.08 mysqld
30590 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.70 mysqld
30591 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.80 mysqld
30592 mysql     20   0  746308 307432   8872 S  0.0  8.0   3:09.57 mysqld
30593 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:04.68 mysqld
30594 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.91 mysqld
30597 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:04.62 mysqld
30598 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:07.12 mysqld
30599 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:08.67 mysqld
30600 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:41.00 mysqld
30601 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:11.73 mysqld
30602 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:12.69 mysqld
30603 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:14.40 mysqld
30604 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.00 mysqld
30605 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:01.74 mysqld
30606 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.97 mysqld
30607 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.00 mysqld
30608 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.00 mysqld
30609 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:00.00 mysqld
 7195 mysql     20   0  746308 307432   8872 S  0.0  8.0   1:34.78 mysqld
10106 mysql     20   0  746308 307432   8872 R  0.0  8.0   0:11.63 mysqld
10108 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:11.58 mysqld
10854 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:06.62 mysqld
10855 mysql     20   0  746308 307432   8872 R  0.0  8.0   0:06.69 mysqld
10856 mysql     20   0  746308 307432   8872 R  0.0  8.0   0:06.64 mysqld
10858 mysql     20   0  746308 307432   8872 R  0.0  8.0   0:06.64 mysqld
10859 mysql     20   0  746308 307432   8872 S  0.0  8.0   0:06.70 mysqld
10975 mysql     20   0  746308 307432   8872 R  0.0  8.0   0:01.94 mysqld
10976 mysql     20   0  746308 307432   8872 R  0.0  8.0   0:01.96 mysqld
10979 mysql     20   0  746308 307432   8872 R  0.0  8.0   0:01.91 mysqld
10980 mysql     20   0  746308 307432   8872 D  0.0  8.0   0:01.94 mysqld
openxs@ao756:~$

In the command above I asked to run top in the batch mode and capture the output once, sort it by highest CPU usage first. Configuring columns in the output (from a relatively small subset of what is visible via /proc) is possible the interactive mode and final set can be saved in the ~/.toprc file. Usually I do not care to do that.

Multi-threaded mooring of boats somewhere at Regent's Canal

* * *

So, for quick checks and summarizing different details about MySQL server threads from /proc you can use not only some poor man's monitoring shell scripts, but also ps -L and top -H commands with proper options and then process the output with usual Linux command line utilities like grep, sort and uniq. One can surely write more advanced tools working directly with /proc content, in C, Python etc if not just shell, and in the next post in this series we'll see some of them from https://0x.tools/ applied to MySQL. 

Also performance_schema.threads table in MySQL 5.7+ (and MariaDB 10.5+) allows to easily find out the role of each OS thread in the MySQL server and link the OS level and internal metrics for it.

No comments:

Post a Comment