Tuesday, June 25, 2019

Getting past 2400 threads with sysbench and dbdeployer for benchmarking MySQL 5.7.26

I am currently running tests with sysbench and dbdeployer on a Google Cloud Platform Debian 9 instance.  I ran into an interesting limit and lifting it was not straightforward.  I hope that by sharing this, you can avoid losing too much time.

The benchmark I am using is the insert benchmark.  I am able to run it with 2048 threads but I am not able to run it with 4096 threads (and yes, it makes sense to run this benchmark with this number of threads in the situation I am testing).  Narrowing down the problem, I am able to run it with 2400 threads but not with 2500 threads.  I get this error message when running sysbench:
$ sysbench --mysql-host=127.0.0.1 --mysql-port=5726 \
  /usr/share/sysbench/oltp_insert.lua --threads=2500 --time=1 run
sysbench 1.0.17 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2500
Initializing random number generator from current time


Initializing worker threads...

FATAL: sb_thread_create() for thread #2446 failed. errno = 11 (Resource temporarily unavailable)
FATAL: unable to connect to MySQL server on host '127.0.0.1', port 5726, aborting...
FATAL: error 1135: Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug
FATAL: `thread_init' function failed: /usr/share/sysbench/oltp_common.lua:349: connection creation failed
FATAL: unable to connect to MySQL server on host '127.0.0.1', port 5726, aborting...
FATAL: error 1135: Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug
FATAL: `thread_init' function failed: /usr/share/sysbench/oltp_common.lua:349: connection creation failed
FATAL: unable to connect to MySQL server on host '127.0.0.1', port 5726, aborting...
FATAL: error 1135: Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug
FATAL: `thread_init' function failed: /usr/share/sysbench/oltp_common.lua:349: connection creation failed
Segmentation fault
It is not a max_connections problem as I created my sandbox with the following command:
$ dbdeployer deploy single mysql_5.7.26 -c 'max_connections = 10000'
And a max_connection problem would show as follows (the default is 151):
$ sysbench [...] --threads=152 run
[...]
Initializing worker threads...

FATAL: unable to connect to MySQL server on host '127.0.0.1', port 5726, aborting...
FATAL: error 1040: Too many connections
FATAL: `thread_init' function failed: /usr/share/sysbench/oltp_common.lua:349: connection creation failed
FATAL: Thread initialization failed!
Error in my_thread_global_end(): 152 threads didn't exit
It is not a ulimit -n problem (open files  search the bash man page for the description) as I set it to 10000 (the default on Debian 9 GCP image is 1024):
$ ulimit -n 10000
It is not a threads-max problem (search the proc man page for the description) as it is set to 59697:
$ cat /proc/sys/kernel/threads-max 
59697
Ok, all the obvious things are checked but let's go back to the basics and let's check the sysbench error message and the mysqld error logs.  We have those lines in the sysbench error message:
FATAL: sb_thread_create() for thread #2446 failed. errno = 11 (Resource temporarily unavailable)
FATAL: unable to connect to MySQL server on host '127.0.0.1', port 5726, aborting...
FATAL: error 1135: Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug
FATAL: `thread_init' function failed: /usr/share/sysbench/oltp_common.lua:349: connection creation failed
And this in the mysqld error logs:
2019-06-22T17:15:56.533664Z 0 [ERROR] Can't create thread to handle new connection(errno= 11)
At this point, I am starting to doubt that this is a MySQL limit as in above, we have sysbench telling us it is not able to complete sb_thread_create.  I am becoming suspicious of a Linux limit.  To get MySQL out of the loop, I ran sysbench CPU tests. Those two work:
$ sysbench --test=cpu --threads=2500 --time=1 run
[...]
$ sysbench --test=cpu --threads=4900 --time=1 run
[...]
But this one does not work:
$ sysbench --test=cpu --threads=5000 --time=1 run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.17 (using bundled LuaJIT 2.1.0-beta2)
[...]
Initializing worker threads...

FATAL: sb_thread_create() for thread #4907 failed. errno = 11 (Resource temporarily unavailable)
And this is where I lost tons of time, until I checked at the right place, and this was /var/log/messages:
Jun 22 17:15:56 my_hostname kernel: [  325.584026] cgroup: fork rejected by pids controller in /system.slice/ssh.service
I did not know systemd was running services in cgroups !  And obviously there are limits enforced here.  We can see the limit by running systemctl status:
$ sudo systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2019-06-22 17:10:36 UTC; 20s ago
  Process: 719 ExecStartPre=/usr/sbin/sshd -t (code=exited, status=0/SUCCESS)
 Main PID: 722 (sshd)
    Tasks: 9 (limit: 4915)
   CGroup: /system.slice/ssh.service
           ├─722 /usr/sbin/sshd -D
           ├─788 sshd: my_user [priv]
           ├─794 sshd: my_user@pts/0
           ├─795 screen -Rd
           ├─796 SCREEN -Rd
           ├─797 /bin/bash
           ├─800 sudo systemctl status ssh
           ├─801 systemctl status ssh
           └─802 systemctl status ssh

Jun 22 17:10:36 my_hostname systemd[1]: Starting OpenBSD Secure Shell server...
Jun 22 17:10:36 my_hostname sshd[722]: Server listening on 0.0.0.0 port 22.
Jun 22 17:10:36 my_hostname sshd[722]: Server listening on :: port 22.
Jun 22 17:10:36 my_hostname systemd[1]: Started OpenBSD Secure Shell server.
Jun 22 17:10:50 my_hostname sshd[788]: Accepted publickey for my_user from my_ip port my_port ssh2: RSA SHA256:my_key_hash
Jun 22 17:10:50 my_hostname sshd[788]: pam_unix(sshd:session): session opened for user my_user by (uid=0)
Jun 22 17:10:56 my_hostname sudo[800]:   my_user : TTY=pts/1 ; PWD=/home/my_user ; USER=root ; COMMAND=/bin/systemctl status ssh
Jun 22 17:10:56 my_hostname sudo[800]: pam_unix(sudo:session): session opened for user root by my_user(uid=0)
The solution is to override the default by adding below in systemctl edit ssh, and I rebooted to take the new value into account  (I am sure there is a way to modify this without a reboot, but I did not look for it as a reboot is easy in my case):
[Service]
TasksMax=10000
And now I can run those without problem:
$ sysbench --test=cpu --threads=5000 --time=1 run
[...]
$ sysbench --test=cpu --threads=9000 --time=1 run
[...]
And obviously this does not work (because I set TasksMax to only 10000):
$ sysbench --test=cpu --threads=10000 --time=1 run
[...]
Initializing worker threads...

FATAL: sb_thread_create() for thread #9993 failed. errno = 11 (Resource temporarily unavailable)
So back to the sysbench error messages and the mysqld error logs, sometimes it was MySQL that was not able to create a new thread to accept a connection from an existing sysbench thread, sometimes it was sysbench that was not able to create a new client thread.

I hope this was useful.  The benchmark results are coming soon.

No comments:

Post a Comment