Streaming backups in parallel using pssh

So you need to build a new set of databases, perhaps in a new location or geographical zone and the business wants it done yesterday cause the newly launched product hit the front page of hacker news and your website and back-end databases are getting the hug-of-death.

The standard method would be to stream a backup from the master or a dedicated backup slave and build each new read only slave from that backup.

What if you have network bandwidth to burn? perhaps there is a way to run the streaming backups in parallel. Ivan G came up with one way to do it. See his article “Streaming Percona XtraBackup for MySQL to Multiple Destinations”

His article actually gave me the spark of an idea to determine if we could do something similar in parallel.

Enter pssh (parallel-ssh) which was hosted on google code but now is available in most linux distros.

pssh runs ssh commands on a list of hosts in parallel.

Update:

pssh uses python and os.fork to create the parallel processes.

This means that the data copied/streamed is streamed in one chunk. So if the data is greater than available RAM (x3 if compression is used) then the command will fail with “OSError: [Errno 12] Cannot allocate memory”.

Memory << Data workaround:

  1. Use gzip or pigz to compress the stream of data. This will allow the stream to be up to 2-3 times the size of available RAM.
  2. Use split to split stream into chunks and then combine those chunks on the destination. See example below.
  3. See my next post for a different method.

Here is an simple pssh example:

[root@db1 ~]# pssh -i --host='192.168.56.111 192.168.56.112 192.168.56.113' "hostname"
[1] 01:04:40 [SUCCESS] 192.168.56.112
db2
[2] 01:04:40 [SUCCESS] 192.168.56.111
db1
[3] 01:04:40 [SUCCESS] 192.168.56.113
db3

So to stream a backup from one db to another you run something like this:

mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 --datadir=/var/lib/mysql 2>backup.log | ssh -q 192.168.56.112 -t "mbstream --directory=/var/lib/mysql -x --parallel=4"

Example of a streaming backup to multiple hosts using pssh:

Note: The critical option is -I which allows pssh to take input from standard input.

mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 --datadir=/var/lib/mysql 2>backup.log | pssh -I --host='192.168.56.112 192.168.56.113' "mbstream --directory=/var/lib/mysql -x --parallel=4"

Preparing the backup on multiple hosts:

pssh -i --host='192.168.56.112 192.168.56.113' "mariabackup --prepare --target-dir=/var/lib/mysql"

Change ownership of multiple hosts:

pssh -i --host='192.168.56.112 192.168.56.113' "chown -R mysql:mysql /var/lib/mysql"

Start the db on multiple hosts:

pssh -i --host='192.168.56.112 192.168.56.113' "systemctl start mariadb.service"

Thanks to Ivan for the idea. Most people would use ansible or something similar to configure servers now over plain ssh command line. But thanks to pssh that option exists as well.

Until next time.

Full example (assumes small dataset):

[root@db1 ~]# mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 --datadir=/var/lib/mysql 2>backup.log | pssh -I --host='192.168.56.112 192.168.56.113' "mbstream --directory=/var/lib/mysql -x --parallel=4"
[1] 04:43:16 [SUCCESS] 192.168.56.112
[2] 04:43:16 [SUCCESS] 192.168.56.113


[root@db1 ~]# pssh -i --host='192.168.56.112 192.168.56.113' "ls -l /var/lib/mysql"
[1] 04:43:23 [SUCCESS] 192.168.56.112
total 12352
-rw-r-----. 1 root root 32768 Aug 26 04:43 aria_log.00000001
-rw-r-----. 1 root root 52 Aug 26 04:43 aria_log_control
-rw-r-----. 1 root root 324 Aug 26 04:43 backup-my.cnf
-rw-r-----. 1 root root 972 Aug 26 04:43 ib_buffer_pool
-rw-r-----. 1 root root 12582912 Aug 26 04:43 ibdata1
-rw-r-----. 1 root root 2560 Aug 26 04:43 ib_logfile0
drwx------. 2 root root 4096 Aug 26 04:43 mysql
drwx------. 2 root root 20 Aug 26 04:43 performance_schema
drwx------. 2 root root 20 Aug 26 04:43 test
-rw-r-----. 1 root root 29 Aug 26 04:43 xtrabackup_binlog_info
-rw-r-----. 1 root root 73 Aug 26 04:43 xtrabackup_checkpoints
-rw-r-----. 1 root root 554 Aug 26 04:43 xtrabackup_info
[2] 04:43:23 [SUCCESS] 192.168.56.113
total 12352
-rw-r-----. 1 root root 32768 Aug 26 04:43 aria_log.00000001
-rw-r-----. 1 root root 52 Aug 26 04:43 aria_log_control
-rw-r-----. 1 root root 324 Aug 26 04:43 backup-my.cnf
-rw-r-----. 1 root root 972 Aug 26 04:43 ib_buffer_pool
-rw-r-----. 1 root root 12582912 Aug 26 04:43 ibdata1
-rw-r-----. 1 root root 2560 Aug 26 04:43 ib_logfile0
drwx------. 2 root root 4096 Aug 26 04:43 mysql
drwx------. 2 root root 20 Aug 26 04:43 performance_schema
drwx------. 2 root root 20 Aug 26 04:43 test
-rw-r-----. 1 root root 29 Aug 26 04:43 xtrabackup_binlog_info
-rw-r-----. 1 root root 73 Aug 26 04:43 xtrabackup_checkpoints
-rw-r-----. 1 root root 554 Aug 26 04:43 xtrabackup_info

-- prepare

[root@db1 ~]# pssh -i --host='192.168.56.112 192.168.56.113' "mariabackup --prepare --target-dir=/var/lib/mysql"
[1] 04:43:56 [SUCCESS] 192.168.56.112
Stderr: mariabackup based on MariaDB server 10.4.14-MariaDB Linux (x86_64)
[00] 2020-08-26 04:43:55 cd to /var/lib/mysql/
[00] 2020-08-26 04:43:55 This target seems to be not prepared yet.
[00] 2020-08-26 04:43:55 mariabackup: using the following InnoDB configuration for recovery:
[00] 2020-08-26 04:43:55 innodb_data_home_dir = .
[00] 2020-08-26 04:43:55 innodb_data_file_path = ibdata1:12M:autoextend
[00] 2020-08-26 04:43:55 innodb_log_group_home_dir = .
[00] 2020-08-26 04:43:55 InnoDB: Using Linux native AIO
[00] 2020-08-26 04:43:55 Starting InnoDB instance for recovery.
[00] 2020-08-26 04:43:55 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2020-08-26 4:43:55 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-08-26 4:43:55 0 [Note] InnoDB: Uses event mutexes
2020-08-26 4:43:55 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2020-08-26 4:43:55 0 [Note] InnoDB: Number of pools: 1
2020-08-26 4:43:55 0 [Note] InnoDB: Using SSE2 crc32 instructions
2020-08-26 4:43:55 0 [Note] InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
2020-08-26 4:43:55 0 [Note] InnoDB: Completed initialization of buffer pool
2020-08-26 4:43:55 0 [Note] InnoDB: page_cleaner coordinator priority: -20
2020-08-26 4:43:55 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=60967
[00] 2020-08-26 04:43:55 Last binlog file , position 0
[00] 2020-08-26 04:43:56 completed OK!
[2] 04:43:56 [SUCCESS] 192.168.56.113
Stderr: mariabackup based on MariaDB server 10.4.14-MariaDB Linux (x86_64)
[00] 2020-08-26 04:43:55 cd to /var/lib/mysql/
[00] 2020-08-26 04:43:55 This target seems to be not prepared yet.
[00] 2020-08-26 04:43:55 mariabackup: using the following InnoDB configuration for recovery:
[00] 2020-08-26 04:43:55 innodb_data_home_dir = .
[00] 2020-08-26 04:43:55 innodb_data_file_path = ibdata1:12M:autoextend
[00] 2020-08-26 04:43:55 innodb_log_group_home_dir = .
[00] 2020-08-26 04:43:55 InnoDB: Using Linux native AIO
[00] 2020-08-26 04:43:55 Starting InnoDB instance for recovery.
[00] 2020-08-26 04:43:55 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2020-08-26 4:43:55 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-08-26 4:43:55 0 [Note] InnoDB: Uses event mutexes
2020-08-26 4:43:55 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2020-08-26 4:43:55 0 [Note] InnoDB: Number of pools: 1
2020-08-26 4:43:55 0 [Note] InnoDB: Using SSE2 crc32 instructions
2020-08-26 4:43:55 0 [Note] InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
2020-08-26 4:43:55 0 [Note] InnoDB: Completed initialization of buffer pool
2020-08-26 4:43:55 0 [Note] InnoDB: page_cleaner coordinator priority: -20
2020-08-26 4:43:55 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=60967
[00] 2020-08-26 04:43:55 Last binlog file , position 0
[00] 2020-08-26 04:43:56 completed OK!

-- change ownership and start

[root@db1 ~]# pssh -i --host='192.168.56.112 192.168.56.113' "chown -R mysql:mysql /var/lib/mysql"
[1] 04:44:25 [SUCCESS] 192.168.56.112
[2] 04:44:25 [SUCCESS] 192.168.56.113

[root@db1 ~]# pssh -i --host='192.168.56.112 192.168.56.113' "systemctl start mariadb.service"
[1] 04:44:34 [SUCCESS] 192.168.56.112
[2] 04:44:34 [SUCCESS] 192.168.56.113

-- Bonus: Generate change master from xtrabackup_binlog_info file
-- Change the host, user and password.
-- \x27 is the quote character

awk '{ print "change master to master_host=\x27192.168.56.111\x27,master_user=\x27repl\x27,master_password=\x27lper\x27,master_log_file=\x27"$1"\x27,m
aster_log_pos="$2 }' xtrabackup_binlog_info
change master to master_host=.168.56.111',master_user='repl',master_password='lper',master_log_file='mysql-bin.000001',master_log_pos=2194

Full Example using split to get around memory issue

[root@db1 test_db]# du -sh /var/lib/mysql
1.7G /var/lib/mysql

[root@db1 test_db]# mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 --datadir=/var/lib/mysql 2>backup.log | pssh -I --host='192.168.56.112 192.168.56.113' "mbstream --directory=/var/lib/mysql -x --parallel=4"
Traceback (most recent call last):
File "/bin/pssh", line 118, in <module>
do_pssh(hosts, cmdline, opts)
File "/bin/pssh", line 89, in do_pssh
statuses = manager.run()
File "/usr/lib/python2.7/site-packages/psshlib/manager.py", line 68, in run
self.update_tasks(writer)
File "/usr/lib/python2.7/site-packages/psshlib/manager.py", line 133, in update_tasks
self._start_tasks_once(writer)
File "/usr/lib/python2.7/site-packages/psshlib/manager.py", line 146, in _start_tasks_once
task.start(self.taskcount, self.iomap, writer, self.askpass_socket)
File "/usr/lib/python2.7/site-packages/psshlib/task.py", line 99, in start
close_fds=False, preexec_fn=os.setsid, env=environ)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1224, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

-- Split in 100Mbyte chunks

[root@db1 test_db]# mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 --datadir=/var/lib/mysql 2>backup.log | split -b 100M --filter="exec cat $FILE |pssh -I --host='192.168.56.112 192.168.56.113' 'cat $FILE >> /var/lib/mysql/streamfile ' "
[1] 04:19:28 [SUCCESS] 192.168.56.112
[2] 04:19:28 [SUCCESS] 192.168.56.113
[1] 04:19:30 [SUCCESS] 192.168.56.113
[2] 04:19:30 [SUCCESS] 192.168.56.112
[1] 04:19:33 [SUCCESS] 192.168.56.112
[2] 04:19:33 [SUCCESS] 192.168.56.113
[1] 04:19:35 [SUCCESS] 192.168.56.112
[2] 04:19:35 [SUCCESS] 192.168.56.113
[1] 04:19:36 [SUCCESS] 192.168.56.113
[2] 04:19:37 [SUCCESS] 192.168.56.112
[1] 04:19:39 [SUCCESS] 192.168.56.113
[2] 04:19:39 [SUCCESS] 192.168.56.112
[1] 04:19:41 [SUCCESS] 192.168.56.112
[2] 04:19:41 [SUCCESS] 192.168.56.113
[1] 04:19:43 [SUCCESS] 192.168.56.112
[2] 04:19:43 [SUCCESS] 192.168.56.113
[1] 04:19:45 [SUCCESS] 192.168.56.113
[2] 04:19:45 [SUCCESS] 192.168.56.112
[1] 04:19:47 [SUCCESS] 192.168.56.112
[2] 04:19:47 [SUCCESS] 192.168.56.113
[1] 04:19:50 [SUCCESS] 192.168.56.112
[2] 04:19:50 [SUCCESS] 192.168.56.113
[1] 04:19:53 [SUCCESS] 192.168.56.112
[2] 04:19:53 [SUCCESS] 192.168.56.113

[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "ls -l /var/lib/mysql/"
[1] 04:20:02 [SUCCESS] 192.168.56.112
total 1221516
-rw-r--r--. 1 root root 1250829999 Aug 31 04:19 streamfile
[2] 04:20:02 [SUCCESS] 192.168.56.113
total 1221516
-rw-r--r--. 1 root root 1250829999 Aug 31 04:19 streamfile

-- Run the streamfile thru mbstream to produce the un-prepared datafiles

[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "cat /var/lib/mysql/streamfile | mbstream --directory=/var/lib/mysql -x --parallel=4"
[1] 04:23:13 [SUCCESS] 192.168.56.112
[2] 04:23:13 [SUCCESS] 192.168.56.113
[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "ls -l /var/lib/mysql/"
[1] 04:23:35 [SUCCESS] 192.168.56.112
total 1496016
-rw-r-----. 1 root root 32768 Aug 31 04:23 aria_log.00000001
-rw-r-----. 1 root root 52 Aug 31 04:23 aria_log_control
-rw-r-----. 1 root root 324 Aug 31 04:23 backup-my.cnf
drwx------. 2 root root 4096 Aug 31 04:23 employees
-rw-r-----. 1 root root 976 Aug 31 04:23 ib_buffer_pool
-rw-r-----. 1 root root 281018368 Aug 31 04:23 ibdata1
-rw-r-----. 1 root root 2560 Aug 31 04:23 ib_logfile0
drwx------. 2 root root 4096 Aug 31 04:23 mysql
drwx------. 2 root root 20 Aug 31 04:23 performance_schema
-rw-r--r--. 1 root root 1250829999 Aug 31 04:19 streamfile
drwx------. 2 root root 20 Aug 31 04:23 test
-rw-r-----. 1 root root 35 Aug 31 04:23 xtrabackup_binlog_info
-rw-r-----. 1 root root 83 Aug 31 04:23 xtrabackup_checkpoints
-rw-r-----. 1 root root 565 Aug 31 04:23 xtrabackup_info
[2] 04:23:35 [SUCCESS] 192.168.56.113
total 1496016
-rw-r-----. 1 root root 32768 Aug 31 04:23 aria_log.00000001
-rw-r-----. 1 root root 52 Aug 31 04:23 aria_log_control
-rw-r-----. 1 root root 324 Aug 31 04:23 backup-my.cnf
drwx------. 2 root root 4096 Aug 31 04:23 employees
-rw-r-----. 1 root root 976 Aug 31 04:23 ib_buffer_pool
-rw-r-----. 1 root root 281018368 Aug 31 04:23 ibdata1
-rw-r-----. 1 root root 2560 Aug 31 04:23 ib_logfile0
drwx------. 2 root root 4096 Aug 31 04:23 mysql
drwx------. 2 root root 20 Aug 31 04:23 performance_schema
-rw-r--r--. 1 root root 1250829999 Aug 31 04:19 streamfile
drwx------. 2 root root 20 Aug 31 04:23 test
-rw-r-----. 1 root root 35 Aug 31 04:23 xtrabackup_binlog_info
-rw-r-----. 1 root root 83 Aug 31 04:23 xtrabackup_checkpoints
-rw-r-----. 1 root root 565 Aug 31 04:23 xtrabackup_info

-- Prepare the backup

[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "mariabackup --prepare --target-dir=/var/lib/mysql"
[1] 04:24:14 [SUCCESS] 192.168.56.112
Stderr: mariabackup based on MariaDB server 10.4.14-MariaDB Linux (x86_64)
[00] 2020-08-31 04:24:14 cd to /var/lib/mysql/
[00] 2020-08-31 04:24:14 This target seems to be not prepared yet.
[00] 2020-08-31 04:24:14 mariabackup: using the following InnoDB configuration for recovery:
[00] 2020-08-31 04:24:14 innodb_data_home_dir = .
[00] 2020-08-31 04:24:14 innodb_data_file_path = ibdata1:12M:autoextend
[00] 2020-08-31 04:24:14 innodb_log_group_home_dir = .
[00] 2020-08-31 04:24:14 InnoDB: Using Linux native AIO
[00] 2020-08-31 04:24:14 Starting InnoDB instance for recovery.
[00] 2020-08-31 04:24:14 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2020-08-31 4:24:14 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-08-31 4:24:14 0 [Note] InnoDB: Uses event mutexes
2020-08-31 4:24:14 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2020-08-31 4:24:14 0 [Note] InnoDB: Number of pools: 1
2020-08-31 4:24:14 0 [Note] InnoDB: Using SSE2 crc32 instructions
2020-08-31 4:24:14 0 [Note] InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
2020-08-31 4:24:14 0 [Note] InnoDB: Completed initialization of buffer pool
2020-08-31 4:24:14 0 [Note] InnoDB: page_cleaner coordinator priority: -20
2020-08-31 4:24:14 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2008409552
2020-08-31 4:24:14 0 [Note] InnoDB: Last binlog file './mysql-bin.000003', position 172310127
[00] 2020-08-31 04:24:14 Last binlog file ./mysql-bin.000003, position 172310127
[00] 2020-08-31 04:24:14 completed OK!
[2] 04:24:14 [SUCCESS] 192.168.56.113
Stderr: mariabackup based on MariaDB server 10.4.14-MariaDB Linux (x86_64)
[00] 2020-08-31 04:24:14 cd to /var/lib/mysql/
[00] 2020-08-31 04:24:14 This target seems to be not prepared yet.
[00] 2020-08-31 04:24:14 mariabackup: using the following InnoDB configuration for recovery:
[00] 2020-08-31 04:24:14 innodb_data_home_dir = .
[00] 2020-08-31 04:24:14 innodb_data_file_path = ibdata1:12M:autoextend
[00] 2020-08-31 04:24:14 innodb_log_group_home_dir = .
[00] 2020-08-31 04:24:14 InnoDB: Using Linux native AIO
[00] 2020-08-31 04:24:14 Starting InnoDB instance for recovery.
[00] 2020-08-31 04:24:14 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2020-08-31 4:24:14 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-08-31 4:24:14 0 [Note] InnoDB: Uses event mutexes
2020-08-31 4:24:14 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2020-08-31 4:24:14 0 [Note] InnoDB: Number of pools: 1
2020-08-31 4:24:14 0 [Note] InnoDB: Using SSE2 crc32 instructions
2020-08-31 4:24:14 0 [Note] InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
2020-08-31 4:24:14 0 [Note] InnoDB: Completed initialization of buffer pool
2020-08-31 4:24:14 0 [Note] InnoDB: page_cleaner coordinator priority: -20
2020-08-31 4:24:14 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2008409552
2020-08-31 4:24:14 0 [Note] InnoDB: Last binlog file './mysql-bin.000003', position 172310127
[00] 2020-08-31 04:24:14 Last binlog file ./mysql-bin.000003, position 172310127
[00] 2020-08-31 04:24:14 completed OK!

[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "chown -R mysql:mysql /var/lib/mysql"
[1] 04:25:08 [SUCCESS] 192.168.56.112
[2] 04:25:08 [SUCCESS] 192.168.56.113
[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "ls -l /var/lib/mysql/"
[1] 04:25:21 [SUCCESS] 192.168.56.112
total 1496012
-rw-r-----. 1 mysql mysql 32768 Aug 31 04:23 aria_log.00000001
-rw-r-----. 1 mysql mysql 52 Aug 31 04:23 aria_log_control
-rw-r-----. 1 mysql mysql 324 Aug 31 04:23 backup-my.cnf
drwx------. 2 mysql mysql 4096 Aug 31 04:23 employees
-rw-r-----. 1 mysql mysql 976 Aug 31 04:23 ib_buffer_pool
-rw-r-----. 1 mysql mysql 281018368 Aug 31 04:24 ibdata1
-rw-r-----. 1 mysql mysql 0 Aug 31 04:24 ib_logfile0
drwx------. 2 mysql mysql 4096 Aug 31 04:23 mysql
drwx------. 2 mysql mysql 20 Aug 31 04:23 performance_schema
-rw-r--r--. 1 mysql mysql 1250829999 Aug 31 04:19 streamfile
drwx------. 2 mysql mysql 20 Aug 31 04:23 test
-rw-r-----. 1 mysql mysql 35 Aug 31 04:23 xtrabackup_binlog_info
-rw-r-----. 1 mysql mysql 81 Aug 31 04:24 xtrabackup_checkpoints
-rw-r-----. 1 mysql mysql 565 Aug 31 04:23 xtrabackup_info
[2] 04:25:21 [SUCCESS] 192.168.56.113
total 1496012
-rw-r-----. 1 mysql mysql 32768 Aug 31 04:23 aria_log.00000001
-rw-r-----. 1 mysql mysql 52 Aug 31 04:23 aria_log_control
-rw-r-----. 1 mysql mysql 324 Aug 31 04:23 backup-my.cnf
drwx------. 2 mysql mysql 4096 Aug 31 04:23 employees
-rw-r-----. 1 mysql mysql 976 Aug 31 04:23 ib_buffer_pool
-rw-r-----. 1 mysql mysql 281018368 Aug 31 04:24 ibdata1
-rw-r-----. 1 mysql mysql 0 Aug 31 04:24 ib_logfile0
drwx------. 2 mysql mysql 4096 Aug 31 04:23 mysql
drwx------. 2 mysql mysql 20 Aug 31 04:23 performance_schema
-rw-r--r--. 1 mysql mysql 1250829999 Aug 31 04:19 streamfile
drwx------. 2 mysql mysql 20 Aug 31 04:23 test
-rw-r-----. 1 mysql mysql 35 Aug 31 04:23 xtrabackup_binlog_info
-rw-r-----. 1 mysql mysql 81 Aug 31 04:24 xtrabackup_checkpoints
-rw-r-----. 1 mysql mysql 565 Aug 31 04:23 xtrabackup_info

[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "systemctl start mariadb"
[1] 04:26:59 [SUCCESS] 192.168.56.112
[2] 04:26:59 [SUCCESS] 192.168.56.113
[root@db1 test_db]# pssh -i --host='192.168.56.112 192.168.56.113' "systemctl status mariadb"
[1] 04:27:05 [SUCCESS] 192.168.56.113
● mariadb.service - MariaDB 10.4.14 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Mon 2020-08-31 04:26:59 UTC; 6s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 4072 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 4016 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 4014 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 4039 (mysqld)
Status: "Taking your SQL requests now..."
CGroup: /system.slice/mariadb.service
└─4039 /usr/sbin/mysqld

Aug 31 04:26:59 db3 systemd[1]: Starting MariaDB 10.4.14 database server...
Aug 31 04:26:59 db3 mysqld[4039]: 2020-08-31 4:26:59 0 [Note] /usr/sbin/mysqld (mysqld 10.4.14-MariaDB-log) starting as process 4039 ...
Aug 31 04:26:59 db3 mysqld[4039]: 2020-08-31 4:26:59 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32183)
Aug 31 04:26:59 db3 systemd[1]: Started MariaDB 10.4.14 database server.
[2] 04:27:05 [SUCCESS] 192.168.56.112
● mariadb.service - MariaDB 10.4.14 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Mon 2020-08-31 04:26:59 UTC; 6s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 4026 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 3970 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 3968 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 3993 (mysqld)
Status: "Taking your SQL requests now..."
CGroup: /system.slice/mariadb.service
└─3993 /usr/sbin/mysqld

Aug 31 04:26:59 db2 systemd[1]: Starting MariaDB 10.4.14 database server...
Aug 31 04:26:59 db2 mysqld[3993]: 2020-08-31 4:26:59 0 [Note] /usr/sbin/mysqld (mysqld 10.4.14-MariaDB-log) starting as process 3993 ...
Aug 31 04:26:59 db2 mysqld[3993]: 2020-08-31 4:26:59 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32183)
Aug 31 04:26:59 db2 systemd[1]: Started MariaDB 10.4.14 database server.

Getting core files and systemd Restart

So you have waited two weeks (cause the crash isn’t easily repeatable) and finally you get the crash again. You check your non-datadir core file directory with loads of free space and discover nothing was written.

When MySQL crashes, you want it to produce at a minimum a stack trace into the error log or a core dump file so you can use gdb to produce the stack trace. The stack trace is an important diagnosis tool to determining what part of the code was running.

One of the goals of this blog is to spread information that will make your job supporting databases easier. This is one of those posts.

So why do I mention systemd?

The Problem:

systemd can restart a service if it fails, so if MySQL crashes, systemd will automatically restart mysql, sometimes preventing the core dump process from completing.
If you are getting the above situation, i.e. everything is already setup to dump a core file and you get nothing and you double-checked the settings, you may need to disable automatic restarts in systemd.

Risks:

The risks of turning off the automatic restart is your db is down until you manually restart it, so make sure you are monitoring your db appropriately.

Setup to dump a core file:

How what needs to be setup to get a core file? Here are some useful articles.

https://www.percona.com/blog/2011/08/26/getting-mysql-core-file-on-linux/

https://mariadb.com/kb/en/enabling-core-dumps/

https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-in-core-file.html

https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/

The Change:

Make sure every other setting to dump a core file is correct before making this change.

In the conf file under your service eg: /etc/systemd/system/mysql.service.d

# we need the damn core file
Restart=no

Like any changes, follow the gold standard for changes on production systems…

How to apply the change:

  1. Email to stakeholders (developers, reporting users, managers)
  2. Prepare a maintenance plan and gain approval to change.
  3. Organize maintenance window with specific time and date and gain approval for change.
  4. Follow the maintenance plan, which will be something like this
    – Announce start of maintenance window in your chat/communication channel.
    –  apply the change to systemd and reload the daemon.
    – Announce the end of the maintenance window in your chat/communication channel.
  5. Wait for your next mysql crash.

Until next time.

Credit for this tip must go to Rick Pizzi.