Getting core files and systemd Restart

So you have waited two weeks (because the crash isn’t easily repeatable) and finally you get the crash again. You check your core file directory, which is outside the datadir and has loads of free space, and discover nothing was written.

When MySQL crashes, you want it to produce, at a minimum, a stack trace in the error log or a core dump file so you can use gdb to produce the stack trace. The stack trace is an important diagnostic tool for determining what part of the code was running.

One of the goals of this blog is to spread information that will make your job supporting databases easier. This is one of those posts.

So why do I mention systemd?

The Problem:

systemd can restart a service if it fails, so if MySQL crashes, systemd will automatically restart MySQL, sometimes preventing the core dump process from completing.
If you are in the above situation, i.e. everything is already set up to dump a core file, you have double-checked the settings, and you still get nothing, you may need to disable automatic restarts in systemd.

Risks:

The risk of turning off the automatic restart is that your db is down until you manually restart it, so make sure you are monitoring your db appropriately.

Setup to dump a core file:

What needs to be set up to get a core file? Here are some useful articles.

https://www.percona.com/blog/2011/08/26/getting-mysql-core-file-on-linux/

https://mariadb.com/kb/en/enabling-core-dumps/

https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-in-core-file.html

https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
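
Before touching systemd, it can help to double-check the basics those articles cover. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names are made up for illustration and are not part of MySQL or systemd.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of MySQL or systemd.

# Print the soft core-size limit from a /proc/<pid>/limits-style file.
# "0" usually means no core file; "unlimited" is the usual target.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Show the kernel and per-process settings that commonly block a core dump.
show_core_settings() {
    pid="$1"
    echo "core_pattern:  $(cat /proc/sys/kernel/core_pattern)"
    echo "suid_dumpable: $(cat /proc/sys/fs/suid_dumpable)"
    echo "soft limit:    $(core_soft_limit "/proc/$pid/limits")"
}
```

You might call it as show_core_settings "$(pidof mysqld)", assuming the server binary is named mysqld on your system. If these values look right and you still get no core file, the systemd restart described above is worth a look.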

The Change:

Make sure every other setting to dump a core file is correct before making this change.

In a drop-in conf file for your service, e.g. under /etc/systemd/system/mysql.service.d/, add the following. Restart= belongs in the [Service] section:

[Service]
# we need the damn core file
Restart=no

As with any change, follow the gold standard for changes on production systems.

How to apply the change:

  1. Email stakeholders (developers, reporting users, managers).
  2. Prepare a maintenance plan and gain approval to change.
  3. Organize a maintenance window with a specific time and date and gain approval for the change.
  4. Follow the maintenance plan, which will be something like this:
    – Announce the start of the maintenance window in your chat/communication channel.
    – Apply the change to systemd and reload the daemon.
    – Announce the end of the maintenance window in your chat/communication channel.
  5. Wait for your next MySQL crash.
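
The "apply the change to systemd and reload the daemon" step might look something like the sketch below. It assumes the unit is named mysql.service (yours may be mysqld.service or mariadb.service) and that the Restart=no drop-in is already in place; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; run as root.

# Print the effective Restart= policy for a unit (e.g. "no" or "on-failure").
restart_policy() {
    systemctl show -p Restart "$1" | sed 's/^Restart=//'
}

# Reload unit files so the drop-in is picked up, then confirm the policy.
apply_and_verify() {
    unit="$1"
    systemctl daemon-reload || return 1
    policy=$(restart_policy "$unit")
    if [ "$policy" = "no" ]; then
        echo "OK: $unit will not be restarted automatically"
    else
        echo "WARNING: $unit still has Restart=$policy" >&2
        return 1
    fi
}
```

Calling apply_and_verify mysql.service during the maintenance window gives you a quick confirmation that systemd has actually picked up the new setting before you announce the end of the window.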

Until next time.

Credit for this tip must go to Rick Pizzi.