Faster streaming backups using mariabackup and pigz

Are you looking for a faster way to stream a backup from one db server to another?

Maybe you are rebuilding a replica from the master after someone wrote to the slave and now the data is out of sync.

I had a look at what was out there in terms of articles about streaming backups.

Most of them got part of the way but clearly haven’t had people asking in chat “are we there yet?” over and over, wondering when their replicas will be back so they can direct the read traffic at them.
The key here is to use the parallel options that come with mariabackup and use pigz to compress and decompress in parallel.

Do you want speed?
Do you want to saturate the 10Gigabit bonded link or InfiniBand (I wish…) ?
Do you have spare CPUs to use?

WARNING:

  • This command assumes the destination directory is empty. 
  • The command will 8 CPUs (4 for mariabackup and 4 for pigz) on the source
  • The command can use up to 8 CPUs (4 for pigz to decompress, 4 for mbstream)
  • Don’t run pigz without -p option. As by default it will use all CPUs available.

The command:

mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 --datadir=/var/lib/mysql 2>backup.log | pigz -p 4 | ssh -q replica -t "pigz -dc -p 4 | mbstream --directory=/var/lib/mysql -x --parallel=4"

Explanation of the command:

  1. This command will take a streaming backup (using xbstream) in parallel from the database in /var/lib/mysql.
  2. It will write output the backup.log (just in case you need to debug…)
  3. Then it will pipe that toq pigz (parellel gzip) which will compress in parallel.
  4. This stream of compressed xbstream will be sent encrypted via ssh to host called replica.
  5. On the replica, the compressed xbstream will be uncompressed by pigz, then piped thru mbstream into the /var/lib/mysql directory.

Next steps:

  1. Prepare the backup using the appropriate amount of RAM.
    mariabackup --prepare --use-memory=16G --target-dir=/var/lib/mysql/
  2. Change the ownership of the destination datadir
    chown -R mysql:mysql /var/lib/mysql
  3. Start the db
    systemctl start mariadb
  4. Look at xtrabackup_slave_info to get the slave info
    cat /var/lib/mysql/xtrabackup_slave_info
  5. Change the master using CHANGE MASTER using that info.

Until next time.

 

 

Fixing Error: InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.

This error appears in your error log. It means there is a duplicate key in your full text index, sadly it doesn’t tell you which table or which full text index.

HINT for MySQL/MariaDB/Percona devs: just add the index_id to the error message.

SQL to find all text indexes in your db:

select table_schema,table_name,index_name, index_type from information_schema.statistics
where index_type = 'FULLTEXT' order by table_schema;

+-----------------+-----------------------+-------------+------------+
| table_schema    | table_name            | index_name  | index_type |
+-----------------+-----------------------+-------------+------------+
| db0001_fullname | Activity_fullname_log | message     | FULLTEXT   |
| db0002_fullname | Activity_fullname_log | message     | FULLTEXT   |
+-----------------+-----------------------+-------------+------------+
2 rows in set (0.715 sec)

So what is the next step?

If you follow the documentation it will tell you to rebuild the table or use OPTIMIZE TABLE with innodb_optimize_fulltext_only=ON to rebuild just the text index.

https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html#fulltext-rebuild-innodb-indexes

https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html#fulltext-optimize

My recommendation: Rebuild the table (including the full text indexes)

-- Plain old ALTER TABLE
use db0001_fullname;
ALTER ONLINE TABLE Activity_fullname_log ENGINE=INNODB;

-- Or pt-osc equivalent

pt-online-schema-change --dry-run --alter="ENGINE=INNODB" \
--alter-foreign-keys-method=rebuild_constraints --max-lag=60 \
--max-load Threads_running=50 --critical-load Threads_running=60 \
h=localhost,D=db0001_fullname,t=Activity_fullname_log

pt-online-schema-change --execute --alter="ENGINE=INNODB" \
--alter-foreign-keys-method=rebuild_constraints --max-lag=60 \
--max-load Threads_running=50 --critical-load Threads_running=60 \
h=localhost,D=db0001_fullname,t=Activity_fullname_log

The OPTIMIZE TABLE method:

The trouble with this method is it will only do a set number of words per execution of OPTIMIZE TABLE.

SQL to set the specific table so you can query INNODB_FT_INDEX_TABLE

set global innodb_ft_aux_table = 'db0001_fullname/Activity_fullname_log';
Query OK, 0 rows affected (0.002 sec)

SQL to find the number of distinct words in a specific text index:

SELECT count(distinct word) FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;
+----------------------+
| count(distinct word) |
+----------------------+
|               600000 |
+----------------------+
1 row in set (1 min 53.333 sec)

Depending how much impact you want per OPTIMIZE TABLE set innodb_ft_num_word_optimize appropriately.

The remaining steps to rebuilding your full text index using OPTIMIZE TABLE

set GLOBAL innodb_ft_num_word_optimize=10000;
set GLOBAL innodb_optimize_fulltext_only=ON;
OPTIMIZE TABLE Activity_fullname_log; 
OPTIMIZE TABLE Activity_fullname_log;
OPTIMIZE TABLE Activity_fullname_log;
OPTIMIZE TABLE Activity_fullname_log;
OPTIMIZE TABLE Activity_fullname_log;
... -- repeat round(distinct words/innodb_ft_num_word_optimize)+1
OPTIMIZE TABLE Activity_fullname_log;

Until next time…

Find a corrupted innodb table or a corrupted index from index id

Your favourite monitoring software throws an alert or sends an email after it finds the following in the mysql error log:

InnoDB: page [page id: space=8858, page number=206777] (95 records, index id 29361)

How do you find which index is corrupted from this line?

Example using the above index id listed in the error log message:

 select * from information_schema.INNODB_SYS_INDEXES where index_id = 29361;

+----------+--------------------------------+----------+------+----------+---------+-------+-----------------+
| INDEX_ID | NAME                           | TABLE_ID | TYPE | N_FIELDS | PAGE_NO | SPACE | MERGE_THRESHOLD |
+----------+--------------------------------+----------+------+----------+---------+-------+-----------------+
|    29361 | blah_longwinded_name_for_index |    11238 |    0 |        3 |       4 |  8858 |              50 |
+----------+--------------------------------+----------+------+----------+---------+-------+-----------------+
1 row in set (0.009 sec)

How do you find which table is corrupted?

Example using the above table_id listed in the output from the above SQL statement.

select * from information_schema.INNODB_SYS_TABLES where table_id = 11238;

+----------+-----------------------------------------------------+------+--------+-------+------------+---------------+------------+
| TABLE_ID | NAME                                                | FLAG | N_COLS | SPACE | ROW_FORMAT | ZIP_PAGE_SIZE | SPACE_TYPE |
+----------+-----------------------------------------------------+------+--------+-------+------------+---------------+------------+
|    11238 | database0001_production/blah_longwinded_name_table  |   41 |      8 |  8858 | Compressed |          8192 | Single     |
+----------+-----------------------------------------------------+------+--------+-------+------------+---------------+------------+

What about just as one statement…

select t.name as table_name,i.name as index_name,t.table_id,i.index_id 
from information_schema.INNODB_SYS_TABLES t
JOIN information_schema.INNODB_SYS_INDEXES i on t.table_id=i.table_id
where index_id = 29361;

+-----------------------------------------------------+--------------------------------+----------+----------+
| table_name                                          | index_name                     | table_id | index_id |
+-----------------------------------------------------+--------------------------------+----------+----------+
| database0001_production/blah_longwinded_name_table  | blah_longwinded_name_for_index | 11238    | 29361    |
+-----------------------------------------------------+--------------------------------+----------+----------+

How do I fix this corrupted index?

Use a null alter to rebuild the table.

WARNING:

  1. Check the size first.
  2. use a tool like pt-online-schema-change if the table is large or accessed heavily, or you are running on Galera, Percona Cluster or equivalent.
  3. Did you get permission to run this on production?

eg:

ALTER TABLE <tablename> ENGINE=INNODB ;

Database security tasks.

How serious are companies and the people who support their databases about security? Not serious enough it seems.
How many databases are running with users with excessive privileges, OS privileges elevated?
What about securing via encryption your network traffic both inside your network and to external networks?

http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/
https://en.wikipedia.org/wiki/Data_breach
https://en.wikipedia.org/wiki/List_of_data_breaches

Over the next series of posts rather than give you a list of security tips, I am going to detail how to make your database more secure in detail.
These are not tips that your read with your morning coffee and then forget about, rather tasks. Tasks which you add to your project or task list
and work through until they are completed, signed off if necessary.

Task 1: Fix database users with wildcard hosts.

The Problem:

Database users with wildcard hosts who can potentially connect from any host from anywhere.
The SQL to check:

select user,host from mysql.user where host = '%';

The Excuses:

  •  Users outside our datacentre (or cloud) network such as corporate headquarters or via home/vpn connections need access to the database.
  • It is too hard to determine which user ip ranges or ip subnets will connect to our database.
  • Monitoring software needs to connect and its ip address could change.
  • If we change from wildcards, “things” could break.
  • We don’t need to worry about wildcard hosts as our firewall will save us/stop any problems.

The example:

Company X has grown from startup hacked up over a few weekends to a global tech VC darling with a huge, or rather HUUGGEE valuation.
The companies application and database originally ran on a pc under the founders desk, but over time has migrated to the cloud to become an Agile, Resilient and Scalable Enterprise.
You, the humble, meek, and lowly DBA have been hired and after onboarding, are tasked amongst other things with securing the database.

You get your credentials for the host and database sorted out and connect to the db and run the following SQL.

MariaDB [(none)]> select user,host from mysql.user where host = '%';
+-------------+------+
| user | host |
+-------------+------+
| hackmeplz | % |
| replication | % |
+-------------+------+
2 rows in set (0.00 sec)

The fix:

Change all users with wildcard host to a specific ip address or a specific ip subnet.

If all application traffic comes from your datacentre or cloud network for example from ip subnet 10.0.0.X then in our example,
‘hackmeplz’@’%’ becomes ‘hackmeplz’@’10.0.0.%’

If all application traffic comes via a Enterprise Load Balancer (ELB) or haproxy or Virtual IP (VIP) such as 10.0.0.3 then in our example,
‘hackmeplz’@’%’ becomes ‘hackmeplz’@’10.0.0.3’

The How to do the fix and not get fired for being a cowboy:

  1. Prepare a security audit report/review.
  2. Present or email to stakeholders (developers, reporting users, managers)
  3. Prepare a maintenance plan and gain approval to change.
  4. Organize maintenance window with specific time and date and gain approval for change.
  5. Follow the maintenance plan which will be something like this
    – Announce start of maintenance window in your chat/communication channel.
    – create a new user with the host part that contains a ip subnet
    – drop old user with wildcard host.
    – Get developers or report users to test to make sure their application works
    – Confirm good.
    – Announce the end of the maintenance window in your chat/communication channel.
  6. Go home and bask in the feeling that you have proactively and pre-emptively prevented a more than probable private pilllager
    from causing your company a publicly painful punishment.[1]

[1] Yahoo USD 1 Billion discount
https://en.wikipedia.org/wiki/Data_breach#cite_note-17
https://techcrunch.com/2016/10/06/report-verizon-wants-1-billion-discount-after-yahoo-privacy-concerns/

 

 

 

 

 

Lookout: crashbug using innodb_track_changed_pages with O_DIRECT

Summary:

If you are using innodb_flush_method = O_DIRECT (which is highly recommended for a bunch of reasons) and innodb_track_changed_pages the instance will crash if you query any tables related to that feature.

So innodb_track_changed_pages is a Percona system variable which is used to make incremental backups much faster.

So if you are using Percona’s great backup and recovery software suite and running at least MySQL 5.5.27 you might have enabled this variable to make your incremental backups run faster.

You may want to reconsider your choice and turn it off … continue to read why…

 

Background:

I was working on a client a while back (12 months ago) and I was looking to improve their incremental backup durations.

When I turned on the feature and restarted the MySQL instance, as soon as I queried the table the MySQL instance crashed.

After a couple of tests, the change was hastily rolled back and I raised this bug ticket with Percona.

https://bugs.launchpad.net/percona-server/+bug/1538403

Testing, preparing and getting a bug raised took time, but apart from Rick Pizzi also posting that he had a similar crash, there has been no activity on this. It is like the crickets chirping in the Simpsons.

http://tvtropes.org/pmwiki/pmwiki.php/Main/ChirpingCrickets

Zero activity, zero response, the bug is Status is new and unassigned since Jan 27th 2016.

Yes this is an older version MySQL Percona 5.5, but it also affects versions up to 5.7.

How many installations are running between 5.5 and 5.7? I would have thought the bulk of the installed base.

Goal of this post:

Hopefully, this article will prompt someone to take a look at the bug.

If it seems like some of my recent blog posts have been raising issues with Percona, it is because I use, support and recommend Percona MySQL databases and tools every day even though I don’t work at Percona. I love that Percona provide the tools and software they do. Having a crash bug open for more than 12 months is disappointing.

The bug in action:

mysql> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 5.5.54-38.6-log | <== The latest 5.5 version
+-----------------+
1 row in set (0.00 sec)

mysql> show global variables like 'innodb%track%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_track_changed_pages | OFF   |
+----------------------------+-------+
1 row in set (0.00 sec)

-- restart after adding variable to .cnf file

mysql> show global variables like 'innodb%track%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_track_changed_pages | ON    |
+----------------------------+-------+
1 row in set (0.00 sec)

mysql> select count(*) from INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

-- it works because there is nothing in the bitmap file

root@db1:~# ls -l /var/lib/mysql/
total 28704
-rw-r--r-- 1 root  root         0 Feb 18 03:01 debian-5.5.flag
-rw-rw---- 1 mysql mysql 18874368 Feb 18 03:06 ibdata1
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:06 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:01 ib_logfile1
-rw-rw---- 1 mysql mysql        0 Feb 18 03:06 ib_modified_log_1_0.xdb <=== BITMAP FILE
drwx------ 2 mysql root      4096 Feb 18 03:01 mysql
-rw-rw---- 1 mysql mysql      126 Feb 18 03:05 mysql-bin.000001
-rw-rw---- 1 mysql mysql      126 Feb 18 03:06 mysql-bin.000002
-rw-rw---- 1 mysql mysql      107 Feb 18 03:06 mysql-bin.000003
-rw-rw---- 1 mysql mysql       96 Feb 18 03:06 mysql-bin.index
-rw------- 1 root  root        11 Feb 18 03:01 mysql_upgrade_info
drwx------ 2 mysql mysql     4096 Feb 18 03:01 performance_schema
drwx------ 2 mysql root      4096 Feb 18 03:01 test


-- nothing wrong at this point... now change some rows


mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+
4 rows in set (0.00 sec)

mysql> use test;
Database changed
mysql> show tables;
Empty set (0.00 sec)

-- Don't use column names like this at home folks...

mysql> create table test (blah varchar(10));
Query OK, 0 rows affected (0.02 sec)

-- Fred the world renowned tester

mysql> insert into test values ('fred');
Query OK, 1 row affected (0.00 sec)

mysql> exit
Bye
root@db1:~# ls -l /var/lib/mysql/
total 28708
-rw-r--r-- 1 root  root         0 Feb 18 03:01 debian-5.5.flag
-rw-rw---- 1 mysql mysql 18874368 Feb 18 03:08 ibdata1
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:08 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:01 ib_logfile1
-rw-rw---- 1 mysql mysql     4096 Feb 18 03:08 ib_modified_log_1_0.xdb <=== BITMAP FILE
drwx------ 2 mysql root      4096 Feb 18 03:01 mysql
-rw-rw---- 1 mysql mysql      126 Feb 18 03:05 mysql-bin.000001
-rw-rw---- 1 mysql mysql      126 Feb 18 03:06 mysql-bin.000002
-rw-rw---- 1 mysql mysql      396 Feb 18 03:08 mysql-bin.000003
-rw-rw---- 1 mysql mysql       96 Feb 18 03:06 mysql-bin.index
-rw------- 1 root  root        11 Feb 18 03:01 mysql_upgrade_info
drwx------ 2 mysql mysql     4096 Feb 18 03:01 performance_schema
drwx------ 2 mysql root      4096 Feb 18 03:08 test

-- still works as innodb_flush_method isn't set

mysql> show global variables like 'innodb_flush_method';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| innodb_flush_method |       |
+---------------------+-------+
1 row in set (0.00 sec)

mysql> select count(*) from INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
+----------+
| count(*) |
+----------+
|       17 |
+----------+
1 row in set (0.01 sec)

-- OK now turn on O_DIRECT

root@db1:~# vi /etc/mysql/my.cnf 
root@db1:~# service mysql restart
 * Stopping MySQL (Percona Server) mysqld                                                                        [ OK ] 
 * Starting MySQL (Percona Server) database server mysqld                                                        [ OK ] 
 * Checking for corrupt, not cleanly closed and upgrade needing tables.

mysql> show global variables like 'innodb_flush_method';
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| innodb_flush_method | O_DIRECT |
+---------------------+----------+
1 row in set (0.00 sec)

-- Bingo! Houston we have a problem.

mysql> select count(*) from INFORMATION_SCHEMA.INNODB_CHANGED_PAGES;
ERROR 2013 (HY000): Lost connection to MySQL server during query

MySQL Practice challenges part one.

Are you ready to accept the challenge? Really?

Can you prove that you have got what it takes to be an effective DBA?

Go grab the tests from https://github.com/dbadojo/test-mysql-restore and see if you do…

Background:

These tests are designed to test your ability to do basic restores and recoveries of a MySQL database.

Each restore gets progressively more complex.

The key feature is, once you are successfully restored the MySQL database you will get the encryption key/passphrase
to do the next test.

Use: ccrypt -d <filename>.cpt to decrypt the file.

Requirements:

Virtualbox and vagrant if using a virtual machine for running the MySQL instance.
MySQL 5.6
ccrypt or equivalent that can read ccrypt encrypted files.

A vagrantfile is provided as an example to spawn a simple virtualbox VM with 1Gig of memory to run the small MySQL database required for the tests.

Once you have proved you are awesome…

Once you have completed all the current tests, email dbadojo@gmail.com and we will keep you informed when the next set becomes ready.

For the uber awesome DBAs, if you have ideas for more tests, email them to dbadojo@gmail.com or comment here or send a merge request.

P.S. There are more and harder restores to come… stay tuned.

innodb_fast_checksum=1 and upgrading to MySQL 5.6

The Percona version of MySQL has been such a good replacement for the generic MySQL version that many of the features and options that existed in Percona have been merged into the generic MySQL.

Innodb_fast_checksum was an option added to improve the performance of checksums.

The system variable was replaced by innodb_checksum_algorithm in 5.6.

Unfortunately, when you go to upgrade from Percona 5.x to Percona (or generic mysql) 5.6, an in-place upgrade will fail.

The error(s) will be generally mysql complaining it can’t read the file. This is because fast checksums can’t be read by the 5.6 version.

Example errors:

InnoDB: checksum mismatch in data file
InnoDB: Could not open

The recommended option is do the default upgrade process: use mysqldump to dump your data out and reload after you replace the binaries.

For large datasets or servers suffering poor IO performance, the time it takes to do that, even using a parallel dump and load tool is prohibitive.

So are you looking for a workaround?

How about a mysql tool which has been around for a while, called innochecksum.

This tool can check your datafiles to make sure the checksums are correct, or in our case, force the checksums to be written a specific way. I was thinking, prep work is done, now it is just process work. But alas, the versions of innochecksum for 5.5 and 5.6 don’t support files sizes over 2Gigabytes.

Luckily, innochecksum for 5.7 actually does support larger file sizes and best of all it works on old version datafiles too. For people hitting this article in the future, 5.7 at the time was just a RC (Release candidate).

To use this method:

  1. Backup your db or have good backups.
  2. Organize downtime for your db (slave preferably so you aren’t affecting traffic)
  3. Shutdown mysql
  4. Repeat for each innodb datafile: example command: innochecksum -vS –no-check –write=innodb <path to innodb datafile>
  5. Replace innodb_fast_checksum = 1 with innodb_fast_checksum = 0 in your my.cnf (and chef/puppet/ansible repo)
  6. Restart mysql

I will cover the whole procedure for upgrading from Percona MySQL 5.5 to Percona MySQL 5.6 in more detail in a later post.

Fun tool tip:

I have had to compile the MySQL 5.7 innochecksum for an older linux kernel running glibc older than 2.14, and it works fine as well. The biggest headache was sorting out cmake, boost etc to enable the compilation of the MySQL 5.7 source code.

Have Fun