Faster streaming backups using mariabackup and pigz

Are you looking for a faster way to stream a backup from one db server to another?

Maybe you are rebuilding a replica from the master after someone wrote to the slave and now the data is out of sync.

I had a look at what was out there in terms of articles about streaming backups.

Most of them got part of the way but clearly haven’t had people asking in chat “are we there yet?” over and over, wondering when their replicas will be back so they can direct the read traffic at them.
The key here is to use the parallel options that come with mariabackup and use pigz to compress and decompress in parallel.

Do you want speed?
Do you want to saturate the 10Gigabit bonded link or InfiniBand (I wish…) ?
Do you have spare CPUs to use?


  • This command assumes the destination directory is empty. 
  • The command will 8 CPUs (4 for mariabackup and 4 for pigz) on the source
  • The command can use up to 8 CPUs (4 for pigz to decompress, 4 for mbstream)
  • Don’t run pigz without -p option. As by default it will use all CPUs available.

The command:

mariabackup --backup --slave-info --tmpdir=/tmp --stream=xbstream --parallel=4 --datadir=/var/lib/mysql 2>backup.log | pigz -p 4 | ssh -q replica -t "pigz -dc -p 4 | mbstream --directory=/var/lib/mysql -x --parallel=4"

Explanation of the command:

  1. This command will take a streaming backup (using xbstream) in parallel from the database in /var/lib/mysql.
  2. It will write output the backup.log (just in case you need to debug…)
  3. Then it will pipe that toq pigz (parellel gzip) which will compress in parallel.
  4. This stream of compressed xbstream will be sent encrypted via ssh to host called replica.
  5. On the replica, the compressed xbstream will be uncompressed by pigz, then piped thru mbstream into the /var/lib/mysql directory.

Next steps:

  1. Prepare the backup using the appropriate amount of RAM.
    mariabackup --prepare --use-memory=16G --target-dir=/var/lib/mysql/
  2. Change the ownership of the destination datadir
    chown -R mysql:mysql /var/lib/mysql
  3. Start the db
    systemctl start mariadb
  4. Look at xtrabackup_slave_info to get the slave info
    cat /var/lib/mysql/xtrabackup_slave_info
  5. Change the master using CHANGE MASTER using that info.

Until next time.



Fixing Error: InnoDB: (Duplicate key) writing word node to FTS auxiliary index table.

This error appears in your error log. It means there is a duplicate key in your full text index, sadly it doesn’t tell you which table or which full text index.

HINT for MySQL/MariaDB/Percona devs: just add the index_id to the error message.

SQL to find all text indexes in your db:

select table_schema,table_name,index_name, index_type from information_schema.statistics
where index_type = 'FULLTEXT' order by table_schema;

| table_schema    | table_name            | index_name  | index_type |
| db0001_fullname | Activity_fullname_log | message     | FULLTEXT   |
| db0002_fullname | Activity_fullname_log | message     | FULLTEXT   |
2 rows in set (0.715 sec)

So what is the next step?

If you follow the documentation it will tell you to rebuild the table or use OPTIMIZE TABLE with innodb_optimize_fulltext_only=ON to rebuild just the text index.

My recommendation: Rebuild the table (including the full text indexes)

-- Plain old ALTER TABLE
use db0001_fullname;

-- Or pt-osc equivalent

pt-online-schema-change --dry-run --alter="ENGINE=INNODB" \
--alter-foreign-keys-method=rebuild_constraints --max-lag=60 \
--max-load Threads_running=50 --critical-load Threads_running=60 \

pt-online-schema-change --execute --alter="ENGINE=INNODB" \
--alter-foreign-keys-method=rebuild_constraints --max-lag=60 \
--max-load Threads_running=50 --critical-load Threads_running=60 \


The trouble with this method is it will only do a set number of words per execution of OPTIMIZE TABLE.

SQL to set the specific table so you can query INNODB_FT_INDEX_TABLE

set global innodb_ft_aux_table = 'db0001_fullname/Activity_fullname_log';
Query OK, 0 rows affected (0.002 sec)

SQL to find the number of distinct words in a specific text index:

| count(distinct word) |
|               600000 |
1 row in set (1 min 53.333 sec)

Depending how much impact you want per OPTIMIZE TABLE set innodb_ft_num_word_optimize appropriately.

The remaining steps to rebuilding your full text index using OPTIMIZE TABLE

set GLOBAL innodb_ft_num_word_optimize=10000;
set GLOBAL innodb_optimize_fulltext_only=ON;
OPTIMIZE TABLE Activity_fullname_log; 
OPTIMIZE TABLE Activity_fullname_log;
OPTIMIZE TABLE Activity_fullname_log;
OPTIMIZE TABLE Activity_fullname_log;
OPTIMIZE TABLE Activity_fullname_log;
... -- repeat round(distinct words/innodb_ft_num_word_optimize)+1
OPTIMIZE TABLE Activity_fullname_log;

Until next time…

Find a corrupted innodb table or a corrupted index from index id

Your favourite monitoring software throws an alert or sends an email after it finds the following in the mysql error log:

InnoDB: page [page id: space=8858, page number=206777] (95 records, index id 29361)

How do you find which index is corrupted from this line?

Example using the above index id listed in the error log message:

 select * from information_schema.INNODB_SYS_INDEXES where index_id = 29361;

| INDEX_ID | NAME                           | TABLE_ID | TYPE | N_FIELDS | PAGE_NO | SPACE | MERGE_THRESHOLD |
|    29361 | blah_longwinded_name_for_index |    11238 |    0 |        3 |       4 |  8858 |              50 |
1 row in set (0.009 sec)

How do you find which table is corrupted?

Example using the above table_id listed in the output from the above SQL statement.

select * from information_schema.INNODB_SYS_TABLES where table_id = 11238;

| TABLE_ID | NAME                                                | FLAG | N_COLS | SPACE | ROW_FORMAT | ZIP_PAGE_SIZE | SPACE_TYPE |
|    11238 | database0001_production/blah_longwinded_name_table  |   41 |      8 |  8858 | Compressed |          8192 | Single     |

What about just as one statement…

select as table_name, as index_name,t.table_id,i.index_id 
from information_schema.INNODB_SYS_TABLES t
JOIN information_schema.INNODB_SYS_INDEXES i on t.table_id=i.table_id
where index_id = 29361;

| table_name                                          | index_name                     | table_id | index_id |
| database0001_production/blah_longwinded_name_table  | blah_longwinded_name_for_index | 11238    | 29361    |

How do I fix this corrupted index?

Use a null alter to rebuild the table.


  1. Check the size first.
  2. use a tool like pt-online-schema-change if the table is large or accessed heavily, or you are running on Galera, Percona Cluster or equivalent.
  3. Did you get permission to run this on production?



Database security tasks.

How serious are companies and the people who support their databases about security? Not serious enough it seems.
How many databases are running with users with excessive privileges, OS privileges elevated?
What about securing via encryption your network traffic both inside your network and to external networks?

Over the next series of posts rather than give you a list of security tips, I am going to detail how to make your database more secure in detail.
These are not tips that your read with your morning coffee and then forget about, rather tasks. Tasks which you add to your project or task list
and work through until they are completed, signed off if necessary.

Task 1: Fix database users with wildcard hosts.

The Problem:

Database users with wildcard hosts who can potentially connect from any host from anywhere.
The SQL to check:

select user,host from mysql.user where host = '%';

The Excuses:

  •  Users outside our datacentre (or cloud) network such as corporate headquarters or via home/vpn connections need access to the database.
  • It is too hard to determine which user ip ranges or ip subnets will connect to our database.
  • Monitoring software needs to connect and its ip address could change.
  • If we change from wildcards, “things” could break.
  • We don’t need to worry about wildcard hosts as our firewall will save us/stop any problems.

The example:

Company X has grown from startup hacked up over a few weekends to a global tech VC darling with a huge, or rather HUUGGEE valuation.
The companies application and database originally ran on a pc under the founders desk, but over time has migrated to the cloud to become an Agile, Resilient and Scalable Enterprise.
You, the humble, meek, and lowly DBA have been hired and after onboarding, are tasked amongst other things with securing the database.

You get your credentials for the host and database sorted out and connect to the db and run the following SQL.

MariaDB [(none)]> select user,host from mysql.user where host = '%';
| user | host |
| hackmeplz | % |
| replication | % |
2 rows in set (0.00 sec)

The fix:

Change all users with wildcard host to a specific ip address or a specific ip subnet.

If all application traffic comes from your datacentre or cloud network for example from ip subnet 10.0.0.X then in our example,
‘hackmeplz’@’%’ becomes ‘hackmeplz’@’10.0.0.%’

If all application traffic comes via a Enterprise Load Balancer (ELB) or haproxy or Virtual IP (VIP) such as then in our example,
‘hackmeplz’@’%’ becomes ‘hackmeplz’@’’

The How to do the fix and not get fired for being a cowboy:

  1. Prepare a security audit report/review.
  2. Present or email to stakeholders (developers, reporting users, managers)
  3. Prepare a maintenance plan and gain approval to change.
  4. Organize maintenance window with specific time and date and gain approval for change.
  5. Follow the maintenance plan which will be something like this
    – Announce start of maintenance window in your chat/communication channel.
    – create a new user with the host part that contains a ip subnet
    – drop old user with wildcard host.
    – Get developers or report users to test to make sure their application works
    – Confirm good.
    – Announce the end of the maintenance window in your chat/communication channel.
  6. Go home and bask in the feeling that you have proactively and pre-emptively prevented a more than probable private pilllager
    from causing your company a publicly painful punishment.[1]

[1] Yahoo USD 1 Billion discount






Lookout: crashbug using innodb_track_changed_pages with O_DIRECT


If you are using innodb_flush_method = O_DIRECT (which is highly recommended for a bunch of reasons) and innodb_track_changed_pages the instance will crash if you query any tables related to that feature.

So innodb_track_changed_pages is a Percona system variable which is used to make incremental backups much faster.

So if you are using Percona’s great backup and recovery software suite and running at least MySQL 5.5.27 you might have enabled this variable to make your incremental backups run faster.

You may want to reconsider your choice and turn it off … continue to read why…



I was working on a client a while back (12 months ago) and I was looking to improve their incremental backup durations.

When I turned on the feature and restarted the MySQL instance, as soon as I queried the table the MySQL instance crashed.

After a couple of tests, the change was hastily rolled back and I raised this bug ticket with Percona.

Testing, preparing and getting a bug raised took time, but apart from Rick Pizzi also posting that he had a similar crash, there has been no activity on this. It is like the crickets chirping in the Simpsons.

Zero activity, zero response, the bug is Status is new and unassigned since Jan 27th 2016.

Yes this is an older version MySQL Percona 5.5, but it also affects versions up to 5.7.

How many installations are running between 5.5 and 5.7? I would have thought the bulk of the installed base.

Goal of this post:

Hopefully, this article will prompt someone to take a look at the bug.

If it seems like some of my recent blog posts have been raising issues with Percona, it is because I use, support and recommend Percona MySQL databases and tools every day even though I don’t work at Percona. I love that Percona provide the tools and software they do. Having a crash bug open for more than 12 months is disappointing.

The bug in action:

mysql> select @@version;
| @@version       |
| 5.5.54-38.6-log | <== The latest 5.5 version
1 row in set (0.00 sec)

mysql> show global variables like 'innodb%track%';
| Variable_name              | Value |
| innodb_track_changed_pages | OFF   |
1 row in set (0.00 sec)

-- restart after adding variable to .cnf file

mysql> show global variables like 'innodb%track%';
| Variable_name              | Value |
| innodb_track_changed_pages | ON    |
1 row in set (0.00 sec)

| count(*) |
|        0 |
1 row in set (0.00 sec)

-- it works because there is nothing in the bitmap file

root@db1:~# ls -l /var/lib/mysql/
total 28704
-rw-r--r-- 1 root  root         0 Feb 18 03:01 debian-5.5.flag
-rw-rw---- 1 mysql mysql 18874368 Feb 18 03:06 ibdata1
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:06 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:01 ib_logfile1
-rw-rw---- 1 mysql mysql        0 Feb 18 03:06 ib_modified_log_1_0.xdb <=== BITMAP FILE
drwx------ 2 mysql root      4096 Feb 18 03:01 mysql
-rw-rw---- 1 mysql mysql      126 Feb 18 03:05 mysql-bin.000001
-rw-rw---- 1 mysql mysql      126 Feb 18 03:06 mysql-bin.000002
-rw-rw---- 1 mysql mysql      107 Feb 18 03:06 mysql-bin.000003
-rw-rw---- 1 mysql mysql       96 Feb 18 03:06 mysql-bin.index
-rw------- 1 root  root        11 Feb 18 03:01 mysql_upgrade_info
drwx------ 2 mysql mysql     4096 Feb 18 03:01 performance_schema
drwx------ 2 mysql root      4096 Feb 18 03:01 test

-- nothing wrong at this point... now change some rows

mysql> show databases;
| Database           |
| information_schema |
| mysql              |
| performance_schema |
| test               |
4 rows in set (0.00 sec)

mysql> use test;
Database changed
mysql> show tables;
Empty set (0.00 sec)

-- Don't use column names like this at home folks...

mysql> create table test (blah varchar(10));
Query OK, 0 rows affected (0.02 sec)

-- Fred the world renowned tester

mysql> insert into test values ('fred');
Query OK, 1 row affected (0.00 sec)

mysql> exit
root@db1:~# ls -l /var/lib/mysql/
total 28708
-rw-r--r-- 1 root  root         0 Feb 18 03:01 debian-5.5.flag
-rw-rw---- 1 mysql mysql 18874368 Feb 18 03:08 ibdata1
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:08 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Feb 18 03:01 ib_logfile1
-rw-rw---- 1 mysql mysql     4096 Feb 18 03:08 ib_modified_log_1_0.xdb <=== BITMAP FILE
drwx------ 2 mysql root      4096 Feb 18 03:01 mysql
-rw-rw---- 1 mysql mysql      126 Feb 18 03:05 mysql-bin.000001
-rw-rw---- 1 mysql mysql      126 Feb 18 03:06 mysql-bin.000002
-rw-rw---- 1 mysql mysql      396 Feb 18 03:08 mysql-bin.000003
-rw-rw---- 1 mysql mysql       96 Feb 18 03:06 mysql-bin.index
-rw------- 1 root  root        11 Feb 18 03:01 mysql_upgrade_info
drwx------ 2 mysql mysql     4096 Feb 18 03:01 performance_schema
drwx------ 2 mysql root      4096 Feb 18 03:08 test

-- still works as innodb_flush_method isn't set

mysql> show global variables like 'innodb_flush_method';
| Variable_name       | Value |
| innodb_flush_method |       |
1 row in set (0.00 sec)

| count(*) |
|       17 |
1 row in set (0.01 sec)

-- OK now turn on O_DIRECT

root@db1:~# vi /etc/mysql/my.cnf 
root@db1:~# service mysql restart
 * Stopping MySQL (Percona Server) mysqld                                                                        [ OK ] 
 * Starting MySQL (Percona Server) database server mysqld                                                        [ OK ] 
 * Checking for corrupt, not cleanly closed and upgrade needing tables.

mysql> show global variables like 'innodb_flush_method';
| Variable_name       | Value    |
| innodb_flush_method | O_DIRECT |
1 row in set (0.00 sec)

-- Bingo! Houston we have a problem.

ERROR 2013 (HY000): Lost connection to MySQL server during query

MySQL Practice challenges part one.

Are you ready to accept the challenge? Really?

Can you prove that you have got what it takes to be an effective DBA?

Go grab the tests from and see if you do…


These tests are designed to test your ability to do basic restores and recoveries of a MySQL database.

Each restore gets progressively more complex.

The key feature is, once you are successfully restored the MySQL database you will get the encryption key/passphrase
to do the next test.

Use: ccrypt -d <filename>.cpt to decrypt the file.


Virtualbox and vagrant if using a virtual machine for running the MySQL instance.
MySQL 5.6
ccrypt or equivalent that can read ccrypt encrypted files.

A vagrantfile is provided as an example to spawn a simple virtualbox VM with 1Gig of memory to run the small MySQL database required for the tests.

Once you have proved you are awesome…

Once you have completed all the current tests, email and we will keep you informed when the next set becomes ready.

For the uber awesome DBAs, if you have ideas for more tests, email them to or comment here or send a merge request.

P.S. There are more and harder restores to come… stay tuned.

MySQL upgrade 5.6 with innodb_fast_checksum=1


My checklist for performing an in-place MySQL upgrade to 5.6.


In my previous post, I discussed the problem I had when doing an in-place MySQL upgrade from 5.5 to 5.6 when the database had been running with innodb_fast_checksum=1.

The solution was to use the MySQL 5.7 version of the tool innochecksum. Using this tool on a shutdown database, you can force the checksums on the innodb datafiles to be rewritten into either INNODB or CRC32 format.

Once the MySQL 5.6 upgrade is done, the 5.6 version of mysqld will be able to read the datafiles correctly and not fail with an error.

There is already plenty of good documentation on the MySQL website on how to upgrade from 5.x to 5.6.


My checklist for in-place upgrading to MySQL 5.6:

  1. Perform application and database performance testing on your test environment to make sure your application performance doesn’t get worse when running on MySQL 5.6.
  2. Make sure you have backups and verified that your backups are good aka you have restored databases from those backups.
  3. Check that all users have updated their passwords to use the new mysql password hash (plugin) Doc URL
  4. Organize downtime in advance.
  5. If running with innodb_fast_checksum=1, proceed with steps to replace the fast checksums with INNODB or CRC32.
    Note: if you use CRC32, you will need to make sure your cnf file is updated for 5.6 to use innodb_checksum_algorithm = CRC32. This is because innodb_checksum_algorithm = INNODB is the default setting. See this post for a sample procedure.
  6. Run a quick search of all existing .cnf files to find any other system variables which have been removed and either replace or remove them.
  7. Run the in-place upgrade.
  8. Run mysql_upgrade, it will flag if it doesn’t need to be run again.

I am trying something new with a poll. Enjoy.

innodb_fast_checksum=1 and upgrading to MySQL 5.6

The Percona version of MySQL has been such a good replacement for the generic MySQL version that many of the features and options that existed in Percona have been merged into the generic MySQL.

Innodb_fast_checksum was an option added to improve the performance of checksums.

The system variable was replaced by innodb_checksum_algorithm in 5.6.

Unfortunately, when you go to upgrade from Percona 5.x to Percona (or generic mysql) 5.6, an in-place upgrade will fail.

The error(s) will be generally mysql complaining it can’t read the file. This is because fast checksums can’t be read by the 5.6 version.

Example errors:

InnoDB: checksum mismatch in data file
InnoDB: Could not open

The recommended option is do the default upgrade process: use mysqldump to dump your data out and reload after you replace the binaries.

For large datasets or servers suffering poor IO performance, the time it takes to do that, even using a parallel dump and load tool is prohibitive.

So are you looking for a workaround?

How about a mysql tool which has been around for a while, called innochecksum.

This tool can check your datafiles to make sure the checksums are correct, or in our case, force the checksums to be written a specific way. I was thinking, prep work is done, now it is just process work. But alas, the versions of innochecksum for 5.5 and 5.6 don’t support files sizes over 2Gigabytes.

Luckily, innochecksum for 5.7 actually does support larger file sizes and best of all it works on old version datafiles too. For people hitting this article in the future, 5.7 at the time was just a RC (Release candidate).

To use this method:

  1. Backup your db or have good backups.
  2. Organize downtime for your db (slave preferably so you aren’t affecting traffic)
  3. Shutdown mysql
  4. Repeat for each innodb datafile: example command: innochecksum -vS –no-check –write=innodb <path to innodb datafile>
  5. Replace innodb_fast_checksum = 1 with innodb_fast_checksum = 0 in your my.cnf (and chef/puppet/ansible repo)
  6. Restart mysql

I will cover the whole procedure for upgrading from Percona MySQL 5.5 to Percona MySQL 5.6 in more detail in a later post.

Fun tool tip:

I have had to compile the MySQL 5.7 innochecksum for an older linux kernel running glibc older than 2.14, and it works fine as well. The biggest headache was sorting out cmake, boost etc to enable the compilation of the MySQL 5.7 source code.

Have Fun

Prewarm your EBS backed EC2 MySQL slaves

This is the story of cold blocks and mismatched instances and how they will cause you pain and cost you money until you understand why.

Most of the clients that we support run on the Amazon cloud using either RDS or running MySQL on plain EC2 instances using (Provisioned IOPS) PIOPS EBS for data storage.

As expected the common architecture is running a master with one or more slaves handling the read traffic.

A common problem is that after the slaves are provisioned (normally created from an EBS snapshot) they lag badly due to slow IO performance.

Unfortunately what tends to be lost in the “speed of provisioning new resources” fetish is some limitations in terms of data persistence layer (EBS).

If you are using EBS and you have created the EBS volume from snapshot or created a new volume you have to pre-warm the EBS volume otherwise you will suffer a bad (I mean seriously bad) first usage penalty.  Bad? I am talking up to 50% performance drop[1]. So that expensive PIOPS EBS volume you created is going to perform like rubbish every time it reads/writes a cold block.

The other thing which also tends to happen is mixing up the wrong instance (network performance) with the PIOPS EBS. This the classic networked storage, the network is the bottleneck. If your instance type has limited network performance, having a higher PIOPS than the network can handle means you are wasting money (on PIOPS) you can’t use. A bit like in the old days (of dedicated servers and SAN storage) where the SAN could deliver 200-300Mbytes per sec, but the 1 Gigabit network could only do 40-50Mbytes per sec.

Here is the real downside, using the cloud you can provision new resources to handle peak load (in the case more MySQL slaves to handle read load) as fast as you can click, or faster using API calls, or even automagically, if you have some algo forecast the need for additional resources. But… the EBS is all cold blocks, so these new instances will be up and available in minutes but the IO performance will be poor until you either pre-warm or the slave gets around to writing/reading all blocks.

So the common solution is to pre-warm the blocks using dd to read the EBS device (and warm the block) to /dev/null

eg: sudo dd if=/dev/xvdf of=/dev/null bs=1M

Consider how long this will take for any reasonable sized DB (200GBytes) using an instance with 1 Gigabit network.

200Gigabytes read at 50Mbytes/sec  = 200,000 Mbytes/50 = 4000 secs = 3600 (1hr) + 400 (6 mins 40 secs) =~ more than 1 hr.

So you or your algo provisioned a new EC2 instance for the database in minutes but either your IO will be rubbish for an extended period, or you wait more than 1 hr per 200GB to have the EBS pre-warmed.

What are the solutions?

  1. Forecast further in advance depending on the size of your db (or any other persistent storage layer eg NoSQL etc)
  2. Use ephemeral storage and manage the increased risk of data loss in the event of instance termination.
  3. Break your DB or your application into smaller pieces aka micro services.[2]
  4. Pay more $ and have your databases stay around longer so waiting for a instance to be ready in the beginning is not a problem.

As you can expect, most businesses are happy with option 4. Pay more, leave instances around like they were dedicated servers (base load). Amazon is happy too.

Option 3 whilst requiring some thought (argh) and additional complexity is where the real speed of provisioning, dare I say it, agile nature of the cloud will bear the most fruit.




mysqlslap howto

I noticed that people were hitting the site for information on how to run mysqlslap.

To help out those searchers, here is a quick mysqlslap howto

  1. Make sure you have mysql 5.1.4 or higher. Download MySQL from the MySQL website
  2. Make sure your MySQL database is running.
  3. Run mysqlslap, using progressively more concurrent threads:
    mysqlslap  --concurrency=1,25,50,100 --iterations=10 --number-int-cols=2 \
    --number-char-cols=3 --auto-generate-sql --csv=/tmp/mysqlslap.csv \
    --engine=blackhole,myisam,innodb --auto-generate-sql-add-autoincrement \
    --auto-generate-sql-load-type=mixed --number-of-queries=100 --user=root \

For detailed descriptions of each parameter see the MySQL documentation:

If you want to see how I used mysqlslap to test mysql performance on Amazon EC2, here are the list of posts