MySQL Error: error reconnecting to master

Error message:

Slave I/O thread: error reconnecting to master
Last_IO_Error: error connecting to master

Diagnosis:

Check that the slave can connect to the master instance, using the following steps:

  1. Use ping to check that the master is reachable by name, e.g. ping master.yourdomain.com
  2. Use ping with the IP address to check that DNS isn't broken, e.g. ping 192.168.1.2
  3. Use the mysql client to connect from the slave to the master, e.g. mysql -u repluser -pREPLPASS --host=master.yourdomain.com --port=3306 (substitute whatever port you are connecting to the master on)
  4. If all steps work, check that repluser (the slave's replication user) has the REPLICATION SLAVE privilege, e.g. show grants for 'repluser'@'slave.yourdomain.com';

Resolution:

  • If steps 1 and 2 both fail, you have a network or firewall issue. Check with a network/firewall administrator, or check the logs yourself if you wear those hats.
  • If step 1 fails but step 2 works, you have a DNS or name resolution issue. Check that the slave can resolve the master's name and connect to it, using the mysql client or ssh/telnet/remote desktop.
  • If step 3 fails, check the error reported. It will be either an authentication issue (login failed/denied) or an issue with the TCP port the master is listening on. A good way to verify that the port is open is: telnet master.yourdomain.com 3306 (or whichever port the master is listening on). If that fails, then most likely one or more firewalls in the network are blocking that port.
  • If you get to step 4, everything looks fine, and the slave reconnects on retrying, then you have probably had a temporary network failure, name resolution failure, firewall failure, or some combination of these.
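
The decision table above can be sketched as a small shell function. This is only an illustration: the function name diagnose and its "ok"/"fail" arguments are made up for this example, and you would feed it the outcomes of steps 1 to 3 yourself.

```shell
#!/usr/bin/env bash
# diagnose NAME_PING IP_PING MYSQL_CONNECT
# Each argument is "ok" or "fail": the outcome of diagnosis steps 1, 2 and 3.
# (Hypothetical helper, not part of MySQL.)
diagnose() {
    local name_ping="$1" ip_ping="$2" mysql_connect="$3"
    if [ "$name_ping" = fail ] && [ "$ip_ping" = fail ]; then
        # Neither the name nor the IP answers.
        echo "network or firewall issue"
    elif [ "$name_ping" = fail ]; then
        # The IP answers but the name does not.
        echo "DNS or name resolution issue"
    elif [ "$mysql_connect" = fail ]; then
        # The host is reachable but the mysql client cannot log in.
        echo "authentication or TCP port issue: read the client error"
    else
        echo "connectivity looks fine: check grants, suspect a transient failure"
    fi
}
```

For example, diagnose fail ok ok reports a DNS or name resolution issue.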

Continuing Sporadic issues:

Get hold of the network and firewall logs.
If this is not possible, set up a script that periodically pings the master, connects to its port, and logs in
with the mysql client, and log the results over time to prove to your friendly network admin that there is a
problem with the network.
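
A minimal sketch of such a script follows, assuming bash on Linux (the ping -W timeout is in seconds there) and the GNU timeout command. The host, IP, log path and the credentials file in MYSQL_CNF are all placeholders to replace with your own; the credentials file is assumed to hold a [client] section with user and password so the password stays off the command line.

```shell
#!/usr/bin/env bash
# Hypothetical settings: adjust to your environment.
MASTER_HOST="${MASTER_HOST:-master.yourdomain.com}"
MASTER_IP="${MASTER_IP:-192.168.1.2}"
MASTER_PORT="${MASTER_PORT:-3306}"
MYSQL_CNF="${MYSQL_CNF:-$HOME/.repl-check.cnf}"   # [client] user= / password=
LOG_FILE="${LOG_FILE:-./repl-connectivity.log}"
INTERVAL="${INTERVAL:-60}"                        # seconds between rounds

# run_check NAME COMMAND...: run COMMAND quietly and append a timestamped
# "NAME OK" or "NAME FAIL" line to the log file.
run_check() {
    local name="$1"; shift
    local status=FAIL
    if "$@" >/dev/null 2>&1; then
        status=OK
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$name" "$status" >> "$LOG_FILE"
}

# One round of checks: ping by name, ping by IP, TCP connect, mysql login.
check_once() {
    run_check ping_name ping -c 1 -W 2 "$MASTER_HOST"
    run_check ping_ip   ping -c 1 -W 2 "$MASTER_IP"
    run_check tcp_port  timeout 5 bash -c "exec 3<>/dev/tcp/$MASTER_HOST/$MASTER_PORT"
    run_check mysql     mysql --defaults-extra-file="$MYSQL_CNF" \
                              --host="$MASTER_HOST" --port="$MASTER_PORT" \
                              --connect-timeout=5 -e 'SELECT 1'
}

# Only loop when asked to, so the functions can be sourced and tried by hand.
if [ "${1:-}" = "--loop" ]; then
    while true; do
        check_once
        sleep "$INTERVAL"
    done
fi
```

Run it on the slave with the --loop argument (under nohup or screen, say) and hand over the log once it has caught a few failures.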

How MySQL deals with it:

MySQL will try and reconnect by itself after a network failure or query timeout.

The process is governed by a few variables:

master-connect-retry
slave-net-timeout
master-retry-count

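In a nutshell, when a MySQL slave has received no data from the master for slave-net-timeout seconds, it treats the connection as broken and tries to reconnect. It then retries every master-connect-retry seconds, but only for the number of times specified in master-retry-count.
By default, a MySQL slave waits one hour (3,600 seconds) before the first retry, and will then retry every 60 seconds up to 86,400 times. That is every minute for 60 days. (These are the defaults for the MySQL versions current when this was written; from MySQL 5.7.7 the default slave-net-timeout is 60 seconds.)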
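
The 60 days figure is just the retry interval multiplied by the retry count, converted to days. A quick way to redo the sum for your own settings (the function name retry_window_days is made up for this example):

```shell
#!/usr/bin/env bash
# retry_window_days INTERVAL_SECONDS RETRY_COUNT
# Prints how many whole days the slave keeps retrying before it gives up.
retry_window_days() {
    local interval="$1" count="$2"
    # 86400 is the number of seconds in one day.
    echo $(( interval * count / 86400 ))
}

retry_window_days 60 86400   # defaults from above: prints 60
```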

If the one hour slave-net-timeout is too long for your DR/Slave read strategy you will need to adjust it accordingly.
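
As a rough sketch of one way to do that on a running slave: slave_net_timeout is a dynamic global variable, but a new value only applies the next time the slave threads are started, so the I/O thread is restarted afterwards. The helper name slave_timeout_sql and the credentials file path are assumptions for this example, and 60 seconds is only an example value. To keep the setting across restarts you would also put slave-net-timeout under [mysqld] in my.cnf.

```shell
#!/usr/bin/env bash
# slave_timeout_sql SECONDS
# Prints the SQL that lowers slave_net_timeout and restarts the I/O thread
# so the new value is picked up. (Hypothetical helper, not part of MySQL.)
slave_timeout_sql() {
    local seconds="$1"
    # Reject anything that is not a plain run of digits.
    case "$seconds" in
        ''|*[!0-9]*) echo "usage: slave_timeout_sql SECONDS" >&2; return 1 ;;
    esac
    printf 'SET GLOBAL slave_net_timeout = %s;\n' "$seconds"
    printf 'STOP SLAVE IO_THREAD;\n'
    printf 'START SLAVE IO_THREAD;\n'
}

# Usage, run on the slave (the credentials file path is a placeholder):
#   slave_timeout_sql 60 | mysql --defaults-extra-file="$HOME/.repl-admin.cnf"
```

Printing the statements first lets you read them before piping them into the mysql client.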

Edit: 2011/02/02

Thanks to leBolide. He discovered that there is a 32 character limit on the password for replication.

Have Fun

Paul

5 thoughts on “MySQL Error: error reconnecting to master”

  1. One thing to check, too — check to see if replication is really running.

    What does Slave_IO_Running show? If it's “Yes” then the I/O thread is running.

    Does Seconds_behind_master in SHOW SLAVE STATUS show NULL? If so, then replication isn't running.

    Sometimes the Last_IO_Errno and Last_IO_Error will show the last error, even if replication has resumed and is working fine. So be careful if you think that replication is broken, because it may just be misleading…

  2. I have an odd problem where replication *should* be working, but it isn't.

    Each MySQL server can ping the other by hostname and IP; a traceroute from either shows identical routes in opposite directions. These are CentOS, and I've turned off iptables and SELinux.

    Connecting from the slave to the master:
    mysql -u uk_slave_user --host ccukdb1.domain.int --port=3306 -p

    allows login, but I don't know if this is a problem, slave_user can only show the information_schema
    mysql> show schemas;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    +--------------------+
    1 row in set (0.00 sec)

    Showing grants on the master gives:

    GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'uk_slave_user'@'ccukdb3.domain.int' IDENTIFIED BY PASSWORD '*encryptedpassword*'

    Telnetting to the db1 port gives:

    [root@ccukdb3 logs]# telnet ccukdb1 3306
    Trying 192.168.249.26...
    Connected to ccukdb1.domain.int (192.168.249.26).
    Escape character is '^]'.
    B
    5.1.51-community-log/)TJWU(b 5>vZ?l#eWbf[

    I don't know whether the garbled text there is normal or shows a problem; but it looks like that for every MySQL host I telnet to, so I assume normal.

    BUT, whenever I try starting the slave (after setting CHANGE MASTER etc), this error pops up in the slave's log:

    110126 16:07:48 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000002' at position 106, relay log '/mnt/sdb1/mysql/logs/relay-bin.000003' position: 4
    110126 16:07:48 [ERROR] Slave I/O: error connecting to master 'uk_slave_user@ccukdb1.domain.int:3306' - retry-time: 60 retries: 86400, Error_code: 1045

    And SHOW SLAVE STATUS\G shows this error:

    Last_IO_Errno: 1045
    Last_IO_Error: error connecting to master 'uk_slave_user@ccukdb1.domain.int:3306' - retry-time: 60 retries: 86400

    Error 1045 indicates a credential error, but the credentials otherwise work correctly.

    One caveat: these are both virtual machines, with the slave being a VM clone of the master, and then the IP and hostname changed. I wondered if legacy user@host entries in the slave's database were causing conflicts, so I dropped all the databases, ran mysql_install_db, re-created the necessary users, and then did a mysqldump & load from the master to the slave for a fresh start.

    Same error. Any tips?

  3. @leBolide:

    It looks like you have systematically gone through most steps in trying to resolve the problem.

    Try granting a replication slave user with the slave IP instead of hostname.

    Test the connection from slave to master first using mysql.

    Also check mysql.user on both the master and slave to make sure the hostnames are correct, and check again after switching to an IP address.

    If you need more help drop me an email

    paulmoen /\^$at^$/\ gmail.com

  4. I found out that MASTER_PASSWORD has a maximum length of 32 characters for replication… although a 40-character password works fine for the manual logins and checks you'd run to troubleshoot replication.
