How-to fix Mysql slave after relay log corrupted

We run into issues like “Relay log read failure: Could not parse relay log event entry” many times on Mysql slave node. Most of time in our cases, it’s due to Linux host crashed unexpectedly.

The detail error is like this:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: alexzeng.wordpress.com
                  Master_User: mysql_rep
                  Master_Port: 3306
                Connect_Retry: 60
             Master_Log_File: LBMS-bin.000012
          Read_Master_Log_Pos: 239005305
               Relay_Log_File: LBMS-relay-bin.000004
                Relay_Log_Pos: 221245258
        Relay_Master_Log_File: LBMS-bin.000012
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 221245113
              Relay_Log_Space: 239007065
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

In this case, we can check if the bin log at master node is still available.

        Relay_Master_Log_File: LBMS-bin.000012
        ...
          Exec_Master_Log_Pos: 221245113

If it’s still available at master node, we can reset slave to that point, and let it recover by itself as follows:

Be careful, using Relay_Master_Log_File value in “show slave status\G” for master_log_file, not Master_Log_File value. They’re the same in this case.


mysql> slave stop;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to master_log_file='LBMS-bin.000012',master_log_pos=221245113;
Query OK, 0 rows affected (0.04 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

Check slave status again and again, make sure Exec_Master_Log_Pos is increasing. Seconds_Behind_Master should be decreasing. If master is too busy, the delay may still increase, but it’ll catch up sooner or later.


mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: alexzeng.wordpress.com
                  Master_User: mysql_rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: LBMS-bin.000012
          Read_Master_Log_Pos: 239014770
               Relay_Log_File: LBMS-relay-bin.000002
                Relay_Log_Pos: 1905303
        Relay_Master_Log_File: LBMS-bin.000012
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 223150164
              Relay_Log_Space: 17770064
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 159147
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
1 row in set (0.00 sec)

I’d like to end this blog with Peter Zaitsev‘s article How SHOW SLAVE STATUS relates to CHANGE MASTER TO words :

It is very important to understand the difference between IO Thread position in Master logs (Master_Log_File:Read_Master_Log_Pos) and SQL Thread position in Master logs (Relay_Master_Log_File:Exec_Master_Log_Pos).

It’s really help us understanding the differences of these values.

Advertisements

Backup MySQL using mysqldump script

It’s a simple script to backup MySQL db using mysqldump.

#!/bin/bash

if [ $# -lt 3 ]; then
cat <<EOF
  Backup MySQL DB using mysqldump

  usage:  backup_mysql_by_dump.sh <BACKUP_DB> <BACKUP_DIR> <BACKUP_KEEP_DAYS>
  examples:      
          backup_mysql_by_dump.sh cacti /data/cacti/mysql_backup/dump 7       --backup cacti db, and keep 7 days
          backup_mysql_by_dump.sh all /data/cacti/mysql_backup/dump 3         --backup all db, and keep 3 days

EOF
  exit 1
fi

# arguments
BACKUP_DB="$1"
BACKUP_DIR="$2"
BACKUP_KEEP_DAYS="$3"

# configuration
MYSQL_USER="cacti"
MYSQL_PASS="cacti"
MYSQL_HOST="localhost"
MYSQL_HOME="/usr"
DB_TYPE="InnoDB"
MAIL_TO="alexzeng@wordpress.com"

# prepare backup
DATE=`date +"%Y.%m.%d"`
BACKUP_FILE="$BACKUP_DIR/$BACKUP_DB.$DATE.sql.gz"
BACKUP_CMD="$MYSQL_HOME/bin/mysqldump"
BACKUP_CMD="$BACKUP_CMD -u $MYSQL_USER -p$MYSQL_PASS -h $MYSQL_HOST "
BACKUP_CMD="$BACKUP_CMD --routines --events "

if [ $DB_TYPE = "InnoDB" ]; then
  BACKUP_CMD="$BACKUP_CMD --single-transaction"
else
  BACKUP_CMD="$BACKUP_CMD --lock-tables FALSE"
fi

BACKUP_DB=`echo $BACKUP_DB | awk '{print tolower($0)}'`
if [ $BACKUP_DB = "all" ]; then
  BACKUP_CMD="$BACKUP_CMD --all-databases"
else
  BACKUP_CMD="$BACKUP_CMD $BACKUP_DB"
fi

if [ ! -d "$BACKUP_DIR" ]; then
    echo "Backup dir $BACKUP_DIR does not exist"
    exit 1
fi

# backup
$BACKUP_CMD | gzip > "$BACKUP_FILE" 
if [ $? -ne 0 ]; then
  mailx -s "Backup mysql cron job $0 failed" $MAIL_TO <<EOF
Backup files list :
`ls -lt $BACKUP_DIR/$BACKUP_DB.*.sql.gz`
EOF
fi

#delete old files
cd $BACKUP_DIR
for backup in `find . -ctime +$BACKUP_KEEP_DAYS -name "$BACKUP_DB.*.sql.gz"`; do rm -f $backup; done;

How-to setup MySQL HA by using keepalived

With MySQL replication and keepalived, we can setup a quite robust high available MySQL environment in a few steps:

Environment:
Host1: db01.wordpress.com
Host2: db02.wordpress.com
DBVIP: mysql.wordpress.com 10.0.0.1

1. Setup MySQL Master-Master replication
Ref: Setup MySQL replication

2. Install keepalived at both hosts

--using apt-get, for Ubuntu
apt-get install keepalived

--using yum, for Redhat
yum install keepalive

3. Config keepalived

1) Add keepalived config file /etc/keepalived/keepalived.conf
Config file for host db01:

! Configuration File for keepalived
global_defs {
      notification_email {
        alexzeng@wordpress.com
      }
      notification_email_from alexzeng@wordpress.com
      smtp_server mx.wordpress.com
      smtp_connect_timeout 30
      router_id mysql-ha
      }

vrrp_script check_mysql {
   script "/mysql/keepalived_check.sh db02.wordpress.com"
   interval 2
   weight 2
}

vrrp_instance VI_1 {
      state BACKUP
      interface eth1
      virtual_router_id 51
      priority 100
      advert_int 1
      nopreempt  # only needed on higher priority node
      authentication {
      auth_type PASS
      auth_pass 1111
      }

      track_script {
        check_mysql
      }
      virtual_ipaddress {
        10.0.0.1/24 dev eth1 label eth1:1
      }
      notify_master /mysql/keepalived_master.sh
      notify_backup /mysql/keepalived_backup.sh
}

Config file of host db02:

Copy the config file in db01, and change this line:
From
   script "/mysql/keepalived_check.sh db02.wordpress.com"
to
   script "/mysql/keepalived_check.sh db01.wordpress.com"

2) Add scripts to both nodes
/mysql/keepalived_check.sh : monitor MySQL (for the host/network down, keepalived has internal mechanism to monitor them)

#!/bin/bash
# monitor mysql status
# if this node mysql is dead and its slave delay less than 120 seconds, then stop its keepalived. The other node will bind the IP.

export MYSQL_HOME=/mysql
export PATH=$MYSQL_HOME/bin:$PATH

mysql="$MYSQL_HOME/bin/mysql"
delay_file="$MYSQL_HOME/slave_delay_second.log"
slave_host=$1

$mysql -u root --connect_timeout=3 --execute="select version();"

if [ $? -ne 0 ]; then
 delayseconds=`cat $delay_file`
 if [ $delayseconds -le 120 ]; then
   /etc/init.d/keepalived stop
 fi
 exit 1 #bad
fi

# Get slave delay time and save it
$mysql -urepluser -prepluser -h$slave_host --connect_timeout=3 -e"select version();"
if [ $? -eq 0 ]; then
  delayseconds=`$mysql -urepluser -prepluser -h$slave_host --connect_timeout=3 -e"show slave status\G"|grep Seconds_Behind_Master|awk '{print \$2}'`
  if [[ "$delayseconds" =~ ^[0-9]+$ ]] ; then
     echo "$delayseconds" > $delay_file
  else
     echo "9999" > $delay_file
  fi
fi
exit 0 #good

/mysql/keepalived_master.sh : it will be called when the node becomes master

#!/bin/bash

my_host=`hostname`
current_date=`/bin/date +"%b %d %H:%M:%S"`
From="$my_host"
mail_list=alexzeng@wordpress.com

Subject="$my_host is MASTER"
Msgboday="$current_date : mysql.wordpress.com is online at $my_host"
echo "$Msgboday" | /usr/bin/mailx  -s "$Subject" "$mail_list"

/mysql/keepalived_backup.sh : it will be called when the node becomes slave

#!/bin/bash

my_host=`hostname`
current_date=`/bin/date +"%b %d %H:%M:%S"`
From="$my_host"
mail_list=alexzeng@wordpress.com

Subject="$my_host is BACKUP"
Msgboday="$current_date : mysql.wordpress.com is offline at $my_host"
echo "$Msgboday" | /usr/bin/mailx  -s "$Subject" "$mail_list"

4. Start keepalived at both nodes

service keepalived start
or 
/etc/init.d/keepalived start

Check its log file at /var/log/messages

5. Test it
Scenarios:
A. Stop MySQL at the master node
B. Shutdown master node network
C. Shutdown master node OS
D. Split-brain (the nodes cannot connect to each other) – In my test, keepalived didn’t do anything in this situation.

Check result:
1) Check emails
2) Check IP using ifconfig at both nodes
2) Connect to DB without stop:

 while true loop
 do
 date
 mysql -urepluser -prepluser -hmysql.wordpress.com -e"select @@hostname;"
 sleep 1
 done;

In my test, the db cannot be connected for just 2 seconds.

Mon Oct 29 22:30:51 GMT+7 2012
+---------------+
| @@hostname    |
+---------------+
| db01          | 
+---------------+
Mon Oct 29 22:30:52 GMT+7 2012
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql.wordpress.com' (111)
Mon Oct 29 22:30:53 GMT+7 2012
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql.wordpress.com' (111)
Mon Oct 29 22:30:54 GMT+7 2012
+---------------+
| @@hostname    |
+---------------+
| db02          | 
+---------------+

How-to setup mysql replication in simple steps

MySQL replication is quite straight forward. Here are the setup steps:

1. Create a user for replication at master.

mysql> GRANT REPLICATION SLAVE ON *.* TO 'repluser'@'%' identified by '';

2. Copy master data directory to slave.

For Innodb, use mysqldump with option –single-transaction to get a consistent dump (you can get the log file and position in the dumpfile), or shutdown the source DB to get a cold consistent copy.

For all other DB engine, following these steps:
1) Lock the tables at master

mysql> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)

mysql> show master status;
+----------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------+----------+--------------+------------------+
| bin-log.000074 | 915 | | |
+----------------+----------+--------------+------------------+
1 row in set (0.00 sec)

–Copy the log file name and position number, it will be used later

2) Copy the data over
$ rsync -avz /mysql/data :/mysql/data

Remove file master.info and relay-log.info if you build the slave from Master.

3) After the copy is done, unlock tables at Master:

mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)

or you can exist the session which will release the lock as well.

3. Setup slave

If slave binary is not setup, you can also copy them from master.

1) Startup mysql
a. Remember change the server_id to a different value in my.cnf
b. Binary logging is required for replication. And relay-log is recommended.
log-bin=mysql-bin
relay-log = relay-log
relay-log-index = relay-log.index

2) Set master

mysql> change master to
    ->                     master_host='alexzeng.wordpress.com',
    ->                     master_user='repluser',
    ->                     master_password='<password>',
    ->                     master_port = 3306,
    ->                     master_log_file='bin-log.000074',
    ->                     master_log_pos=915;
Query OK, 0 rows affected (0.00 sec)
--use the log file name and postion we got previously

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: alexzeng.wordpress.com
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin-log.000074
          Read_Master_Log_Pos: 1199
               Relay_Log_File: relay-log.000003
                Relay_Log_Pos: 533
        Relay_Master_Log_File: bin-log.000074
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1199
              Relay_Log_Space: 682
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)

Now the Master-Slave replication is setup, is it easy 🙂

In addition,

A. If you want to setup Master-Master replication, continue the following steps:
At slave :

mysql>  show master status;
+----------------+----------+--------------+------------------+
| File           | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+----------------+----------+--------------+------------------+
| bin-log.000077 |      106 |              |                  |
+----------------+----------+--------------+------------------+
1 row in set (0.00 sec)

At master:

mysql> change master to
    ->                     master_host='alexzeng2.wordpress.com',
    ->                     master_user='repluser',
    ->                     master_password='<password>',
    ->                     master_port = 3306,
    ->                     master_log_file='bin-log.000077',
    ->                     master_log_pos=106;
Query OK, 0 rows affected (0.03 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

B. If you want to setup a slave from an existing slave, using the following steps:

* Shutdown the existing slave db

* Copy the data to new slave:
$ rsync -avz /mysql/data <target_host>:/mysql/data
--Keep file master.info and relay-log.info if you build the slave from another slave.

* Startup mysql at new slave.
make sure it uses a different server_id in my.cnf
(cd $MYSQL_BASE; ./bin/mysqld_safe &)
start slave;
--Do NOT need to run "change master to ..." statement as it already in master.info and relay-log.info

How-to fix MyISAM table after disk full

Recently, one of MySQL host run into disk full issue. This make one of the table corrupted.

mysql> show create table poller_output;
ERROR 1017 (HY000): Can't find file: 'poller_output' (errno: 2)

[cacti]# ls poller_output*
poller_output.frm

As you can see, the MYD, MYI file of the table are missed.

We can simply truncate the table to fix it:

mysql> truncate table poller_output;
Query OK, 0 rows affected (0.00 sec)

--probably you can use below commands to fix them if you need the data there.
SQL> check table poller_output;
SQL> repair table poller_output;

The table is back 🙂

mysql>  show create table poller_output;
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table         | Create Table                                                                                                                                                                                                                                                                                                             |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| poller_output | CREATE TABLE `poller_output` (
  `local_data_id` mediumint(8) unsigned NOT NULL default '0',
  `rrd_name` varchar(19) NOT NULL default '',
  `time` datetime NOT NULL default '0000-00-00 00:00:00',
  `output` text NOT NULL,
  PRIMARY KEY  (`local_data_id`,`rrd_name`,`time`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 | 
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
|        0 | 
+----------+
1 row in set (0.00 sec)

[cacti]#  ls poller_output*
poller_output.frm  poller_output.MYD  poller_output.MYI