How-to setup MySQL HA by using keepalived

With MySQL replication and keepalived, we can setup a quite robust high available MySQL environment in a few steps:

Environment:
Host1: db01.wordpress.com
Host2: db02.wordpress.com
DBVIP: mysql.wordpress.com 10.0.0.1

1. Setup MySQL Master-Master replication
Ref: Setup MySQL replication

2. Install keepalived at both hosts

--using apt-get, for Ubuntu
apt-get install keepalived

--using yum, for Redhat
yum install keepalive

3. Config keepalived

1) Add keepalived config file /etc/keepalived/keepalived.conf
Config file for host db01:

! Configuration File for keepalived
global_defs {
      notification_email {
        alexzeng@wordpress.com
      }
      notification_email_from alexzeng@wordpress.com
      smtp_server mx.wordpress.com
      smtp_connect_timeout 30
      router_id mysql-ha
      }

vrrp_script check_mysql {
   script "/mysql/keepalived_check.sh db02.wordpress.com"
   interval 2
   weight 2
}

vrrp_instance VI_1 {
      state BACKUP
      interface eth1
      virtual_router_id 51
      priority 100
      advert_int 1
      nopreempt  # only needed on higher priority node
      authentication {
      auth_type PASS
      auth_pass 1111
      }

      track_script {
        check_mysql
      }
      virtual_ipaddress {
        10.0.0.1/24 dev eth1 label eth1:1
      }
      notify_master /mysql/keepalived_master.sh
      notify_backup /mysql/keepalived_backup.sh
}

Config file of host db02:

Copy the config file in db01, and change this line:
From
   script "/mysql/keepalived_check.sh db02.wordpress.com"
to
   script "/mysql/keepalived_check.sh db01.wordpress.com"

2) Add scripts to both nodes
/mysql/keepalived_check.sh : monitor MySQL (for the host/network down, keepalived has internal mechanism to monitor them)

#!/bin/bash
# monitor mysql status
# if this node mysql is dead and its slave delay less than 120 seconds, then stop its keepalived. The other node will bind the IP.

export MYSQL_HOME=/mysql
export PATH=$MYSQL_HOME/bin:$PATH

mysql="$MYSQL_HOME/bin/mysql"
delay_file="$MYSQL_HOME/slave_delay_second.log"
slave_host=$1

$mysql -u root --connect_timeout=3 --execute="select version();"

if [ $? -ne 0 ]; then
 delayseconds=`cat $delay_file`
 if [ $delayseconds -le 120 ]; then
   /etc/init.d/keepalived stop
 fi
 exit 1 #bad
fi

# Get slave delay time and save it
$mysql -urepluser -prepluser -h$slave_host --connect_timeout=3 -e"select version();"
if [ $? -eq 0 ]; then
  delayseconds=`$mysql -urepluser -prepluser -h$slave_host --connect_timeout=3 -e"show slave status\G"|grep Seconds_Behind_Master|awk '{print \$2}'`
  if [[ "$delayseconds" =~ ^[0-9]+$ ]] ; then
     echo "$delayseconds" > $delay_file
  else
     echo "9999" > $delay_file
  fi
fi
exit 0 #good

/mysql/keepalived_master.sh : it will be called when the node becomes master

#!/bin/bash

my_host=`hostname`
current_date=`/bin/date +"%b %d %H:%M:%S"`
From="$my_host"
mail_list=alexzeng@wordpress.com

Subject="$my_host is MASTER"
Msgboday="$current_date : mysql.wordpress.com is online at $my_host"
echo "$Msgboday" | /usr/bin/mailx  -s "$Subject" "$mail_list"

/mysql/keepalived_backup.sh : it will be called when the node becomes slave

#!/bin/bash

my_host=`hostname`
current_date=`/bin/date +"%b %d %H:%M:%S"`
From="$my_host"
mail_list=alexzeng@wordpress.com

Subject="$my_host is BACKUP"
Msgboday="$current_date : mysql.wordpress.com is offline at $my_host"
echo "$Msgboday" | /usr/bin/mailx  -s "$Subject" "$mail_list"

4. Start keepalived at both nodes

service keepalived start
or 
/etc/init.d/keepalived start

Check its log file at /var/log/messages

5. Test it
Scenarios:
A. Stop MySQL at the master node
B. Shutdown master node network
C. Shutdown master node OS
D. Split-brain (the nodes cannot connect to each other) – In my test, keepalived didn’t do anything in this situation.

Check result:
1) Check emails
2) Check IP using ifconfig at both nodes
2) Connect to DB without stop:

 while true loop
 do
 date
 mysql -urepluser -prepluser -hmysql.wordpress.com -e"select @@hostname;"
 sleep 1
 done;

In my test, the db cannot be connected for just 2 seconds.

Mon Oct 29 22:30:51 GMT+7 2012
+---------------+
| @@hostname    |
+---------------+
| db01          | 
+---------------+
Mon Oct 29 22:30:52 GMT+7 2012
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql.wordpress.com' (111)
Mon Oct 29 22:30:53 GMT+7 2012
ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql.wordpress.com' (111)
Mon Oct 29 22:30:54 GMT+7 2012
+---------------+
| @@hostname    |
+---------------+
| db02          | 
+---------------+

About Alex Zeng
I would be very happy if this blog can help you. I appreciate every honest comments. Please forgive me if I'm too busy to reply your comments in time.

9 Responses to How-to setup MySQL HA by using keepalived

  1. Marama says:

    Hi!
    I don’t get it, why are both keepalived.conf having the same priority? And state (BACKUP)? Shouldn’t the one with higher priority be prefered master and slave should take over only when prefered master looses the points?

    I’ve tried to setup my system as specified above, but no luck: I can’t seem to be able to get the server01 to be prefered server.

    • Alex Zeng says:

      In my case, I just let it keep the same. Once the primary is down, the slave will take the role. The VIP will not failover back to the old primary even it’s up later.
      Regards,
      Alex

  2. Stephen says:

    What happens to database connections which were established prior to a failover? Do they get interrupted and have to be re-established?

  3. tquang says:

    With this guid, MySQL server just right like as title of thread is HA (failed-over). Good for repication Master-Master

    But I want to config it as Load Balancing sharing. If have 2 request at concurent:
    _1 request will go to MySQL 1
    _1 request will go to MySQL2

    How can I config that with KeepAlived?

  4. xtium says:

    No luck… I cannot ping the VIP after setup with the configurations in this blog… use “ip add” command cannot show the VIP on both server.

    • xtium says:

      Okey, I found the problem… I have another pair LVS + Keepalived directors which have the same virtual_router_id as my MySQL-HA pair. That will cause errors in /var/log/syslog like:

      Mar 9 07:32:52 keepalivet2 Keepalived_vrrp: VRRP_Instance(VI_1) Dropping received VRRP packet…
      Mar 9 07:32:53 keepalivet2 Keepalived_vrrp: ip address associated with VRID not present in received packet : 1992032266
      Mar 9 07:32:53 keepalivet2 Keepalived_vrrp: one or more VIP associated with VRID mismatch actual MASTER advert
      Mar 9 07:32:53 keepalivet2 Keepalived_vrrp: bogus VRRP packet received on eth0 !!!

      just change the virtual_router_id value to another one will fix the problem.

  5. ajames says:

    When the keepalived_check.sh fails I am having problems getting keepalived process to stop:

    ” Process [6393] didn’t respond to SIGTERM”

Leave a comment