How-to setup MySQL HA by using keepalived
October 31, 2012 9 Comments
With MySQL replication and keepalived, we can setup a quite robust high available MySQL environment in a few steps:
Environment:
Host1: db01.wordpress.com
Host2: db02.wordpress.com
DBVIP: mysql.wordpress.com 10.0.0.1
1. Setup MySQL Master-Master replication
Ref: Setup MySQL replication
2. Install keepalived at both hosts
--using apt-get, for Ubuntu apt-get install keepalived --using yum, for Redhat yum install keepalive
3. Config keepalived
1) Add keepalived config file /etc/keepalived/keepalived.conf
Config file for host db01:
! Configuration File for keepalived global_defs { notification_email { alexzeng@wordpress.com } notification_email_from alexzeng@wordpress.com smtp_server mx.wordpress.com smtp_connect_timeout 30 router_id mysql-ha } vrrp_script check_mysql { script "/mysql/keepalived_check.sh db02.wordpress.com" interval 2 weight 2 } vrrp_instance VI_1 { state BACKUP interface eth1 virtual_router_id 51 priority 100 advert_int 1 nopreempt # only needed on higher priority node authentication { auth_type PASS auth_pass 1111 } track_script { check_mysql } virtual_ipaddress { 10.0.0.1/24 dev eth1 label eth1:1 } notify_master /mysql/keepalived_master.sh notify_backup /mysql/keepalived_backup.sh }
Config file of host db02:
Copy the config file in db01, and change this line: From script "/mysql/keepalived_check.sh db02.wordpress.com" to script "/mysql/keepalived_check.sh db01.wordpress.com"
2) Add scripts to both nodes
/mysql/keepalived_check.sh : monitor MySQL (for the host/network down, keepalived has internal mechanism to monitor them)
#!/bin/bash # monitor mysql status # if this node mysql is dead and its slave delay less than 120 seconds, then stop its keepalived. The other node will bind the IP. export MYSQL_HOME=/mysql export PATH=$MYSQL_HOME/bin:$PATH mysql="$MYSQL_HOME/bin/mysql" delay_file="$MYSQL_HOME/slave_delay_second.log" slave_host=$1 $mysql -u root --connect_timeout=3 --execute="select version();" if [ $? -ne 0 ]; then delayseconds=`cat $delay_file` if [ $delayseconds -le 120 ]; then /etc/init.d/keepalived stop fi exit 1 #bad fi # Get slave delay time and save it $mysql -urepluser -prepluser -h$slave_host --connect_timeout=3 -e"select version();" if [ $? -eq 0 ]; then delayseconds=`$mysql -urepluser -prepluser -h$slave_host --connect_timeout=3 -e"show slave status\G"|grep Seconds_Behind_Master|awk '{print \$2}'` if [[ "$delayseconds" =~ ^[0-9]+$ ]] ; then echo "$delayseconds" > $delay_file else echo "9999" > $delay_file fi fi exit 0 #good
/mysql/keepalived_master.sh : it will be called when the node becomes master
#!/bin/bash my_host=`hostname` current_date=`/bin/date +"%b %d %H:%M:%S"` From="$my_host" mail_list=alexzeng@wordpress.com Subject="$my_host is MASTER" Msgboday="$current_date : mysql.wordpress.com is online at $my_host" echo "$Msgboday" | /usr/bin/mailx -s "$Subject" "$mail_list"
/mysql/keepalived_backup.sh : it will be called when the node becomes slave
#!/bin/bash my_host=`hostname` current_date=`/bin/date +"%b %d %H:%M:%S"` From="$my_host" mail_list=alexzeng@wordpress.com Subject="$my_host is BACKUP" Msgboday="$current_date : mysql.wordpress.com is offline at $my_host" echo "$Msgboday" | /usr/bin/mailx -s "$Subject" "$mail_list"
4. Start keepalived at both nodes
service keepalived start or /etc/init.d/keepalived start Check its log file at /var/log/messages
5. Test it
Scenarios:
A. Stop MySQL at the master node
B. Shutdown master node network
C. Shutdown master node OS
D. Split-brain (the nodes cannot connect to each other) – In my test, keepalived didn’t do anything in this situation.
Check result:
1) Check emails
2) Check IP using ifconfig at both nodes
2) Connect to DB without stop:
while true loop do date mysql -urepluser -prepluser -hmysql.wordpress.com -e"select @@hostname;" sleep 1 done;
In my test, the db cannot be connected for just 2 seconds.
Mon Oct 29 22:30:51 GMT+7 2012 +---------------+ | @@hostname | +---------------+ | db01 | +---------------+ Mon Oct 29 22:30:52 GMT+7 2012 ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql.wordpress.com' (111) Mon Oct 29 22:30:53 GMT+7 2012 ERROR 2003 (HY000): Can't connect to MySQL server on 'mysql.wordpress.com' (111) Mon Oct 29 22:30:54 GMT+7 2012 +---------------+ | @@hostname | +---------------+ | db02 | +---------------+
Hi!
I don’t get it, why are both keepalived.conf having the same priority? And state (BACKUP)? Shouldn’t the one with higher priority be prefered master and slave should take over only when prefered master looses the points?
I’ve tried to setup my system as specified above, but no luck: I can’t seem to be able to get the server01 to be prefered server.
In my case, I just let it keep the same. Once the primary is down, the slave will take the role. The VIP will not failover back to the old primary even it’s up later.
Regards,
Alex
What happens to database connections which were established prior to a failover? Do they get interrupted and have to be re-established?
They need to be re-established. If you use mysql client, it will auto re-connect to the new host.
-Alex
okies thanks.
With this guid, MySQL server just right like as title of thread is HA (failed-over). Good for repication Master-Master
But I want to config it as Load Balancing sharing. If have 2 request at concurent:
_1 request will go to MySQL 1
_1 request will go to MySQL2
How can I config that with KeepAlived?
No luck… I cannot ping the VIP after setup with the configurations in this blog… use “ip add” command cannot show the VIP on both server.
Okey, I found the problem… I have another pair LVS + Keepalived directors which have the same virtual_router_id as my MySQL-HA pair. That will cause errors in /var/log/syslog like:
Mar 9 07:32:52 keepalivet2 Keepalived_vrrp: VRRP_Instance(VI_1) Dropping received VRRP packet…
Mar 9 07:32:53 keepalivet2 Keepalived_vrrp: ip address associated with VRID not present in received packet : 1992032266
Mar 9 07:32:53 keepalivet2 Keepalived_vrrp: one or more VIP associated with VRID mismatch actual MASTER advert
Mar 9 07:32:53 keepalivet2 Keepalived_vrrp: bogus VRRP packet received on eth0 !!!
just change the virtual_router_id value to another one will fix the problem.
When the keepalived_check.sh fails I am having problems getting keepalived process to stop:
” Process [6393] didn’t respond to SIGTERM”