How to rebuild a MongoDB node
June 28, 2012
When a secondary MongoDB node has fallen too far behind to catch up with the primary, or when we want to shrink the data files, we can rebuild the node as follows:
- Make sure a majority of the nodes will still be working if we shut down the one to be rebuilt
- Shut down the mongod (for a primary: run rs.stepDown() first to change its role to secondary)
- Check the replica set status on another node with rs.status(); there should be exactly 1 primary
- Remove all files and directories under dbpath
- Start mongod; it will recover by itself through an initial sync from another member
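The first and fourth steps above can be sketched in Python. This is only an illustration under stated assumptions: `majority_survives` and `clear_dbpath` are hypothetical helper names, the `members` list is assumed to be shaped like the `members` array of rs.status() (only the `name`, `health` and `stateStr` fields are used), and every member is assumed to hold exactly one vote.

```python
import shutil
from pathlib import Path

# stateStr values treated as "working fine" in this sketch.
HEALTHY_STATES = {"PRIMARY", "SECONDARY", "ARBITER"}


def majority_survives(members, node_to_rebuild):
    """Return True if a majority stays healthy once the node is shut down.

    `members` is a list of dicts shaped like rs.status().members.
    Assumption: each member has exactly one vote.
    """
    total = len(members)
    remaining = [
        m for m in members
        if m["name"] != node_to_rebuild
        and m.get("health") == 1
        and m.get("stateStr") in HEALTHY_STATES
    ]
    # A strict majority of ALL members must remain, not just of the survivors.
    return len(remaining) > total // 2


def clear_dbpath(dbpath):
    """Delete everything inside dbpath but keep the directory itself.

    Only call this after mongod on this node has been shut down.
    """
    for entry in Path(dbpath).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
```

A caller would run `majority_survives(...)` against the status reported by another node before shutting anything down, and call `clear_dbpath(...)` only when the check passes and the mongod is stopped.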
If there is a big index, the secondary will need to rebuild it during the sync. This can take a very long time and may never complete, as in the following log:
    Tue Jun 26 11:27:20 [rsSync] build index zeng.AlexTable { MD5: 1 }
    3000000/773362927 0%
    6000000/773362927 0%
    9000000/773362927 1%
    ...
    769775300/773362927 99%
    773000000/773362927 99%
    Tue Jun 26 12:08:09 [rsSync] external sort used : 774 files in 2448 secs
    107600/773362927 0%
    267900/773362927 0%
    423800/773362927 0%
    …
    Tue Jun 26 17:34:06 [initandlisten] connection accepted from 10.254.242.244:52894 #373
    Tue Jun 26 17:34:06 [conn373] end connection 10.254.242.244:52894
    324598300/773362927 41%
    324763700/773362927 41%
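To judge whether such an index build is still moving, the progress counters can be pulled out of the log text. The sketch below is a rough illustration: `latest_progress` is a hypothetical helper, and the regular expression simply assumes the `done/total percent%` shape seen in the excerpt above.

```python
import re

# Matches progress entries such as "324763700/773362927 41%".
PROGRESS = re.compile(r"\b(\d+)/(\d+)\s+(\d+)%")


def latest_progress(log_text):
    """Return (done, total) from the last progress entry, or None if absent."""
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    done, total, _percent = matches[-1]
    return int(done), int(total)
```

Calling it twice a few minutes apart on the tail of mongod.log and comparing the `done` values gives a crude rate, which helps decide whether to keep waiting or to switch to the cold backup approach.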
In this case, we can restore the secondary from a cold backup as follows:
- Shut down one healthy node to use as the copy source (a secondary is preferred; if no secondary is in good status, we need to shut down the primary). If no primary will be available after the shutdown, we need to notify the customer beforehand.
- Remove all files and directories under dbpath on the node being rebuilt
- Copy all files from the dbpath of the node that was shut down; the journal directory and mongod.lock are not needed
- Start up mongod and check mongod.log to make sure it works
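The copy step can be sketched in Python as follows. This is a minimal illustration, not a definitive procedure: `copy_dbpath` is a hypothetical helper, and it assumes the source node's dbpath is reachable as a local path (for example through a mounted volume); copying between hosts would need a network copy tool instead.

```python
import shutil

# Entries the steps above say are not needed on the target node.
SKIP = ("journal", "mongod.lock")


def copy_dbpath(source_dbpath, target_dbpath):
    """Copy data files from a shut-down node's dbpath, skipping SKIP entries.

    Note: ignore_patterns matches these names at any depth of the tree.
    """
    shutil.copytree(
        source_dbpath,
        target_dbpath,
        ignore=shutil.ignore_patterns(*SKIP),
        dirs_exist_ok=True,  # the target dbpath usually exists (Python 3.8+)
    )
```

Skipping mongod.lock during the copy avoids having to delete it by hand on the target before startup.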
If you copied mongod.lock from the source and didn’t delete it before starting the target node, you will get this error in mongod.log:
    **************
    old lock file: /data/mongo/data/mongod.lock. probably means unclean shutdown,
    but there are no journal files to recover.
    this is likely human error or filesystem corruption.
    found 4 dbs.
    see: http://dochub.mongodb.org/core/repair for more information
    *************
    Wed Jun 27 23:21:00 [initandlisten] exception in initAndListen: 12596 old lock file, terminating
    Wed Jun 27 23:21:00 dbexit:
    Wed Jun 27 23:21:00 [initandlisten] shutdown: going to close listening sockets...
    Wed Jun 27 23:21:00 [initandlisten] shutdown: going to flush diaglog...
    Wed Jun 27 23:21:00 [initandlisten] shutdown: going to close sockets...
    Wed Jun 27 23:21:00 [initandlisten] shutdown: waiting for fs preallocator...
    Wed Jun 27 23:21:00 [initandlisten] shutdown: lock for final commit...
    Wed Jun 27 23:21:00 [initandlisten] shutdown: final commit...
    Wed Jun 27 23:21:00 [initandlisten] shutdown: closing all files...
    Wed Jun 27 23:21:00 [initandlisten] closeAllFiles() finished
    Wed Jun 27 23:21:00 dbexit: really exiting now
For more information, see the online MongoDB documentation: expand-replica-set