How to rebuild a MongoDB node

When a secondary MongoDB node has fallen too far behind to catch up with the primary, or when we want to shrink its data files, we can simply rebuild the node as follows:

  1. Make sure a majority of the nodes will still be working fine after we shut down the one to be rebuilt.
  2. Shut down mongod on that node. (If it is the primary, run rs.stepDown() first to change its role to secondary.)
  3. Check the replica set status from another node with rs.status(); there should be exactly 1 primary.
  4. Remove all files and directories under dbpath.
  5. Start mongod; it will resync from the other members by itself.
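
Step 1 can be checked before touching anything. The following is only a minimal sketch: it assumes we have the output of rs.status() as a Python dict (for example fetched with a driver or pasted from the shell), that each member has exactly one vote, and that the `health` and `stateStr` fields look the way they do in rs.status(). The function name `majority_survives` is made up for this example.

```python
from typing import Any, Dict

# Member states that still count toward a voting majority.
# Assumption: every member listed has exactly one vote.
VOTING_STATES = {"PRIMARY", "SECONDARY", "ARBITER"}


def majority_survives(status: Dict[str, Any], node_to_rebuild: str) -> bool:
    """Return True if a strict majority of members stays healthy
    after node_to_rebuild is shut down."""
    members = status["members"]
    remaining_healthy = [
        m for m in members
        if m["name"] != node_to_rebuild
        and m.get("health") == 1
        and m.get("stateStr") in VOTING_STATES
    ]
    # More than half of ALL configured members must remain available.
    return len(remaining_healthy) > len(members) // 2


if __name__ == "__main__":
    example_status = {
        "members": [
            {"name": "a:27017", "health": 1, "stateStr": "PRIMARY"},
            {"name": "b:27017", "health": 1, "stateStr": "SECONDARY"},
            {"name": "c:27017", "health": 1, "stateStr": "SECONDARY"},
        ]
    }
    print(majority_survives(example_status, "c:27017"))
```

If this returns False, shutting the node down would leave the set without a primary, so the rebuild should wait until another member is healthy again.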

If a collection has a big index, the secondary has to rebuild that index while it resyncs. This can take a very long time and may never complete, as in the following log:

Tue Jun 26 11:27:20 [rsSync] build index zeng.AlexTable { MD5: 1 }
                   3000000/773362927       0%
                   6000000/773362927       0%
                   9000000/773362927       1%
    ...
                   769775300/773362927   99%
                   773000000/773362927   99%
Tue Jun 26 12:08:09 [rsSync]          external sort used : 774 files  in 2448 secs
                   107600/773362927          0%
                   267900/773362927          0%
                   423800/773362927          0%
…
Tue Jun 26 17:34:06 [initandlisten] connection accepted from 10.254.242.244:52894 #373
Tue Jun 26 17:34:06 [conn373] end connection 10.254.242.244:52894
                   324598300/773362927   41%
                   324763700/773362927   41%

In this case, we can restore the secondary from a cold backup of another node as follows:

  1. Shut down one node to copy from (a secondary is preferred; if no secondary is in good status, we need to shut down the primary). If no primary will be available after shutting down that node, we need to notify the customer beforehand.
  2. Remove all files and directories under dbpath on the node being rebuilt.
  3. Copy all files from the dbpath of the node that was shut down; the journal directory and mongod.lock are not needed.
  4. Start mongod and check mongod.log to make sure it works.
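
Steps 2 and 3 can be scripted roughly like this. It is a sketch under some assumptions: both dbpath directories are reachable as local paths (for example the source is mounted or was already transferred to the target host), mongod is stopped on both sides, and Python 3.8 or later is available for `dirs_exist_ok`. The function names are made up for this example.

```python
import shutil
from pathlib import Path


def clear_dbpath(dbpath: Path) -> None:
    """Remove every file and directory under dbpath, keeping dbpath itself."""
    for entry in dbpath.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def copy_cold_backup(source_dbpath: Path, target_dbpath: Path) -> None:
    """Empty the target dbpath, then copy the source data files into it,
    skipping the journal directory and mongod.lock."""
    clear_dbpath(target_dbpath)
    shutil.copytree(
        source_dbpath,
        target_dbpath,
        # ignore_patterns matches names at any depth of the tree.
        ignore=shutil.ignore_patterns("journal", "mongod.lock"),
        dirs_exist_ok=True,
    )
```

Skipping mongod.lock during the copy means there is nothing to delete afterwards on the target.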

If you copied mongod.lock from the source and did not delete it before starting the target node, you will get this error in mongod.log:

**************
old lock file: /data/mongo/data/mongod.lock.  probably means unclean shutdown,
but there are no journal files to recover.
this is likely human error or filesystem corruption.
found 4 dbs.
see: http://dochub.mongodb.org/core/repair for more information
*************
Wed Jun 27 23:21:00 [initandlisten] exception in initAndListen: 12596 old lock file, terminating
Wed Jun 27 23:21:00 dbexit:
Wed Jun 27 23:21:00 [initandlisten] shutdown: going to close listening sockets...
Wed Jun 27 23:21:00 [initandlisten] shutdown: going to flush diaglog...
Wed Jun 27 23:21:00 [initandlisten] shutdown: going to close sockets...
Wed Jun 27 23:21:00 [initandlisten] shutdown: waiting for fs preallocator...
Wed Jun 27 23:21:00 [initandlisten] shutdown: lock for final commit...
Wed Jun 27 23:21:00 [initandlisten] shutdown: final commit...
Wed Jun 27 23:21:00 [initandlisten] shutdown: closing all files...
Wed Jun 27 23:21:00 [initandlisten] closeAllFiles() finished
Wed Jun 27 23:21:00 dbexit: really exiting now
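
A small check before starting mongod can catch this situation. This is only a sketch: it assumes that what mongod complains about is a non-empty mongod.lock in a dbpath that has no journal files, which is what the message above suggests. The function name is made up for this example.

```python
from pathlib import Path


def stale_lock_without_journal(dbpath: Path) -> bool:
    """Return True if dbpath holds a non-empty mongod.lock but no journal files."""
    lock = dbpath / "mongod.lock"
    journal = dbpath / "journal"
    has_lock_content = lock.is_file() and lock.stat().st_size > 0
    has_journal_files = journal.is_dir() and any(journal.iterdir())
    return has_lock_content and not has_journal_files
```

If it returns True for the target dbpath, delete the copied mongod.lock before starting mongod.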

For more information, see the online MongoDB documentation: expand-replica-set.

About Alex Zeng
I would be very happy if this blog helps you. I appreciate every honest comment. Please forgive me if I'm too busy to reply to your comments in time.
