Cacti host ran out of capacity

Our cacti host in production ran out of capacity recently. We use Cacti to create graphs for MySQL (including InnoDB and Memory engine databases) and MongoDB. The key benefit of Cacti is that it's easy for users to understand, and our developers can check DB performance metrics by themselves. It's also easy for DBAs to set up, because it doesn't require installing or maintaining agents on each DB host.

The easy setup also causes a problem: all poller actions have to be done on the Cacti server. We ran into a performance problem about a year ago: the poller could not finish all poll items within 1 minute. I replaced the PHP poller with Spine, which is written in native C and more powerful, and things worked fine again for a while. As more and more hosts were added, I increased "Maximum Concurrent Poller Processes" and "Maximum Threads per Process" in step, and Cacti kept managing to finish its polls within 1 minute.

At the same time, the host's load kept increasing, and recently reached 45 on this physical host with 24 virtual CPUs. It has started to time out for some hosts. I tried to adjust "Maximum Concurrent Poller Processes" and "Maximum Threads per Process" again, but it didn't help. A load of 45 is already far above the 24-CPU count; the host is simply overloaded. We could scale up by moving Cacti to a more powerful host, but that wouldn't solve the scale-out problem: it would run into the same wall sooner or later.

At this point, Cacti handles ~800 hosts with 18k data sources and 18k RRDs per minute. "Maximum Concurrent Poller Processes" is 3 and "Maximum Threads per Process" is 60. It finishes each round in 57 seconds on average. The server CPU model is "Intel(R) Xeon(R) CPU X5670 @ 2.93GHz", 2 physical CPUs with 24 vCPUs.
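
One way to watch the per-round timing is to grep the SYSTEM STATS lines the poller writes to cacti.log (an example of such a line appears in the Spine post further down). A minimal sketch, assuming CACTI_HOME points at your install:

# show the duration of the last few polling rounds (CACTI_HOME is an example, adjust to your install)
CACTI_HOME=/export/home/cacti
grep "SYSTEM STATS" $CACTI_HOME/log/cacti.log | tail -5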

Although Cacti has "Distributed and remote polling" on its roadmap, the release date is unknown. That would help solve the problem of putting all the load on a standalone host. We decided to stop adding more hosts to Cacti and pursue another solution.

Debug and fix a problem of missing Cacti graph trees and hosts

Recently we ran into a problem where some trees and hosts suddenly disappeared from the graph page, even though they still existed in the Management "Graph Trees" list.

I googled around and didn't find a useful clue, so I debugged the problem myself. I checked around and opened the graph page to see if there was any error. My guess was that it might be caused by a JavaScript error that the web browser silently ignored, so I opened Chrome's console and found the following error:

[screenshot: cacti_error - the JavaScript error shown in Chrome's console]

It showed something was wrong with the host "crp-wikidbstg02_3307". So the problem was caused or triggered by this host: either it hit a Cacti bug, or the host has some flaw in its configuration. One way would be to check the detailed config data of this host in Cacti and find the flaw. I chose the other, easier way: just delete the host and let the auto-add job re-add it later. As expected, the missing trees and hosts came back on the graph page, and stayed there even after the host was re-added.

If you run into the same symptom, you may try this approach to see if it's a similar problem.

A script to debug cacti

My check_cacti.sh reports warnings like these frequently.

01/09/2013 05:50:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
01/09/2013 05:53:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
01/09/2013 05:56:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
01/09/2013 05:59:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate

It basically means some pollers cannot finish in time, but it doesn't tell you which poller items are the culprits.

So I wrote the script below to find them.

#!/bin/bash

if [ $# -lt 1 ]; then
cat <<EOF
  Check long-running cacti poller commands
  usage:  debug_cacti.sh <threshold_seconds> [thresh_minutes]
  examples:
          debug_cacti.sh 30   #check commands running longer than 30 seconds, for 3 minutes
          debug_cacti.sh 30 5 #check commands running longer than 30 seconds, for 5 minutes
EOF
  exit 1
fi

# arguments
thresh_sec="$1"
thresh_min="$2"
if [ "alex$thresh_min" = "alex" ] ; then
   thresh_min=3
fi

echo "Check poller commands longer than $thresh_sec seconds in the next $thresh_min minutes..."
echo "Start time : `date +%M:%S`"
start_min=`date +%M`
min=`date +%M`
elapsed_min=0
while [ $elapsed_min -lt $thresh_min ]
do
  # sleep until we are $thresh_sec seconds into the current minute
  sec=`date +%S`
  sleep_sec=`expr $thresh_sec - $sec + 1`
  if [ $sleep_sec -gt 0 ] ; then
    echo "Sleep $sleep_sec seconds "
    sleep $sleep_sec
  fi

  # any php poller still running this late into the minute is a slow one - list it every 3 seconds
  sec=`date +%S`
  while [ $sec -gt $thresh_sec ]
  do
    min=`date +%M`
    echo "Time : $min:$sec"
    ps -ef |grep php |grep -v grep
    sleep 3
    sec=`date +%S`
  done

  # calculate elapsed time
  min=`date +%M`
  elapsed_min=`expr $min - $start_min`
  if [ $elapsed_min -lt 0 ] ; then
     # the minute counter wrapped around at 60
     elapsed_min=`expr $elapsed_min + 60`
  fi
done

Here is one of my outputs. It shows that the poller on the Cassandra host is slow (it's the last one still running).

./debug_cacti.sh 30
Check poller command longer than 30 at next 3 minutes...
Start time : 29:
Time = 29:31
root     27597     1  0 23:28 ?        00:00:00 /usr/bin/php -q /export/home/cacti-0.8.7g/cmd.php 0 67
root     29338 29337  0 23:29 ?        00:00:00 /bin/sh -c php /var/www/html/cacti/poller.php > /dev/null  2>&1 
root     29339 29338  1 23:29 ?        00:00:00 php /var/www/html/cacti/poller.php
root     30795 27597  0 23:29 ?        00:00:00 php /export/home/cacti-0.8.7g/scripts/ss_get_cassandra_stats.php --host sharedcass.alexzeng.wordpress.com --port --user --pass --items dp
Time = 29:34
root     27597     1  0 23:28 ?        00:00:00 /usr/bin/php -q /export/home/cacti-0.8.7g/cmd.php 0 67
root     30981 27597  0 23:29 ?        00:00:00 php /export/home/cacti-0.8.7g/scripts/ss_get_cassandra_stats.php --host sharedcass.alexzeng.wordpress.com --port --user --pass --items dc,dd
Time = 29:37
Time = 29:40
Time = 29:43
Time = 29:46
Time = 29:49
Time = 29:52
Time = 29:55
Time = 29:58
...

Now I know where the problem is. That's halfway to success.
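
Once a suspect shows up in that list, it's easy to confirm by timing the same command by hand outside of Cacti. The command below is copied from the ps output above; the port/user/pass values were blank in my output, so the placeholders must be replaced with your own:

# time the suspect poller script manually (placeholders are not real values)
time php /export/home/cacti-0.8.7g/scripts/ss_get_cassandra_stats.php \
  --host sharedcass.alexzeng.wordpress.com --port <port> --user <user> --pass <pass> --items dp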

A script to check cacti

A simple script to check whether cacti works well.

#!/bin/bash

if [ $# -lt 2 ]; then
cat <<EOF
  Check cacti log
  usage:  check_cacti.sh <CACTI_HOME> <lines_to_check>
  examples:      
          check_cacti.sh /export/home/cacti 30   #check last 30 lines in cacti log
EOF
  exit 1
fi

# arguments
CACTI_HOME="$1"
lines_to_check="$2"

# configuration
MAIL_TO="alexzeng@wordpress.com"
CACTI_LOG="$CACTI_HOME/log/cacti.log"
CHECK_LOG="$CACTI_HOME/log/check_cacti.log"

echo "`date` : Start" > $CHECK_LOG
echo ""  >> $CHECK_LOG

#check if the logfile get updated
current=`date +%s`
last_modified=`stat -c "%Y" $CACTI_LOG`
if [ $(($current-$last_modified)) -gt 3600 ]; then 
  title="CACTI log didn't get updated in last hour"; 
  ls -lt $CACTI_LOG >> $CHECK_LOG
fi

#check warning or error msg in log, only if cacti log is updated
if [ "alex$title" = "alex" ] ; then
  warning=`tail -$lines_to_check $CACTI_LOG | grep -i -e warning -e error`
  if [ "alex$warning" = "alex" ] ; then
    echo "No warning or error msg" >> $CHECK_LOG
  else
    title="found warning/error in $CACTI_LOG"
    echo "$warning" >> $CHECK_LOG
  fi
fi

echo ""  >> $CHECK_LOG
echo "`date` : End" >> $CHECK_LOG

#send mail
if [ "alex$title" != "alex" ] ; then
  title="$title at `hostname`"
  mailx -s "$title" $MAIL_TO < $CHECK_LOG
fi

Add a cron job to check cacti every hour.

#check cacti
0 * * * *   /export/home/cacti/cli/check_cacti.sh /export/home/cacti 30 

Backup Cacti script

This is a simple script to back up Cacti.

#!/bin/bash

if [ $# -lt 2 ]; then
cat <<EOF
  Backup cacti binary
  usage:  backup_cacti.sh <BACKUP_DIR> <BACKUP_KEEP_DAYS>
  examples:      
          backup_cacti.sh /data01/cacti/cacti_backup 7       --backup cacti directory, and keep 7 days
EOF
  exit 1
fi

# arguments
BACKUP_DIR="$1"
BACKUP_KEEP_DAYS="$2"

# configuration
#The cacti home should be the real home, not softlink, otherwise tar will only tar the softlink
CACTI_DIR="cacti-0.8.7g"
CACTI_BASE="/home/mysql"
CACTI_HOME="$CACTI_BASE/$CACTI_DIR"
MAIL_TO="alexzeng@wordpress.com"

# prepare backup
DATE=`date +"%Y.%m.%d"`
BACKUP_FILE="$BACKUP_DIR/cacti.$DATE.tar.gz"

# convert rrd files to xml files, which can be restored back to rrd files later
cd $CACTI_HOME/rra
mkdir -p $CACTI_HOME/rraxml
rm -f $CACTI_HOME/rraxml/*
# only back up rrd files updated in the last 7 days
for f in `find *.rrd -type f -mtime -7 `
do
  /usr/bin/rrdtool dump $f $CACTI_HOME/rraxml/$f.xml
done

# back up all cacti files except the rra directory, because raw rra files cannot be used directly for restore
cd $CACTI_BASE
tar --exclude='rra' -zcf $BACKUP_FILE $CACTI_DIR
if [ $? -ne 0 ]; then
  mailx -s "Backup cacti cron job $0 failed" $MAIL_TO <<EOF
Backup files list :
`ls -lt $BACKUP_DIR/cacti.*.tar.gz`
EOF
else
  cat /dev/null > $CACTI_HOME/log/cacti.log
fi

#delete old files
cd $BACKUP_DIR
for backup in `find . -ctime +$BACKUP_KEEP_DAYS -name "cacti.*.tar.gz"`; do rm -f $backup; done;

It converts the rra files to XML files for backup purposes (rra files are architecture-dependent, so copying them directly to another host may not work).
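
For restore, the reverse step is rrdtool restore, which turns each XML dump back into an .rrd file. A minimal sketch, assuming the same paths as in the backup script above:

#!/bin/bash
# rebuild .rrd files from the XML dumps created by the backup script above
# (move any old .rrd aside first if it still exists at the destination)
CACTI_HOME="/home/mysql/cacti-0.8.7g"
cd $CACTI_HOME/rraxml
for f in *.rrd.xml
do
  # strip the trailing .xml to get the original rrd file name back
  /usr/bin/rrdtool restore "$f" "$CACTI_HOME/rra/${f%.xml}"
done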

How to install spine for cacti

As cacti.net says, Spine is a fast replacement for cmd.php, so I started to install it for Cacti by following this doc: Spine install. I ran into lots of issues. Later, I found a better guide, Install and Configure Spine, but it's still not enough.

The solution, in short, is that we need to run ./bootstrap in the same directory before we run ./configure. The long story is that, in order to run bootstrap, we need to install a few more packages.
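
To save you reading the whole transcript, here is the condensed sequence that eventually worked for me on this yum-based host (every step below appears in the full output later in this post):

# condensed install steps (details and error messages are shown in the transcript below)
yum install dos2unix autoconf automake libtool
cd cacti-spine-0.8.8a
./bootstrap
./configure
make
make install
cp spine.conf.dist /etc/spine.conf    # then edit the DB connection settings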

In my testing, Spine did improve the performance.

cmd.php:
"Maximum Concurrent Poller Processes"=20: cannot finish in 1 minute, maximum CPU usage is 65%

Spine:
"Maximum Concurrent Poller Processes"=10 and "Maximum Threads per Process"=10: cannot finish in 1 minute, CPU usage reached 100%
"Maximum Concurrent Poller Processes"=5 and "Maximum Threads per Process"=10: finishes in 36 seconds, CPU usage reached 99%
"Maximum Concurrent Poller Processes"=3 and "Maximum Threads per Process"=20: finishes in 33 seconds, CPU usage is lower
"Maximum Concurrent Poller Processes"=2 and "Maximum Threads per Process"=30: finishes in 33 seconds, CPU usage is 97%
"Maximum Concurrent Poller Processes"=1 and "Maximum Threads per Process"=30: finishes in 44 seconds, maximum CPU usage is 85%
"Maximum Concurrent Poller Processes"=1 and "Maximum Threads per Process"=20: finishes in 44 seconds, maximum CPU usage is 82%
"Maximum Concurrent Poller Processes"=1 and "Maximum Threads per Process"=15: finishes in 56 seconds, maximum CPU usage is 60%
"Maximum Concurrent Poller Processes"=1 and "Maximum Threads per Process"=10: cannot finish in 1 minute, maximum CPU usage is 30%

Finally, I settled on 1 process with 20 threads. That meets the 1-minute limit, still leaves some buffer, and keeps CPU usage lower. Here is the log:
12/26/2012 03:17:47 AM – SYSTEM STATS: Time:45.7329 Method:spine Processes:1 Threads:20 Hosts:443 HostsPerProcess:443 DataSources:10714 RRDsProcessed:10714

From this data, my conclusions are as follows:

  1. If you change from cmd.php to Spine, set the number of processes to 1, and use your old cmd.php process count (or less) as the number of threads per process.
  2. Spine is a CPU killer; unless you have powerful enough CPUs, set "Maximum Concurrent Poller Processes" to 1.
  3. Increase the maximum threads until polling fits within the cycle time; once it does, adding more threads doesn't reduce the response time further.
  4. To reduce the response time further, use more processes with Spine, which comes at a higher CPU cost.

Here is the whole process:

-- In summary, packages installed:
autoconf
dos2unix
automake
libtool

# ./configure
checking build system type... Invalid configuration `x86_64-unknown-linux-': machine `x86_64-unknown-linux' not recognized
configure: error: /bin/sh config/config.sub x86_64-unknown-linux- failed

-- got a hint from this page http://forums.cacti.net/viewtopic.php?f=5&t=46320 : run bootstrap first

[root@phxdbx1112 cacti-spine-0.8.8a]# ./bootstrap
FATAL: Unable to locate dos2unix utility

[root@ cacti-spine-0.8.8a]# yum install dos2unix

[root@ cacti-spine-0.8.8a]# ./bootstrap
INFO: Starting Spine build process
INFO: Removing cache directories
INFO: Running auto-tools to verify buildability
./bootstrap: line 51: autoreconf: command not found
ERROR: 'autoreconf' exited with errors

[root@ cacti-spine-0.8.8a]# yum install autoconf

[root@ cacti-spine-0.8.8a]# ./bootstrap
INFO: Starting Spine build process
INFO: Removing cache directories
INFO: Running auto-tools to verify buildability
Can't exec "aclocal": No such file or directory at /usr/share/autoconf/Autom4te/FileUtils.pm line 326.
autoreconf: failed to run aclocal: No such file or directory
ERROR: 'autoreconf' exited with errors

[root@ cacti-spine-0.8.8a]# yum install automake

[root@ cacti-spine-0.8.8a]# ./bootstrap
INFO: Starting Spine build process
INFO: Removing cache directories
INFO: Running auto-tools to verify buildability
configure.ac:69: error: possibly undefined macro: AC_PROG_LIBTOOL
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
autoreconf: /usr/bin/autoconf failed with exit status: 1
ERROR: 'autoreconf' exited with errors

[root@ cacti-spine-0.8.8a]# yum install libtool

[root@ cacti-spine-0.8.8a]# ./bootstrap
INFO: Starting Spine build process
INFO: Removing cache directories
INFO: Running auto-tools to verify buildability
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/ltmain.sh'
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.ac and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
libtoolize: `AC_PROG_RANLIB' is rendered obsolete by `LT_INIT'
INFO: Spine bootstrap process completed

-- Finally, bootstrap ran successfully

[root@ cacti-spine-0.8.8a]# ./configure
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for gawk... (cached) gawk
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln -s works... yes
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking the maximum length of command line arguments... 1966080
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for ar... ar
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... no
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for ranlib... (cached) ranlib
checking whether to enable -Wall... no
checking for threadsafe gethostbyname()... no
checking for gethostbyname_r in -lnls... no
checking for socket in -lsocket... no
checking for floor in -lm... yes
checking for pthread_exit in -lpthread... yes
checking for deflate in -lz... yes
checking for kstat_close in -lkstat... no
checking for CRYPTO_realloc in -lcrypto... yes
checking for ANSI C header files... (cached) yes
checking sys/socket.h usability... yes
checking sys/socket.h presence... yes
checking for sys/socket.h... yes
checking sys/select.h usability... yes
checking sys/select.h presence... yes
checking for sys/select.h... yes
checking sys/wait.h usability... yes
checking sys/wait.h presence... yes
checking for sys/wait.h... yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking assert.h usability... yes
checking assert.h presence... yes
checking for assert.h... yes
checking ctype.h usability... yes
checking ctype.h presence... yes
checking for ctype.h... yes
checking errno.h usability... yes
checking errno.h presence... yes
checking for errno.h... yes
checking signal.h usability... yes
checking signal.h presence... yes
checking for signal.h... yes
checking math.h usability... yes
checking math.h presence... yes
checking for math.h... yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking netdb.h usability... yes
checking netdb.h presence... yes
checking for netdb.h... yes
checking for signal.h... (cached) yes
checking stdarg.h usability... yes
checking stdarg.h presence... yes
checking for stdarg.h... yes
checking stdio.h usability... yes
checking stdio.h presence... yes
checking for stdio.h... yes
checking syslog.h usability... yes
checking syslog.h presence... yes
checking for syslog.h... yes
checking for netinet/in_systm.h... yes
checking for netinet/in.h... yes
checking for netinet/ip.h... yes
checking for netinet/ip_icmp.h... yes
checking for unsigned long long... yes
checking for long long... yes
checking for an ANSI C-conforming const... yes
checking for size_t... yes
checking whether time.h and sys/time.h may both be included... yes
checking whether struct tm is in sys/time.h or time.h... time.h
checking return type of signal handlers... void
checking for malloc... yes
checking for calloc... yes
checking for gettimeofday... yes
checking for strerror... yes
checking for strtoll... yes
checking priv.h usability... no
checking priv.h presence... no
checking for priv.h... no
checking whether we are using Solaris privileges... no
checking sys/capability.h usability... no
checking sys/capability.h presence... no
checking for sys/capability.h... no
checking whether we are using Linux Capabilities... no
checking for mysql_init in -lmysqlclient_r... yes
checking for mysql_thread_init in -lmysqlclient_r... yes
checking if UCD-SNMP needs crypto support... no
checking if Net-SNMP needs crypto support... yes
checking for snmp_timeout in -lnetsnmp... yes
checking for the spine results buffer size... 1024 bytes
checking for the maximum simultaneous spine scripts... 20
checking for the maximum MySQL buffer size... 65536
checking whether we are using traditional popen... no
checking whether to verify net-snmp library vs header versions... no
checking for glibc gethostbyname_r... yes
checking for Solaris/Irix gethostbyname_r... no
checking for HP-UX gethostbyname_r... no
configure: creating ./config.status
config.status: creating Makefile
config.status: creating config/config.h
config.status: executing depfiles commands
config.status: executing libtool commands
[root@ cacti-spine-0.8.8a]# make
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT sql.o -MD -MP -MF .deps/sql.Tpo -c -o sql.o sql.c
mv -f .deps/sql.Tpo .deps/sql.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT spine.o -MD -MP -MF .deps/spine.Tpo -c -o spine.o spine.c
mv -f .deps/spine.Tpo .deps/spine.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT util.o -MD -MP -MF .deps/util.Tpo -c -o util.o util.c
mv -f .deps/util.Tpo .deps/util.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT snmp.o -MD -MP -MF .deps/snmp.Tpo -c -o snmp.o snmp.c
mv -f .deps/snmp.Tpo .deps/snmp.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT locks.o -MD -MP -MF .deps/locks.Tpo -c -o locks.o locks.c
mv -f .deps/locks.Tpo .deps/locks.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT poller.o -MD -MP -MF .deps/poller.Tpo -c -o poller.o poller.c
mv -f .deps/poller.Tpo .deps/poller.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT nft_popen.o -MD -MP -MF .deps/nft_popen.Tpo -c -o nft_popen.o nft_popen.c
mv -f .deps/nft_popen.Tpo .deps/nft_popen.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT php.o -MD -MP -MF .deps/php.Tpo -c -o php.o php.c
mv -f .deps/php.Tpo .deps/php.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT ping.o -MD -MP -MF .deps/ping.Tpo -c -o ping.o ping.c
mv -f .deps/ping.Tpo .deps/ping.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT keywords.o -MD -MP -MF .deps/keywords.Tpo -c -o keywords.o keywords.c
mv -f .deps/keywords.Tpo .deps/keywords.Po
gcc -DHAVE_CONFIG_H -I. -I./config     -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -MT error.o -MD -MP -MF .deps/error.Tpo -c -o error.o error.c
mv -f .deps/error.Tpo .deps/error.Po
/bin/sh ./libtool --tag=CC   --mode=link gcc  -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2  -L/usr/lib64 -L/usr/lib64/mysql  -o spine sql.o spine.o util.o snmp.o locks.o poller.o nft_popen.o php.o ping.o keywords.o error.o  -lnetsnmp -lmysqlclient_r -lmysqlclient_r -lcrypto -lz -lpthread -lm
libtool: link: gcc -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -o spine sql.o spine.o util.o snmp.o locks.o poller.o nft_popen.o php.o ping.o keywords.o error.o  -L/usr/lib64 -L/usr/lib64/mysql -lnetsnmp -lmysqlclient_r -lcrypto -lz -lpthread -lm
[root@ cacti-spine-0.8.8a]# make install
make[1]: Entering directory `/mysql/home/cacti-spine-0.8.8a'
test -z "/usr/local/spine/bin" || /bin/mkdir -p "/usr/local/spine/bin"
  /bin/sh ./libtool   --mode=install /usr/bin/install -c spine '/usr/local/spine/bin'
libtool: install: /usr/bin/install -c spine /usr/local/spine/bin/spine
test -z "/usr/local/spine/etc" || /bin/mkdir -p "/usr/local/spine/etc"
 /usr/bin/install -c -m 644 spine.conf.dist '/usr/local/spine/etc'
make[1]: Leaving directory `/mysql/home/cacti-spine-0.8.8a'

[root@ cacti-spine-0.8.8a]# ls -lt /usr/local/spine/bin/
total 264
-rwxr-xr-x 1 root root 269419 Dec 25 19:25 spine

[root@ cacti-spine-0.8.8a]# cp spine.conf.dist /etc/spine.conf

[root@ cacti-spine-0.8.8a]# vi  /etc/spine.conf
--make sure the information to connect to the MySQL DB is correct
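--for example, the relevant part of /etc/spine.conf looks roughly like this (values below are the common defaults, not my real settings):
DB_Host       localhost
DB_Database   cacti
DB_User       cactiuser
DB_Pass       cactiuser
DB_Port       3306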

--test it
[root@ cacti-spine-0.8.8a]#  /usr/local/spine/bin/spine
SPINE: Using spine config file [/etc/spine.conf]
SPINE: Version 0.8.8a starting

^C12/25/2012 07:34:35 PM - SPINE: Poller[0] FATAL: Spine Interrupted by Console Operator (Spine thread)
[root@ cacti-spine-0.8.8a]#

A few other problems I ran into:
1. "WARNING: Max OIDS is out of range with value of '0'. Resetting to default of 5".
I didn't find a clue by googling, but I did find the source of this warning message in poller.c. After looking into the code, I understood it to mean that the max_oids of some hosts is 0 in my Cacti. Then I got to the solution:

First, update the current max_oids values. I think they became 0 because "The Maximum SNMP OID's Per SNMP Get Request" was set to 0 in the Spine settings section before.


mysql> update host set max_oids=1 where max_oids=0;
Query OK, 435 rows affected (0.01 sec)
Rows matched: 435  Changed: 435  Warnings: 0

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

Second, set "The Maximum SNMP OID's Per SNMP Get Request" to 5 so that newly added hosts don't run into the same issue.
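
To double-check that nothing was missed, a quick count of hosts still at 0 should come back empty (the DB name and user below are just the common defaults, adjust to your setup):

# count hosts that would still trigger the Max OIDS warning (credentials/DB name are assumptions)
mysql -u cactiuser -p cacti -e "select count(*) from host where max_oids = 0"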

2. "SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal"
This means Spine could not finish the polling in time.

3. "POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate"
The number '1' can be something else as well ('2', '3', etc.). The root cause is the same as the previous error.

The solutions to "poller cannot finish in time" are:
1) Adjust your polling processes/threads.
2) Remove/disable any bad hosts that cannot be reached, or reduce "Script and Script Server Timeout Value" to a reasonable value.
3) If possible, let the poller use a cache. In my case, the script ss_get_by_ssh.php has a cache mechanism.
4) Use Spine instead of cmd.php.

How to use scripts to create cacti templates

If you have ever created a Cacti template from scratch, you know it's very tedious work. I was very lucky to find this site. Later it moved to this new place. There are plenty of documents there explaining how it works, so I will just share my experience of creating a MongoDB template.

First, create a definition file.

It can be copied from an existing one and modified.

You may need to make the hash values unique using this script:

tools/unique-hashes.pl  mongodb_definitions_tmp.pl > mongodb_definitions.pl
tools/unique-hashes.pl  --refresh mongodb_definitions_tmp.pl > mongodb_definitions.pl

Second, generate the Cacti template:

make-template.pl  --mpds port2 --script /export/home/cacti/scripts/ss_get_mongo_stats.php mongodb_definitions.pl > mongo.xml

Here are the source files (you need to change the file extensions back after downloading):
mongodb_definitions.pl : this one needs to be created manually. It's not so difficult once you know the structure.
make-template.pl : I added support for "friendly names" to it.
ss_get_mongo_stats.php : this one is adapted from ss_get_by_ssh.php. One of the notable changes is to the key map array, as follows:

$keys_map = array(
   'MONGODB_connected_clients'            => array('short' => 'ma',  'source' => 'Connections'),
   'MONGODB_active_sessions'              => array('short' => 'mb',  'source' => 'Active'),
   'MONGODB_used_resident_memory'         => array('short' => 'mc',  'source' => 'residentMem'),
   'MONGODB_used_mapped_memory'           => array('short' => 'md',  'source' => 'mappedMem'),
...

The performance data comes from a web page that looks like this:

opInsert=36868243
opQuery=4290620
opUpdate=20708814
opDelete=608893
opGetmore=255241
opCommand=22943868
...
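
If you want to eyeball those counters yourself, you can fetch the status page directly; the URL below is only a placeholder for whatever endpoint your ss_get_mongo_stats.php actually reads:

# fetch the raw key=value counters and pick out the op* ones (URL is a placeholder)
curl -s http://your-mongodb-stats-host/status | grep -E '^op(Insert|Query|Update|Delete)='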

As always, this is just an example to help you get started. You can change whatever you like once you understand how it works.