A script to debug cacti

My check_cacti.sh frequently reports warnings like these:

01/09/2013 05:50:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
01/09/2013 05:53:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
01/09/2013 05:56:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
01/09/2013 05:59:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate

It basically means that some poller processes cannot finish in time. But the warning doesn't say which poller items are responsible.
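To get a feel for how often this happens, the warnings can be counted per day. This is only a minimal sketch: the function name count_overruns is made up for illustration, and the log file path is an assumption that you pass in yourself.

```shell
#!/bin/bash
# Count "overrunning a polling process" warnings per day in a Cacti log.
# The log location is an assumption: pass the path as the first argument.
count_overruns() {
  # The first field of each log line is the date (e.g. 01/09/2013).
  awk '/overrunning a polling process/ { n[$1]++ }
       END { for (d in n) print d, n[d] }' "$1" | sort
}

# Usage (hypothetical path):
#   count_overruns /path/to/cacti.log
```

Each output line is a date followed by the number of warnings logged on that date.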

So I wrote the script below to find out.

#!/bin/bash

if [ $# -lt 1 ]; then
cat <<EOF
  Check for long-running cacti poller commands
  usage:  debug_cacti.sh <threshold_seconds> [threshold_minutes]
  examples:
          debug_cacti.sh 30   #check commands running longer than 30 seconds, for 3 minutes (default)
          debug_cacti.sh 30 5 #check commands running longer than 30 seconds, for 5 minutes
EOF
  exit 1
fi

# arguments
thresh_sec="$1"
# default to 3 minutes when the second argument is missing or empty
thresh_min="${2:-3}"

# configuration


echo "Check poller command longer than $thresh_sec seconds at next $thresh_min minutes..."
echo "Start time : `date +%M:%S`"
start_min=`date +%M`
min=`date +%M`
# sec must be set before the loop, because the first sleep calculation uses it
sec=`date +%S`
elapsed_min=0
while [ $elapsed_min -lt $thresh_min ] 
do
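  #sleep until just past thresh_sec seconds into the minute
  #(this assumes the poller starts at the top of the minute)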
  sleep_sec=`expr $thresh_sec - $sec + 1`
  if [ $sleep_sec -gt 0 ] ; then
    echo "Sleep $sleep_sec seconds "
    sleep $sleep_sec
  fi

  sec=`date +%S`
  while [ $sec -gt $thresh_sec ] 
  do
    echo "Time : $min:$sec"
    ps -ef |grep php |grep -v grep
    sleep 3
    sec=`date +%S`
  done

  #calculate elapsed time
  min=`date +%M`
  elapsed_min=`expr $min - $start_min`
  if [ $elapsed_min -lt 0 ] ; then
     #the minute counter wrapped past the hour, so add 60
     elapsed_min=`expr $elapsed_min + 60`
  fi
done

Here is one of my outputs. It shows that the poller for the cassandra host is slow (it is the last one to finish).

./debug_cacti.sh 30
Check poller command longer than 30 at next 3 minutes...
Start time : 29:
Time = 29:31
root     27597     1  0 23:28 ?        00:00:00 /usr/bin/php -q /export/home/cacti-0.8.7g/cmd.php 0 67
root     29338 29337  0 23:29 ?        00:00:00 /bin/sh -c php /var/www/html/cacti/poller.php > /dev/null  2>&1 
root     29339 29338  1 23:29 ?        00:00:00 php /var/www/html/cacti/poller.php
root     30795 27597  0 23:29 ?        00:00:00 php /export/home/cacti-0.8.7g/scripts/ss_get_cassandra_stats.php --host sharedcass.alexzeng.wordpress.com --port --user --pass --items dp
Time = 29:34
root     27597     1  0 23:28 ?        00:00:00 /usr/bin/php -q /export/home/cacti-0.8.7g/cmd.php 0 67
root     30981 27597  0 23:29 ?        00:00:00 php /export/home/cacti-0.8.7g/scripts/ss_get_cassandra_stats.php --host sharedcass.alexzeng.wordpress.com --port --user --pass --items dc,dd
Time = 29:37
Time = 29:40
Time = 29:43
Time = 29:46
Time = 29:49
Time = 29:52
Time = 29:55
Time = 29:58
...

Now I know where the problem is. That is halfway to a fix.
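A possible refinement is to print only the PHP processes that have already been running longer than the threshold, instead of every PHP process. The sketch below is not part of the original script. The function name filter_long_running is made up for illustration, and the usage line assumes a ps that supports the etimes column (elapsed seconds), as procps-ng on Linux does.

```shell
#!/bin/bash
# Keep only PHP processes whose elapsed time exceeds a threshold.
# Reads lines of the form "pid elapsed_seconds command..." on stdin.
# $1 is the threshold in seconds.
filter_long_running() {
  awk -v limit="$1" '$2 + 0 > limit && /php/ { print }'
}

# Usage (assumes ps supports the etimes column; the trailing "=" signs
# suppress the header line):
#   ps -eo pid=,etimes=,args= | filter_long_running 30
```

Because the filter looks at each process's own elapsed time, it does not depend on the poller starting at the top of the minute.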

About Alex Zeng
I would be very happy if this blog can help you. I appreciate every honest comment. Please forgive me if I'm too busy to reply to your comments in time.

One Response to A script to debug cacti

  1. Fran Spain says:

    Thank you very much!! Very useful script which helped me to find the process that did not respond. Now Cacti poller doesn't time out and all params are being graphed correctly!
