Debugging a Cassandra JVM thread 100% CPU usage issue

Recently, one of our Cassandra nodes ran into an issue: one CPU thread’s utilization was 100% while the others were almost idle, and the node showed as “Down” in nodetool now and then.

Java supports threads, and in theory threads can spread across different CPU threads without a problem. So why was one CPU at 100% while the others were idle? What was that 100%-utilized CPU doing? Here is what I did:

1. Find out the thread ID that uses 100% CPU
On this Linux host, I ran top to see the CPU usage, typed 1 to show each CPU thread’s usage, then typed H to show the threads of each process. When I saw that one CPU thread was at 100% usage, the top thread appeared as below:

 35376 cassandr  20   0 28.8g  10g 1.2g R 99.7  8.0  11:13.08 java

Its CPU usage is 99.7%.
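
By the way, the same per-thread view can be captured non-interactively with top’s batch mode (a small sketch; the pgrep pattern assumes the process can be matched by Cassandra’s main class name, CassandraDaemon):

$ # -H shows individual threads, -b runs in batch mode, -n 1 takes one snapshot
$ top -H -b -n 1 -p "$(pgrep -f CassandraDaemon | head -1)" | head -20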

2. Find out what the thread is doing
I used “jstack -l <pid>” to dump all the JVM threads’ call stacks, then looked for the thread with ID 35376. Note that the thread ID (nid) in the jstack dump is in hexadecimal format, while top prints it in decimal; decimal 35376 equals 8a30 in hex. Got it in the output:

...
"VM Thread" prio=10 tid=0x00007f2a78313000 nid=0x8a30 runnable
...
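
The decimal-to-hex conversion and the search through the dump can be done in one go from the shell (a sketch; <java_pid> is a placeholder for the Cassandra process ID):

$ printf '%x\n' 35376            # top prints decimal, jstack's nid is hex
8a30
$ jstack -l <java_pid> | grep 'nid=0x8a30'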

So I knew that the “VM Thread” was the culprit. From my basic understanding, its main job is GC (garbage collection). If there is no memory-leak problem in the application’s Java code, GC can be improved by adjusting the JVM heap size parameters. First I needed to check the current heap usage.

3. Get JVM heap usage
We can just run “jmap -heap <pid>”:

$ ./jmap -heap 122576
Attaching to process ID 122576, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.10-b01

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 8589934592 (8192.0MB)
   NewSize          = 2147483648 (2048.0MB)
   MaxNewSize       = 2147483648 (2048.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 1932787712 (1843.25MB)
   used     = 1932773920 (1843.2368469238281MB)
   free     = 13792 (0.013153076171875MB)
   99.99928641930438% used
Eden Space:
   capacity = 1718091776 (1638.5MB)
   used     = 1718091768 (1638.4999923706055MB)
   free     = 8 (7.62939453125E-6MB)
   99.99999953436713% used
From Space:
   capacity = 214695936 (204.75MB)
   used     = 214682152 (204.73685455322266MB)
   free     = 13784 (0.01314544677734375MB)
   99.99357975737371% used
To Space:
   capacity = 214695936 (204.75MB)
   used     = 0 (0.0MB)
   free     = 214695936 (204.75MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 6442450944 (6144.0MB)
   used     = 6442450896 (6143.999954223633MB)
   free     = 48 (4.57763671875E-5MB)
   99.99999925494194% used
Perm Generation:
   capacity = 58068992 (55.37890625MB)
   used     = 34732264 (33.123268127441406MB)
   free     = 23336728 (22.255638122558594MB)
   59.81206630898639% used

The New Generation is 99.999% used. This could be the problem: when the code creates a new object, it needs to get memory from this area. If the area is full, the JVM has to scan it to release memory, and until that is done, the program can do nothing but wait.
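
To double-check that GC is really where the time goes before touching anything, GC logging can be enabled in cassandra-env.sh (a sketch using standard HotSpot flags of this JVM generation; the log path is just an example):

# log GC details, timestamps, and stop-the-world pause times
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

Long, frequent pauses in that log would match the stalls described above.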

4. Adjust JVM heap size
I set these parameters in cassandra-env.sh:

MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="4G"

cassandra-env.sh then turns these into JVM options as follows:

JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"

5. Check result
After restarting Cassandra and letting it run for a while, I ran jmap again:

$ ./jmap -heap 35339
Attaching to process ID 35339, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.10-b01

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 17179869184 (16384.0MB)
   NewSize          = 4294967296 (4096.0MB)
   MaxNewSize       = 4294967296 (4096.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 3865509888 (3686.4375MB)
   used     = 1271912480 (1212.9902648925781MB)
   free     = 2593597408 (2473.447235107422MB)
   32.90413210294704% used
Eden Space:
   capacity = 3436052480 (3276.875MB)
   used     = 1013629048 (966.671989440918MB)
   free     = 2422423432 (2310.203010559082MB)
   29.499812761881913% used
From Space:
   capacity = 429457408 (409.5625MB)
   used     = 258283432 (246.31827545166016MB)
   free     = 171173976 (163.24422454833984MB)
   60.14180386428449% used
To Space:
   capacity = 429457408 (409.5625MB)
   used     = 0 (0.0MB)
   free     = 429457408 (409.5625MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 12884901888 (12288.0MB)
   used     = 3744285600 (3570.8290100097656MB)
   free     = 9140616288 (8717.170989990234MB)
   29.059480875730515% used
Perm Generation:
   capacity = 62611456 (59.7109375MB)
   used     = 37563512 (35.82335662841797MB)
   free     = 25047944 (23.88758087158203MB)
   59.99463101449038% used
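
This time the New Generation is only about 33% used and the CMS old generation about 29%, so the heap has plenty of headroom. For an ongoing view without attaching jmap every time, nodetool also reports the node’s heap usage (used/total) among its other stats:

$ nodetool -h localhost info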

In another case, with Tomcat and a home-grown Java application, we saw similar 100% CPU usage on one thread, but the solution was the opposite: I reduced the JVM heap size to fix it, because the default max JVM heap size was 32GB, which is much more than enough. The program performed well at the start but ran slower and slower. Setting too large a heap can cause a similar problem, because the bigger the JVM heap, the more work GC has to do, similar to the Oracle shared pool size.

jmap and jstack are available in the JDK. If you don’t have a JDK, you need to download the JDK whose version exactly matches the running JRE (including the minor version); otherwise the tools will complain about an incompatible version.

A kind reminder: I am a newbie to Java, so my understanding may be wrong. Please use your own judgement on the contents.

How to use Cacti to monitor Cassandra

I’d like to share my experience setting up monitoring for Cassandra in Cacti.

The template for Cassandra is based on mysql-cacti-templates and Cacti Templates for Cassandra. The setup steps are listed in the README file of the package.

The original template is based on Cassandra 0.6.8, while I need to monitor Cassandra 0.8, so there are some problems:

Fixed issues:
1. function array_fill_keys bug in ss_get_cassandra_stats.php
My PHP is 5.1.6, while array_fill_keys is only available from 5.2.0 on, so this file ships a replacement. But the replacement has a fatal bug: it should use “$v” as the key, not “$k”. Finding this bug took me a lot of time, while it’s easy to fix:

if( !function_exists('array_fill_keys') ) {
  function array_fill_keys($array, $value) {
    $r = array();
    foreach( $array as $k => $v ) {
      //$r[$k] = $value;        // original, buggy: numeric position as key
      $r[$v] = $value;          //alex: element value as key, as array_fill_keys expects
    }
    return $r;
  }
}
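
To see why “$k” breaks the script, here is a quick comparison from the shell (the metric names are made up for illustration):

$ php -r '
$keys = array("read_latency", "write_latency");
$bad = array(); $good = array();
foreach ($keys as $k => $v) { $bad[$k] = 0; $good[$v] = 0; }
print_r($bad);   // $k gives numeric positions, lookups by metric name all miss
print_r($good);  // $v gives the metric names, what array_fill_keys should return
'
Array
(
    [0] => 0
    [1] => 0
)
Array
(
    [read_latency] => 0
    [write_latency] => 0
)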

2. function to_shortname issue in ss_get_cassandra_stats.php
Cassandra’s JMX property names/items have changed a lot, which makes the to_shortname function stop working for many items. It would be too tedious, and is not necessary, to add all of them, so I added 3 lines that ignore them by mapping all unknown items to ‘xx’.

  //alex: to ignore keys not defined in $cassandra_keys, map all of them to 'xx'
  if(!isset($cassandra_keys[$pfx ."_$name"]) ) {
    return 'xx';
  }
  return $cassandra_keys[$pfx ."_$name"];

Some items just changed names, so I updated their keys in $cassandra_keys. Here is the updated file: ss_get_cassandra_stats.php

3. function get_stats_cache bug in ss_get_cassandra_stats.php
Different hosts in one cluster should not share the same cache file, so I added the host name to the cache file name.

  //list($fp, $content) = check_cache($cache_dir, $poll_time, 'cassandra_'. $options['cluster'], $options);     #alex, should not be shared between hosts
  list($fp, $content) = check_cache($cache_dir, $poll_time, 'cassandra_'. $options['cluster'] . '_' . $options['host'] . '_stats', $options);

4. Cassandra’s JMX domain names changed in ss_get_cassandra_stats.php
‘org.apache.cassandra.service’ changed to ‘org.apache.cassandra.db’
‘org.apache.cassandra.concurrent’ changed to ‘org.apache.cassandra.request’
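
One way to apply these renames across the script is a pair of in-place substitutions (a sketch, assuming the old domain names appear as plain strings in ss_get_cassandra_stats.php; back up the file first):

$ sed -i -e 's/org\.apache\.cassandra\.service/org.apache.cassandra.db/g' \
         -e 's/org\.apache\.cassandra\.concurrent/org.apache.cassandra.request/g' \
         ss_get_cassandra_stats.php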

5. Finally, there are too many graphs in the template
It can add graphs for each keyspace or column family, and many metrics no longer exist. That would be too much for our Cacti system, since we collect data every minute. So I deleted many graphs from the template; here is the updated file: cacti_host_template_x_cassandra_server_ht.xml

Besides these solved issues, some remain:
1. Performance: after I added a 4-node Cassandra cluster to our Cacti system, the poller time jumped from 30 seconds to 50 seconds, which is very close to the 1-minute polling interval.
2. Cassandra’s JMX property names/items change a lot, and it is no surprise that they will keep changing in coming versions. Maintaining these templates will take a lot of effort.
3. Some metrics cannot be found in JMX even though they are easy to get from Cassandra’s nodetool, and it is almost impossible to add metrics that are not in JMX.

So I may use home-grown scripts to create the templates later, like what we did for MongoDB.