Debug Cassandrar JVM thread 100% CPU usage issue
May 25, 2013 1 Comment
Recently, one of our cassandra nodes run into an issue: one CPU thread utilization is 100% while the others are almost idle. The node shows “Down” in the nodetool now and then.
Java supports threads, and threads can use different CPU thread without problem in theory. But why one CPU is 100% while the others are idle? What the 100% utilization CPU is doing? Here is what I did:
1. Find out the thread ID that uses 100% CPU
In this Linux host, I run top to see the CPU usage, and then type 1 to show each CPU thread uage, then type H to show thread of processes. When I see their is one CPU thread is 100% usage, I can see the top thread as below.
35376 cassandr 20 0 28.8g 10g 1.2g R 99.7 8.0 11:13.08 java
Its CPU usage is 99.7%
2. Find out what the thread is doing
I use “jstack -l ” to dump all JVM thread calling stacks. I need to find the thread with ID 35376. The number of thread id in jstack dump file is hexadecimal format while it’s decimal format in top output. Decimal 35376 equals 8a30 HEX. Got it in the output:
... "VM Thread" prio=10 tid=0x00007f2a78313000 nid=0x8a30 runnable ...
So I know that “VM Thread” is the culprit. From my basic understanding, its main job is GC(Garbage Collection). If there is no memory leak problem in the application java code, GC can be improved by adjust JVM HEAP SIZE parameters. First I need to check the current heap usage.
3. Get JVM heap usage
We can just “jmap -heap “:
$ ./jmap -heap 122576 Attaching to process ID 122576, please wait... Debugger attached successfully. Server compiler detected. JVM version is 20.10-b01 using parallel threads in the new generation. using thread-local object allocation. Concurrent Mark-Sweep GC Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 8589934592 (8192.0MB) NewSize = 2147483648 (2048.0MB) MaxNewSize = 2147483648 (2048.0MB) OldSize = 5439488 (5.1875MB) NewRatio = 2 SurvivorRatio = 8 PermSize = 21757952 (20.75MB) MaxPermSize = 85983232 (82.0MB) Heap Usage: New Generation (Eden + 1 Survivor Space): capacity = 1932787712 (1843.25MB) used = 1932773920 (1843.2368469238281MB) free = 13792 (0.013153076171875MB) 99.99928641930438% used Eden Space: capacity = 1718091776 (1638.5MB) used = 1718091768 (1638.4999923706055MB) free = 8 (7.62939453125E-6MB) 99.99999953436713% used From Space: capacity = 214695936 (204.75MB) used = 214682152 (204.73685455322266MB) free = 13784 (0.01314544677734375MB) 99.99357975737371% used To Space: capacity = 214695936 (204.75MB) used = 0 (0.0MB) free = 214695936 (204.75MB) 0.0% used concurrent mark-sweep generation: capacity = 6442450944 (6144.0MB) used = 6442450896 (6143.999954223633MB) free = 48 (4.57763671875E-5MB) 99.99999925494194% used Perm Generation: capacity = 58068992 (55.37890625MB) used = 34732264 (33.123268127441406MB) free = 23336728 (22.255638122558594MB) 59.81206630898639% used
New Generation is 99.999% usage. This could be a problem because when the code create a new object it will need get memory from this part. If it’s full, JVM needs to scan the memory area to release memory. Before it’s done, the program can do nothing but wait.
4. Adjust JVM heap size
I set the parameters in cassandra-env.sh :
MAX_HEAP_SIZE="16G" HEAP_NEWSIZE="4G"
It will set it in JVM options as follows:
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}" JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}" JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"
5. Check result
After restart cassandra, and let it run a while, I run jmap again:
$ ./jmap -heap 35339 Attaching to process ID 35339, please wait... Debugger attached successfully. Server compiler detected. JVM version is 20.10-b01 using parallel threads in the new generation. using thread-local object allocation. Concurrent Mark-Sweep GC Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 17179869184 (16384.0MB) NewSize = 4294967296 (4096.0MB) MaxNewSize = 4294967296 (4096.0MB) OldSize = 5439488 (5.1875MB) NewRatio = 2 SurvivorRatio = 8 PermSize = 21757952 (20.75MB) MaxPermSize = 85983232 (82.0MB) Heap Usage: New Generation (Eden + 1 Survivor Space): capacity = 3865509888 (3686.4375MB) used = 1271912480 (1212.9902648925781MB) free = 2593597408 (2473.447235107422MB) 32.90413210294704% used Eden Space: capacity = 3436052480 (3276.875MB) used = 1013629048 (966.671989440918MB) free = 2422423432 (2310.203010559082MB) 29.499812761881913% used From Space: capacity = 429457408 (409.5625MB) used = 258283432 (246.31827545166016MB) free = 171173976 (163.24422454833984MB) 60.14180386428449% used To Space: capacity = 429457408 (409.5625MB) used = 0 (0.0MB) free = 429457408 (409.5625MB) 0.0% used concurrent mark-sweep generation: capacity = 12884901888 (12288.0MB) used = 3744285600 (3570.8290100097656MB) free = 9140616288 (8717.170989990234MB) 29.059480875730515% used Perm Generation: capacity = 62611456 (59.7109375MB) used = 37563512 (35.82335662841797MB) free = 25047944 (23.88758087158203MB) 59.99463101449038% used
In another case, it’s tomcat and home-grown java application, we got similar 100% CPU usage on one thread. While the solution is opposite : I reduced the JVM heap size to fix it because the default MAX JVM heap size is 32GB which is much more than enough. The program performance is good at start, but it will run slower and slower. Because setting too large heap size may also causes similar problem because the bigger the JVM heap size, the more work GC need to do, similar to Oracle shared pool size.
jmap and jstack is available in JDK. If you don’t have JDK, you need to download the exact same version(including minor version) JDK of the corresponding JRE otherwise it will complain incompatible version.
Kindly reminder, I am a newbie to java, thus my understanding may be wrong. Please use your own judgement on the contents.
References:
http://java.sys-con.com/node/1611555
http://middlewaremagic.com/weblogic/?tag=young-generation
A nice tutorial on
JVM Architecture Specification Basic The Heap Area Introduction, teach about the JVM Heap Area in details
JVM Architecture Specification Basic The Method Area explained, teach about the JVM method area