OutOfMemoryError in Cassandra

Too many OutOfMemoryError errors have occurred and stopped the Cassandra service.
WARN [New I/O worker #22] 2016-11-03 10:38:15,083 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at org.jboss.netty.channel.socket.nio.SocketReceiveBufferAllocator.newBuffer(SocketReceiveBufferAllocator.java:64)
at org.jboss.netty.channel.socket.nio.SocketReceiveBufferAllocator.get(SocketReceiveBufferAllocator.java:44)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:62)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Can anyone help me find the root cause?
I'm using cassandra-2.0.9 and netty-3.6.6.Final.jar.

You will need to obtain a heap dump and inspect it.
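For example, a minimal sketch of how to capture one; the dump path, the <pid> placeholder and the MaxDirectMemorySize value are assumptions to adapt to your node (on Cassandra 2.0 the JVM options normally live in conf/cassandra-env.sh):
# take a heap dump of the running Cassandra process (find the pid with e.g. pgrep -f CassandraDaemon)
jmap -dump:format=b,file=/tmp/cassandra-heap.hprof <pid>
# or let the JVM dump automatically on the next OOM and give direct buffers more headroom,
# by appending to JVM_OPTS in conf/cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/cassandra"
JVM_OPTS="$JVM_OPTS -XX:MaxDirectMemorySize=1G"
The resulting .hprof file can be opened in Eclipse MAT or VisualVM to see what is keeping the DirectByteBuffer instances alive.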

Related

Running JMeter script through Jenkins - Out of Memory error while generating report

I am facing an Out of Memory issue while running a JMeter script through Jenkins. The process: code written in the framework converts the generated XML result file to CSV and then to HTML, in order to publish the report on a dashboard.
I have already tried increasing the heap space of the Jenkins server to 25 GB out of 32 GB. Initially it takes about 1 GB, then after some time it throws the error even though 24 GB of heap memory is still available (I checked with free -h).
I also tried increasing JMeter memory with
set HEAP=-Xms1g -Xmx8g -XX:MaxMetaspaceSize=512m
The script executes fine when the Jenkins server is on Windows, but it throws the error when the Jenkins server is on Linux.
Here is my error log:
/var/lib/jenkins/workspace/ITT2_Execution/Resources/csvReportPath/ITT2_Unicast_Broker_Download_count.xml/var/lib/jenkins/workspace/ITT2_Execution/Resources//Configuration/transformGaurav.xsl/var/lib/jenkins/workspace/ITT2_Execution/jmeter_reports/ITT2_Unicast_Broker_Download_2_Oct_2019_19_3_52_Count.html{titleReport=ITT2_Unicast_Broker_DownloadCountReport, dateReport=2-Oct-2019 22:02:38}
Finished Parsing
/var/lib/jenkins/workspace/ITT2_Execution/Resources/csvReportPath/AutomationReport_5.2.4.2018.20_2_Oct_2019_19_3_52_count.xml/var/lib/jenkins/workspace/ITT2_Execution/Resources//Configuration/transformGaurav.xsl/var/lib/jenkins/workspace/ITT2_Execution/jmeter_reports/AutomationReport_5.2.4.2018.20_2_Oct_2019_19_3_52.html{titleReport=nullCountReport, dateReport=2-Oct-2019 22:02:39}
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2019/10/02 18:33:10 - please wait.
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2019/10/02 18:33:10 - please wait.
JVMDUMP032I JVM requested Heap dump using '/var/lib/jenkins/workspace/ITT2_Execution/heapdump.20191002.183310.40181.0001.phd' in response to an event
JVMDUMP010I Heap dump written to /var/lib/jenkins/workspace/ITT2_Execution/heapdump.20191002.183310.40181.0001.phd
JVMDUMP032I JVM requested System dump using '/var/lib/jenkins/workspace/ITT2_Execution/core.20191002.183310.40181.0002.dmp' in response to an event
JVMDUMP010I System dump written to /var/lib/jenkins/workspace/ITT2_Execution/core.20191002.183310.40181.0002.dmp
JVMDUMP032I JVM requested Java dump using '/var/lib/jenkins/workspace/ITT2_Execution/javacore.20191002.183310.40181.0003.txt' in response to an event
JVMDUMP010I Java dump written to /var/lib/jenkins/workspace/ITT2_Execution/javacore.20191002.183310.40181.0003.txt
JVMDUMP032I JVM requested Snap dump using '/var/lib/jenkins/workspace/ITT2_Execution/Snap.20191002.183310.40181.0005.trc' in response to an event
JVMDUMP010I Snap dump written to /var/lib/jenkins/workspace/ITT2_Execution/Snap.20191002.183310.40181.0005.trc
JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
JVMDUMP032I JVM requested Heap dump using '/var/lib/jenkins/workspace/ITT2_Execution/heapdump.20191002.183310.40181.0004.phd' in response to an event
JVMDUMP010I Heap dump written to /var/lib/jenkins/workspace/ITT2_Execution/heapdump.20191002.183310.40181.0004.phd
JVMDUMP032I JVM requested Java dump using '/var/lib/jenkins/workspace/ITT2_Execution/javacore.20191002.183310.40181.0006.txt' in response to an event
JVMDUMP010I Java dump written to /var/lib/jenkins/workspace/ITT2_Execution/javacore.20191002.183310.40181.0006.txt
JVMDUMP032I JVM requested Snap dump using '/var/lib/jenkins/workspace/ITT2_Execution/Snap.20191002.183310.40181.0007.trc' in response to an event
JVMDUMP010I Snap dump written to /var/lib/jenkins/workspace/ITT2_Execution/Snap.20191002.183310.40181.0007.trc
JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
[WARNING]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
at java.lang.Thread.run(Thread.java:811)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.xerces.xni.XMLString.toString(Unknown Source)
at org.apache.xerces.parsers.AbstractDOMParser.characters(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCharReference(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at utils.APIReportProcessing.fetchAPIReportDetailModuleWise(APIReportProcessing.java:110)
at jmeterRun.RunProcess.prepareFinalResultsMerged(RunProcess.java:228)
at jmeterRun.ControllerJMeter.main(ControllerJMeter.java:139)
... 6 more
[WARNING] Attempt to (de-)serialize anonymous class hudson.maven.reporters.MavenArtifactArchiver$2; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
[WARNING] Attempt to (de-)serialize anonymous class hudson.maven.reporters.MavenFingerprinter$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
Thanks
Bibek
The JMeter process runs in the Java Virtual Machine (JVM) under various settings and arguments. The Java Heap Space (as referred to in the error message) is the memory the JVM takes from the underlying operating system to allocate space for the creation of necessary objects.
JMeter’s default configuration (see jmeter.bat for Windows or the jmeter script for non-Windows systems) assumes a heap space of only 512 megabytes. This is actually pretty low considering many modern smartphones have up to four times more! If your test builds massive objects that go over 512 MB, you’ll get an OOM error and your test will fail.
Fortunately, there’s a simple solution. Just increase the maximum heap size to ~80% of your total available physical RAM. To do this, find the following line in your JMeter startup script:
HEAP="-Xms512m -Xmx512m"
Now change the -Xmx value accordingly. For example: if you want to set the maximum heap size to 25 gigabytes, you’ll need to change the line to:
HEAP="-Xms1g -Xmx25g"
To apply the change, you’ll need to restart JMeter.
As per the Understand the OutOfMemoryError Exception article:
Exception in thread thread_name: java.lang.OutOfMemoryError: Java heap space
Cause: The detail message Java heap space indicates object could not be allocated in the Java heap. This error does not necessarily imply a memory leak. The problem can be as simple as a configuration issue, where the specified heap size (or the default size, if it is not specified) is insufficient for the application.
In other cases, and in particular for a long-lived application, the message might be an indication that the application is unintentionally holding references to objects, and this prevents the objects from being garbage collected. This is the Java language equivalent of a memory leak. Note: The APIs that are called by an application could also be unintentionally holding object references.
One other potential source of this error arises with applications that make excessive use of finalizers. If a class has a finalize method, then objects of that type do not have their space reclaimed at garbage collection time. Instead, after garbage collection, the objects are queued for finalization, which occurs at a later time. In the Oracle Sun implementation, finalizers are executed by a daemon thread that services the finalization queue. If the finalizer thread cannot keep up with the finalization queue, then the Java heap could fill up and this type of OutOfMemoryError exception would be thrown. One scenario that can cause this situation is when an application creates high-priority threads that cause the finalization queue to increase at a rate that is faster than the rate at which the finalizer thread is servicing that queue.
Action: You can find more information about how to monitor objects for which finalization is pending in Monitor the Objects Pending Finalization.
So use the Java Memory Map (jmap) tool to investigate the largest objects and which classes they belong to.
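For example, a minimal sketch (where <pid> is the id of the JVM that threw the error, and jmap ships with the JDK):
jmap -histo:live <pid> | head -n 20
This prints the classes holding the most live instances and bytes, which usually points at whatever is filling the heap.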
Also, it appears you're running Jenkins on Linux or another Unix-like OS, while the SET command is Windows-specific. In addition, Jenkins doesn't respect the HEAP environment variable; it uses JAVA_ARGS and/or JENKINS_JAVA_OPTIONS instead.
For JMeter:
HEAP="-Xms1g -Xmx8g" && export HEAP
For Jenkins:
JENKINS_JAVA_OPTIONS="-Xms1g -Xmx8g" && export JENKINS_JAVA_OPTIONS
More information:
How to add Java arguments to Jenkins?
9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure

Cassandra WriteTimeoutException exception in CounterMutationStage - node dies eventually

I'm getting the following exception in my cassandra system.log:
WARN [CounterMutationStage-25] 2017-07-25 13:25:35,874 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[CounterMutationStage-25,5,main]: {}
java.lang.RuntimeException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2490) ~[apache-cassandra-3.9.jar:3.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.8.0_112]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.9.jar:3.9]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_112]
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
at org.apache.cassandra.db.CounterMutation.grabCounterLocks(CounterMutation.java:150) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.db.CounterMutation.applyCounterMutation(CounterMutation.java:122) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageProxy$9.runMayThrow(StorageProxy.java:1473) ~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2486) ~[apache-cassandra-3.9.jar:3.9]
... 5 common frames omitted
Whenever this happens, CPU goes down to 0% for a minute or so, node becomes unresponsive but recovers after that.
But eventually the node dies completely (i.e. the process keeps running, but it no longer responds to commands; even shutdown does not work, and I have to kill the process).
Some more information:
Cassandra 3.9
G1 garbage collector
Single Node on Windows Server 2012 R2 (20 Cores, 256 GB RAM)
using a lot of counters and counter mutations
Things I have tried:
eliminated all other warnings from the log. I used to have warnings about counter batches being too large, so I rewrote the code to not use batching at all. This eliminated the warning, but not the exception problem.
migrated to a bigger machine, used a bigger heap and fine-tuned GC to make sure the problem is not the machine being overstressed. CPU load is < 20%.
Does anyone have an idea what else to do? My main concern is the node dying completely. I am not sure that this exception is causing it but it is the only hint I have...
Update 1:
Updated to Cassandra 3.11 and the node does not seem to die any more. However, the write timeouts persist; the node is unresponsive for several minutes but at least recovers now.
Update 2:
Solved the problem (with the help of a professional consultant). Disk I/O speed on our node was terrible, leading to a growing queue of flush writers. The reason is unknown; I/O speed tests on the drive (RAID 1 SSDs) were actually very good.
Moving the node from Windows to Linux (and configuring it according to http://docs.datastax.com/en/landing_page/doc/landing_page/recommendedSettings.html) solved the problem.
Real reason for the problem is unknown; might have been Windows per se or just some freak incompatibility with the RAID setup. In any case, Cassandra is only really tested on Linux and it is far easier to find help for Linux setups. Lesson learned.
It sounds like a beefy machine with 20 cores and 256 GB RAM. Cassandra is a distributed system designed to scale horizontally. Rather than pushing the load onto a single node, try adding more commodity hardware and scaling horizontally. You can also run multiple Cassandra nodes within the same box.
At least try running a couple of nodes within this box to get away from the unresponsiveness. Most often CPU is not the bottleneck for Cassandra; it's the I/O that a single node can perform.
Check the value of concurrent_writes in cassandra.yaml; based on the usual recommendation of 8 × cores, for 20 cores that would be 160.
If feasible, try separating the commitlog directory and the data directory onto different storage drives (see the cassandra.yaml sketch below).
The best bet to scale writes is to add more boxes (which could be smaller in configuration).
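A minimal cassandra.yaml sketch of the last two suggestions (the directory paths are placeholders, and concurrent_writes = 8 × cores is only the usual rule of thumb, not a value measured for this workload):
concurrent_writes: 160
# keep the commitlog and the data files on different physical drives if possible
commitlog_directory: /mnt/commitlog/cassandra
data_file_directories:
    - /mnt/data/cassandra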

ResourceManager Memory Leak?

We have two CDH clusters with the same version (CDH-5.5.2-1.cdh5.5.2.p0.4), and the ResourceManager of each cluster has the same configuration.
One of the ResourceManagers is running well, and its heap memory stays at a constant value (e.g. 800 MB) as time goes on.
But the other one throws an OOM exception and exits after about 15 days. When we use 'jmap -F -histo' to dump its JVM heap info, we see that the memory used by 'char[]' objects keeps growing over time, until it finally throws OOM.
Following is key info of jvm dump result of both the good RM and OOM RM:
dump cmd:jmap -F -histo pid
A) JVM dump of the good RM in cluster A:
we see 400,000+ char[] instances using 60 MB+ of heap memory [1]
B) JVM dump of the backup RM (OOM) in cluster B:
we see 300,000+ char[] instances but using 400 MB+ of heap memory [2]
Any help will be appreciated.
We dumped the heap info today (jmap -F -dump:file=file.dump_result pid) and used MAT (Memory Analyzer Tool) to analyse the dump file. We found that the instance variable applications (a java.util.concurrent.ConcurrentHashMap) in org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext eats up a lot of memory:
(Screenshots: MAT call hierarchy and the applications instance variable.)

How to analyze memory leaks in Java 8 compressed class space?

Some Context: We have upgraded the environment of a web application from running on Java 7 to running on Java 8 and Tomcat 8 (64-bit arch, Heap size about 2 GB, PermGen size=256 MB, no constraints on metaspace size). After a while, we started getting the following error:
java.lang.OutOfMemoryError: Compressed class space
which means that the space needed for UseCompressedClassPointers exceeded CompressedClassSpaceSize. At that moment VisualVM showed a 2 GB metaspace size.
Now, with the VisualVM tool, we can see the Metaspace size constantly increasing by about 3 MB with every request; the heap, however, does not seem to do so. The heap usage has a sawtooth shape, going back to the same low point after every GC.
I can tell that the application is leaking Metadata only when using a Java JAXB operation, but I couldn't prove it with VisualVM.
The application depends on webservices-rt-1.4 as a JAXB implementation provider. The application uses marshalling, unmarshalling. The class generation from XSD is done with maven-jaxb2-plugin-0.13.1.
Update:
After tracing class loading and unloading, I found out that the same JAXB classes are loaded into memory by WebAppClassLoader multiple times but never cleaned up. Moreover, there are no instances of them in the heap. I debugged and saw that the JDK calls the method
javax.xml.bind.JAXBContext com.sun.xml.bind.v2.ContextFactory.createContext via reflection, and that's when the classes are created.
I thought the classes would be cleaned up by GC. Is it the responsibility of the classloader to clean them up?
Questions: Is there a way to analyze the metaspace objects? Why do I have a leak in metaspace but not in heap? Aren't they related? Is that even possible?
Why would the app work fine with PermGen but not Metaspace?
I am facing a similar issue.
In my case, the memory leak was caused by JAXBContext.newInstance(...) invocations.
Solutions:
wrap this instance in a singleton (https://github.com/javaee/jaxb-v2/issues/581), as sketched below, or
use -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true VM parameter, like in answer Old JaxB and JDK8 Metaspace OutOfMemory Issue
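A minimal sketch of the singleton approach (the class and field names are illustrative, not from the original code; JAXBContext itself is thread-safe, so one instance can be shared, while Marshaller/Unmarshaller instances should still be created per use):
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;

public final class JaxbContextHolder {

    // built once and reused for every marshal/unmarshal call
    private static final JAXBContext CONTEXT;

    static {
        try {
            // replace MyRootElement.class with the classes actually bound in your schema (placeholder)
            CONTEXT = JAXBContext.newInstance(MyRootElement.class);
        } catch (JAXBException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private JaxbContextHolder() { }

    public static JAXBContext get() {
        return CONTEXT;
    }
}
Unmarshalling then becomes JaxbContextHolder.get().createUnmarshaller().unmarshal(...), so no new JAXB implementation classes are generated per request and the metaspace stops growing.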
I had a similar issue, and adding -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true to setenv.sh as a JVM argument resolved the Metaspace OOM issue.
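For reference, a hedged example of what that setenv.sh entry might look like (CATALINA_OPTS is the usual variable for Tomcat runtime options; this particular setup might use JAVA_OPTS instead):
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true"
export CATALINA_OPTS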

Setting JVM maxperm size outside of Eclipse.ini

I'm using Eclipse 3.7.1 (64-bit) on Linux RHEL 5 (64-bit), and I have 8 GB of RAM.
No matter how large I set the following options:
-Dosgi.requiredJavaVersion=1.5
-XX:MaxPermSize=1024M
-Xms1024m
-Xmx1024m
I continue to get errors like:
Error while logging event loop exception:
java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.
Is there anywhere else I can configure memory options relating to Eclipse and the JVM? Please help.
If you're getting that exception even after setting the max PermGen size to 1024M, it is likely that you have classloader leaks in your application. Increasing the PermGen size will mitigate these exceptions for a while but might not be very helpful for a long-running application.
You might find this article useful: http://blogs.oracle.com/fkieviet/entry/classloader_leaks_the_dreaded_java
