this is my first post in stack overflow forum. we are recently experiencing some Java OOME issues and using jvisualvm, yourkit and eclipse mat tools able to idenify and fix some issues...
one behavior observed during analysis is that when we create a heapdump manually using jconsole or jvisualvm, the used heap size in jvm reduces dramatically (from 1.3 GB to 200 MB) after generating the heapdump.
can some one please advise on this behavior? this is a boon in disguise since whenever i see the used heapsize is >1.5GB, i perform a manaul GC and the system is back to lower used heapsize numbers resulting in no jvm restarts.
let me know for any additional details
thanks
Guru
when you use JConsole to create the dump file, there are 2 parameters: The first one is the file name to generate (complete path) and the second one (true by default) indicates if you want to perform a gc before taking the dump. Set it to false if you don't want a full gc before dumping
This is an old question but I found it while asking a new question of my own, so I figured I'd answer it.
When you generate a heap dump, the JVM performs a System.gc() operation before it generates the heap dump, which is collecting non-referenced objects and effectively reducing your heap utilization. I am actually looking for a way to disable that System GC so I can inspect the garbage objects that are churning in my JVM.
Related
I am adding JVM args by using -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio to see the GC behavior changes. I got some explanations here (what is the purpose of -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio)
The question is how could I get the memory usage picture in the answer. I wanna make sure the modifications I made can really change the GC behavior.
It is a tool called VisualVM
There are some other alternatives such us JConsole or Java Mission Control
Currently working on optimizing a library for speed. I've already reduced execution time drastically, using V8 CPU and Memory Profiling through Webstorm. This was achieved mainly by changing the core method from recursive to iterative.
Now the self time distribution breaks down as
I'm assuming the first entry "node" is timing internal functions calls, which is great. The other entries also make sense. I'm new to Nodejs profiling, but 31.6% for GC seems high, so I've decided to investigate.
I've now created a heap dump through Webstorm, but unfortunately that doesn't give me much information.
These seem to be system internal memory references mainly. Stepping through the core iteration code logic again, there also don't seem to be a lot of places where memory is explicitly allocated (using this as a reference).
Question
Can the GC overhead be reduced?
Is this amount of allocation just expected here?
Is it possible to get better memory profiling information?
Setup Instructions
In case someone want's to try debugging this, I'm including setup instructions.
Download or clone object-scan and run
yarn install --frozen-lockfile
yarn run test-simple --verbose
Now create a file test.js in the project root containing this content and run node --trace_gc test.js or run it through Webstorm for advanced profiling.
In Javascript and in v8 (node) particularly an amount of time spent for garbage collection depends on amount of data stored in heap, but that's only one of many factors.
In v8 engine there are two main "types" of GC: minor (scavenge) and major (mark-sweep/mark-compact). You may see GC types that happen during your tests in console with --trace-gc enabled. And in different cases one type could "eat" more time than other an vice versa. So before optimizations you should determine which gc takes more time.
There are not a lot of options for optimizing major GC, cause it highly affected by amount of data that stays in memory for "long" (actually in this case long means that object survives scavenge GC) period. Such data is stored in so called "old space" in heap. And major GC works with this space and it should scan all that memory and mark objects that no longer have any references for further clearance.
In your case the amount of test data you're loading goes to old space. As a result it affects major GC during the whole test. And in this case major GC will not clear too much, because you're using your test object, but it still consume time for scanning entire old space. So you may consider preventing v8 from doing that by launching node with gc-specific flags like: --nouse-idle-notification --expose-gc --gc_interval=100500 (where 100500 is number of allocation, it can be take high value that will prevent running gc before the whole test will pass) that will allow trigger garbage collections manually. Test your code using this approach and see how major GC affects it, try tests with different amount of data you provide to function. If the impact is quiet high you may try to refactor your code trying to minimize long-lived variables, closures, etc.
If you'll discover that major GC doesn't have much impact on performance, then scavenge GC takes the most of time. Unlike major GC it operates with so called "new space" in heap. It's a space where all new objects are stored. If those objects survive scavenge, then they are moved to old space. New space has much smaller size ( you may control it by setting --max_semi_space_size, note: new space size = 2 * semi space size) than old space and more new objects and variables you allocate more scavenge GC runs will happen. If this GC heats performance too much you may consider refactor your code to make less new allocations. But if you'll reuse variables it may also slowdown the performance and those objects will go to old space and may become a problem described in "major GC" section.
Also v8 GC doesn't always work in the same thread that your program runs. It does some work in background too, but I don't know what Webstorm shows in your case. If it counts just total time spend in GC, may be it just doesn't have so much impact.
You may find more details on v8 GC in this blog post.
TL;DR:
Can the GC overhead be reduced?
Yes, but first you should discover what should be optimized by following steps above.
Is this amount of allocation just expected here?
That's could be just discovered by comparing different approaches. There's no some absolute number that could limit "good" amount from "bad", because it depends on lot's of factors, including the amount on entry data.
Is it possible to get better memory profiling information?
You may find some good tools here, but in general you may use Chrome dev tools which could provide a bit more details rather than Webstorm does.
My project has started using java 8 from java 7.
After switching to java 8, we are seeing issues like the memory consumed is getting higher with time.
Here are the investigations that we have done :
Issues comes only after migrating from java7 and from java8
As metaspace is the only thing related to memory which is changes from hava 7 to java 8. We monitored metaspace and this does not grow more then 20 MB.
Heap also remains consistent.
Now the only path left is to analyze how the memory gets distributes to process in java 7 and java 8, specifically private byte memory. Any thoughts or links here would be appreciated.
NOTE: this javaw application is a swing based application.
UPDATE 1 : After analyzing the native memory with NMT tool and generated a diff of memory occupied as compare to baseline. We found that the heap remained same but threads are leaking all this memory. So as no change in Heap, I am assuming that this leak is because of native code.
So challenge remains still open. Any thoughts on how to analyze the memory occupied by all the threads will be helpful here.
Below are the snapshots taken from native memory tracking.
In this pic, you can see that 88 MB got increased in threads. Where arena and resource handle count had increased a lot.
in this picture you can see that 73 MB had increased in this Malloc. But no method name is shown here.
So please throw some info in understanding these 2 screenshot.
You may try another GC implementation like G1 introduced in Java 7 and probably the default GC in Java 9. To do so just launch your Java apps with:
-XX:+UseG1GC
There's also an interesting functionality with G1 GC in Java 8u20 that can look for duplicated Strings in the heap and "deduplicate" them (this only works if you activate G1, not with the default Java 8's GC).
-XX:+UseStringDeduplication
Be aware to test thoroughly your system before going to production with such a change!!!
Here you can find a nice description of the diferent GCs you can use
I encountered the exact same issue.
Heap usage constant, only metaspace increase, NMT diffs showed a slow but steady leak in the memory used by threads specifically in the arena allocation. I had tried to fix it by setting the MALLOC_ARENAS_MAX=1 env var but that was not fruitful. Profiling native memory allocation with jemalloc/jeprof showed no leakage that could be attributed to client code, pointing instead to a JDK issue as the only smoking gun there was the memory leak due to malloc calls which, in theory, should be from JVM code.
Like you, I found that upgrading the JDK fixed the problem. The reason I am posting an answer here is because I know the reason it fixes the issue - it's a JDK bug that was fixed in JDK8 u152: https://bugs.openjdk.java.net/browse/JDK-8164293
The bug report mentions Class/malloc increase, not Thread/arena, but a bit further down one of the comments clarifies that the bug reproduction clearly shows increase in Thread/arena.
consider optimising the JVM options
Parallel Collector(throughput collector)
-XX:+UseParallelGC
concurrent collectors (low-latency collectors)
-XX:+UseConcMarkSweepGC
use String Duplicates remover
-XX:+UseStringDeduplication
optimise compact ratio
-XXcompactRatio:
and refer
link1
link2
In this my answer you can see information and references how to profile native memory of JVM to find memory leaks. Shortly, see this.
UPDATE
Did you use -XX:NativeMemoryTracking=detail option? The results are straightforward, they show that the most memory allocated by malloc. :) It's a little bit obviously. Your next step is to profile your application. To analyze native methods and Java I use (and we use on production) flame graphs with perf_events. Look at this blog post for a good start.
Note, that your memory increased for threads, likely your threads grow in application. Before perf I recommend analyze thread dumps before/after to check does Java threads number grow and why. Thread dumps you can get with jstack/jvisualvm/jmc, etc.
This issue does not come with Java 8 update 152. The exact root cause of why it was coming with earlier versions is still not clearly identified.
Context: 64 bit Oracle Java SE 1.8.0_20-b26
For over 11 hours, my running java8 app has been accumulating objects in the Tenured generation (close to 25%). So, I manually clicked on the Perform GC button in jconsole and you can see the precipitous drop in heap memory on the right of the chart. I don't have any special VM options turned on except for XX:NewRatio=2.
Why does the GC not clean up the tenured generation ?
This is a fully expected and desirable behavior. The JVM has been successfully avoiding a Major GC by performing timely Minor GC's all along. A Minor GC, by definition, does not touch the Tenured Generation, and the key idea behind generational garbage collectors is that precisely this pattern will emerge.
You should be very satisfied with how your application is humming along.
The throughput collector's primary goal is, as its name says, throughput (via GCTimeRatio). Its secondary goal is pause times (MaxGCPauseMillis). Only as tertiary goal it considers keeping the memory footprint low.
If you want to achieve a low heap size you will have to relax the other two goals.
You may also want to lower MaxHeapFreeRatio to allow the JVM to yield back memory to the OS.
Why does the GC not clean up the tenured generation ?
Because it doesn't need to.
It looks like your application is accumulating tenured garbage at a relatively slow rate, and there was still plenty of space for tenured objects. The "throughput" collector generally only runs when a space fills up. That is the most efficient in terms of CPU usage ... which is what the throughput collector optimizes for.
In short, the GC is working as intended.
If you are concerned by the amount of memory that is being used (because the tenured space is not being collected), you could try running the application with a smaller heap. However, the graph indicates that the application's initial behavior may be significantly different to its steady-state behavior. In other words, your application may require a large heap to start with. If that is the case, then reducing the heap size could stop the application working, or at least make the startup phase a lot slower.
I'm monitoring an app whose GC verbose log looks like this:
The graph draws the amount of Used Tenured after the GC runs.
As you can see, there's an obvious memory leak, but I was wondering what would be the best next step to find out which component is holding around 50MB of memory each time the GC runs.
The machine is an AIX 6.1 running an IBM's JVM 5.
Thanks
The pattern in the chart definitely looks like a typical memory leak, building up in tenured space over time. Your best shot would be heap dump analyzers - take a heap dump for example similar to following
jmap -dump:format=b,file=dump.bin <your java process id>
and analyze the dump file for example with Eclipse Memory Analyzer.