JVM Tuning - CMS behavior - garbage-collection

Currently I'm running an application in Tomcat 7 with the following jvm arguments:
-Dcatalina.home=E:\Tomcat
-Dcatalina.base=E:\Tomcat
-Djava.endorsed.dirs=E:\Tomcat\endorsed
-Djava.io.tmpdir=E:\TomcatE\temp
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=E:\Tomcat\conf\logging.properties
-XX:MaxPermSize=512m
-XX:PermSize=512m
-XX:+UseConcMarkSweepGC
-XX:NewSize=7g
-XX:MaxTenuringThreshold=31
-XX:CMSInitiatingOccupancyFraction=90
-XX:+UseCMSInitiatingOccupancyOnly
-XX:SurvivorRatio=6
-XX:TargetSurvivorRatio=90
-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-Xloggc:E:\Tomcat7\gc.log
I'm using CMS as the garbage collector and its behavior seems very strange. Even with 13 GB of old generation, when a major collection is performed (I assume at 90% occupancy, per -XX:CMSInitiatingOccupancyFraction=90), CMS is not able to clean up a large amount of objects: at least 7 GB remain occupied afterwards. I don't believe the application has that many long-lived objects (though I'm not sure!). Shouldn't CMS release much more space? Or could this be related to fragmentation?
Because of this behavior I'm having frequent CMS cycles, which I would like to reduce.
Even though I'm using a low-pause GC, the application sometimes stops for 15-30 seconds... How can I decrease pause times with CMS?
Would it be a good idea to run several JVMs instead of one with a 20 GB heap?
Thanks a lot

First, you can take a heap dump of the live objects with the command:
jmap -dump:live,format=b,file=heap.bin <pid>
and look for the long-lived objects with Eclipse MAT.
Second, because the heap size is bigger than 8 GB, you could try the Garbage-First (G1) collector.
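If you experiment with G1, a minimal sketch of the flags might look like the lines below (this assumes a JDK where G1 is production-ready, i.e. 7u4 or later; the pause target is an illustrative value, not a recommendation, and the CMS-specific flags plus the fixed -XX:NewSize/-XX:SurvivorRatio settings above would be dropped so G1 can size the generations itself):
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=45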

Related

GC graph shows there is a memory leak but unable to track in the dump

We have a Java microservice in our application which is connected to Postgres as well as Phoenix. We are using Spring Boot 2.x.
The problem is that when we run endurance tests against our application for about 8 hours, we observe that the used heap keeps increasing even though we use the recommended VM arguments; it looks like a memory leak. We analysed the heap dump, however the root cause is not clear to us. Can some experts help based on the results?
The VM arguments that we are actually using are:
-XX:ConcGCThreads=8 -XX:+DisableExplicitGC -XX:InitialHeapSize=536870912 -XX:InitiatingHeapOccupancyPercent=45 -XX:MaxGCPauseMillis=1000 -XX:MaxHeapFreeRatio=70 -XX:MaxHeapSize=536870912 -XX:MinHeapFreeRatio=40 -XX:ParallelGCThreads=16 -XX:+PrintAdaptiveSizePolicy -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:StringDeduplicationAgeThreshold=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseStringDeduplication
We expect the used heap to stay flat in the GC log; however, the memory is not being released and consumption keeps increasing.
Heap Dump:
GC graph:
I'm not sure which tool you are using above, but I would be looking for the dominator hierarchy in the heap. Eclipse MAT is a good tool to analyse heap dumps and it can point you in the direction of what's actually holding the memory and you can decide if you want to categorise it as a leak or not. Regardless of the label you attach, if the application is going to crash after a while because it runs out of memory, then it is a problem.
This blog also discusses diagnosing this type of problem.
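If catching the growth interactively is difficult, one option worth noting (a hedged aside using standard HotSpot flags, not something mentioned in the original post, and the dump path is just an example) is to have the JVM write a dump automatically on failure, or to take one on demand:
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/service-oom.hprof
jmap -dump:live,format=b,file=/tmp/service.hprof <pid>
Opening the resulting .hprof in Eclipse MAT, the dominator tree and the Leak Suspects report are usually the quickest routes to whatever is retaining the heap.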

Why does the Java 8 GC not collect for over 11 hours?

Context: 64 bit Oracle Java SE 1.8.0_20-b26
For over 11 hours, my running Java 8 app has been accumulating objects in the tenured generation (close to 25%). So I manually clicked the Perform GC button in JConsole, and you can see the precipitous drop in heap memory on the right of the chart. I don't have any special VM options turned on except for -XX:NewRatio=2.
Why does the GC not clean up the tenured generation ?
This is a fully expected and desirable behavior. The JVM has been successfully avoiding a Major GC by performing timely Minor GC's all along. A Minor GC, by definition, does not touch the Tenured Generation, and the key idea behind generational garbage collectors is that precisely this pattern will emerge.
You should be very satisfied with how your application is humming along.
The throughput collector's primary goal is, as its name says, throughput (via GCTimeRatio). Its secondary goal is pause times (MaxGCPauseMillis). Only as a tertiary goal does it consider keeping the memory footprint low.
If you want to achieve a low heap size you will have to relax the other two goals.
You may also want to lower MaxHeapFreeRatio to allow the JVM to yield back memory to the OS.
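As a rough sketch (the numbers are illustrative, not recommendations), those knobs map to flags like:
-XX:GCTimeRatio=19              (throughput goal: spend at most ~5% of total time in GC)
-XX:MaxGCPauseMillis=500        (pause-time goal)
-XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30   (shrink the committed heap more aggressively)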
Why does the GC not clean up the tenured generation ?
Because it doesn't need to.
It looks like your application is accumulating tenured garbage at a relatively slow rate, and there was still plenty of space for tenured objects. The "throughput" collector generally only runs when a space fills up. That is the most efficient in terms of CPU usage ... which is what the throughput collector optimizes for.
In short, the GC is working as intended.
If you are concerned by the amount of memory that is being used (because the tenured space is not being collected), you could try running the application with a smaller heap. However, the graph indicates that the application's initial behavior may be significantly different to its steady-state behavior. In other words, your application may require a large heap to start with. If that is the case, then reducing the heap size could stop the application working, or at least make the startup phase a lot slower.

OutOfMemoryException - GC verbose confirmed a memory leak, what now?

I'm monitoring an app whose GC verbose log looks like this:
The graph plots the amount of used tenured space after each GC run.
As you can see, there's an obvious memory leak, but I was wondering what would be the best next step to find out which component is holding around 50MB of memory each time the GC runs.
The machine is an AIX 6.1 box running IBM's JVM 5.
Thanks
The pattern in the chart definitely looks like a typical memory leak, building up in the tenured space over time. Your best shot would be a heap dump analyzer: take a heap dump, for example with something like the following,
jmap -dump:format=b,file=dump.bin <your java process id>
and analyze the dump file for example with Eclipse Memory Analyzer.

JVM GC settings for a simple use case

I'm struggling with getting the right settings for my JVM.
Here's the use case:
Tomcat is serving requests (300 req/s), but they are very fast (key-value lookups) so I don't have any performance problems. Everything works fine until I have to refresh the data it's serving, which happens every 3 hours. As you can imagine, I have a big HashMap and I'm just doing lookups. During a data reload I create a temporary HashMap and then swap it in. I need to load quite a lot of data (~800 MB in memory each time).
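The swap itself is nothing more than replacing a reference, roughly like the sketch below (class and field names are made up for illustration; the real code differs):
import java.util.Collections;
import java.util.Map;

// Illustrative sketch of the "build a new map, then swap it" reload described above.
class LookupCache {
    // volatile so request threads immediately see the newly swapped map
    private volatile Map<String, String> data = Collections.emptyMap();

    String lookup(String key) {
        return data.get(key);   // read path: a plain field read plus a HashMap lookup
    }

    void reload(Map<String, String> fresh) {
        data = fresh;           // single reference swap; the old map becomes garbage
    }
}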
The problem I have is that during those loads, Tomcat occasionally stops responding.
Initially the problem was promotion failures and FullGC but I got around those problems by tweaking the settings.
As you might notice, I have already lowered the threshold at which the CMS collector kicks in. I don't get any promotion failures or anything like that any more. The young generation is kept reasonably small so that minor collections are fast. I've increased the SurvivorRatio because all the request objects die young, and what doesn't (the data being loaded) should be promoted to the old generation.
But I'm still seeing 503 errors from Tomcat during the data load. In gc.log my minor collections became slow during this process: they now take seconds rather than milliseconds. I've tried slowing down the load process to give the GC a breather, but it doesn't seem to help...
The problem is especially bad the moment I reach the capacity of the old generation: CMS kicks in and frees up memory, but allocations afterwards are pretty slow. I don't see any errors in gc.log any more.
What can I do differently? I know fragmentation might be a problem, but I'm not getting promotion failures. The machine is an 8-core server. Does decreasing the number of GC threads make any sense? Would setting a lower thread priority for the data-loading thread make sense?
Is there a way to kick off the CMS collector periodically in the background? The data that's being swapped out can actually be garbage collected immediately.
I'm open to any suggestions!
Here are my JVM settings.
-Xms14g
-Xmx14g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+AlwaysPreTouch
-XX:MaxNewSize=256m
-XX:NewSize=256m
-XX:MaxPermSize=128m
-XX:PermSize=128m
-XX:SurvivorRatio=24
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=88
-XX:+UseCompressedStrings
-XX:+DisableExplicitGC
JDK 1.6.33
Tomcat 6
gc.log snippet:
line 7 the data load starts
line 20 it stops
http://safebin.net/9124
Looking at the attached log and seeing those huge increases in minor GC times leads me to believe that your machine is under extremely heavy load from processes other than the JVM.
My reasoning is that while a minor GC is taking place, all application threads are stopped. Hence, nothing your application does should be able to affect the minor GC times, given that your new generation is constant in size.
However, if there is a lot of load from other processes on the machine during this time, the GC threads will compete for execution time and you could see this behavior.
Could you check the CPU usage from other processes when your data load is running?
Edit: Looking a bit more at the logs, I've come up with another possible explanation.
It seems that the target survivor space is full (ParNew goes down to exactly 10048K on each "slow" GC). That would mean objects are being promoted directly to the old generation, which could be what slows things down. I would try increasing the size of the new generation and lowering the survivor ratio. You could even try running without setting the new generation size or the survivor ratio at all and see how the JVM manages to optimize this (although beware that the JVM usually does a poor job of optimizing for bursts like this).
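Purely as an illustration of that suggestion (the numbers are guesses, not measured values), it might translate into something like:
-XX:NewSize=1g -XX:MaxNewSize=1g
-XX:SurvivorRatio=8
in place of the current 256m new generation and SurvivorRatio=24.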
Your load lasts about 90 s and is interrupted by a GC every second or so, yet you have a 14 GB heap with a steady-state occupancy (assuming the surrounding log lines are steady state) of only about 5 GB, which means a lot of memory is going to waste. I think the previous answer looks to be correct (based on the data presented) when it says your survivor spaces are too small. If the app really does nothing but lookups the rest of the time, then a perfectly reasonable strategy would be something like:
tenuring threshold = 0 (or 1)
eden size > 2x the working set so maybe 1.5-2G (i.e. allow the current live data and the working copy to reside entirely in eden)
tenured = whatever is left
The point here is to try to completely avoid a young collection during the load phase. However, a tenuring threshold of 0 means the previous version of the data would likely be in tenured space, and you'd eventually see a possibly lengthy collection to clean it up. Another option might be to go the other way round: make tenured big enough to fit 2-3 versions of the data and give eden the rest, with a view to minimising the frequency of young collections and helping tenured be collected as quickly as possible.
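As a rough sketch of the first option (sizes are assumptions to be validated against gc.log, not tuned values):
-XX:MaxTenuringThreshold=0
-XX:NewSize=2g -XX:MaxNewSize=2g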
What works best really depends on what else the app is doing the rest of the time.
The CMS trigger seems quite high for a large heap, by the way; if you only start collecting at 88%, does it have time to finish the job before a full GC is forced? I suppose it might be quite safe if you're actually doing very little allocation most of the time.

Swappiness in JVM

I stumbled upon an interesting problem last week when I noticed that one of our production servers, which runs Apache HTTP Server in front of Tomcat, stopped, reporting an HTTP outage.
On investigating this issue further, it seemed to be due to the JVM causing memory pages to be swapped out quickly. This resulted in the swap space being completely filled, causing a memory issue the next time a page was moved to swap.
Investigating further, it seems that there is a JVM swappiness factor that's set to 60% by default on some of our Linux distributions. Based on some research, this seems to be a high value for a web service with high traffic. Our swap space was set to 2 GB.
The swap details are:
Filename Type Size Used Priority
/dev/sda3 partition 2096472 1261420 -1
From /proc/meminfo
SwapCached: 944668 kB
Our JVM properties are as follows:
-Xmx6g -Xms4g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:PermSize=512M -XX:MaxPermSize=1024M -XX:NewSize=2g -XX:MaxNewSize=2g -XX:ParallelGCThreads=8
The server runs with 12 GB RAM.
Swappiness does not go well with the JVM's GC process, so I tried reducing the swappiness to 0, but that did not change anything. We still see cases where the entire swap space is consumed, resulting in an OutOfMemory error.
How can I tune the JVM performance?
The swappiness factor is actually a system-wide setting in Linux, not JVM-specific.
My personal experience with some kinds of applications is that they need swap to be turned off in order to work without hiccups, period. While I can't say for sure if your application belongs to this category, I have seen apps being swapped out despite enough free RAM being available. As you pointed out, GC and swap don't mix well. Swappiness is just an indication to the OS, so its impact on how much gets swapped out may be larger or smaller depending on circumstances. My suggestion would be to try turning swap completely off.
In order to do this, you need to be root or have sudo access. Comment out the line describing swap in /etc/fstab, which will prevent swap from being turned on after a reboot. Then, in order to turn swap off for the current server run, run swapoff -a. This may take up to a few minutes if there's data in swap that needs to be pulled back into RAM. Then, check the last line of the output of free to ensure that the total size of available swap is 0. After this, observe your app in order to determine if turning swap off solved your issue.
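For reference, a sketch of those steps (standard Linux commands; the fstab line shown is just an example, verify against your own distribution):
# in /etc/fstab, comment out the swap entry, e.g.:
#   /dev/sda3   none   swap   sw   0   0
swapoff -a     # turn swap off for the current run (may take a few minutes)
free -m        # confirm the Swap line now shows a total of 0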
If your physical RAM is 12 GB and your heap is set to only 6 GB (max), then you have plenty of RAM. Is this server dedicated to the Tomcat instance?
Take a look at the verbose GC log files to see what the memory usage is over time. Check your access log files to confirm whether the access requests have increased over time.
