JVM GC behaviour on heap dump and unnecessary heap usage

We have a problem tuning the memory management of our JVMs. The very same application runs on all pods of the k8s cluster, but one pod's JVM heap usage rises to ~95%, and when we try to take a heap dump on this pod, a GC somehow runs and heap usage drops suddenly, leaving us with a tiny heap dump.
I think the old generation has grown unnecessarily and the GC has not reclaimed memory for nearly 15 hours. Unfortunately we can't see what is occupying the space, because the heap dump is very small once the GC is forced.
All 3 pods have a memory limit of 1500m, and here is the JVM heap usage percentage graph (3 pods, green being the problematic one):
Details:
openjdk 15.0.1 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 15.0.1+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 15.0.1+9, mixed mode, sharing)
JVM Parameters:
-XX:MaxRAMPercentage=75
-XX:InitialRAMPercentage=75
-server
-Xshare:off
-XX:MaxMetaspaceSize=256m
-Dsun.net.inetaddr.ttl=60
-XX:-OmitStackTraceInFastThrow
-XX:+ShowCodeDetailsInExceptionMessages
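For reference, assuming the 1500m above is the container memory limit (i.e. roughly 1500 MiB) and the JVM's container support is active (the default on JDK 15), those two percentages pin the heap to a fixed size:
initial heap = max heap = 0.75 × 1500 MiB ≈ 1125 MiB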
The questions are:
Why is a full GC triggered when we try to take a heap dump?
Why does the GC not reclaim memory, letting the application run with heap usage between ~70% and ~95%, when the JVM can work perfectly well with only 10%?
What can be done to make the JVM collect garbage more aggressively and avoid this situation? And should that even be done in a production environment?

The JVM heap dump procedure has two modes:
live objects - this mode runs a full GC alongside the heap dump; this is the default.
all objects - the heap dump includes every object on the heap, both reachable and unreachable.
The heap dump mode can usually be chosen via a tool-specific option.
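As a sketch with the stock JDK utilities (the answer above does not name a specific tool, so these are illustrative):
jmap -dump:live,format=b,file=heap.hprof <pid>   # live objects only; forces a full GC first
jmap -dump:format=b,file=heap.hprof <pid>        # all objects; no forced GC
jcmd <pid> GC.heap_dump /tmp/heap.hprof          # live objects only (the default)
jcmd <pid> GC.heap_dump -all /tmp/heap.hprof     # all objects, including unreachable ones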
Answering your questions
Why is a full GC triggered when we try to take a heap dump?
Answered above: the default heap dump mode includes only live objects, so a full GC is run before the dump is written.
Why does the GC not reclaim memory, letting the application run with heap usage between ~70% and ~95%, when the JVM can work perfectly well with only 10%?
Reclaiming memory requires CPU resources and impacts application latency. While the JVM is operating within its memory limits, it will mostly avoid expensive GC cycles.
The recent rise of containers is driving some changes in the JVM GC department, but the statement above still holds for the default GC configuration.
What can be done to make the JVM collect garbage more aggressively and avoid this situation? And should that even be done in a production environment?
The original question lacks a concrete problem statement, but general advice is:
manage memory limits per container (the JVM derives its heap size from the container limits unless it is overridden explicitly)
forcing GC periodically is possible, though it is unlikely to be a solution to any problem
G1GC has a wide range of tuning options relevant for containers (see the example flags below)
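For illustration only (these flags exist in recent HotSpot builds; the values are placeholders, not recommendations from the answer above):
-XX:+UseG1GC
-XX:MaxRAMPercentage=60.0            (leave some headroom below the container limit)
-XX:G1PeriodicGCInterval=300000      (JDK 12+: start a GC cycle if none has run for 5 minutes)
-XX:+G1PeriodicGCInvokesConcurrent   (make that periodic cycle concurrent rather than a full GC)
A GC can also be triggered from outside the process with jcmd <pid> GC.run, but as noted above this rarely addresses the underlying problem.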

Related

Netty webclient memory leak in tomcat server

I am observing a swap memory issue on our Tomcat servers, which are installed on Linux machines, and when I tried to collect a heap dump, I got this while analyzing it:
16 instances of "io.netty.buffer.PoolArena$HeapArena", loaded by "org.apache.catalina.loader.ParallelWebappClassLoader # 0x7f07994aeb58" occupy 201,697,824 (15.40%) bytes.
I have seen in this blog post, Memory accumulated in netty PoolChunk, that adding -Dio.netty.allocator.type=unpooled showed a significant reduction in memory. Where do we need to add this property on our Tomcat servers?
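One common place (a sketch, not taken from the question itself) is a setenv.sh file next to catalina.sh, which Tomcat's startup script sources if it exists:
# $CATALINA_BASE/bin/setenv.sh - sourced by catalina.sh on startup
CATALINA_OPTS="$CATALINA_OPTS -Dio.netty.allocator.type=unpooled"
The property then applies to the whole JVM, i.e. to every webapp deployed in that Tomcat instance.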

G1GC and Permgen

I'm unsure which metrics I should follow to allocate memory for the PermGen.
I'm having crashes because the PermGen is full. My server has 32 GB of memory for the heap and 512 MB for the PermGen; do you have any metrics or recommendations for sizing the PermGen? Another doubt is related to the GC: G1GC was configured because, from what I researched, it was one of the best options, but I noticed that it demands more heap memory. Is there a better GC for a server under heavy load that needs precise collection, or would it make no difference?
CentOS operating system
Java 7
tomcat 7
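For reference, the setup described above would roughly correspond to these HotSpot 7 flags (values taken from the question, not a recommendation):
-Xmx32g               (heap)
-XX:MaxPermSize=512m  (PermGen ceiling; PermGen is sized separately from the heap)
-XX:+UseG1GC
Note that PermGen was removed in Java 8 in favour of Metaspace (sized with -XX:MaxMetaspaceSize, as in the main question above).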

VoltDB cluster eating all RAM

I've set up a 3-machine VoltDB cluster with more or less default settings. However, there seems to be a constant problem with VoltDB eating up the entire RAM heap and not freeing it. The heap size is the recommended 2 GB.
Things that I think might be bad in my setup:
I've set 1 min async snapshots
Most of my queries are AdHoc
Even though that might not be ideal, I don't think it should lead to a problem where memory doesn't get freed.
I've set up my machines according to section 2.3, Configure Memory Management.
In this image you can see sudden drops in memory usage; these are server shutdowns.
Heap filling warnings
DB Monitor, current state of leader server
I would also like to note that this server is not heavily loaded.
Sadly, I couldn't find anyone with a similar problem. Most of the advice was aimed at optimizing memory use or decreasing the amount of memory allocated to VoltDB. No one seems to have this memory-leak lookalike.

GC in Server Mode Not Collecting the Memory

An IIS-hosted WCF service is consuming a large amount of memory, around 18 GB, and the server has slowed down.
I analyzed a minidump file and it shows only 1 GB of active objects. I understand the GC is not clearing the memory, and the GC must be running in server mode on the 64-bit system. Any idea why the whole computer is stalling and the app is taking so much memory?
The GC was running in server mode; it was configured that way for better performance. I understand that GC in server mode improves throughput because collections are not triggered frequently while plenty of memory is available, and server mode has a higher limit on memory usage. The problem here was that when the process reached that high limit, the CLR triggered a GC and tried to clear the huge 18 GB of memory in one shot, so it was using 90% of the system's resources and the other applications were lagging.
We tried restarting, but it took forever, so we had to kill the process. Now, with workstation-mode GC, collection is smooth and clean. The only difference is that response time has some delay due to GC after about 1.5 GB of allocation.
One more note: .NET 4.5 includes a GC revision regarding this, which resolves the issue.

Java OutOfMemoryError in Windows Azure Virtual Machine

When I run my Java application on a Windows Azure Ubuntu 12.04 VM,
with 4 cores at 1.6 GHz and 7 GB of RAM, I get the following out-of-memory error after a few minutes.
java.lang.OutOfMemoryError: GC overhead limit exceeded
I have a swap size of 15 GB, and the max heap size is set to 2 GB. I am using Oracle Java 1.6. Increasing the max heap size only delays the out-of-memory error.
It seems the JVM is not doing garbage collection.
However, when I run the above Java application on my local Windows 8 PC (Core i7), with the same JVM parameters, it runs fine. The heap size never exceeds 1 GB.
Is there any extra setting on a Windows Azure Linux VM for running Java apps?
On the Azure VM, I used the following JVM parameter
-XX:+HeapDumpOnOutOfMemoryError
to get a heap dump. The heap dump shows that an actor mailbox and Camel messages are taking up the entire 2 GB.
In my Akka application, I have used Akka Camel Redis to publish processed messages to a Redis channel.
The out-of-memory error goes away when I stub out the above Camel actor. It looks as though the Akka Camel Redis actor
is not performant on the VM, which has a slower CPU clock speed than my Xeon CPU.
Shing
The GC throws this exception when too much time is spent in garbage collection without collecting anything. I believe the default settings are 98% of CPU time being spent on GC with only 2% of heap being recovered.
This is to prevent applications from running for an extended period of time while making no progress because the heap is too small.
You can turn this off with the command line option -XX:-UseGCOverheadLimit
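For illustration (not part of the answer above), the 98% / 2% thresholds correspond to existing HotSpot flags, and a dump location can be set next to the dump-on-OOM flag; the jar name is a placeholder:
java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps -XX:GCTimeLimit=98 -XX:GCHeapFreeLimit=2 -jar app.jar
java -Xmx2g -XX:-UseGCOverheadLimit -jar app.jar    # disables the check; usually just postpones the OOM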
