How do I measure fragmentation in Hotspot's Metaspace? - jvm-hotspot

I'm looking into debugging an "OutOfMemoryError: Metaspace" error in my application. Right before the OOME I see the following in the gc logs:
{Heap before GC invocations=6104 (full 39):
par new generation total 943744K, used 0K [...)
eden space 838912K, 0% used [...)
from space 104832K, 0% used [...)
to space 104832K, 0% used [...)
concurrent mark-sweep generation total 2097152K, used 624109K [...)
Metaspace used 352638K, capacity 487488K, committed 786432K, reserved 1775616K
class space used 36291K, capacity 40194K, committed 59988K, reserved 1048576K
2015-08-11T20:34:13.303+0000: 105892.129: [Full GC (Last ditch collection) 105892.129: [CMS: 624109K->623387K(2097152K), 3.4208207 secs] 624109K->623387K(3040896K), [Metaspace: 352638K->352638K(1775616K)], 3.4215100 secs] [Times: user=3.42 sys=0.00, real=3.42 secs]
Heap after GC invocations=6105 (full 40):
par new generation total 943744K, used 0K [...)
eden space 838912K, 0% used [...)
from space 104832K, 0% used [...)
to space 104832K, 0% used [...)
concurrent mark-sweep generation total 2097152K, used 623387K [...)
Metaspace used 352638K, capacity 487488K, committed 786432K, reserved 1775616K
class space used 36291K, capacity 40194K, committed 59988K, reserved 1048576K
}
From what I can see, the Metaspace capacity isn't even nearing the committed size (in this case -XX:MaxMetaspaceSize=768m). So I suspect fragmentation of the Metaspace is causing the allocator to fail to find a new chunk for the new classloader.
I'm aware of -XX:PrintFLSStatistics but that only covers CMS, not native memory.
So my question is: is there a debugging help similar to PrintFLSStatistics available for Hotspot's native memory?
This is using Java HotSpot(TM) 64-Bit Server VM (25.45-b02) for linux-amd64 JRE (1.8.0_45-b14).

I've just looked into the implementation of the Metaspace in HotSpot. The Metaspace is divided into chunks and managed using a freelist. So fragmentation is indeed a possible reason for your problem.
I've also looked through the flags of the HotSpot VM (-XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal); there is no flag for this in the release version.
However, there is a dump() method in the Metaspace class which seems to be triggered by setting the -XX:+TraceMetadataChunkAllocation flag. There is also -XX:+TraceMetavirtualspaceAllocation, which sounds like it could be of interest to you. However, both are "develop" flags, meaning you need a debug build of the VM to use them.

@loonytune's answer works just fine, but I want to provide a little more detail:
For context, "The Metaspace" is a collection of metaspaces, one per class loader. Each metaspace holds a list of VirtualSpace objects out of which Metachunks of different sizes are allocated. These chunks hold MetaBlocks, which are the real containers for metadata.
I need a debug JRE to run those flags, so following this tutorial I checked out the openjdk repository (I renamed the checkout to vm because the build scripts seem to take issue with the jdk8 folder name), ran
~/vm$ bash configure --enable-debug
~/vm$ DISABLE_HOTSPOT_OS_VERSION_CHECK=ok make all
and used the resulting vm/build/linux-x86_64-normal-server-fastdebug/images/j2re-image as my java runtime.
The log lines generated look like this:
VirtualSpaceNode::take_from_committed() not available 8192 words space # 0x00007fee4cdb9350 128K, 94% used [0x00007fedf5e22000, 0x00007fedf5f13000, 0x00007fedf5f22000, 0x00007fedf6022000)
This indicates that the current VirtualSpace is full and can't hold another chunk of the requested 8192-word size, which causes this metaspace to switch to another VirtualSpace.
ChunkManager::chunk_freelist_allocate: 0x00007fee4c0c39f8 chunk 0x00007fee15397400 size 128 count 0 Free chunk total 7680 count 15
ChunkManager::chunk_freelist_allocate: 0x00007fee4c0c39f8 chunk 0x00007fedf6021000 size 512 count 14 Free chunk total 7168 count 14
This happens when a new Metachunk is allocated; in the first case it's 128 words big and uses up the last of the small chunks. As you can see, the next request goes to the medium-sized chunks (of size 512) and leaves 14 chunks free in total. Once the free total reaches 0, a Full GC is needed to increase the total Metaspace size.
Note that specifying -verbose gets you even more output from the above two flags.
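For reference, a sketch of how such an invocation could look (the image path matches the fastdebug build above; app.jar and com.example.Main are placeholders for your own application, and since these are develop flags they only exist in the debug VM):
~/vm$ build/linux-x86_64-normal-server-fastdebug/images/j2re-image/bin/java -XX:+TraceMetadataChunkAllocation -XX:+TraceMetavirtualspaceAllocation -cp app.jar com.example.Main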

Related

Heap usage is less but free memory is less

I have 10 gb RAM
Application max heap : 8 GB
current app heap usage: 3 GB
Free memory : 188mb
total used free shared buff/cache available
Mem: 9993 9362 188 2 442 326
Swap: 4095 866 3229
So if we assign -Xmx to 8 GB, will the RAM reserve 8 GB for the application heap?
The JVM will reserve 8GB in its virtual address space for Java Heap.
Note there are other memory areas used for Thread stacks, GC, JIT Code cache, Direct Buffers, etc., so your Java process may consume more than just -Xmx amount of memory.
I'm not sure what you're getting at when saying "Heap usage is less but free memory is less", but if you're wondering why the free part of memory isn't ~2 GB or even ~7 GB,
then this might be because:
JVM needs more memory than just heap (as mentioned before),
what you report as "current app heap usage" may be misleading, and GCs often don't return unused memory to the OS (you may want to look at "total memory" usage)
OS itself and other applications need some memory too
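If you want to check from inside the process how much the JVM uses beyond the heap, a minimal sketch using the standard java.lang.management API (no flags or extra dependencies assumed) could look like this:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryReport {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();       // the -Xmx-bounded Java heap
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage(); // metaspace, code cache, ...
        System.out.printf("heap:     used=%dM committed=%dM max=%dM%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.printf("non-heap: used=%dM committed=%dM%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
    }
}

This still doesn't cover thread stacks or direct buffers; Native Memory Tracking (-XX:NativeMemoryTracking=summary, then jcmd <pid> VM.native_memory summary) gives a fuller breakdown.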

how can I know size of each generation in java heap with jmx

jmap can show the size of each generation, but I want to monitor my Java process in real time.
With JMX, MemoryMXBean.getHeapMemoryUsage().getUsed() can get the total used heap size. But I cannot find any method to get:
Size of New (Young) and Tenured (Old) generations;
Size of Eden and each Survivor in New generations.
You can use ManagementFactory.getMemoryPoolMXBeans(). It shows all the memory regions. Depending on the GC you use, the names differ. pool.getCollectionUsage().getInit() gives the initial size of a pool. pool.getName() is for example 'G1 Eden Space', 'G1 Old Gen' or 'G1 Survivor Space' if you use the G1 GC.
pool.getUsage().getUsed() likewise returns the amount of used memory for that region.
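A minimal sketch of polling the pools (pool names depend on the collector, e.g. 'PS Eden Space' / 'PS Survivor Space' / 'PS Old Gen' with the parallel collector, or the G1 names above; the one-second interval is arbitrary):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class GenerationSizes {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // includes non-heap pools (Metaspace, Code Cache) too; filter on pool.getType() if unwanted
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage usage = pool.getUsage();
                System.out.printf("%-25s used=%dK committed=%dK max=%dK%n",
                        pool.getName(), usage.getUsed() >> 10,
                        usage.getCommitted() >> 10, usage.getMax() >> 10);
            }
            System.out.println();
            Thread.sleep(1000); // poll once per second
        }
    }
}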

Why JVM calculated PS Survivor Space size too low for parallel collector

I am using the JDK 1.6.0_16 JVM for a Java application hosted on a Linux machine with an 80-core Intel processor.
When starting the Java application I have only two options configured:
-Xms2048m -Xmx8000m in the JVM options (after the java command).
I see that PS Old Gen is calculated as 5.21 GB and PS Eden as 2.6 GB, but the PS Survivor Space is only 25 MB.
I have exactly the same JVM in production, and there the PS Survivor Space size is shown as 888 MB. I am seeing these sizes in the Java Mission Control Memory tab.
The cache size (from /proc/cpuinfo) shows 24656 on both the UAT and production boxes.
I don't think it will make any difference to the JVM, but I'll mention anyway that there was very low load on the machine when the JVM was started.
Can you please advise which parameters the JVM considers when calculating the PS Survivor Space size?
From Oracle's GC tuning article 1 and article 2:
Survivor Space Sizing
The parameter SurvivorRatio can be used to tune the size of the survivor spaces, but this is often not important for performance. For example, -XX:SurvivorRatio=6 sets the ratio between each survivor space and eden to 1:6.
In other words, each survivor space will be one-sixth the size of eden, and thus one-eighth the size of the young generation (not one-seventh, because there are two survivor spaces).
If survivor spaces are too small, copying collection overflows directly into the tenured generation. If survivor spaces are too large, they will be uselessly empty.
The NewSize and MaxNewSize parameters control the new generation’s minimum and maximum size. Regulate the new generation size by setting these parameters equal. The bigger the younger generation, the less often minor collections occur.
NewRatio: The size of the young generation relative to the old generation is controlled by NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and old generation is 1:3; the combined size of eden and the survivor spaces will be one-fourth of the heap.
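To make that arithmetic concrete (hypothetical numbers, using the relationships implied above: young = heap / (NewRatio + 1) and each survivor = young / (SurvivorRatio + 2)):
-Xmx8000m, -XX:NewRatio=3, -XX:SurvivorRatio=6
young    = 8000 / (3 + 1) = 2000 MB
survivor = 2000 / (6 + 2) =  250 MB each, so eden = 2000 - 2 * 250 = 1500 MB
Keep in mind that the parallel collector enables -XX:+UseAdaptiveSizePolicy by default, which resizes eden and the survivor spaces at run time, so the sizes you actually observe (such as the 25 MB survivor above) can differ substantially from these static formulas.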
As correctly noted by Peter Lawrey, survivor sizing depends on the type of your application. From Oracle's GC tuning article, here are the guidelines:
First decide the maximum heap size you can afford to give the virtual machine. Then plot your performance metric against young generation sizes to find the best setting.
If the total heap size is fixed, then increasing the young generation size requires reducing the tenured generation size. Keep the tenured generation large enough to hold all the live data used by the application at any given time, plus some amount of slack space (10 to 20% or more).
Subject to the previously stated constraint on the tenured generation: Grant plenty of memory to the young generation and increase the young generation size as you increase the number of processors, because allocation can be parallelized. The default is calculated from NewRatio and the -Xmx setting
Can you please advise which parameters the JVM considers when calculating the PS Survivor Space size?
It must be large enough to never actually fill up after a collection of the Eden space; otherwise you will get Full GCs, which is undesirable.
What the optimal Survivor space size is depends on your application. I suggest you test your application under realistic loads with a larger Eden and Survivor space than you imagine useful, see how much of that space ever gets used, and add 50% to 100% on top of what you see being used.
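One way to measure how much survivor space is actually used under load (a sketch; these flag names are the JDK 6-8 ones, and yourapp.jar is a placeholder):
java -Xms2048m -Xmx8000m -verbose:gc -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:gc.log -jar yourapp.jar
-XX:+PrintTenuringDistribution logs, after each minor GC, how many bytes survive at each age, which shows how large the survivor spaces really need to be.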
The machine has 256G of physical memory out of which ~200G
The default heap size is 32 GB and I suggest you use this default unless you have a good reason to reduce it.
-XX:SurvivorRatio=1
This is usually a bad idea and having a high survivor ratio like 8 is usually better.
Setting the value to 8 didn't have any effect.
Most likely you have a low allocation rate. I usually set a large Young space, e.g. -Xmn8g or even -Xmn24g, but whether this is a good or bad idea depends on your application.

Understanding output of GHC's +RTS -t -RTS option

I'm benchmarking the memory consumption of a Haskell program compiled with GHC. In order to do so, I run the program with the following command-line arguments: +RTS -t -RTS. Here's an example output:
<<ghc: 86319295256 bytes, 160722 GCs, 53963869/75978648 avg/max bytes residency (386 samples), 191M in use, 0.00 INIT (0.00 elapsed), 152.69 MUT (152.62 elapsed), 58.85 GC (58.82 elapsed) :ghc>>.
According to the ghc manual, the output shows:
The total number of bytes allocated by the program over the whole run.
The total number of garbage collections performed.
The average and maximum "residency", which is the amount of live data in bytes. The runtime can only determine the amount of live data during a major GC, which is why the number of samples corresponds to the number of major GCs (and is usually relatively small).
The peak memory the RTS has allocated from the OS.
The amount of CPU time and elapsed wall clock time while initialising the runtime system (INIT), running the program itself (MUT, the mutator), and garbage collecting (GC).
Applied to my example, it means that my program shuffles 82321 MiB (bytes divided by 1024^2) around in total, performs 160722 garbage collections, has a 51 MiB / 72 MiB average/maximum memory residency, has at most 191M allocated from the OS, and so on ...
Now I want to know how »the average and maximum "residency", which is the amount of live data in bytes« compares to »the peak memory the RTS has allocated from the OS«. And also: what uses the remaining space of roughly 120M?
I was pointed here for more information, but that does not clearly state what I want to know. Another source (5.4.4, second item) hints that the 120M of memory is used for garbage collection, but that is too vague – I need a quotable information source.
So please, is there anyone who could answer my questions with good sources as proofs?
Kind regards!
The "resident" size is how much live Haskell data you have. The amount of memory actually allocated from the OS may be higher.
The RTS allocates memory in "blocks". If your program needs 7.3 blocks of RAM, the RTS has to allocate 8 blocks, 0.7 of which is empty space.
The default garbage collection algorithm is a 2-space collector. That is, when space A fills up, it allocates space B (which is totally empty) and copies all the live data out of space A and into space B, then deallocates space A. That means that, for a while, you're using 2x as much RAM as is actually necessary. (I believe there's a switch somewhere to use a 1-space algorithm which is slower but uses less RAM.)
There is also some overhead for managing threads (especially if you have lots), and there might be a few other things.
I don't know how much you already know about GC technology, but you can try reading these:
http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel-gc/par-gc-ismm08.pdf
http://www.mm-net.org.uk/workshop190404/GHC%27s_Garbage_Collector.ppt

How is total memory in Java calculated

If I have 8GB RAM and I use the following on a 64-bit JVM
max heap size 6144MB
max perm gen space 2048MB
stack size 2MB
Q1: Is perm gen space allocated from the max heap, or is it separate?
Q2: If separate, will the JVM start with the above settings, or will it give an error because heap + permgen + stack + program data would exceed the total RAM?
First of all remember that the parameter you set with -Xmx (since that's the way I suppose you are setting your heap size) is the size of heap available to your Java code, not the amount of memory the JVM will consume. The difference comes from housekeeping structures that the JVM keeps (garbage collector structures, JIT overhead etc.), sometimes memory allocated by native code, buffers, and so on. The size of this additional memory depends on JVM version, the app you are running, and other factors, but I've seen JVMs allocate twice as much RAM as the heap size visible to the application. For the average case, I usually consider 50% to be a safe margin, with 20-30% acceptable. If you set your heap size to be close to amount of RAM in your machine, you will hit the swap and performance will suffer.
Now for the enumerated questions:
Perm gen is a separate space from the heap, at least in Oracle's JDK 6. It is separate because it is governed by completely different memory management rules than the regular heap. By the way, 2 GB of permgen space is huge - are you sure you really need it?
Regarding the second question, see above. If this is Oracle's JDK, you are likely to run into trouble: perm and heap add up, and on top of that there will be additional memory, usually on the order of 20-50% of your 6 GB heap, so together with heap and perm space this will be more than your RAM. This setup may work at first, but once both the heap and perm gen space usages come close to their configured limits, you could run out of memory.
Heap and permgen are different memory areas of the JVM. As such you will be consuming virtually all the memory on the system. It is always better to leave about 20% of RAM free for the OS and other tasks to execute properly.
Also, 2 GB for perm space is a huge figure. Have you looked at jar optimisation, i.e. making sure that only the relevant classes are present in the classpath?
This depends on the JVM and the version of the JVM.
In HotSpot Java 6, PermGen space is independent from the max heap size argument (-Xmx and -Xms control only the Young/OldGen sizes). The PermGen space size is given by -XX:PermSize and -XX:MaxPermSize. See Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning.
UPDATE: In HotSpot Java 8, there is no PermGen space anymore; class metadata is stored in the native-memory Metaspace instead (bounded by -XX:MaxMetaspaceSize).
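As an illustration of the flags involved (a hedged sketch; MyApp is a placeholder, the heap and stack values are the ones from the question, and the Metaspace limit is an arbitrary example):
java -Xmx6144m -Xss2m -XX:MaxPermSize=2048m MyApp (Java 6/7: heap and PermGen sized separately)
java -Xmx6144m -Xss2m -XX:MaxMetaspaceSize=512m MyApp (Java 8+: PermGen replaced by native Metaspace)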
