Can we use G1GC Garbage Collector for smaller heap size - garbage-collection

We need help regarding the G1GC algorithm: can we use this GC for a low heap size of around 1 GB, or is it only meant for bigger heaps?
We want to use G1 because we are getting "GC overhead limit exceeded" errors, and the application is using the default garbage collector.

I assume you still use OpenJDK 8 and the Parallel garbage collector (otherwise the default is G1 already). It is difficult to generalize, but G1 tends to benefit more from additional headroom than the parallel collector. If the heap size is already tight, switching to G1 will likely make things worse. On the other hand, if your workload benefits a lot from G1 string deduplication (which needs to be enabled explicitly with -XX:+UseStringDeduplication), G1 may indeed be the better option.
In any case, you probably should upgrade to OpenJDK 11 if you want to use G1 because of the many improvements there.

Related

How to increase memory at startup?

Is there an option for Node.js to increase the initially allocated memory?
https://futurestud.io/tutorials/node-js-increase-the-memory-limit-for-your-process
The --max-old-space-size flag seems to increase the maximum memory, but what about the initial memory?
Kind of like -Xmx and -Xms for the JVM.
V8 developer here. The short answer is: no.
The reason no such option exists is that adding fresh pages to the heap is so fast that there is no significant benefit to doing it up front.
V8 does have a flag --initial-old-space-size, but it doesn't increase the initial allocation. Instead, what it means is "don't bother doing (old-space) GC while the heap size is below this limit". If you set that to, e.g., 1000 (MB), and then allocate 800MB of unreachable objects, and then just wait, then V8 will sit around forever with 800MB of garbage on the heap and won't lift a finger to get rid of any of that.
I'm not sure in what scenario this behavior would be useful (it's not like it will turn off GC entirely; GC will just run less frequently, but fewer GCs on a bigger heap don't necessarily add up to less total time than more GCs on a smaller heap), so I would strongly recommend measuring the effect on your particular workload carefully before using this flag -- if it were a good idea to have this on by default, then it would be on by default!
If I had to guess: this flag might be beneficial if you know that (1) your application will have a large amount of "eternal" (=lives as long as the app is running) data on the heap, and (2) you can estimate the amount of that data with reasonable accuracy. E.g.: if you know that at any given time, your old-space will consist of 500MB of always-reachable-anyway data plus any potentially-freeable-garbage, you could use this flag to tell V8 "if old-space size is below 600MB (=500MB plus a little), then don't bother trying to find garbage, it won't be worth the effort".

How to automate garbage collection execution using Java Melody?

I'm using Java Melody to monitor memory usage in a production environment.
The requirement is that memory should not exceed 256MB/512MB.
I have optimized the code as much as I can, but usage is still 448MB/512MB. However, when I executed the garbage collector manually in Java Melody, memory consumption dropped to 109MB/512MB.
You can invoke the garbage collection using one of these two calls in your code (they're equivalent):
System.gc();
Runtime.getRuntime().gc();
It's better if you place the call in a Runnable that gets invoked periodically, depending on how fast your memory limit is reached.
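The suggestion above could be sketched like this (the class name and interval are placeholders; note that System.gc() is only a request, which the JVM is free to ignore):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGcTrigger {

    /** Schedules a System.gc() request every periodMinutes minutes on a daemon thread. */
    public static ScheduledExecutorService start(long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-gc");
            t.setDaemon(true); // don't keep the JVM alive just for this
            return t;
        });
        // System.gc() is only a hint; the JVM may ignore it entirely
        // (e.g. when -XX:+DisableExplicitGC is set).
        scheduler.scheduleAtFixedRate(System::gc, periodMinutes, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        start(5); // placeholder interval: tune to how fast the limit is reached
    }
}
```

Keep in mind that forcing GC this way is generally a last resort; high apparent usage before a collection is often just uncollected garbage rather than a leak.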
Why do you actually care about heap usage? As long as you set -Xmx (the maximum heap size) you are fine. Let Java invoke GC when it sees fit. As long as you have free heap, there is no point doing GC and freeing heap just for the sake of having a lot of free heap.
If you want to limit the memory allocated by the process, -Xmx is not enough. You should also limit native memory.
What you should care about is
Memory leaks, consecutive Full GCs, GC starvation
GC KPIs: Latency, throughput, footprint
Object creation rate, promotion rate, reclamation rate…
GC Pause time statistics: Duration distribution, average, count, average interval, min/max, standard deviation
GC Causes statistics: Duration, Percentage, min/max, total
GC phases related statistics: Each GC algorithm has several sub-phases. Example for G1: initial-mark, remark, young, full, concurrent mark, mixed
See https://blog.gceasy.io/2017/05/30/improving-your-performance-reports/ and https://blog.gceasy.io/2017/05/31/gc-log-analysis-use-cases/ for more technical details. You could also analyze your GC logs using https://blog.gceasy.io/; it will help you understand how your JVM is using memory.

Why does the java8 GC not collect for over 11 hours?

Context: 64 bit Oracle Java SE 1.8.0_20-b26
For over 11 hours, my running Java 8 app has been accumulating objects in the tenured generation (close to 25%). So I manually clicked the Perform GC button in JConsole, and you can see the precipitous drop in heap memory on the right of the chart. I don't have any special VM options turned on except for -XX:NewRatio=2.
Why does the GC not clean up the tenured generation ?
This is fully expected and desirable behavior. The JVM has been successfully avoiding a major GC by performing timely minor GCs all along. A minor GC, by definition, does not touch the tenured generation, and the key idea behind generational garbage collectors is that precisely this pattern will emerge.
You should be very satisfied with how your application is humming along.
The throughput collector's primary goal is, as its name says, throughput (via GCTimeRatio). Its secondary goal is pause times (MaxGCPauseMillis). Only as tertiary goal it considers keeping the memory footprint low.
If you want to achieve a low heap size you will have to relax the other two goals.
You may also want to lower MaxHeapFreeRatio to allow the JVM to yield back memory to the OS.
Why does the GC not clean up the tenured generation ?
Because it doesn't need to.
It looks like your application is accumulating tenured garbage at a relatively slow rate, and there was still plenty of space for tenured objects. The "throughput" collector generally only runs when a space fills up. That is the most efficient in terms of CPU usage ... which is what the throughput collector optimizes for.
In short, the GC is working as intended.
If you are concerned by the amount of memory that is being used (because the tenured space is not being collected), you could try running the application with a smaller heap. However, the graph indicates that the application's initial behavior may be significantly different to its steady-state behavior. In other words, your application may require a large heap to start with. If that is the case, then reducing the heap size could stop the application working, or at least make the startup phase a lot slower.

How much extra memory does garbage collection require?

I heard once that for a language to implement and run garbage collection correctly, on average 3x more memory is required. I am not sure if this assumes the application is small, large, or either.
So I wanted to know if there is any research or actual numbers on garbage collection overhead. Also, I want to say GC is a very nice feature.
The amount of memory headroom you need depends on the allocation rate within your program. If you have a high allocation rate, you need more room for growth while the GC works.
The other factor is object lifetime. If your objects typically have a very short lifetime, then you may be able to manage with slightly less headroom with a generational collector.
There are plenty of research papers that may interest you. I'll edit a bit later to reference some.
Edit (January 2011):
I was thinking of a specific paper that I can't seem to find right now. The ones below are interesting and contain some relevant performance data. As a rule of thumb, you are usually ok with about twice as much memory available as your program residency. Some programs need more, but other programs will perform very well even in constrained environments. There are lots of variables that influence this, but allocation rate is the most important one.
Immix: a mark-region garbage collector with space efficiency, fast collection, and mutator performance
Myths and realities: the performance impact of garbage collection
Edit (February 2013): This edit adds a balanced perspective on a paper cited, and also addresses objections raised by Tim Cooper.
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, as noted by Natan Yellin, is actually the reference I was first trying to remember back in January 2011. However, I don't think the interpretation Natan has offered is correct. That study does not compare GC against conventional manual memory management. Rather, it compares GC against an oracle which performs perfect explicit releases. In other words, it leaves us not knowing how well conventional manual memory management compares to the magic oracle. It is also very hard to find this out, because the source programs are written either with GC in mind or with manual memory management in mind, so any benchmark retains an inherent bias.
Following Tim Cooper's objections, I'd like to clarify my position on the topic of memory headroom. I do this mainly for posterity, as I believe Stack Overflow answers should serve as a long-term resource for many people.
There are many memory regions in a typical GC system, but three abstract kinds are:
Allocated space (contains live, dead, and untraced objects)
Reserved space (from which new objects are allocated)
Working region (long-term and short-term GC data structures)
What is headroom anyway? Headroom is the minimum amount of reserved space needed to maintain a desired level of performance. I believe that is what the OP was asking about. You can also think of the headroom as the memory, additional to the actual program residency (maximum live memory), necessary for good performance.
Yes -- increasing the headroom can delay garbage collection and increase throughput. That is important for offline non-critical operations.
In reality most problem domains require a realtime solution. There are two kinds of realtime, and they are very different:
hard-realtime concerns worst case delay (for mission critical systems) -- a late response from the allocator is an error.
soft-realtime concerns either average or median delay -- a late response from the allocator is ok, but shouldn't happen often.
Most state of the art garbage collectors aim for soft-realtime, which is good for desktop applications as well as for servers that deliver services on demand. If one eliminates realtime as a requirement, one might as well use a stop-the-world garbage collector in which headroom begins to lose meaning. (Note: applications with predominantly short-lived objects and a high allocation rate may be an exception, because the survival rate is low.)
Now suppose that we are writing an application that has soft-realtime requirements. For simplicity let's suppose that the GC runs concurrently on a dedicated processor. Suppose the program has the following artificial properties:
mean residency: 1000 KB
reserved headroom: 100 KB
GC cycle duration: 1000 ms
And:
allocation rate A: 100 KB/s
allocation rate B: 200 KB/s
Now we might see the following timeline of events with allocation rate A:
T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
T+1000 ms:
0 KB free in reserved space, 1100 KB allocated
GC cycle ends, 100 KB released
100 KB free in reserve, 1000 KB allocated
T+2000 ms: same as above
The timeline of events with allocation rate B is different:
T+0000 ms: GC cycle starts, 100 KB available for allocations, 1000 KB already allocated
T+0500 ms:
0 KB free in reserved space, 1100 KB allocated
either
delay until end of GC cycle (bad, but sometimes mandatory), or
increase reserved size to 200 KB, with 100 KB free (assumed here)
T+1000 ms:
0 KB free in reserved space, 1200 KB allocated
GC cycle ends, 200 KB released
200 KB free in reserve, 1000 KB allocated
T+2000 ms:
0 KB free in reserved space, 1200 KB allocated
GC cycle ends, 200 KB released
200 KB free in reserve, 1000 KB allocated
Notice how the allocation rate directly impacts the size of the headroom required? With allocation rate B, we require twice the headroom to prevent pauses and maintain the same level of performance.
This was a very simplified example designed to illustrate only one idea. There are plenty of other factors, but it does show what was intended. Keep in mind the other major factor I mentioned: average object lifetime. Short lifetimes cause low survival rates, which work together with the allocation rate to influence the amount of memory required to maintain a given level of performance.
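The arithmetic behind the timelines above can be written down in a few lines (hypothetical class and method names; this is only the simplified model of the example, with a concurrent GC on a dedicated processor and a constant allocation rate, not a general formula):

```java
public class HeadroomSketch {

    // Minimum reserved headroom (KB) so the reserve is not exhausted
    // before a concurrent GC cycle completes: allocation rate * cycle duration.
    static long requiredHeadroomKb(long allocRateKbPerSec, long gcCycleMs) {
        return allocRateKbPerSec * gcCycleMs / 1000;
    }

    public static void main(String[] args) {
        // Rate A (100 KB/s, 1000 ms cycle) needs 100 KB of headroom;
        // rate B (200 KB/s) needs 200 KB, i.e. twice as much.
        System.out.println(requiredHeadroomKb(100, 1000));
        System.out.println(requiredHeadroomKb(200, 1000));
    }
}
```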
In short, one cannot make general claims about the headroom required without knowing and understanding the characteristics of the application.
According to the 2005 study Quantifying the Performance of Garbage Collection vs. Explicit Memory Management (PDF), generational garbage collectors need 5 times the memory to achieve equal performance. The emphasis below is mine:
We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks, and include real (non-simulated) runs that validate our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational garbage collector with a non-copying mature space matches the performance of explicit memory management. With only three times as much memory, it runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.
I hope the original author clearly marked what they regard as correct usage of garbage collection and the context of their claim.
The overhead certainly depends on many factors; e.g., the overhead is larger if you run your garbage collector less frequently; a copying garbage collector has a higher overhead than a mark and sweep collector; and it is much easier to write a garbage collector with lower overhead in a single-threaded application than in the multi-threaded world, especially for anything that moves objects around (copying and/or compacting gc).
So I wanted to know if there is any research or actual numbers on garbage collection overhead.
Almost 10 years ago I studied two equivalent programs I had written in C++ using the STL (GCC on Linux) and in OCaml using its garbage collector. I found that the C++ used 2x more memory on average. I tried to improve it by writing custom STL allocators but was never able to match the memory footprint of the OCaml.
Furthermore, GCs typically do a lot of compaction which further reduces the memory footprint. So I would challenge the assumption that there is a memory overhead compared to typical unmanaged code (e.g. C++ using what are now the standard library collections).

Is it possible to monitor "Full GC" frequency in JMX (on HotSpot)?

I want to monitor full GC frequency in JMX. An MBean exposes the GC count.
(cf. http://download.oracle.com/javase/1.5.0/docs/api/java/lang/management/GarbageCollectorMXBean.html - java.lang:type=GarbageCollector,name=).
The problem is that this MBean does not distinguish between minor and full GC.
Does someone have an idea ?
Thanks.
Arnault
I'm not completely sure about this, but I assume that the garbage collector that controls all the memory pools (or at least the one for the Old Gen) is the one used for major GC. For example, I have a JVM running with these two collectors:
PS MarkSweep
MemoryPoolNames: PS Eden Space, PS Survivor Space, PS Old Gen, PS Perm Gen
CollectionCount: 68
PS Scavenge
MemoryPoolNames: PS Eden Space, PS Survivor Space
CollectionCount: 2690
Taking this into account, I would say PS Scavenge is used for minor GC and PS MarkSweep for major GC.
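This kind of information can be listed programmatically through the standard GarbageCollectorMXBean API; a small sketch (the old-generation pool-name check is a heuristic based on the names above, not a documented contract):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;

public class GcBeanDump {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Heuristic: a collector whose memory pools include an old-generation
            // space is the one performing major/full collections.
            boolean managesOldGen = Arrays.stream(gc.getMemoryPoolNames())
                    .anyMatch(p -> p.contains("Old Gen") || p.contains("Tenured"));
            System.out.printf("%s: count=%d, managesOldGen=%b%n",
                    gc.getName(), gc.getCollectionCount(), managesOldGen);
        }
    }
}
```

On a JVM using the parallel collectors, this would list PS Scavenge (young only) and PS MarkSweep (pools including PS Old Gen), matching the split described above.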
UPDATE (based on @ajeanson's comment; thanks for your feedback, btw):
Effectively, the example above was taken from the information exposed in the MXBeans of the JVM I was using. As you mentioned, these are GC algorithm names, and the name the MXBean uses for a GC is based on the algorithm that GC implements. I've been looking for some more information about this; the article http://download.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html says the following:
The Java HotSpot VM defines two generations: the young generation (sometimes called the "nursery") and the old generation. The young generation consists of an "Eden space" and two "survivor spaces." The VM initially assigns all objects to the Eden space, and most objects die there. When it performs a minor GC, the VM moves any remaining objects from the Eden space to one of the survivor spaces. The VM moves objects that live long enough in the survivor spaces to the "tenured" space in the old generation. When the tenured generation fills up, there is a full GC that is often much slower because it involves all live objects. The permanent generation holds all the reflective data of the virtual machine itself, such as class and method objects.
Looking at the collectionCount property on the MXBeans, in the case of my "PS MarkSweep" collector (the one managing the Old Gen pool), the collection count seems to increase only when I see a full GC in the verbose output. I might be wrong, and maybe in some cases this collector also performs minor GCs, but I would need to run more tests to be totally sure about this.
Please let me know if someone finds out something else or has more specific information about this issue, as I'm quite interested in it.
It does... have a look at the names, e.g. ParNew, ConcurrentMarkSweep, etc.
Some names are for minor GC, some for full GC.
I was looking for the same information and found out, after reading https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html#BABFAFAE for Java 8, that some collectors can be used for both minor and full GCs (such as G1 or SerialGC), but some other collectors handle only minor or only full GCs (such as ParNewGC or ConcMarkSweepGC).
And when you use G1, for example, the two collectors used are quite explicit with their names, and the one for full GC is "G1 Old Generation".
But, because the MXBean is missing the information about whether a collection is minor or full, either:
you know the GC in use for your app and code your monitoring method accordingly, knowing the collector names,
or you maintain something like a map of all the possibilities for your selected JVM version.
In my case, I will just print the collector name along with the time and count values and let the person reading the data make the analysis; the data will be graphed (Grafana).
Not sure if the newest JDKs improve this...
