Full GC duration difference in same JVM - garbage-collection

Why does the time to complete a full GC vary significantly in the same JVM?
We are running a Sun JVM with an 8 GB heap.
Sometimes a full GC takes 13 seconds, and rarely (about once a week) it takes 530 seconds. These long full GCs are causing communication issues in our clustered environment. Could the difference be due to resource availability (e.g. CPU cycles not being available) when the full GC occurs? Would changing our GC parameters help? Please find our GC parameters below.
example:
157858.158: [Full GC 157858.158: [Tenured: 5567918K->2718558K(5593088K), 13.4078854 secs] 7042362K->2718558K(7689728K), [Perm : 202405K->202405K(524288K)], 13.4079752 secs]
683185.700: [Full GC 683185.700: [Tenured: 5584345K->2461609K(5593088K), 536.8253698 secs] 7028566K->2461609K(7689728K), [Perm : 242259K->242259K(524288K)], 536.8254562 secs]
Environment:
We are running an application on SAP Netweaver Server - Sun JVM.
java -version
java version "1.4.2_19-rev"
Java(TM) Platform, Standard Edition for Business (build 1.4.2_19-rev-b0
Java HotSpot(TM) 64-Bit Server VM (build 1.4.2_19-rev-b07, mixed mode)
JVM parameters:
-Xmx8192M
-Xms8192M
-XX:PermSize=512M
-XX:MaxPermSize=512M
-XX:NewSize=2730M
-XX:MaxNewSize=2730M
-Djco.jarm=1
-XX:SurvivorRatio=2
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=10
-XX:SoftRefLRUPolicyMSPerMB=1
-XX:+DisableExplicitGC
-XX:+UseParNewGC
-XX:+UseTLAB
-XX:+HandlePromotionFailure
-XX:ParallelGCThreads=32
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-Xss2M
-XX:CompilerThreadStackSize=4096
-Djava.awt.headless=true
-Dsun.io.useCanonCaches=false
-Djava.security.policy=./java.policy
-Djava.security.egd=file:/dev/urandom
-Dorg.omg.CORBA.ORBClass=com.sap.engine.system.ORBProxy
-Dorg.omg.CORBA.ORBSingletonClass=com.sap.engine.system.ORBSingletonProxy
-Djavax.rmi.CORBA.PortableRemoteObjectClass=com.sap.engine.system.PortableRemoteObjectProxy
-Dvr2m.meta.directory.class=com.vendavo.core.util.VenMetaDir
-Dvr2m.home=E:\Vendavo
-Djasper.reports.compile.class.path=E:\<>\jasperreports\v1.2.5\jasperreports-1.2.5.jar;E:\<dsaf>iReport\v1.2.4\iReport-1.2.4.jar;E:\<dsaf>\iReport\v1.2.4\itext-1.3.1.jar;E:\<dsaf>\classes\jars\abc.jar;
-Dvr2m.cluster.mynodename=n1_server101
-XX:+HeapDumpOnCtrlBreak
-XX:+HeapDumpOnOutOfMemoryError
Below is a sample tenuring distribution. I am not sure how to send the complete GC logs.
Desired survivor size 644087808 bytes, new threshold 10 (max 10)
- age 1: 17299744 bytes, 17299744 total
- age 2: 4327344 bytes, 21627088 total
- age 3: 2152536 bytes, 23779624 total
- age 4: 1291104 bytes, 25070728 total
- age 5: 2277184 bytes, 27347912 total
- age 6: 8323128 bytes, 35671040 total
- age 9: 1859888 bytes, 37530928 total
- age 10: 2849376 bytes, 40380304 total
: 1465272K->39817K(2096640K), 0.0317708 secs] 7042426K->5619506K(7689728K), 0.0318546 secs]
682873.961: [GC 682873.961: [ParNew
Desired survivor size 644087808 bytes, new threshold 10 (max 10)
- age 1: 17629648 bytes, 17629648 total
- age 2: 1937560 bytes, 19567208 total
- age 3: 4322600 bytes, 23889808 total
- age 4: 2051048 bytes, 25940856 total
- age 5: 910360 bytes, 26851216 total
- age 6: 2237400 bytes, 29088616 total
- age 7: 8322776 bytes, 37411392 total
- age 10: 1859936 bytes, 39271328 total
: 1437577K->38693K(2096640K), 0.0363818 secs] 7017266K->5621199K(7689728K), 0.0364742 secs]
683032.408: [GC 683032.408: [ParNew
Desired survivor size 644087808 bytes, new threshold 10 (max 10)
- age 1: 27372472 bytes, 27372472 total
- age 2: 414904 bytes, 27787376 total
- age 3: 1828208 bytes, 29615584 total
- age 4: 4318504 bytes, 33934088 total
- age 5: 2051520 bytes, 35985608 total
- age 6: 760512 bytes, 36746120 total
- age 7: 2153392 bytes, 38899512 total
- age 8: 8322232 bytes, 47221744 total
: 1436453K->46460K(2096640K), 0.0555022 secs] 7018959K->5630806K(7689728K), 0.0555993 secs]
683185.700: [Full GC 683185.700: [Tenured: 5584345K->2461609K(5593088K), 536.8253698 secs] 7028566K->2461609K(7689728K), [Perm : 242259K->242259K(524288K)], 536.8254562 secs]
684682.569: [GC 684682.569: [ParNew

This looks like the "promotion failure" syndrome.
GC has two cycles:
young GC, or minor GC, which collects garbage in the young generation
full GC, which collects both young and old generations (if the Mark-Sweep-Compact algorithm is chosen, as in your configuration)
A young GC promotes some live objects to the old generation:
if they have been tenured long enough,
or if too many objects have survived and the survivor space in the young generation cannot accommodate them.
Normally the JVM estimates the amount of free space in the old generation it needs for a young collection and starts a full GC if free memory is low.
But if that estimate is wrong, free space in the old generation can be exhausted in the middle of a young collection.
In that case the JVM has to roll back the young collection and start a full collection. While the copying collector (used for young collections) is running, the object graph is not consistent, and Mark-Sweep-Compact cannot start until the graph is restored to a consistent state.
Unfortunately this rollback can take an order of magnitude longer than a normal full GC.
This problem is typical for the Concurrent Mark Sweep collector, but it may affect Mark-Sweep-Compact too.
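To make the "wrong estimate" part concrete, here is a rough, self-contained Java sketch of the promotion-guarantee check that runs before a young collection. This is an illustration of the idea only, not HotSpot source code; the class, method names, and the average-promotion figure are made up.

public class PromotionGuaranteeSketch {
    // Decide whether a young GC may safely run, given old-generation free space.
    static boolean safeToRunYoungGC(long oldGenFreeBytes,
                                    long youngGenUsedBytes,
                                    long avgPromotedBytes,
                                    boolean handlePromotionFailure) {
        // Pessimistic check: assume every live young object could be promoted.
        long worstCase = youngGenUsedBytes;
        // Optimistic check (-XX:+HandlePromotionFailure, as in your options):
        // only the historical average promotion has to fit. Faster on average,
        // but if the estimate is wrong, the half-finished young GC must be
        // "rolled back" into a very slow full collection.
        return handlePromotionFailure
                ? oldGenFreeBytes >= avgPromotedBytes
                : oldGenFreeBytes >= worstCase;
    }

    public static void main(String[] args) {
        // Numbers roughly matching the 536-second log line above:
        // ~8.5 MB free in the old generation, ~1.4 GB of young generation in use.
        long oldGenFree = (5593088L - 5584345L) * 1024;
        long youngUsed  = 1436453L * 1024;
        long avgPromo   = 10L * 1024 * 1024; // hypothetical average promotion
        System.out.println(safeToRunYoungGC(oldGenFree, youngUsed, avgPromo, true));  // true  -> young GC attempted, may fail
        System.out.println(safeToRunYoungGC(oldGenFree, youngUsed, avgPromo, false)); // false -> full GC chosen up front
    }
}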
UPDATE
Looking at your GC logs and tenuring distribution, I would suggest reducing the young generation (and thus the scale of a potential "rollback").
Judging from your logs, cutting the young generation to roughly a third of its current size seems reasonable: -XX:NewSize=900M -XX:MaxNewSize=900M (shown applied to your options below).
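Applied to your startup options, only the young-generation sizing lines would change; everything else stays as posted above:

-Xmx8192M
-Xms8192M
-XX:NewSize=900M
-XX:MaxNewSize=900M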
Upgrading the JVM would be another good option (the failure-prediction logic has likely improved since the 1.4 days).
Below are a few links related to GC in the HotSpot JVM:
Understanding GC pauses in JVM, HotSpot's minor GC
Garbage collection in HotSpot JVM
How to tame java GC pauses? Surviving 16GiB heap and greater

Related

Application is taking more time to process the JNI weak reference during remark phase of G1GC

The application is running into unexpected behavior due to this long GC, and I am trying to bring the GC time below 500 ms.
Snippet from GC logs:
2020-03-17T16:50:04.505+0900: 1233.742: [GC remark
2020-03-17T16:50:04.539+0900: 1233.776: [GC ref-proc
2020-03-17T16:50:04.539+0900: 1233.776: [SoftReference, 0 refs, 0.0096740 secs]
2020-03-17T16:50:04.549+0900: 1233.786: [WeakReference, 3643 refs, 0.0743530 secs]
2020-03-17T16:50:04.623+0900: 1233.860: [FinalReference, 89 refs, 0.0100470 secs]
2020-03-17T16:50:04.633+0900: 1233.870: [PhantomReference, 194 refs, 9 refs, 0.0168580 secs]
2020-03-17T16:50:04.650+0900: 1233.887: [JNI Weak Reference, 0.9726330 secs], 1.0839410 secs], 1.1263670 secs]
Application is running on Java 7 with the below JVM options:
CommandLine flags: -XX:+AggressiveOpts -XX:GCLogFileSize=52428800 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=4294967296
-XX:+ManagementServer -XX:MaxHeapSize=8589934592 -XX:MaxPermSize=805306368 -XX:MaxTenuringThreshold=15 -XX:NewRatio=5
-XX:NumberOfGCLogFiles=30 -XX:+OptimizeStringConcat -XX:PermSize=268435456 -XX:+PrintGC -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC -XX:+UseCompressedOops
-XX:+UseFastAccessorMethods -XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:+UseStringCache
Changing parameters like NewRatio, MaxTenuringThreshold, InitialHeapSize, etc. changes the frequency of such long GCs, but there are still one or two.
Is there any way to figure out what is contributing to the long processing time of the JNI weak references?

Making sense from GHC profiler

I'm trying to make sense of the GHC profiler. I have a rather simple app, which uses the wreq and lens-aeson libraries, and while learning about GHC profiling, I decided to play with it a bit.
Using different options (the time tool, +RTS -p -RTS, and +RTS -p -h) I obtained entirely different numbers for my memory usage. Having all those numbers, I'm now completely lost trying to understand what is going on and how much memory the app actually uses.
This situation reminds me of the phrase by Arthur Bloch: "A man with a watch knows what time it is. A man with two watches is never sure."
Can you please suggest how I should read all those numbers, and what the meaning of each of them is?
Here are the numbers:
time -l reports around 19M
#/usr/bin/time -l ./simple-wreq
...
3.02 real 0.39 user 0.17 sys
19070976 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
21040 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
71 messages sent
71 messages received
2991 signals received
43 voluntary context switches
6490 involuntary context switches
Using the +RTS -p -RTS flag reports around 92M. Although it says "total alloc", it seems strange to me that a simple app like this one can allocate and release 91M.
# ./simple-wreq +RTS -p -RTS
# cat simple-wreq.prof
Fri Oct 14 15:08 2016 Time and Allocation Profiling Report (Final)
simple-wreq +RTS -N -p -RTS
total time = 0.07 secs (69 ticks @ 1000 us, 1 processor)
total alloc = 91,905,888 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
main.g Main 60.9 88.8
MAIN MAIN 24.6 2.5
decodeLenient/look Data.ByteString.Base64.Internal 5.8 2.6
decodeLenientWithTable/fill Data.ByteString.Base64.Internal 2.9 0.1
decodeLenientWithTable.\.\.fill Data.ByteString.Base64.Internal 1.4 0.0
decodeLenientWithTable.\.\.fill.\ Data.ByteString.Base64.Internal 1.4 0.1
decodeLenientWithTable.\.\.fill.\.\.\.\ Data.ByteString.Base64.Internal 1.4 3.3
decodeLenient Data.ByteString.Base64.Lazy 1.4 1.4
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 443 0 24.6 2.5 100.0 100.0
main Main 887 0 0.0 0.0 75.4 97.4
main.g Main 889 0 60.9 88.8 75.4 97.4
object_ Data.Aeson.Parser.Internal 925 0 0.0 0.0 0.0 0.2
jstring_ Data.Aeson.Parser.Internal 927 50 0.0 0.2 0.0 0.2
unstream/resize Data.Text.Internal.Fusion 923 600 0.0 0.3 0.0 0.3
decodeLenient Data.ByteString.Base64.Lazy 891 0 1.4 1.4 14.5 8.1
decodeLenient Data.ByteString.Base64 897 500 0.0 0.0 13.0 6.7
....
+RTS -p -h and hp2ps show me the following picture and two numbers: 114K in the header and something around 1.8Mb on the graph.
And, just in case, here is the app:
module Main where
import Network.Wreq
import Control.Lens
import Data.Aeson.Lens
import Control.Monad
main :: IO ()
main = replicateM_ 10 g
where
g = do
r <- get "http://httpbin.org/get"
print $ r ^. responseBody
. key "headers"
. key "User-Agent"
. _String
UPDATE 1: Thanks everyone for the incredibly good responses. As suggested, I am adding the +RTS -s output, so that the entire picture builds up for everyone who reads it.
#./simple-wreq +RTS -s
...
128,875,432 bytes allocated in the heap
32,414,616 bytes copied during GC
2,394,888 bytes maximum residency (16 sample(s))
355,192 bytes maximum slop
7 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 194 colls, 0 par 0.018s 0.022s 0.0001s 0.0022s
Gen 1 16 colls, 0 par 0.027s 0.031s 0.0019s 0.0042s
UPDATE 2: The size of the executable:
#du -h simple-wreq
63M simple-wreq
A man with a watch knows what time it is. A man with two watches is never sure.
Ah, but what do the two watches show? Are both meant to show the current time in UTC? Or is one of them supposed to show the time in UTC, and the other one the time at a certain point on Mars? As long as they are in sync, the second scenario wouldn't be a problem, right?
And that is exactly what is happening here. You compare different memory measurements:
the maximum residency
the total amount of allocated memory
The maximum residency is the highest amount of memory your program ever uses at a given time. That's 19MB. However, the total amount of allocated memory is a lot more, since that's how GHC works: it "allocates" memory for objects that are garbage collected, which is almost everything that's not unpacked.
Let us inspect a C example for this:
#include <stdlib.h> /* malloc, free */

int main() {
int i;
char * mem;
for(i = 0; i < 5; ++i) {
mem = malloc(19 * 1000 * 1000);
free(mem);
}
return 0;
}
Whenever we use malloc, we will allocate 19 megabytes of memory. However, we free the memory immediately after. The highest amount of memory we ever have at one point is therefore 19 megabytes (and a little bit more for the stack and the program itself).
However, in total, we allocate 5 * 19M, 95M total. Still, we could run our little program with just 20 megs of RAM fine. That's the difference between total allocated memory and maximum residency. Note that the residency reported by time is always at least du <executable>, since that has to reside in memory too.
That being said, the easiest way to generate statistics is -s, which will show what the maximum residency was from the Haskell program's point of view. In your case, it will be the 1.9M, the number in your heap profile (or double that amount due to profiling). And yeah, Haskell executables tend to get extremely large, since libraries are statically linked.
time -l is displaying the (resident, i.e. not swapped out) size of the process as seen by the operating system (obviously). This includes twice the maximum size of the Haskell heap (due to the way that GHC's GC works), plus anything else allocated by the RTS or other C libraries, plus the code of your executable itself plus the libraries it depends on, etc. I'm guessing in this case the primary contributor to the 19M is the size of your executable.
total alloc is the total amount allocated onto the Haskell heap. It is not at all a measure of maximum heap size (which is what people usually mean by "how much memory is my program using"). Allocation is very cheap and allocation rates of around 1GB/s are typical for a Haskell program.
The number in the header of the hp2ps output "114,272 bytes x seconds" is something completely different again: it is the integral of the graph, and is measured in bytes * seconds, not in bytes. For example if your program holds onto a 10 MB structure for 4 seconds then that will cause this number to increase by 40 MB*s.
The number around 1.8 MB shown in the graph is the actual maximum size of the Haskell heap, which is probably the number you're most interested in.
You've omitted the most useful source of numbers about your program's execution, which is running it with +RTS -s (this doesn't even require it to have been built with profiling).

High Number of CMS mark remark pauses even though Old gen is not half full

I am trying to understand the cause of the high number of CMS mark and remark pauses (other phases as well), averaging around 700 ms, even though the old gen is not even half full. Following are the GC configuration and stats from GCViewer.
-Xms3g
-Xmx3g
-XX:NewSize=1800m
-XX:MaxNewSize=1800m
-XX:MaxPermSize=256m
-XX:SurvivorRatio=8
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
Summary using GC Viewer: http://i.imgur.com/0IIbNUr.png
GC Log
152433.761: [GC [1 CMS-initial-mark: 284761K(1302528K)] 692884K(2961408K), 0.3367298 secs] [Times: user=0.33 sys=0.00, real=0.34 secs]
152434.098: [CMS-concurrent-mark-start]
152434.417: [CMS-concurrent-mark: 0.318/0.318 secs] [Times: user=1.38 sys=0.02, real=0.32 secs]
152434.417: [CMS-concurrent-preclean-start]
152434.426: [CMS-concurrent-preclean: 0.008/0.009 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
152434.426: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 152439.545: [CMS-concurrent-abortable-preclean: 4.157/5.119 secs] [Times: user=5.82 sys=0.20, real=5.12 secs]
152439.549: [GC[YG occupancy: 996751 K (1658880 K)]152439.550: [Rescan (parallel) , 0.5383841 secs]152440.088: [weak refs processing, 0.0070783 secs]152440.095: [class unloading, 0.0777632 secs]152440.173: [scrub symbol & string tables, 0.0416825 secs] [1 CMS-remark: 284761K(1302528K)] 1281512K(2961408K), 0.6771800 secs] [Times: user=3.35 sys=0.02, real=0.68 secs]
152440.227: [CMS-concurrent-sweep-start]
152440.613: [CMS-concurrent-sweep: 0.382/0.386 secs] [Times: user=0.39 sys=0.01, real=0.39 secs]
152440.613: [CMS-concurrent-reset-start]
152440.617: [CMS-concurrent-reset: 0.004/0.004 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
152441.719: [GC [1 CMS-initial-mark: 284757K(1302528K)] 1320877K(2961408K), 0.7720557 secs] [Times: user=0.78 sys=0.01, real=0.77 secs]
152442.492: [CMS-concurrent-mark-start]
CMS remark has to scan the young generation; since your young generation is so large, this takes some time. Depending on the Java version (which you did not specify!) you may have to enable parallel remarking (CMSParallelRemarkEnabled).
Enabling CMSScavengeBeforeRemark may also reduce the amount of memory that needs to be scanned during the remark (see the flag sketch below).
And simply shrinking the new generation, and taking the hit of a few more promotions that then get cleaned out by the concurrent old-gen GC, may work too.
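For reference, a sketch of how those two suggestions translate into flags. Both exist in HotSpot 6 and later; parallel remark may already be the default on your build, so verify on your JVM version before relying on it:

-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled    (remark work is split across the parallel GC threads)
-XX:+CMSScavengeBeforeRemark     (run a young GC just before the remark so less of the young generation has to be rescanned)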
I don't think incremental mode fixes anything here; it just alters the behavior of CMS so drastically that it masks your original issue.

Extremely long pause times for concurrent mode failure and promotion failure

I'm trying to troubleshoot extremely long pause times when using the CMS collector. I'm using Java 1.6.0u20 and planning an upgrade to 1.7.0u71 but we are stuck right now on this older version.
I'm wondering if anyone has any insight into these long "real" pauses.
The machine is a VM but there are only 2 VMs on the ESX host and they are using less than the total number of cores and ram available, so swapping shouldn't be an issue, but I'm not 100% sure. Any tips related to JVM on a VM would be appreciated as well.
Increasing the heap doesn't help - we started with 1gb on the throughput collector and went to 1.5, 2, 4, 5, 6, ... just last night I increased the heap size to 10gb. The problem always remains with larger or smaller new sizes, etc.
Here is a concurrent mode failure:
2014-11-13T09:36:12.805-0700: 34537.287: [GC 34537.288: [ParNew: 2836628K->2836628K(3058944K), 0.0000296 secs]34537.288: [CMS: 3532075K->1009314K(6989824K), 298.2601836 secs] 6368704K->1009314K(10048768K), [CMS Perm : 454750K->105512K(524288K)], 298.2603873 secs] [Times: user=5.89 sys=31.00, real=297.67 secs]
Total time for which application threads were stopped: 298.2647309 seconds
Here is a promotion failure:
2014-11-13T11:23:30.395-0700: 40974.985: [GC 40974.985: [ParNew (promotion failed)
Desired survivor size 223739904 bytes, new threshold 7 (max 7)
- age 1: 126097168 bytes, 126097168 total
: 3058944K->2972027K(3058944K), 1.6271403 secs]40976.612: [CMS: 6369748K->1735350K(6989824K), 26.6789774 secs] 9103364K->1735350K(10048768K), [CMS Perm : 129283K->105970K(524288K)], 28.3063205 secs] [Times: user=8.05 sys=2.08, real=28.38 secs]
Total time for which application threads were stopped: 28.3069287 seconds
Why are the "real" times so much longer than the CPU/kernel times?
[Times: user=5.89 sys=31.00, real=297.67 secs]
[Times: user=8.05 sys=2.08, real=28.38 secs]

how to find application suspension time from GC log files

I am new to garbage collection. Please help me answer the following questions, with clear explanations.
I want to find the application suspension time and suspension count from GC log files for different JVMs:
SUN
jRockit
IBM
of different versions.
A. For Sun I am using the JVM options
-Xloggc:gc.log -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UseParNewGC -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
B. For JRockit I am using the JVM options
-Xms100m -Xmx100m -Xns50m -Xss200k -Xgc:genconpar -Xverbose:gc -Xverboselog:gc_jrockit.log
My questions are:
Q1. What is the suspension time of an application, and why does it occur?
Q2. How can I tell from the logs that a suspension occurred?
Q3. Is the suspension time of an application equal to the sum of the GC times?
Eg:
2013-09-06T23:35:23.382-0700: [GC 150.505: [ParNew
Desired survivor size 50331648 bytes, new threshold 2 (max 15)
- age 1: 28731664 bytes, 28731664 total
- age 2: 28248376 bytes, 56980040 total
: 688128K->98304K(688128K), 0.2166700 secs] 697655K->163736K(10387456K), 0.2167900 secs] [Times: user=0.44 sys=0.04, real=0.22 secs]
2013-09-06T23:35:28.044-0700: 155.167: [GC 155.167: [ParNew
Desired survivor size 50331648 bytes, new threshold 15 (max 15)
- age 1: 22333512 bytes, 22333512 total
- age 2: 27468336 bytes, 49801848 total
: 688128K->71707K(688128K), 0.0737140 secs] 753560K->164731K(10387456K), 0.0738410 secs] [Times: user=0.30 sys=0.02, real=0.07 secs]
suspensionTime = 0.2167900 secs + 0.0738410 secs
i. If yes, do I need to add up the times for every GC that occurs?
ii. If no, please explain in detail which log entries count as suspensions and which do not, for the different collectors.
Q4. Can we say the GC times "0.2167900, 0.0738410" are equal to GC pauses, i.e. TotalGCPause = 0.2167900 + 0.0738410?
Q5. Can we calculate the suspension time using only the above flags, or do we need extra flags like -XX:+PrintGCApplicationStoppedTime for Sun?
Q6. I have seen a tool, dynaTrace, that calculates the suspension time and count for Sun without using the flag -XX:+PrintGCApplicationStoppedTime.
If you want the most precise information about the amount of time your application was stopped due to GC activity, you should go with -XX:+PrintGCApplicationStoppedTime.
-XX:+PrintGCApplicationStoppedTime enables the printing of the amount of time application threads have been stopped as the result of an internal HotSpot VM operation (GC and safe-point operations).
But for practical daily usage, the information provided by the GC logs is sufficient. You can use the approach described in your question 3 to determine the time spent in GC. (An example of the -XX:+PrintGCApplicationStoppedTime output is shown below.)
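For example (case A, the Sun/HotSpot JVM), the flag is simply added to the options you already pass, for instance:

-Xloggc:gc.log -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime

Each safepoint (GC or otherwise) then produces a line like the one below, which you can sum directly instead of adding up the individual pause times; the 0.2167900 value is just an illustration taken from your first log snippet:

Total time for which application threads were stopped: 0.2167900 seconds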

Resources