What's the difference in UseParallelGC between Java 6 and Java 7?

I'm currently using the JVM parameter UseParallelGC, and I want to upgrade my JVM to Java 7. Is there any difference in this collector between Java 6 and Java 7? Does anyone have experience with it?

There should not be any usability changes when using ParallelGC between Java 6 and Java 7, but you can expect better GC performance after switching to Java 7. I am using -XX:+UseParallelOldGC and upgraded just recently; Java 7 gave me a few percent more throughput and reduced the total GC time. Of course, your mileage may vary.
The main GC changes between Java 6 and 7 concern the G1 collector; the ParallelGC collector remained more or less the same, with some bug fixes and minor improvements.
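For reference, a minimal sketch of the relevant flags (the jar name is a placeholder); GC logging makes it easy to compare pause times and total GC time before and after the upgrade:
java -XX:+UseParallelGC -XX:+UseParallelOldGC -verbose:gc -XX:+PrintGCDetails -jar myapp.jar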

Related

Alfresco 5.2 high CPU utilization & application slow performance

Recently we upgraded Alfresco to 5.2.G with Solr 4, Tomcat 7.0.78, and java version "1.8.0_111". The environment is RHEL 7 on a virtual machine with a 32-core CPU and 32 GB RAM. The application starts without errors, but within 2-3 hours performance becomes slow and CPU utilization gets high.
Can anyone suggest which basic tuning parameters need to change at the OS, JVM, Alfresco, and Solr level? Below are the JVM arguments added in Tomcat:
JAVA_OPTS="-server -Xms24576m -Xmx24576m -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseParNewGC
-XX:ParallelGCThreads=6 -XX:+UseCompressedOops -XX:+CMSClassUnloadingEnabled
-Djava.awt.headless=true -Dalfresco.home=/opt/new/alfresco -Dcom.sun.management.jmxremote
-Dsun.security.ssl.allowUnsafeRenegotiation=true -XX:ReservedCodeCacheSize=2048m"
and alfresco-global.properties (line breaks added for readability):
cifs.serverName=?
system.thumbnail.generate=false
system.enableTimestampPropagation=false
system.workflow.engine.activiti.enabled=false
sync.mode=OFF
system.workflow.engine.jbpm.enabled=false
removed-index.recovery.mode=FULL
I had a similar issue. Keep in mind that when checking CPU usage on a multi-core machine with the 'top' command, it shows the combined usage over all cores; press '1' to show individual cores.
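A minimal sketch of that per-thread check (the Tomcat PID and thread id are placeholders):
top -H -p <tomcat-pid>                     (per-thread CPU usage)
printf '%x\n' <thread-id>                  (convert the hottest thread's id to hex)
jstack <tomcat-pid> | grep nid=0x<hex-id>  (find the matching Java thread in a thread dump)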
I also applied the patch suggested above, but it didn't show much difference (there is also an issue with that patch for Alfresco 5.2.g).
I then tried re-indexing all my content, which seemed to speed things up a lot.
There is still a lot of usage, but only during working hours. Once everyone goes home, it returns to almost 0% usage.
Another thing that really slowed my Alfresco response was a corrupted metadata database. After I restored a backup, it performed much faster.
I also disabled a lot of unnecessary features, including generation of thumbnails.

Performance check between shared cluster and laptop with Intel(R) Core™ i7

I am not really familiar with shared clusters, but I am assuming performance should not differ much for completing a single task when compared with a laptop processor. I have C++ code which I ran on my laptop with an Intel(R) Core™ i7-4558U 2.80 GHz CPU and 16.0 GB RAM, on 64-bit Windows 10. On the other hand, I have results for the same code from a publication whose tests were conducted on a shared cluster with an Intel Xeon 2.3 GHz CPU and a 4 GB memory limit under Linux. The program uses CPLEX as the solver: my laptop has IBM CPLEX 12.7, whereas the previous runs used IBM CPLEX 12.4 (Cplex, 2012). My runs seem to take about 300 times longer than the reported results of the previous run. Does this much difference make sense? If so, what could be the driver behind it?
This could be attributed to performance variability (see, for example, section 5 of the MIPLIB 2010 paper here). In a nutshell, minor differences in the problem formulation (e.g., the order of constraints, the input format, etc.), or running on different platforms, can have a great effect on the time to solve. With CPLEX 12.7, you can use the interactive optimizer to help you evaluate variability.
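If useful, a minimal hedged sketch of checking seed-to-seed variability with the CPLEX Java API (the model file name and the number of seeds are placeholders, and this assumes the ilog.cplex classes are on the classpath):
import ilog.concert.IloException;
import ilog.cplex.IloCplex;

public class VariabilityCheck {
    public static void main(String[] args) throws IloException {
        // Re-solve the same model with different random seeds and compare solve times.
        for (int seed = 1; seed <= 5; seed++) {
            IloCplex cplex = new IloCplex();
            cplex.setParam(IloCplex.Param.RandomSeed, seed);
            cplex.importModel("model.lp");   // placeholder model file
            double start = cplex.getCplexTime();
            cplex.solve();
            System.out.println("seed " + seed + ": " + (cplex.getCplexTime() - start) + " s");
            cplex.end();
        }
    }
}
A large spread in these times on the same machine is a sign that variability, rather than hardware alone, explains part of the gap.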

Metaspace Memory Leak

We recently migrated our application from Java 7 to Java 8. From the day of cutover, we started seeing OutOfMemoryError: Metaspace issues. We tried increasing the Metaspace size, but it didn't help. JVisualVM (and JConsole) show that 60-70 K classes are being loaded into memory every day and nothing is getting unloaded. We tried all kinds of GC algorithms and nothing helped. What else could possibly go wrong in the newer Java version?
After some research, we found the solution to our problem. Adding the JVM argument below fixed the issue.
-Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true
The article below has good info on the issue.
https://issues.apache.org/jira/browse/CXF-2939
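A hedged sketch of the same workaround applied programmatically, together with a simple way to watch class counts via JMX (the class name and sleep interval are just illustrative):
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class MetaspaceWatch {
    public static void main(String[] args) throws InterruptedException {
        // Same effect as -Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true,
        // as long as this runs before the first JAXBContext is created.
        System.setProperty("com.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize", "true");

        // Log loaded/unloaded class counts so a class-loading leak is visible over time.
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        while (true) {
            System.out.println("loaded=" + cl.getLoadedClassCount()
                    + " unloaded=" + cl.getUnloadedClassCount());
            Thread.sleep(60_000); // once a minute
        }
    }
}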
Hope this helps.

private bytes increase for a javaw process in java 8

My project recently moved from Java 7 to Java 8.
After switching to Java 8, we are seeing the memory consumed by the process grow over time.
Here are the investigations we have done:
The issue appears only after migrating from Java 7 to Java 8.
Since Metaspace is the main memory-related change from Java 7 to Java 8, we monitored it; it does not grow beyond 20 MB.
The heap also remains consistent.
Now the only path left is to analyze how memory gets distributed to the process in Java 7 versus Java 8, specifically private bytes. Any thoughts or links here would be appreciated.
NOTE: this javaw application is a Swing-based application.
UPDATE 1: After analyzing native memory with the NMT tool, we generated a diff of the memory occupied compared to a baseline. We found that the heap remained the same, but the threads are leaking all this memory. Since there is no change in the heap, I am assuming this leak comes from native code.
So the challenge is still open. Any thoughts on how to analyze the memory occupied by all the threads would be helpful here.
Below are the snapshots taken from native memory tracking.
In the first snapshot, you can see that the memory for threads increased by 88 MB, and the arena and resource handle counts increased a lot.
In the second snapshot, you can see that malloc'd memory increased by 73 MB, but no method name is shown.
Please help me interpret these two snapshots.
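For reference, a minimal sketch of the NMT workflow described above (the PID and jar name are placeholders; the tracking flag must be set at JVM startup):
java -XX:NativeMemoryTracking=detail -jar myapp.jar
jcmd <pid> VM.native_memory baseline
(let the application run for a while)
jcmd <pid> VM.native_memory detail.diff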
You may try another GC implementation, such as G1, introduced in Java 7 and the default GC since Java 9. To do so, just launch your Java apps with:
-XX:+UseG1GC
There's also an interesting feature of the G1 GC since Java 8u20 that can look for duplicated Strings in the heap and "deduplicate" them (this only works if you activate G1, not with Java 8's default GC):
-XX:+UseStringDeduplication
Be sure to test your system thoroughly before going to production with such a change!
Here you can find a nice description of the different GCs you can use.
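As a combined example (the jar name is a placeholder):
java -XX:+UseG1GC -XX:+UseStringDeduplication -jar myapp.jar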
I encountered the exact same issue.
Heap usage constant, only metaspace increasing; the NMT diffs showed a slow but steady leak in the memory used by threads, specifically in the arena allocations. I tried to fix it by setting the MALLOC_ARENA_MAX=1 environment variable, but that was not fruitful. Profiling native memory allocation with jemalloc/jeprof showed no leak that could be attributed to client code, pointing instead to a JDK issue; the only smoking gun there was the memory leak due to malloc calls which, in theory, should come from JVM code.
Like you, I found that upgrading the JDK fixed the problem. The reason I am posting an answer here is that I know why it fixes the issue - it's a JDK bug that was fixed in JDK 8u152: https://bugs.openjdk.java.net/browse/JDK-8164293
The bug report mentions a Class/malloc increase, not Thread/arena, but a bit further down one of the comments clarifies that the bug reproduction clearly shows the increase in Thread/arena.
Consider optimising the JVM options.
Parallel collector (throughput collector):
-XX:+UseParallelGC
Concurrent collector (low-latency collector):
-XX:+UseConcMarkSweepGC
String duplicates remover:
-XX:+UseStringDeduplication
Optimise the compact ratio:
-XXcompactRatio:
In this answer of mine you can find information and references on how to profile the JVM's native memory to find memory leaks. In short, see this.
UPDATE
Did you use the -XX:NativeMemoryTracking=detail option? The results are straightforward: they show that most of the memory is allocated by malloc, which is a little bit obvious. :) Your next step is to profile your application. To analyze native and Java methods I use (and we use in production) flame graphs with perf_events. Look at this blog post for a good start.
Note that your memory increased for threads; likely the number of threads in your application is growing. Before using perf, I recommend analyzing thread dumps taken before and after to check whether the number of Java threads grows and why. You can get thread dumps with jstack/jvisualvm/jmc, etc.
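A hedged sketch of those two steps (the PID is a placeholder; the flame-graph scripts are assumed to come from Brendan Gregg's FlameGraph repository, and Java frames usually also need a perf map agent for readable symbols):
jstack <pid> > threads-before.txt
jstack <pid> > threads-after.txt           (some time later; compare thread counts and names)
perf record -F 99 -g -p <pid> -- sleep 30
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg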
This issue does not occur with Java 8 update 152. The exact root cause of why it occurred with earlier versions is still not clearly identified.

OpenMP provides 13.5 times improvement on Visual Studio 2010 but nothing on Unix

I'm a new OpenMP user and have parallelized code that runs 13.5 times faster (14 threads) on Visual Studio 2010 (Windows 7 Ultimate x64). The performance gain on CentOS 5.8 x64 (gcc 4.1.2) or SUSE x64 (gcc 4.5.1) is zero. I've verified that multiple threads are being used. Is there some system flag or option I need to turn on? Yes, OMP_NUM_THREADS is in the environment and set to 8. The CentOS machine has dual Xeon processors.
With 8 cores, it sounds nearly impossible to have a 13.5 times speedup, whatever the number of threads you use.
I suspect your measurement is wrong. How do you measure performance?
On Unix, the command "time ./myprogram" returns 3 different times. The "real" time is the one you are interested in, while the "user" time is the CPU time (the sum of the time spent on each core).
On Windows I have no idea, but I guess you are looking at a "user"-style time which is 13.5 times larger than the wall-clock time; that does not say anything about speedup, but rather that all your 14 threads are in use.
You should instead compare the "real" time between a single-threaded run and the OpenMP run.
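A minimal sketch of that comparison (the program name is a placeholder):
export OMP_NUM_THREADS=1
time ./myprogram            (note the "real" line)
export OMP_NUM_THREADS=8
time ./myprogram            (compare "real", not "user")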
Did you use the correct compiler/linker switches, e.g. -fopenmp -lgomp? Maybe try a simple example from the OpenMP docs first to prove you have the setup right.
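For example (source and binary names are placeholders), with g++ the OpenMP pragmas are only honoured when -fopenmp is passed at both compile and link time:
g++ -O2 -fopenmp myprogram.cpp -o myprogram
Without -fopenmp, the #pragma omp directives are ignored and the program runs single-threaded.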
