JVM Major Garbage Collections Not Running for months - garbage-collection

My application's garbage collector used to run a major collection frequently, maybe once a day, but it suddenly stopped. Now heap usage has reached 90% and I have had to restart the application a few times.
This is a production environment, and all I am allowed to do is read the logs and view the JVM state via the provided UI.
Another observation: over the last 3 months, the first 2 months had no minor garbage collections but a lot of majors, while in the last month there have been no major collections but many minor ones.

Perhaps it never does a major collection because you are restarting the application before it gets a chance.
You should be getting many minor collections if the young space is a reasonable size.
If you were only getting major collections, most likely your JVM wasn't tuned correctly. I would try to remove as many GC tuning parameters as possible and only add each one back if you know it helps. Having too many tuning parameters set is a good way to get strange behaviour.
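Before stripping flags, it can help to confirm from inside the JVM which collectors are actually firing. Below is a minimal sketch, assuming a standard HotSpot JVM and using only the java.lang.management API; the class name and the 10-second polling interval are illustrative:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative monitor: prints cumulative GC counts and times every 10 seconds.
public class GcCountMonitor {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%-25s count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            System.out.println("----");
            Thread.sleep(10_000);
        }
    }
}

On HotSpot the bean names separate young collectors (for example "PS Scavenge" or "G1 Young Generation") from old/major collectors (for example "PS MarkSweep" or "G1 Old Generation"). If the old-generation bean's count stays flat while heap usage climbs towards 90%, the major collector really isn't being triggered; if its count is climbing and occupancy stays high, the heap is mostly live data that no amount of collecting will reclaim.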

Related

JMeter: What can be the reason for a sudden spike in the response time graph, which then decreased and ran consistently?

As seen in the above graphs, Graph no. 1 is the response time graph; it showed a sudden spike in the middle of the test but then seemed to run consistently.
On the other hand, the throughput graph, Graph no. 2, showed a dip rather than a sudden spike; it decreased gradually. Also, I got two different throughput values, before and after the drop.
At first I thought it was a memory issue, but then it should have affected the response time as well.
Could anyone help me understand the reason behind the sudden spike in the response time graph?
And what could the bottleneck be, if not a memory leak?
Unfortunately these 2 charts don't tell the full story, and without knowing your application details and technology stack it's quite hard to suggest anything meaningful.
A couple of possible reasons could be:
Your application is capable of autoscaling, so when the load reaches a certain threshold it either adds more resources or spins up another node of the cluster
Your application is undergoing e.g. garbage collection because its heap is full of stale objects; once the collection is done it starts working at full speed again (see the monitoring sketch after this list). You might want to run a Soak Test to see whether the pattern repeats or not
Going forward, consider collecting information about what's happening on the application under test side, using e.g. the JMeter PerfMon Plugin or the SSHMon Listener
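If the application under test runs on a JVM, another option is to log every collection with its pause time on the server side and line the long ones up against the response-time spike. This is only a sketch, assuming JDK 7u4 or later; the class and method names are standard JDK, but the wiring around them is illustrative:

import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

// Illustrative listener: prints one line per GC event with its duration.
public class GcPauseLogger {
    public static void install() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            ((NotificationEmitter) gc).addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                System.out.printf("%s (%s) caused by %s took %d ms%n",
                        info.getGcName(), info.getGcAction(), info.getGcCause(),
                        info.getGcInfo().getDuration());
            }, null, null);
        }
    }
}

A long full-GC pause logged right at the moment of the spike would support the garbage collection explanation above; if nothing lines up, autoscaling events or downstream dependencies are better candidates.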

How to track Java objects during garbage collection

We recently encountered a problem with too-frequent full GC, which left us very confused. We observed that a large number of objects survive young GC 15 times while a request is being processed, yet can be collected during full GC.
The question is: how can we find the objects that can be reclaimed by full GC but not by young GC? We need to use this as a starting point to locate the corresponding business code. I checked many documents and found no way to track these objects.
This was observed using jstat -gcold, printed every second.
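A rough way to quantify how much of the old generation is dead-but-promoted garbage is to compare old-gen occupancy before and after requesting a full collection. This is only a minimal sketch, assuming a HotSpot JVM; the pool-name matching and the 2-second wait are assumptions:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Illustrative check: how many MB of the old generation does a full GC reclaim?
public class OldGenReclaimableCheck {
    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean oldGen = ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(p -> p.getName().contains("Old Gen") || p.getName().contains("Tenured"))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no old generation pool found"));

        MemoryUsage before = oldGen.getUsage();
        System.gc();              // request (not guarantee) a full collection
        Thread.sleep(2_000);      // give the collector time to finish
        MemoryUsage after = oldGen.getUsage();

        System.out.printf("old gen before=%dMB after=%dMB reclaimable=%dMB%n",
                before.getUsed() >> 20, after.getUsed() >> 20,
                (before.getUsed() - after.getUsed()) >> 20);
    }
}

To see which classes those objects are, comparing a jmap -histo snapshot (all objects, including unreachable ones) with a jmap -histo:live snapshot (which triggers a full collection and lists only reachable objects) usually narrows the candidates down to a handful of types, which is normally enough to trace back to the allocating business code.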

Performance issue when using Parallel.ForEach() with MaxDegreeOfParallelism set to ProcessorCount

I wanted to process records from a database concurrently and in the minimum time, so I thought of using a Parallel.ForEach() loop to process the records, with MaxDegreeOfParallelism set to Environment.ProcessorCount.
ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(listUsers, po, user =>
{
    // Process each user record in parallel
    ProcessEachUser(user);
});
But to my surprise, the CPU utilization was not even close to 20%. When I dug into the issue and read the MSDN article on this (http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx), I tried setting MaxDegreeOfParallelism to -1. As the article says, this value removes the limit on the number of concurrently running operations, and the performance of my program improved considerably.
But that still didn't meet my requirement for the maximum time allowed to process all the records in the database. So I analyzed further and found that the thread pool has two settings, MinThreads and MaxThreads. By default their values are 10 and 1000 respectively: at start only 10 threads are created, and this number keeps growing towards the maximum of 1000 with every new user, unless a previous thread has finished its execution.
So I set the initial value of MinThreads to 900 instead of 10 using
System.Threading.ThreadPool.SetMinThreads(900, 900);
so that a minimum of 900 threads would be created right from the start, thinking this would improve performance significantly. It did create 900 threads, but it also greatly increased the number of failures while processing each user, so I did not gain much from this. I then changed the value of MinThreads to 100 and found that the performance was much better.
But I wanted to improve further, as my time constraint was still not met: processing all the records still exceeded the time limit. You might think I was already using every trick to get maximum performance out of parallel processing; I thought so too.
But to meet the time limit I tried a shot in the dark. I created two different executables (Slaves) instead of one and assigned each of them half of the users from the DB. Both executables did the same thing and ran concurrently, and I created another Master program to start the two Slaves at the same time.
To my surprise, this reduced the time taken to process all the records by nearly half.
Now my question is simply this: I do not understand why the Master/Slave arrangement performs better than a single EXE, when the logic is the same in both the Slaves and the previous EXE. I would highly appreciate it if someone could explain this in detail.
But to my surprise, the CPU utilization was not even close to 20%.
…
It uses HTTP requests to some Web APIs hosted on other networks.
This means that CPU utilization is entirely the wrong thing to look at. When using the network, it's your network connection that's going to be the limiting factor, or possibly some network-related limit, certainly not CPU.
Now I created two different executable files … To my surprise, it reduced the time taken to process all the records nearly to the half.
This points to an artificial, per-process limit, most likely ServicePointManager.DefaultConnectionLimit. Try setting it to a larger value than the default at the start of your program and see if it helps.

Slow performance in Mongoose after upgrading from 3.5.7 to 3.8.4

I changed the version number of mongoose from 3.5.7 to 3.8.4 and performance took a huge hit in an import process. This process reads lines from a file and populates an empty database (no indexes) with about 2.5 million rows.
This is the only change I've made; just upgrading the version. I can switch back and forth and see the difference in performance.
The performance hits are:
1) The process pegs at 100% CPU, where before it ran at maybe 25% or so.
2) Insertion into the database is slow: 3.5.7 inserted about 10K records every 20 seconds, while 3.8.4 seems to insert more like 1 per second.
3) node.js seems to "disappear" into something CPU-intensive, and all other I/O (http requests, etc.) gets blocked; previously the system remained very responsive.
It's hard to isolate the code, but roughly here's what's happening:
Read a line from a file
Parse it
Run a query to see if the record already exists
Insert/update a record with the values from the line read
Write the existing record to an already-open file stream
Continue with the next line in the file
At a guess, I would say it's related to a change in how requests are throttled, either in the underlying driver that mongoose depends on or in mongoose itself. My first thought was to handle the case where requests are getting queued up by pausing the file read. This works really well when writing the results of a query (pausing the query stream when the file stream starts caching writes, then resuming on drain), but I haven't been able to find where mongoose might be emitting information about its back-pressure.
The reason I upgraded in the first place was a memory leak warning I was getting when setting an event handler in mongoose, which I had read was fixed (sorry, I lost the reference to that).
I'd like to stay upgraded and figure out the issue. Any thoughts on what it might be? Is there an event emitted somewhere that notifies me of back-pressure so I can pause/resume the input stream?
I solved this by simply reverting to 3.5.7 and addressing the original memory leak warnings another way.
I attempted to isolate my code, but the worst issue I was able to reproduce was high memory consumption (which I resolved by nulling out objects, which apparently helps the GC figure out what it can collect). I started adding in unrelated code, but at some point it became clear that the issue wasn't with mongoose or the mongodb driver itself.
My only guess about what really causes the performance issue when I upgrade mongoose is that some library mongoose depends on introduced a change that my non-mongoose code doesn't play well with.
If I ever get to the bottom of it, I'll come back and post a clearer answer.

LoadRunner and the need for pacing

Running a single script with only two users as a single scenario, without any pacing and with think time set to 3 seconds and random (50%-150%), I find that the web app server runs out of memory after 10 minutes every time (I have run the test several times, and it happens at the same point every time).
At first I thought this was a memory leak in the application, but after some thought I figured it might have to do with the scenario design.
The entire script, with just one action block containing both log in and log out, takes about 50 seconds to run, and I have the default "As soon as the previous iteration ends" setting, not "With delay after the previous iteration ends" or fixed/random intervals.
Could not using fixed/random intervals cause this "memory leak"? I assume none of the settings mentioned would actually start a new iteration before the previous one ends; overlapping iterations would obviously lead to memory accumulating on the server and produce this "memory leak". But with no pacing set, is there a risk of this happening?
And if I have no iterations in my script, could I still use pacing?
To answer your last question: NO.
Pacing is explicitly used when a new iteration starts. The iteration start is delayed according to pacing settings.
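To make that concrete with the numbers from the question (the 90-second pacing value below is only an illustration): with 2 users and an iteration that takes about 50 seconds, "as soon as the previous iteration ends" gives roughly 2 x 60/50, about 2.4 iterations per minute, and never more than 2 concurrent sessions; a fixed pacing of 90 seconds would lower that to 2 x 60/90, about 1.3 iterations per minute, but still never raise concurrency above 2 users. Pacing therefore changes how often the server is hit, not how many overlapping sessions it has to hold in memory.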
Speculation/Conclusions:
If the web server really runs out of memory after 10 minutes, and you only have 2 vusers, you have a problem on the web server side. One could manually achieve this 2-vuser load and crash the web server. The pacing in the scripts or manual user speeds are irrelevant. If the web server can be crashed remotely, it has bugs that need fixing.
Suggestion:
Try running the scenario with 4 users. Do you get OUT OF MEMORY on the web-server after 5 mins?
If there really is a leak, your script/scenario shouldn't be causing it, but I would think that you could potentially cause it to appear to be a problem sooner depending on how you run it.
For example, let's say with 5 users and reasonable pacing and think times, the server doesn't die for 16 hours. But with 50 users it dies in 2 hours. You haven't caused the problem, just exposed it sooner.
I suspect it is a web server problem. Pacing is nothing but a time gap between iterations; it does not affect the actions or transactions in your script.
