JMeter is very slow after a few hours - multithreading

I'm using JMeter 3.1.1 to run a load test. My test plan has 40 threads, and each thread executes 6 HTTP requests. It runs fine for the first few hours, with a latency of around 20 ms.
After a few hours, latency grows to around 500 ms. I verified that the server is processing fine. Also, I have no Listeners in my test plan, and I run it in non-GUI mode.
It also seems that the thread group is executing only one thread at a time, because I see hardly one or two requests being executed by the thread group per second.
I'm really clueless about what to suspect. Any help would be greatly appreciated.
By the way, memory and CPU consumption are normal.
About my test plan:
Total thread groups: 4
1. Setup Thread Group
2. Load test thread group with 40 threads
(Action to be taken after a sampler error: Continue
Ramp-up period: 0
Number of threads: 40
Loop count: Forever)
2.1 Counter
2.2 Random Variable
2.3 User Defined Variables
2.4 If Condition = true
- 2.4.1 HTTP Request1
- 2.4.2 HTTP Request2
- 2.4.3 Loop Controller (loop 5 times)
-- 2.4.3.1 HTTP Request1
-- 2.4.3.2 HTTP Request2
3. Introspection thread group with 1 thread
4. Tear Down thread group
Please let me know if more details are needed.
Another observation: the server has 4418 connections in TIME_WAIT (I checked the 'Use KeepAlive' option for the HTTP Request samplers, yet there are still that many TIME_WAITs).
Latest observations (thanks to one and all for your valuable comments):
Memory must indeed be an issue. I had already configured the JVM like this:
-Xms512m -Xmx2048m -XX:NewSize=512m -XX:MaxNewSize=2048m
But I really wonder why the JVM was not growing beyond 512 MB. So I tried setting both Xms and Xmx to 2g each. Now it runs for longer, but its performance is still slowing down. Maybe my BeanShell PostProcessors are consuming all the memory; I really wonder why they are not releasing it. You can see below how performance degrades per hour.
Hour #Requests sent
---- --------------
Hour 1: 1471917
Hour 2: 1084182 (Seems all 2g heap is used up by this time)
Hour 3: 705471
Hour 4: 442912
Hour 5: 255826
Hour 6: 136292
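To confirm that the heap really is used up by hour 2, GC activity can be logged by passing extra JVM options through the jmeter startup script's JVM_ARGS hook. A minimal sketch (Java 8 flag syntax; newer JVMs use -Xlog:gc instead, and the test plan and log file names are placeholders):

JVM_ARGS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" ./jmeter.sh -n -t testplan.jmx -l results.jtl

Back-to-back full GCs that reclaim little memory would confirm that something is retaining objects across iterations.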
I read that BeanShell hogs memory, but I felt I had no choice but to use it, since I have to call a third-party jar from within the sampler to make a few Java calls. I'm not sure whether I can do the same using JSR223 (Groovy) or another better-performing sampler / pre- or post-processor; a sketch of that approach follows.
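For reference, a JSR223 PostProcessor can call a third-party jar just like BeanShell can: put the jar in JMeter's lib directory and script against it with Groovy, which is compiled and cached rather than interpreted on every iteration. A minimal sketch in Java-style syntax (valid Groovy), where com.example.ThirdPartyClient and its process method are hypothetical stand-ins for whatever the jar exposes:

// JSR223 PostProcessor, language = groovy
// com.example.ThirdPartyClient is a hypothetical stand-in for the third-party jar's API
import com.example.ThirdPartyClient;

ThirdPartyClient client = new ThirdPartyClient();
// "prev" and "vars" are standard JMeter script bindings:
// the previous SampleResult and this thread's variables
String result = client.process(prev.getResponseDataAsString());
vars.put("processedResult", result);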

As you have seen the heap settings I used, here are the memory and CPU utilization figures for JMeter. I'm running 100 threads now. What should I do in my test plan to reduce the CPU utilization? I have a 100 ms sleep after every 4 HTTP requests.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11850 xxx 20 0 7776m 2.1g 4744 S 678.2 27.4 6421:50 java
%CPU: 678.2 (fluctuating between 99% and 700%)
MEM: 2.1g (Xmx = 2g)

1) Do you run JMeter with the standard startup script (jmeter/jmeter.bat)?
Then mind that the default JVM heap size in there is capped at 512 MB. Consider increasing it, at least at the maximum end (i.e., change the default -Xmx512m).
The next thing to consider is the -XX:NewSize=128m -XX:MaxNewSize=128m default values.
Here's what Oracle suggests:
In heavy throughput environments, you should consider using this
option to increase the size of the JVM young generation. By default,
the young generation is quite small, and high throughput scenarios can
result in a large amount of generated garbage. This garbage
collection, in turn, causes the JVM to inadvertently promote
short-lived objects into the old generation.
So try playing with these parameters; that may help. A sketch of where to change them is below.
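For instance, in JMeter 3.x both settings live as variables in the startup script itself, so a quick way to experiment is to edit the HEAP and NEW lines there (the values below are illustrative, not a recommendation):

HEAP="-Xms2g -Xmx2g"
NEW="-XX:NewSize=512m -XX:MaxNewSize=512m"

Alternatively, they can be passed per run through the JVM_ARGS environment variable without touching the script.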
P.S. Aren't you, by chance, running it on an AWS EC2 instance? If yes, what's the instance type?

Thanks to all who tried to help me.
However, I managed to resolve this.
The culprit is the If Controller, which evaluates its condition for every iteration of every thread. Sounds quite normal, doesn't it? The problem is that the condition evaluation is JavaScript-based, so all threads were eating CPU and memory just to invoke the JavaScript interpreter.
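A common replacement, which JMeter's own documentation recommends over the default JavaScript evaluation, is to express the If Controller condition with the __jexl3 or __groovy function and tick 'Interpret Condition as Variable Expression?'. The variable name here is just a placeholder:

${__jexl3("${myFlag}" == "true")}

These engines compile and cache the expression, so the per-iteration cost is far lower than spinning up the JavaScript interpreter each time.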
Now I'm getting consistent requests to the server, and JMeter is almost stable at 1.9 GB of memory for 100 threads.
I'm posting this just in case anyone can benefit from it without wasting days and nights figuring out the issue :)

Related

Why does the response time curve of a NodeJS API become sine-like under load?

I am currently performing an API load test on my NodeJS API using JMeter and am completely new to the field. The API is deployed on an IBM virtual server with 4 vCPUs and 8 GB of RAM.
One of my load tests stresses the API with a 2500-thread (user) configuration, a ramp-up period of 2700 s (45 min), and the loop count set to infinite. The goal is not to reach 2500 threads but rather to see at what point my API throws its first error.
I am only testing one endpoint of my API, which performs a bubble sort to simulate a CPU-intensive task. Using Matplotlib I plotted the results of the experiment: response time in ms over active threads.
I am unsure why the response time curve becomes sine-like once it crosses roughly 1100 threads. I expected the response time curve to keep rising in the same manner it does in the beginning (0 - 1100 threads). Is there an explanation for the sine-like behaviour of the curve towards the end?
Thank you!
Graph:
Red - Errors
Blue - Response time
There could be two possible reasons for this:
Your application cannot handle such a big load and performs frequent garbage collection in order to free up resources, or tasks are queuing up because the application cannot process them as they come in. You can try using, for example, the JMeter PerfMon Plugin to ensure that the system under test doesn't lack CPU or RAM.
JMeter by default ships with a relatively low JVM heap size and very little GC tuning (as described in the Concurrent, High Throughput Performance Testing with JMeter article, where the author has very similar symptoms), so it might be the case that JMeter cannot send requests fast enough. Make sure to follow JMeter Best Practices and consider going for distributed testing if needed.

JMeter duration exceeded even though it is fixed

I tried to run a Thread Group with the details below:
Number of threads: 50
Ramp-up period: 120
Duration: 300 s
Loop count: 1 (I created a Loop Controller under the Thread Group; the infinite loop count box is unchecked)
HEAP=-Xms1024m -Xmx1024m
I also set a Constant Timer (5000 ms) as think time after the request.
The problem I have is that the test has exceeded the duration... and I don't know when it will finish (the command prompt has not returned).
This is the JMeter log:
09:15:40,486 INFO o.a.j.t.JMeterThread: Stopping because end time detected by thread: Thread Group 1-46
09:15:40,486 INFO o.a.j.t.JMeterThread: Thread finished: Thread Group 1-46
What did I do wrong in this scenario?
Any idea how to fix it? (Link to screenshot):
https://imgur.com/a/We3PfO9
Your Loop Controller does nothing: it controls how many times its children are executed, and since it doesn't have any children, it makes no sense.
Your two log lines are not very informative. If you want to know for sure where JMeter is stuck, you need to take a thread dump and see what exactly the threads are doing (see the example below). A blind shot: at least one of your HTTP Request samplers fails to get a response within 5 minutes because the application doesn't respond or there is another transport-layer problem. Consider setting reasonable connect and response timeouts on the "Advanced" tab of the HTTP Request sampler (or, even better, use HTTP Request Defaults).
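A thread dump can be taken with the JDK's jstack tool against the running JMeter process (the PID below is a placeholder; find the real one with jps or ps):

jstack -l 12345 > jmeter_thread_dump.txt

Threads blocked in socket reads against your application would show up in the dump waiting inside the HTTP sampler's connection code.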
Your heap size might be too low; consider monitoring heap usage and garbage collector activity using, for example, JVisualVM.
Make sure to follow JMeter Best Practices

ArangoDB Java Batch mode insert performance

I'm using ArangoDB 3.0.5 with arangodb-java-driver 3.0.1. ArangoDB is running on a 3.5 GHz i7 with 24 GB RAM and an SSD.
Loading some simple vertex data from Apache Flink seems to be going very slowly, at roughly 1000 vertices/sec. Task Manager shows the ArangoDB process is CPU-bound.
My connector calls startBatchMode, iterates through 500 calls to graphCreateVertex (with waitForSync set to false), and then calls executeBatch.
The system resources view in the management interface shows roughly 15000 (per sec?) while the load is running, and used CPU time pinned at 1 for user time. I'm new to ArangoDB and am not sure how to profile what is going on. Any help much appreciated!
Rob
Your performance result is the expected behavior. The point of batch mode is that all 500 of your calls are sent in one request and executed on the server in only one thread.
To gain better performance, you can use more than one thread in your client for creating the vertices. More requests in parallel allow the server to use more than one thread.
You can also use createDocument instead of graphCreateVertex. This skips the consistency checks on the graph and is a lot faster.
If you don't need these checks, you can also use importDocuments instead of batch mode + createDocument, which is even faster; see the sketch below.
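For illustration, a rough sketch of the bulk-import route using the newer ArangoDB Java driver API (4.x and later). The database and collection names are placeholders, and the method names differ in the 3.0.x driver you are on, so check the documentation for your version:

import com.arangodb.ArangoCollection;
import com.arangodb.ArangoDB;
import com.arangodb.entity.BaseDocument;
import java.util.ArrayList;
import java.util.List;

public class BulkImport {
    public static void main(String[] args) {
        // connects to localhost:8529 with default settings
        ArangoDB arango = new ArangoDB.Builder().build();
        ArangoCollection vertices = arango.db("myDb").collection("myVertices");

        // build one batch of 500 plain documents (no graph consistency checks)
        List<BaseDocument> batch = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            BaseDocument doc = new BaseDocument();
            doc.addAttribute("value", i);
            batch.add(doc);
        }

        // importDocuments ships the whole batch in a single request
        vertices.importDocuments(batch);
        arango.shutdown();
    }
}

Running several such loops from a small thread pool, each with its own batch, is what lets the server use more than one thread.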

Performance issue while using Parallel.ForEach() with MaxDegreeOfParallelism set to ProcessorCount

I wanted to process records from a database concurrently and in minimum time, so I thought of using a Parallel.ForEach() loop with MaxDegreeOfParallelism set to ProcessorCount.
ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.ForEach(listUsers, po, (user) =>
{
    // process each user in parallel
    ProcessEachUser(user);
});
But to my surprise, the CPU utilization was not even close to 20%. When I dug into the issue and read the MSDN article on this (http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx), I tried setting MaxDegreeOfParallelism to -1. As the article says, this value removes the limit on the number of concurrently running operations, and the performance of my program improved to a great extent.
But that still didn't meet my requirement for the maximum time taken to process all the records in the database. So I analyzed further and found that the thread pool has two settings, MinThreads and MaxThreads. By default they are 10 and 1000 respectively: only 10 threads are created at the start, and the count keeps growing up to a maximum of 1000 with every new user unless a previous thread has finished its execution.
So I set the initial value of MinThreads to 900 in place of 10 using
System.Threading.ThreadPool.SetMinThreads(900, 900);
so that 900 threads would be created right from the start, thinking it would improve performance significantly. This did create 900 threads, but it also greatly increased the number of failures when processing each user, so I did not achieve much with this approach. I then changed MinThreads to 100 instead and found that performance was much better.
But I wanted to improve further, as my time requirement was still not met: processing all the records still exceeded the time limit. You may think I was already using all the best possible options for maximum parallel performance; I was thinking the same.
To meet the time limit, I took a shot in the dark: I created two different executables (slaves) in place of one and assigned each of them half of the users from the DB. Both executables did the same thing and ran concurrently, and I created another master program to start the two slaves at the same time.
To my surprise, it reduced the time taken to process all the records to nearly half.
Now my question is simply this: I do not understand why the master/slave setup performs better than a single EXE, when the logic is the same in both the slaves and the previous EXE. I would highly appreciate it if someone could explain this in detail.
But to my surprise, the CPU utilization was not even close to 20%.
…
It uses the Http Requests to some Web API's hosted in other networks.
This means that CPU utilization is entirely the wrong thing to look at. When using the network, it's your network connection that's going to be the limiting factor, or possibly some network-related limit, certainly not CPU.
Now I created two different executable files … To my surprise, it reduced the time taken to process all the records nearly to the half.
This points to an artificial per-process limit, most likely ServicePointManager.DefaultConnectionLimit, which caps concurrent HTTP connections per endpoint. Try setting it to a larger value than the default at the start of your program (e.g. ServicePointManager.DefaultConnectionLimit = 100; before any requests are issued) and see if it helps.

Why does scala.io.Source use all cores?

I noticed that the following code uses multiple threads and keeps all CPU cores busy at about 100% while it is reading the file.
scala.io.Source.fromFile("huge_file.txt").toList
and I assume the following behaves the same:
scala.io.Source.fromFile("huge_file.txt").foreach
I interrupted this code as a unit test under the Eclipse debugger on my dev machine (OS X 10.9.2); it shows these threads: main, ReaderThread, and 3 daemon system threads. htop shows all threads are busy if I run this in a Scala console on a 24-core server machine (Ubuntu 12).
Questions:
How do I limit this code to using N threads?
For the sake of understanding system performance, can you explain what is done in io.Source, why, and how? Reading the source code didn't help.
I assume each line is read in sequence; however, since multiple threads are in use, is the foreach run across multiple threads? My debugger seems to tell me that the code still runs in the main thread.
Any insight would be appreciated.
As suggested, I'm putting my findings here.
I used the following to test my dummy code with and without the -J-XX:+UseSerialGC option:
$ scala -J-XX:+UseSerialGC
scala> var c = 0
scala> scala.io.Source.fromFile("huge_file.txt").foreach(e => c += e)
Before I used the option, all 24 cores on my server machine were busy during the file read. With the option, only two threads are busy.
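This suggests the extra threads are the JVM's parallel GC worker threads rather than reader threads. Instead of forcing the serial collector, the parallel collector's worker count can also be capped with a standard HotSpot flag (the value 4 is just an example):

$ scala -J-XX:ParallelGCThreads=4

Either way, the answer to my first question appears to be that the thread count is limited at the JVM level, not inside io.Source.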
Here is the memory profile I captured on my dev machine, not the server. I first performed a GC to get a baseline, then ran the above code several times. The Eden space got cleaned up periodically. The memory swing is about 20 MB, while the smaller file I read is about 200 MB, i.e. io.Source creates temporary objects amounting to roughly 10% of the file size per run.
These characteristics will cause trouble on a shared system, and they limit our ability to handle multiple big files at once. This stresses memory, I/O, and CPU in such a way that I can't run my code alongside other production jobs, but have to run it separately to avoid the system-wide impact.
If you know a better way, or have a suggestion for handling this situation in a real shared production environment, please let me know.
