JMtere: What can be the reason for a sudden spike in the Response Time graph, which then drops back and runs consistently? - performance-testing

As seen in the graphs above, Graph no. 1 is the response time graph: it shows a sudden spike in the middle of the test, after which it seems to run consistently.
The throughput graph, Graph no. 2, on the other hand, shows a dip rather than a sudden spike; it decreased gradually. I also got two different throughput values, before and after the dip.
At first I thought it was a memory issue, but in that case it should have affected the response time as well.
Could anyone help me understand the reason behind the sudden spike in the response time graph?
And what could the possible bottleneck be, if not a memory leak?

Unfortunately these two charts don't tell the full story, and without knowing your application's details and technology stack it's quite hard to suggest anything meaningful.
A couple of possible reasons could be:
Your application is capable of autoscaling, so when the load reaches a certain threshold it either adds more resources or spins up another node of the cluster.
Your application is performing garbage collection because its heap is full of stale objects, and once the collection is done it starts working at full speed again. You might want to run a Soak Test to see whether the pattern repeats or not.
Going forward, consider collecting information about what is going on on the application-under-test side using, for example, the JMeter PerfMon Plugin or the SSHMon Listener.

Related

Why does the response time curve of a NodeJS API become sinusoidal under load?

I am currently performing an API load test on my NodeJS API using JMeter, and I am completely new to the field. The API is deployed on an IBM Virtual Server with 4 vCPUs and 8 GB of RAM.
One of my load tests stress-tests the API with a 2500-thread (user) configuration, a ramp-up period of 2700 s (45 min), and an infinite loop. The goal is not to reach 2500 threads but rather to see at what point my API throws its first error.
I am only testing one endpoint of my API, which performs a bubble sort to simulate a CPU-intensive task. Using Matplotlib I plotted the results of the experiment: response time in ms over the number of active threads.
I am unsure why the response time curve becomes sinusoidal once it crosses roughly 1100 threads. I expected the curve to keep rising in the same manner it does in the beginning (0 - 1100 threads). Is there an explanation for the sinusoidal behaviour of the curve towards the end?
Thank you!
Graph legend: red - errors, blue - response time.
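
For context only, here is a minimal TypeScript sketch of the kind of endpoint described in the question; the question shows no code, so every detail here (port, array size, response shape) is hypothetical.

// Hypothetical sketch: a single route that runs a bubble sort on each
// request to simulate CPU-bound work.
import { createServer } from "http";

function bubbleSort(values: number[]): number[] {
  const a = [...values];
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < a.length - i - 1; j++) {
      if (a[j] > a[j + 1]) {
        [a[j], a[j + 1]] = [a[j + 1], a[j]]; // swap adjacent elements
      }
    }
  }
  return a;
}

createServer((req, res) => {
  // Sort a fresh random array on every request; the sort runs
  // synchronously inside the request handler.
  const data = Array.from({ length: 5000 }, () => Math.random());
  const sorted = bubbleSort(data);
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ first: sorted[0], last: sorted[sorted.length - 1] }));
}).listen(3000);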
There could be 2 possible reasons for this:
Your application cannot handle such a big load: it performs frequent garbage collection in order to free up resources, or tasks are queuing up because the application cannot process them as fast as they arrive. You can try using, for example, the JMeter PerfMon Plugin to make sure the system under test doesn't lack CPU or RAM.
JMeter by default ships with a relatively small JVM heap and very little GC tuning (as described in the Concurrent, High Throughput Performance Testing with JMeter article, where the author has very similar symptoms), so it might be that JMeter cannot send requests fast enough. Make sure to follow the JMeter Best Practices and consider going for distributed testing if needed.

Node.js - multiple workers on 1 CPU? (Cluster)

The situation is as following:
I have a Node.js server with a script that takes quite a long time to finish.
The script gets an ID, looks up in a database which pictures belong to this ID, and then caches the images; once all images are cached, it finishes.
Now the problem is that it's possible for two or more people to use this feature at the same time. And once multiple people are trying to get all these images, the images get mixed together, so person A gets the pictures of persons A + B, and person B also gets the pictures of A + B.
Now I know that a worker requires 1 CPU. I edited this so I can have multiple workers on 1 CPU, but it only switches between workers when the CPU usage is really high.
I want to switch workers when someone is already busy getting their images and someone else is also trying to get his/her images (which are different for every person).
How can this be done? Because the cluster only switches workers when the CPU usage is high. Or did I understand this incorrectly?
Clustering is not made for that.
You use clusters to avoid situations where one core is 100% busy while the other cores are barely doing anything.
Your problem is improper handling of concurrent requests in your code, and clustering will not solve that. Even if you have a cluster of 1000 workers, there can still be a situation where you get 1001 requests and all bets are off.
Working with Node you always have to take concurrency into account, because if you don't, you won't be able to fix the problems with a simple solution like adding clustering.
You didn't show even a single line of code, so it's impossible to tell you what's wrong with it, but there is clearly a problem with improper request handling. Maybe you use global variables? Maybe you store some state in the wrong scope? The situation you describe should never happen in any Node application, and the solution you're asking about would not solve it anyway. You need to fix your code.
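
To make the "wrong scope" point concrete, here is a minimal, purely hypothetical TypeScript sketch (the question shows no code, so fetchImageIds and cacheImage are invented names): an accumulator kept at module level is shared by every in-flight request and mixes users together, while an accumulator declared inside the handler is private to each request.

// BROKEN: a module-level array is shared by every in-flight request,
// so two users running at the same time get their pictures mixed together.
const sharedImages: string[] = [];

async function handleRequestBroken(userId: string): Promise<string[]> {
  const ids = await fetchImageIds(userId);
  for (const id of ids) {
    sharedImages.push(await cacheImage(id)); // appends into shared state
  }
  return sharedImages; // may now contain another user's images too
}

// FIXED: keep the accumulator local to the handler, so each request
// collects only its own images, no matter how many run concurrently.
async function handleRequest(userId: string): Promise<string[]> {
  const images: string[] = []; // request-scoped state
  const ids = await fetchImageIds(userId);
  for (const id of ids) {
    images.push(await cacheImage(id));
  }
  return images;
}

// Stubs so the sketch is self-contained; a real app would hit the DB and cache.
async function fetchImageIds(userId: string): Promise<string[]> {
  return [`${userId}-1`, `${userId}-2`];
}
async function cacheImage(id: string): Promise<string> {
  return `/cache/${id}.jpg`;
}

Note that no clustering is involved in the fix; it is purely a matter of where the state lives.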

Is reproducible benchmarking possible?

I need to test some Node frameworks, or at least their routing part. That means from the moment the request arrives at the Node process for processing until a route has been decided and the function/class with the business logic is about to be called, i.e. just before calling it. I have looked long and hard for a suitable approach, but concluded that it must be done directly in the code and not with an external benchmark tool, because I fear measuring the wrong attributes. I tried artillery and ab, but they measure a lot more attributes than I want to measure, like RTT, bad OS scheduling, random tasks executing in the OS, and so on. My initial benchmark of my custom routing code using process.hrtime() shows approx. 0.220 ms (220 microseconds) execution time, but the external measurement shows 0.700 ms (700 microseconds), which is not an acceptable difference since it is about 3.18x the in-process time. Sometimes execution time jumps to 1.x seconds due to GC or system tasks. Now I wonder what a reproducible approach would look like. Maybe like this:
Use Docker with Scientific Linux to get a somewhat controlled environment.
A minimal Docker container install: a Node-only container, no extras.
Store the timing results in global scope until the test is done, then save them to disk.
Terminate all applications with high or moderate disk I/O and/or CPU usage on the host OS.
Measure time as explained before, and cross my fingers.
Any other recommendations to take into consideration?
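
For reference, here is a minimal TypeScript sketch (not an answer, just an illustration) of the kind of in-process timing described above; resolveRoute() is a hypothetical stand-in for the routing code under test, and the samples are kept in memory and written to disk only after the run, as the list suggests.

// Sketch of the in-process measurement described above.
import { writeFileSync } from "fs";

// Stand-in for the routing code under test; replace with the real resolver.
function resolveRoute(method: string, path: string): string {
  return `${method} ${path}`;
}

const samplesMs: number[] = []; // kept in memory until the run is over

function measureOnce(method: string, path: string): void {
  const start = process.hrtime.bigint(); // nanosecond-resolution clock
  resolveRoute(method, path);            // only the routing step is timed
  const end = process.hrtime.bigint();
  samplesMs.push(Number(end - start) / 1e6); // store milliseconds
}

for (let i = 0; i < 10_000; i++) {
  measureOnce("GET", "/users/42/images");
}

// Write the results only after the measurement loop, so file I/O never
// lands inside a timed section.
writeFileSync("routing-times.json", JSON.stringify(samplesMs));

Running this inside the minimal container from the list above, and repeating the run several times, gives a feel for how much the numbers drift between environments.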

Performance issue while using Parallel.ForEach() with MaxDegreeOfParallelism set to ProcessorCount

I wanted to process records from a database concurrently and in the minimum amount of time, so I thought of using a Parallel.ForEach() loop to process the records with MaxDegreeOfParallelism set to ProcessorCount.
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(listUsers, po, (user) =>
{
    // Parallel processing
    ProcessEachUser(user);
});
But to my surprise, the CPU utilization was not even close to 20%. When I dug into the issue and read the MSDN article on this (http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx), I tried setting MaxDegreeOfParallelism to -1. As the article says, this value removes the limit on the number of concurrently running operations, and the performance of my program improved considerably.
But that still didn't meet my requirement for the maximum time taken to process all the records in the database. So I analyzed further and found that the thread pool has two settings, MinThreads and MaxThreads. By default the values of MinThreads and MaxThreads are 10 and 1000 respectively: at the start only 10 threads are created, and this number keeps increasing with every new user, up to a maximum of 1000, unless a previous thread has finished its execution.
So I set the initial value of MinThreads to 900 instead of 10 using
System.Threading.ThreadPool.SetMinThreads(100, 100);
so that right from the start a minimum of 900 threads would be created, thinking this would improve the performance significantly. It did create 900 threads, but it also greatly increased the number of failures while processing each user, so I did not achieve much with this approach. I then changed the value of MinThreads to just 100 and found that the performance was much better.
But I wanted to improve further, as my time requirement was still not met: processing all the records was still exceeding the time limit. You may think I was already using all the best possible options to get maximum performance out of parallel processing; I was thinking the same.
But to meet the time limit I decided to take a shot in the dark. I created two different executable files (slaves) instead of only one and assigned each of them half of the users from the DB. Both executables did the same thing and ran concurrently. I created another master program to start these two slaves at the same time.
To my surprise, it reduced the time taken to process all the records by nearly half.
Now my question is simply that I do not understand why the master/slave setup gives better performance compared to a single EXE, given that the logic is the same in both the slaves and the previous EXE. I would highly appreciate it if someone could explain this in detail.
But to my surprise, the CPU utilization was not even close to 20%.
…
It uses the Http Requests to some Web API's hosted in other networks.
This means that CPU utilization is entirely the wrong thing to look at. When using the network, it's your network connection that's going to be the limiting factor, or possibly some network-related limit, certainly not CPU.
Now I created two different executable files … To my surprise, it reduced the time taken to process all the records nearly to the half.
This points to an artificial, per-process limit, most likely ServicePointManager.DefaultConnectionLimit. Try setting it to a larger value than the default at the start of your program and see if it helps.

Too much data (at a time) for Core Data?

My iPhone app uses Core Data and things are fine for the most part. But here is a problem:
after a certain amount of data, it stalls at first-time execution (where the Core Data entities must be loaded).
Some experimenting showed that things are OK up to a certain amount of data loaded into Core Data at start.
If I go over a critical amount, the installation starts failing. The more data loaded at start, the higher the probability that it fails.
By making separate tests I made sure the data themselves are not faulty.
I also can say this problem does not appear in the simulator.
It also does not happen when I connect the debugger to the device.
It looks like loading too much data into Core Data in a short amount of time creates some kind of overload.
Is that true? Any idea on a possible solution?
At this point I made up a partial solution using a UIActionSheet object to kill some time (asking the user to push a button). But this is not very satisfactory, though for the time being it works.
Any comment or advice for a better way would be appreciated.
It is not quite clear what you mean by "it fails".
However, if you are using SQLite and by "loading into Core Data" you mean creating and saving entities at start-up to populate the store, then remember not to call [managedObjectContext save:...] only once at the end, especially with a large amount of data; instead, create and save the NSManagedObject instances in reasonably sized batches.
Otherwise, if you mean you have a large amount of data retrieved as NSManagedObject instances, probably loaded into a UITableView, consider using some kind of NSOperation for asynchronous loading.
If neither of those two cases applies to you, just tell us the error you are getting, or what you mean by "fails" or "stalls".