I have successfully run the load test, but I am getting too many threshold violations. What is the reason for them, and how can I reduce the threshold violations?
Threshold violations say that the named computer(s) are close to being overloaded.
If the named computer(s) are the computers running the tests, then you should consider spreading the load using controllers and agents, or using more powerful computers. In this case the load test may be reporting, or be close to reporting, incorrect results, because the test machines themselves are on the verge of being overloaded.
If the named computer(s) are the servers in the system being tested, then the violations show that the servers are close to their limit and will not support many more virtual users. In this case the performance test has found a limitation of the system under test.
I have performed some performance tests on WSO2 APIM using both the WebServices (WSDL) and Gateway interfaces. Everything went well with the gateway interface; however, I am facing odd behavior when using the WebServices one.
Basically, I created a test that adds a user, changes the password, and deletes the user, and ran a test plan using 64 threads. At the very beginning the throughput increases steadily until all 64 threads are active (the peak was 1600 req/sec). However, after that the throughput starts to decrease for no apparent reason.
All 64 threads are still active and running, and CPU usage drops on the machine hosting wso2am. It seems that APIM is giving up on handling the requests even though it has threads and processors available.
The picture below shows the vmstat results for the processor (user, system, and idle) as well as context switches and interrupts. You can see that CPU usage and context switching follow the throughput.
The next picture illustrates the JMeter test results at the end of the run (after the throughput decrease).
Basically, what I need is a clue about what may be causing this behavior. I have already tried increasing the thread pools on both wso2am and Tomcat, but it had no effect. It is as if the requests were not arriving at all, even though JMeter has plenty of capacity and had already delivered a higher throughput earlier in the run.
I would bet that a simple configuration change in Tomcat or WSO2 is the answer. Any help is appreciated.
Thanks and Regards
It may be that JMeter is not able to send requests fast enough. Try the following steps:
Upgrade JMeter to the latest version (3.1 as of this writing); you can get the most recent JMeter distribution from the JMeter download page.
Run your test in command-line non-GUI mode (see the sketch after this list). The JMeter GUI can be used for test development and/or debugging only; it is not designed for running load tests.
Remove (or disable) all listeners during test execution. Later on you can open the JMeter GUI, add the listener of your choice, load the .jtl results file, and perform your analysis, or create an HTML Reporting Dashboard out of the results file.
See the 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure article for the above points explained in detail, plus a few more tips on configuring JMeter for maximum performance and throughput.
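As a minimal sketch of the non-GUI run described above (assuming jmeter is on the PATH; the test plan and output names are hypothetical), here is one way to drive it from Python:

```python
import subprocess

# Hypothetical file names; substitute your own test plan and output paths.
TEST_PLAN = "user_crud.jmx"
RESULTS = "results.jtl"
DASHBOARD = "dashboard"  # JMeter requires this directory to be empty or absent

# -n: non-GUI mode, -t: test plan, -l: results file,
# -e -o: generate the HTML Reporting Dashboard when the run finishes.
subprocess.run(
    ["jmeter", "-n", "-t", TEST_PLAN, "-l", RESULTS, "-e", "-o", DASHBOARD],
    check=True,
)
```

Running headless this way keeps GUI and listener overhead out of the load-generation path, which is exactly what the steps above are aiming at.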
I have a requirement to write a load test measuring message transmission latencies. In order to simulate a large number of simultaneous users without running into thread contention problems on one box, I'm spinning up multiple servers in Azure.
When I got my first results back, I was a little shocked to see that the results indicated the message was received before it was sent. I immediately realized that, while I had an implicit assumption that all the VMs would have their clocks synced to within milliseconds, that was clearly not the case.
I've spent several hours googling for ways to resolve this, and I'm not getting anywhere. One thought was to have each VM query the time on a central server using NetRemoteTOD(), following a technique similar to this NetRemoteTOD example, and then establish a per-machine correction factor to be added to the time measured from the local machine's clock. However, when I tried to run that method, I got error 2184, "The service has not been started." I have verified that both the RPC service and the Windows Time service are running on both the client and target machines, and I have not been successful in finding any information indicating what other service needs to be running (or even whether the error really means what it seems to mean). (I also get the same error when running between my development desktop and a server on our corporate network. However, I can run it successfully against a PDC on the corporate network, but I can't find a PDC on Azure, since neither machine is part of a domain.)
So, does anyone have any information on what service needs to be started to get NetRemoteTOD (or the Windows NET TIME command, which relies on NetRemoteTOD under the covers) working? Alternatively, does anyone have a suggestion for some other technique to establish a consistent time reference across multiple VMs in Azure? (Note, I don't necessarily need their clocks synced; I just need a way to establish a consistent correction factor that references the times to a common source. Note also, I need sub-second accuracy; about 100 msec will probably do.) Basically, I just need a Windows function or shell command that will get me the time to sub-second accuracy on a given remote server.
Thanks in advance.
PS: The Azure servers are running Server 2008 R2 SP1.
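One hedged alternative to NetRemoteTOD, sketched below in Python, is to swap in a raw SNTP query against a single reference server (the reference host name is an assumption; any machine all VMs can reach that runs an NTP service will do). Each VM computes its offset from the common source and adds that correction factor to its locally measured timestamps:

```python
import socket
import struct
import time

# Assumed reference host; replace with a server all of your VMs can reach.
REFERENCE_SERVER = "time.windows.com"

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800

def clock_offset(server: str = REFERENCE_SERVER) -> float:
    """Return this machine's clock offset (seconds) relative to the server."""
    # Minimal SNTP request: LI=0, VN=3, Mode=3 (client) in the first byte.
    packet = b"\x1b" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(5)
        t_send = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t_recv = time.time()
    # Transmit timestamp: 32-bit seconds and fraction at byte offset 40.
    secs, frac = struct.unpack("!II", data[40:48])
    server_time = secs - NTP_TO_UNIX + frac / 2**32
    # Estimate the server time at the midpoint of the round trip.
    midpoint = (t_send + t_recv) / 2
    return server_time - midpoint

if __name__ == "__main__":
    # Add this per-machine correction factor to local timestamps before
    # comparing send/receive times across VMs.
    print(f"correction factor: {clock_offset():+.3f} s")
```

Estimating the server time at the round-trip midpoint bounds the error by roughly half the round-trip time, which should sit comfortably within the ~100 msec target for VMs in the same region.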
I'm trying to run some load tests for a web application hosted on IIS using Visual Studio 2012.
Testing for several hundred users works fine (but isn't too helpful).
When I try raising the number to 1000+, the tests fail; but not because the website can't handle the load, it's because my computer can't handle it!
Is there any way to test for large amounts of users, without crashing my own computer?
Distributed load tests are a better bet in several ways. In addition to not overwhelming your computer, they better simulate traffic from different locations, whether around the country or around the world. You get a better picture of how actual traffic can affect your site response.
I haven't done load testing from Visual Studio, but it should be possible to hit your site (assuming it's on one or more servers, and not also running locally) from multiple computers running VS. Alternatively, you might want to look into load testing services, such as those from SOASTA, that run load tests from the cloud.
Some commercial load testing tools also allow for distributed load testing. Disclaimer: I work for Telerik in the Test Studio group, and we have distributed load testing.
Jacob,
It sounds like you're overloading your system. With any significant load run, regardless of the toolset, you need to get the various load components distributed out to run on different systems.
If you haven't already, you should have a look at the Working With Load Tests documentation on MSDN. See also the Considerations for Large Load Tests (especially the part about overloading Agents) as well as the Run a Load Test Using Agents pages.
Behavior:
Application is loaded and being used as expected.
Suddenly, a particular DLL can no longer be loaded. The error message is:
ActiveX component cannot create object.
In each case, the object had been created successfully many times before failure. All objects are marked for "retain in memory".
This error is cleared when the application pool is recycled. It may be hours or months before it is seen again.
The issue has happened within two hours of a refresh, and has also not appeared for months of uptime.
The issue has happened with hundreds of simultaneous users (heavy usage) and also with 1-3 users.
While the issue is occurring, the process running that application pool cannot create the one object that is failing; however, it can create any other object. Memory, CPU, and other resources all remain at normal usage. In addition, other processes (such as a stand-alone exe) can successfully create the object.
The first instance of the issue appeared in mid-2008. There have been fewer than fifty instances since then, despite a pool of hundreds of servers on which it could occur. All instances except one have failed on the same DLL.
DLL Failure Info:
Most common: a generic data structure implementing a b-tree, with no references other than to its interface. The code consists of arrays and one use of the VB6 Event functionality. The object has not been changed in any way since 2005.
One-time: interop to a .NET module. The failure occurs when trying to create the interop object, not the .NET object. This object is updated a few times each year.
Application Environment:
IIS hosted application
VB6, classic ASP, some interop to minor .NET components
Windows Server 2003 / Windows Server 2008 (both have independently had the problem)
Attempts to Reproduce:
Using scripts (and real-life humans) to run the same end-user workflows that our logs recorded in the days before the issue occurred.
Using scripts to create/destroy suspected objects as fast as possible from multiple simultaneous sessions.
Wild speculation.
No intentional success, but it does manifest randomly on the servers on its own.
Troubleshooting:
Code reviews
Test harnesses to investigate upper limits of object creation / destruction
Verification of ability to create object outside of the process experiencing the issue
Monitoring of resources over time on servers under load
Review of IIS, error, and event logs to determine events leading up to issue
Questions:
Any ideas on how to reproduce the issue?
What could cause this behavior?
Ideas for bypassing the first two questions in favor of a fast solution?
The DLL isn't on a network drive, is it? You can get "glitches" where the drive is momentarily unavailable, which means COM can't do what it needs, and COM can then fail to notice when the drive becomes available again.
I used Process Monitor to debug a similar problem when accessing the ADO/OLEDB stack. It turned out the environment got corrupted at some point; the ADO classes are registered with an InprocServer32 value of type REG_EXPAND_SZ pointing to %CommonProgramFiles%\System\ado\msado15.dll (or similar on x64 OSes).
Also, when you register an application with Restart Manager, on failure the process gets restarted by the winlogon process, whose environment is different from explorer's and unfortunately is missing %CommonProgramFiles%. Ouch!
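A hedged sketch of that registry check in Python (the CLSID below is a placeholder, not ADO's real one): read the class's InprocServer32 value and confirm it still expands to a path that exists in the current process's environment.

```python
import os
import winreg

# Placeholder CLSID; substitute the CLSID of the class that fails to create.
# Note that on x64, 32-bit servers may be registered under the WOW6432Node view.
CLSID = "{00000000-0000-0000-0000-000000000000}"

key_path = rf"CLSID\{CLSID}\InprocServer32"
with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, key_path) as key:
    # QueryValueEx returns the raw string; REG_EXPAND_SZ is NOT expanded here.
    raw_path, value_type = winreg.QueryValueEx(key, "")

# Expansion depends on the loading process's environment; if
# %CommonProgramFiles% is missing there, object creation fails even
# though the same registration works fine from explorer.
expanded = os.path.expandvars(raw_path)
print(f"type={value_type} raw={raw_path!r} expanded={expanded!r}")
print("exists:", os.path.exists(expanded))
```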
This seems like a random failure; some race condition.
Try VMware's record feature to capture the state of the machine you run this DLL on. When the error happens, you can replay the recording and inspect the memory contents. That way you won't have to play try-and-catch with the error; at the very least you will have a solid record of it.
While I can't provide a solution, you could try catching the error and, when it happens, retrying the DLL load after a refresh of the environment.
I have a web server that is pegged, and I've been able to isolate the problem to a particular website instance. I'd like to dig deeper and isolate the particular page/process that is causing the issue. Any tips?
You can take a memory dump of the process and poke around with WinDbg.
There are posts on this issue on Tess Ferrandez's blog. Just do as she says.
Which version of IIS are you using? Some of the newer ones let you separate which process handles which requests, such as a dedicated worker process, so you can isolate things a bit more that way. I'd also suggest reading through the IIS logs to see what requests were being handled, how long they took, etc.
There are many different quirks in each IIS version. The really old ones just had start/stop functionality, but the newer ones give administrators much more control and power, IMO.
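As a hedged illustration of that log-reading step (the log path is an assumption, and the field layout is the default W3C set), here is a small Python sketch that totals time-taken per URL to surface the most expensive pages:

```python
from collections import Counter
from pathlib import Path

# Assumed location; point this at your site's W3C-format IIS log file.
LOG_FILE = Path(r"C:\inetpub\logs\LogFiles\W3SVC1\u_ex250101.log")

totals = Counter()  # total time-taken (milliseconds) per URL
fields = []

for line in LOG_FILE.read_text().splitlines():
    if line.startswith("#Fields:"):
        # The #Fields directive names the columns for the lines that follow.
        fields = line.split()[1:]
        continue
    if line.startswith("#") or not fields:
        continue
    row = dict(zip(fields, line.split()))
    # cs-uri-stem is the requested page; time-taken is in milliseconds.
    if "cs-uri-stem" in row and "time-taken" in row:
        totals[row["cs-uri-stem"]] += int(row["time-taken"])

for url, ms in totals.most_common(10):
    print(f"{ms:>10} ms  {url}")
```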
You should try using a profiler to identify what is using up the most resources. I've used dotTrace Profiler, although that can be expensive if you're on a tight budget.
It lets you see exactly which processes and method calls consume the most processing time in a request, so you can isolate the most resource-intensive operations.
You should really be able to use any profiler to do this, not just dotTrace. I just happen to only have experience with this one in particular.
Change your web garden setting to 10 or greater. Then watch your CPU and memory utilization on the web server.
Continue to increase the web garden setting until either the app is completely responsive with less than 5% average utilization OR you have actually maxed your web server's memory.
UPDATE
It's not about diagnosing; it's about properly configuring the IIS server. Web gardens are one of the most misunderstood features of IIS. By increasing the number of worker processes available to handle new requests, you remove the appearance of contention at the web server level and place it squarely where it belongs: in this case, at your database. Instead of masking the problem, it actually highlights exactly where the problem is.
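For reference, a hedged sketch of making that change (the pool name is an assumption): on IIS 7+ the web garden size is the application pool's processModel.maxProcesses setting, which appcmd can set, here driven from Python:

```python
import subprocess

# Assumed pool name; the web garden size is the pool's maxProcesses value.
APP_POOL = "DefaultAppPool"
WORKERS = 10

appcmd = r"C:\Windows\System32\inetsrv\appcmd.exe"
subprocess.run(
    [appcmd, "set", "apppool", APP_POOL,
     f"/processModel.maxProcesses:{WORKERS}"],
    check=True,
)
```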
This turned out to be a SQL problem (SQL Server 2005). The solution was found by using SQL Activity Monitor to identify a suspended process with an ASYNC_NETWORK_IO wait type. We then ran SQL Profiler to narrow it down to two massive queries that were returning an overabundance of results.
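As a hedged sketch of scripting that same check (the connection string is an assumption, and it requires the pyodbc package), the DMV below exposes the wait information that Activity Monitor displays on SQL Server 2005 and later:

```python
import pyodbc

# Assumed connection string; adjust the server and driver for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;Trusted_Connection=yes;"
)

# ASYNC_NETWORK_IO usually means the client is not consuming result sets
# fast enough, which is the signature of queries returning far too many rows.
rows = conn.execute(
    """
    SELECT session_id, status, wait_type, wait_time, command
    FROM sys.dm_exec_requests
    WHERE wait_type = 'ASYNC_NETWORK_IO'
    """
).fetchall()

for r in rows:
    print(r.session_id, r.status, r.wait_type, r.wait_time, r.command)
```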