Very high max response time and errors when looping a form submission - performance-testing

My requirement is to run 90 concurrent virtual users executing multiple scenarios (15 scenarios) simultaneously for 30 minutes. For some of the thread groups I use the Concurrency Thread Group and for others the normal Thread Group.
Now my issues are:
1) After I execute all 15 scenarios, the max response time displayed for each scenario is very high (>40 sec). Are there any suggestions for reducing this high max response time?
2) One of the scenarios submits a web form. There is no issue if I submit only one, but during the 90-concurrent-user execution some of the web form submissions get a 500 error code. Is the error caused by my using a loop to achieve the 30-minute duration?

In order to reduce the response time you need to find the reason for it. The reasons could be:
lack of resources like CPU, RAM, etc. Make sure to monitor resource consumption using e.g. the JMeter PerfMon Plugin
incorrect configuration of the middleware (application server, database, etc.). All these components need to be properly tuned for high loads: for example, if you set the maximum number of connections on the application server to 10 and you have 90 threads, 80 threads will be queuing up waiting for the next available executor. The same applies to the database connection pool
use a profiler tool to inspect what's going on under the hood and why the slowest functions are that slow; it might be the case that your application's algorithms are not efficient enough
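To make the connection-limit example concrete, here is a minimal Java sketch (the class and method names are hypothetical) that models the server side as a fixed pool of 10 executors and fires 90 requests at it at once. Each simulated request holds an executor for 50 ms, so no single request is slow, yet the requests that have to wait for a free executor report response times of roughly 9 × 50 ms. It is a toy model of queuing that assumes a fixed service time; it does not replace monitoring the real server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolQueueing {
    /**
     * Submits `users` simulated requests at once to a pool of `poolSize` executors.
     * Each request occupies an executor for `serviceMs`. Returns every request's
     * response time in ms, measured from the moment the batch was submitted.
     */
    static List<Long> run(int users, int poolSize, long serviceMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            long t0 = System.nanoTime();
            List<Callable<Long>> requests = new ArrayList<>();
            for (int i = 0; i < users; i++) {
                requests.add(() -> {
                    Thread.sleep(serviceMs); // simulated server-side work
                    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
                });
            }
            List<Long> responseTimes = new ArrayList<>();
            for (Future<Long> f : pool.invokeAll(requests)) {
                responseTimes.add(f.get());
            }
            return responseTimes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> rt = run(90, 10, 50); // 90 "threads", 10 executors, 50 ms of work each
        System.out.println("min response: " + Collections.min(rt) + " ms");
        System.out.println("max response: " + Collections.max(rt) + " ms");
    }
}
```

Raising the pool size to 90 in this model brings the maximum back down to about one service time, which is the effect proper middleware tuning aims for, provided the hardware can actually handle that many requests in parallel.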
If your test succeeds with a single thread and fails under load, it definitely indicates a bottleneck. Try increasing the load gradually and see how many users the application can support without performance degradation and/or errors. HTTP status codes 5xx indicate server-side errors, so it is also worth inspecting your application logs for more insights.


OpenSearch (ElasticSearch) latency issue under multi-threading using RestHighLevelClient

We use RestHighLevelClient to query AWS OpenSearch in our service. Recently we have seen some latency issues related to OpenSearch calls, so I'm running stress tests to troubleshoot, but I observed some unexpected behavior.
In our service, when a request is received, we start 5 threads and make one OpenSearch call within each thread in parallel, so that the overall latency is similar to that of a single call. During load tests, even when I send traffic at 1 TPS, I see very different latency numbers for different threads of the same request. Specifically, there are usually one or two threads with much higher latency than the others, as if that thread were being blocked by something: for example 390 ms, 300 ms, 1.1 sec, 520 ms and 30 ms for the five threads. Meanwhile, I don't see any search latency spike reported by the OpenSearch service; the max SearchLatency stays under 350 ms the whole time.
I read that the low-level REST client used by RestHighLevelClient manages a connection pool with very small default max-connection values, so I've overridden DEFAULT_MAX_CONN_PER_ROUTE to 100 and DEFAULT_MAX_CONN_TOTAL to 200 when creating the client, but judging by the test results before and after changing these two values, it doesn't seem to make a difference.
I'm wondering if anyone has seen similar issues or has any ideas on what could be the reason for this behavior. Thanks!
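One way to narrow this down is to record, for each of the five parallel calls, how long the task waited before a worker thread picked it up versus how long the call itself took. The sketch below is a minimal stdlib-only Java version of that measurement; `search` is a hypothetical stand-in for the real OpenSearch call, and the five-thread ExecutorService is an assumption about how the service fans out. Time spent waiting for a pooled HTTP connection inside the client would still show up in the call time here, not in the queue time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FanOutTiming {
    /** Per-call timings: time spent waiting for a worker thread vs. time inside the call. */
    record Timing(int id, long queuedMs, long callMs) {}

    /** Hypothetical stand-in for one OpenSearch query; replace with the real client call. */
    static String search(int id) throws InterruptedException {
        Thread.sleep(20);
        return "result-" + id;
    }

    static List<Timing> fanOut(ExecutorService pool, int calls) throws Exception {
        List<Future<Timing>> futures = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            final int id = i;
            final long submitted = System.nanoTime();
            futures.add(pool.submit(() -> {
                long started = System.nanoTime(); // a worker thread picked the task up
                search(id);
                long finished = System.nanoTime();
                return new Timing(id,
                        TimeUnit.NANOSECONDS.toMillis(started - submitted),
                        TimeUnit.NANOSECONDS.toMillis(finished - started));
            }));
        }
        List<Timing> timings = new ArrayList<>();
        for (Future<Timing> f : futures) {
            timings.add(f.get());
        }
        return timings;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        try {
            for (Timing t : fanOut(pool, 5)) {
                System.out.printf("call %d: queued %d ms, in call %d ms%n",
                        t.id(), t.queuedMs(), t.callMs());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

If the slow thread shows a large queued value, the delay is in the application's own thread pool; if its call value is large while OpenSearch's SearchLatency stays low, the time is being spent on the client side or on the network, which is where the connection-pool settings come in.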

Limiting the number of requests in Cassandra without the request timeout starting to tick

Version 4 of the DataStax Cassandra driver has a request throttling feature.
The documentation states:
Similarly, the request timeout encompasses throttling: the timeout starts ticking before the throttler has started processing the request; a request may time out while it is still in the throttler's queue, before the driver has even tried to send it to a node.
Great. However, let's say I have a dynamic list of ids and I want to execute select requests to Cassandra in parallel (using executeAsync()) for all ids in the list. If the list is too large, I will eventually face timeouts because requests sit in the throttler's queue for too long.
How can I overcome this issue? Is there any built-in rate-limiting technique, so that I don't have to care about how many requests I can execute in parallel, but can just throw all of them at Cassandra and then wait until they are all completed?
UPD: I am not interested in custom code solutions; of course we are capable of implementing our own rate-limiting solution. I am asking precisely about the driver's built-in mechanisms to achieve this.
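The quoted documentation behaviour can be quantified with a small model. The Java sketch below is not driver code: it assumes every request is submitted at the same moment, a throttler lets at most `maxConcurrent` requests run at once, each request takes a fixed `latencyMs`, and, as the documentation says, the timeout clock starts at submission, so time spent in the throttler's queue counts against it. All names and numbers are hypothetical.

```java
public class ThrottlerTimeoutModel {
    /**
     * Counts how many of `requests` submitted at t = 0 would time out when at most
     * `maxConcurrent` run at once, each takes `latencyMs`, and the timeout clock
     * starts at submission (queue time in the throttler counts).
     */
    static int timedOut(int requests, int maxConcurrent, long latencyMs, long timeoutMs) {
        int count = 0;
        for (int i = 0; i < requests; i++) {
            long wave = i / maxConcurrent;             // full waves that run before this request
            long completesAt = (wave + 1) * latencyMs; // queue wait + its own execution
            if (completesAt > timeoutMs) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(timedOut(100, 10, 100, 500)); // 50
        System.out.println(timedOut(40, 10, 100, 500));  // 0
    }
}
```

With 100 ids, 10 concurrent requests, 100 ms per request and a 500 ms timeout, the model predicts that half of the requests time out purely because of queue time, which is the effect described above for a large id list.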

air traffic controller for threads when calling a REST API

DISCLAIMER: If this post is off-topic for this site, please recommend a site where it would be appropriate.
On Ubuntu 18.04, in bash, I am writing a network-based, threaded application that requires multiple servers. It receives files through the network and processes them, ultimately making an API call that finishes the processing and logs the results to a database for later retrieval and reporting.
So far I have written the application using non-threaded programming models and concepts, which means the files are processed one at a time in real time. This works great as long as there is no sudden burst of files and/or backlog of files to process. The main bottleneck has been the way I send files to the API sequentially, one after another, waiting until the entire operation has finished for one file and the API has returned the results. The API has a rate limit of 8 calls per second, but since each call takes 0.75 to 1 second and my program waits until the operation is done, it only processes about 1 file per second through the API. In short, I did not have to worry about scheduling API calls because I could barely make one call per second.
Since the capacity is there to process 8 files per second, and I need more speed, I have been converting my single-threaded, sequential application into a parallel, scalable, multi-threaded application. This new version can spawn enough threads to send 8 files per second to the REST API and much more. So now I have the opposite problem. I am sending too many requests per second to the REST API and am in danger of triggering penalties, etc. Ultimately, when my traffic is higher, I will upgrade my subscription to the API and get more calls per second, but this current dilemma has got me thinking about how to schedule the API calls with different threads.
The purpose of this post is to discuss an idea about how to schedule these REST API calls across various threads. Specifically, I want to discuss how to coordinate timing and usage of the API while maintaining efficiency and yet not overloading the API. In short, I want to coordinate a group of threads so that the API is properly used. Not too fast and not too slow.
Independent of my application, this idea could be useful in a number of generically similar scenarios.
My idea is to create an "air traffic controller" ("ATC") so that the threads of the application have a centralized timing authority to check when they are ready to submit files to the REST API. The ATC would know how many time slots/calls per time period (in this case, calls per second) the API can schedule. The ATC would be listening for the threads to request a time slot ("launch code") which would give them a time slot in the future to perform their API call. The ATC would decide based on the schedule of other launch codes that it has already handed out.
In my case, from the start of the upload of the file to the API, it could take 0.75 to 1 second to complete the processing and receive a response from the API. This does not affect the count of new API calls that can be performed. It is just a consideration of how long the threads will be waiting once they call the API. It may not be relevant to this overall discussion.
Each thread would obviously have to do some error handling. If the API timed out or threw an error, the thread would have to handle it and, if appropriate, get back in line with the ATC and ask for a new launch code. Maybe it should report the error to the ATC for centralized logging?
When the file-processing load bursts above 8 files per second, there would be a scheduling backlog in which the threads wait their turn as assigned by the ATC.
Here are some other considerations:
The ATC would be a lightweight daemon that does the following:
- listens on some TCP port
- receives a request
  security token (?), thread id, priority
- authenticates the request (?)
- examines schedule
- reserves the next available time slot
- returns the launch code
  security token (?), current time, launch timing offset relative to the current time, URL and auth token for the API
- expunges expired launch codes
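The core of the daemon described in this list (reserve the next slot, hand back a launch code, expunge stale codes) can be sketched in a few lines. The Java sketch below is a hypothetical in-process version: `LaunchScheduler`, `LaunchCode` and the grace period are illustrative names, the clock is passed in so the behaviour is easy to test, and the networking, authentication and URL/token handout are left out. Slots are spaced `1000 / callsPerSecond` ms apart, so a burst simply receives launch times further in the future.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class LaunchScheduler {
    /** What the ATC hands back: an opaque code plus the absolute launch time. */
    record LaunchCode(String code, String threadId, long launchAtMs) {}

    private final long intervalMs;                                  // e.g. 1000 / 8 = 125 ms
    private long nextFreeSlotMs = 0;                                // guarded by `this`
    private final Map<String, LaunchCode> issued = new HashMap<>(); // guarded by `this`

    LaunchScheduler(int callsPerSecond) {
        this.intervalMs = 1000L / callsPerSecond;
    }

    /** Reserves the earliest free slot at or after nowMs and records the launch code. */
    synchronized LaunchCode reserve(String threadId, long nowMs) {
        long slot = Math.max(nowMs, nextFreeSlotMs);
        nextFreeSlotMs = slot + intervalMs;
        LaunchCode lc = new LaunchCode(UUID.randomUUID().toString(), threadId, slot);
        issued.put(lc.code(), lc);
        return lc;
    }

    /** Removes codes whose slot is more than graceMs in the past; returns how many were removed. */
    synchronized int expunge(long nowMs, long graceMs) {
        int before = issued.size();
        issued.values().removeIf(lc -> lc.launchAtMs() + graceMs < nowMs);
        return before - issued.size();
    }

    synchronized int size() {
        return issued.size();
    }

    public static void main(String[] args) {
        LaunchScheduler atc = new LaunchScheduler(8);
        long now = System.currentTimeMillis();
        for (String t : new String[] {"t1", "t2", "t3"}) {
            LaunchCode lc = atc.reserve(t, now);
            // offset to current time: 0, 125 and 250 ms for t1, t2, t3
            System.out.println(t + " launches in " + (lc.launchAtMs() - now) + " ms");
        }
    }
}
```

Evenly spacing slots is only one interpretation of "8 per second"; a sliding one-second window would instead allow short bursts of 8 back-to-back calls.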
The ATC would need the following:
- to know what port it is supposed to run on
- to know how many slots per time period it was set to schedule
(e.g. 8 per second)
- to have a super fast read/write access to the schedule (associative array?)
- to know the URL and corresponding auth token for the thread to use
- maybe to know multiple URLs and auth tokens for load balancing
Here are more things to consider:
How could we keep the ATC secure while ensuring high performance?
Network-level security (e.g. firewalls allowing only the IP addresses of the file-processing servers?)
Auth tokens or logins and passwords?
What would the requirements be for this ATC server? Would this be taxing to a CPU and memory?
How often would an NTP call be needed? By the ATC server? By the servers which call the API?
Being able to provide different URLs and auth tokens would allow the ATC to load balance with different API providers.
Threading of the ATC itself
Would the ATC need to spawn threads to be able to handle each new request?
How does a web server handle requests?
How would the various threads share a common schedule?
In a non-threaded environment, the ATC would possibly keep an associative array in memory to keep performance as high as possible. How would the various threads of the ATC have access to the same schedule?
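On the threading questions: a common pattern, used by many servers, is one thread accepting connections and a fixed worker pool handling them, with the shared schedule kept in memory behind a lock so that only one thread updates it at a time. The Java sketch below illustrates that pattern under assumptions that are all hypothetical: the one-line text protocol (the client sends its thread id, the ATC replies with the id and a launch offset in ms), the class name `AtcServer`, and the pool size of 4. There is no authentication or TLS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AtcServer {
    private final long intervalMs;   // spacing between slots, e.g. 1000 / 8 = 125 ms
    private long nextFreeSlotMs = 0; // the shared schedule, guarded by `this`
    private final ServerSocket server;
    private final ExecutorService workers = Executors.newFixedThreadPool(4); // handles connections

    AtcServer(int port, int callsPerSecond) throws IOException {
        this.intervalMs = 1000L / callsPerSecond;
        this.server = new ServerSocket(port); // port 0 lets the OS pick a free port
    }

    int port() { return server.getLocalPort(); }

    /** Reserves the next free slot; returns its offset in ms from now (0 = launch immediately). */
    synchronized long reserve() {
        long now = System.currentTimeMillis();
        long slot = Math.max(now, nextFreeSlotMs);
        nextFreeSlotMs = slot + intervalMs;
        return slot - now;
    }

    /** Accepts connections on a background thread and hands each one to the worker pool. */
    void start() {
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();
                    workers.execute(() -> handle(client));
                } catch (IOException e) {
                    // accept() throws once stop() closes the socket; the loop condition then exits
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    /** Protocol: the client sends one line (its thread id); the ATC replies "<thread id> <offset ms>". */
    private void handle(Socket client) {
        try (client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String threadId = in.readLine();
            if (threadId == null) return; // client disconnected without sending a request
            out.println(threadId + " " + reserve());
        } catch (IOException e) {
            // a real daemon would log this centrally
        }
    }

    void stop() throws IOException {
        server.close();
        workers.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AtcServer atc = new AtcServer(0, 8);
        atc.start();
        for (int i = 0; i < 3; i++) {
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), atc.port());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
                out.println("thread-" + i);
                System.out.println(in.readLine());
            }
        }
        atc.stop();
    }
}
```

Because `reserve()` only does a few arithmetic operations under the lock, contention should stay small at rates like 8 calls per second; the network round trip to the ATC is likely to cost more than the scheduling itself.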
So here is my question. Does this exist? If not, what are some best practices in trying to build the above?
It seems like a beanstalkd kind of network service, except that it only provides permission/scheduling and is extremely dependent on timing.

Bursts of Redis errors

We recently created a new Standard 1 GB Azure Redis cache specifically for distributed locking, separate from our main Redis cache. This was done to improve the stability of our main Redis cache, which has been a long-term issue, and this action seems to have helped significantly.
On our new cache, we observe bursts of ~100 errors within the same few seconds every 1 - 3 days. The errors are either:
No connection is available to service this operation (StackExchange.Redis error)
Could not acquire distributed lock: Conflicted ( error)
As they are errors from different packages, I suspect the Redis cache itself is the problem here. None of the stats during this time look out of the ordinary and the workload should fit comfortably in the Standard 1GB size.
I'm guessing this could be caused by the advertised "Low" network performance. Is this likely the cause?
Your theory sounds plausible.
Checking for insufficient network bandwidth
Here is a handy table showing the maximum observed bandwidth for various pricing tiers. Take a look at the observed maximum bandwidth for your SKU, then head over to your Redis blade in the Azure Portal and choose Metrics. Set the aggregation to Max, and look at the sum of cache read and cache write. This is your total bandwidth consumed. Overlay the sum of these two against the time period when you're experiencing the errors, and see if the problem is network throughput. If that's the case, scale up.
Checking server load
Also on the Metrics tab, take a look at server load. This is the percentage of time that Redis is busy and unable to process requests. If you hit 100%, Redis cannot respond to new requests and you will experience timeout issues. If that's the case, scale up.
Reusing ConnectionMultiplexer
You can also run out of connections to a Redis server if you're spinning up a new instance of StackExchange.Redis.ConnectionMultiplexer per request. The service limits on the number of connections for each SKU are listed on the pricing page. To see whether you're exceeding the maximum allowed connections for your SKU, go to the Metrics tab, select Max aggregation, and choose Connected Clients as your metric.
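StackExchange.Redis is a .NET library, so the following is only a Java analogue of the same advice, not StackExchange.Redis code: create one expensive, thread-safe connection object lazily and share it, instead of creating one per request (in C# this is commonly done with a static Lazy<ConnectionMultiplexer>). `ExpensiveConnection` is a hypothetical stand-in for the multiplexer; the initialization-on-demand holder idiom guarantees it is constructed once even when many threads ask for it at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnection {
    static final AtomicInteger CONNECTIONS_OPENED = new AtomicInteger();

    /** Hypothetical stand-in for an expensive, thread-safe client such as a Redis multiplexer. */
    static final class ExpensiveConnection {
        ExpensiveConnection() {
            CONNECTIONS_OPENED.incrementAndGet(); // count every real "connect"
        }
    }

    // Initialization-on-demand holder: the JVM initializes INSTANCE once, on first use, thread-safely.
    private static final class Holder {
        static final ExpensiveConnection INSTANCE = new ExpensiveConnection();
    }

    static ExpensiveConnection get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 1000; i++) {
            pool.execute(() -> get()); // every "request" asks for the shared connection
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("connections opened: " + CONNECTIONS_OPENED.get()); // prints 1
    }
}
```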
Thread Exhaustion
This doesn't sound like your error, but I'll include it for completeness in this Rogue's Gallery of Redis issues; it comes into play with Azure Web Apps. By default, the thread pool starts with 4 threads that can be immediately allocated to work. When you need more than four threads, they're doled out at a rate of one thread per 500 ms. So if you dump a ton of requests on a Web App in a short period of time, you can end up queuing work and eventually having requests dropped before they even get to Redis. To test whether this is a problem, go to Metrics for your Web App, choose Threads, and set the aggregation to Max. If you see a huge spike in a short period of time that corresponds with your trouble, you've found a culprit. Resolutions include making proper use of async/await. And when that gets you no further, call ThreadPool.SetMinThreads with a higher value, preferably one that is close to or above the max thread usage that you see in your bursts.
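The cost of that injection rate is easy to estimate. The Java sketch below uses a deliberately simplified model of the behaviour described above (a fixed minimum, then one extra thread every 500 ms); the real .NET thread-injection heuristics are more involved, so treat the numbers as rough orders of magnitude.

```java
public class ThreadInjection {
    /**
     * Approximate wait in ms before a pool that starts with `minThreads` and adds one
     * thread every `injectionIntervalMs` can run `needed` work items at the same time.
     */
    static long rampDelayMs(int minThreads, int needed, long injectionIntervalMs) {
        if (needed <= minThreads) {
            return 0; // minimum already high enough: no throttled injection needed
        }
        return (long) (needed - minThreads) * injectionIntervalMs;
    }

    public static void main(String[] args) {
        System.out.println(rampDelayMs(4, 100, 500));   // 48000
        System.out.println(rampDelayMs(100, 100, 500)); // 0
    }
}
```

With a minimum of 4 threads, a burst that needs 100 threads would wait about 48 seconds in this model before all of it could run concurrently, which is why raising the minimum close to the observed peak removes the delay.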
Rob has some great suggestions, but I did want to add information on troubleshooting traffic bursts and poor ThreadPool settings. Please see: Troubleshoot Azure Cache for Redis client-side issues
Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.
Monitor how your ThreadPool statistics change over time using an example ThreadPoolLogger. You can use TimeoutException messages from StackExchange.Redis like below to further investigate:
System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
Notice that in the IOCP section and the WORKER section you have a Busy value that is greater than the Min value. This difference means your ThreadPool settings need adjusting.
You can also see in: 64221. This value indicates that 64,221 bytes have been received at the client's kernel socket layer but haven't been read by the application. This difference typically means that your application (for example, StackExchange.Redis) isn't reading data from the network as quickly as the server is sending it to you.
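If you collect many of these messages, a small parser can flag the two signals described above automatically. The Java sketch below assumes the message format shown in the example, such as `IOCP: (Busy=6,Free=999,Min=2,Max=1000)` and `in: 64221`; other StackExchange.Redis versions may format the message differently, so the regular expressions are an assumption to verify against your own logs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutMessageCheck {
    // Matches e.g. "IOCP: (Busy=6,Free=999,Min=2,Max=1000)"
    private static final Pattern POOL =
            Pattern.compile("(IOCP|WORKER): \\(Busy=(\\d+),Free=\\d+,Min=(\\d+),Max=\\d+\\)");
    // Matches e.g. "in: 64221"
    private static final Pattern IN = Pattern.compile("\\bin: (\\d+)");

    /** For each pool found in the message, reports whether Busy exceeds Min. */
    static Map<String, Boolean> busyAboveMin(String message) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        Matcher m = POOL.matcher(message);
        while (m.find()) {
            result.put(m.group(1), Integer.parseInt(m.group(2)) > Integer.parseInt(m.group(3)));
        }
        return result;
    }

    /** Returns the "in:" byte count (received by the socket, not yet read), or -1 if absent. */
    static long unreadBytes(String message) {
        Matcher m = IN.matcher(message);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String msg = "System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, "
                + "queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0, "
                + "IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)";
        System.out.println(busyAboveMin(msg)); // {IOCP=true, WORKER=true}
        System.out.println(unreadBytes(msg));  // 64221
    }
}
```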
You can configure your ThreadPool Settings to make sure that your thread pool scales up quickly under burst scenarios.
I hope you find this additional information helpful.

In JMeter, is it correct to decrease the number of threads and increase the requests (copies of the same request) to improve response time?

In my application, when I executed 2000 virtual users (Number of Threads) with 1 HTTP request, my response time was 30 sec. When I changed the number of threads to 500 and, instead of 1 HTTP request, put in 4 copies of the same HTTP request, the response time was 3 sec. What is the difference? Is it the right way to reduce the number of threads and increase the replicas of the request? Please help.
Note: In each request I also changed the user id.
In terms of HTTP Request samplers, your test must behave exactly like a real browser, so artificially adding more HTTP Requests may (and will) break the logic of your workload (if there is any).
In your case the high response time seems to be caused by incorrect JMeter configuration: if JMeter is not properly configured for high load, it simply will not be able to fire requests fast enough, resulting in increased response time while your server is just idle.
2000 threads sounds like quite a big number so make sure to:
Follow JMeter Best Practices
Follow recommendations from 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure especially these:
Increase JVM Heap size allocated for JMeter
Run your test in non-GUI mode
Remove all the Listeners from the Test Plan
Monitor baseline OS health metrics on the machine where JMeter is running (CPU, RAM, Disk, Network usage). You can use the JMeter PerfMon Plugin for this. If you notice a lack of any of these resources, e.g. usage starts exceeding, say, 90% of the total available capacity, JMeter is not acting at full speed and you will need to consider Distributed Testing.
To extend @Dmitri T's answer: if your server responds 10 times slower under load when you execute 2000 virtual users, it means there's a bottleneck that you need to identify.
Read JMeter's Best Practices
consider running multiple non-GUI JMeter instances on multiple machines using distributed mode
Also check the "Delay Thread creation until needed" checkbox in the Thread Group:
JMeter has an option to delay thread creation until the thread starts sampling, i.e. after any thread group delay and the ramp-up time for the thread itself. This allows for a very large total number of threads, provided that not too many are active concurrently.
And set Thread Group Ramp-up to 2000
Start with Ramp-up = number of threads and adjust up or down as needed.
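As a rough illustration of what that setting means: assuming JMeter spreads thread starts evenly over the ramp-up period, thread i starts about i × rampUp / threads seconds after the test begins, so 2000 threads with a 2000-second ramp-up add roughly one new user per second. The Java helper below (a hypothetical name, not JMeter code) just does that arithmetic.

```java
public class RampUp {
    /** Approximate start offset in ms of thread `index` (0-based) under an even ramp-up. */
    static long startOffsetMs(int index, int threads, int rampUpSeconds) {
        // cast first so 1999 * 2000 * 1000 does not overflow int
        return (long) index * rampUpSeconds * 1000L / threads;
    }

    public static void main(String[] args) {
        System.out.println(startOffsetMs(1, 2000, 2000));    // 1000
        System.out.println(startOffsetMs(1999, 2000, 2000)); // 1999000
        System.out.println(startOffsetMs(1999, 2000, 60));   // 59970
    }
}
```

With a 60-second ramp-up instead, the same 2000 threads would start about 30 ms apart, i.e. roughly 33 new users per second.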
