How to achieve constant concurrent active users in closed model using Gatling?

My question is somewhat related to this and this, but they don't really answer it.
My test case is pretty simple: I need to maintain a constant number of concurrent active users (e.g. 10) over a period of time (e.g. 60 seconds).
My code looks like this:
val TestProtocolBuilder: HttpProtocolBuilder = http.baseUrl("https://computer-database.gatling.io")

object Test {
  val test =
    exec(http("Test_Only")
      .get("/computers")
      .check(status.in(200, 404))
    )
}

val TestOnly = scenario("Test Only").exec(Test.test)

setUp(
  TestOnly.inject(
    constantConcurrentUsers(10) during(60 seconds)
  ).protocols(TestProtocolBuilder)
)
This documentation says
constantConcurrentUsers(nbUsers) during(duration) : Inject so that number of concurrent users in the system is constant
I am expecting to get 10 concurrent active users constantly hitting the API for 60 seconds: no more than 10 users and no fewer than 10 users at any time.
What I see in the HTML report is that the number of active users at any given time is much higher than 10 (almost double).
Reading the documentation, it says:
This chart displays the active users during the simulation : total and per scenario.
“Active users” is neither “concurrent users” or “users arrival rate”. It’s a kind of mixed metric that serves for both open and closed workload models and that represents “users who were active on the system under load at a given second”.
It’s computed as:
(number of alive users at previous second) + (number of users that were started during this second) - (number of users that were terminated during previous second)
Questions:
Why does Gatling keep terminating users and starting new ones during the test period? What's the point?
How can I get a constant 10 concurrent active users (no more, no less) hitting my API for the duration of the test, if constantConcurrentUsers(10) during(60 seconds) gives me a much higher and fluctuating number of active users? I need to stick to my test case and not overload the API.
In the image above, at that given time the number of requests = 7 and the number of active users = 20. Does that mean that at that moment there are 7 active users sending out requests and 20 - 7 = 13 active users sitting idle, waiting for responses to come back from the API?
Thanks.

Why does Gatling keep terminating users and starting new ones during the test period? What's the point?
Virtual users' lifespan is driven by your scenario; injection profiles only control when they are injected/started.
If you don't want your users to terminate after just one request, add a loop in your scenario.
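For example, here is a minimal sketch of that change (the simulation class name and imports are assumptions; the protocol, request and injection profile are copied from your snippet), in which each virtual user loops over the request for the whole window instead of finishing after a single request:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class ConstantConcurrencySimulation extends Simulation { // hypothetical class name

  val TestProtocolBuilder = http.baseUrl("https://computer-database.gatling.io")

  // Each of the 10 users keeps looping over the request for 60 seconds
  // instead of terminating after one request, so Gatling does not have to
  // start replacement users.
  val TestOnly = scenario("Test Only")
    .during(60.seconds) {
      exec(
        http("Test_Only")
          .get("/computers")
          .check(status.in(200, 404))
      )
    }

  setUp(
    TestOnly.inject(
      constantConcurrentUsers(10).during(60.seconds)
    ).protocols(TestProtocolBuilder)
  )
}

With the loop in place, the active-users chart should stay close to 10, because users are no longer constantly finishing and being replaced.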
How can I get a constant 10 concurrent active users
Nonsensical: as you quoted yourself, concurrent != active. I guarantee you have a constant number of concurrent users, meaning exactly 10 users alive at the same time. The thing is that, because your scenario has only a single request, each user terminates right after it and is replaced with a new one.
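To illustrate with made-up numbers: if 10 users were alive at the previous second, all 10 finish their single request during this second, 10 replacements are started during this second, and none terminated during the previous second, then the formula you quoted gives 10 + 10 - 0 = 20 "active users" for that bucket, even though no more than 10 users were ever alive at the same time. That matches the roughly doubled figure you see in the report.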
Does that mean that at that moment there are 7 active users sending out requests and 20 - 7 = 13 active users sitting idle, waiting for responses to come back from the API?
It means that some virtual users' lifespans overlapped two seconds, so they were counted as alive in two different one-second buckets.

Related

Chrome Extension performance difference - alarm + history vs storage + history

My extension (Manifest V3) needs to track the number of times a set of websites is visited, either during the whole day or during certain time windows, and then perform an action if the visit count exceeds a limit.
There are two ways I could think of implementing this:
alarm + history: Create an alarm that runs every 5 mins, search the history for the required websites and count the visits. If the count exceeds the limit perform an action
storage + history: Add a listener to chrome.history.onVisited. If the visited site is from the required list, increment the visit count in storage. If the storage count exceeds the limit perform an action
Which of the above approaches has the least impact on Chrome's browsing performance? Or is there another API I can use to achieve the same?
I would like my extension to consume the least amount of the user's battery :)
In approach 1 the extension will do a lot of unnecessary work when the user isn't using the browser.
In approach 2 the extension's background script will restart more often if the user navigates a lot but pauses between navigations for longer than the lifetime of the service worker (30 seconds by default), which is a typical interaction scenario.
In both cases the bigger inherent problem of ManifestV3 for an extension such as yours, which observes user activity, is not what the extension does itself, but the huge overhead of restarting the background worker, which is automatically terminated 30 seconds after the last observed event (or 5 minutes if you use waitUntil). Such pauses in user activity are typical when browsing/interacting, so for many users the worker will restart hundreds of times a day. Starting the worker takes 50-100 ms and stresses the CPU, memory, and disk for the entire duration, while the time typically spent in a simple observation extension's own code is just 1-2 ms.
In other words, an extension that observes user activity, such as yours, is inherently 25-100 times less efficient in ManifestV3 than it would be in ManifestV2 with a persistent background script.
Solutions:
Prolong the service worker's lifetime to reduce the number of restarts, as shown here. To avoid wasting memory for users who keep the browser open without using it for hours, you can dynamically adjust the lifetime by measuring and averaging the intervals between events, or offer an option to set the duration in your extension's UI. Hopefully the browser will do this automatically in the future, but it may take years before this feature is actually implemented, and even then it will still likely restart the background script way too often.
Use chrome.webNavigation events with a URL filter for your sites so that the background script wakes up only when these specific URLs are visited. If the URLs are configured by the user, you will need to unregister the listener first (e.g. by making the listener a named global function), then register it with the new URL filter. You may still need to prolong the worker's lifetime if these URLs are visited a lot.

Why is Python consistently struggling to keep up with constant generation of asyncio tasks?

I have a Python project with a server that distributes work to one or more clients. Each client is given a number of assignments which contain parameters for querying a target API. This includes a maximum number of requests per second they can make with a given API key. The clients process the response and send the results back to the server to store into a database.
Both the server and clients use Tornado for asynchronous networking. My initial implementation for the clients relied on the PeriodicCallback to ensure that n-number of calls to the API would occur. I thought that this was working properly as my tests would last 1-2 minutes.
I added some telemetry to collect statistics on performance and noticed that the clients were actually having issues after almost exactly 2 minutes of runtime. I had set the API requests to 20 per second (the maximum allowed by the API itself), which the clients could reliably hit. However, after 2 minutes performance would fluctuate between 12 and 18 requests per second. The number of active tasks steadily increased until it hit the maximum number of active assignments (100) given by the server, and the HTTP request time to the API was reported by Tornado to go from 0.2-0.5 seconds to 6-10 seconds. Performance is steady if I only do 14 requests per second; anything higher than 15 requests will experience issues 2-3 minutes after starting. Logs can be seen here; notice how the "Active Queries" column is steady until 01:19:26. I've truncated the log to demonstrate this.
I believed the issue was the use of a single process on the client to handle both communication to the server and the API. I proceeded to split the primary process into several different processes. One handles all communication to the server, one (or more) handles queries to the API, another processes API responses into a flattened class, and finally a multiprocessing Manager for Queues. The performance issues were still present.
I thought that, perhaps, Tornado was the bottleneck and decided to refactor. I chose aiohttp and uvloop. I split the primary process in a similar manner to that in the previous attempt. Unfortunately, performance issues are unchanged.
I took both refactors and enabled them to split work into several querying processes. However, no matter how much you split the work, you still encounter problems after 2-3 minutes.
I am using both Python 3.7 and 3.8 on MacOS and Linux.
At this point, it does not appear to be a limitation of a single package. I've thought about the following:
Python's asyncio library cannot handle more than 15 coroutines/tasks being generated per second
I doubt that this is true given that different libraries claim to be able to handle several thousand messages per second simultaneously. Also, we can hit 20 requests per second just fine at the start with very consistent results.
The API is unable to handle more than 15 requests from a single client IP
This is unlikely as I am not the only user of the API and I can request 20 times per second fairly consistently over an extended period of time if I over-subscribe processes to query from the API.
There is a system configuration causing the limitation
I've tried both MacOS and Debian, which yield the same results. It's possible that it's a *nix problem.
Variations in responses cause a backlog which grows linearly until it cannot be tackled fast enough
Sometimes responses from the API grow and shrink between 0.2 and 1.2 seconds. The number of active tasks returned by asyncio.all_tasks remains consistent in the telemetry data. If this were true, we wouldn't be consistently encountering the issue at the same time every time.
We're overtaxing the hardware with the number of tasks generated per second and causing thermal throttling
Although CPU temperatures spike, neither MacOS nor Linux report any thermal throttling in the logs. We are not hitting more than 80% CPU utilization on a single core.
At this point, I'm not sure what's causing it and have considered refactoring the clients into a different language (perhaps C++ with Boost libraries). Before I dive into something so foolish, I wanted to ask if I'm missing something simple.
Conclusion
Performance appears to vary wildly depending on time of day. It's likely to be the API.
How this conclusion was made
I created a new project to demonstrate the capabilities of asyncio and determine if it's the bottleneck. This project takes two websites, one to act as the baseline and the other is the target API, and runs through different methods of testing:
Spawn one process per core, pass a semaphore, and query up to n-times per second
Create a single event loop and create n-number of tasks per second
Create multiple processes with an event loop each to distribute the work, with each loop performing (n-number / processes) tasks per second
(Note that spawning processes is incredibly slow and often commented out unless using high-end desktop processors with 12 or more cores)
The baseline website would be queried up to 50 times per second. asyncio could complete 30 tasks per second reliably for an extended period, with each task completing their run in 0.01 to 0.02 seconds. Responses were very consistent.
The target website would be queried up to 20 times per second. Sometimes asyncio would struggle despite circumstances being identical (JSON handling, dumping response data to queue, returning immediately, no CPU-bound processing). However, results varied between tests and could not always be reproduced. Responses would be under 0.4 seconds initially but quickly increase to 4-10 seconds per request. 10-20 requests would return as complete per second.
As an alternative method, I chose a parent URI of the target website. This URI wouldn't trigger a large query against their database but would instead be served with a static JSON response. Responses bounced between 0.06 seconds and 2.5-4.5 seconds. However, 30-40 responses would be completed per second.
Splitting requests across processes with their own event loop would decrease response time in the upper-bound range by almost half, but still took more than one second each to complete.
The inability to reproduce consistent results every time from the target website would indicate that it's a performance issue on their end.

Occasional duplicate request using jmeter

I'm using JMeter 4.0 trying to create a stress test. The purpose is to emulate the types of requests we receive in production, which is generally an array of requests of different types with a certain frequency and occasionally (1 in 1000) duplicate requests of the same type within milliseconds of each other.
I've managed to create a thread group emulating frequent requests of different types and a second thread group emulating duplicate requests (using synchronizing timer to ensure the requests fire off together).
I'm almost finished. My only problem is that there is no relationship between the thread groups whatsoever. If I wanted to perform a duplicate request once every 1000 requests, I'd need to know how long it takes to perform an average request (which is complicated by the fact that there are several request types) and calculate the time it would require for roughly 1000 requests to be made, and add an appropriate constant timer in the other thread group.
This isn't ideal. I'll settle for this if I must, but I was hoping the bright minds of stackoverflow could shine some insight for my issue.
Some ideas I've had:
Add a run counter which cycles every 1000 normal requests; once the counter hits 1000, perform a second request (though it would be under the same thread and after I've received the response from the first). Could this be made to work using a synchronizing timer?
Use a constant throughput timer with "all active threads (shared)" selected and its samples per minute set to 1000.
Is there a better way still? The actual requests are HTTP requests, though there are several prior steps to prepare the message to send. I'm already using a constant throughput timer in the first thread group (random service requests) to maintain a specific number of requests per minute, so I'm not sure if adding a second constant throughput timer in the other thread group would create issues.
Thank you for your time.
You can add an If Controller with a condition that is true for 1 in every 1000 threads:
${__jexl3(${__threadNum} % 1000 == 0)}
and inside the If Controller execute your duplicate HTTP Request.
__threadNum returns the current thread/user number.

Performance issue while using Parallel.ForEach() with MaxDegreeOfParallelism set to ProcessorCount

I wanted to process records from a database concurrently and in minimum time, so I thought of using a Parallel.ForEach() loop to process the records with MaxDegreeOfParallelism set to Environment.ProcessorCount.
ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(listUsers, po, (user) =>
{
    // Parallel processing
    ProcessEachUser(user);
});
But to my surprise, the CPU utilization was not even close to 20%. When I dug into the issue and read the MSDN article on this (http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx), I tried setting MaxDegreeOfParallelism to -1. As the article says, this value removes the limit on the number of concurrently running operations, and the performance of my program improved considerably.
But that still didn't meet my requirement for the maximum time allowed to process all the records in the database. So I analyzed further and found that the thread pool has two settings, MinThreads and MaxThreads. By default their values are 10 and 1000 respectively: only 10 threads are created at the start, and this number keeps increasing towards a maximum of 1000 with every new user unless a previous thread has finished its execution.
So I set the initial value of MinThread to 900 in place of 10 using
System.Threading.ThreadPool.SetMinThreads(100, 100);
so that a minimum of 900 threads would be available right from the start, thinking that this would improve the performance significantly. It did create 900 threads, but it also greatly increased the number of failures when processing each user, so I did not achieve much with this logic. I then changed the value of MinThreads to just 100 and found that the performance was much better.
But I wanted to improve further, as my time requirement was still not met: processing all the records still exceeded the time limit. You may think I was already using every possible means to get maximum performance out of parallel processing; I thought the same.
But to meet the time limit I decided to take a shot in the dark. I created two different executable files (Slaves) in place of one and assigned each of them half of the users from the DB. Both executables did the same thing and ran concurrently. I created another Master program to start these two Slaves at the same time.
To my surprise, it reduced the time taken to process all the records by nearly half.
Now my question is simple: I do not understand why the Master/Slave setup gives better performance than a single EXE, when the logic is the same in both the Slaves and in the previous EXE. I would highly appreciate it if someone could explain this in detail.
But to my surprise, the CPU utilization was not even close to 20%.
…
It uses the Http Requests to some Web API's hosted in other networks.
This means that CPU utilization is entirely the wrong thing to look at. When using the network, it's your network connection that's going to be the limiting factor, or possibly some network-related limit, certainly not CPU.
Now I created two different executable files … To my surprise, it reduced the time taken to process all the records nearly to the half.
This points to an artificial, per process limit, most likely ServicePointManager.DefaultConnectionLimit. Try setting it to a larger value than the default at the start of your program and see if it helps.

For auto-extended Liferay sessions: why is a smaller session.timeout recommended?

In the user guide section on session.timeout.auto.extend, Liferay specifies:
It is recommended to use this setting along with a smaller session.timeout,
such as 5 minutes, for better performance.
What exactly does 'better performance' mean?
I haven't changed my session timeout and some of my users are losing their sessions. I assume changing this value might solve the issue, but I would really like to understand it better.
Thanks,
Alain
What exactly does 'better performance' mean?
Let's say you have 10 users who are currently logged in. If the session.timeout value is 5 minutes, then once these 10 users have been idle for 5 minutes their sessions are destroyed. It's good practice (and everyone does it) to close any open connections (like DB connections), free the thread, and perform other resource-management actions at that point, so that other users who log in can use those resources.
But let's say the session.timeout value is 20 minutes. Then the resources allocated to these 10 users are freed only after they have been idle for 20 minutes (unless they log out themselves). This example uses only 10 users, but in reality there could be thousands or hundreds of thousands of users consuming your resources.
In summary, better performance can be achieved by keeping a lower value for session.timeout, because you free the allocated threads and other resources more quickly.
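For example, a hedged sketch of the combination the user guide describes (the 5-minute value is just the guide's example; these properties would typically be set in portal-ext.properties):

# Expire idle sessions quickly so their resources are freed sooner
session.timeout=5
# But keep extending the session transparently while the user is still active
session.timeout.auto.extend=true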
