Glimpse seems to add a time penalty of between 2 and 4 times the Action execution time. Is this expected?

I am using MVC3, ASP.NET 4.5, C#, Razor, EF5, SQL Server 2008 R2.
I am up to date with Glimpse and have installed:
Glimpse.ADO
Glimpse.ASP.NET
Glimpse.Core
Glimpse.EF5
Glimpse.MVC3
Glimpse for Elmah
I have noticed that Glimpse adds between 2 and 4 times the Execution time to the total time, and this time penalty is recorded under the "Server" category. So if I have an Action that performs some DML via LINQ which takes 1 sec, then the total server time is 3 sec. When I turn Glimpse off via Glimpse.axd, the time reduces to 1 sec.
Is this expected behaviour, and if so why?
The important bit is knowing that it is due to Glimpse and not my code, especially as Glimpse provides such great information for the Action code.
Thanks in advance.

Related

Is there a way to improve the priority of API calls from VBA, given unpredictable latency/response times of up to 5 milliseconds?

I am looking for advice on how to reduce the unpredictable, occasionally horrible latency/response times of API calls from VBA. I did some statistical analysis of Excel VBA API calls to QueryPerformanceCounter and GetSystemTimePreciseAsFileTime.
On my machine (8 cores, 5.2 GHz max frequency, Windows 10, Office 2019) both of these have a 100-nanosecond single-tick resolution. They both need a minimum of 6 elapsed ticks to return a response, with a mode of 7 ticks and an average of 8+ ticks, which I can live with.
But there are serious outliers in the distribution: 0.2% of the time they need at least 100 ticks (10 microseconds), and on very rare occasions as much as 5 milliseconds to get a response back to VBA. If I unplug the power supply from this laptop, the delays increase, of course: they skyrocket to >11 ticks average, and ~0.2% of the time >20 microseconds. I surmise this is some sort of queueing issue, but I have failed to find any discussion of it.
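For what it's worth, the same kind of distribution can be measured outside VBA as well. Below is a rough Python/ctypes sketch of the measurement described above (Windows-only; QueryPerformanceCounter and QueryPerformanceFrequency are the real Win32 calls, but the sampling loop and sample count are illustrative, not the original VBA analysis):

```python
# Windows-only sketch: sample the per-call latency of QueryPerformanceCounter
# in timer ticks (100 ns on a 10 MHz timer), as in the statistics above.
import ctypes
from collections import Counter

kernel32 = ctypes.windll.kernel32
counter = ctypes.c_int64()
freq = ctypes.c_int64()
kernel32.QueryPerformanceFrequency(ctypes.byref(freq))  # ticks per second

samples = []
for _ in range(100_000):
    kernel32.QueryPerformanceCounter(ctypes.byref(counter))
    start = counter.value
    kernel32.QueryPerformanceCounter(ctypes.byref(counter))
    samples.append(counter.value - start)  # elapsed ticks for one call

us_per_tick = 1_000_000 / freq.value
print("mode :", Counter(samples).most_common(1)[0][0], "ticks")
print("mean :", round(sum(samples) / len(samples), 2), "ticks")
print("worst:", max(samples), "ticks =", round(max(samples) * us_per_tick, 1), "µs")
```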
Is there a way to improve priority for the API calls? Maybe something crazy like assigning two or three cores exclusively to Excel and the API, and everything else to the other 5-6 cores?
Excel/VBA only uses a maximum of ~30% CPU time according to Task Manager, so there would probably be no hit to the execution speed of the code.
VBA does not support multi-threading; it can only use one core of your computer. So if you need multi-threading, switch to a real programming language, e.g. Python (there are libraries for handling Excel data with Python), or use one of the alternatives mentioned here: Multi-threading in VBA
In VBA, Excel will always wait for one command to finish before it can start the next one (single-threading).
By the way, the time between the VBA command to the API and the API returning the result cannot be influenced by VBA (or any other solution). This is the API's calculation time to produce the result. So it is not Excel's fault that it takes long; it is the API that needs that time to calculate the result (and how long that calculation actually takes depends on the values you pass to the API).
I have continued to research my question. I think the bottom line is that you are at the total mercy of Windows' unpredictable prioritization of events. You cannot force prioritization, even if you set affinity to one CPU core and priority to high. I have tried both, and I still see these timing outliers. See, for example, hints of this in this thread about getting the frequency for timing calculations.
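For completeness, this is roughly what "affinity to one CPU core and priority to high" looks like when applied to the Excel process from the outside. The sketch below uses the third-party psutil package rather than VBA; the process name and core choice are assumptions for illustration, and per the answer above this kind of tweak still does not eliminate the outliers:

```python
# Sketch: pin the running Excel process to one core and raise its priority
# (Windows; requires `pip install psutil`).
import psutil

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] == "EXCEL.EXE":        # assumed process name
        proc.cpu_affinity([0])                   # restrict to core 0
        proc.nice(psutil.HIGH_PRIORITY_CLASS)    # Windows priority class
        print(f"Adjusted PID {proc.pid}")
```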
So I have to be aware that any one result has a 2+ percent probability that the timing error will be 2+ microseconds, at least on my PC running in turbo mode.

Why is Python consistently struggling to keep up with constant generation of asyncio tasks?

I have a Python project with a server that distributes work to one or more clients. Each client is given a number of assignments which contain parameters for querying a target API. This includes a maximum number of requests per second they can make with a given API key. The clients process the response and send the results back to the server to store into a database.
Both the server and clients use Tornado for asynchronous networking. My initial implementation for the clients relied on a PeriodicCallback to ensure that n calls to the API would occur each second. I thought this was working properly, as my tests only lasted 1-2 minutes.
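A minimal sketch of that initial Tornado approach, assuming a placeholder endpoint and a 20-requests-per-second budget (this is not the actual client code, just the PeriodicCallback pattern being described):

```python
# Sketch of the PeriodicCallback-driven client: fire one API request every
# 1000 / REQUESTS_PER_SECOND milliseconds. Response handling and forwarding
# results to the server are omitted.
import asyncio
import tornado.ioloop
from tornado.httpclient import AsyncHTTPClient

TARGET_URL = "https://api.example.com/query"  # placeholder endpoint
REQUESTS_PER_SECOND = 20

async def fire_request(client):
    response = await client.fetch(TARGET_URL, raise_error=False)
    # ... process the response and send the result back to the server ...

def main():
    client = AsyncHTTPClient()
    tornado.ioloop.PeriodicCallback(
        lambda: asyncio.ensure_future(fire_request(client)),
        1000 / REQUESTS_PER_SECOND,
    ).start()
    tornado.ioloop.IOLoop.current().start()

if __name__ == "__main__":
    main()
```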
I added some telemetry to collect statistics on performance and noticed that the clients were actually having issues after almost exactly 2 minutes of runtime. I had set the API requests to 20 per second (the maximum allowed by the API itself), which the clients could reliably hit. However, after 2 minutes performance would fluctuate between 12 and 18 requests per second. The number of active tasks steadily increased until it hit the maximum number of active assignments (100) given by the server, and the HTTP request time to the API was reported by Tornado to go from 0.2-0.5 seconds to 6-10 seconds. Performance is steady if I only do 14 requests per second. Anything higher than 15 requests per second will experience issues 2-3 minutes after starting. Logs can be seen here. Notice how the "Active Queries" column is steady until 01:19:26. I've truncated the log to demonstrate the issue.
I believed the issue was the use of a single process on the client to handle both communication with the server and with the API. I proceeded to split the primary process into several different processes: one handles all communication with the server, one (or more) handles queries to the API, another processes API responses into a flattened class, and a multiprocessing Manager handles the Queues. The performance issues were still present.
I thought that, perhaps, Tornado was the bottleneck and decided to refactor. I chose aiohttp and uvloop. I split the primary process in a similar manner to that in the previous attempt. Unfortunately, performance issues are unchanged.
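The aiohttp/uvloop variant looked roughly like the sketch below (again a simplification with a placeholder URL and rate, not the actual client):

```python
# Sketch of the aiohttp + uvloop refactor: spawn one request task every
# 1 / REQUESTS_PER_SECOND seconds on a single event loop.
import asyncio
import aiohttp
import uvloop

TARGET_URL = "https://api.example.com/query"  # placeholder endpoint
REQUESTS_PER_SECOND = 20

async def fire_request(session):
    async with session.get(TARGET_URL) as resp:
        await resp.json()  # flattening and queueing of the result omitted

async def main():
    async with aiohttp.ClientSession() as session:
        while True:
            asyncio.ensure_future(fire_request(session))
            await asyncio.sleep(1 / REQUESTS_PER_SECOND)

if __name__ == "__main__":
    uvloop.install()
    asyncio.run(main())
```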
I took both refactors and enabled them to split work into several querying processes. However, no matter how much you split the work, you still encounter problems after 2-3 minutes.
I am using both Python 3.7 and 3.8 on MacOS and Linux.
At this point, it does not appear to be a limitation of a single package. I've thought about the following:
Python's asyncio library cannot handle more than 15 coroutines/tasks being generated per second
I doubt that this is true, given that different libraries claim to be able to handle several thousand messages per second simultaneously. Also, we can hit 20 requests per second just fine at the start with very consistent results (a quick sanity check is sketched after this list).
The API is unable to handle more than 15 requests from a single client IP
This is unlikely as I am not the only user of the API and I can request 20 times per second fairly consistently over an extended period of time if I over-subscribe processes to query from the API.
There is a system configuration causing the limitation
I've tried both MacOS and Debian, which yield the same results. It's possible that it's a *nix problem.
Variations in responses cause a backlog which grows linearly until it cannot be tackled fast enough
Sometimes responses from the API grow and shrink between 0.2 and 1.2 seconds. The number of active tasks returned by asyncio.all_tasks remains consistent in the telemetry data. If this were true, we wouldn't be consistently encountering the issue at the same time every time.
We're overtaxing the hardware with the number of tasks generated per second and causing thermal throttling
Although CPU temperatures spike, neither MacOS nor Linux report any thermal throttling in the logs. We are not hitting more than 80% CPU utilization on a single core.
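As a quick sanity check of the first hypothesis, a pure-asyncio benchmark with no network involved (rate and duration below are illustrative) spawns tasks far faster than 15 per second without falling behind:

```python
# Sanity check: create tasks at a fixed rate with no real work and count how
# many complete. On any recent machine this sustains hundreds of tasks per
# second, which argues against a hard asyncio limit around 15/sec.
import asyncio
import time

async def noop_task():
    await asyncio.sleep(0)  # yield to the loop once, do no real work

async def main(rate=200, seconds=5):
    completed = 0
    start = time.monotonic()
    while time.monotonic() - start < seconds:
        batch = [asyncio.create_task(noop_task()) for _ in range(rate)]
        await asyncio.gather(*batch)
        completed += len(batch)
        await asyncio.sleep(1)  # roughly `rate` tasks per wall-clock second
    print(f"completed {completed} tasks in ~{seconds} s")

asyncio.run(main())
```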
At this point, I'm not sure what's causing it and have considered refactoring the clients into a different language (perhaps C++ with Boost libraries). Before I dive into something so foolish, I wanted to ask if I'm missing something simple.
Conclusion
Performance appears to vary wildly depending on time of day. It's likely to be the API.
How this conclusion was made
I created a new project to demonstrate the capabilities of asyncio and determine if it's the bottleneck. This project takes two websites, one to act as the baseline and the other is the target API, and runs through different methods of testing:
Spawn one process per core, pass a semaphore, and query up to n-times per second
Create a single event loop and create n-number of tasks per second (see the sketch after this list)
Create multiple processes with an event loop each to distribute the work, with each loop performing (n-number / processes) tasks per second
(Note that spawning processes is incredibly slow and often commented out unless using high-end desktop processors with 12 or more cores)
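A rough sketch of the second method (single event loop, n tasks per second), with per-request timing and an asyncio.all_tasks count as basic telemetry; the URL and rate are placeholders, not the actual test project:

```python
# Sketch of test method 2: one event loop creating RATE request tasks per
# second against a target URL, logging per-request latency and how many
# tasks are still pending at the end of each second.
import asyncio
import time
import aiohttp

TARGET_URL = "https://api.example.com/query"  # placeholder
RATE = 20                                     # requests per second

async def timed_request(session):
    start = time.monotonic()
    async with session.get(TARGET_URL) as resp:
        await resp.read()
    return time.monotonic() - start

async def main(duration=120):
    async with aiohttp.ClientSession() as session:
        for second in range(duration):
            batch = [asyncio.create_task(timed_request(session)) for _ in range(RATE)]
            await asyncio.sleep(1)
            done = [t for t in batch if t.done() and not t.exception()]
            pending = len(asyncio.all_tasks()) - 1  # exclude main() itself
            avg = sum(t.result() for t in done) / len(done) if done else float("nan")
            print(f"t={second:3d}s done={len(done):2d}/{RATE} pending={pending:3d} avg={avg:.2f}s")

asyncio.run(main())
```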
The baseline website would be queried up to 50 times per second. asyncio could complete 30 tasks per second reliably for an extended period, with each task completing its run in 0.01 to 0.02 seconds. Responses were very consistent.
The target website would be queried up to 20 times per second. Sometimes asyncio would struggle despite circumstances being identical (JSON handling, dumping response data to queue, returning immediately, no CPU-bound processing). However, results varied between tests and could not always be reproduced. Responses would be under 0.4 seconds initially but quickly increase to 4-10 seconds per request. 10-20 requests would return as complete per second.
As an alternative method, I chose a parent URI for the target website. This URI wouldn't require a large query to their database but would instead be served back as a static JSON response. Responses bounced between 0.06 seconds and 2.5-4.5 seconds. However, 30-40 responses would be completed per second.
Splitting requests across processes with their own event loop would decrease response time in the upper-bound range by almost half, but still took more than one second each to complete.
The inability to reproduce consistent results every time from the target website would indicate that it's a performance issue on their end.

PowerShell Foreach -Parallel not reaching throttle limit

I have a script using workflows and Foreach -Parallel -ThrottleLimit 15 which works fine if the code inside the loop is something simple like write-output or some basic math and a sleep statement, but when I swap in the actual logic (updating product for a particular client) it caps at 5 running simultaneously.
Unfortunately, I can't post the full source, but at a high level the update process occasionally reads from some config files, writes to log files (separate logs for each thread), and reads/writes to a couple of databases; nothing that should max out CPU/RAM/IO, and to the best of our knowledge there is no contention between threads over these resources.
Things we've looked into:
PowerShell 3 had a hard limit of 5 (we are using PowerShell 5 and are able to run more than 5 at a time when replacing the actual work with simple math and a Write-Output)
CPU/RAM/Disk IO all appear to be pretty stable (not maxed out or spiking a lot)
Looked for MS documentation on this parameter (not much is available, and it doesn't mention any strange behavior like what I've seen)
I'm at a loss as to where else to look to troubleshoot this. At first we thought maybe the ThrottleLimit was a maximum and the actual number was dynamic based on resources, but our limited testing seemed to indicate that it uses the full 15 even when CPU and memory are pretty high. Also, the very first iteration is capped at 5, long before any resources are actually being used heavily (there is a Start-Sleep of 15 minutes right at the start of the update script, as we give our users a warning to finish what they are doing before we take the DB down for the update).
Does anyone have any insight into what else we can check or look into that might cause this? Our previous parallel logic was using Jobs and I wanted to avoid returning to that logic when this seems like it should work perfectly fine.

What is the default / maximum value for WEBJOBS_RESTART_TIME?

I have a continuous WebJob and sometimes it can take a REALLY, REALLY long time to process (i.e. several days). I'm not interested in partitioning it into smaller chunks to get it done faster (by doing more of it in parallel); having it run slow and steady is fine with me. I was looking at the documentation about WebJobs here, where it lists all the settings, but it doesn't specify the defaults or maximums for these values. I was curious if anybody knew.
Since the docs say
"WEBJOBS_RESTART_TIME - Timeout in seconds between when a continuous job's process goes down (for any reason) and the time we re-launch it again (Only for continuous jobs)."
it doesn't matter how long your process runs.
Please clarify your question, as most of it is irrelevant to what you're asking at the end.
If you want to know the min, I'd say try 0. For the max, try MAX_INT (2147483647); that's 68 years. That should do it ;).
There is no "max run time" for a continuous WebJob. Note that, in practice, there aren't any assurances about how long a given instance of the Web App hosting your WebJob is going to exist, and thus your WebJob may restart anyway. It's always good design to make your continuous job idempotent, meaning it can be restarted many times and pick back up where it left off.

LoadRunner and the need for pacing

Running a single script with only two users as a single scenario, without any pacing and with just think time set to 3 seconds and random (50%-150%), I find that the web app server runs out of memory after 10 minutes every time (I have run the test several times, and it happens at the same time every time).
First I thought this was a memory leak in the application, but after some thought I figured it might have to do with the scenario design.
The entire script, which has just one action including log in and log out within the only action block, takes about 50 seconds to run, and I have the default "As soon as the previous iteration ends" set, not "With a delay after the previous iteration ends" or fixed/random intervals.
Could not using fixed/random intervals cause this "memory leak" to happen? I guess none of the settings mentioned would actually start a new iteration before the previous one ends, which would obviously lead to an accumulation of memory on the server, resulting in this "memory leak". But with no pacing set, is there a risk of this happening?
And having no iterations in my script, could I still be using pacing?
To answer your last question: NO.
Pacing is explicitly used when a new iteration starts. The iteration start is delayed according to pacing settings.
Speculation/Conclusions:
If the web server really runs out of memory after 10 minutes with only 2 vusers, you have a problem on the web-server side. One could manually generate this 2-vuser load and crash the web server; the pacing in the scripts, or manual user speeds, are irrelevant. If the web server can be crashed remotely, it has bugs that need fixing.
Suggestion:
Try running the scenario with 4 users. Do you get OUT OF MEMORY on the web-server after 5 mins?
If there really is a leak, your script/scenario shouldn't be causing it, but I would think that you could potentially cause it to appear to be a problem sooner depending on how you run it.
For example, let's say with 5 users and reasonable pacing and think times, the server doesn't die for 16 hours. But with 50 users it dies in 2 hours. You haven't caused the problem, just exposed it sooner.
I suspect it's a web-server problem. Pacing is nothing but a time gap between iterations; it does not affect actions or transactions in your script.
