Reading an Azure Application Insights report

I have an API call that sometimes takes like 10 seconds to run.
I've checked the duration of each external call and it seems fine. I mean, there are external resource calls, so I would understand if it had to wait for ~200 ms.
What I don't understand is the gap after a resource call where nothing happens for 6 seconds until the next step.
What could be the reason?
Furthermore, it usually takes less than 1 second, so I don't think my code could cause a wait of 6 seconds :|

What I don't understand is the gap after a resource call where nothing happens for 6 seconds until the next step. What could be the reason?
If we call multiple dependencies for the first time, a big gap between the dependencies is normal: it takes a period of time to load each new dependency. After the first call, if we run the same page again, there is only a slight delay between dependencies, and later calls behave similarly.
If your problem does not only happen when calling dependencies for the first time, you can find which dependency's duration is longest by clicking 'View as timeline', and then optimize the code around that dependency in your project. Sometimes the delay occurs in our own internal processing instead. The official docs also have a related explanation.
Request timeline
In a different case, there is no dependency call that is particularly long. But by switching to the timeline view, we can see where the delay occurred in our internal processing:
Call dependencies the first time:
Call after the first time:
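If the gap still cannot be attributed to any recorded dependency, one way to make it visible is to wrap the suspect internal code in a custom dependency telemetry item so it appears as its own bar in the timeline. A minimal sketch, assuming a Node.js service using the 'applicationinsights' npm package (callExternalResource, doInternalProcessing, and callNextStep are hypothetical placeholders for the asker's own calls):
// Minimal sketch: make the otherwise silent 6-second gap show up as its own
// entry in the request timeline by tracking the internal work as a dependency.
const appInsights = require('applicationinsights');
appInsights.setup('<your-instrumentation-key>').start();
const client = appInsights.defaultClient;
// Placeholders standing in for the real calls in the affected API.
const callExternalResource = () => new Promise((r) => setTimeout(r, 200));
const doInternalProcessing = () => new Promise((r) => setTimeout(r, 100));
const callNextStep = () => new Promise((r) => setTimeout(r, 200));
async function handleRequest() {
  await callExternalResource(); // tracked automatically as a dependency (~200 ms)
  const start = Date.now();
  await doInternalProcessing(); // the code running inside the unexplained gap
  client.trackDependency({
    target: 'internal',
    name: 'doInternalProcessing',
    data: 'post-processing after the external call',
    duration: Date.now() - start,
    resultCode: 0,
    success: true,
    dependencyTypeName: 'InProc',
  });
  await callNextStep(); // the "next step" seen in the timeline
}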

Related

Persistent Spotify 429 errors - with ridiculous retry-after suggestion of 76,000s (about 21hr)

I am working on an application that uses the Spotify Web API to build and maintain playlists for the user based on a given recipe (just a JSON that represents a logic scheme, basically). Currently the application is in development mode. I use delays between each API call I make, currently about 400ms, and I also had delays of 7.5s when I got the occasional 429 error (too many requests).
Anyway, I recently made it so that all of the playlist recipes get rebuilt in an infinite loop. So the process is just always running and making API calls about every 100ms, in order to keep all of the playlists up-to-date based on the recipes. However after letting this loop run for about 10 minutes, I started persistently getting 429s even after retrying after 7.5s and longer.
Apparently the 429 responses contain a header called 'retry-after' which is how long Spotify suggests waiting before making another call (as I said, before I was just using a fixed 7.5s delay on 429s). I am seeing that the value I am receiving for 'retry-after' is on the order of about 76,000s (21 hours).
But I thought that the rate limits are enforced over a 30s window...
(see https://developer.spotify.com/documentation/web-api/guides/rate-limits/) So why is my 'retry-after' header so high?
This is mostly a design philosophy question, so I think the code itself is mostly irrelevant, but if you'd like to take a look it's available here: https://github.com/jakefoglia/Smart-Playlist-Manager
site/SPM-core/maintainer.js : contains the 'infinite loop'
site/SPM-core/spotify_api_hook.js : contains most of the API calls
The 30s window is presented in the documentation only as an example, not as an actual description of how the API works. As you correctly say, the Retry-After header (value in seconds) is all the information you need to decide how long to wait before making the next call.
Each time your app "violates" the rate limit by making an early request, it gets "punished" with an increased delay period, and since the app apparently never even consulted the header and repeatedly violated the limit, the delay grew this high. This, however, did not result in a shutdown, block, rejection, or anything similar, because the header only suggests the duration of a delay rather than enforcing it.
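As a rough illustration (not code from the linked repo), a request wrapper along these lines honors Retry-After instead of retrying after a fixed 7.5s; the url, token, and 5-second fallback are placeholders, and the built-in fetch assumes Node 18+:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function spotifyGet(url, token) {
  while (true) {
    const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
    if (res.status === 429) {
      // Spotify reports the suggested wait in whole seconds; fall back to 5s if the header is missing.
      const retryAfterSec = Number(res.headers.get('retry-after')) || 5;
      console.warn(`429 received, waiting ${retryAfterSec}s before retrying`);
      await sleep(retryAfterSec * 1000);
      continue;
    }
    if (!res.ok) throw new Error(`Spotify API error: ${res.status}`);
    return res.json();
  }
}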

Node.js setTimeout loop stopped after many weeks of iterations

Solved: it's a Node bug that appears after ~25 days of uptime (2^31 milliseconds). See the answer below for details.
Is there a maximum number of iterations of this cycle?
function do_every_x_seconds() {
  // Do some basic things to get x
  setTimeout(do_every_x_seconds, 1000 * x);
}
As I understand, this is considered a "best practice" way of getting things to run periodically, so I very much doubt it.
I'm running an Express server on Ubuntu with a number of timeout loops:
One loop that runs every second and basically prints the timestamp.
One that makes an external HTTP request every 5 seconds.
And one that runs every X seconds, where X is between 30 and 300.
It all seems to work well enough. However, after 25 days without any usage, and several million iterations later, the node instance is still up, but all three of the setTimeout loops have stopped. No error messages are reported at all.
Even stranger is that the Express server is still up, and I can load HTTP pages, which print to the same console where the periodic timestamp was being printed.
I'm not sure if it's related, but I also run nodejs with the --expose-gc flag, perform periodic garbage collection, and monitor that memory stays in acceptable ranges.
It is a development server, so I have left the instance up in case there is some advice on what I can do to look further into the issue.
Could it be that somehow the event loop dropped all its timers?
I have a similar problem with setInterval().
I think it may be caused by the following bug in Node.js, which seems to have been fixed recently: setInterval callback function unexpected halt #22149
Update: it seems the fix has been released in Node.js 10.9.0.
I think the problem is that you are relying on setTimeout to be active over days. setTimeout is great for periodic running of functions, but I don't think you should trust it over extended time periods. Consider this question: can setInterval drift over time? and one of its linked issues: setInterval interval includes duration of callback #7346.
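To make the drift concrete, here is a small illustrative sketch (not from the linked question) of a loop that measures how late each callback fired and reschedules against a fixed reference time; a plain setTimeout loop instead accumulates callback duration and event-loop delays:
const intervalMs = 1000;
let expected = Date.now() + intervalMs;
function tick() {
  const drift = Date.now() - expected; // how late this invocation actually fired
  console.log(`tick, drift ${drift} ms`);
  expected += intervalMs;
  // Shorten the next delay by the measured drift so the schedule stays anchored.
  setTimeout(tick, Math.max(0, intervalMs - drift));
}
setTimeout(tick, intervalMs);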
If you need to have things happen intermittently at particular times, a better way to attack this would be to schedule cron tasks that perform the tasks instead. They are more resilient and failures are recorded at a system level in the journal rather than from within the node process.
A good related answer/question is Node.js setTimeout for 24 hours - any caveats? which mentions using the npm package cron to do task scheduling.
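For completeness, a minimal sketch of the in-process option using the npm cron package mentioned in that answer (the schedule string and task body are placeholders; a system-level cron job running a standalone script remains the more resilient option, since this still dies with the node process):
const { CronJob } = require('cron');
// Fire every 5 seconds (six-field pattern with a leading seconds field).
const job = new CronJob('*/5 * * * * *', () => {
  console.log('tick', new Date().toISOString());
});
job.start();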

WebJob running for a long time for some messages and never reaching Finished status

For every 1000 messages, 1 message runs for 20 minutes or more, while the other messages complete in less than 1 sec. What could be the reason, and I don't know whether it is ever going to complete.
Some messages end up in a "Never Finished" state rather than Success or Failure. What could be the reason? I don't think my function has issues; if it did, we would be logging them.
If the message processing periodically takes a long time (or does not finish at all), it must be that every now and then the operations in your job function take a long time or fail. It all depends on what your job is actually doing internally. If it goes async in places, the SDK will continue to wait for it to return. We very recently added a new feature, TimeoutAttribute (see the release notes: http://github.com/Azure/azure-webjobs-sdk/wiki/Release-Notes). The Dashboard should show any function errors.
If you suspect that your job may be hanging/failing at certain places, you might try verifying locally that this is handled correctly by your logging etc. You could add Task.Delays or errors at various spots and verify that it's logged/handled correctly.

Coded UI test fails on its own after a specific period of time

It goes like this: in my test method, I have 3 Playback.Wait() calls, each set to 2 minutes. Between those waits, I am doing some stuff here and there; that stuff works, was tested without those 3 waits, and is all OK. As soon as the 3 waits are there, the test method just exits on its own, somewhere during the 2nd wait. There is no call stack, no useful info regarding the cause of the test termination, nothing. I am pretty much clueless at the moment about what is happening. Elapsed time is always 5 minutes, no more, no less.
Do you have any idea what can be wrong?
This is a setting in your .testsettings file, under the Test Timeouts section. Specifically, "Mark individual test as failed if its execution time exceeds:", which then gives you fields to enter the time in hours, minutes, and seconds. I believe that 5 minutes is the default. You can increase this to fit your purpose, or just remove it entirely (not recommended).
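For reference, the corresponding part of a .testsettings file looks roughly like this (an illustrative sketch, not from the asker's project; the file name and timeout value are placeholders, and the timeout is specified in milliseconds):
<?xml version="1.0" encoding="UTF-8"?>
<TestSettings name="Local" xmlns="http://microsoft.com/schemas/VisualStudio/TeamTest/2010">
  <Execution>
    <!-- Mark an individual test as failed if it runs longer than 10 minutes. -->
    <Timeouts testTimeout="600000" />
  </Execution>
</TestSettings>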

LoadRunner and the need for pacing

Running a single script with only two users as a single scenario without any pacing, just think time set to 3 seconds and random (50%-150%), I experience that the web app server runs out of memory after 10 minutes every time (I have run the test several times, and it happens at the same time every time).
First I thought this was a memory leak in the application, but after some thought I figured it might have to do with the scenario design.
The entire script, which has just one action (including log in and log out within the only action block), takes about 50 seconds to run, and I have the default 'As soon as the previous iteration ends' set, not 'With a delay after the previous iteration ends' or fixed/random intervals.
Could not using fixed/random intervals cause this "memory leak" to happen? I guess none of the settings mentioned would actually start a new iteration before the previous one ends, as that would obviously lead to accumulation of memory on the server, resulting in this "memory leak". But with no pacing set, is there a risk of this happening?
And with no iterations in my script, could I still be using pacing?
To answer your last question: NO.
Pacing is explicitly used when a new iteration starts. The iteration start is delayed according to pacing settings.
Speculation/Conclusions:
If the web server really runs out of memory after 10 minutes and you only have 2 vusers, you have a problem on the web-server side. One could manually generate this 2-vuser load and crash the web server. The pacing in the scripts or manual user speeds are irrelevant. If the web server can be crashed remotely, it has bugs that need fixing.
Suggestion:
Try running the scenario with 4 users. Do you get OUT OF MEMORY on the web-server after 5 mins?
If there really is a leak, your script/scenario shouldn't be causing it, but I would think that you could potentially cause it to appear to be a problem sooner depending on how you run it.
For example, let's say with 5 users and reasonable pacing and think times, the server doesn't die for 16 hours. But with 50 users it dies in 2 hours. You haven't caused the problem, just exposed it sooner.
I think it's a web-server problem. Pacing is nothing but a time gap between iterations; it does not affect the actions or transactions in your script.
