Slow response times from free web app server every day at same time - azure

Every day at about 3:00PM-4:00PM GMT the response times start to increase (no memory increase or CPU increase)
There is a azure availability test going to server every 10 minutes.
As this is a dev site there is no traffic to it other than me (at the odd time) and the availability test
I log to a variable internally the startup time and this shows that the site is not restarting
The first request via a browser when this starts happening is very slow (2 minutes - probably some timeout).
After that it runs perfectly. That seems like the site is shutting down and then starting up on first request, but the pings are keeping it alive so the site is not shutting down (as far as I know)
On the odd log entry I get - I seem to be getting 502 errors - but I can't confirm this as the FEEB logs are usually off at this time.
FREB logs turn off automatically after 1 hour and as this is the middle of the night for me (NZDT) - I don't get a chance to turn on.
See attached images - as you can see the response times just increase at same time
Ignore the requests where they are above 20 - thats me going to it via browser
I always check the azure dashboard BEFORE viewing site in browser
Just got this error (from web browser randomly - keep accessing the same page:
502: The specified CGI application encountered an error and the server terminated the process.
Other relevant Info (Perhaps):
I initially had the availability test ping going to a ping endpoint /ping that only returned a 200 and empty string when I noticed this happening
It now points to the sites homepage to see if it changed anything - still the same.
Assuming the database is not the issue as the /ping endpoint doesn't touch the database - just a straight controller return.
Internal Exception handling is catching nothing
Service: Azure Free Web App (Development)
There are no web jobs or timed events on this site
Azure Dashboard Initial
Current tests:
Uploading as new site to a Basic 1 Small
Restarting dev site 12 hours before issues (usually 20 hours before)
Results:
Restarting free web-app 12ish hours before issue - same result at same time - so its not the app slowly overloading or it would me much later
Basic 1 Small: no problems - could it be something with the dev server ?
Azure Dashboard From Today
Observations:
Same behavior with /ping endpoint (just return empty string 200 Ok) and Main home page endpoint (database lookups [w/caching] / razer)
If anyone has any ideas what might be going on - I would very much appreciate it
:-)
Update:
It seems to of stopped (on its own) about 11/1/2016 1:50:49 AM GMT - my internal timestamp says it restarted - and then the errors started again same time as usual. Note: no-one is using the app. The basic 1 Small Server is still going fine.
Sorry I can't add anymore images (not enough rep)

By default, web apps are unloaded if they are idle for some period of time, which could cause the web site slow response during this period of time. Besides, this article is about troubleshooting HTTP "502 Bad Gateway" error or a HTTP "503 Service Unavailable" error in Azure web apps, you could read it. And from the article we could know scaling the web app could mitigate the issue.

Related

Why is my Azure node.js app becoming unresponsive?

I recently deployed a Node.js Backend Service to Azure and have the following problem. The service becomes unresponsive after a certain amount of time, and only comes back to life if a external request is sent. The problem is, that it takes about 3 minutes for the Container to start back up and actually return the request. I'm running Node 14 LTS. I also added a health check yesterday, but azure simply doesn't bother actually keeping the app alive, here is the metric off azure
I verified azure is actually trying to reach the correct endpoint, and it does. I also have "Always On" enabled. I also verified that the app itself, is not crashing. I log every request and all of a sudden requests are no longer received, which means the health endpoint doesn't respond either, but it does not result in a container restart. It just waits for an external request to appear and then decides to start everything back up, which takes too long.
I feel like it's some kind of configuration issue, because the app itself is not very complex and I never experienced crashes when doing local development.
The official document tells us that the Free pricing tier you are currently using, Always on does not take effect.
How do I decrease the response time for the first request after idle time?

HTTP Web Server: Agent did not complete within configured time limit

I have a web application that builds web-pages using agent (it's written in LS and we use [print html] to output HTML) and from time to time I see an error as below.
02-11-2020 10:00:18 HTTP Web Server: Agent did not complete within configured time limit [/path-to-database.nsf/web?openagent] Anonymous
02-11-2020 10:00:18 HTTP Server: Execution time limit exceeded by Agent '(Web)|Web' in database '/path-to-database.nsf'. Agent signer 'signer name'.
As a result HTTP task stuck so I have to restart it, but that means I have to monitor it all the time.
It does not seems to be related to agent time execution, otherwise I would have this issue constantly.
The activity does not seems to be the issue as well, according to google analytics it's around ~50 active users.
I doubt [Server Tasks\Agent manager] will help, because agent runs under HTTP task.
Does anybody know how to figure out what is the reason of such issue and where I have to dig to fix it.
Update
Domino version 11.0
The agent is triggered by anonymous visitor and does some relatively heavy computation to construct HTML response (loops and lookups are present, but I'm sure all loops ends properly, without infinitive run).
I guess settings for HTTP Agents are under this section (so 2 mins).
Web Agents and Web Services
Run web agents and web services concurrently? Enabled
Web agent and web services timeout: 120 seconds
In general request takes between 300ms-1 second, however there are some heavy pages with 1-5 seconds (but nothing like 10 seconds or more).
I notice the error only when we get more than 50 active users (who activity open new pages and thus trigger the agent).
I guess Richard is right and there must be some condition when agent stuck (maybe related to views update or some background process).
For now I simply restart HTTP to get this issue fixed (for some time).
So my question could be re-phrased to:
What can cause delay of the agent that build web page (taking into account it's related to 50-100 active users).
Thanks a lot :-)

Azure Http connection gets interrupted after 5 minutes

We have a setup with several RESTful APIs on the same VM in Azure.
The websites run in Kestrel on IIS.
They are protected by the azure application gateway with firewall.
We now have requests that would run for at least 20 minutes.
The request run the full length uninterrupted on Kestrel (Visible in the logs) but the sender either get "socket hang up" after exactly 5 minutes or run forever even if the request finished in kestrel. The request continue in Kestrel even if the connection was interrupted for the sender.
What I have done:
Wrote a small example application that returns after a set amount of
seconds to exclude our websites being the problem.
Ran the request in the VM (to localhost): No problems, response was received.
Ran the request within Azure from one to another VM: Request ran forever.
Ran the request from outside of Azure: Request terminates after 5 minutes
with "socket hang up".
Checked set timeouts: Kestrel: 50m , IIS: 4000s, ApplicationGateway-HttpSettings: 3600
Request were tested with Postman,
Is there another request or connection timeout hidden somewhere in Azure?
We now have requests that would run for at least 20 minutes.
This is a horrible architecture and it should be rewritten to be async. Don't take this personally, it is what it is. Consider returning a 202 Accepted with a Location header to poll for the result.
You're most probably hitting the Azure SNAT layer timeout —
Change it under the Configuration blade for the Public IP.
So I ran into something like this a little while back:
For us the issue was probably the timeout like the other answer suggests but the solution was (instead of increasing timeout) to add PGbouncer in front of our postgres database to manage the connections and make sure a new one is started before the timeout fires.
Not sure what your backend connection looks like but something similar (backend db proxy) could work to give you more ability to tune connection / reconnection on your side.
For us we were running AKS (azure Kubernetes service) but all azure public ips obey the same rules that cause issues similar to this one.
While it isn't an answer I know there are also two types of public IP addresses, one of them is considered 'basic' and doesn't have the same configurability, could be something related to the difference between basic and standard public ips / load balancers?

Application Insights for AzureWebsite showing GET / requests every five minutes even after deleting availability test

I have a production Azure website that I deployed a few days ago. I saw the "Availability" section in application insights (configured server side AI and client side AI is enabled right now) and through the new portal (portal.azure.com) I decided to set up availability "ping" test.
I left the majority of the settings at the default, Test Type was URL ping test, the URL is set to the root of my app, and frequency is 5 minutes. I left the test locations to the default of 5 in the different US regions.
What I noticed was that all the tests failed and I saw a lot of requests in my app for GET / with a status of 404. Since it was filling up my application insights request log with junk, and the availability test registered all failures, I deleted the availability test. Annoyingly after deleting, I noticed it was still there after refreshing the page, so I deleted it again and now it shows as "Not configured" and seems to be truly gone.
Before I deleted the availability test, I saw all these GET / requests in my AI logs and I looked at their IP addresses, they indeed seemed to come from the different US regions.
After I deleted the test, I assumed they would all stop. Unfortunately that did not happen, all but 1 of the 5 ping tests stopped but the one from with IP address as ::1 still seems to be happening. For almost a week that ping test still occurs every 5 minutes even though I deleted it.
How to remove the availability test completely from application insights?
The request that you see every 5 minutes is likely caused by the Always On feature, which uses it to keep the site alive. It is not related to the availability test.
You can verify that by temporarily turning off Always On and verifying that those requests stop.

Azure WebSites / App Service Unexplained 502 errors

We have a stateless (with shared Azure Redis Cache) WebApp that we would like to automatically scale via the Azure auto-scale service. When I activate the auto-scale-out, or even when I activate 3 fixed instances for the WebApp, I get the opposite effect: response times increase exponentially or I get Http 502 errors.
This happens whether I use our configured traffic manager url (which worked fine for months with single instances) or the native url (.azurewebsites.net). Could this have something to do with the traffic manager? If so, where can I find info on this combination (having searched)? And how do I properly leverage auto-scale with traffic-manager failovers/perf? I have tried putting the traffic manager in both failover and performance mode with no evident effect. I can gladdly provide links via private channels.
UPDATE: We have reproduced the situation now the "other way around": On the account where we were getting the frequent 5XX errors, we have removed all load balanced servers (only one server per app now) and the problem disappeared. And, on the other account, we started to balance across 3 servers (no traffic manager configured) and soon got the frequent 502 and 503 show stoppers.
Related hypothesis here: https://ask.auth0.com/t/health-checks-response-with-500-http-status/446/8
Possibly the cause? Any takers?
UPDATE
After reverting all WebApps to single instances to rule out any relationship to load balancing, things ran fine for a while. Then the same "502" behavior reappeared across all servers for a period of approx. 15 min on 04.Jan.16 , then disappeared again.
UPDATE
Problem reoccurred for a period of 10 min at 12.55 UTC/GMT on 08.Jan.16 and then disappeared again after a few min. Checking logfiles now for more info.
UPDATE
Problem reoccurred for a period of 90 min at roughly 11.00 UTC/GMT on 19.Jan.16 also on .scm. page. This is the "reference-client" Web App on the account with a Web App named "dummy1015". "502 - Web server received an invalid response while acting as a gateway or proxy server."
I don't think Traffic Manager is the issue here. Since Traffic Manager works at the DNS level, it cannot be the source of the 5XX errors you are seeing. To confirm, I suggest the following:
Check if the increased response times are coming from the DNS lookup or from the web request.
Introduce Traffic Manager whilst keeping your single instance / non-load-balanced set up, and confirm that the problem does not re-appear
This will help confirm if the issue relates to Traffic Manager or some other aspect of the load-balancing.
Regards,
Jonathan Tuliani
Program Manager
Azure Networking - DNS and Traffic Manager

Resources