Azure App Service returns 502 Bad Gateway from HttpClient

I have an app service (plan B2) running on Azure.
My integration tests, which run from a Docker container, call several App Service endpoints one after another and sometimes receive a 500 or 502 error.
When I debug the tests and pause between calls, all requests succeed. Also, when I scale up my App Service, everything works properly. (I don't want to scale up because CPU and other metrics are low.)
In my tests I have only one HttpClient and I dispose it at the end, so I don't think there should be any connection leaks.
Also, under TCP Connections I have around 60 total connections, while the Azure docs give a limit of 1,920.
This app is not accessed by any users, but here it says that I reached the maximum number of connections. Is there any way to track these connections? Why don't I see anything in Application Insights when I receive these 5xx errors? Also, how can 15 connections exceed the limit when the limit is 1,920? Are these connections related to my errors, and how can they be fixed?

You don't see them in Application Insights because the failure happens at the IIS level, which breaks the request before any telemetry is sent to Application Insights.
The place to look for information is "Diagnose and solve problems", then "Availability and Performance". More info here:
https://learn.microsoft.com/en-us/azure/app-service/overview-diagnostics
PS: I do think the problem is related to how your HttpClient is disposed. It's a well-known issue and the reason HttpClientFactory was introduced. More info here:
https://www.stevejgordon.co.uk/httpclient-creation-and-disposal-internals-should-i-dispose-of-httpclient
https://stackoverflow.com/a/15708633/1384539
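If the test harness has to tolerate an occasional 502 while the root cause is being investigated, one option is to keep a single long-lived client and retry transient statuses with a growing pause. That mirrors the observation in the question that pausing between calls makes the requests succeed. Below is a minimal sketch in Python, used only as a neutral illustration since the question itself is about .NET. `call_with_retry`, the `send` callable and the set of status codes treated as transient are hypothetical choices for this example, not part of any Azure or .NET API.

```python
import time
from typing import Callable

# Status codes treated as transient here; this set is an assumption for illustration.
TRANSIENT_STATUSES = frozenset({500, 502, 503, 504})


def call_with_retry(
    send: Callable[[], int],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-transient status or attempts run out.

    send is a hypothetical zero-argument callable that performs one HTTP
    request on a shared, long-lived client and returns the status code.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status = 0
    for attempt in range(attempts):
        status = send()
        if status not in TRANSIENT_STATUSES:
            return status
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    # Every attempt returned a transient status; hand the last one back to the caller.
    return status
```

This only hides the symptom in the tests. The advice in the links above about reusing one client still applies.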

Related

Azure Functions service not recognizing request sent from outside client

We have a service which pings our EP1 Premium service, and yesterday we received 3 client-side timeout errors after 2 minutes of waiting. When I open the trace in Application Insights, the requests that timed out are not even logged and there is no trace of them ever being received on the Azure side, so they stay unanswered. Looking at the metrics provided in the Azure Functions app, I found that 1-2 minutes after the request was sent, the app stops working entirely: its Total App Domains falls to 0, as do all connections, threads and so on. This state lasts until the next request is received, so the earlier request is effectively "skipped". This is a big issue, as I need to make sure requests get answered in a timely manner.
The client service sent HTTP requests to the Azure Functions app expecting an answer, only to time out while the Azure-side doesn't have any record of ever receiving the request.
I believe this issue is related to the cold start behaviour of the Azure Functions Consumption plan. The "skipping" mechanism is explained below:
Apps may scale to zero when idle, meaning some requests may have additional latency at startup. The consumption plan does have some optimizations to help decrease cold start time, including pulling from pre-warmed placeholder functions that already have the function host and language processes running.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#cold-start-behavior
Please also have a look at this article, which explains the behaviour: https://azure.microsoft.com/en-us/blog/understanding-serverless-cold-start/

Why I got an error request every 5 minutes in an Azure App Service

I have a Java web app on Azure, and I see failed requests in its Application Insights. It looks like someone is calling 'http://myApp.azurewebsites.net/error' every 5 minutes, but I do not have this endpoint, so there are many failed requests with 404 in Application Insights. I then added this endpoint to the app, but there are still many failed requests with a 404 code. I have no idea where these requests come from or what they are for. Did I configure something wrong in my app?
There is a setting named 'Always on' in the App Service configuration, and everything works fine when I turn this setting off.
To narrow down this issue, you can enable diagnostic logging for your web app. Web server diagnostic logging helps you trace exception details that originate from platform components. If you suspect the error comes from your application, application diagnostic logging is the place to trace the cause.
Also, enable the log stream on your web app so that, in peak or off-peak hours, you can watch live how your web app performs and responds to each request.
It's caused by "Always On" being ON under Configuration / General settings of your App Service.
As per the docs:
Always On: Keeps the app loaded even when there's no traffic. When Always On is not turned on (default), the app is unloaded after 20 minutes without any incoming requests. The unloaded app can cause high latency for new requests because of its warm-up time. When Always On is turned on, the front-end load balancer sends a GET request to the application root every five minutes. The continuous ping prevents the app from being unloaded.
To mitigate the impact, you can add a controller/action that handles the default route.
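As a rough illustration of "handle the default route", here is a minimal sketch using Python's standard library, not the Java stack from the question. It answers `GET /` with 200 so that a periodic ping to the application root does not show up as a failed request. `RootPingHandler` and `make_server` are hypothetical names for this example; in a real app the equivalent would be a controller or action mapped to `/` in whatever framework the app uses.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class RootPingHandler(BaseHTTPRequestHandler):
    """Answer GET / with 200 so a keep-alive ping to the root succeeds."""

    def do_GET(self):
        if self.path == "/":
            status, body = 200, b"ok"
        else:
            # Any other path is still reported as missing.
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Create a server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), RootPingHandler)
```

Calling `make_server(8080).serve_forever()` would run it locally. The port is arbitrary.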

Azure Http connection gets interrupted after 5 minutes

We have a setup with several RESTful APIs on the same VM in Azure.
The websites run in Kestrel on IIS.
They are protected by the Azure Application Gateway with firewall.
We now have requests that would run for at least 20 minutes.
The requests run their full length uninterrupted on Kestrel (visible in the logs), but the sender either gets "socket hang up" after exactly 5 minutes or waits forever, even after the request has finished in Kestrel. The request continues in Kestrel even if the connection was interrupted for the sender.
What I have done:
Wrote a small example application that returns after a set number of seconds, to rule out our websites as the problem.
Ran the request in the VM (to localhost): No problems, response was received.
Ran the request within Azure from one VM to another: Request ran forever.
Ran the request from outside of Azure: Request terminated after 5 minutes with "socket hang up".
Checked the configured timeouts: Kestrel: 50m, IIS: 4000s, ApplicationGateway-HttpSettings: 3600
Requests were tested with Postman.
Is there another request or connection timeout hidden somewhere in Azure?
"We now have requests that would run for at least 20 minutes."
This is a horrible architecture and it should be rewritten to be async. Don't take this personally, it is what it is. Consider returning a 202 Accepted with a Location header to poll for the result.
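A minimal sketch of the client side of that 202-and-poll pattern, in Python. It assumes the server answers the polling URL with 202 while the job is still running and 200 with the result once it is done; that contract, the `get_status` callable and the name `poll_until_done` are assumptions for illustration, not something the question's APIs already provide.

```python
import time
from typing import Callable, Tuple


def poll_until_done(
    get_status: Callable[[str], Tuple[int, str]],
    location: str,
    interval: float = 5.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the URL from the Location header until the job has finished.

    get_status is a hypothetical callable that issues one short GET to the
    given URL and returns (status_code, body).
    """
    for _ in range(max_polls):
        status, body = get_status(location)
        if status == 200:
            # Job finished; the body carries the result.
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status} while polling {location}")
        # Still running: wait before asking again.
        sleep(interval)
    raise TimeoutError(f"job at {location} did not finish after {max_polls} polls")
```

Each poll is a short request, so no single connection has to stay open anywhere near the 5 minute mark.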
You're most probably hitting the Azure SNAT layer timeout.
Change it under the Configuration blade for the Public IP.
I ran into something like this a little while back.
For us the issue was probably the timeout, as the other answer suggests, but our solution (instead of increasing the timeout) was to add PgBouncer in front of our Postgres database to manage the connections and make sure a new one is started before the timeout fires.
I'm not sure what your backend connection looks like, but something similar (a backend DB proxy) could give you more ability to tune connection and reconnection behaviour on your side.
We were running AKS (Azure Kubernetes Service), but all Azure public IPs obey the same rules that cause issues similar to this one.
While it isn't an answer, I know there are also two types of public IP address; one of them is considered 'basic' and doesn't have the same configurability. Could this be related to the difference between basic and standard public IPs / load balancers?

Azure - App Availability percentage is Zero

Our API app is in UAT on Azure with a service plan (Standard 3 large). What should we do if App Availability is zero? The app responds slowly or times out. When I restart the application, it returns to normal. (We are using parallel programming with async/await.)
How can we find the root cause of the slowness?
Ensure that the Always On feature is enabled.
Such problems may be caused by application level issues, such as:
network requests taking a long time
application code or database queries being inefficient
application using high memory/CPU
application crashing due to an exception
You could enable web server diagnostics to fetch more details on the issue.
Detailed Error Logging - Detailed error information for HTTP status codes that indicate a failure (status code 400 or greater). This may contain information that can help determine why the server returned the error code.
Failed Request Tracing - Detailed information on failed requests, including a trace of the IIS components used to process the request and the time taken in each component. This can be useful if you are attempting to improve web app performance or isolate what is causing a specific HTTP error.
Web Server Logging - Information about HTTP transactions using the W3C extended log file format. This is useful when determining overall web app metrics, such as the number of requests handled or how many requests are from a specific IP address.
Also, Azure Application Insights collects telemetry from your application to help analyze its operation and performance. You can use this information to identify problems that may be occurring or to identify improvements to the application that would most impact users. This tutorial takes you through the process of analyzing the performance of both the server components of your application and the perspective of the client: https://learn.microsoft.com/en-us/azure/application-insights/app-insights-tutorial-performance
Ref: https://learn.microsoft.com/en-us/azure/app-service/app-service-web-troubleshoot-performance-degradation

Azure WebSites / App Service Unexplained 502 errors

We have a stateless (with shared Azure Redis Cache) WebApp that we would like to scale automatically via the Azure auto-scale service. When I activate auto-scale-out, or even when I activate 3 fixed instances for the WebApp, I get the opposite effect: response times increase exponentially or I get HTTP 502 errors.
This happens whether I use our configured Traffic Manager URL (which worked fine for months with single instances) or the native URL (.azurewebsites.net). Could this have something to do with Traffic Manager? If so, where can I find info on this combination (I have searched)? And how do I properly leverage auto-scale with Traffic Manager failover/performance routing? I have tried putting Traffic Manager in both failover and performance mode with no evident effect. I can gladly provide links via private channels.
UPDATE: We have reproduced the situation now the "other way around": On the account where we were getting the frequent 5XX errors, we have removed all load balanced servers (only one server per app now) and the problem disappeared. And, on the other account, we started to balance across 3 servers (no traffic manager configured) and soon got the frequent 502 and 503 show stoppers.
Related hypothesis here: https://ask.auth0.com/t/health-checks-response-with-500-http-status/446/8
Possibly the cause? Any takers?
UPDATE
After reverting all WebApps to single instances to rule out any relationship to load balancing, things ran fine for a while. Then the same "502" behavior reappeared across all servers for a period of approx. 15 min on 04.Jan.16 , then disappeared again.
UPDATE
Problem reoccurred for a period of 10 min at 12.55 UTC/GMT on 08.Jan.16 and then disappeared again after a few min. Checking logfiles now for more info.
UPDATE
Problem reoccurred for a period of 90 min at roughly 11.00 UTC/GMT on 19.Jan.16, this time also on the .scm. page. This is the "reference-client" Web App on the account with a Web App named "dummy1015". The error shown: "502 - Web server received an invalid response while acting as a gateway or proxy server."
I don't think Traffic Manager is the issue here. Since Traffic Manager works at the DNS level, it cannot be the source of the 5XX errors you are seeing. To confirm, I suggest the following:
Check if the increased response times are coming from the DNS lookup or from the web request.
Introduce Traffic Manager whilst keeping your single instance / non-load-balanced set up, and confirm that the problem does not re-appear
This will help confirm if the issue relates to Traffic Manager or some other aspect of the load-balancing.
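For the first check, a rough way to separate DNS time from request time is to resolve the name yourself and then send the request to the resolved address. The sketch below is Python and uses plain HTTP for simplicity; `time_dns_and_request` is a hypothetical helper, and a real check against an `.azurewebsites.net` host would need HTTPS, which this example does not cover.

```python
import http.client
import socket
import time
from typing import Tuple


def time_dns_and_request(
    host: str, path: str = "/", port: int = 80, timeout: float = 10.0
) -> Tuple[float, float, int]:
    """Return (dns_seconds, request_seconds, status_code) for one GET."""
    # Step 1: time only the name resolution.
    started = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    dns_seconds = time.perf_counter() - started
    ip_address = infos[0][4][0]

    # Step 2: time the request against the resolved address, so that
    # no further DNS lookup is included in this measurement.
    started = time.perf_counter()
    conn = http.client.HTTPConnection(ip_address, port, timeout=timeout)
    try:
        # Send the original host name so name-based routing still works.
        conn.request("GET", path, headers={"Host": host})
        response = conn.getresponse()
        response.read()
        status = response.status
    finally:
        conn.close()
    request_seconds = time.perf_counter() - started
    return dns_seconds, request_seconds, status
```

If the first number is consistently small while the second one grows, the slowdown is in the web request rather than in name resolution. Note that a repeated lookup may be answered from a local resolver cache, so the DNS figure is only indicative.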
Regards,
Jonathan Tuliani
Program Manager
Azure Networking - DNS and Traffic Manager
