IIS zero-downtime site setup is still randomly sending requests to the server that is set to unhealthy - iis

We have a blue/green site and server farm setup for zero-downtime deployment, which works fine. However, the web logs for the site that does the routing show that, every now and then, requests are sent to the wrong server.
2023-02-19 09:50:05 /DocumentView DID=23932&SERVER-ROUTED=LIVE-GREEN
2023-02-19 09:50:09 /FileDownload FID=50516&SERVER-ROUTED=LIVE-BLUE
2023-02-19 09:50:13 /Publish DocID=9154358&SERVER-ROUTED=LIVE-BLUE
2023-02-19 09:50:15 /SiteView DID=23932&SERVER-ROUTED=LIVE-GREEN
In this instance, it tries to send two requests to Live-Blue, which is set to Unhealthy and whose site is Stopped. The health test interval is set to 1 second (at the moment).
It's not happening a lot (maybe once every 25k requests), but it is very annoying for those who do get the 502 error message (502.3 - Bad Gateway: Forwarder Connection Error (ARR)).
Anyone have any ideas on how to fix or diagnose the issue?
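As a first diagnostic step, the routing log could be scanned for every request that was sent to the server marked Unhealthy, so the timestamps can be lined up with deployments or health-check changes. The following is a minimal sketch, assuming each log line has the four columns shown in the excerpt above (date, time, URI stem, query string); a real W3C log has more fields, so the column index is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs


def routed_server(line):
    """Return the SERVER-ROUTED value from one log line, or None if absent."""
    parts = line.split()
    if len(parts) < 4:
        return None
    # Assumption: the query string is the fourth whitespace-separated column.
    values = parse_qs(parts[3]).get("SERVER-ROUTED")
    return values[0] if values else None


def misrouted(lines, unhealthy):
    """Return the log lines whose request was routed to the unhealthy server."""
    return [line for line in lines if routed_server(line) == unhealthy]


# The four lines from the excerpt above.
sample = [
    "2023-02-19 09:50:05 /DocumentView DID=23932&SERVER-ROUTED=LIVE-GREEN",
    "2023-02-19 09:50:09 /FileDownload FID=50516&SERVER-ROUTED=LIVE-BLUE",
    "2023-02-19 09:50:13 /Publish DocID=9154358&SERVER-ROUTED=LIVE-BLUE",
    "2023-02-19 09:50:15 /SiteView DID=23932&SERVER-ROUTED=LIVE-GREEN",
]

counts = Counter(routed_server(line) for line in sample)
bad = misrouted(sample, "LIVE-BLUE")
for line in bad:
    print(line)
```

Run against a full day of logs, the timestamps of the misrouted lines may show whether they cluster around a health-state change or are spread evenly.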

Related

How to Troubleshoot IIS Error 502.3 Codes 12030 and 12152 Sporadically Occurring

I have set up a server using IIS on Windows Server 2016, running Python FastAPI with uvicorn. Sending individual queries works well, but I have moved on to testing the API with parallel queries using k6. I sent queries with 30 VUs over 1 minute with a random sleep in between, resulting in around 2.1 requests/sec. However, I noticed that the service returned sporadic 502.3 errors about 15% of the time.
The error codes tagged to it were: 12030 and 12152. According to https://learn.microsoft.com/en-us/windows/win32/winhttp/error-messages:
ERROR_WINHTTP_CONNECTION_ERROR (12030): The connection with the server has been reset or terminated, or an incompatible SSL protocol was encountered. For example, WinHTTP version 5.1 does not support SSL2 unless the client specifically enables it.
ERROR_WINHTTP_INVALID_SERVER_RESPONSE (12152): The server response cannot be parsed.
The failure percentage seems to scale with a higher number of requests per second.
I checked the HTTPERR logs under C:\Windows\System32\LogFiles\HTTPERR, but only saw Timer_ConnectionIdle, which I understand is not an issue.
How else can I troubleshoot these 502.3 errors to see what the issue is?
UPDATE 2022/12/20:
I managed to get the Failed Request Tracing (FRT) log for one of the occurrences. How should I proceed with troubleshooting? It seems to indicate only a 502.3 error:
Event: MODULE_SET_RESPONSE_ERROR_STATUS
ModuleName: httpPlatformHandler
Notification: EXECUTE_REQUEST_HANDLER
HttpStatus: 502
HttpReason: Bad Gateway
HttpSubStatus: 3
ErrorCode: 2147954552
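The ErrorCode value in the trace can be decoded further. A minimal sketch, assuming the value is a standard 32-bit HRESULT (severity bit, facility field, 16-bit code); the function name is hypothetical.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    value &= 0xFFFFFFFF
    failed = bool(value >> 31)          # top bit set means failure
    facility = (value >> 16) & 0x7FF    # 11-bit facility field
    code = value & 0xFFFF               # low 16 bits carry the error code
    return failed, facility, code


failed, facility, code = decode_hresult(2147954552)
print(hex(2147954552), failed, facility, code)
# prints: 0x80072f78 True 7 12152
```

Facility 7 is the Win32 facility, so the low 16 bits are a plain Win32/WinHTTP error number. Here that is 12152, the ERROR_WINHTTP_INVALID_SERVER_RESPONSE code already quoted above, which suggests the response from the backend process could not be parsed for this particular request.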

Dash App on IIS Webserver Timing Out with HTTP500 SC258 Error

I have an IIS server hosting a Dash app.
What I am noticing in the browser is that the app starts but doesn't return anything. The HTTP logs show an HTTP 500 error with sc-win32-status 258. The time-taken value in the HTTP logs is around 100 seconds.
There are also HTTP 200 responses with the same time-taken value, so that tells me it's not only a timeout issue. I didn't change the default timeout value in IIS (which I think is more than 5 minutes?).
I'm trying to figure out what is causing this intermittent timeout issue.
Thanks!
The issue was the timeout duration in IIS > FastCGI Settings > Process Model > Activity Timeout.
The operation was taking longer than the time allowed for communication with IIS.
Increasing the timeout to 180 seconds seemed to fix the issue for me.
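To check how often requests run into the timeout before and after changing the setting, the IIS log could be filtered on sc-win32-status (Win32 error 258 is WAIT_TIMEOUT). This is a minimal sketch, assuming W3C-format logs with a `#Fields:` header line; the sample lines below are made up for illustration and are not taken from the original logs, and the function names are hypothetical.

```python
def parse_w3c(lines):
    """Yield one dict per request, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split()))


def with_win32_status(lines, status="258"):
    """Return the requests whose sc-win32-status equals the given code."""
    return [r for r in parse_w3c(lines) if r.get("sc-win32-status") == status]


# Hypothetical sample data, for illustration only.
sample_log = [
    "#Fields: date time cs-uri-stem sc-status sc-win32-status time-taken",
    "2024-01-01 10:00:00 /app 500 258 100012",
    "2024-01-01 10:02:00 /app 200 0 99870",
]

for row in with_win32_status(sample_log):
    print(row["date"], row["time"], row["cs-uri-stem"], row["time-taken"])
```

The same filter with a different status value (for example "1236") would work for other sc-win32-status codes.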

Server not receiving a response from the application - IIS Log shows error 1236

Server 1 is sending an XML message via IIS to Server 2.
Server 2 receives it and sends back an acknowledgment message to Server 1.
Upon receipt of that message, Server 1 sends the next message in the queue.
However, Server 1 intermittently (4-5 times a week) does not receive the acknowledgment message (we tested the issue and confirmed that Server 2 is sending the acknowledgment message).
The IIS logs for the time it is occurring show an error 1236 (sc-win32-status 1236, which means "The network connection was aborted by the local system").
We're at a loss as to what is causing this or how to fix it. Interested to see if anyone has come across an issue like this before...
How did you prove that Server 2 is sending the acknowledgment message: through network tracing on Server 1, or some other means? Logs within the software may not be enough. Barring anything bad going on at the networking level, it is possible that one of the sides is hitting an exception and aborting the connection as a result. The application pools may be auto-recycling due to IIS recycle rules, and although IIS should handle a pool restart properly, maybe something did not occur as expected. When one pool starts while the other is still processing its final requests on shutdown, there may be some locking going on in code that does not expect two processes to run at the same time.

Slow response times from free web app server every day at same time

Every day at about 3:00PM-4:00PM GMT the response times start to increase (with no memory or CPU increase).
There is an Azure availability test hitting the server every 10 minutes.
As this is a dev site, there is no traffic to it other than me (occasionally) and the availability test.
I log the startup time to an internal variable, and this shows that the site is not restarting.
The first request via a browser when this starts happening is very slow (2 minutes, probably some timeout).
After that it runs perfectly. That looks like the site is shutting down and then starting up on the first request, but the pings are keeping it alive, so the site is not shutting down (as far as I know).
In the occasional log entry I do get, I seem to be getting 502 errors, but I can't confirm this as the FREB logs are usually off at this time.
FREB logs turn off automatically after 1 hour, and as this is the middle of the night for me (NZDT), I don't get a chance to turn them on.
See the attached images: as you can see, the response times increase at the same time each day.
Ignore the requests where they are above 20; that's me going to the site via a browser.
I always check the Azure dashboard BEFORE viewing the site in a browser.
Just got this error (from the web browser, randomly, while repeatedly accessing the same page):
502: The specified CGI application encountered an error and the server terminated the process.
Other relevant Info (Perhaps):
When I noticed this happening, I had the availability test pointing at a /ping endpoint that only returned a 200 and an empty string.
It now points to the site's homepage to see if that changed anything; it is still the same.
I assume the database is not the issue, as the /ping endpoint doesn't touch the database; it is just a straight controller return.
Internal exception handling is catching nothing.
Service: Azure Free Web App (Development)
There are no web jobs or timed events on this site
[Image: Azure Dashboard Initial]
Current tests:
Uploading as new site to a Basic 1 Small
Restarting dev site 12 hours before issues (usually 20 hours before)
Results:
Restarting the free web app 12-ish hours before the issue: same result at the same time, so it's not the app slowly overloading, or it would happen much later.
Basic 1 Small: no problems. Could it be something with the dev server?
[Image: Azure Dashboard From Today]
Observations:
Same behavior with the /ping endpoint (just returns an empty string, 200 OK) and the main home page endpoint (database lookups [with caching] / Razor).
If anyone has any ideas about what might be going on, I would very much appreciate it
:-)
Update:
It seems to have stopped (on its own) at about 11/1/2016 1:50:49 AM GMT. My internal timestamp says it restarted, and then the errors started again at the same time as usual. Note: no one is using the app. The Basic 1 Small server is still going fine.
Sorry, I can't add any more images (not enough rep).
By default, web apps are unloaded if they are idle for some period of time, which could cause slow responses from the web site during that period. In addition, there is an article about troubleshooting HTTP "502 Bad Gateway" and HTTP "503 Service Unavailable" errors in Azure web apps that you could read. According to that article, scaling the web app could mitigate the issue.

Azure web app random 502 errors. No crash no slow requests

This week we have started getting 502 errors on our Web App. These are random: sometimes they happen under consistent load, other times with even a single request.
I have checked Event Viewer and there is no application crash, and I also don't see any really slow requests in the IIS logs. I had auto-heal enabled, which is now disabled. I have also enabled autoscale, and even with 4 instances running I get a 502 error every once in a while. There is no log entry for this 502 in the IIS logs, so I am guessing something upstream is returning this response. I just don't know why it's doing that or why it's so random.
