ASP.NET Core randomly restarts under IIS

We found a problem: on some servers our ASP.NET Core application restarts every day at random times.
Windows Server 2012 R2 Standard, IIS 8.5.9600.16384
Hosting bundle Microsoft.NETCore.App 2.1.4, Microsoft.AspNetCore.App 2.1.4
Out of process hosting model
Application pool recycling is disabled
In Event Viewer, in the Application section:
Application 'MACHINE/WEBROOT/APPHOST/APPLICATIONNAME' started process 'xxx' successfully and is listening on port 'xxx'. (Event ID 1001)
Sent shutdown HTTP message to process '6860' and received http status '202'. (Event ID 1006)
App_offline file 'app_offline.htm' was detected (Event ID 1012)
And in the System section at the same time:
The TCP/IP NetBIOS Helper service entered the running state. (Event ID 7036)
The TCP/IP NetBIOS Helper service entered the stopped state. (Event ID 7036)
The TCP/IP NetBIOS Helper service was successfully sent a stop control. The reason specified was: 0x40030011 [Operating System: Network Connectivity (Planned)] Comment: None (Event ID 7042)
All of these events happen within 2-3 seconds.

It sounds like periodicRestart is configured for the application pool in IIS: https://learn.microsoft.com/en-us/iis/configuration/system.applicationhost/applicationpools/add/recycling/periodicrestart/ The default interval is 1740 minutes (29 hours). Try disabling this setting.
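For reference, the periodicRestart time attribute is stored as a time span string, and the documented default is 1.05:00:00. A minimal Python sketch of the arithmetic, assuming the d.hh:mm:ss form; parse_timespan is a hypothetical helper for illustration, not part of any IIS tooling:

```python
import re
from datetime import timedelta

# Matches "hh:mm:ss" with an optional leading "d." day count, e.g. "1.05:00:00".
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_timespan(value):
    """Convert an IIS-style time span string into a timedelta."""
    match = _TIMESPAN.match(value.strip())
    if match is None:
        raise ValueError(f"not a d.hh:mm:ss time span: {value!r}")
    days, hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


default_restart = parse_timespan("1.05:00:00")
print(default_restart.total_seconds() / 3600)  # 29.0
```

A time value of 00:00:00 means the pool is not recycled at a regular interval.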

We saw very strange behavior from IIS and AspNetCoreModule on application nodes that had DHCP problems: the restarts happened when the DHCP lease expired and the LAN adapter renewed it.
At that point, the ASP.NET Core Module detects an app_offline.htm file (we have no idea why this happens) and then immediately restarts the application.
Only the servers with DHCP turned on had this problem.
We solved it by specifying a static IP address in the adapter settings.

I had a similar problem, where IIS stopped the application pool at seemingly random times with this event log entry:
Sent shutdown HTTP message to process with Id '11076' and received http status '202'.
followed by
Application '/LM/W3SVC/82/ROOT/<MyApp>' with physical root 'Z:\wwwroot\<MySite>\<MyApp>\' shut down process with Id '11076' listening on port '42901'
IIS has an idle time-out feature that stops the application pool when it is idle (no requests during a given period).
By default, the period is 20 minutes. I saw the application pool start on the first request and stop 20 minutes later.
To disable the idle time-out, set "Idle Time-out (minutes)" to 0 (which means no time-out) in the application pool default settings.
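The same setting appears in applicationHost.config as the idleTimeout attribute of the processModel element. The following Python sketch lists the effective idle time-out per application pool. It is only an illustration: the pool names and the SAMPLE_CONFIG fragment are made up, and idle_timeouts is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SCHEMA_DEFAULT_IDLE = "00:20:00"  # IIS default: 20 minutes

# Illustrative fragment only; real files contain many more settings.
SAMPLE_CONFIG = """
<configuration>
  <system.applicationHost>
    <applicationPools>
      <add name="DefaultAppPool" />
      <add name="MyAppPool">
        <processModel idleTimeout="00:00:00" />
      </add>
      <applicationPoolDefaults>
        <processModel idleTimeout="00:20:00" />
      </applicationPoolDefaults>
    </applicationPools>
  </system.applicationHost>
</configuration>
"""


def idle_timeouts(config_xml):
    """Map each application pool name to its effective idleTimeout."""
    root = ET.fromstring(config_xml)
    pools = root.find(".//applicationPools")
    if pools is None:
        return {}
    # Pool-level values override applicationPoolDefaults, which override
    # the schema default.
    defaults = pools.find("applicationPoolDefaults/processModel")
    fallback = SCHEMA_DEFAULT_IDLE
    if defaults is not None:
        fallback = defaults.get("idleTimeout", SCHEMA_DEFAULT_IDLE)
    result = {}
    for pool in pools.findall("add"):
        model = pool.find("processModel")
        own = model.get("idleTimeout") if model is not None else None
        result[pool.get("name")] = own or fallback
    return result


print(idle_timeouts(SAMPLE_CONFIG))
```

A pool that reports 00:00:00 has the idle time-out disabled; any other value means the worker process is stopped after that much inactivity.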

Related

Azure AspNetCore WebApp under high load returns "The specified CGI application encountered an error and the server terminated the process"

I'm hosting my AspNetCore app in Azure (Windows hosting plan P3v2 plan). It works perfectly fine under normal load (5-10 requests/sec) but under high load (100-200 requests/sec) starts to hang and requests return the following response:
The specified CGI application encountered an error and the server terminated the process.
And from the event logs I can get even more details:
An attempt was made to access a socket in a way forbidden by its access permissions aaa.bbb.ccc.ddd
I have to scale out to 30 instances; with each instance getting just 3-5 requests per second, it works just fine. I believe that 30 hosts is far more than this load should need and that the resources are underutilized, so I am trying to find the real bottleneck. If I set the instance count to 10, everything crashes and every request starts to return the error above. Resource utilization metrics for the high-load case with 30 instances enabled:
The service plan CPU usage is low, about 10-15% for each host
The service plan memory usage is around 30-40%
Dependencies respond quickly, in 50-200 ms
Azure SQL DTU usage is about 5%
I found a useful article on the current tier limits, and after running the Azure TCP Connections diagnostics I identified a few possible issues:
High outbound TCP connection
High TCP Socket handle count - High TCP Socket handle count was detected on the instance .... During this period, the process dotnet.exe of site ... with ProcessId 8144 had the maximum open handle count of 17004.
So I dug deeper and found the following information:
Per my service plan tier, my TCP connection limit should be 8064, which is far from the figure shown above. Next I checked the socket states:
Even though the number of active TCP connections is below the limit, I'm wondering whether the open socket handle count could be the issue here. What could cause this socket handle leak (if there is one)? How can I troubleshoot and debug it?
I see that you have already tried to isolate the possible cause of the error. Here are some of the known causes to revalidate and remediate:
1. On Azure App Service, connection attempts to local addresses (e.g. localhost, 127.0.0.1) and to the machine's own IP will fail, unless another process in the same sandbox has created a listening socket on the destination port. Rejected connection attempts normally return the socket forbidden error above.
For a peered VNet or an on-premises network, ensure that the IP address used is in the ranges listed for routing to the VNet, and check for incorrect routing.
2. On Azure App Service, the outbound TCP connections on the VM instance may be exhausted. Limits are enforced on the maximum number of outbound connections that can be made from each VM instance.
Other causes, as highlighted in this blog:
Using client libraries which are not implemented to re-use TCP connections.
Application code or the client library is leaking TCP socket handles.
Burst load of requests opening too many TCP socket connections at once.
In case of higher level protocol like HTTP this is encountered if the Keep-Alive option is not leveraged.
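To illustrate the first cause in the list (a client that does not re-use TCP connections), the following Python sketch counts how many TCP connections a local test server accepts when the client opens a new connection per request versus re-using one keep-alive connection. It is meant to be run on a development machine over the loopback interface; CountingHandler and count_connections are hypothetical names for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not per request.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def count_connections(requests, reuse):
    """Send `requests` GETs and return how many TCP connections were opened."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()
        if shared is not None:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


print(count_connections(5, reuse=False))  # one connection per request
print(count_connections(5, reuse=True))   # a single keep-alive connection
```

In .NET code the analogous change is usually to share one long-lived HTTP client instead of creating a new one per request, though whether that applies here depends on which client libraries the application uses.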
I'm unsure whether you have already tried the App Service Diagnostics to fetch more details; if not, give that a shot.
To navigate to the Diagnose and solve problems blade in the Azure portal:
In the Azure portal, open the app in App Services.
Select Diagnose and solve problems > "TCP Connections"
Consider optimizing the application to use connection pooling in your .NET code, and observe the behavior locally. If feasible, restart the WebApp and check whether that helps.
If the issue still persists, file a support ticket for a deeper investigation of the backend logs.

IIS Zero Downtime Update ARR / Reverse Proxy

I have a C# console application / Windows service that uses HttpListener to handle requests; IIS is set up to reverse proxy to it via ARR.
My problem is that when I update this application there is a short downtime between the old instance being shut down and the new one being ready.
The approach I'm considering is to add 2 servers to the server farm via local hostnames with 2 ports. On update I'd start the new instance, which would listen on the unused port, then stop listening for new requests on the old instance and gracefully shut it down (i.e. finish processing the current requests). Those last 2 steps would be triggered by the new instance, to ensure that it is ready to handle requests.
Is IIS ARR load balancing smart enough to try the other instance and mark the now shut-down one as unavailable without losing any requests, until the new one is updated? Or do I have to add health checks etc. (and would that again lead to a short downtime period)?
One idea that I believe could work (especially if your IIS is only being used for this purpose) is to leverage the overlapped recycling capability that IIS applies when you make a configuration change. In this case you could:
start a new instance of your app listening on a different port,
edit the configuration in ARR to point to the new port.
IIS should allow any existing requests running in the application pool to drain successfully within the recycling timeout, while new requests are sent to the new application pool.
It would help if you shared a bit more about the configuration you are using in ARR (such as a snippet of %windir%\system32\inetsrv\config\applicationHost.config with the webFarms section).
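The graceful-shutdown step from the question (stop accepting new requests, then let the current ones finish) could be sketched as follows. This is a minimal Python illustration of the idea rather than C# HttpListener code; DrainGate, its methods, and handle_request are hypothetical names, not part of IIS, ARR, or .NET.

```python
import threading


class DrainGate:
    """Tracks in-flight requests so an old instance can refuse new work
    and exit only after the current requests have finished."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_enter(self):
        """Return True if the request may proceed, False once draining."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def leave(self):
        """Mark one request as finished and wake up a waiting drain()."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout=None):
        """Refuse new requests, then wait for in-flight ones to finish.

        Returns True if everything finished, False if the timeout expired.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


gate = DrainGate()


def handle_request():
    # Rejected requests get 503 here so the proxy can retry elsewhere.
    if not gate.try_enter():
        return 503
    try:
        return 200  # real request processing would go here
    finally:
        gate.leave()
```

The old instance would call gate.drain(timeout) when the new instance signals that it is ready, and exit once drain returns. Whether ARR retries a rejected request on the other server depends on its configuration, so this only covers the application side of the handover.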

Launch application before server completes startup

I have 2 apps on the server: "WebSphere Commerce" and "myapp". While myapp initializes, it needs to receive some data from WC using SOAP; however, until both apps are started, the common HTTP port 9060 isn't listening.
There's a flag:
Enterprise Applications > * > Startup behavior
Startup order
Launch application before server completes startup
It's cleared for both apps. I thought, WAS would first report:
TCP Channel TCP_2 is listening on host * (IPv6) port 9060.
Server server1 open for e-business
then start the apps, but it first starts them, then opens the port.
Then what does this flag do?
Check this page: Startup behavior settings
Launch application before server completes startup
Specifies whether the application must initialize fully before the server starts. The default setting of false indicates that server startup will not complete until the application starts.
A setting of true informs the product that the application might start on a background thread and thus server startup might continue without waiting for the application to start. Thus, the application might not be ready for use when the application server starts.
So it is the other way around: the server first ensures that the applications are started and then opens the port to allow traffic to them.

Azure Cloud Service Worker Role not running after reboot or publish

I have an Azure Cloud Service worker role running, with only 1 role instance.
The worker role acts as a TCP server listening on a port that is configured in the service definition file.
So once the role instance is running, my TCP client program is able to connect to the worker role.
But every time I reboot the role instance or publish a new version from Visual Studio, I wait for the reboot or publish to finish and the Azure portal says its status is Running, yet the TCP client program is still unable to connect to the server. Then, without my doing anything, about 10 minutes later it fixes itself and the TCP client is able to connect again.
Where does this 10-minute delay come from?
I thought that as soon as the role instance's status becomes Running, it should work again.
At first I thought it was because of the load balancer. But I remoted into that role instance and ran netstat -A from the command line: the port is not even listening. So it seems my worker role code is not running?
When it became reachable again 10 minutes later, I went back to the remote desktop and ran netstat -A again; now the port is listening.
So after a reboot/publish, I have to wait 10 minutes for my worker role code to run?
Or am I missing something here?
Hard to say, but the following references should help:
http://blogs.msdn.com/b/kwill/archive/2011/05/05/windows-azure-role-architecture.aspx. This gives you the architecture of the processes running inside your service. When you RDP and netstat shows the port is not listening, what do you see as far as processes? Is WaWorkerHost.exe running?
http://blogs.msdn.com/b/kwill/archive/2013/08/09/windows-azure-paas-compute-diagnostics-data.aspx. This walks through all of the diagnostic data typically used to troubleshoot an issue in an Azure PaaS VM. If you check those logs and event logs do you see anything that stands out between the time when you can't connect and the time that you can?
You can check the Windows Azure event log to see when your OnStart() and Run() methods are started and stopped. If you see that Run() has started but netstat still shows the port as not listening, then you know the problem is in your code and you may need to step through it with a debugger (you can set up remote debugging so that you can use Visual Studio on your desktop to debug the Azure VM - http://blogs.msdn.com/b/cie/archive/2014/01/24/windows-azure-remote-debugging.aspx).
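To measure the delay precisely instead of checking netstat by hand, a small probe can poll the port until it accepts connections. A minimal Python sketch, assuming the probe runs from a machine that can reach the role's endpoint; wait_for_port is a hypothetical helper, and the host and port in the usage note are placeholders.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=5.0):
    """Poll until host:port accepts a TCP connection.

    Returns the number of seconds waited, or None if `timeout` seconds
    pass without a successful connection.
    """
    start = time.monotonic()
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            pass  # refused or timed out: the port is not listening yet
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

For example, calling wait_for_port("myservice.example.net", 10100, timeout=1200) right after the portal reports Running gives the delay in seconds, which can then be compared against the OnStart() and Run() timestamps in the event log.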

IIS application seems to hang and throws 503 error

Symptoms: The IIS application throws a 503 error, but the application pool is still running. The CPU usage of the application is around 0.23 percent and the memory usage is 4 GB. It seems that the application does not process any HTTP requests. This lasted up to one hour, until we terminated the process and restarted it.
Environment: Windows Server 2008 R2
IIS 7.0
.Net Framework 3.5
Another point: our application uses the session state server, and the state server throws the exception "The state server has closed an expired TCP/IP connection. The IP address of the client is 127.0.0.1. The expired Read operation began at 08/30/2013 10:02:00." However, there is no other evidence in the Event Viewer of the web server or the application.
What happened? Did the application hang?
William

Resources