How to Diagnose App Pool Issues

I have a large web app that runs on our two live servers. Part of our server side C# code calls a third party app to do a task for us.
That task works most of the time, but at a certain point it stops working until the AppPool is recycled.
This all happens in w3wp.exe, so I can see it running in process monitor like this (when it is not working),
Thread Create
Access the file PreviewGenerator.exe
Hive unloaded (this is the registry)
Thread Exit
And like this when it is working,
Thread Create
Access the file PreviewGenerator.exe
Process Start
Does heaps of stuff with PreviewGenerator.exe including reading / writing / registry, etc.
Process Exit
Hive unloaded
Thread Exit
How can I debug what is going on in my AppPool and why starting a separate process is not working some of the time?

I found the best thing to do was to create a separate app pool for my application in IIS and set an upper limit for the amount of RAM it could use. Also I found it useful to turn on the 'Generate Recycle Event Log Entry' items under the app pool settings.
You can then go to the System event log and filter for items with a source of 'WAS' to understand what is going on in the app pools: when they are restarting, when they stop from being idle, etc.
I think the main problem in our case is that the IIS box was running out of memory. Tuning the app pools and adding some extra RAM seems to have solved it.
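As a rough illustration of the filtering step, here is a minimal Python sketch. It assumes the System log has already been exported into a list of records; the field names (time, source, message) and the sample entries are hypothetical, not a real export format.

```python
def was_events(records):
    """Keep only records whose source is WAS, oldest first."""
    kept = [r for r in records if r.get("source", "").upper() == "WAS"]
    return sorted(kept, key=lambda r: r["time"])


# Hypothetical records standing in for an exported System event log.
sample = [
    {"time": "2024-01-01T10:05:00", "source": "WAS", "message": "recycle"},
    {"time": "2024-01-01T10:01:00", "source": "Service Control Manager", "message": "other"},
    {"time": "2024-01-01T09:00:00", "source": "WAS", "message": "idle shutdown"},
]

for event in was_events(sample):
    print(event["time"], event["message"])
```

Reading the WAS entries in time order makes it easier to line up recycles and idle shutdowns with the moments the external process stopped launching.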

Diagnosing Sporadic Lockups in Website Running on IIS

Goal
Determine the cause of the sporadic lock ups of our web application running on IIS.
Problem
An application we are running on IIS sporadically locks up throughout the day. When it locks up, it locks up on all workers and on all load-balanced instances.
Environment and Application
The application is running on 4 different Windows Server 2016 machines. The machines are load balanced by HAProxy using a round-robin scheme. The IIS application pools this website is hosted in are configured to have 4 workers each, and the application it hosts is a 32-bit application. The IIS instances are not using a shared configuration file, but the application pools for this application are all configured the same.
This application is the only application in the IIS application pool. The application is an ASP.NET web API and is using .NET 4.6.1. The application is not creating threads of its own.
Theory
My theory for why this is happening is that we have requests that are coming in that are taking ~5-30 minutes to complete. Every machine gets tied up servicing these requests so they look "locked up". The company rolled their own logging mechanism and from that I can tell we have requests that are taking ~5-30 minutes to complete. The team responsible for the application has cleaned up many of these but I am still seeing ~5 minute requests in the log.
I do not have access to the machines personally so our systems team has gotten memory dumps of the application when this happens. In the dumps I generally will see ~50 threads running and all of them are in our code. These threads will be all over our application and do not seem to be stopped on any common piece of code. When the application is running correctly the dumps have 3-4 threads running. Also I have looked at performance counters like the ASP.NET\Requests Queued but it never seems to have any requests queued. During these times the CPU, Memory, Disk and Network usage look normal. Using windbg none of the threads seem to have a high CPU time other than the finalizer thread which as far as I know should live the entire time.
Conclusion
I am looking for a means to prove or disprove my theory as to why we are locking up as well as any metrics or tools I should look at.
So this issue came down to our application using query in stitch on a table with 2,000,000 records in it to another table. Memory would become so fragmented that the garbage collector was spending more time trying to find places to put objects and moving them around than it was running our code. This is why it appeared that our application was still working and why there were no exceptions. Oddly, IIS would time out the requests but would continue processing the threads.
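For the original theory about long-running requests, one rough check is to compute how many requests overlap at the worst moment. The sketch below assumes the custom log can be reduced to (start, duration) pairs in seconds; that input shape is hypothetical, since the real log format is not shown.

```python
def max_concurrent(requests):
    """Return the peak number of requests in flight at once.

    `requests` is an iterable of (start_seconds, duration_seconds) pairs.
    """
    events = []
    for start, duration in requests:
        events.append((start, 1))              # request begins
        events.append((start + duration, -1))  # request ends
    # Sort by time; at equal times, ends (-1) are processed before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    in_flight = peak = 0
    for _, delta in events:
        in_flight += delta
        peak = max(peak, in_flight)
    return peak


# Hypothetical example: three requests, two of which overlap.
print(max_concurrent([(0, 10), (5, 10), (12, 8)]))
```

If the peak stays low while the dumps still show ~50 busy threads, that points away from slow requests piling up and toward something like the garbage collection pressure described above.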

IIS 8 Application pool recycle vs SignalR 2.3.0

I have a website in IIS 8.5.9600.16384, we communicate with thousands of mobile devices through cyclic synchronisation and through SignalR 2.3.0.
This morning we had an application pool reset during working hours, which caused the SignalR to call "OnReconnect" of all our mobile devices at the same time.
I thought that IIS started new processes first and then killed the old ones, so there would be no downtime.
Can somebody tell me exactly what happens on the SignalR side when IIS recycles its application pool? And in which cases can there be a connection downtime (e.g. if the server is busy)?
Edit: The application pool was recycled by IIS because of the "time limit". The IT team will change this setting so that the application pools recycle every day at night, when it will have a lower impact on our applications.
A worker process with process id of '8720' serving application pool 'DefaultAppPool' has requested a recycle because the worker process reached its allowed processing time limit.
Also confirmed that disallowOverlappingRotation is not set to True. Any hint would help.
A few years later, I'm still having problems with application pool recycles and SignalR. We occasionally see thousands of SignalR reconnections while the application pool recycle occurs, opening more than 60k TCP/IP ports and causing a crash in IIS.
We managed to have it run "okay" for quite some time but it still crashes. Any hint would help. thanks
I'd first identify how IIS was reset. If you experienced a crash or performed an IISReset, the processes would be down before a new one stood back up. If on the other hand you configured AppPool recycling, then the overlapping processes should occur as you mention. I would check the System Event Log for recycling messages. Note that not all recycle reasons are logged by default.
You may also check to make sure disallowOverlappingRotation is not set to True.
Specifies whether the WWW Service should start another worker process to replace the existing worker process while that process is shutting down. The value of this property should be set to true if the worker process loads any application code that does not support multiple worker processes.
https://learn.microsoft.com/en-us/iis/configuration/system.applicationhost/applicationpools/add/recycling/
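When going through the System Event Log for recycling messages, it can help to pull the process id, pool name and reason out of each entry. The following is a minimal Python sketch that assumes messages follow the wording of the event quoted in the question; other recycle reasons may be phrased differently, and the function name is only illustrative.

```python
import re

# Pattern based on the wording of the event text quoted in the question.
RECYCLE_RE = re.compile(
    r"A worker process with process id of '(?P<pid>\d+)' "
    r"serving application pool '(?P<pool>[^']+)' "
    r"has requested a recycle because (?P<reason>.+?)\.?$"
)


def parse_recycle(message):
    """Return pid, pool and reason from a recycle message, or None."""
    m = RECYCLE_RE.search(message.strip())
    if m is None:
        return None
    return {"pid": int(m["pid"]), "pool": m["pool"], "reason": m["reason"]}


event_text = (
    "A worker process with process id of '8720' serving application pool "
    "'DefaultAppPool' has requested a recycle because the worker process "
    "reached its allowed processing time limit."
)
print(parse_recycle(event_text))
```

Grouping the parsed reasons by pool shows quickly whether the reconnect storms line up with time-limit recycles or with something else.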

Web Api Requests Queueing up forever on IIS (in state: ExecuteRequestHandler)

I'm currently experiencing some hangs in a production environment, and after some investigation I'm seeing a lot of requests queued up in the worker process of the Application Pool. The common thing is that every request that is queued for a long time is a Web API request. I'm using both MVC and Web API in the app.
The requests are being queued for about 3 hours, when the application pool is recycled they immediately start queueing up.
They are all in ExecuteRequestHandler state
Any ideas on where I should continue digging?
Your requests can be stalled for a number of reasons:
they are waiting on I/O operation e.g database, web service call
they are looping or performing operations on a large data set
cpu intensive operations
some combination of the above
In order to find out what your requests are doing, start by getting the urls of the requests taking a long time.
You can do this in the cmd line as follows
c:\windows\system32\inetsrv\appcmd list requests
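If the list is long, a small script can pick out the slowest requests. This Python sketch assumes each output line looks like the sample shown in the code (a REQUEST id followed by url, time, client, stage and module); that layout is an assumption about typical appcmd output and may differ on your server, and the sample URLs and addresses are made up.

```python
import re

# Assumed line layout for `appcmd list requests` output.
REQUEST_RE = re.compile(
    r'REQUEST "(?P<id>[^"]+)" \(url:(?P<verb>\S+) (?P<url>.+?), '
    r'time:(?P<ms>\d+) msec, client:(?P<client>[^,]+), '
    r'stage:(?P<stage>[^,]+), module:(?P<module>[^)]+)\)'
)


def slow_requests(appcmd_output, min_ms):
    """Return (elapsed_ms, stage, verb, url) tuples, slowest first."""
    found = []
    for line in appcmd_output.splitlines():
        m = REQUEST_RE.search(line)
        if m and int(m["ms"]) >= min_ms:
            found.append((int(m["ms"]), m["stage"], m["verb"], m["url"]))
    return sorted(found, reverse=True)


# Hypothetical captured output.
sample_output = (
    'REQUEST "fb0000008000000e" (url:GET /api/orders?id=1, time:10935000 msec, '
    'client:10.0.0.5, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)\n'
    'REQUEST "fb0000008000000f" (url:GET /home/index, time:120 msec, '
    'client:10.0.0.6, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)'
)

for elapsed, stage, verb, url in slow_requests(sample_output, min_ms=60000):
    print(elapsed, stage, verb, url)
```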
If it's not obvious from the URLs and from looking at the code, you need to take a process dump of w3wp.exe on the server. Once you have a process dump, you will need to load it into windbg in order to analyze what's taking up all the CPU cycles. Covering windbg is a big topic, but here's briefly what you need to do:
load the SOS dll (managed debug extension)
call the !runaway command to get a list of long running threads
dive into a long running thread by selecting it and calling the !clrstack command
There are many blogs on using windbg. Here is one example. A great resource on analyzing these types of issues is Tess Ferrandez's blog.
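The windbg steps can also be scripted with cdb, the console debugger that ships alongside windbg. The Python sketch below only builds the command line; it assumes a .NET 4.x dump (so `.loadby sos clr` applies), that cdb.exe is on the PATH, and it dumps the managed stack of every thread rather than selecting one by hand. The dump file name is hypothetical.

```python
def cdb_args(dump_path, cdb_exe="cdb.exe"):
    """Build a cdb command line that runs the basic hang triage commands."""
    commands = [
        ".loadby sos clr",  # load SOS from the same directory as the CLR
        "!runaway",         # show CPU time consumed per thread
        "~*e !clrstack",    # print the managed stack of every thread
        "q",                # quit the debugger when done
    ]
    # -z opens a dump file, -c runs the given debugger commands at startup.
    return [cdb_exe, "-z", dump_path, "-c", "; ".join(commands)]


# Hypothetical dump file name.
print(cdb_args("w3wp.dmp"))
```

The resulting list can be passed to `subprocess.run` on the machine that holds the dump, with the output redirected to a file for later reading.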
This is a really long shot without having first hand access to your system but try and check the Handler mappings in IIS Manager gui for your WebApi. Compare it with IIS settings of your DEV or any other Env where it works.
If this isn't the issue, then do a comparison of all other IIS settings for that app.
Good luck.

Idle Time-out Settings for an Application Pool

Can someone explain this sentence from MSDN to me?
Idle time-out can be helpful in the following situations:
The server has a heavy processing load.
Is the idle timeout for the w3wp.exe process as a whole, or does each user connected to the website have a nested process inside w3wp.exe, with the idle time applying to that nested process? If it is the idle timeout for w3wp.exe as a whole, then what does the MSDN sentence mean?
It prevents an application pool worker process from hanging around when no users are hitting the web pages hosted by the worker process. When the users stop hitting the web site for a while, IIS stops the process.
This can be annoying if you have an expensive setup/teardown process for the application, such as populating a cache.
The idle timeout is per application pool. You can observe the running pools in IIS, or in Visual Studio via Tools > Attach to Process: click the Process column header to bring any running w3wp.exe processes to the top.
You can set the timeout for each app pool in IIS7+ in Advanced Settings... Process Model section.
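To make the behaviour concrete, here is a simplified Python model of the idle timeout. It treats each request as an instant in minutes and assumes the default 20-minute timeout; the real worker process also accounts for requests still in progress, so treat this only as an illustration.

```python
def idle_shutdowns(request_times, idle_timeout_minutes=20):
    """Return the minutes at which an idle worker process would be stopped."""
    shutdowns = []
    times = sorted(request_times)
    for current, following in zip(times, times[1:]):
        # A gap longer than the timeout means the worker stopped in between.
        if following - current > idle_timeout_minutes:
            shutdowns.append(current + idle_timeout_minutes)
    if times:
        # With no further traffic, the worker stops after the last request.
        shutdowns.append(times[-1] + idle_timeout_minutes)
    return shutdowns


# Requests at minutes 0, 5, 40 and 45: the worker stops at 25 and again at 65.
print(idle_shutdowns([0, 5, 40, 45]))
```

The request at minute 40 is the one that pays the startup cost, which is where an expensive cache population becomes noticeable.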

Sharepoint W3WP.EXE Process Consuming 100% CPU - Corrective Action?

We have a Windows Server 2003 web server, and on that server runs about 5-6 top level Sharepoint sites, with a different application pool for each one.
There is one W3WP process that keeps pegging 100% for most of the day (happened yesterday and today) and it's connected (found by doing "Cscript iisapp.vbs" at the command line and matching ProcessID) to a particular Sharepoint site...which is nearly unusable.
What kind of corrective action can I take? These are the following ideas I had
1) Stopping and restarting the Web Site in IIS - For some reason this didn't stop the offending W3WP process??? Any ideas why not?
2) Stopping and restarting the associated Application Pool.
3) Recycling the associated Application Pool.
Any of those sound like the right idea? If not what are some good things to try? I can't do an iisreset since I don't want to alter service to the other, much more heavily used, Sharepoint sites.
If I truly NEED to do some diagnostic work please point me in the right direction. I'm not the Sharepoint admin guy (he's out of town so I'm filling in even though I'm just a developer) but I'll do my best.
If you need any information just let me know and I'll look it up (slowly though, as that one process is pegging the entire machine).
It's not an IISReset that you need. You have a piece of code that is running amok with your memory. Most likely it's not actually a CPU problem but a paging problem. I've encountered this a few times with data structures in memory that grow too large to page in/out effectively and eventually the attempt to page data just begins consuming everything. The steps I would recommend are:
1) Go get the IIS Debug Diagnostics tools. And learn how to use them.
2) If possible, move the session state from InProc to a state server or a SQL Server (since this requires serialization of all classes that go into session, this may not be possible). This will help alleviate some process-related memory issues.
3) Go to your application pool and adjust the number of worker processes upward. Remove Rapid fail protection (this will allow the site to continue serving pages even if rapid catastrophic errors occur).
The IIS debug diagnostics will record a LOT of data, but you can specify specific "catch" alerts that will detect hangs, excessive cpu usage etc. It will capture gigs of data, so be ready for a long wait when attempting to view the logs.
Turns out someone tried to install some features that went haywire.
So he wrote a stsadm script to uninstall those features
Processor was still pegging.
I restarted the IIS Application Pool for that IIS process, didn't fix it.
So then I restarted IIS for that site and that resolved the processor issue.
