Following the steps in this article, I am trying to turn on preloadEnabled and startMode for my applications (about 20 WCF services, each in its own app pool, plus 3 client sites): https://learn.microsoft.com/en-us/iis/get-started/whats-new-in-iis-8/iis-80-application-initialization
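For context, the two settings from that article are the app pool's startMode and the application's preloadEnabled flag; set via appcmd they would look something like this (pool and site names are placeholders for your own):

    REM Launch the pool's worker process as soon as WAS starts
    %windir%\system32\inetsrv\appcmd set apppool "MyWcfPool" /startMode:AlwaysRunning

    REM Warm up the application whenever its worker process starts
    %windir%\system32\inetsrv\appcmd set app "Default Web Site/MyService" /preloadEnabled:true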
Now when I do an iisreset, WAS dies immediately. If I go back and start WAS manually and then W3SVC (the World Wide Web Publishing Service), everything starts the way I want it to.
Is there any hope for being able to do an iisreset ever again?
Doing it with /stop and then /start works fine. It's literally just the immediate stop/start turnaround of a bare iisreset with no flags that's an issue.
The event ID for the WAS failure was 5011, and the error code was 8007006d (the HRESULT wrapper for Win32 error 109, "The pipe has been ended").
IIS v 10.0.15063.0
Windows 10 1703 (15063.726)
For the future internet visitor to this question: what we discovered in our testing and in working through the bug was that a background thread related to our internal pub/sub wasn't shutting down cleanly. That caused the process to hang before the reload had a chance to happen, which in turn corrupted WAS.
Starting the dependency stopped the issue for us during the next round of testing, and now we get to go research a new bug internally. Fun times.
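For anyone wondering what "shutting down cleanly" means here, below is a minimal sketch (all names hypothetical, not our actual code) of a pub/sub listener thread that exits cooperatively instead of blocking forever and hanging the process:

    using System;
    using System.Threading;

    // Illustrative only: a background listener that honors shutdown.
    public sealed class PubSubListener : IDisposable
    {
        private readonly CancellationTokenSource _cts = new CancellationTokenSource();
        private readonly Thread _thread;

        public PubSubListener()
        {
            // IsBackground = true means this thread alone can never keep the process alive.
            _thread = new Thread(Listen) { IsBackground = true };
            _thread.Start();
        }

        private void Listen()
        {
            while (!_cts.IsCancellationRequested)
            {
                // Wake up periodically to check for shutdown rather than blocking indefinitely.
                if (_cts.Token.WaitHandle.WaitOne(TimeSpan.FromSeconds(1)))
                    break;
                // ... receive and dispatch messages here ...
            }
        }

        public void Dispose()
        {
            _cts.Cancel();
            _thread.Join(TimeSpan.FromSeconds(5)); // bounded wait so shutdown can't hang
            _cts.Dispose();
        }
    }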
I have implemented a Windows Service in .NET 4 and C# using Visual Studio 2015. In the OnStart method, my service creates a "Worker Thread" which loops, blocking on an AutoResetEvent (connected to a Timer). The "Worker Thread" also has a CancellationTokenSource that it manually polls to check whether it should stop working. The OnStop method first Cancels with the CancellationTokenSource and then signals with the AutoResetEvent, waking up the Worker thread which notices that "cancel" has happened, and which then exits.
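For reference, a minimal sketch of that pattern as described (names and timer details are illustrative, not the actual code):

    using System;
    using System.ServiceProcess;
    using System.Threading;

    public class MyService : ServiceBase
    {
        private readonly AutoResetEvent _signal = new AutoResetEvent(false);
        private readonly CancellationTokenSource _cts = new CancellationTokenSource();
        private Thread _worker;
        private Timer _timer;

        protected override void OnStart(string[] args)
        {
            // The timer wakes the worker periodically by signaling the event.
            _timer = new Timer(_ => _signal.Set(), null, 0, 5000);

            _worker = new Thread(() =>
            {
                while (true)
                {
                    _signal.WaitOne();                 // block until the timer (or OnStop) signals
                    if (_cts.IsCancellationRequested)  // manual poll, as described above
                        return;
                    DoWork();                          // hypothetical unit of work
                }
            });
            _worker.Start();
        }

        protected override void OnStop()
        {
            _cts.Cancel();  // first cancel...
            _signal.Set();  // ...then signal, waking the worker so it notices and exits
            _worker.Join(TimeSpan.FromSeconds(10));
            _timer.Dispose();
        }

        private void DoWork() { /* ... */ }
    }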
This Windows service works fine on Win7. I've installed it on Win10 a few times and it worked fine, until yesterday, when I saw a logfile showing that the worker thread had, as expected, blocked on the AutoResetEvent, but that the worker thread never got to the next statement in my program. It never woke up and never executed another statement. 94 seconds after it blocked, the service ended.
Note that in this scenario my OnStop method was not called.
The process as a whole had not been killed - there's a thread which comes in (threadID 1) presumably from the SCM or the OS, and which blocks until my service dies - that thread logged a few messages on the way out.
Here's my question: are there conditions where the Windows Service Control Manager (SCM) feels it necessary to actually kill off application threads? Can anyone explain why my thread died?
Short Answer to the Original Question
I discovered that restarting the machine caused my thread to die (interesting that the Windows-allocated thread didn't die, it just exited - I guess that's how the SCM cleans up services when it is told to shut down).
Longer Answer - Now I have a New Question
I didn't realize that I was barking up the wrong tree. My client app was saying the service was unavailable and I assumed it was because of what I described above - because it was getting killed off.
Not the problem - the problem was that once the service was stopped (because of shutdown), it was not getting restarted again!
So why didn't my .NET 4 Automatic service get automatically started, in Win10? That's a different question.
The New Answer
Since posting my original question, I have found several posts that answer the new question:
Automatic Windows 10 service doesn't start up
Automatic Services don't start automatically after windows restart
Automatic Services not starting on Win10 upgrade machine
So the diagnosis was that Win10 was doing some combination of slowing down the startup of my .NET 4 service and then punishing the service for its slow startup by killing it before it could start executing my code.
I did find System Event log entries that support this.
The solution for me turned out to be changing my "start type" from "Automatic" to "Automatic (Delayed)".
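For an already-installed service, the same change can be scripted with sc.exe (the service name is a placeholder; the space after "start=" is required):

    sc config "MyService" start= delayed-auto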
Some relevant links for me:
Automatic vs Automatic Delayed Start
Set service StartType in WiX
I hope all this helps someone else out there.
I have an application in a production environment (Windows Server 2012/IIS 8) which is load balanced.
Recently, out of nowhere, the website's app pool suddenly started getting disabled. The Windows System event log recorded the following error message from the Resource-Exhaustion-Detector ...
Application Pool 'x' is being automatically disabled due to a series of failures in the process(es) serving that application pool.
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: w3wp.exe (6604) consumed 5080641536 bytes, w3wp.exe (1572) consumed 477335552 bytes, and w3wp.exe (352) consumed 431423488 bytes.
Anyone got any idea how I can figure out what is happening? We've never come across this issue before, and the application has been running for a good couple of years.
Also, this isn't something that happens regularly; it seems to happen once every day or so, and even then at a random time. The virtual memory was initially 4GB, but because of the issue above we increased it to 8GB. Recently it spiked at about 6.8GB of the 8GB, which it has no reason to do.
Any help would be really appreciated!
The answer here is straightforward: you clearly have two issues.
1- You have a serious bug in your process/code that happens intermittently. You need to debug it to find out how and when it happens, or at least run ProcDump and leave it listening on the server against the w3wp process until an exception happens; then analyze the dump to find where the code gets stuck and consumes that memory. Otherwise, just debug the code and review the changes made in the last few months (not days). See the ProcDump examples after this list.
2- The application pool gets stopped because it is configured (by default) to be disabled after a certain number of repeated failures. That's normal behavior, but as I said, the main issue is not the application pool itself; it's inside the process.
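As mentioned in point 1, ProcDump (from Sysinternals) can sit and wait for the failure. A couple of illustrative invocations (the paths and memory threshold are examples only):

    REM Write a full dump when w3wp's committed memory exceeds ~4 GB
    procdump -ma -m 4096 w3wp.exe c:\dumps\w3wp_highmem.dmp

    REM Or write a full dump when an unhandled exception occurs
    procdump -ma -e w3wp.exe c:\dumps\w3wp_crash.dmp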
Please let me know if you need further explanation or help on this matter.
In the last couple of weeks my site has hung. Connections increase but never seem to release. Once it hits 700+ current connections (using the performance tool), the whole site hangs and I have no choice but to do an iisreset to get it working again. Normally it's around 100 concurrent connections at peak time. There are no errors or warnings in the event log when it stops releasing, so that isn't helpful.

This problem started happening after copying new DLLs over the old ones to do a site update. I have a single server, so I have no choice but to copy over the live site. But in today's case it was fine after the update, and then the problem happened two hours later. It's a .NET 4.5 site on Windows 2008 R2 64-bit.

Is there a way I can find out what's causing this, like other log files someplace, or something I should try when it happens? What I have tried: recycling the app pool (doesn't help), turning the app pool off and back on (doesn't turn on, gives an exception), restarting the site (doesn't help), and iisreset (works every time).
Usually some requests are "stuck" when you see these symptoms. An app pool recycle or a site restart will, in general, wait for the pending requests to finish (a graceful exit), while iisreset will actually kill the w3wp.exe process after 20 (or 30?) seconds if it doesn't exit on its own (a non-graceful exit). That's why iisreset works for you.
Listing the active requests (appcmd.exe list request /?) should give you a clue about why this is happening.
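For example (the /elapsed filter is in milliseconds, so this shows only requests that have been executing for more than 30 seconds):

    %windir%\system32\inetsrv\appcmd list requests /elapsed:30000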
Our website is in .NET, but with some old ASP and 32-bit libraries in it as well. It had been working fine for a while (2 years), but for the past month we have seen the following error on our IIS7 server, which we have been unable to track down and fix:
"Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module kernel32.dll, version 6.0.6001.18215, time stamp 0x4995344f, exception code 0xe053534f, fault offset 0x0002f328, process id 0x%9, application start time 0x%10."
We are able to reproduce the error:
One of our .ASPX pages starts loading, executing code and queries (we have Response.Flush() all over the page to track where the code breaks); then it suddenly stops and we get the above error in IIS.
The page stops loading, and without the Response.Flush() calls it does not redirect to our error.aspx page (as configured in web.config).
The error does NOT happen all the time. Sometimes it happens 3 times in a row; then it works fine for 15 minutes non-stop, with a proper redirection to error.aspx.
The error we get then is a classic: "Either BOF or EOF is True, or the current record has been deleted."
When the error occurs, the page hangs, and all other sessions on the same computer, from any browser, have hanging web pages as well (BTW, we only allow 1 worker process while we are testing). From other computers, the site loads fine.
I can recycle the application pool, kill w3wp.exe, and restart IIS; nothing will do. The only way to successfully load the page again is to restart MS SQL, which handles our session state. I don't know why this is, but we guessed that the session cookies on the users' browsers point to a thread which was not terminated properly (due to the above crash), and IIS is waiting for it to terminate before processing more code (?). If someone can explain this better, that would be really helpful. Is there a timeout we can set to "terminate" threads? Is it an MS SQL-related issue?
I have also looked at the private and virtual memory usage, because I think our code is not the most efficient and I am certain we still have memory leaks. However, I have seen the page crash even when both private and virtual memory were still quite low (under 100MB each).
I have used Debug Diag and WinDbg as described here: http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx, but we have not been able to get WinDbg working; that is what we are trying to do at the moment.
If someone could help us or point us toward the right direction that would be really great, thank you.
"Either BOF or EOF is True, or the current record has been deleted" means the table is empty and you are attempting to do a MoveNext. So check for eof before you do any moves.
IIS is notorious for throwing kernel errors in w3wp.exe like this one. All your session-state errors are just symptoms of the crashed process. Multiple app pools won't help much; they just spread the error around.
I'd wager it is SQL deadlocks caused by your user environment changing. A deadlock causes a roughly 10-second lag while SQL tries to determine which query to kill off: one wins, one loses. The loser gets back a pointer to an unexpectedly empty table, you try a move, and the crash follows. You could maybe point your DB at an ODBC connection and turn on tracing, or figure out a way to get SQL to log it.
I had all the same symptoms as above, in Perl. I was able to write a wrapper function that performed all SQL queries and logged every statement, its parameters, and any errors to disk to track down the problem. It was deadlocks. We were then able to code in auto-retry, and eventually we reordered the queries and the scanned columns to eliminate the deadlocks.
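A sketch of that kind of auto-retry in C# with ADO.NET (SQL Server reports a deadlock victim as error 1205; names and retry counts are illustrative):

    using System;
    using System.Data.SqlClient;
    using System.Threading;

    static class Db
    {
        // Run a command, retrying if SQL Server kills it as a deadlock victim (error 1205).
        public static void ExecuteWithRetry(string connString, string sql, int maxAttempts = 3)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    using (var conn = new SqlConnection(connString))
                    using (var cmd = new SqlCommand(sql, conn))
                    {
                        conn.Open();
                        cmd.ExecuteNonQuery();
                        return;
                    }
                }
                catch (SqlException ex) when (ex.Number == 1205 && attempt < maxAttempts)
                {
                    Thread.Sleep(100 * attempt); // brief backoff before retrying
                }
            }
        }
    }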
It's entirely possible one of your referenced/linked assemblies somewhere has randomly gone corrupt on disk (it can happen). Can you try to replicate the problem on a new, clean machine with the same specs and fresh installs of the latest drivers you're using?
I solved a mysterious problem that took me months to isolate this way. Clean, new machines with the same specs and prerequisite drivers worked just fine; only some older machines with the same specs were failing consistently. I ended up uninstalling everything (IIS, ASP.NET, .NET, database and client) and starting from scratch. The root cause, when I isolated it, was that the DB client driver was corrupt on the older machines (and all the older machines were clones of each other, so I assume they were cloned after the corruption occurred), and it seemed to be messing with the .NET memory space even when I wasn't calling it directly. I have yet to even reply to my "help me debug this monster" post with this answer, because I doubted it would ever help anyone.
We started receiving this error after installing Windows updates on a Windows Server 2008 R2 machine. The Windows Process Activation Service (WAS) installs some additional site bindings that caused issues for our setup.
We removed the net.tcp, net.pipe, net.msmq, and msmq.formatname bindings from our website and no longer got the faulting-application exception.
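If you want to script that, the bindings can be removed with appcmd; a sketch, assuming the default bindingInformation values (verify yours first with "appcmd list site" before removing anything):

    appcmd set site "Default Web Site" /-bindings.[protocol='net.tcp',bindingInformation='808:*']
    appcmd set site "Default Web Site" /-bindings.[protocol='net.pipe',bindingInformation='*']
    appcmd set site "Default Web Site" /-bindings.[protocol='net.msmq',bindingInformation='localhost']
    appcmd set site "Default Web Site" /-bindings.[protocol='msmq.formatname',bindingInformation='localhost']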
This is probably an edge case, but just in case someone comes here using MvcMailer: I was getting this same error due to the .SendAsync() method on the mailers.
I switched them all to .Send() and the crashing stopped.
See this SO answer for ways to use the mailer asynchronously and avoid the crash (allegedly; I did not personally implement it).
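For illustration, the change was essentially this (mailer and method names are hypothetical):

    // Before: intermittently crashed w3wp
    // new UserMailer().Welcome(user).SendAsync();

    // After: synchronous send, crashes stopped
    new UserMailer().Welcome(user).Send();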
We have a Windows Server 2003 web server, and on that server run about 5-6 top-level SharePoint sites, each with its own application pool.
There is one w3wp.exe process that keeps pegging 100% CPU for most of the day (it happened yesterday and today). By running "cscript iisapp.vbs" at the command line and matching the ProcessID, I found it is tied to a particular SharePoint site... which is nearly unusable.
What kind of corrective action can I take? Here are the ideas I had:
1) Stopping and restarting the web site in IIS. For some reason this didn't stop the offending w3wp.exe process??? Any ideas why not?
2) Stopping and restarting the associated Application Pool.
3) Recycling the associated Application Pool.
Do any of those sound like the right idea? If not, what are some good things to try? I can't do an iisreset, since I don't want to interrupt service to the other, much more heavily used, SharePoint sites.
If I truly NEED to do some diagnostic work, please point me in the right direction. I'm not the SharePoint admin guy (he's out of town, so I'm filling in even though I'm just a developer), but I'll do my best.
If you need any information, just let me know and I'll look it up (slowly, though, as that one process is pegging the entire machine).
It's not an iisreset that you need. You have a piece of code that is running amok with your memory. Most likely it's not actually a CPU problem but a paging problem. I've encountered this a few times with data structures in memory that grow too large to page in/out effectively; eventually the attempts to page data begin consuming everything. The steps I would recommend are:
1) Go get the IIS Debug Diagnostics tools, and learn how to use them.
2) If possible, move the session state from InProc to a state server or a SQL server (since this requires serialization of all classes that go into session, it may not be possible). This will help alleviate some process-related memory issues; see the web.config sketch below.
3) Go to your application pool and adjust the number of worker processes upward, and remove rapid-fail protection (this will allow the site to continue serving pages even if rapid catastrophic errors occur).
The IIS Debug Diagnostics tool will record a LOT of data, but you can define specific "catch" alerts that detect hangs, excessive CPU usage, etc. It will capture gigs of data, so be ready for a long wait when attempting to view the logs.
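A minimal web.config sketch of the state-server change from step 2 (the connection string shown is the default for the ASP.NET State Service; adjust for your environment):

    <configuration>
      <system.web>
        <!-- Session state lives outside the worker process; requires the
             ASP.NET State Service to be running and all session types
             to be [Serializable]. -->
        <sessionState mode="StateServer"
                      stateConnectionString="tcpip=localhost:42424"
                      timeout="20" />
      </system.web>
    </configuration>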
Turns out someone had tried to install some features that went haywire.
So he wrote an stsadm script to uninstall those features.
The processor was still pegged.
I restarted the IIS application pool for that IIS process; that didn't fix it.
So then I restarted IIS for that site, and that resolved the processor issue.