IIS7: Faulting application w3wp.exe, what is the root cause of these crashes? - iis

Our Website is in .NET but with some old ASP and 32bits libraries too in it. It had been working fine for a while (2 years). But for the past month, we have seen the following error on our IIS7 server, which we have been unable to track down and fix:
"Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module kernel32.dll, version 6.0.6001.18215, time stamp 0x4995344f, exception code 0xe053534f, fault offset 0x0002f328, process id 0x%9, application start time 0x%10."
We are able to reproduce the error:
One of our .ASPX pages starts loading, executing code and queries (we have response.flush() all over the page to track where the code breaks), then it suddenly stops and we get the above error in IIS.
The page stops loading and, without the response.flush(), it's not redirecting to our error.aspx page (as configured in web.config)
The error does NOT happen all the time. Sometimes, it happens 3 times in a row, then it's working fine for 15 minutes non-stop with a proper redirection to error.aspx.
The error we get then is a classic: "Either BOF or EOF is True, or the current record has been deleted."
When the error occurs, the page hangs and all other session on the same computer from any browsers have hanging web pages as well (BTW, we only allow 1 worker process while we are testing). From other computers, the site loads fine.
I can recycle the Application Pool, kill w3wp.exe, restart IIS. Nothing will do. The only way to successfully load the page again is to Restart MS SQL which handles our Session States. I don't know why this is, but we guessed that the Session Cookies on the users browsers points to a thread which was not terminated properly (due to the above crash) and IIS is waiting for it to terminate to process more code (?). If someone can explain this better, that would be really helpful. Is there a timeout which we can set to "terminate" threads? Is it a MS SQL related issue?
I have also looked at the Private and Virtual Memory usages, because I think our code is not the most effective and I am certain we have remaining memory leaks. However, I saw the page crash even though both Private and Virtual Memories were still quite low (under 100MB each).
I have used Debug Diag and WinDbg as indicated here: http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx, but we are not able to make windbg work, this is what we are trying to do at the moment.
If someone could help us or point us toward the right direction that would be really great, thank you.

"Either BOF or EOF is True, or the current record has been deleted" means the table is empty and you are attempting to do a MoveNext. So check for eof before you do any moves.
IIS is notorious for throwing kernel errors in w3wp.exe like this one. All your errors in session state are just symptoms of the crashed process. Multiple APP pools won't help much - they just spread the error around.
I''d wager it is SQL deadlocks due to your user environment changing. This will cause a 10-second lag as SQL tries to determine which query to kill off. One wins, one loses. The loser gets back a pointer to an unexpectedly empty table and you try a move and subsequent crash. You maybe could point your DB to an ODBC connection and turn on tracing, or figure out a way to get SQL to log it.
I had all the same symptoms as above in Perl. I was able to make a wrapper fn() to do all SQL queries and log all sql, + params and any errors to disk to track down the problem. It was deadlocks, then we were able to code in auto-retry, and eventually we recoded the query order and scanned columns to eliminate the deadlocks.

It's entirely possible one of your referenced/linked assemblies somewhere has randomly gone corrupt (it can happen) on disk. Can you try a replicate the problem on a new, clean machine with the same stats, fresh installs of the latest xyz drivers you're using?
I solved a mysterious problem that took me months to isolate this way. It seemed clean, new machines with the same specs and prerequired drivers would work just fine - only some older machines with the same specs were failing consistently. I ended up uninstalling everything (IIS, ASP.NET, .NET, database and client) and starting from scratch. The end cause when I isolated it was that the db client driver was corrupt on the older machines (and all the older machines were clones of each other, so I assume they were cloned after the corruption occured), and it seemed to be messing with the .NET memory space even when I wasn't calling it directly. I have yet to even reply to my "help me debug this monster" post with this answer because I doubted it would ever help anyone.

We started receiving this error after installing windows updates on a Windows Server 2008R2 machine. Windows Process Activation Service (WAS) installs some additional site bindings that caused issues for our setup.
We removed net.tcp, net.pipe, net.msmq, and msmq.formatname bindings from our website and no longer got the faulting application exception.

This is probably an edge case, but just in case someone is coming here and they are using MVCMailer , I was getting this same error due to the .SendAsync() method on the mailers.
I switched them all to .Send() and the crashing stopped.
See this SO answer for ways to use the mailer async and avoid the crash (allegedly, I did not personally implement it)

Related

Application pool disabling

I have an application in the Production environment which is Windows Server 2012/IIS 8 and is load balanced.
Recently out of nowhere, the website app pool suddenly started gettig disabled. The System Windows Logs logged the following error message by the Resource-Exhaustion-Detector ...
Application Pool 'x' is being automatically disabled due to a series of failures in the process(es) serving that application pool.
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: w3wp.exe (6604) consumed 5080641536 bytes, w3wp.exe (1572) consumed 477335552 bytes, and w3wp.exe (352) consumed 431423488 bytes.
Anyone got any idea how I figure out what is happening? We've never come across this issue before and the application has been running for a good couple of years.
Also, this isn't something that happens regularly but instead seems to happen one every day or so, and even that is at a random time. The Virtual Memory was initially 4GB but because of the issue above, we increased it to 8GB. Recently it spiked at using about 6.8GB out of 8GB, which it has no reason to do so.
Any help would be really appreciated!
The answer is easy here, obviously and certainly you have two issues here
1- You have a serious bug in your process/code that happens intermittently "you need to debug it to find how/when that happens" or at least run a ProcDump
such that you keep it listening on the server on the process W3WP till an exception happens and then analyze this dump to find where the code get stuck and consume that memory/otherwise just debug the code and see what changes were made in last few months "not days"
2- the application get stopped because you have configured/it is configured by default to get disabled break after a certain number of failure repeats, and that's a normal behavior but the main issue as I said is not the application pool itself, its inside the process
please let me know if you need a further explanation or help on this matter

IIS 6 App Pools not responding to multiple requests / not running multi-threaded

I have a classic ASP application that has been stable for years and now we're having all kinds of problems with it. After moving the app between machines and wiping the original so we could have a fresh install of windows, we've come to the following "symptom". The app pools do not appear to allow for multiple simultaneous requests. Here's what we are seeing:
The app runs normally for most people, but when someone within one of the app pools accesses a long-running script (usually one with lots of DB access) all of the other users in the pool must wait for that script to complete. Once the script completes, everyone else's requests run. This initially made us suspect the DB connection string or something.
UNTIL... we noticed also that large file uploads into our system also cause the app pool to stop responding. What's interesting about this is that we're using the SAFileup COM+ object to do our uploads, which has a progress display in a pop-up window. When you go to upload the file, the progress display comes up, but then never refreshes to show upload progress. If you wait it out, however, the file will eventually upload and the other pending requests will process as normal.
Our app pools are in the default configuration, using the IWAM account to launch. I checked to ensure that the IWAM account has all the appropriate permissions. It does.
We've tried a variety of DB connection strings, none solved the problem (though I'm thinking it's not the DB connection string). Just in case someone thinks it is, here's our connection string: "Provider=SQLNCLI;Trusted_Connection=yes;Server=(local);Database=demo;". It couldn't be simpler. This string was previously not a problem.
I fussed with the web gardens thing and it does, indeed, make the system respond to multiple requests, but each worker thread in the garden has its own session state which causes our users to get booted when their request gets randomly assigned to a new worker thread. Only having a single worker process in the garden was never an issue before anyway.
I've used SQL Profiler and sp_who2 to see if during the long-running scripts there are any deadlocks or blocks on the SQL Server. There are not.
The issues initially started after we had installed some patches from Microsoft. We wiped a machine clean and installed Win2k3 server, then SP2, and then didn't patch anymore after that. The problem remained, so it doesn't appear to have been a patch.
I'm pretty much at a loss now... does anyone have any experience with similar issues? If so, how were they fixed?
Check that you don't have ASP debugging enabled on the server. This will force the ASP script engine to run on a single thread.
Sounds like an limit on the number of concurrent incoming requests to the IIS or the Windows Server..
Check out http://blogs.msdn.com/b/david.wang/archive/2006/04/12/howto-maximize-the-number-of-concurrent-connections-to-iis6.aspx and http://forums.iis.net/p/1152112/1880908.aspx#1880908 on how to tweak the settings.

Creating objects suddenly begins failing after they have been loaded in memory successfully

Behavior:
Application is loaded and being used as expected.
Suddenly, a particular DLL can no longer be loaded. The error message is:
ActiveX component cannot create object.
In each case, the object had been created successfully many times before failure. All objects are marked for "retain in memory".
This error is cleared when the application pool is recycled. It may be hours or months before it is seen again.
Issue has happened within two hours of a refresh, as well as never happened in months of uptime.
Issue has happened with hundreds of simultaneous users (heavy usage) and also with 1-3 users.
While the issue is occurring, the process running that application pool cannot create the object that is failing. However it can create any other objects. Memory, CPU, and other resources all remain at normal usage. In addition, other processes (such as a stand-alone exe) can successfully create the object.
The first instance of the issue appeared in mid 2008. There have been less than fifty instances since then, despite a pool of hundreds of servers for it to occur on. All instances except one have failed on the same DLL.
DLL Failure Info:
most common - generic data structure implementing a b-tree, has no references other than to its interface. Code consists of arrays and one use of the vb6 Event functionality. The object has not been changed in any way since 2005.
one-time - interop to a .NET module. the failure is occurring when trying to create the interop object, not the .NET object. This object is updated a few times each year.
Application Environment:
IIS hosted application
VB6, classic ASP, some interop to minor .NET components
Windows Server 2003 / Windows Server 2008 (both have independently had the problem)
Attempts to Reproduce:
Using scripts (and real-life humans) to run the same end-user workflows that our logs reported the days before the issue occurred.
Using scripts to create/destroy suspected objects as fast as possible from multiple simultaneous sessions.
Wild speculation.
No intentional success, but it does manifest randomly on the servers on its own.
Troubleshooting:
Code reviews
Test harnesses to investigate upper limits of object creation / destruction
Verification of ability to create object outside of the process experiencing the issue
Monitoring of resources over time on servers under load
Review of IIS, error, and event logs to determine events leading up to issue
Questions:
Any ideas on how to reproduce the issue?
What could cause this behavior?
Ideas for bypassing the first two questions in favor of a fast solution?
The DLL isn't on a network drive is it? You can get "glitches" where the drive is not available momentarily that then means COM can't do what it needs and could then fail to notice the drive is available again.
I used Process Monitor to debug similar problem when accessing ADO/OLEDB stack. Turned out environment got corrupted at some point and ADO classes are registered with InprocServer32 being REG_EXPAND_SZ pointing to %CommonProgramFiles%\System\ado\msado15.dll or similar ot x64 OSes.
Also when you register an application with Restart Manager, on failure the process gets restarted by winlogon process whose environment is different than explorer's one and unfortunately is missing %CommonProgramFiles% -- ouch!
This seems like a random failure; some race condition.
Try VMWARE to record the state of the machine you run this dll on. When the error happens you can then replay the record and inspect the memory contents. That why you won't have to play try and catch the error. At least you will have a solid record of it.
While I can't provide a solution, try catching the error and retry loading the dll when this happens after a refresh to the environment.

Sharepoint W3WP.EXE Process Consuming 100% CPU - Corrective Action?

We have a Windows Server 2003 web server, and on that server runs about 5-6 top level Sharepoint sites, with a different application pool for each one.
There is one W3WP process that keeps pegging 100% for most of the day (happened yesterday and today) and it's connected (found by doing "Cscript iisapp.vbs" at the command line and matching ProcessID) to a particular Sharepoint site...which is nearly unusable.
What kind of corrective action can I take? These are the following ideas I had
1) Stopping and restarting the Web Site in IIS - For some reason this didn't stop the offending W3WP process??? Any ideas why not?
2) Stopping and restarting the associated Application Pool.
3) Recycling the associated Application Pool.
Any of those sound like the right idea? If not what are some good things to try? I can't do an iisreset since I don't want to alter service to the other, much more heavily used, Sharepoint sites.
If I truly NEED to do some diagnostic work please point me in the right direction. I'm not the Sharepoint admin guy (he's out of town so I'm filling in even though I'm just a developer) but I'll do my best.
If you need any information just let me know and I'll look it up (slowly though, as that one process is pegging the entire machine).
It's not an IISReset that you need. You have a piece of code that is running amok with your memory. Most likely it's not actually a CPU problem but a paging problem. I've encountered this a few times with data structures in memory that grow too large to page in/out effectively and eventually the attempt to page data just begins consuming everything. The steps I would recommend are:
1) Go get the IIS Debug Diagnostics tools. And learn how to use them.
2) If possible, remove the session state from InProc to a state server or a sql server (since this requires serialization of all classes that go into session this may not be possible). This will help alleviate some process related memory issues.
3) Go to your application pool and adjust the number of worker processes upward. Remove Rapid fail protection (this will allow the site to continue serving pages even if rapid catastrophic errors occur).
The IIS debug diagnostics will record a LOT of data, but you can specify specific "catch" alerts that will detect hangs, excessive cpu usage etc. It will capture gigs of data, so be ready for a long wait when attempting to view the logs.
Turns out someone tried to install some features that went haywire.
So he wrote a stsadm script to uninstall those features
Processor was still pegging.
I restarted the IIS Application Pool for that IIS process, didn't fix it.
So then I restarted IIS for that site and that resolved the processor issue.

Web Server slows down (ASP.NET)

We have a really strange problem. One of the servers in the server farm becomes really slow. We see a number of timeouts in the logs and overall response time is not where it should be (and is on other servers in the farm).
What is also strange is that it is not just the web app - Just logging into the server takes up to 1.5 min to show you the desktop. Once you are in, the system is as responsive as ever - unless you try to launch something, i.e. notepad - it takes another minute to launch and after launch it works fine.
I checked a number of things - memory utilization is reasonable, CPU is below 15%, windows handles, event logs do not show anything.
Recycling the aps.net process does not fix it - it still takes over a minute to log in. Rebooting the server helped, but now it started to slow down again.
After a closer look we found out that Windows Temp directory is full of temp files - over 65k files. This is certainly something to take care of. But my question is could it be the root cause of the sluggishness, or there is still something else lurking in the shadows?
Edit
After more digging I am zeroing in on the issue related to the size of temp directories. This article: describes something very similar. I am still not too sure because the fact that the server is slow to open even Notepad remains unexplained.
Is it possible that under such conditions creating a new temp file takes over a minute?
You might want to check how many threads your using in the ASP.NET thread pool when the timeouts occur. Another idea might be to look at the GC information in perfmon and see if the GC is running a gen2 collection?
Ok, It is official, all of this was grief caused by this issue. When one of our servers was again behaving badly we cleaned the temp directory and it fixed the problem, including the slow login.
This last part still baffles me - I do not understand how excessive number of files in a temp directory can cause login to take over 1 min, leave alone launching a program, but whatever it is clearing the directory fixed it and I can live with it.
Did you check virtual memory as well ? paging ? does you app logs a lot of data in different files ? also - check - maybe the utilization happens in kernel mode and not user mode.

Resources