In the last couple of weeks my site has hung several times. Connections increase but never seem to be released. Once it hits 700+ current connections (as shown in the performance tool) the whole site hangs and I have no choice but to do an iisreset to get it working again. Normally it's around 100 concurrent connections at peak time. There are no errors or warnings in the event log when it stops releasing connections, so that isn't helpful.
This problem has been happening after copying new DLLs over the old ones to do a site update. I have a single server, so I have no choice but to copy over the live site. Today, however, it was fine after the update and the problem happened two hours later. It's a .NET 4.5 site on Windows 2008 R2 64-bit.
Is there a way I can find out what's causing this, such as other log files somewhere, or something I should try when it happens? What I have tried is recycling the app pool (doesn't help), turning the app pool off and back on (doesn't turn on, gives an exception), restarting the site (doesn't help), and iisreset (works every time).
Usually some requests are "stuck" when you see these symptoms. An app pool recycle or a site restart will generally wait for the pending requests to finish (graceful exit), while iisreset will actually kill the w3wp.exe process after 20 (or 30?) seconds if it doesn't exit on its own (non-graceful exit). This is why iisreset works for you.
Listing the active requests (appcmd.exe list request /?) should give you a clue about why this is happening.
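If you want to capture that list automatically while the site is hanging, here is a rough Python sketch that runs `appcmd list requests` with the `/elapsed:` filter and sorts the results by age. The regular expression reflects the output shape I would expect from appcmd (an assumption, so check it against your server's actual output), and `parse_requests` and `list_slow_requests` are just illustrative names. It needs to run from an elevated prompt on the server.

```python
import re
import subprocess

# Assumed shape of one line of `appcmd list requests` output, e.g.
# REQUEST "fa00..." (url:GET /page.aspx, time:1234 msec, client:..., stage:..., module:...)
REQUEST_LINE = re.compile(
    r'REQUEST "(?P<id>[^"]+)" \(url:(?P<verb>\S+) (?P<url>.*?), time:(?P<ms>\d+) msec'
)


def parse_requests(text):
    """Return (elapsed_ms, verb, url) tuples, slowest first."""
    rows = []
    for line in text.splitlines():
        m = REQUEST_LINE.search(line)
        if m:
            rows.append((int(m.group("ms")), m.group("verb"), m.group("url")))
    return sorted(rows, reverse=True)


def list_slow_requests(min_ms=30000,
                       appcmd=r"C:\Windows\System32\inetsrv\appcmd.exe"):
    """Run appcmd on the server and return requests older than min_ms."""
    out = subprocess.run(
        [appcmd, "list", "requests", f"/elapsed:{min_ms}"],
        capture_output=True, text=True, check=False,
    ).stdout
    return parse_requests(out)
```

The URLs that keep showing up at the top of that list across several hangs are the ones worth looking at first.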
Following the steps in this article I am trying to turn on preloadEnabled and startMode in my applications (I have about 20 WCF services in their own app pools and then 3 client sites) https://learn.microsoft.com/en-us/iis/get-started/whats-new-in-iis-8/iis-80-application-initialization
Now when I do an iisreset, WAS dies immediately. If I go back and start it manually, and then W3SVC, everything starts like I want it to.
Is there any hope for being able to do an iisreset ever again?
Doing it with /stop and /start works fine. It's literally just the bang turnaround of no flags that's an issue.
Event code for WAS fail was 5011 and the code was 8007006d
IIS v 10.0.15063.0
Windows 10 1703 (15063.726)
For the future internet visitor to this question, what we discovered in our testing and working through the bug was that we have a background thread related to our pubsub internally that isn't shutting down cleanly, and is causing the process to hang before the reload has a chance to happen, causing corruption in WAS.
Starting the dependency stopped the issue for us during the next round of testing, and now we get to go research a new bug internally. Fun times.
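For anyone hitting the same thing, the general shape of a background worker that shuts down cleanly looks roughly like this. It is a Python sketch of the pattern only, not our actual .NET code: `Listener` is a hypothetical stand-in for a pubsub listener, and the polling loop is a placeholder for real message handling.

```python
import threading


class Listener:
    """Hypothetical background pub/sub listener that can be stopped cleanly."""

    def __init__(self, poll_seconds=0.05):
        self._stop = threading.Event()
        self._poll = poll_seconds
        # daemon=True so a stuck worker cannot keep the process alive on exit
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            # Real code would receive and handle a message here.
            # wait() returns early as soon as stop() sets the event.
            self._stop.wait(self._poll)

    def stop(self, timeout=2.0):
        """Signal the loop to exit and wait a bounded time for it.

        Returns True if the thread actually finished within the timeout.
        """
        self._stop.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

The two points that matter are that the loop checks a stop signal instead of blocking forever, and that shutdown waits for the thread with a timeout rather than indefinitely.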
I'm running IIS 7 under Windows 2008 R2.
The "w3wp.exe" process (the IIS site) uses roughly 100% CPU.
Is there any way I can check which part of the site causes the problem?
In IIS 7 you can open IIS Manager and use the Worker Processes feature, where you will see the worker processes and the CPU each one is consuming. If you double-click the worker process that is consuming 100% CPU, it will show you the list of requests that are running at that moment, including how long they have been running and which state they are in. Usually that will show you the offending page.
You could try attaching a debugger to w3wp and hitting the Pause button in Visual Studio. You should, in theory, land at the place that's taking the longest to complete. Other than that, you'll have to implement some tracing.
This happens every time we deploy one of our web sites. We recycle the App Pool that it runs in and it fixes the processor peg. Sometimes it takes a couple of times recycling but it works.
I have a classic ASP application that has been stable for years and now we're having all kinds of problems with it. After moving the app between machines and wiping the original so we could have a fresh install of windows, we've come to the following "symptom". The app pools do not appear to allow for multiple simultaneous requests. Here's what we are seeing:
The app runs normally for most people, but when someone within one of the app pools accesses a long-running script (usually one with lots of DB access) all of the other users in the pool must wait for that script to complete. Once the script completes, everyone else's requests run. This initially made us suspect the DB connection string or something.
UNTIL... we noticed also that large file uploads into our system also cause the app pool to stop responding. What's interesting about this is that we're using the SAFileup COM+ object to do our uploads, which has a progress display in a pop-up window. When you go to upload the file, the progress display comes up, but then never refreshes to show upload progress. If you wait it out, however, the file will eventually upload and the other pending requests will process as normal.
Our app pools are in the default configuration, using the IWAM account to launch. I checked to ensure that the IWAM account has all the appropriate permissions. It does.
We've tried a variety of DB connection strings, none solved the problem (though I'm thinking it's not the DB connection string). Just in case someone thinks it is, here's our connection string: "Provider=SQLNCLI;Trusted_Connection=yes;Server=(local);Database=demo;". It couldn't be simpler. This string was previously not a problem.
I fussed with the web gardens thing and it does, indeed, make the system respond to multiple requests, but each worker thread in the garden has its own session state which causes our users to get booted when their request gets randomly assigned to a new worker thread. Only having a single worker process in the garden was never an issue before anyway.
I've used SQL Profiler and sp_who2 to see if during the long-running scripts there are any deadlocks or blocks on the SQL Server. There are not.
The issues initially started after we had installed some patches from Microsoft. We wiped a machine clean and installed Win2k3 server, then SP2, and then didn't patch anymore after that. The problem remained, so it doesn't appear to have been a patch.
I'm pretty much at a loss now... does anyone have any experience with similar issues? If so, how were they fixed?
Check that you don't have ASP debugging enabled on the server. This will force the ASP script engine to run on a single thread.
Sounds like a limit on the number of concurrent incoming requests to IIS or the Windows Server.
Check out http://blogs.msdn.com/b/david.wang/archive/2006/04/12/howto-maximize-the-number-of-concurrent-connections-to-iis6.aspx and http://forums.iis.net/p/1152112/1880908.aspx#1880908 on how to tweak the settings.
Our website is in .NET but also has some old ASP and 32-bit libraries in it. It had been working fine for a while (2 years). But for the past month, we have seen the following error on our IIS7 server, which we have been unable to track down and fix:
"Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module kernel32.dll, version 6.0.6001.18215, time stamp 0x4995344f, exception code 0xe053534f, fault offset 0x0002f328, process id 0x%9, application start time 0x%10."
We are able to reproduce the error:
One of our .ASPX pages starts loading, executing code and queries (we have response.flush() all over the page to track where the code breaks), then it suddenly stops and we get the above error in IIS.
The page stops loading and, without the response.flush(), it's not redirecting to our error.aspx page (as configured in web.config)
The error does NOT happen all the time. Sometimes, it happens 3 times in a row, then it's working fine for 15 minutes non-stop with a proper redirection to error.aspx.
The error we get then is a classic: "Either BOF or EOF is True, or the current record has been deleted."
When the error occurs, the page hangs, and all other sessions on the same computer, from any browser, have hanging web pages as well (BTW, we only allow 1 worker process while we are testing). From other computers, the site loads fine.
I can recycle the Application Pool, kill w3wp.exe, restart IIS. Nothing will do. The only way to successfully load the page again is to Restart MS SQL which handles our Session States. I don't know why this is, but we guessed that the Session Cookies on the users browsers points to a thread which was not terminated properly (due to the above crash) and IIS is waiting for it to terminate to process more code (?). If someone can explain this better, that would be really helpful. Is there a timeout which we can set to "terminate" threads? Is it a MS SQL related issue?
I have also looked at the Private and Virtual Memory usages, because I think our code is not the most effective and I am certain we have remaining memory leaks. However, I saw the page crash even though both Private and Virtual Memories were still quite low (under 100MB each).
I have used Debug Diag and WinDbg as indicated here: http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx, but we have not been able to get WinDbg working yet; that is what we are trying to do at the moment.
If someone could help us or point us toward the right direction that would be really great, thank you.
"Either BOF or EOF is True, or the current record has been deleted" means the recordset is empty (or the cursor has moved past its last row) and you are attempting to read from it or do a MoveNext. So check for EOF before you do any moves.
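The same guard, sketched in Python with the standard `sqlite3` module as a stand-in for an ADO recordset (an assumption for illustration only; the `users` table and the `first_name_or_none` helper are made up). Here `fetchone()` returning `None` plays the role of EOF being True:

```python
import sqlite3


def first_name_or_none(conn, min_id):
    """Return the first matching name, or None when the result set is empty."""
    cur = conn.execute(
        "SELECT name FROM users WHERE id >= ? ORDER BY id", (min_id,)
    )
    row = cur.fetchone()  # None here is the equivalent of EOF being True
    if row is None:
        return None       # nothing to read, so do not touch the row
    return row[0]
```

The point is simply that the empty case is handled before any field is read or the cursor is advanced.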
IIS is notorious for throwing kernel errors in w3wp.exe like this one. All your errors in session state are just symptoms of the crashed process. Multiple APP pools won't help much - they just spread the error around.
I'd wager it is SQL deadlocks due to your user environment changing. This will cause a 10-second lag as SQL tries to determine which query to kill off. One wins, one loses. The loser gets back a pointer to an unexpectedly empty table, you try a move, and a crash follows. You could maybe point your DB to an ODBC connection and turn on tracing, or figure out a way to get SQL to log it.
I had all the same symptoms as above in Perl. I was able to make a wrapper fn() to do all SQL queries and log all sql, + params and any errors to disk to track down the problem. It was deadlocks, then we were able to code in auto-retry, and eventually we recoded the query order and scanned columns to eliminate the deadlocks.
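Our wrapper was in Perl, but the idea carries over to any language. A minimal Python sketch, assuming a DB-API style connection: `run_query` is a made-up name, and `sqlite3.OperationalError` is only a placeholder for whatever exception your driver raises for a deadlock victim.

```python
import logging
import sqlite3
import time

log = logging.getLogger("sql")


def run_query(conn, sql, params=(), retries=3, delay=0.1,
              retry_on=(sqlite3.OperationalError,)):
    """Execute sql, logging the statement, its parameters and any error.

    Retries up to `retries` times on the exception types in `retry_on`
    (an assumption standing in for your driver's deadlock error).
    """
    for attempt in range(1, retries + 1):
        log.info("attempt %d: %s params=%r", attempt, sql, params)
        try:
            return conn.execute(sql, params).fetchall()
        except retry_on as exc:
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt == retries:
                raise  # out of retries, surface the error to the caller
            time.sleep(delay * attempt)  # simple linear backoff
```

Point the `sql` logger at a file and you get a record of every statement, its parameters, and each failure, which is what let us see the deadlock pattern in the first place.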
It's entirely possible one of your referenced/linked assemblies somewhere has randomly gone corrupt (it can happen) on disk. Can you try to replicate the problem on a new, clean machine with the same specs and fresh installs of the latest xyz drivers you're using?
I solved a mysterious problem that took me months to isolate this way. It seemed clean, new machines with the same specs and prerequisite drivers would work just fine - only some older machines with the same specs were failing consistently. I ended up uninstalling everything (IIS, ASP.NET, .NET, database and client) and starting from scratch. The end cause when I isolated it was that the db client driver was corrupt on the older machines (and all the older machines were clones of each other, so I assume they were cloned after the corruption occurred), and it seemed to be messing with the .NET memory space even when I wasn't calling it directly. I have yet to even reply to my "help me debug this monster" post with this answer because I doubted it would ever help anyone.
We started receiving this error after installing windows updates on a Windows Server 2008R2 machine. Windows Process Activation Service (WAS) installs some additional site bindings that caused issues for our setup.
We removed net.tcp, net.pipe, net.msmq, and msmq.formatname bindings from our website and no longer got the faulting application exception.
This is probably an edge case, but just in case someone is coming here and they are using MVCMailer, I was getting this same error due to the .SendAsync() method on the mailers.
I switched them all to .Send() and the crashing stopped.
See this SO answer for ways to use the mailer async and avoid the crash (allegedly, I did not personally implement it)
We have a Windows Server 2003 web server, and on that server runs about 5-6 top level Sharepoint sites, with a different application pool for each one.
There is one W3WP process that keeps pegging 100% for most of the day (happened yesterday and today) and it's connected (found by doing "Cscript iisapp.vbs" at the command line and matching ProcessID) to a particular Sharepoint site...which is nearly unusable.
What kind of corrective action can I take? These are the ideas I had:
1) Stopping and restarting the Web Site in IIS - For some reason this didn't stop the offending W3WP process??? Any ideas why not?
2) Stopping and restarting the associated Application Pool.
3) Recycling the associated Application Pool.
Any of those sound like the right idea? If not what are some good things to try? I can't do an iisreset since I don't want to alter service to the other, much more heavily used, Sharepoint sites.
If I truly NEED to do some diagnostic work please point me in the right direction. I'm not the Sharepoint admin guy (he's out of town so I'm filling in even though I'm just a developer) but I'll do my best.
If you need any information just let me know and I'll look it up (slowly though, as that one process is pegging the entire machine).
It's not an IISReset that you need. You have a piece of code that is running amok with your memory. Most likely it's not actually a CPU problem but a paging problem. I've encountered this a few times with data structures in memory that grow too large to page in/out effectively and eventually the attempt to page data just begins consuming everything. The steps I would recommend are:
1) Go get the IIS Debug Diagnostics tools. And learn how to use them.
2) If possible, move the session state from InProc to a state server or a SQL Server (since this requires serialization of all classes that go into session, this may not be possible). This will help alleviate some process-related memory issues.
3) Go to your application pool and adjust the number of worker processes upward. Remove Rapid fail protection (this will allow the site to continue serving pages even if rapid catastrophic errors occur).
The IIS debug diagnostics will record a LOT of data, but you can specify specific "catch" alerts that will detect hangs, excessive cpu usage etc. It will capture gigs of data, so be ready for a long wait when attempting to view the logs.
Turns out someone tried to install some features that went haywire.
So he wrote a stsadm script to uninstall those features
Processor was still pegging.
I restarted the IIS Application Pool for that IIS process, didn't fix it.
So then I restarted IIS for that site and that resolved the processor issue.