Classic ASP: response time periodically slows down extremely (IIS)

I have a problem with a bunch (around 50) of classic ASP sites on Windows Server 2012 R2 with Access databases, and it is driving us crazy.
All ASP pages of all sites on this server run smoothly for around 45 seconds. After that, they all completely stop responding to any click for 15 to 20 seconds, then the delay disappears for the next 45 seconds as if it had never existed, then it reappears, and so on. This effect started out of nowhere a few weeks ago, after several months without any problems.
Static HTML pages are not affected, and it seems that even ASP pages that do not connect to their database run fine. We therefore tried converting one site from Access to SQL Server Express, but this didn't change anything: the converted site was affected in the same way (so it does not seem to be Access itself).
We then stopped all sites in IIS and re-activated just one single site with very few visitors, to see whether the effect only appears when many requests hit the server. But the effect still showed up, even after restarting IIS and even after restarting the whole machine with just that one website activated. It seems to be completely independent of the number of requests, as if the server (or rather the ASP engine of IIS) were periodically busy with itself.
What we can see in Performance Monitor (see screenshot): when the effect starts, Requests/Sec drops to 0 while the number of requests executing keeps accumulating above its normal level (which looks logical to me, but only describes the effect rather than explaining where it comes from). A few seconds before the effect vanishes, Requests/Sec grows again and these counters return to their normal values.
We had a similar problem a year ago on a Windows Server 2008 machine, where the sites had run without any problems for several years before the issue appeared out of nowhere. After testing some of the sites on another hoster's Windows Server 2012 R2 machine, we found that the problem did not appear there (and still hasn't for a full year, with 3 of our sites hosted on it). On a second hoster's virtual Windows Server 2012 R2 machine we host another site with more traffic than most of our others, and even there the problem has not appeared in a full year. So our hoster switched over to Windows Server 2012 R2 and, bingo, all the problems were gone. From that moment on, all sites performed like a charm again, without changing anything but the OS.
We then stopped investigating the issue, thinking the problem was related to the OS. But around 9 months later it reappeared, and after hours and hours of investigating we have no idea what to search for or what to do (besides moving all our sites to the other hoster's server, which isn't a real solution, and we cannot guarantee the effect won't reappear on that machine at some point in the future).

I did find a solution myself, but in a totally random way. After weeks of searching, I was cleaning up the server's hard disk and deleted all files in the Windows\Temp folder (more than 18,000 files!). Since that moment (4 days ago), the described response lag has never shown up again! Only a small number of new .tmp files have been created in the folder since.
My theory: maybe every time a user visits one of the websites (which opens a connection to its Access database and creates an .ldb file in the database folder), a randomly(?) named .tmp file (like jet12f0.tmp) is created in the Windows\Temp folder in parallel. These files are normally deleted again when the database connection closes and the .ldb file disappears. Maybe some connections are not closed correctly, so the corresponding .tmp file in Windows\Temp remains behind as an "orphan", literally forever. As time goes by, the folder fills up with these orphaned files. Eventually a new .tmp file has to be generated under the same name as a still-existing orphaned one. That conflict brings the server to a halt, because it cannot create a new file with the name of an existing one. After 15 to 20 seconds the conflict is resolved by some mechanism unknown to me and everything runs perfectly again, until the next conflict arises around 45 seconds later. And so on...
Admittedly, this is only an amateur theory; I'm no server guru.
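If the theory holds, the likely culprit is a page that opens its Access connection but never explicitly closes it. As a minimal sketch (the connection string, database file name and query are only placeholders, not our real code), this is the pattern each page should follow so the Jet engine can clean up its .ldb and .tmp files:

<%
' Sketch: explicitly close and release every ADO object so the Jet
' engine can remove its .ldb lock file and the matching JET*.tmp file.
Dim conn, rs
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Server.MapPath("data/site.mdb")

Set rs = conn.Execute("SELECT TOP 10 Title FROM Articles")   ' placeholder query
Do Until rs.EOF
    Response.Write Server.HTMLEncode(rs("Title")) & "<br>"
    rs.MoveNext
Loop

rs.Close
Set rs = Nothing
conn.Close          ' without this, the connection lingers until the object is released
Set conn = Nothing
%>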
Cleaning up this temp folder from time to time seems to prevent the server from getting into this situation, because then there are no file-naming conflicts.
I agree: the real solution would be to find the problem in the code (if there is one), but we can live with this situation, weighing the effort of finding the problem against just cleaning up the temp folder once a month or so ;-)
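For the monthly cleanup, a small standalone VBScript run from Task Scheduler is enough. This is only a sketch under the assumption that the orphans really are old Jet .tmp files; the path and the one-day age threshold are arbitrary choices:

Option Explicit
' Maintenance sketch: delete *.tmp files in C:\Windows\Temp that have not
' been touched for at least a day. Files still locked by an open connection are skipped.
Dim fso, folder, file
Set fso = CreateObject("Scripting.FileSystemObject")
Set folder = fso.GetFolder("C:\Windows\Temp")
For Each file In folder.Files
    If LCase(fso.GetExtensionName(file.Name)) = "tmp" Then
        If DateDiff("d", file.DateLastModified, Now) >= 1 Then
            On Error Resume Next   ' a locked file simply stays put
            file.Delete True
            On Error GoTo 0
        End If
    End If
Next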

Related

Issue with malicious files showing up in IIS site folders

If this is covered in another thread I'm happy to read it, but I've been keeping Google busy without a result.
I have an IIS site that is getting hammered by a search-engine exploit that creates files, basically trying to direct traffic to YouTube videos by getting them indexed. To start, I disabled all FTP accounts and then went through confirming everything is up to date, including things like PHP. I'm having a bear of a time figuring out where these files are coming from. I have started resetting the passwords on the anonymous user credentials for each website, but I don't think this is a password exploit.
The info I have:
They all seem to be created by Network Service, but I can't tell which site they are coming in through.
I'm using up-to-date, active AV and active malware scanning.
All Windows updates are in place.
I have attempted to use Process Monitor, and the only thing I can see is that the file creation comes from NETWORK SERVICE.
The only thing I've thought of at this point is an exploit somewhere, but with no viruses, missing updates, or malware found, I'm at a loss as to where to go with this.
I have tried to put auditing in place, but this server carries a pretty immense load overall, so completely safe temp files are created and deleted constantly. This is compounded by the malicious files being created at random times: I may keep a clean deck for a day or two and then get hit for a few hours. I did try auditing anyway and stopped at 15 GB of security events without a hit.
I am considering isolating each site into its own app pool with dedicated credentials, but that would be a pretty significant time and resource overhead for what is only a guess.
Any ideas or suggestions? I'm nearing the point of visiting a fortune teller down the road to see if they have any idea. :D

Content staging extremely slow

Recently, content staging became extremely slow for our Kentico 8.2 application (moving a single page takes 30 minutes or more). Similar staging tasks used to take seconds to complete. We have restarted the website, and that had no effect.
Before, we just had the one website in the Kentico instance. We recently deployed another website to the same instance. This could be a coincidence, but it is the only thing we can think of that might be affecting the staging performance. However, we do not understand why. Why would adding a second website slow down the content staging of a different website? How do we fix it? Also, if the addition of another website is just a coincidence, what are other things to check in the event of slow content staging? We don't really know where to start with this one.
EDIT
Sites are hosted on premises (not Azure) on the same server.
Look into table index fragmentation. It grows over time and makes the staging application slow (a quick way to check it is sketched below).
Another thing to check: sync tasks/changes to the higher environment frequently, to reduce the number of records that have to be tracked.
Hope this helps in resolving your issue.
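As a rough sketch of the fragmentation check (the instance name ".\SQLEXPRESS" and database name "Kentico" are assumptions; normally you would run the embedded query directly in SSMS and then rebuild or reorganize the indexes it reports):

Option Explicit
' Sketch: report indexes with more than 30% fragmentation in the Kentico database.
Dim conn, rs, sql
Set conn = CreateObject("ADODB.Connection")
conn.Open "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;Initial Catalog=Kentico;Integrated Security=SSPI;"
sql = "SELECT OBJECT_NAME(s.object_id) AS TableName, i.name AS IndexName, " & _
      "s.avg_fragmentation_in_percent AS Frag " & _
      "FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') s " & _
      "JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id " & _
      "WHERE s.avg_fragmentation_in_percent > 30 " & _
      "ORDER BY s.avg_fragmentation_in_percent DESC"
Set rs = conn.Execute(sql)
Do Until rs.EOF
    WScript.Echo rs("TableName") & "." & rs("IndexName") & ": " & FormatNumber(rs("Frag"), 1) & "%"
    rs.MoveNext
Loop
rs.Close
conn.Close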

Intermittent Microsoft Azure Web Site access failure

I have a number of small MVC apps deployed as Microsoft Windows Azure websites. This has been working for several months.
Yesterday I rolled out a new one, and the deployment was unremarkable; everything worked fine. But a couple of hours later, the site became unreachable. The symptom was that when the browser navigated to that site's URL, it would try to load for several minutes and then just give up with a completely blank page.
I stopped and restarted the site, and that worked once, but the symptoms came back several minutes later. Then I tried stopping and restarting again, and it didn't help.
I deployed the identical app to three additional URLs. Again, immediately after deployment they all work fine; however, each of them fails at some point later. They don't all seem to fail at once. Sometimes restarting the site fixes the problem, and sometimes it doesn't.
IMPORTANT: If I wait for some period of time, the site may start to work again on its own.
However, deploying four copies of the app so that our users can switch to a backup when the primary is not working is not optimal.
Any words of wisdom as to how I might go about debugging this?
ADDITIONAL INFO NOV 25, 2013:
When the sites are failing, the IIS logs show either 500 or 502 errors. Our own MVC code is never hit, not even app_start.
You can start by checking the logs and using remote debugging:
http://www.drdobbs.com/windows/azure-sdk-22-supports-visual-studio-2013/240163499
Are the apps working locally?
Might not be the same problem, but from time to time our Azure instances will get the blue question mark of death as a status.
The reason, we found out, is that Microsoft performs upgrades on instances from time to time. If you have just one instance in a cloud service/role, then during that maintenance window it will be down.
I have confirmed this with their support.
The only way to get around this that I know of is to create two instances. Then Microsoft guarantees ~99% availability.
Of course I also confirmed with them that this means twice the cost. =/
If that's not the issue I would enable RDP and get onto the machine to see what the problem is. Microsoft has these tools to help debug problems: http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx
First, you should always run multiple instances of your web role with more than 1 upgrade domain. This is configurable in the service definition (CSDEF). Without this, you don't get an SLA from Microsoft, so you can't really complain that the VMs go down.
Second, to figure out what might be going on with these boxes, you should have both logs (my preference is to roll my own with page blobs or table storage), AND you should always have RDP access to a pre-production environment (production as well if you're not too fussed about security). Once on the box, look through the event viewer for errors.
Third, when an outage occurs, check the Azure service dashboard (http://www.windowsazure.com/en-us/support/service-dashboard/) for platform-wide outages.
Lastly, contact Microsoft support. It may take a few hours, but they are pretty good.
Given that it is happening repeatedly and for extended periods of time (more than 5 minutes), I would bet there's something wrong with your hosted service. Again, RDP in and poke around. Good luck.
To debug your sites, try enabling diagnostic logs:
http://www.windowsazure.com/en-us/develop/net/common-tasks/diagnostics-logging-and-instrumentation/
Another nice way to look around your site is the Kudu debug console:
https://github.com/projectkudu/kudu/wiki/Kudu-console

IIS 7.5 connection problems

In the last couple of weeks my site has hung. Connections increase but never seem to be released. Once it hits 700+ current connections (as seen in the performance tool), the whole site hangs and I have no choice but to run iisreset to get it working again. Normally it's around 100 concurrent connections at peak time. There are no errors or warnings in the event log when it stops releasing connections, so that isn't helpful.
This problem has been happening after copying new DLLs over the old ones to do a site update. I have a single server, so I have no choice but to copy over the live site. In today's case, though, it was fine after the update and the problem only appeared two hours later. It's a .NET 4.5 site on Windows Server 2008 R2 64-bit.
Is there a way I can find out what's causing this, such as other log files somewhere or something I should try when it happens? What I have tried: recycling the app pool (doesn't help), turning the app pool off and back on (doesn't turn on, gives an exception), restarting the site (doesn't help), and iisreset (works every time).
Usually some requests are "stuck" when you see these symptoms. An app pool recycle and a site restart generally wait for pending requests to finish (a graceful exit), while iisreset actually kills the w3wp.exe process after 20 (or 30?) seconds if it doesn't exit on its own (a non-graceful exit); that is why iisreset works for you.
Listing the active requests (appcmd.exe list request /?) should give you a clue about why this is happening.
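For example, to list only the requests that have been executing for more than 30 seconds (the threshold is just an illustration):

%windir%\system32\inetsrv\appcmd.exe list requests /elapsed:30000

The output shows the site, URL, verb, client, and the module/stage each request is currently sitting in, which usually points at the blocking resource.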

Web Server slows down (ASP.NET)

We have a really strange problem. One of the servers in the server farm becomes really slow. We see a number of timeouts in the logs, and overall response time is not where it should be (and where it is on the other servers in the farm).
What is also strange is that it is not just the web app: just logging into the server takes up to 1.5 minutes before you see the desktop. Once you are in, the system is as responsive as ever, unless you try to launch something, e.g. Notepad, which takes another minute to open; after launching, it works fine.
I checked a number of things: memory utilization is reasonable, CPU is below 15%, Windows handle counts look normal, and the event logs do not show anything.
Recycling the ASP.NET worker process does not fix it; it still takes over a minute to log in. Rebooting the server helped, but now it has started to slow down again.
After a closer look we found that the Windows Temp directory is full of temp files, over 65k of them. This is certainly something to take care of. But my question is: could it be the root cause of the sluggishness, or is there still something else lurking in the shadows?
Edit
After more digging I am zeroing in on the issue being related to the size of the temp directory. This article describes something very similar. I am still not completely sure, because the fact that the server is slow to open even Notepad remains unexplained.
Is it possible that under such conditions creating a new temp file takes over a minute?
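One way to test this would be to time temp-file creation directly on the affected box while it is misbehaving, for example with a throwaway VBScript like this (the file count and location are arbitrary):

Option Explicit
' Create and then delete 100 temp files in %TEMP% and report the elapsed time.
Dim fso, shell, tempPath, names(99), t0, i, f
Set fso = CreateObject("Scripting.FileSystemObject")
Set shell = CreateObject("WScript.Shell")
tempPath = shell.ExpandEnvironmentStrings("%TEMP%") & "\"
t0 = Timer
For i = 0 To 99
    names(i) = tempPath & fso.GetTempName
    Set f = fso.CreateTextFile(names(i), True)
    f.Close
Next
WScript.Echo "Created 100 temp files in " & FormatNumber(Timer - t0, 2) & " seconds"
For i = 0 To 99
    fso.DeleteFile names(i)
Next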
You might want to check how many threads you're using in the ASP.NET thread pool when the timeouts occur. Another idea might be to look at the GC information in perfmon and see whether the GC is running a Gen 2 collection.
OK, it is official: all of this grief was caused by this issue. When one of our servers was behaving badly again, we cleaned the temp directory and it fixed the problem, including the slow login.
This last part still baffles me: I do not understand how an excessive number of files in a temp directory can cause login to take over a minute, let alone slow down launching a program, but whatever the reason, clearing the directory fixed it and I can live with that.
Did you check virtual memory as well? Paging? Does your app log a lot of data to different files? Also check whether the utilization happens in kernel mode rather than user mode.