I have a webserver that is pegged and I've been able to isolate it to a particular website instance. I'd like to dig deeper and isolate the particular page/process that is causing the issue.. Any tips?
You can take a memory dump of the process and poke around with windbg.
There are posts on this issue from Tess Ferrandez blog. Just do as she say.
Which version of IIS are you using? Some of the higher ones allow for a separation of which process gets used to handle requests such as a worker process that you could isolate a bit more that way. I'd also suggest reading through the IIS logs to see what requests were being handled, how long they took, etc.
There are many different quirks to each IIS version. The really low ones just had a start/stop functionality, but the newer ones have really given administrators much more control and power, IMO.
You should try using a profiler to identify what is using up the most resources. I've used dotTrace Profiler, although that can be expensive if you're on a tight budget.
It allows you to see exactly what processes and method calls use of the most processing time of a request really well so you can isolate the most resource intensive operations.
You should really be able to use any profiler to do this, not just dotTrace. I just happen to only have experience with this one in particular.
Change your web garden setting to 10 or greater. Then watch your CPU and memory utilization on the web server.
Continue to increase the web garden setting until either the app is completely responsive with less than 5% average utilization OR you have actually maxed your web server's memory.
UPDATE
It's not about diagnosing, it's about properly configuring the IIS server. Web Gardens are one of the top misunderstood features of IIS. By increasing the available threads to handle new requests you remove the appearance of contention at the web server level and place it squarely where it belongs. In this case at your database. Instead of masking a problem it actually highlights exactly where the problem is.
This turned out to be a SQL problem (sql 2005). The solution was found by using SQL activity monitor to identify a suspended process with a Async_network_io wait type. We then ran SQL profiler to narrow it down to two massive queries which were returning an over abundance of results.
Related
We host about 150 websites (possibly scaling to 300+) that we are considering migrating to node.js. Most of the sites are fairly low traffic <1mil pageviews per month.
Should each website be it's own node.js process, or should we serve all websites using the same node.js process (or small set of load balanced processes). Is there a technical limit or a reasonable limit to the number of node processes per server?
Process per site: Feels inefficient, but I don't know if it actually is inefficient. Would ensure one buggy site doesn't affect other sites.
Process per core/small set of processes: Likely higher performance, but what happens when I need to update a sites codebase, won't it take down other sites? Also, code failures in one site would affect other sites.
Ideally, I would prefer one process per site so that we could host all sites from each worker server. That way when load increases we can just spin up another identical worker server and load balance between the two without having to arbitrarily say SiteA goes to ServerA and SiteB goes to ServerB. Any node.js gurus available to offer some wisdom?
All static file requests will be handled likely by Nginx or something like Varnish.
There are a lot of issues at play here. The big picture answer is, it depends... as it always does when you bring in the whole "performance" discussion. That being said, the simplest way to get a solid Node set up is to note the following basic facts about NodeJS, and I will also comment on their implications as they pertain to your questions.
The concurrency you get with Node works really good in certain situations, namely IO heavy operations. What we're really talking about here is minimizing the amount of downtime to wait for the next request. Because of this, Node works really well in an environment where there is one process per core on a machine. Node does really well at maximizing the amount of CPU available to serve requests under heavy load. This being said, if you have literally ZERO other work going on in your even loop, you can see minor performance increases (in terms of max requests/second/processor core) by having multiple node processes per core. But, I've never seen any benefit from increasing this number past 3. Even under circumstances where the entire event loop was literally just a file server.
On the process per site comment. This is a bad idea for many reasons. For one, a well put together node server can process thousands of requests per second. Our (company name omitted) servers, hosted through Amazon EC2 on medium clusters (lots of ram, mid CPU clock, 4 cores), typically fail around 3000 requests per second per cluster. Our servers do a fair bit of CPU work, for simple file servers I'm sure you can do much better. Strictly speaking, sure, per site, you will be able to serve more requests by launching each site in its own process/core/escalating quickly here! But it's not necessary from a cost and over complication of your architecture point of view. What I WOULD recommend, is investing in a setup with a lot of RAM. The ability for your server to cache often requested files will effect your performance infinitely more than launching an abundance of processes for a given machine.
On the whole RAM thing. The number of processes you want to launch for a given core is dependant on two things. One is how much synchronous work done in your event loop. The more synchronous work, the more time between a given request coming in and the event loop being ready to adress the next one. If you have a busy event loop, you will be in a situation where you require more processes/CPU Core. The other thing that can effect this, particularly relevant for file servers, is the amount of RAM. Node runs much better in a high ram environment, but you can say this about ANY file server really... What this has to do with, is the number of active asynchronous operations. One downside of the way node works, is under heavy loads, you can get a large number of event handlers active at once. This is great for concurrency/simplicity, however, if your server is busy waiting around for a lot of async disk/IO to happen it will slow down and crash much sooner than if you had plenty of RAM. If you don't have enough RAM to handle all of these event handlers, you will want to keep to the 1 process/core arrangement. Otherwise, it is easier for Node to spin up many event handlers simultaneously, and again cause you to crash sooner than you would otherwise.
I don't really have enough information to tell you what you SHOULD do. This depends entirely too much on the architecture of your specific server, sites, size of your sites, amount of data... etc. But these three pieces of knowledge are the basic things that help you get the most out of your Node server. To be honest, your idea about load balancing mixed with the considerations above, should do nicely for you. Surely, microoptimizations are possible, but if you do these things, you should easily see requests/second in the thousands before you start experiencing crashes because of DDOS type of conditions.
No, don't do it. Keep it simple! And check out http://12factor.net/.
A few hundred processes is nothing compared to the simplicity you otherwise lose. It would be a terrible decision, on so many levels, to have more than one site (or, "logical application unit") served by a single Node process.
If you're asking this question, you may want to explore Node more before you "migrate" to Node. Error handling and separation of concerns are more complicated in Node than in other situations. Specifically, neither the domain nor cluster APIs are mature. But really it's the philosophy of clean and simple application deployment that you'd be violating. I could go on and on.
Application pools in IIS are recycled very frequently and I can't figure out why. I remember reading about a possible issue in IIS6 that meant you were forced to recycle but a quick search now turns up empty. On IIS6 or 7 you can turn off the idle time, duration and specific time recycle options so no problems there.
So why does every .net site recycle the application pool? If a site didn't have any memory leaks could you set up a site that never needed to recycle?
Also failing this, what would be the best way to ensure background tasks are called, is there any auto restart modules for IIS or should an external service be used to make those calls?
It sounds like it is possible to do if you really wanted/needed to?
Websites are intended to keep running (albeit in a stateless nature). There are a myriad of reasons why app pool recycling can be beneficial to the hosting platform to ensure both the website and the server run at optimum. These include (but not limited to) dynamically compiled assemblies remaining in the appdomain, use of session caching (with no guarantee of cleanup), other websites running amok and resources getting consumed over time etc. An app pool can typically serve more than one website, so app pool recycling can be beneficial to ensure everything runs smoothly.
Besides the initial boot when the app fires up again, the effect should be minimal. Http.sys holds onto requests while a new worker process is started so no requests should be dropped.
From https://weblogs.asp.net/owscott/why-is-the-iis-default-app-pool-recycle-set-to-1740-minutes
You may ask whether a fixed recycle is even needed. A daily recycle is
just a band-aid to freshen IIS in case there is a slight memory leak
or anything else that slowly creeps into the worker process. In theory
you don’t need a daily recycle unless you have a known problem. I used
to recommend that you turn it off completely if you don’t need it.
However, I’m leaning more today towards setting it to recycle once per
day at an off-peak time as a proactive measure.
My reason is that, first, your site should be able to survive a
recycle without too much impact, so recycling daily shouldn’t be a
concern. Secondly, I’ve found that even well behaving app pools can
eventually have something sneak in over time that impacts the app
pool. I’ve seen issues from traffic patterns that cause excessive
caching or something odd in the application, and I’ve seen the very
rare IIS bug (rare indeed!) that isn’t a problem if recycled daily. Is
it a band-aid? Possibly, but if a daily recycle keeps a non-critical
issue from bubbling to the top then I believe that it’s a good
proactive measure to save a lot of troubleshooting effort on something
that probably isn’t important to troubleshoot. However, if you think
you have a real issue that is being suppressed by recycling then, by
all means, turn off the auto-recycling so that you can track down and
resolve your issue. There’s no black and white answer. Only you can
make the best decision for your environment.
There's a lot more useful/interesting info for someone relatively unlearned in the IIS world (like me), I recommend you read it.
I have a web application that hangs under high loads. I'm not going to go into the specifics of the code because I really just want some troubleshooting advice and tooling recommendations.
It's a web app, so each request get's a thread. Under a high load test, the app begins to consume all of the cpu, while becoming unresponsive. I suspect that the request threads are hanging in the new code that we are testing. Due to the fact of the cpu consumption, I'm assuming this must be on my app side. My understanding, which could be wrong, is that total cpu consumption indicated my first troubleshooting efforts should be in looking at the code that's consuming those cycles.
What are some tools and/or methods for inspecting which threads are hanging and on what lines of code? Again, I can easily force the app into the problematic behavior.
I've found and been trying out visualvm. Seems like the perfect tool. Still open for suggestions though. I looked at eclipse TPTP and it seems to be end-of-life-ing as well as requiring a more heavy weight deployment.
You can insert logging messages at starting a thread and closing a thread. Then you start the application and inspect the output while penetrating the code.
Another way is to look for memory leaks. If you are sure you haven't one, you can extend the virtual memory of your JVM.
#chad: do you have Database in whole picture...you may want to start by looking what is happening at DB side...you can very well look into DB locks, current sessions etc.
We are experiencing some performance issues with one of our websites and one of the things I am looking at is optimising the application pool. Are there any recommended methods of calculating maximum virtual/used memory to allow the pool?
The default settings of IIS are pretty well thought out. I'd first start by placing the application in its own application pool to make sure there aren't any other websites that might be causing the problem.
From there, I'd work at analyzing the code to make sure it is the resources the application has access to that are limiting the website. If the resources are what is limiting, it should be easy to figure out what needs priority in the application pool.
If your application is running massive memory software, it would be apparent by the spike and eventual ceiling hit of no more memory available, thus, you'd up your memory for that pool. Likewise, if the site just bogs down over time, you might check how long your sessions are staying alive and possibly shorten them. Just some things to start out with.
As far as hard and fast rules, I haven't come across any and usually it is so 'per-application' specific, that it is hard to define. I would just pay attention to performance counters and then turn one thing on, check them again, and then try another thing, making sure to keep all tests agnostic so you know what is working and what is not.
We have a Windows Server 2003 web server, and on that server runs about 5-6 top level Sharepoint sites, with a different application pool for each one.
There is one W3WP process that keeps pegging 100% for most of the day (happened yesterday and today) and it's connected (found by doing "Cscript iisapp.vbs" at the command line and matching ProcessID) to a particular Sharepoint site...which is nearly unusable.
What kind of corrective action can I take? These are the following ideas I had
1) Stopping and restarting the Web Site in IIS - For some reason this didn't stop the offending W3WP process??? Any ideas why not?
2) Stopping and restarting the associated Application Pool.
3) Recycling the associated Application Pool.
Any of those sound like the right idea? If not what are some good things to try? I can't do an iisreset since I don't want to alter service to the other, much more heavily used, Sharepoint sites.
If I truly NEED to do some diagnostic work please point me in the right direction. I'm not the Sharepoint admin guy (he's out of town so I'm filling in even though I'm just a developer) but I'll do my best.
If you need any information just let me know and I'll look it up (slowly though, as that one process is pegging the entire machine).
It's not an IISReset that you need. You have a piece of code that is running amok with your memory. Most likely it's not actually a CPU problem but a paging problem. I've encountered this a few times with data structures in memory that grow too large to page in/out effectively and eventually the attempt to page data just begins consuming everything. The steps I would recommend are:
1) Go get the IIS Debug Diagnostics tools. And learn how to use them.
2) If possible, remove the session state from InProc to a state server or a sql server (since this requires serialization of all classes that go into session this may not be possible). This will help alleviate some process related memory issues.
3) Go to your application pool and adjust the number of worker processes upward. Remove Rapid fail protection (this will allow the site to continue serving pages even if rapid catastrophic errors occur).
The IIS debug diagnostics will record a LOT of data, but you can specify specific "catch" alerts that will detect hangs, excessive cpu usage etc. It will capture gigs of data, so be ready for a long wait when attempting to view the logs.
Turns out someone tried to install some features that went haywire.
So he wrote a stsadm script to uninstall those features
Processor was still pegging.
I restarted the IIS Application Pool for that IIS process, didn't fix it.
So then I restarted IIS for that site and that resolved the processor issue.