I am encountering an issue with my asp.net c# web application where the server is hitting very high cpu useage eg. 80%+ on w3wp process.
This has only happened recently after I made numerous changes to my application. I am fairly sure that it may be one of the changes that I have made is causing this issue with high cpu useage on the iis 7 web server.
Is it possible to analyze the process and find what exactly is causing this high useage? Or what is the mechanism for debugging such an issue.
ASP.NET Case Study: High CPU in GC - Large objects and high allocation rates:
A high CPU issue is generally one of 3 things
An infinite loop
Too much load (i.e. too many requests doing lots of small things, so
no one single thing is a culprit)
Too much churning in the Garbage Collector.
Try monitoring the % Time in GC counter and the .NET CLR Memory / # Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections.
Are you calling GC.Collect() anywhere in your code?
The problem you are describing also sounds symptomatic of High CPU in .NET app using a static Generic.Dictionary, caused by multiple threads hitting the dictionary. If that's the problem:
To resolve this timing issue you should take special care to
synchronize (lock) around access to the dictionary if there is a
possibility that you may have multiple writers working at the same
time or if there is a possibility that you write while someone else is
reading/enumerating through the same dictionary.
Related: ASP.NET Performance Monitoring, and When to Alert Administrators
The quickest way is to enable Page Tracing. This will tell you exactly how long the page took to generate, and in which methods the server spent most of it's time. This should highlight any particularly slow-running methods, allowing you to focus your troubleshooting on the problematic sections.
Simply add the following to any pages you suspect of being troublesome:
<%# Page Trace="true" %>
Now, when you navigate to that page in your browser, you will get a detailed trace at the bottom of it.
Related
I have a web site and I am using iis as my web server. I noticed that on production server, the cpu reaches 95% usage pretty fast with very little users. this behaviour I don't see on my developement server. I am using visual studio to develop and iis as my local web server as well.
How much big traffic you have on production comparing to development server? How their parameters compare? Before starting a deep analysis of the application itself, I would identify all the infrastructure and environmental differences. Sometime such problems happens because of some other software, like antivirus software running in the background...
Nevertheless, because it sounds rather as a application problem, I would first check Event Viewer for errors. Then I would start from monitoring a few Performance Counters to correlate % Processor Time counter with Current Connections, Available Memory, # of Exceps Thrown / sec, % Time in GC and so on. This kind of behavior usually has a reason from the list:
excessive loops usage due to some logic error, like calling the same service again and again, trying to load or parse malfunctioned file etc. This can be analyzed with dump analysis (look below).
high CPU usage due to Garbage Collector - when memory usage is extensive (or there is a memory leak even) GC may start to consume more and more CPU fighting with the memory shortage. You will see this with memory-related performance counters.
a considerable amount of exceptions thrown (for example due to some environmental problems like network unavailability, production data difference) can also consume a lot of CPU. Event Viewer and exception-related performance counters (as they can be handled silently by your application) should be a indicator here.
To further analyze your application, I suggest to make a full memory dump during high CPU usage. You can do that with Debug Diag tool. Please refer this IIS troubleshooting guide for details.
First - a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered nodeJS, which made web-app creation feel a lot more like traditional programming. Now, I have a node.js app that I've been developing for some time that is now running in production on the web. My main confusion stems from the fact that I am very new to the world of the web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running node.js in production nearly an year starting from 0.4 and currenty 0.8 series. Web app is express 2 and 3 based with mongo, redis and memcached.
Few facts.
node can not handle large v8 heap, when it grows over 200mb you will start seeing increased cpu usage
node always seem to leak memory, or at least grow large heap size without actually using it. I suspect memory fragmentation, as v8 profiling or valgrind shows no leaks in js space nor resident heap. Early 0.8 was awful in this respect, rss could be 1GB with 50MB heap.
hanging requests are hard to track. We wrote our middleware to monitor these especially as our app is long poll based
My suggestions.
use multiple instances per machine, at least 1 per cpu. Balance with haproxy, nginx or such with session affinity
write midleware to report hanged connections, ie ones that code never responded or latency was over threshold
restart instances often, at least weekly
write poller that prints out memory stats with process module one per minute
Use supervisord and fabric for easy process management
Monitor cpu, reported memory stats and restart on threshold
Whichever the type of web app, NodeJS or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN does the load time begin to increase as the number of concurrent users increase? A tipping point is reached when you get to a certain number of concurrent users, after which the server performance will start to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website, and what is the growth of your visits from month-to-month which can help in predicting future expected visits and therefore expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 Development Tools available in all browsers. Open up your website in any browser and push F12 (or for Opera Ctrl+Shift+I), which should open up the browser's development tools. On Firefox make sure you have Firebug installed, on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests, bandwidth usage per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load X the average number of pages load per user per day X Expected number of concurrent users = Total HTTP Requests to Server per Day
And...
Number of MBs transferred per page load X the average number of pages load per user per day X Expected number of concurrent users = Total Bandwidth Required per Day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
Node.js is single threaded so you should definitely start a process for every cpu your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart died workers and to detect unresponsive workers.
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo is a great tools to monitor node processes.
I have a web application that hangs under high loads. I'm not going to go into the specifics of the code because I really just want some troubleshooting advice and tooling recommendations.
It's a web app, so each request get's a thread. Under a high load test, the app begins to consume all of the cpu, while becoming unresponsive. I suspect that the request threads are hanging in the new code that we are testing. Due to the fact of the cpu consumption, I'm assuming this must be on my app side. My understanding, which could be wrong, is that total cpu consumption indicated my first troubleshooting efforts should be in looking at the code that's consuming those cycles.
What are some tools and/or methods for inspecting which threads are hanging and on what lines of code? Again, I can easily force the app into the problematic behavior.
I've found and been trying out visualvm. Seems like the perfect tool. Still open for suggestions though. I looked at eclipse TPTP and it seems to be end-of-life-ing as well as requiring a more heavy weight deployment.
You can insert logging messages at starting a thread and closing a thread. Then you start the application and inspect the output while penetrating the code.
Another way is to look for memory leaks. If you are sure you haven't one, you can extend the virtual memory of your JVM.
#chad: do you have Database in whole picture...you may want to start by looking what is happening at DB side...you can very well look into DB locks, current sessions etc.
Maybe someone can shed some light on this simple question:
I have a .NET web application that has been thoroughly vetted. It loads a cache per appdomain (process) whenever one starts and can not fully reply to requests until it completes this cache loading.
I have been examining the settings on my application pools and have started wondering why I was even recycling so often (once every 1,000,000 calls or 2 hours).
What would prevent me from setting auto-recycles to being once every 24 hours or even longer? Why not completely remove the option and just recycle if memory spins out of control for the appdomain?
If your application runs reliably for longer then the threshold set for app pool recycling, then by all means increase the threshold. There is no downside if your app is stable.
For us, we have recycling turned off altogether, and instead have a task that loads a test page every minute and runs an iisreset if it fails to load five times in a row.
You should probably be looking at recycling from the point of view of reliability. Based on historical data, you should have an idea how much memory, CPU and so on your app uses, and the historical patterns and when trouble starts to occur. Knowing that, you can configure recycling to counter those issues. For example, if you know your app has an increasing memory usage pattern* that leads to the app running out of memory after a period of several days, you could configure it to recycle before that would have happened.
* Obviously, you would also want to resolve this bug if possible, but recycling can be used to increase reliability for the customer
The reason they do it is that an application can be "not working" even though it's CPU and memory are fine (think deadlock). The app recycling is a final failsafe measure which can protect flawed code from dying.
Also any code which has failed to implement IDisposable would run finalizers on the recycle which will possibly release held resources.
I have a webserver that is pegged and I've been able to isolate it to a particular website instance. I'd like to dig deeper and isolate the particular page/process that is causing the issue.. Any tips?
You can take a memory dump of the process and poke around with windbg.
There are posts on this issue from Tess Ferrandez blog. Just do as she say.
Which version of IIS are you using? Some of the higher ones allow for a separation of which process gets used to handle requests such as a worker process that you could isolate a bit more that way. I'd also suggest reading through the IIS logs to see what requests were being handled, how long they took, etc.
There are many different quirks to each IIS version. The really low ones just had a start/stop functionality, but the newer ones have really given administrators much more control and power, IMO.
You should try using a profiler to identify what is using up the most resources. I've used dotTrace Profiler, although that can be expensive if you're on a tight budget.
It allows you to see exactly what processes and method calls use of the most processing time of a request really well so you can isolate the most resource intensive operations.
You should really be able to use any profiler to do this, not just dotTrace. I just happen to only have experience with this one in particular.
Change your web garden setting to 10 or greater. Then watch your CPU and memory utilization on the web server.
Continue to increase the web garden setting until either the app is completely responsive with less than 5% average utilization OR you have actually maxed your web server's memory.
UPDATE
It's not about diagnosing, it's about properly configuring the IIS server. Web Gardens are one of the top misunderstood features of IIS. By increasing the available threads to handle new requests you remove the appearance of contention at the web server level and place it squarely where it belongs. In this case at your database. Instead of masking a problem it actually highlights exactly where the problem is.
This turned out to be a SQL problem (sql 2005). The solution was found by using SQL activity monitor to identify a suspended process with a Async_network_io wait type. We then ran SQL profiler to narrow it down to two massive queries which were returning an over abundance of results.