What is consuming memory in my Node JS application? - node.js

Background
I have a relatively simple node js application (essentially just expressjs + mongoose). It is currently running in production on an Ubuntu Server and serves about 20,000 page views per day.
Initially the application was running on a machine with 512 MB memory. Upon noticing that the server would essentially crash every so often I suspected that the application might be running out of memory, which was the case.
I have since moved the application to a server with 1 GB of memory. I have been monitoring the application and within a few minutes the application tends to reach about 200-250 MB of memory usage. Over longer periods of time (say 10+ hours) it seems that the amount keeps growing very slowly (I'm still investigating that).
I have been since been trying to figure out what is consuming the memory. I have been going through my code and have not found any obvious memory leaks (for example unclosed db connections and such).
Tests
I have implemented a handy heapdump function using node-heapdump and I have now enabled --expore-gc to be able to manually trigger garbage collection. From time to time I try triggering a manual GC to see what happens with the memory usage, but it seems to have no effect whatsoever.
I have also tried analysing heapdumps from time to time - but I'm not sure if what I'm seeing is normal or not. I do find it slightly suspicious that there is one entry with 93% of the retained size - but it just points to "builtins" (not really sure what the signifies).
Upon inspecting the 2nd highest retained size (Buffer) I can see that it links back to the same "builtins" via a setTimeout function in some Native Code. I suspect it is cache or https related (_cache, slabBuffer, tls).
Questions
Does this look normal for a Node JS application?
Is anyone able to draw any sort of conclusion from this?
What exactly is "builtins" (does it refer to builtin js types)?

Related

Why does w3wp memory keeps increasing?

I am on a medium instance which has 3GB of RAM. When I start my webapp the w3wp process starts with say 80MB. I notice that the more time passes this goes up and up.... Now I took a memory dump of the process when it was 570MB and the site was running for 5 days, to see whether there were any .NET objects which were consuming a lot but found out that the largest object was 18MB which were a set of string objects.
I am not using any cache objects since I'm using redis for my session storage, and in actual fact the dump showed that there was nothing in the cache.
Now my question is the following... I am thinking that since I have 3GB of memory IIS will retain some pages in memory (cached) so the website is faster whenever there are requests and that is the reason why the memory keeps increasing. What I'm concerned is that I am having some memory leak in some way, even if I am disposing all EntityFramework objects when being used, or any other appropriate streams which need to be disposed. When some specific threshold is reached I am assuming that old cached data which was in memory gets removed and new pages are included. Am I right in saying this?
I want to point out that in the past I had been on a small instance and the % never went more than 70% and now I am on medium instance and the memory is already 60%.... very very strange with the same code.
I can send memory dump if anyone would like to help me out.
There is an issue that is affecting a small number of Web Apps, and that we're working on patching.
There is a workaround if you are hitting this particular issue:
Go to Kudu Console for your app (e.g. https://{yourapp}.scm.azurewebsites.net/DebugConsole)
Go into the LogFiles folder. If you are running into this issue, you will have a very large eventlog.xml file
Make that file readonly, by running attrib +r eventlog.xml
Optionally, restart your Web App so you have a clean w3wp
Monitor whether the usage still goes up
The one downside is that you'll no longer get those events generated, but in most cases they are not needed (and this is temporary).
The problem has been identified, but we don't have an ETA for the deployment yet.

Why does Node.js have incremental memory usage?

I have a gameserver.js file that is well over 100 KB in size. And I kept checking my task manager after each refresh on my browser and kept seeing my node.exe memory usage keep rising for every refresh. I'm using the ws module here: https://github.com/websockets/ws and figured, you know what, there is most likely some memory leak in my code somewhere...
So to double check and isolate the issue I created a test.js file and put in the default ws code block:
var WebSocketServer = require('ws').Server
, wss = new WebSocketServer({ port: 9300 });
wss.on('connection', function connection(ws) {
ws.on('message', function incoming(message) {
console.log('received: %s', message);
});
});
And started it up:
Now, I check node.exe's memory usage:
The incremental part that makes me confused is:
If I refresh my browser that makes the connection to this port 9300 websocket server and then look back at my task manager.. it shows:
Which is now at: 14,500 K.
And it keeps on rising upon each refresh, so theoretically if I keep just refreshing it will go through the roof. Is this intended? Is there a memory leak in the ws module somewhere maybe? The whole reason I ask is because I thought maybe in a few minutes or when the user closes the browser it will go back down, but it doesn't.
And the core reason why I wanted to do this test because I figured I had a memory leak issue in my personal code somewhere and just wanted to check if it wasn't me, or vice versa. Now I'm stumped.
Seeing an increased memory footprint by a Node.js application is completely normal behaviour. Node.js constantly analyses your running code, generates optimised code, reverts to unoptimised code (if needed), etc. All this requires quite a lot of memory even for the most simple of applications (Node.js itself is from a large part written in JavaScript that follows the same optimisations/deoptimisations as your own code).
Additionally, a process may be granted more memory when it needs it, but many operating systems remove that allocated memory from the process only when they decide it is needed elsewhere (i.e. by another process). So an application can, in peaks, consume 1 GB of RAM, then garbage collection kicks in, usage drops to 500 MB, but the process may still keep the 1 GB.
Detecting presence of memory leaks
To properly analyse memory usage and memory leaks, you must use Node.js's process.memoryUsage().
You should set up an interval that dumps this memory usage into a file i.e. every second, then apply some "stress" on your application over several seconds (i.e. for web servers, issue several thousand requests). Then take a look at the results and see if the memory just keeps increasing or if it follows a steady pattern of increasing/decreasing.
Detecting source of memory leaks
The best tool for this is likely node-heapdump. You use it with the Chrome debugger.
Start your application and apply initial stress (this is to generate optimised code and "warm-up" your application)
While the app is idle, generate a heapdump
Perform a single, additional operation (i.e. one more request) that you suspect will likely cause a memory leak - this is probably the trickiest part especially for large apps
Generate another heapdump
Load both heapdumps into Chrome debugger and compare them - if there is a memory leak, you will see that there are some objects that were allocated during that single request but were not released afterwards
Inspect the object to determine where the leak occurs
I had the opportunity to investigate a reported memory leak in the Sails.js framework - you can see detailed description of the analysis (including pretty graphs, etc.) on this issue.
There is also a detailed article about working with heapdumps by StrongLoop - I suggest to have a look at it.
The garbage collector is not called all the time because it blocks your process. So V8 launches GC when it thinks it's necessary.
To find if you have a memory leak I propose to fire up the GC manually after every request just to see if your memory is still going up. Normally if you don't have a memory leak your memory should not increase. Because the GC will clean all non-used objects. If your memory is still going up after a GC call you have a memory leak.
To launch GC manually you can do that, but attention! Don't use this in production; this is just a way to cleanup your memory and see if you have a memory leak.
Launch Node.js like this:
node --expose-gc --always-compact test.js
It will expose the garbage collector and force it to be aggressive. Call this method to run the GC:
global.gc();
Call this method after each hit on your server and see if the GC clean the memory or not.
You can also do two heapdumps of your process before and after request to see the difference.
Don't use this in production or in your project. It is just a way to see if you have a memory leak or not.

What are some effective strategies to track down native memory leaks in a node.js process?

I've been trying to track down a very slow, but persistent, native memory leak in a node.js app, and I've run out of strategies.
The process has what appears to be a level heap, but as the hours and days roll on, the RSS of the node.js process slowly grows. The process is a job handler that runs the same type of job for different parameters, over and over. The growth of the RSS of the process takes the same shape as the line plotting the cumulative number of jobs run, so each job run is somehow leaking a bit of memory.
Since the heap is more or less constant, the standard heap inspection tools don't seem to be much help.
Here's an example of what the memory consumption looks like:
Currently running on node 0.8.7. Each job does a number of database reads/writes, communicates with a redis instance, and does some web requests using mikael/request.
Have you updated to the newest release?
I know everyone says that :), I just felt like I should join the band wagon of updating my version of node.js on my production servers every two weeks when I think I have an issue. Sounds like a great idea doesn't it?
So I have been wondering the same thing, I have several node.js projects that I have been managing for a few months now (and also that I wrote last year). It seems that very slowly the V8 engine, or my node application, just eats memory and never frees it. (its slow enough that I only have to restart them every now and then)
Which is very stressful, especially considering that it should free up the RSS memory, or eventually peak out.
If you are interested in tracking objects being leaked inside of the runtime (and by that i mean javascript objects, functions, etc), mozilla has a very complete blog post about tracking down memory leaks and a few links to projects that can be used to do this.
For what ever reason they don't have this one on the list though. (it seems simple enough, I'm trying it out now on my own projects to see if it works, I tend to not get any of the V8 based ones to compile correctly)
heapdump and here is a link to a how to guide.
From my own experience the V8 engine seems to allocate memory, and hold onto it just incase it needs that exact same memory chunk later. Also my brother who has been using Node.js heavily for about 3 years has seen the same thing.
Also just for completeness (I know you already have), if any one would like to verify that you are not leaking memory inside of V8, an engineer from joyent has a pretty decent write up of how to track V8 memory leaks down.

What are the most important statistics to look at when deploying a Node.js web-application?

First - a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered nodeJS, which made web-app creation feel a lot more like traditional programming. Now, I have a node.js app that I've been developing for some time that is now running in production on the web. My main confusion stems from the fact that I am very new to the world of the web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running node.js in production nearly an year starting from 0.4 and currenty 0.8 series. Web app is express 2 and 3 based with mongo, redis and memcached.
Few facts.
node can not handle large v8 heap, when it grows over 200mb you will start seeing increased cpu usage
node always seem to leak memory, or at least grow large heap size without actually using it. I suspect memory fragmentation, as v8 profiling or valgrind shows no leaks in js space nor resident heap. Early 0.8 was awful in this respect, rss could be 1GB with 50MB heap.
hanging requests are hard to track. We wrote our middleware to monitor these especially as our app is long poll based
My suggestions.
use multiple instances per machine, at least 1 per cpu. Balance with haproxy, nginx or such with session affinity
write midleware to report hanged connections, ie ones that code never responded or latency was over threshold
restart instances often, at least weekly
write poller that prints out memory stats with process module one per minute
Use supervisord and fabric for easy process management
Monitor cpu, reported memory stats and restart on threshold
Whichever the type of web app, NodeJS or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN does the load time begin to increase as the number of concurrent users increase? A tipping point is reached when you get to a certain number of concurrent users, after which the server performance will start to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website, and what is the growth of your visits from month-to-month which can help in predicting future expected visits and therefore expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 Development Tools available in all browsers. Open up your website in any browser and push F12 (or for Opera Ctrl+Shift+I), which should open up the browser's development tools. On Firefox make sure you have Firebug installed, on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests, bandwidth usage per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load X the average number of pages load per user per day X Expected number of concurrent users = Total HTTP Requests to Server per Day
And...
Number of MBs transferred per page load X the average number of pages load per user per day X Expected number of concurrent users = Total Bandwidth Required per Day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
Node.js is single threaded so you should definitely start a process for every cpu your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart died workers and to detect unresponsive workers.
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo is a great tools to monitor node processes.

Node.js app has periodic slowness and/or timeouts (does not accept incoming requests)

This problem is killing the stability of my production servers.
To recap, the basic idea is that my node server(s) sometimes intermittently slow down, sometimes resulting in Gateway Timeouts. As best as I can tell from my logs, something is blocking the node thread (meaning that the incoming request is not accepted), but I cannot for the life of me figure out what.
The problem ranges in severity. Sometimes what should be <100ms requests take ~10 seconds to complete; sometimes they never even get accepted by the node server at all. In short, it is as though some random task is working and blocking the node thread for a period of time, thus slowing down (or even blocking) incoming requests; the one thing I can say for sure is that the need-to-fix-symptom is a "Gateway Timeout".
The issue comes and goes without warning. I have not been able to correlate it against CPU usage, RAM usage, uptime, or any other relevant statistic. I've seen the servers handle a large load fine, and then have this error with a small load, so it does not even appear to be load-related. It is not unusual to see the error around 1am PST, which is the smallest load time of the day! Restarting the node app does seem to maybe make the problem go away for a while, but that really doesn't tell me much. I do wonder if it might be a bug in node.js... not very comforting, considering it is killing my production servers.
The first thing I did was to make sure I had upgraded node.js to the latest (0.8.12), as well as all my modules (here they are). Of course, I also have plenty of error catchers in place. I'm not doing anything funky like printing out lots to the console or writing to lots of files.
At first, I thought it was outbound HTTP requests blocking the incoming socket, because the express middleware was not even picking up the inbound request, but I gave up the theory because it looks like the node thread itself became busy.
Next, I went through all my code with JSHint and fixed literally every single warning, including a few accidental globals (forgetting to write "var") but this didn't help
After that, I assumed that perhaps I was running out of memory. But, my heap snapshots via nodetime are looking pretty good now (described below).
Still thinking that memory might be an issue, I took a look at garbage collection. I enabled the --nouse-idle-notification flag and did some more code optimization to NULL objects when they were not needed.
Still convinced that memory was the issue, I added the --expose-gc flag and executed the gc(); command every minute. This did not change anything, except to occasionally make requests a bit slower perhaps.
In a desperate attempt, I setup the "cluster" module to use 2 workers and automatically restart them every 30 min. Still, no luck.
I increased the ulimit to over 10,000 and kept an eye on the open files. There seem to be < 300 open files (or sockets) per node.js app, and increasing the ulimit thus had no impact.
I've been logging my server with nodetime and here's the jist of it:
CentOS 5.2 running on the Amazon Cloud (m1.large instance)
Greater than 5000 MB free memory at all times
Less than 150 MB heap size at all times
CPU usage is less than 60% at all times
I've also checked my MongoDB servers, which have <5% CPU usage and no requests are taking > 100ms to complete, so I highly doubt there's a bottleneck.
I've wrapped (almost) all my code using Q-promises (see code sample), and of course have avoided Sync() calls like the plague. I've tried to replicate the issue on my testing server (OSX), but have had little luck. Of course, this may be just because the production servers are being used by so many people in so many unpredictable ways that I simply cannot replicate via stress tests...
Many months after I first asked this question, I found the answer.
In a nutshell, the problem was that I was not piping a big asset when transferring it from one server to another. In other words, I was downloading an image from one server, before uploading it to a S3 bucket. Instead of streaming the download into the upload, I downloaded the file into memory, and then uploaded it.
I'm not sure why this did not show up as a memory spike, or elsewhere in my statistics.
My guess is Mongoose. If you are storing large payloads in Mongo, Mongoose can be pretty slow due to how it builds the Mongoose objects. See https://github.com/LearnBoost/mongoose/issues/950 for more details on the problem. If this is the problem you wouldn't see it in Mongo itself since the query returns quickly, but object instantiation could take 75x the query time.
Try setting up timers around (process.hrtime()) before and after you the Mongoose objects are being created to see if that might be the problem. If this is the problem, I would switch to using the node Mongo driver directly instead of going through Mongoose.
You are heavily leaking memory, try setting every object to null as soon as you don't need it anymore! Read this.
More information about hunting down memory leaks can be found here.
Give special attention to having multiple references to the same object and check if you have circular references, those are a pain to debug but will help you very much.
Try invoking the garbage collector manually every minute or so (I don't know if you can do this in node.js cause I'm more of a c++ and php coder). From my years of experience working with c++ I can tell you the most likely cause of your application slowing down over time is memory leaks, find them and plug them, you'll be ok!
Also assuming you're not caching and/or processing images, audio or video in memory or anything like that 150M heap is a lot! Those could be hundreds of thousands or even millions of small objects.
You don't have to be running out of memory for your application to slow down... just searching for free memory with that many objects already allocated is a huge job for the memory allocator, it takes a lot of time to allocate each new object and as you leak more and more memory that time only increases.
Is "--nouse-idle-connection" a mistake? do you really mean "--nouse_idle_notification".
I think it's maybe some issues about gc with too many tiny objects.
node is single process, so watch the most busy cpu core is much important than the load.
when your program is slow, you can execute "gdb node pid" and "bt" to see what node is busy doing.
What I'd do is set up a parallel node instance on the same server with some kind of echo service and test that one. If it runs fine, you narrow down your problem to your program code (and not a scheduler/OS-level problem). Then, step by step, include the modules and test again. Certainly this is a lot of work, takes long and I dont know if it is doable on your system.
If you need to get this working now, you can go the NASA redundancy route:
Bring up a second copy of your production servers, and put a proxy in front of them which routes each request to both stacks and returns the first response. I don't recommend this as a perfect long-term solution but it should help significantly reduce issues in production now, and help you gather log data that you could replay to recreate the issues on non-production servers.
Obviously, this is straight-forward for read requests, but more complex for commands which write to the db.
We have a similar problem with our Node.js server. It didn't scale well for weeks and we had tried almost everything as you had. Our problem was in the implicit backlog value which is set very low for high-concurrent environments.
http://nodejs.org/api/http.html#http_server_listen_port_hostname_backlog_callback
Setting the backlog to a significantly higher value (e.g. 10000) as well as tune networking in our kernel (/etc/sysctl.conf on Linux) as described in manual section helped a lot. From this time forward we don't have any timeouts in our Node.js server.

Resources