Quickly running into high memory usage Node.js - node.js

I am running a small Node.js server on a Heroku Hobby tier dyno. This tier allocates 512mb of RAM. This service takes in a moderate size of JSON data (~5000 JSON objects, each is ~5kb, 5000*5kb = 25mb), runs some analysis on it and outputs about 20 metrics. It's not a trivial amount of input data but it's also definitely not GBs of files. I'm confused where I'm running into the memory limits here. I run one continuous request for all 20 metrics. Does garbage collection not happen until the end of the request? I'm creating a lot of Date objects probably around 2 per JSON object so about 10,000 total and I do this upfront and save them for the whole process so I'm not constantly re-creating these dates. Could this be the issue I'm running into? Any suggestions for how to optimize there?
By the way, I know Node.js isn't the best tool for data processing in general and we're already looking to move to a Python based server for the libraries and multithreaded environment. But, until that's up and running, I'd love to be able to understand and improve the situation I'm currently running into on Node.js.
Thanks!

Related

How to check performance of downloading a file in nodejs

I've write a video streaming server with NodeJS recently.
I wanna to check the performance of streaming with a virtualizing scenario
for example I can run a thousand curl command to fetch a video and check cpu usage of the running nodejs process. but I don't know how to run curl parallel for virtualizing what happened when 1000 users stream my videos.
help me if you have another solution for this. I don't know how to check the performance of my server for many users.
You can kick off 1000 simultaneous curl commands using i.e. GNU parallel however I don't think it's the best way to do a proper load test because you won't have any performance metrics which can be analysed and correlated
Also sending 1000 requests is a good example of a spike test while "classic" load test would be:
Starting with 1 thread (virtual user)
Gradually increasing the load up to 1000 (or whatever is the anticipated number of users of your application)
Holding the load for a certain amount of time
Gradually decreasing the load to 0
This way you will be able to correlate the increasing load with increasing throughput (number of requests per second), response time, error rate, see whether application gets back to normal when the load decreases, are there memory leaks, etc.
So I would recommend going for a dedicated load testing tool which provides possibility to define flexible workload scenarios and outputs nice tables and charts allowing you to perform the test results analysis

What is consuming memory in my Node JS application?

Background
I have a relatively simple node js application (essentially just expressjs + mongoose). It is currently running in production on an Ubuntu Server and serves about 20,000 page views per day.
Initially the application was running on a machine with 512 MB memory. Upon noticing that the server would essentially crash every so often I suspected that the application might be running out of memory, which was the case.
I have since moved the application to a server with 1 GB of memory. I have been monitoring the application and within a few minutes the application tends to reach about 200-250 MB of memory usage. Over longer periods of time (say 10+ hours) it seems that the amount keeps growing very slowly (I'm still investigating that).
I have been since been trying to figure out what is consuming the memory. I have been going through my code and have not found any obvious memory leaks (for example unclosed db connections and such).
Tests
I have implemented a handy heapdump function using node-heapdump and I have now enabled --expore-gc to be able to manually trigger garbage collection. From time to time I try triggering a manual GC to see what happens with the memory usage, but it seems to have no effect whatsoever.
I have also tried analysing heapdumps from time to time - but I'm not sure if what I'm seeing is normal or not. I do find it slightly suspicious that there is one entry with 93% of the retained size - but it just points to "builtins" (not really sure what the signifies).
Upon inspecting the 2nd highest retained size (Buffer) I can see that it links back to the same "builtins" via a setTimeout function in some Native Code. I suspect it is cache or https related (_cache, slabBuffer, tls).
Questions
Does this look normal for a Node JS application?
Is anyone able to draw any sort of conclusion from this?
What exactly is "builtins" (does it refer to builtin js types)?

What are some effective strategies to track down native memory leaks in a node.js process?

I've been trying to track down a very slow, but persistent, native memory leak in a node.js app, and I've run out of strategies.
The process has what appears to be a level heap, but as the hours and days roll on, the RSS of the node.js process slowly grows. The process is a job handler that runs the same type of job for different parameters, over and over. The growth of the RSS of the process takes the same shape as the line plotting the cumulative number of jobs run, so each job run is somehow leaking a bit of memory.
Since the heap is more or less constant, the standard heap inspection tools don't seem to be much help.
Here's an example of what the memory consumption looks like:
Currently running on node 0.8.7. Each job does a number of database reads/writes, communicates with a redis instance, and does some web requests using mikael/request.
Have you updated to the newest release?
I know everyone says that :), I just felt like I should join the band wagon of updating my version of node.js on my production servers every two weeks when I think I have an issue. Sounds like a great idea doesn't it?
So I have been wondering the same thing, I have several node.js projects that I have been managing for a few months now (and also that I wrote last year). It seems that very slowly the V8 engine, or my node application, just eats memory and never frees it. (its slow enough that I only have to restart them every now and then)
Which is very stressful, especially considering that it should free up the RSS memory, or eventually peak out.
If you are interested in tracking objects being leaked inside of the runtime (and by that i mean javascript objects, functions, etc), mozilla has a very complete blog post about tracking down memory leaks and a few links to projects that can be used to do this.
For what ever reason they don't have this one on the list though. (it seems simple enough, I'm trying it out now on my own projects to see if it works, I tend to not get any of the V8 based ones to compile correctly)
heapdump and here is a link to a how to guide.
From my own experience the V8 engine seems to allocate memory, and hold onto it just incase it needs that exact same memory chunk later. Also my brother who has been using Node.js heavily for about 3 years has seen the same thing.
Also just for completeness (I know you already have), if any one would like to verify that you are not leaking memory inside of V8, an engineer from joyent has a pretty decent write up of how to track V8 memory leaks down.

What are the most important statistics to look at when deploying a Node.js web-application?

First - a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered nodeJS, which made web-app creation feel a lot more like traditional programming. Now, I have a node.js app that I've been developing for some time that is now running in production on the web. My main confusion stems from the fact that I am very new to the world of the web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running node.js in production nearly an year starting from 0.4 and currenty 0.8 series. Web app is express 2 and 3 based with mongo, redis and memcached.
Few facts.
node can not handle large v8 heap, when it grows over 200mb you will start seeing increased cpu usage
node always seem to leak memory, or at least grow large heap size without actually using it. I suspect memory fragmentation, as v8 profiling or valgrind shows no leaks in js space nor resident heap. Early 0.8 was awful in this respect, rss could be 1GB with 50MB heap.
hanging requests are hard to track. We wrote our middleware to monitor these especially as our app is long poll based
My suggestions.
use multiple instances per machine, at least 1 per cpu. Balance with haproxy, nginx or such with session affinity
write midleware to report hanged connections, ie ones that code never responded or latency was over threshold
restart instances often, at least weekly
write poller that prints out memory stats with process module one per minute
Use supervisord and fabric for easy process management
Monitor cpu, reported memory stats and restart on threshold
Whichever the type of web app, NodeJS or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN does the load time begin to increase as the number of concurrent users increase? A tipping point is reached when you get to a certain number of concurrent users, after which the server performance will start to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website, and what is the growth of your visits from month-to-month which can help in predicting future expected visits and therefore expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 Development Tools available in all browsers. Open up your website in any browser and push F12 (or for Opera Ctrl+Shift+I), which should open up the browser's development tools. On Firefox make sure you have Firebug installed, on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests, bandwidth usage per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load X the average number of pages load per user per day X Expected number of concurrent users = Total HTTP Requests to Server per Day
And...
Number of MBs transferred per page load X the average number of pages load per user per day X Expected number of concurrent users = Total Bandwidth Required per Day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
Node.js is single threaded so you should definitely start a process for every cpu your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart died workers and to detect unresponsive workers.
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo is a great tools to monitor node processes.

Node.js app has periodic slowness and/or timeouts (does not accept incoming requests)

This problem is killing the stability of my production servers.
To recap, the basic idea is that my node server(s) sometimes intermittently slow down, sometimes resulting in Gateway Timeouts. As best as I can tell from my logs, something is blocking the node thread (meaning that the incoming request is not accepted), but I cannot for the life of me figure out what.
The problem ranges in severity. Sometimes what should be <100ms requests take ~10 seconds to complete; sometimes they never even get accepted by the node server at all. In short, it is as though some random task is working and blocking the node thread for a period of time, thus slowing down (or even blocking) incoming requests; the one thing I can say for sure is that the need-to-fix-symptom is a "Gateway Timeout".
The issue comes and goes without warning. I have not been able to correlate it against CPU usage, RAM usage, uptime, or any other relevant statistic. I've seen the servers handle a large load fine, and then have this error with a small load, so it does not even appear to be load-related. It is not unusual to see the error around 1am PST, which is the smallest load time of the day! Restarting the node app does seem to maybe make the problem go away for a while, but that really doesn't tell me much. I do wonder if it might be a bug in node.js... not very comforting, considering it is killing my production servers.
The first thing I did was to make sure I had upgraded node.js to the latest (0.8.12), as well as all my modules (here they are). Of course, I also have plenty of error catchers in place. I'm not doing anything funky like printing out lots to the console or writing to lots of files.
At first, I thought it was outbound HTTP requests blocking the incoming socket, because the express middleware was not even picking up the inbound request, but I gave up the theory because it looks like the node thread itself became busy.
Next, I went through all my code with JSHint and fixed literally every single warning, including a few accidental globals (forgetting to write "var") but this didn't help
After that, I assumed that perhaps I was running out of memory. But, my heap snapshots via nodetime are looking pretty good now (described below).
Still thinking that memory might be an issue, I took a look at garbage collection. I enabled the --nouse-idle-notification flag and did some more code optimization to NULL objects when they were not needed.
Still convinced that memory was the issue, I added the --expose-gc flag and executed the gc(); command every minute. This did not change anything, except to occasionally make requests a bit slower perhaps.
In a desperate attempt, I setup the "cluster" module to use 2 workers and automatically restart them every 30 min. Still, no luck.
I increased the ulimit to over 10,000 and kept an eye on the open files. There seem to be < 300 open files (or sockets) per node.js app, and increasing the ulimit thus had no impact.
I've been logging my server with nodetime and here's the jist of it:
CentOS 5.2 running on the Amazon Cloud (m1.large instance)
Greater than 5000 MB free memory at all times
Less than 150 MB heap size at all times
CPU usage is less than 60% at all times
I've also checked my MongoDB servers, which have <5% CPU usage and no requests are taking > 100ms to complete, so I highly doubt there's a bottleneck.
I've wrapped (almost) all my code using Q-promises (see code sample), and of course have avoided Sync() calls like the plague. I've tried to replicate the issue on my testing server (OSX), but have had little luck. Of course, this may be just because the production servers are being used by so many people in so many unpredictable ways that I simply cannot replicate via stress tests...
Many months after I first asked this question, I found the answer.
In a nutshell, the problem was that I was not piping a big asset when transferring it from one server to another. In other words, I was downloading an image from one server, before uploading it to a S3 bucket. Instead of streaming the download into the upload, I downloaded the file into memory, and then uploaded it.
I'm not sure why this did not show up as a memory spike, or elsewhere in my statistics.
My guess is Mongoose. If you are storing large payloads in Mongo, Mongoose can be pretty slow due to how it builds the Mongoose objects. See https://github.com/LearnBoost/mongoose/issues/950 for more details on the problem. If this is the problem you wouldn't see it in Mongo itself since the query returns quickly, but object instantiation could take 75x the query time.
Try setting up timers around (process.hrtime()) before and after you the Mongoose objects are being created to see if that might be the problem. If this is the problem, I would switch to using the node Mongo driver directly instead of going through Mongoose.
You are heavily leaking memory, try setting every object to null as soon as you don't need it anymore! Read this.
More information about hunting down memory leaks can be found here.
Give special attention to having multiple references to the same object and check if you have circular references, those are a pain to debug but will help you very much.
Try invoking the garbage collector manually every minute or so (I don't know if you can do this in node.js cause I'm more of a c++ and php coder). From my years of experience working with c++ I can tell you the most likely cause of your application slowing down over time is memory leaks, find them and plug them, you'll be ok!
Also assuming you're not caching and/or processing images, audio or video in memory or anything like that 150M heap is a lot! Those could be hundreds of thousands or even millions of small objects.
You don't have to be running out of memory for your application to slow down... just searching for free memory with that many objects already allocated is a huge job for the memory allocator, it takes a lot of time to allocate each new object and as you leak more and more memory that time only increases.
Is "--nouse-idle-connection" a mistake? do you really mean "--nouse_idle_notification".
I think it's maybe some issues about gc with too many tiny objects.
node is single process, so watch the most busy cpu core is much important than the load.
when your program is slow, you can execute "gdb node pid" and "bt" to see what node is busy doing.
What I'd do is set up a parallel node instance on the same server with some kind of echo service and test that one. If it runs fine, you narrow down your problem to your program code (and not a scheduler/OS-level problem). Then, step by step, include the modules and test again. Certainly this is a lot of work, takes long and I dont know if it is doable on your system.
If you need to get this working now, you can go the NASA redundancy route:
Bring up a second copy of your production servers, and put a proxy in front of them which routes each request to both stacks and returns the first response. I don't recommend this as a perfect long-term solution but it should help significantly reduce issues in production now, and help you gather log data that you could replay to recreate the issues on non-production servers.
Obviously, this is straight-forward for read requests, but more complex for commands which write to the db.
We have a similar problem with our Node.js server. It didn't scale well for weeks and we had tried almost everything as you had. Our problem was in the implicit backlog value which is set very low for high-concurrent environments.
http://nodejs.org/api/http.html#http_server_listen_port_hostname_backlog_callback
Setting the backlog to a significantly higher value (e.g. 10000) as well as tune networking in our kernel (/etc/sysctl.conf on Linux) as described in manual section helped a lot. From this time forward we don't have any timeouts in our Node.js server.

Resources