I have what seems to be a huge memory leak in my node.js app, but when I use node-webkit-agent to examine the heap, it seems quite small. I suspect that there may be a whole ton of async operations queued up on the node.js event queue, but I'm not sure how to examine that. Is there any way to do so?
You can check open handles and requests by inspecting the return values of the undocumented functions process._getActiveHandles() and process._getActiveRequests() respectively. That won't get you everything that may be waiting in the event loop, but it should help.
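As a rough sketch of how you might look at those return values, the snippet below groups whatever the two functions return by constructor name. The helper name summarizeActive is made up for illustration, and because both functions are undocumented internals, the code checks that they exist before calling them.

```javascript
const net = require('net');

// Hypothetical helper: count active handles and requests by constructor name.
// process._getActiveHandles() and process._getActiveRequests() are undocumented,
// so they may change or disappear between Node versions.
function summarizeActive() {
  const countByName = (items) => {
    const counts = {};
    for (const item of items) {
      const name = (item && item.constructor && item.constructor.name) || 'Unknown';
      counts[name] = (counts[name] || 0) + 1;
    }
    return counts;
  };
  const handles = typeof process._getActiveHandles === 'function'
    ? process._getActiveHandles()
    : [];
  const requests = typeof process._getActiveRequests === 'function'
    ? process._getActiveRequests()
    : [];
  return { handles: countByName(handles), requests: countByName(requests) };
}

// Open a listening socket so there is something to report, then print the summary.
const server = net.createServer().listen(0, () => {
  console.log(summarizeActive());
  server.close(); // release the handle so the process can exit
});
```

Calling something like this periodically (for example from a debug endpoint) can show whether the number of sockets or pending requests keeps growing.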
I'm pretty new to node and am trying to setup an express server. The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
Is there anything like a synchronized block in Node? I believe node is single threaded and has a few helper threads to handle asynchronous/callback code. Is it at all possible that the same line of code could run at exactly the same time in Node?
Thanks!
Can the same line of Node.js code run at the same time? Is it at all possible that the same line of code could run at exactly the same time in Node?
No, it is not. Your Javascript in node.js is entirely single threaded. An event is pulled from the event queue. That calls a callback associated with that event. That callback runs until it returns. No other events can be processed until that first one returns. When it returns, the interpreter pulls the next event from the event queue and then calls the callback associated with it.
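A small sketch of that run-to-completion behavior: the timer below is due almost immediately, but its callback cannot run until the synchronous code that is currently executing has returned. The 50 ms busy loop is only there to make the effect visible.

```javascript
const order = [];

// Due after roughly 1 ms, but it can only run once the current code returns.
setTimeout(() => order.push('timer callback'), 0);

// Keep the single JavaScript thread busy for about 50 ms.
const start = Date.now();
while (Date.now() - start < 50) {
  // busy wait, nothing else can run meanwhile
}
order.push('synchronous code finished');

// Registered later, so it fires after the first timer.
setTimeout(() => console.log(order), 0);
// logs [ 'synchronous code finished', 'timer callback' ]
```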
This does not mean that there are no concurrency issues in node.js. There can be, but they are not caused by code running at the same physical time and making conflicting accesses to shared variables (as can happen in threaded languages like Java). Concurrency issues can be caused by the asynchronous nature of I/O in node.js. In the asynchronous case, you call an asynchronous function and pass it a callback (or expect a promise in return). Your code then continues on and returns to the interpreter. Some time later an event will occur inside of node.js native code that will add something to the event queue. When the interpreter is free from running other Javascript, it will process that event and then call your callback, which will cause more of your code to run.
While all this is "in process", other events are free to run and other parts of your Javascript can run. So, the exposure to concurrency issues comes, not from simultaneous running of two pieces of code, but from one piece of your code running while another piece of your code is waiting for a callback to occur. Both pieces of code are "in process". They are not "running" at the same time, but both operations are waiting for something else to occur in order to complete. If these two operations access variables in ways that can conflict with each other, then you can have a concurrency issue in Javascript. Because none of this is pre-emptive (unlike threads in Java), it's all very predictable and is much easier to design for.
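To make that concrete, here is a minimal sketch of such an issue. The account object, the withdraw function, and the 10 ms delay standing in for real I/O are all invented for illustration. Both calls pass the balance check before either one subtracts, because each one yields to the event loop at the await.

```javascript
// Stand-in for real asynchronous I/O (database call, HTTP request, ...).
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withdraw(account, amount) {
  if (account.balance >= amount) { // check
    await delay(10);               // other callbacks can run while we wait here
    account.balance -= amount;     // act on a check that may now be stale
    return true;
  }
  return false;
}

async function demo() {
  const account = { balance: 100 };
  const results = await Promise.all([
    withdraw(account, 80),
    withdraw(account, 80),
  ]);
  return { results, balance: account.balance };
}

demo().then((outcome) => console.log(outcome));
// logs { results: [ true, true ], balance: -60 }
```

Nothing ran simultaneously here. The two operations were simply both "in process" at once, and the second check happened before the first update.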
Is there anything like a synchronized block in Node?
No, there is not. It is simply not needed in your Javascript code. If you wanted to protect something from some other asynchronous operation modifying it while your own asynchronous operation was waiting to complete, you can use simple flags or variables in your code. Because there's no preemption, a simple flag will work just fine.
I believe node is single threaded and has a few helper threads to handle asynchronous/callback code.
Node.js runs your Javascript as single threaded. Internally in its own native code, it does use threads in order to do its work. For example, the asynchronous file system access code internal to node.js uses threads for disk I/O. But these threads are only internal and the result of these threads is not to call Javascript directly, but to insert events in the event queue and all your Javascript is serialized through the event queue. Pull event from event queue, run callback associated with the event. Wait for that callback to return. Pull next event from the event queue, repeat...
The server will handle various requests and if any of them fail call a common failure function that will send e-mail. If I was doing this in something like Java I'd likely use something like a synchronized block and a boolean to allow the first entrance into the code to send the mail.
We'd really have to see what your code looks like to understand what exact problem you're trying to solve. I'd guess that you can just use a simple boolean (regular variable) in node.js, but we'd have to see your code to really understand what you're doing.
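Without seeing the real code, this is only a guess at the shape of it: a plain boolean, set synchronously before any asynchronous work starts, is enough to make sure only the first failure sends mail. The names createFailureReporter and fakeSendEmail are hypothetical, and the fake mailer just records messages in an array.

```javascript
// Returns a function that sends at most one failure e-mail.
// sendEmail is whatever function actually sends mail; it may return a promise.
function createFailureReporter(sendEmail) {
  let reported = false; // plain variable, no lock needed

  return function reportFailure(err) {
    if (reported) {
      return Promise.resolve(false); // someone else already reported
    }
    // Set the flag before starting async work so later callers see it.
    reported = true;
    return Promise.resolve(sendEmail(`Request failed: ${err.message}`))
      .then(() => true);
  };
}

// Usage with a fake mailer that completes asynchronously.
const sent = [];
const fakeSendEmail = (message) => new Promise((resolve) => {
  setImmediate(() => {
    sent.push(message);
    resolve();
  });
});

const reportFailure = createFailureReporter(fakeSendEmail);
Promise.all([
  reportFailure(new Error('first failure')),
  reportFailure(new Error('second failure')),
]).then((results) => console.log(results, sent.length));
// logs [ true, false ] 1
```

Whether the flag should be reset when sending fails depends on what you want to happen on the next failure, which is one of the things we would need your code to answer.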
I'm using process.on('uncaughtException') to catch any exceptions that would unexpectedly come up. In the function I write the data to a file and send an email, and in the future it will likely do more.
Is there a way I can encapsulate the process.on() event in a file, then somehow require it in all the files that make up the application so I don't need to add that chunk of code to each file?
Node normally runs in a single process, so you only need process.on('uncaughtException') in one place.
The exception is if you use the cluster module or otherwise spawn other node processes, in which case you need the process.on('uncaughtException') loaded once for each process, but still not once for each file.
(Be careful doing too much in this handler, because by this point the process is considered unstable. I'm also not sure if async work is guaranteed to be run. The docs say that the correct use of 'uncaughtException' is to perform synchronous cleanup of allocated resources.)
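One possible way to lay this out, as a sketch only: put the handler in a small function and call it once from the entry point of each process. The function name installCrashHandler and the log file name are made up, and the handler sticks to synchronous work before exiting, in line with the caution above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: register the handler once per process.
function installCrashHandler(logFile) {
  process.on('uncaughtException', (err) => {
    try {
      // Synchronous write, since async work may never get a chance to finish.
      const line = `${new Date().toISOString()} ${err && err.stack ? err.stack : err}\n`;
      fs.appendFileSync(logFile, line);
    } finally {
      // The process is in an unknown state at this point, so stop it.
      process.exit(1);
    }
  });
}

// Call this once, early in the entry point of each process.
// The log location here is an arbitrary example.
installCrashHandler(path.join(os.tmpdir(), 'app-crash.log'));
```

If the function lives in its own file, the entry point can require that file once. Other files in the application do not need to require it, because the handler is attached to the process, not to a module.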
I've been doing a fair amount of work with Node lately, trying to build a system which has certain characteristics, one of which is non-blocking / parallelism - a Node strong suit, as I understand it.
What I don't fully understand is when a separate thread is spun off to handle some processing. I'm pretty sure this happens on a function call/callback, but certainly not all of them.
In my specific case, it's an Express based app. At app start-up it does several things including instantiating a RabbitMQ based "bus", an object with a method which will write to the bus (objA) and object which will subscribe to the bus and process messages coming across it (objB).
objA will write to the bus inside an express callback
app.put((req, res) => {
  objA.methodWhichWritesToBus();
});
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
Is that the only point at which this sort of thing happens? methodWhichWritesToBus is IO intensive (it calls an elastic search service on another box and brings back tens to hundreds of thousands of records) with lots of chained promises etc., but none of that gets split off, does it?
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Finally, are there ways to effect/force a method etc to "run in the background"?
I've been noodling this, testing it, for awhile now but all on one machine so it's difficult to tell what's going on.
Who can clarify this for me?
Pre-answer: this is a topic best learned by going and reading, doing coding exercises to solidify your understanding, and working with the technology in a significant way. You're not going to "get it" based on a Q&A format. That said...
What I don't fully understand is when a separate thread is spun off to handle some processing.
Never, sort of. "Processing" as in the computation that happens in your javascript program, happens in the main event loop thread. End of story. However, waiting on I/O to come back from the OS is not considered "processing" so there are various queues managed by node and the OS to track pending I/O requests and invoke callbacks when data is ready. There are a handful of threads node uses internally to manage this stuff with the OS, but from your program's perspective, those threads are irrelevant. Your program can ask node to do some IO, then your program keeps running in parallel, and when the I/O is done, node will eventually invoke the callback in the main event loop and you can process the results.
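A tiny illustration of "ask node to do some IO, keep running, get a callback later". fs.stat on the current directory is used here only because it is a cheap asynchronous call that needs no setup.

```javascript
const fs = require('fs');

const events = [];

// Ask node to do some I/O. This call returns immediately.
fs.stat(process.cwd(), (err) => {
  // Invoked later, on the main event loop, once the OS has answered.
  events.push(err ? 'stat failed' : 'stat finished');
  console.log(events);
});

// The program keeps running while the I/O is pending.
events.push('fs.stat returned');
```

The callback always runs after 'fs.stat returned' has been recorded, because it cannot be invoked until the current synchronous code has finished.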
I believe at this point, that objA.methodWhichWritesToBus is executed in a background/worker thread - whatever you call it, not on the main event loop.
You call it "asynchronously" and it happens whenever you do IO, including filesystem calls, networking, or child processes. Which is to say, quite a lot.
How about the fact that the obj on which the method is called is instantiated outside the Express callback - does that affect the parallel-ism?
Nope.
Finally, are there ways to effect/force a method etc to "run in the background"?
Generally I/O is done asynchronously by default, so no you don't normally need to force anything to run in the background. It's baked into the node design by way of the node core APIs themselves. However, there are ways to delay synchronous processing to a future event loop using setImmediate, setTimeout, or process.nextTick. I explain these in some detail in my blog post setTimeout and friends.
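Here is a short sketch of the three deferral functions side by side. The only ordering relied on is that synchronous code finishes first and the process.nextTick callback runs before the other two. When called from the main module like this, the relative order of setTimeout(fn, 0) and setImmediate is not guaranteed, so no output is claimed for those.

```javascript
const log = [];

setTimeout(() => log.push('setTimeout'), 0);
setImmediate(() => log.push('setImmediate'));
process.nextTick(() => log.push('nextTick'));
log.push('synchronous');

// Print once everything above has had time to run.
setTimeout(() => console.log(log), 50);
// The first two entries are 'synchronous' then 'nextTick';
// the order of the remaining two can vary between runs.
```

None of these moves work to another thread. They only postpone it, and the deferred function still occupies the event loop while it runs.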
More precisely, all networking is asynchronous. End of story. The networking APIs in node core are all asynchronous, and there's simply no synchronous networking API available in node. For filesystem IO and child processes, there are both synchronous and asynchronous APIs, but the synchronous APIs should only be used under special, limited circumstances. If you don't know confidently that a synchronous IO call is OK in your specific case, use the asynchronous API so you don't break the linchpin that makes node perform as it does.
Imagine you want to download an image or a file, this would be the first way the internet will teach you to go ahead:
request(url, function(err, res, body) {
fs.writeFile(filename, body);
});
But doesn't this accumulate all data in body, filling the memory?
Would a pipe be totally more efficient?
request(url).pipe(fs.createWriteStream(filename));
Or is this handled internally in a similar manner, buffering the stream anyway, making this irrelevant?
Furthermore, if I want to use the callback but not the body (because you can still pipe), will this memory buffer still be filled?
I am asking because the first (callback) method allows me to chain downloads instead of launching them in parallel(*), but I don't want to fill a buffer I'm not gonna use either. So I need the callback if I don't want to resort to something fancy like async just to use its queue to prevent this.
(*) Which is bad because if you just request too many files before they are complete, the async nature of request will cause node to choke to death in an overdose of events and memory loss. First you'll get these:
"possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit."
And when stretching it, 500 piped requests will fill your memory up and crash node. That's why you need the callback instead of the pipe, so you know when to start the next file.
But doesn't this accumulate all data in body, filling the memory?
Yes, many operations such as your first snippet buffer data into memory for processing. This uses memory, but it is convenient and sometimes required, depending on how you intend to process the data. If you want to load an HTTP response and parse the body as JSON, that is almost always done via buffering. It's possible with a streaming parser, but that is much more complicated and usually unnecessary: most JSON data is not large enough for streaming to be a big win.
Or is this handled internally in a similar manner, making this irrelevant?
No, APIs that provide you an entire piece of data as a string use buffering and not streaming.
Multimedia data is different: you often cannot realistically buffer it in memory, so streaming is more appropriate. That data also tends to be opaque (you don't parse or process it), which suits streaming well.
Streaming is nice when circumstances permit it, but that doesn't mean there's anything necessarily wrong with buffering. The truth is buffering is how the vast majority of things work most of the time. In the big picture, streaming is just buffering 1 chunk at a time and capping them at some size limit that is well within the available resources. Some portion of the data needs to go through memory at some point if you are going to process it.
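To illustrate "buffering one chunk at a time", here is a sketch that copies a file through streams using only node core modules. The helper name copyStreaming, the temp file names, and the 64 KiB chunk size are arbitrary choices for the example. At most about one chunk per stream sits in memory, however large the file is.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline } = require('stream');

// Copy src to dest in chunks of up to 64 KiB instead of reading it all at once.
function copyStreaming(src, dest, done) {
  pipeline(
    fs.createReadStream(src, { highWaterMark: 64 * 1024 }),
    fs.createWriteStream(dest),
    done // called with an error, or with nothing on success
  );
}

// Usage: create a 1 MiB file in the temp directory and copy it.
const src = path.join(os.tmpdir(), 'stream-demo-src.bin');
const dest = path.join(os.tmpdir(), 'stream-demo-copy.bin');
fs.writeFileSync(src, Buffer.alloc(1024 * 1024, 1));

copyStreaming(src, dest, (err) => {
  if (err) throw err;
  console.log('copied', fs.statSync(dest).size, 'bytes');
});
```

The buffered equivalent would be fs.readFile followed by fs.writeFile, which holds the whole file in memory between the two calls.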
Because if you just request too many files one by one, the async nature of request will cause node to choke to death in an overdose of events and memory loss.
Not sure exactly what you are stating/asking here, but yes, writing effective programs requires thinking about resources and efficiency.
See also substack's rant on streaming/pooling in the hyperquest README.
I figured out a solution that renders the questions about memory irrelevant (although I'm still curious).
if I want to use the callback but not the body (because you can still pipe), will this memory buffer still be filled?
You don't need the callback from request() in order to know when the request is finished. The pipe() will close itself when the stream 'ends'. The close emits an event and can be listened for:
request(url).pipe(fs.createWriteStream(filename)).on('close', function(){
next();
});
Now you can queue all your requests and download files one by one.
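For what it's worth, the queue can be sketched roughly like this. To keep the example free of dependencies it uses node's built-in http.get in place of request, which is a substitution on my part, and the function name downloadSequentially and the file-N naming are made up. Error handling is minimal.

```javascript
const fs = require('fs');
const http = require('http');
const path = require('path');

// Download each URL into destDir, one at a time.
// Calls done(err) on the first failure, or done(null, savedFiles) at the end.
function downloadSequentially(urls, destDir, done) {
  const saved = [];

  function next(index) {
    if (index === urls.length) {
      return done(null, saved);
    }
    const filename = path.join(destDir, `file-${index}`);

    http.get(urls[index], (res) => {
      const out = fs.createWriteStream(filename);
      let failed = false;

      out.on('error', (err) => {
        failed = true;
        done(err);
      });
      // 'close' fires once the file has been fully written and closed.
      out.on('close', () => {
        if (failed) return;
        saved.push(filename);
        next(index + 1); // only now start the next download
      });

      res.pipe(out);
    }).on('error', done);
  }

  next(0);
}

// Example call (hypothetical URLs):
// downloadSequentially(['http://example.com/a', 'http://example.com/b'],
//   '/tmp', (err, files) => console.log(err || files));
```

Only one response is being piped at any moment, so memory use stays at roughly one stream's worth of chunks regardless of how many URLs are queued.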
Of course you can vacuum the internet using 8 parallel requests all the time with libraries such as async.queue, but if all you want to do is get some files with a simple script, async is probably overkill.
Besides, you're not gonna want to max out your system resources for a single trick on a multi-user system anyway.
So, after reading a little about non-blocking code, does...
response.write(thisWillTakeALongTime());
...block the process? If so, do we need to pass the response into pretty much every slow function call we make, and have that function handle the response?
Thanks for helping to clarify!
Yes, it will block the event loop. Passing the response object into the slow function won't help: no matter where you call the slow function, you will be blocking the event loop.
As to how to fix it, we will need more information.
What is making your slow function slow? Are you performing large calculations?
Are you doing sync versions of file/database calls?
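If the answer turns out to be "large calculations", one common approach (a sketch, not necessarily the right fix for your case) is to split the work into chunks and yield to the event loop between them with setImmediate. The function sumInChunks below is a made-up stand-in for real work.

```javascript
// Sum the integers 1..n in slices of chunkSize, yielding between slices
// so that other events (such as incoming requests) can be processed.
function sumInChunks(n, chunkSize, done) {
  let total = 0;
  let i = 1;

  function step() {
    const end = Math.min(i + chunkSize - 1, n);
    for (; i <= end; i++) {
      total += i;
    }
    if (i > n) {
      return done(total); // all slices processed
    }
    setImmediate(step); // let the event loop run before the next slice
  }

  step();
}

sumInChunks(1000, 100, (total) => console.log(total));
// logs 500500
```

In a request handler you would call response.write (or response.end) from inside the done callback. If the slowness comes from sync file or database calls instead, switching to the asynchronous versions is usually the simpler fix.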
It depends on what you mean by process. The web server has already finished serving the page at this point, and your JS will execute. However, the request is synchronous, so the JavaScript will stay busy in your function until it returns, even if that takes years. (Hopefully by this point the browser will detect that your script is taking too long and give you the opportunity to kill it.) Even then, you suffer the embarrassment of the user having to kill your JavaScript and not being able to use the page.
So how do you solve the problem? It becomes particularly important when your JS is making requests over the network, because at that point a whole host of things can go wrong. Imagine that your user is on the other side of the earth: the network latency could make your JS painfully slow. When using ajax, it's preferable to use asynchronous requests, which get around this. I personally recommend using jQuery, as it makes async ajax calls really easy and the documentation on the site is quite straightforward. The other thing I recommend is keeping the returned output small. It may be better to return JSON and build the needed HTML from that.