"Spawn a thread"-like behaviour in node.js - multithreading

I want to add some admin utilities to a little Web app, such as "Backup Database". The user will click on a button and the HTTP response will return immediately, although the potentially long-running process has been started in the background.
In Java this would probably be implemented by spawning an independent thread, in Scala by using an Actor. But what's an appropriate idiom in node.js? (code snippet appreciated)
I'm now re-reading the docs; this really does seem like a node 101 question, but that's pretty much where I am on this... Anyhow, to clarify, this is the basic scenario:
function onRequest(request, response) {
    doSomething();
    response.writeHead(202, headers);
    response.end("doing something");
}

function doSomething() {
    // long-running operation
}
I want the response to return immediately, leaving doSomething() running in the background.
OK, given the single-threaded model of node, that doesn't seem possible without spawning another OS-level child process. My misunderstanding.
In my code, what I need for the backup is mostly I/O-based, so node should handle that in a nice async fashion. What I think I'll do is shift the doSomething() call to after response.end() and see how that behaves.
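For reference, a minimal sketch of that reordering, reusing the hypothetical handler from above:

function onRequest(request, response) {
    response.writeHead(202, headers);
    response.end("doing something");
    // Kick off the long-running work after the response has been sent.
    // As long as doSomething() only schedules async I/O, the event loop
    // stays free to serve other requests.
    doSomething();
}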

As supertopi said, you could have a look at child processes. But I think it will hurt the performance of your server if this happens a lot. In that case you should queue the jobs instead: have a look at asynchronous message queues to process your jobs offline (and distributed). Two examples of such message queues are beanstalkd and gearman.
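For illustration, a minimal sketch of the child-process approach; pg_dump, its arguments, and the output path are placeholders for whatever backup command your database ships with:

var spawn = require('child_process').spawn;

function backupDatabase() {
    // Run the backup in a separate OS process so the node event loop
    // is not tied up while the dump runs.
    var backup = spawn('pg_dump', ['mydb', '-f', '/backups/mydb.sql']);
    backup.on('exit', function (code) {
        console.log('backup finished with exit code ' + code);
    });
}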

I don't see the problem. All you need to do is have doSomething() start an asynchronous operation. It'll return immediately, your onRequest will write the response back, and the client will get their "OK, I started" message.
function doSomething() {
    openDatabaseConnection(connectionString, function(conn) {
        // This is called some time later, once the connection is established.
        // Now you can tell the database to back itself up.
    });
}
doSomething won't just sit there until the database connection is established, or wait while you tell it to back up. It'll return right away, having registered a callback that will run later. Behind the scenes, your database library is probably creating some threads for you, to make the async work the way it should, but your code doesn't need to worry about it; you just return right away, send the response to the client right away, and the asynchronous code keeps running asynchronously.
(It's actually more work to make this run synchronously -- you would have to pass your response object into doSomething, and have doSomething do the response.end call inside the innermost callback, after the backup is done. Of course, that's not what you want to do here; you want to return immediately, which is exactly what your code will do.)
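A hypothetical sketch of that variant, reusing openDatabaseConnection from above; the conn.backup call is assumed purely for illustration:

function doSomethingAndRespond(response) {
    openDatabaseConnection(connectionString, function (conn) {
        // conn.backup is an assumed API, shown only to illustrate the shape.
        conn.backup(function () {
            // Only now, after the backup has finished, does the client hear back.
            response.end("backup complete");
        });
    });
}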

Related

NodeJS -- cost of promise chains in recursion

I am trying to implement a couple of state handler functions in my JavaScript code, in order to perform two distinct actions in each state. This is similar to the State design pattern in Java (https://sourcemaking.com/design_patterns/state).
Conceptually, my program needs to remain connected to an Elasticsearch instance (or any other server, for that matter), and then parse and POST some incoming data to it. If there is no connection available to Elasticsearch, my program keeps trying to connect endlessly, with some retry period.
In a nutshell,
When not connected, keep trying to connect
When connected, start POSTing the data
The main run loop calls itself recursively:
function run(ctx) {
    logger.info("run: running...");
    // initially starts with disconnected state...
    return ctx.curState.run(ctx)
        .then(function(result) {
            if (result) ctx.curState = connectedSt;
            // else it remains in old state.
            return run(ctx);
        });
}
This is not truly recursive in the sense of each invocation calling itself in a tight loop; each call returns after scheduling the next. But I suspect it ends up with many promises in the chain, and that in the long run it will consume more and more memory and eventually hang.
Is my assumption/understanding right? Or is it OK to write this kind of code?
If not, should I consider calling setImmediate / process.nextTick etc.?
Or should I consider using TCO (Tail Call Optimization)? Of course, I am yet to fully understand this concept.
Yes, by returning a new promise (the result of the recursive call to run()), you effectively chain in another promise.
Neither setImmediate() nor process.nextTick() are going to solve this directly.
When you call run() again, simply don't return it and you should be fine.
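A minimal sketch of that change, based on the run() function from the question; the only difference is the dropped return, so each iteration's promise chain can be garbage-collected instead of growing:

function run(ctx) {
    logger.info("run: running...");
    ctx.curState.run(ctx)
        .then(function (result) {
            if (result) ctx.curState = connectedSt;
            run(ctx); // no `return`: the new chain is independent of the old one
        });
}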

Node.js nested callbacks

I have a very basic question about node.js programming. I am finding it interesting as I start to understand it more deeply.
I came across the following code :
abc.do1('operation', 2, function (err, res1) {
    if (err) {
        return callback(err); // bail out so do2 doesn't run on error
    }
    def.do2('operation2', function (l) {
        // ...
    });
});
My question is: since def.do2 is written in the callback of abc.do1, is it true that def.do2 will be executed only after the 'operation' of abc.do1 has completed and the callback function is called? If yes, is this good programming practice, given that we always talk about asynchronous and non-blocking code in node.js?
Yes, you're correct: def.do2() is executed after abc.do1() is done. However, this is done on purpose, to make sure that do1() has finished before do2() can start. If they were meant to run in parallel, do2() would have been placed outside of do1()'s callback (see the sketch below). This code is not exactly blocking either: after do1() is started, execution continues with everything below the do1() call (outside of do1's callback); only do2() waits until do1() is done, which is the intent.
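A sketch of that parallel variant, using the same hypothetical abc/def objects from the question:

// Both operations start immediately; neither waits for the other.
abc.do1('operation', 2, function (err, res1) {
    if (err) return callback(err);
    // handle res1
});
def.do2('operation2', function (l) {
    // handle l
});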
Yes, you are absolutely correct, and you have given a correct example of a callback function.
The main browser process is a single-threaded event loop. If you execute a long-running operation within a single-threaded event loop, the process "blocks". This is bad because the process stops handling other events while waiting for your operation to complete. 'alert' is one of the few blocking browser methods: if you call alert('test'), you can no longer click links, perform ajax queries, or interact with the browser UI.
In order to prevent blocking on long-running operations, the XMLHttpRequest provides an asynchronous interface. You pass it a callback to run after the operation is complete, and while it is processing it cedes control back to the main event loop instead of blocking.
There's no reason to use a callback unless you want to bind something to an event handler, or your operation is potentially blocking and therefore requires an asynchronous programming interface.
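For example, a minimal browser sketch of the asynchronous XMLHttpRequest interface described above ('/data' is a placeholder URL):

var xhr = new XMLHttpRequest();
xhr.onload = function () {
    // Runs later, once the response has arrived; nothing was blocked meanwhile.
    console.log(xhr.responseText);
};
xhr.open('GET', '/data');
xhr.send();
console.log('request sent; the UI stays responsive');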
See this video
http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/

Why does Node.js have both async & sync versions of fs methods?

In Node.js, I can do almost any async operation in one of two ways:
var file = fs.readFileSync('file.html')
or...
var file
fs.readFile('file.html', function (err, data) {
if (err) throw err
console.log(data)
})
Is the only benefit of the async one custom error handling? Or is there really a reason to have the file read operation non-blocking?
These exist mostly because node itself needs them to load your program's modules from disk when your program starts. More broadly, it is typical to do a bunch of synchronous setup I/O when a service initially starts, before it accepts network connections. Once the program is ready to go (has its TLS cert loaded, config file read, etc.), a network socket is bound, and from that point on everything is async.
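For example, a minimal sketch of that startup pattern; 'config.json' and its port field are placeholders:

var fs = require('fs');
var http = require('http');

// A synchronous read is fine here: the server isn't accepting
// connections yet, so there is nothing to block.
var config = JSON.parse(fs.readFileSync('config.json', 'utf8'));

http.createServer(function (req, res) {
    // From this point on, everything should be async.
    res.end('ok');
}).listen(config.port);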
Asynchronous calls allow for the branching of execution chains and the passing of results through that execution chain. This has many advantages.
For one, the program can execute two or more calls at the same time, and do work on the results as they complete, not necessarily in the order they were first called.
For example if you have a program waiting on two events:
var file1;
var file2;

// Let's say this takes 2 seconds
fs.readFile('bigfile1.jpg', function (err, data) {
    if (err) throw err;
    file1 = data;
    console.log("FILE1 Done");
});

// Let's say this takes 1 second
fs.readFile('bigfile2.jpg', function (err, data) {
    if (err) throw err;
    file2 = data;
    console.log("FILE2 Done");
});

console.log("DO SOMETHING ELSE");
In the case above, bigfile2.jpg will return first and something will be logged after only 1 second. So your output timeline might be something like:
#0:00: DO SOMETHING ELSE
#1:00: FILE2 Done
#2:00: FILE1 Done
Notice that "DO SOMETHING ELSE" was logged right away, that File2 finished first after only 1 second, and that File1 was done at 2 seconds. Everything finished within a total of 2 seconds, though the callback order was unpredictable.
Whereas doing it synchronously it would look like:
file1 = fs.readFileSync('bigfile1.jpg');
console.log("FILE1 Done");
file2 = fs.readFileSync('bigfile2.jpg');
console.log("FILE2 Done");
console.log("DO SOMETHING ELSE");
And the output might look like:
#2:00: FILE1 Done
#3:00: FILE2 Done
#3:00: DO SOMETHING ELSE
Notice it takes a total of 3 seconds to execute, but the order is how you called it.
Doing it synchronously typically takes longer for everything to finish (especially for external processes like filesystem reads, writes, or database requests), because you are waiting for one thing to complete before moving on to the next. Sometimes you want this, but usually you don't. It can be easier to program synchronously, though, since you can do things reliably in a particular order.
When you execute filesystem methods asynchronously, however, your application can continue with other, non-filesystem tasks without waiting for the filesystem operations to complete. This is why database queries, filesystem access, and communication requests are generally handled with asynchronous methods: they allow other work to proceed while the system waits for I/O and off-system operations to complete.
When you get into more advanced asynchronous method chaining, you can do some really powerful things, like creating scopes (using closures and the like) with a small amount of code, and creating responders to certain events.
Sorry for the long answer. There are many reasons why you have the option to do things synchronously or not, but hopefully this will help you decide whether either method is best for you.
The benefit of the asynchronous version is that you can do other stuff while you wait for the IO to complete.
fs.readFile('file.html', function (err, data) {
    if (err) throw err;
    console.log(data);
});
// Do a bunch more stuff here.
// All this code will execute before the callback to readFile,
// but the IO will be happening concurrently. :)
You want to use the async version when you are writing event-driven code where responding to requests quickly is paramount. The canonical example for Node is writing a web server. Let's say a user makes a request that requires the server to perform a bunch of IO. If this IO is performed synchronously, the server blocks: it will not answer any other requests until it has finished serving this one. From the users' perspective, performance will seem terrible. So in a case like this, you want to use the asynchronous versions of the calls so that Node can continue processing requests.
The sync version of the IO calls is there because Node is not used only for writing event-driven code. For instance, if you are writing a utility which reads a file, modifies it, and writes it back to disk as part of a batch operation, like a command line tool, using the synchronous version of the IO operations can make your code easier to follow.
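For instance, a minimal sketch of such a batch tool; 'input.html' and the transform are placeholders:

var fs = require('fs');

// Read, modify, write back - sequential by design, so the sync
// versions keep the control flow obvious.
var contents = fs.readFileSync('input.html', 'utf8');
var modified = contents.replace(/foo/g, 'bar');
fs.writeFileSync('input.html', modified);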

How does the stack work on NodeJs?

I've recently started reading up on NodeJS (and JS) and am a little confused about how callbacks work in NodeJS. Assume we do something like this:
setTimeout(function () {
    console.log("World");
}, 1000);
console.log("Hello");
output: "Hello World"
From what I've read so far, JS is single-threaded, so the event loop goes through one big stack, and I've also been told not to put big calls in the callback functions.
1)
OK, so my first question: assuming it's one stack, does the callback function get run by the main event loop thread? If so, then if we have a site that serves up content via callbacks (fetches from the db and pushes the result into the request), and we have 1000 concurrent users, are those 1000 users basically being served synchronously, with the main thread going into each callback function, doing the computation, and then continuing the main event loop? If this is the case, how exactly is this concurrent?
2) How are the callback functions added to the stack? Let's say my code was like the following:
setTimeout(function () {
    console.log("Hello");
}, 1000);

setTimeout(function () {
    console.log("World");
}, 2000);
Does the callback function get added to the stack before the timeout has even occurred? If so, is there a flag that's set to notify the main thread that the callback function is ready (or is there another mechanism)? If this is in fact what is happening, doesn't it just bloat the stack, especially for large web applications with many callback functions? And the larger the stack, the longer everything takes to run, since the thread has to step through the entire thing.
The event loop is not a stack. It could be better thought of as a queue (first in/first out). The most common usage is for performing I/O operations.
Imagine you want to read a file from disk. You start by saying hey, I want to read this file, and when you're done reading, call this callback. While the actual I/O operation is being performed in a separate process, the Node application is free to keep doing something else. When the file has finished being read, it adds an item to the event loop's queue indicating the completion. The node app might still be busy doing something else, or there may be other items waiting to be dequeued first, but eventually our completion notification will be dequeued by node's event loop. It will match this up with our callback, and then the callback will be called as expected. When the callback has returned, the event loop dequeues the next item and continues. When there is nothing left in node's event loop, the application exits.
That's a rough approximation of how the event loop works, not an exact technical description.
For the specific setTimeout case, think of it like a priority queue. Node won't consider dequeuing the item/job/callback until at least that amount of time has passed.
There are lots of great write-ups on the Node/JavaScript event loop which you probably want to read up on if you're confused.
Callback functions are not added to the caller's stack. There is no recursion here; they are called from the event loop. Try replacing the console.log in your example and watch the result: the stack does not grow.
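A quick way to see this for yourself (a rough sketch; the exact stack text varies by Node version):

function outer() {
    setTimeout(function () {
        // The printed stack does not include outer(): the callback was
        // invoked from the event loop, not from outer's stack frame.
        console.log(new Error().stack);
    }, 1000);
}
outer();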

What is a good way to exit a node.js script after "everything has been done"

My node.js script reads rows from a table in database 1, does some processing and writes the rows to database 2.
The script should exit after everything is done.
How can I know if everything has been done and exit node then?
If I have a callback function like this:
function exit_node() {
    process.exit();
}
(Edit: in the meantime it became obvious that process.exit() could also be replaced with db.close() - but the question is not what to put in there. The question is when exactly to do this, i.e. how and where to execute this callback.)
But it is not easy to attach it anywhere. Attaching it after the last read from db1 is not correct, because the processing and writing still have to happen.
Attaching it to the writes to db2 is not easy either, because it has to be attached after the last write, but each write is independent and does not know whether it is the last one.
It could also theoretically happen that the last write finishes while another write started earlier is still executing.
Edit: Sorry, I can see the explanation of the question is not complete and probably confusing, but some people still understood and there are good answers below. Please continue reading the comments and the answers and it should give you the whole picture.
Edit: I could think of some "blocking" controller mechanism: the different parts of the script add blockers to the controller for each open "job" and release them after the job is finished, and the controller exits the script when no more blockers are present. Maybe async could help: https://github.com/caolan/async
I also fear this would bloat the code and the logic unreasonably.
JohnnyHK gives good advice; except Node.js already does option 2 for you.
When there is no more I/O, timers, intervals, etc. (no more work expected), the process will exit. If your program does not automatically exit after all its work is done, then you have a bug. Perhaps you forgot to close a DB connection, or to clearTimeout() or clearInterval(). But instead of calling process.exit(), you might take this opportunity to identify your leak.
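For example, a minimal sketch of such a leak; without the clearInterval the process would run forever:

var timer = setInterval(function () {
    console.log('still alive');
}, 1000);

// Stand-in for "all the real work is done". Once the interval is
// cleared, the event loop drains and node exits on its own.
setTimeout(function () {
    clearInterval(timer);
}, 5000);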
The two main options are:
Use an asynchronous processing coordination module like async (as you mentioned).
Keep your own count of outstanding writes and then exit when the count reaches 0 (a rough sketch follows).
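A rough sketch of the counter approach; db2.insert/db2.close are from the question's setup, and allReadsDone is a hypothetical flag set once db1 has delivered its last row:

var pending = 0;
var allReadsDone = false;

function startWrite(row) {
    pending++;
    db2.insert(row, function (err) {
        if (err) throw err;
        pending--;
        if (allReadsDone && pending === 0) {
            db2.close(); // last write finished: safe to clean up and exit
        }
    });
}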
Though using a counter could be tempting (a simple, easily implemented idea), it will pollute your code with unrelated logic.
db1.query(selector, function (err, results) {
    db1.close();
    // let's suppose callback gets array of results;
    // once all writes are finished, `db2.close` will be called by async
    async.forEach(results, processRow, db2.close);
});

// async iterator
function processRow(row, cb) {
    // modify row somehow
    var newRow = foo(row);
    // insert it in another db;
    // once write is done, call `cb` to notify async
    db2.insert(newRow, cb);
}
Although, using process.exit here feels like C++'s new without delete to me. Bad comparison maybe, but I can't help it :)
