Does async.parallel also parallelise blocking code? - node.js

I'm trying to understand the behaviour of the async package in relation to parallelisation of code. From what I understand, it is helpful to structure code in order to avoid callback hell/soup, but is that the only advantage?
async.parallel([
  function(next) {
    apiCallTo3rdParty(next);
  },
  function(next) {
    apiCallTo3rdParty(next);
  },
  function(next) {
    apiCallTo3rdParty(next);
  }
], function(err, res) {
  // do something else with returned data
});
In the code above, all three API calls are made without waiting for the others to complete, as documented. The final callback is called after the slowest API call returns, since the others will already have completed.
However, if I changed the code so that it made a few blocking operations, what would happen?
async.parallel([
  function(next) {
    sleep(5); // a synchronous 5-second sleep (not a Node.js built-in)
    next();
  },
  function(next) {
    sleep(5);
    next();
  },
  function(next) {
    sleep(5);
    next();
  }
], function(err, res) {
  // do something else with returned data
});
Taking "Node.js is Single Threaded" at face value, we'd take it to mean that the async block would execute its final callback after fifteen seconds, but knowing that "Node.js maintains its own internal thread pool", can we assume that Node would run each callback in its own thread, truly parallelise the functions, and call the final callback after 5 seconds?

Short answer: your code would take 15 seconds to execute. Why?
There's no way to run your JavaScript functions in parallel on the event loop's single thread. The only way to run synchronous code without blocking the event loop is to spawn a Worker. Since the async module doesn't use workers, it has to wait for each of your functions to return before it can call the next one.
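A scaled-down way to check the 15-second claim without waiting 15 seconds. This sketch assumes a hypothetical `blockingSleep(ms)` built on `Atomics.wait` (Node has no built-in `sleep`) in place of the question's `sleep`, and a tiny home-made `parallel()` standing in for `async.parallel` (it is not the library's code). Timer-based tasks overlap; blocking tasks add up.

```javascript
// Hypothetical stand-in for the question's sleep(): Atomics.wait blocks the
// calling thread (here, the one running the event loop) for `ms` milliseconds.
function blockingSleep(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Minimal stand-in for async.parallel (not the library's code): start every
// task, then call `final` once all of them have called back.
function parallel(tasks, final) {
  let pending = tasks.length;
  tasks.forEach((task) => task(() => {
    if (--pending === 0) final();
  }));
}

// Runs `task` three times "in parallel" and reports the elapsed time.
function timeThree(task, done) {
  const start = Date.now();
  parallel([task, task, task], () => done(Date.now() - start));
}

// Non-blocking tasks: the three timers overlap, so the total is roughly 100 ms.
timeThree((next) => setTimeout(next, 100), (timerMs) => {
  // Blocking tasks: each one holds the thread, so the total is roughly 300 ms.
  timeThree((next) => { blockingSleep(100); next(); }, (blockingMs) => {
    console.log(`setTimeout x3: ${timerMs} ms, blockingSleep x3: ${blockingMs} ms`);
  });
});
```

Scale 100 ms up to 5 seconds and the blocking version is the question's 15 seconds.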
Consider the following:
sleep(5);
sleep(5);
sleep(5);
Obviously it would take 15 seconds. Now even if you make this "parallel" the way async does:
function parallel () {
  process.nextTick(function(){
    sleep(5);
  });
  process.nextTick(function(){
    sleep(5);
  });
}
parallel();
console.log("this is called immediately");
process.nextTick(function(){
  console.log("this is called 10 secs later");
});
parallel() returns immediately and the first console.log runs right away, but as soon as the event loop gets to those queued callbacks, they are going to block other code from executing.
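If you genuinely need blocking work to overlap, spawning Workers is the way to do it. A minimal sketch using Node's built-in `worker_threads` module (Node 12 and later); `runBlockingInWorkers` is a hypothetical helper, and the `Atomics.wait` call stands in for the question's `sleep`. Each worker blocks only its own thread, so three 500 ms sleeps should finish in roughly 500 ms plus worker start-up time rather than 1500 ms:

```javascript
const { Worker } = require("worker_threads");

// Code run inside each worker: block *this worker's* thread for workerData.ms
// milliseconds with Atomics.wait, then report back to the main thread.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, workerData.ms);
  parentPort.postMessage(workerData.id);
`;

// Starts `count` workers that each block for `ms`, and calls
// done(err, elapsedMs, finishedIds) once all of them have reported back.
function runBlockingInWorkers(count, ms, done) {
  const start = Date.now();
  const finished = [];
  let failed = false;
  for (let id = 0; id < count; id++) {
    const worker = new Worker(workerSource, { eval: true, workerData: { id, ms } });
    worker.once("error", (err) => {
      if (!failed) { failed = true; done(err); }
    });
    worker.once("message", (doneId) => {
      finished.push(doneId);
      if (!failed && finished.length === count) done(null, Date.now() - start, finished);
    });
  }
}

runBlockingInWorkers(3, 500, (err, elapsed) => {
  if (err) throw err;
  // The sleeps overlap, so this is roughly 500 ms plus worker start-up time,
  // not the 1500 ms you would get by running them one after another.
  console.log(`three 500 ms blocking sleeps took ${elapsed} ms`);
});
```

Worker start-up isn't free, so this only pays off for work that takes noticeably longer than spawning a thread; for repeated jobs a pool of long-lived workers is the usual design.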

Related

node js callback - Will the order always be the same?

I wrote the following function that takes a callback. I always thought that the content of a callback might be executed later. In this case it isn't...
doesSomething(function(){
  console.log("1");
  var i = 0;
  while (i < 10000)
  {
    console.log("hello");
    i = i + 1;
  }
});
console.log("2");
console.log("3");
Whatever I do, "2" and "3" always come after "1" and ten thousand "hello"s.
Like this:
1
hello
hello
...
hello
hello
2
3
What I thought it would do:
2
3
1
hello
hello
...
hello
hello
Even if this behaviour makes my life simpler, I don't really understand why the execution is sequential.
Could the order ever be reversed in some case?
It all depends on how doesSomething() calls its callback. If it calls the callback synchronously (i.e. before it returns), then everything in that callback will execute before doesSomething() returns. If it calls it asynchronously (some time after it returns), then you will get a different order.
So the order is determined by the code in doesSomething(), which you haven't shown us.
Based on the order you observe, doesSomething() must be calling its callback synchronously, and thus it will execute in order just like any other synchronous function call.
For example, here are two scenarios:
function doesSomething(callback) {
  callback();
}
This calls the callback passed to it synchronously, so the callback runs before doesSomething() returns and therefore before the code that follows.
Whereas something like this:
var fs = require('fs');
function doesSomething(callback) {
  fs.writeFile('foo.txt', 'some data', callback); // writeFile takes (file, data, callback)
}
or:
function doesSomething(callback) {
  setTimeout(callback, 50);
}
will execute the callback asynchronously, some time after the function has already returned, and you will see a different execution order with your console.log() statements.
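To see both orders side by side, here is a small sketch that records the question's output into an array instead of printing it. `doesSomethingSync` and `doesSomethingAsync` are hypothetical names for the synchronous and `setTimeout` variants above, and the loop runs 3 times instead of 10,000 to keep the output short:

```javascript
// Synchronous version: calls the callback before returning.
function doesSomethingSync(callback) {
  callback();
}

// Asynchronous version: the callback runs after the current code finishes.
function doesSomethingAsync(callback) {
  setTimeout(callback, 0);
}

// Runs the question's code against a given doesSomething, recording the
// order of the log lines instead of printing them.
function recordOrder(doesSomething) {
  const log = [];
  doesSomething(function () {
    log.push("1");
    for (let i = 0; i < 3; i++) log.push("hello");
  });
  log.push("2");
  log.push("3");
  return log; // an async callback may still append to this array later
}

const syncLog = recordOrder(doesSomethingSync);
console.log(syncLog.join(" ")); // 1 hello hello hello 2 3

const asyncLog = recordOrder(doesSomethingAsync);
console.log(asyncLog.join(" ")); // 2 3 (the callback hasn't run yet)
setTimeout(() => console.log(asyncLog.join(" ")), 10); // 2 3 1 hello hello hello
```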
From the output you mention, I can say that there is nothing asynchronous happening inside doesSomething, and it probably looks like this:
function doesSomething(callback) {
  // maybe some synchronous operations, e.g. no Ajax or asynchronous filesystem calls
  callback();
}
So the order in which the functions run will always be the same, as you posted above:
doesSomething(callback);
console.log(2);
console.log(3);
If you want the order you mention, you should call them like this:
console.log(2);
console.log(3);
doesSomething(callback);
This has nothing to do with Node.js in particular. It is the way function calls work in JavaScript: they are synchronous, since JavaScript runs on a single thread.
When you call doesSomething(callback), it executes immediately, and nothing else runs until it completes.
JavaScript itself is single-threaded, but the Node.js runtime and the browser are not. Certain functions provided by Node.js or the browser hand work off to the runtime and cause your callback to be placed in a task queue, to be run later on the main thread once the current code has finished.
Let's take your example and make doesSomething async.
This is your synchronous code, which prints 1, hello..., 2, 3:
function doesSomething(callback) {
  callback();
}
doesSomething(function(){
  console.log("1");
  var i = 0;
  while (i < 3)
  {
    console.log("hello");
    i = i + 1;
  }
});
console.log("2");
console.log("3");
And this is the async code, which prints 2, 3, 1, hello...:
function doesSomething(callback) {
  setTimeout(callback, 0);
}
doesSomething(function(){
  console.log("1");
  var i = 0;
  while (i < 3)
  {
    console.log("hello");
    i = i + 1;
  }
});
console.log("2");
console.log("3");
To explain why the second version is async, you have to understand that setTimeout is not part of the JavaScript language; it's an API provided by Node.js and the browsers. setTimeout hands the callback to the runtime, which keeps track of the timer outside your JavaScript code. When the timer expires, the callback is put into a callback queue, and once the call stack is clear, whatever is in the callback queue is moved onto the call stack and processed.
In Node you can also use process.nextTick(yourFunction) to make it async. This is just another function Node.js provides, and it is better suited to this than setTimeout (I won't get into the details here). You can check out https://howtonode.org/understanding-process-next-tick to understand more about process.nextTick.
For more info, check out this video: https://youtu.be/8aGhZQkoFbQ?t=19m25s
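A quick sketch of how `process.nextTick` compares with `setTimeout(fn, 0)`: both defer the callback until the current synchronous code has finished, but the `nextTick` callback runs first, before the event loop gets to timers, even though the timer was scheduled earlier.

```javascript
const order = [];

setTimeout(() => order.push("setTimeout 0"), 0);
process.nextTick(() => order.push("process.nextTick"));
order.push("synchronous code");

// Print once everything above has run.
setTimeout(() => console.log(order), 20);
// [ 'synchronous code', 'process.nextTick', 'setTimeout 0' ]
```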

how to use async.parallelLimit to maximize the amount of (parallel) running processes?

Is it possible to limit the number of parallel running processes with async.parallelLimit? I used the following very simple code to test how it works.
var async = require("async");
var i = 0;
function write() {
  i++;
  console.log("Done", i);
}
async.parallelLimit([
  function(callback) {
    write();
    callback();
  }
], 10, function() {
  console.log("finish");
});
Of course, all I got back was this:
Done 1
finish
In my case I have a function which is called very often, but I only want at most 5 calls to run simultaneously. The other calls should be queued. (Yes, I know about async.queue, but that is not what I want in this question.)
So now the question is: is this possible with async.parallelLimit?
//EDIT:
I actually have something similar to this:
for (var i = 0; i < something.length; i++) { // something.length > 100, for example
  write();
}
Because of the asynchronous nature of Node.js, this could run 100 times at the same time. But how should I limit the number of parallel running processes in this case?
Very short answer: yes. That's exactly what async.parallelLimit does.
In your case, you are passing only one function to parallelLimit. That's why it only gets called once. If you were to pass an array with this same function many times, it would get executed as many times as you put it in the array.
Please note that your example function doesn't actually do any work asynchronously. As such, it will always get executed in series. If you have a function that does async work, for example a network request or file I/O, it will get executed in parallel.
A better example function for an async workload would be:
function(callback) {
  setTimeout(function(){
    callback();
  }, 200);
}
For completeness, to add to the existing answer: if you want to run the same function multiple times in parallel with a limit, here's how you do it:
// run 'my_task' 100 times, with parallel limit of 10
var my_task = function(callback) { ... };
var when_done = function(err, results) { ... };
// create an array of tasks
var async_queue = Array(100).fill(my_task);
async.parallelLimit(async_queue, 10, when_done);
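To make the semantics concrete without the library, here is a minimal stand-in for `async.parallelLimit` (a sketch of the idea, not Async's actual implementation; `parallelLimit` here is a hypothetical local function). It starts up to `limit` tasks, and each time one calls back it starts the next queued one, so the number of tasks in flight never exceeds the limit:

```javascript
// Minimal stand-in for async.parallelLimit (not the library's code): runs
// `tasks` with at most `limit` in flight; `final(err, results)` is called once.
function parallelLimit(tasks, limit, final) {
  const results = new Array(tasks.length);
  let next = 0, running = 0, completed = 0, done = false;

  function launch() {
    while (running < limit && next < tasks.length) {
      const index = next++;
      running++;
      tasks[index]((err, value) => {
        if (done) return;
        running--;
        if (err) { done = true; return final(err); }
        results[index] = value;
        completed++;
        if (completed === tasks.length) { done = true; return final(null, results); }
        launch(); // a slot has freed up: start the next queued task
      });
    }
  }

  if (tasks.length === 0) return final(null, results);
  launch();
}

// Demo: 20 timer-based tasks (like the async example function above),
// limited to 5 at a time. `active` counts tasks currently in flight.
let active = 0, maxActive = 0;
const task = (callback) => {
  active++;
  maxActive = Math.max(maxActive, active);
  setTimeout(() => { active--; callback(null, "ok"); }, 50);
};

parallelLimit(Array(20).fill(task), 5, (err, results) => {
  console.log(results.length, "tasks done, max concurrent:", maxActive);
  // 20 tasks done, max concurrent: 5
});
```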

node.js setImmediate unexpected behavior

As I understand it, the differences between process.nextTick() and setImmediate() are the following:
callbacks scheduled by process.nextTick() will ALL be executed before entering the next event loop, while callbacks scheduled by setImmediate() will only be executed ONE per event loop.
Based on the characteristics stated above, it can be said that a recursive call of process.nextTick() can cause the program to hang, while a recursive call of setImmediate() will not.
I then wrote some test code to verify the statements above. Here is my code:
process.nextTick(function() {
  console.log('nextTick1');
});
process.nextTick(function() {
  console.log('nextTick2');
});
setImmediate(function() {
  console.log('setImmediate1');
  process.nextTick(function() {
    console.log('nextTick3');
  });
});
setImmediate(function() {
  console.log('setImmediate2');
});
My expected result was
nextTick1, nextTick2, setImmediate1, nextTick3, setImmediate2
but what I actually got was
nextTick1, nextTick2, setImmediate1, setImmediate2, nextTick3
Then I ran another test to study the behavior of setImmediate():
// the following code uses express as the framework
var express = require('express');
var app = express();
app.get('/test', function (req, res) {
  res.end('The server is responding.');
});
app.get('/nexttick', function (req, res) {
  function callback() {
    process.nextTick(callback);
  }
  callback();
  res.end('nextTick');
});
app.get('/setimmediate', function (req, res) {
  function callback() {
    setImmediate(callback);
  }
  callback();
  res.end('setImmediate');
});
app.listen(3000);
Step 1: I accessed http://localhost:3000/nexttick in my browser and got the text nextTick.
Step 2: I accessed http://localhost:3000/test to check whether my server was still responding to requests, and I didn't get any response; the browser kept waiting. This is not surprising, because the recursive call of process.nextTick() had hung the server.
Step 3: I restarted my server.
Step 4: I accessed http://localhost:3000/setimmediate and got the text setImmediate.
Step 5: I accessed http://localhost:3000/test again to check whether my server was still responding. The result was the same as in Step 2: I didn't get any response, and the browser kept waiting.
This suggests that process.nextTick() and setImmediate() behave pretty much the same, but as far as I know they don't.
I was wondering why this is happening, or whether I have misunderstood something. Thank you.
P.S. I am running Node v0.12.7 under Windows 10.
Actually, your test results look OK (I'm only referring to the first test).
Your statement that "setImmediate() will only be executed ONE per event loop" is wrong; see https://nodejs.org/api/timers.html#timers_setimmediate_callback_arg
and that's the reason you don't understand the results of the first test.
Basically, process.nextTick queues your function to run as soon as the current operation completes, before the event loop moves on, while setImmediate queues it for the next iteration of the event loop.
Scheduling multiple setImmediate callbacks is fine, though, and they will all run in that same next iteration.
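The same scheduling difference shows up with recursion. A bounded sketch (so it can't hang the process) that counts how many recursive callbacks run before a 1 ms timer, set up first, gets its turn; `runsBeforeTimer` is a hypothetical helper. It assumes a current Node.js version: the `nextTick` chain runs to completion before the timer, while the `setImmediate` chain lets the event loop move on after each callback. It doesn't account for whatever caused the Step 5 hang on Node v0.12.7.

```javascript
// Re-schedules itself with `schedule` up to `limit` times and reports how many
// callbacks had run by the time a 1 ms timer (set up first) got its turn.
function runsBeforeTimer(schedule, limit, callback) {
  let runs = 0;
  let timerFired = false;
  setTimeout(() => {
    timerFired = true;
    callback(runs);
  }, 1);
  (function again() {
    if (timerFired || runs === limit) return; // stop once the timer has run
    runs++;
    schedule(again);
  })();
}

const LIMIT = 100000;
runsBeforeTimer((fn) => process.nextTick(fn), LIMIT, (tickRuns) => {
  // The whole nextTick chain ran before the timer got a chance.
  console.log(`nextTick: ${tickRuns} of ${LIMIT} callbacks ran before the timer`);
  runsBeforeTimer((fn) => setImmediate(fn), LIMIT, (immediateRuns) => {
    // The event loop moved on between setImmediate callbacks, so the timer ran early.
    console.log(`setImmediate: ${immediateRuns} of ${LIMIT} callbacks ran before the timer`);
  });
});
```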
In this code:
setImmediate(function() {
  console.log('setImmediate1');
  process.nextTick(function() {
    console.log('nextTick3');
  });
});
setImmediate(function() {
  console.log('setImmediate2');
});
you have queued 2 functions to be invoked in the next event loop iteration, and inside one of them you queue another nextTick, which will be invoked at the end of that same cycle.
So it will first invoke those 2 setImmediate callbacks and then the nextTick that was queued in the first one.
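Note that this explanation matches the asker's Node v0.12. Since Node.js 11, the `process.nextTick` queue is drained after each individual `setImmediate` callback, so on current versions the first test prints the order the asker expected. A sketch that records the order into an array instead of printing each line:

```javascript
const order = [];

process.nextTick(() => order.push("nextTick1"));
process.nextTick(() => order.push("nextTick2"));
setImmediate(() => {
  order.push("setImmediate1");
  process.nextTick(() => order.push("nextTick3"));
});
setImmediate(() => order.push("setImmediate2"));

// Print after both immediates (and the tick queued inside the first) have run.
setTimeout(() => console.log(order.join(", ")), 20);
// Node.js 11+: nextTick1, nextTick2, setImmediate1, nextTick3, setImmediate2
```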

Stopping the parallel execution

I have four functions which are running in parallel. If any one of them fails partway through, how can I stop the execution of the others? Any help on this will be really helpful.
Except for a setTimeout or setInterval, I don't think you can. You can, however, set checkpoints in the logic of the functions. It's not clean at all, but it can work. For example, checking a shared flag in each callback:
var control = true;
async1(function(e, r) {
  if (e) {
    control = false;
    return callback1(e, r);
  }
  if (control) callback1(e, r);
});
async2(function(e, r) {
  if (e) {
    control = false;
    return callback2(e, r);
  }
  if (control) callback2(e, r);
});
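The checkpoint idea can be taken one step further for tasks that work in several steps: each task checks the shared flag before every chunk of work, so once one task fails, the others stop at their next checkpoint instead of only having their results ignored. A minimal sketch; `makeTask`, the `control` object and the timer-based "work" are hypothetical, chosen only to make the steps visible:

```javascript
// Hypothetical task: does `steps` chunks of work, one per timer tick, and
// checks the shared `control.cancelled` flag before each chunk (the checkpoint).
// `failAt` is the step at which it simulates an error (-1 means never).
function makeTask(name, steps, failAt, control, log) {
  return function (callback) {
    let step = 0;
    (function nextStep() {
      if (control.cancelled) return callback(new Error(name + " stopped"));
      if (step === failAt) {
        control.cancelled = true; // ask the other tasks to stop at their next checkpoint
        return callback(new Error(name + " failed"));
      }
      log.push(name + ":" + step); // one chunk of "work"
      step++;
      if (step === steps) return callback(null, name + " finished");
      setTimeout(nextStep, 10);
    })();
  };
}

const control = { cancelled: false };
const log = [];
const tasks = [
  makeTask("a", 10, -1, control, log), // would take about 100 ms on its own
  makeTask("b", 10, 2, control, log),  // fails on its third chunk
];

let reported = 0;
tasks.forEach((task) => {
  task((err, result) => {
    console.log(err ? err.message : result);
    if (++reported === tasks.length) console.log("chunks done:", log.join(" "));
  });
});
```

This only helps with work you control; an operation already handed to the runtime (a request or file read in flight) still runs to completion.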
Although, for this, I would go with throrin19 and say that Async is a nice library for the job.
But maybe you would like to check out co. It could handle your problem better than Async.
You can't stop the execution of a function. Functions are executed and return synchronously, so there is technically nothing to stop. But there can be asynchronous tasks, which use underlying libuv capabilities somehow (that is, you can't do anything asynchronous without calling some asynchronous Node API or some native module). Functions are only interfaces for such tasks, and they don't support canceling tasks, only starting them.
So, you can't really cancel async operations, but what you can do is ignore the results of other operations if one fails. Here is how it can be implemented:
function runAll(tasks, callback) { // tasks: array of functions accepting callback(err)
  var pending = tasks.length, // number of tasks still running
      failed = false;
  if (pending === 0) return callback(); // nothing to run
  function done(err) { // callback for each task
    if (failed) return; // a task already failed: ignore later results
    if (err) {
      failed = true;
      return callback(err); // report only the first error
    }
    if (--pending === 0) callback(); // all tasks completed
  }
  tasks.forEach(function(task) {
    task(done);
  });
}
I think you should use the Async library. Async contains a parallel function that executes functions in parallel (of course), and each function takes a callback (error, result).
If an error occurs, Async immediately calls the final callback with that error; the other tasks are not actually stopped, but their results are ignored.

Is the following node.js code blocking or non-blocking?

I have Node.js code running on a server and would like to know whether it is blocking or not. It is similar to this:
function addUserIfNoneExists(name, callback) {
  userAccounts.findOne({name: name}, function(err, obj) {
    if (err) return callback(err);
    if (obj) {
      callback('user exists');
    } else {
      // Add the user 'name' to DB and run the callback when done.
      // This is non-blocking up to here.
      var user = addUser(name, callback);
      // Do something heavy; it doesn't matter when this completes.
      // Is this part blocking?
      doSomeHeavyWork(user);
    }
  });
}
Once addUser completes, the doSomeHeavyWork function runs and eventually puts something back into the database. It does not matter how long this function takes, but it should not block other events on the server.
Given that, is it possible to test whether Node.js code ends up blocking or not?
Generally, if it reaches out to another service, like a database or a webservice, then it is non-blocking and you'll need to have some sort of callback. However, any function will block until something (even if nothing) is returned...
If the doSomeHeavyWork function is non-blocking, then it's likely that whatever library you're using will allow for some sort of callback. So you could write the function to accept a callback like so:
var doSomeHeavyWork = function(user, callback) {
  // Whatever library call this is, it likely takes a callback that receives an error
  // (in case something bad happened) and possibly a "whatever", which is what you're looking to get.
  callTheNonBlockingStuff(function(error, whatever) {
    if (error) {
      console.log('There was an error!!!!');
      console.log(error);
      return callback(error, null); // call callback with the error, and stop here
    }
    callback(null, whatever); // call callback with the object you're hoping to get back
  });
  return; // This line will most likely run before the callback gets called, which makes it a non-blocking (asynchronous) function. That's why you need the callback.
};
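On the question of how to test whether code blocks: one rough check is to schedule a 0 ms timer, run the code, and see how late the timer fires. `measureBlocking` and `busyWait` are hypothetical helpers for illustration; for continuous monitoring, Node's `perf_hooks.monitorEventLoopDelay()` measures the same kind of lag.

```javascript
// Rough blocking detector: schedules a 0 ms timer, runs `fn`, and reports how
// late the timer fired. A delay close to fn's running time means fn held the
// event loop; a delay of a millisecond or two means it returned quickly.
function measureBlocking(fn, callback) {
  const scheduled = Date.now();
  setTimeout(() => callback(Date.now() - scheduled), 0);
  fn();
}

// Hypothetical CPU-bound work: spins for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // keep the thread busy
  }
}

measureBlocking(() => busyWait(200), (lag) => {
  console.log(`busy loop: timer delayed by ${lag} ms`); // roughly 200
  measureBlocking(() => setTimeout(() => {}, 200), (lag2) => {
    console.log(`timer-based work: timer delayed by ${lag2} ms`); // close to 0
  });
});
```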
You should avoid, in any part of your Node.js code, synchronous blocks which don't call system or I/O operations and whose computation takes a long time (in computer terms), e.g. iterating over big arrays. Instead, move this type of code to a separate worker, or divide it into smaller synchronous pieces scheduled with setImmediate() (process.nextTick() callbacks run before pending I/O, so pieces scheduled that way would still starve the event loop). You can find an explanation of process.nextTick() here, but read all the comments too.
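A minimal sketch of splitting a long synchronous loop into smaller pieces, assuming a hypothetical `sumInChunks` helper. It yields to the event loop with `setImmediate` between chunks, so other callbacks (timers, I/O, other requests) get a turn while the work is still in progress:

```javascript
// Sums `items` in chunks of `chunkSize`, yielding to the event loop between
// chunks with setImmediate so that other callbacks can run in between.
function sumInChunks(items, chunkSize, callback) {
  let index = 0;
  let total = 0;
  function processChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) total += items[index];
    if (index < items.length) setImmediate(processChunk); // yield, then continue
    else callback(null, total);
  }
  processChunk();
}

const items = Array.from({ length: 1000000 }, (_, i) => i);
let otherWorkRan = false;

sumInChunks(items, 10000, (err, total) => {
  console.log(total, "- other callback ran in between:", otherWorkRan);
  // 499999500000 - other callback ran in between: true
});
// Queued after the work started, yet it runs before the work finishes.
setImmediate(() => { otherWorkRan = true; });
```

The chunk size is a trade-off: smaller chunks keep the server more responsive, larger ones finish the job with less scheduling overhead.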

Resources