Stopping the parallel execution - node.js

I have four functions which are running parallely. If any one function fails in between, how can I stop the execution of other functions. Any help on this will be really helpful.

Except if it's a setTimeout or setInterval, I think you can't. You can, anyway, set checkpoints in the logic of the functions. It's not clean a all, but it can work. For example, checking the other function in the callback:
var control = true;
async1(function(e,r){
if(e) {
control = false;
return callback1(e,r);
};
if(control) callback1(e,r);
});
async2(function(e,r){
if(e) {
control = false;
return callback2(e,r);
};
if(control) callback2(e,r);
});
Althougt to do this, I would go with throrin19 and say that Async it's a nice lib to do this.
But maybe you would like to check co. It could handle your problem better than Async.

You can't stop execution of a function. Functions are executed and return synchronously, so there is technically nothing to stop. But there can be asynchronous tasks, which use underlying libuv capabilites somehow (that is, you can't do anything asynchronous without calling some asynchronous node api or some native module). Functions are only interfaces for such tasks and they don't support canceling tasks, only starting.
So, you can't really cancel async operations, but what you can do is ignore the results of other operations if one fails. Here is how it can be implemented:
var tasks = [], //array of tasks, functions accepting callback(err)
pending = tasks.length, //amount of pending tasks
failed = false;
function done(err) { //callback for each task
if (failed) return;
if (err) {
failed = true;
callback(err); //callback for all tasks
}
if (!--pending) callback(); //all tasks completed
}
tasks.forEach(function(task) {
task(done);
});

I think you should use the Async library. Async contains parallel function that executes functions in parallel (of course). And each function takes a callback (error, result).
If an error occurs, Async will cut all other processes.
Documentation

Related

nodejs concurrency synchronous execution

I've two WebSockets getting data asynchronously, every time I get some message from the sockets I execute some code in CompareData.
The problem is that CompareData should be executed synchronously, or (better) only if it is not already running
This is my code:
function CompareData(data) {
console.log('data ', data);
AsyncFunction();
};
ws1 = new WebSocket(WS1_URL);
ws2 = new WebSocket(WS2_URL);
ws1.on('message', (data) => {
CompareData(data);
});
ws2.on('message', (data) => {
CompareData(data);
});
Can you help me, please? I'm very new to NodeJs
Node.js is single threaded. So you don't really get true concurrency issues occurring in Node programs as you might in other languages. In your example, there can only be at most one WebSocket callback for CompareData occurring at any given time.
You should not make synchronous call in node.js but you can make those call sequential. See below example might be helpful.
var messages = [];
var inProgress = false;
function CompareData(data) {
return new Promise((resolve, reject) => {
// do some stuff and resolve
setTimeout(() => {
resolve(data);
}, 1000);
});
};
const start = async () => {
if (!inProgress) {
if (messages.length !== 0) {
inProgress = true;
try {
const data = await CompareData(messages.shift());
console.log(data);
} catch (error) {
console.log(error);
}
inProgress = false;
await start();
}else{
console.log('Process Done');
}
}
}
const handler = (data) => {
messages.push(data);
start();
}
handler(1);
handler(2);
handler(3);
handler(4);
// ws1 = new WebSocket(WS1_URL);
// ws2 = new WebSocket(WS2_URL);
// ws1.on('message', handler);
// ws2.on('message', handler);
You should use some mutex in order to avoid that two async operations of compareData are executed at the same time, like node-mutex or mutexify.
My suggestions are:
First of all, you need to know when CompareData is finished. Reorganize your code to use promises or callbacks. If you're using third-party async functions, I'm almost sure they provide some feedback on completion - This is a must have in async world
Add inProgress = false flag somewhere to serve for you as simple lock. As someone posted, JS is single-threaded and you're guaranteed that your code won't get interrupted in the middle of operation. Thanks to that you can use really simple locks instead of complicated os-based mutexes known from multithreaded
langs.
In ws.on(...) check if inProgress is set. If not, lock it and run CompareData
In CompareData completion callback or on promise resolution set inProgress back to false, so you're no longer ignoring incoming data.
If you can simply discard the data, there is no need to complicate this scenario with extra queues, mutexes, etc.
If you need to serve it all, then queue incoming data and serve next piece after completion callback is fired.
This is basically what Rahul's suggests, but he uses features that are not established in current version of standard, so don't use it if you're not transpiling your code.

How to process a big array applying a async function for each element in nodejs?

I am working with zombie.js to scrape one site, I must to use the callback style to connect to each url. The point is that I have got an urls array and I need to process each urls using an async function. This is my first approach:
Array urls = {http..., http...};
function process_url(index)
{
if(index == urls.length)
return;
async_function(url,
function() {
...
//parse the url
...
// Process the next url
process_url(index++);
}
);
}
process_url(0)
Without use someone third party nodejs library to use the asyn funtion as sync function or to wait for the function (wait.for, synchornized, mocha), this is the way that I though to resolve this problem, I don't know what would happen if the array is too big. Is the function released from the memory when the next function is called? or all the functions are in memory until the end?
Any ideas?
Your scheme will work. I call it "manually sequencing async operations".
A general purpose version of what you're doing would look like this:
function processItem(data, callback) {
// do your async function here
// for example, let's suppose it was an http request using the request module
request(data, callback);
}
function processArray(array, fn) {
var index = 0;
function next() {
if (index < array.length) {
fn(array[index++], function(err, result) {
// process error here
if (err) return;
// process result here
next();
});
}
}
next();
}
processArray(arr, processItem);
As to your specific questions:
I don't know what would happen if the array is too big. Is the
function released from the memory when the next function is called? or
all the functions are in memory until the end?
Memory in Javascript is released when it is not longer referenced by any running code and when the garbage collector gets time to run. Since you are running a series of asynchronous operations here, it is likely that the garbage collector gets a chance to run regularly while waiting for the http response from the async operation so memory could get cleaned up then. Functions are just another type of object in Javascript and they get garbage collected just like anything else. When they are no longer reference by running code, they are eligible for garbage collection.
In your specific code, because you are re-calling process_url() only in an async callback, there is no stack build-up (as in normal recursion). The prior instance of process_url() has already completed BEFORE the async callback is called and BEFORE you call the next iteration of process_url().
In general, management and coordination of multiple async operations is much, much easier using promises which are built into the current versions of node.js and are part of the ES6 ECMAScript standard. No external libraries are required to use promises in current versions of node.js.
For a list of a number of different techniques for sequencing your asynchronous operations on your array, both using promises and not using promises, see:
How to synchronize a sequence of promises?.
The first step in using promises is the "promisify" your async function so that it returns a promise instead of takes a callback.
function async_function_promise(url) {
return new Promise(function(resolve, reject) {
async_function(url, function(err, result) {
if (err) {
reject(err);
} else {
resolve(result);
}
});
});
}
Now, you have a version of your function that returns promises.
If you want your async operations to proceed one at a time so the next one doesn't start until the previous one has completed, then a usual design pattern for that is to use .reduce() like this:
function process_urls(array) {
return array.reduce(function(p, url) {
return p.then(function(priorResult) {
return async_function_promise(url);
});
}, Promise.resolve());
}
Then, you can call it like this:
var myArray = ["url1", "url2", ...];
process_urls(myArray).then(function(finalResult) {
// all of them are done here
}, function(err) {
// error here
});
There are also Promise libraries that have some helpful features that make this type of coding simpler. I, myself, use the Bluebird promise library. Here's how your code would look using Bluebird:
var Promise = require('bluebird');
var async_function_promise = Promise.promisify(async_function);
function process_urls(array) {
return Promise.map(array, async_function_promise, {concurrency: 1});
}
process_urls(myArray).then(function(allResults) {
// all of them are done here and allResults is an array of the results
}, function(err) {
// error here
});
Note, you can change the concurrency value to whatever you want here. For example, you would probably get faster end-to-end performance if you increased it to something between 2 and 5 (depends upon the server implementation on how this is best optimized).

Make chained functions wait for each other to execute

How do I make a chained function wait for the function before it, to execute properly?
I have this excerpt from my module:
var ParentFunction = function(){
this.userAgent = "SomeAgent";
return this;
}
ParentFunction.prototype.login = function(){
var _this = this;
request.post(
url, {
"headers": {
"User-Agent": _this.userAgent
}
}, function(err, response, body){
return _this;
})
}
ParentFunction.prototype.user = function(username){
this.username = username;
return this;
}
ParentFunction.prototype.exec = function(callback){
request.post(
anotherURL, {
"headers": {
"User-Agent": _this.userAgent
}
}, function(err, response, body){
callback(body);
})
}
module.exports = parentFunction;
And this is from within my server:
var pF = require("./myModule.js"),
parentFunction = new pF();
parentFunction.login().user("Mobilpadde").exec(function(data){
res.json(data);
});
The problem is, that the user-function won't wait for login to finish (Meaning, it executes before the login returns _this). So how do I make it wait?
You can't make Javascript "wait" before executing the next call in the chain. The whole chain will execute immediately in sequential order without waiting for any async operations to complete.
The only way I can think of to make this structure work is to create a queue of things waiting to execute and then somehow monitor the things in that queue that are async so you know when to execute the next thing in the queue. This requires making each method follow some sort of standard convention for knowing both whether the method is async and if it is async when the async operation is done and what to do if there's an error in the chain.
jQuery does something like this for jQuery animations (it implements a queue for all chained animations and calls each animation in turn when the previous animation is done). But, its implementation is simpler than what you have here because it works only with jQuery animations (or manually queued functions that follow the proper convention), not with other jQuery methods. You are proposing a mix of three different kinds of methods, two of which are not even async.
So, to make a long story shorter, what you are asking for could likely be done if you make all methods follow a set of conventions, but it's not easy. Since you appear to only have one async operation here, I'm wondering if you could do something like this instead. You don't show what the .exec() method does, but if all it does is call some other function at the end of the chain, then you'ld only have one async method in the chain so you could just let it take a callback and do this:
parentFunction.user("Mobilepadde").login(function(data) {
res.json(data);
});
I was working on a queued means of doing this, but it is not something I can write and test in less than an hour and you'd have to offer some ideas for what you want to do with errors that occur anywhere in the chain. Non-async errors could just throw an exception, but async errors or even non-async errors that occur after an async operation completes can't just throw because there's no good way to catch them. So, error handling becomes complex and you really shouldn't embark on a design path that doesn't anticipate appropriate error handling.
The more I think about this, the more I think you either want to get a library designed to handle the sequencing of async operations (such as the async module) and use that for your queueing of operations or you want to just give up on the chaining and use promises to support sequencing of your operations. As I was thinking about how to do error handling in the task queue with async operations, it occurred to me that all these problems have already been dealt with in promises (propagation of errors through reject and catching of exceptions in async handlers that are turned into promise rejections, etc...). Doing error handling right with async operations is difficult and is one huge reason to build off of promises for sequencing async operations.
So, now thinking about solving this using promises, what you could do is to make each async method return a promise (you can promisfy the entire request module with one call with many promise libraries such as Bluebird). Then, one .login() and .exec() return promises, you can do this:
var Promise = require('bluebird');
var request = Promise.promisifyAll(require('request'));
ParentFunction.prototype.login = function(){
return request.postAsync(
url, {
"headers": {
"User-Agent": this.userAgent
}
});
}
ParentFunction.prototype.exec = function(){
return request.postAsync(
anotherURL, {
"headers": {
"User-Agent": this.userAgent
}
}).spread(function(response, body) {
return body;
})
}
parentFunction.login().then(function() {
parentFunction.user("Mobilpadde");
return parentFunction.exec().then(function(data) {
res.json(data);
});
}).catch(function(err) {
// handle errors here
});
This isn't chaining, but it gets you going in minutes rather than something that probably takes quite awhile to write (with robust error handling).
Try this and see if it works:
ParentFunction.prototype.login = function(callback){
var _this = this;
request.post(
url, {
"headers": {
"User-Agent": _this.userAgent
}
}, function(err, response, body){
return callback(_this);
})
}
}
On the server side:
parentFunction.login(function(loggedin){
loggedin.user("Mobilpadde").exec(function(data){
res.json(data);
});
});

NodeJS callback sequence

Folks,
I have the following function, and am wondering whats the correct way to call the callback() only when the database operation completes on all items:
function mapSomething (callback) {
_.each(someArray, function (item) {
dao.dosomething(item.foo, function (err, account) {
item.email = account.email;
});
});
callback();
},
What I need is to iterate over someArray and do a database call for each element. After replacing all items in the array, I need to only then call the callback. Ofcourse the callback is in the incorrect place right now
Thanks!
The way you currently have it, callback is executed before any of the (async) tasks finish.
The async module has an each() that allows for a final callback:
var async = require('async');
// ...
function mapSomething (callback) {
async.each(someArray, function(item, cb) {
dao.dosomething(item.foo, function(err, account) {
if (err)
return cb(err);
item.email = account.email;
cb();
});
}, callback);
}
This will not wait for all your database calls to be done before calling callback(). It will launch all the database calls at once in parallel (I'm assuming that's what dao.dosomething() is). And, then immediately call callback() before any of the database calls have finished.
You have several choices to solve the problem.
You can use promises (by promisifying the database call) and then use Promise.all() to wait for all the database calls to be done.
You can use the async library to manage the coordination for you.
You can keep track of when each one is done yourself and when the last one is done, call your callback.
I would recommend options 1. or 2. Personally, I prefer to use promises and since you're interfacing with a database, this is probably not the only place you're making database calls, so I'd promisify the interface (bluebird will do that for you in one function call) and then use promises.
Here's what a promise solution could look like:
var Promise = require('bluebird');
// make promise version of your DB function
// ideally, you'd promisify the whole DB API with .promisifyAll()
var dosomething = Promise.promisify(dao.dosomething, dao);
function mapSomething (callback, errCallback) {
Promise.all(_.map(someArray, function(item) {
return dosomething(item.foo).then(function (account) {
item.email = account.email;
});
}).then(callback, errCallback);
}
This assumes you want to run all the DB calls in parallel and then call the callback when they are all done.
FYI, here's a link to how Bluebird promisify's existing APIs. I use this mechanism for all async file I/O in node and it saves a ton of coding time and makes error handling much more sane. Async callbacks are a nightmare for proper error handling, especially if exceptions can be thrown from async callbacks.
P.S. You may actually want your mapSomething() function to just return a promise itself so the caller is then responsible for specifying their own .then() handler and it allows the caller to use the returned promise for their own synchronization with other things (e.g. it's just more flexible that way).
function mapSomething() {
return Promise.all(_.map(someArray, function(item) {
return dosomething(item.foo).then(function (account) {
item.email = account.email;
});
})
}
mapSomething.then(mapSucessHandler, mapErrorHandler);
I haven't tried Bluebird's .map() myself, but once you've promisified the database call, I think it would simplify it a bit more like this:
function mapSomething() {
return Promise.map(someArray, function(item) {
return dosomething(item.foo).then(function (account) {
item.email = account.email;
});
})
}
mapSomething.then(mapSucessHandler, mapErrorHandler);

Is the following node.js code blocking or non-blocking?

I have the node.js code running on a server and would like to know if it is blocking or not. It is kind of similar to this:
function addUserIfNoneExists(name, callback) {
userAccounts.findOne({name:name}, function(err, obj) {
if (obj) {
callback('user exists');
} else {
// Add the user 'name' to DB and run the callback when done.
// This is non-blocking to here.
user = addUser(name, callback)
// Do something heavy, doesn't matter when this completes.
// Is this part blocking?
doSomeHeavyWork(user);
}
});
};
Once addUser completes the doSomeHeavyWork function is run and eventually places something back into the database. It does not matter how long this function takes, but it should not block other events on the server.
With that, is it possible to test if node.js code ends up blocking or not?
Generally, if it reaches out to another service, like a database or a webservice, then it is non-blocking and you'll need to have some sort of callback. However, any function will block until something (even if nothing) is returned...
If the doSomeHeavyWork function is non-blocking, then it's likely that whatever library you're using will allow for some sort of callback. So you could write the function to accept a callback like so:
var doSomHeavyWork = function(user, callback) {
callTheNonBlockingStuff(function(error, whatever) { // Whatever that is it likely takes a callback which returns an error (in case something bad happened) and possible a "whatever" which is what you're looking to get or something.
if (error) {
console.log('There was an error!!!!');
console.log(error);
callback(error, null); //Call callback with error
}
callback(null, whatever); //Call callback with object you're hoping to get back.
});
return; //This line will most likely run before the callback gets called which makes it a non-blocking (asynchronous) function. Which is why you need the callback.
};
You should avoid in any part of your Node.js code synchronous blocks which don't call system or I/O operations and which computation takes long time (in computer meaning), e.g iterating over big arrays. Instead move this type of code to the separate worker or divide it to smaller synchronous pieces using process.nextTick(). You can find explanation for process.nextTick() here but read all comments too.

Resources