async.waterfall jumping to end before functions complete - node.js

Hope you guys are well!
I am diving into Node.js and having to relearn a lot of the ways I used to code, retraining myself to think asynchronously. I am writing a server-side script (like a Java POJO) that will run either from the command line or be triggered by an event.
I wanted the output (return value) of one function to be the input of the next, so I decided to use async.waterfall, as I read it executes the functions in order, passing the output of one as the input of the next.
The idea of the script is to walk through a given folder structure, create an array of sub-folders, and pass that array to the next function, which does the same for each path in the array. I wanted to use underscore.js and its _.each() function, as it seemed a good way to iterate through the array in sequence. But this is where I get stuck: execution seems to fall through all the functions to the end before the work is complete, so my logic is off somewhere.
I use a 'walk' function to go into a folder and return all its sub-folders. The idea is that the script runs and then calls 'process.exit()' at the end of the waterfall.
The code is:
async.waterfall([
    function(callback){ /* Get List of Artists from MusicFolder */
        console.log('first->IN');
        walk(musicFolder, function(err, foldersFound) {
            if (err) { return err; }
            _.each(foldersFound, function(folderPath){
                console.log('Folder: ' + folderPath);
            });
            console.log('first->OUT');
            callback(null, foldersFound);
        });
    },
    function(artistsFound, callback){ /* Get List of Albums from Artist Folders */
        var eachLoop=null;
        console.log('second->IN');
        _.each(artistsFound, function(artistPath){
            console.log('second->eachPath:Start:'+artistPath);
            walk(artistPath, function(err, albumsFound) {
                console.log('second->Walk:Found');
                console.log(albumsFound);
                if (err) { console.log(err); }
                _.each(albumsFound, function(albumPath){
                    eachLoop++;
                    console.log('second->Walk:each:'+eachLoop);
                });
                console.log('second->Walk:End');
            });
            console.log('second->eachPath:End:'+artistPath);
        });
        console.log('second->OUT');
        callback(null, albumsFound);
    },
    function(paths, callback){
        console.log('third->IN');
        console.log('third->OUT');
        callback(null, paths);
    }
], function (err, result) {
    console.log('last->IN');
    console.log(result);
    console.log('last->OUT');
    // process.exit();
});
I have commented out the 'process.exit()' in the example above.
If I uncomment the 'process.exit()' I get the following output:
first->IN
Folder: /music/Adele
Folder: /music/Alex Clare
first->OUT
second->IN
second->eachPath:Start:/music/Adele
second->eachPath:End:/music/Adele
second->eachPath:Start:/music/Alex Clare
second->eachPath:End:/music/Alex Clare
second->OUT
third->IN
third->OUT
last->IN
null
last->OUT
What I can see is that it never gets into the 'walk' callback in the second waterfall function, but skips past the 'walk' work altogether, even though the 'walk' call is inside the _.each() iteration.
If I comment out the 'process.exit()' command in the last function I get the following:
first->IN
Folder: /music/Adele
Folder: /music/Alex Clare
first->OUT
second->IN
second->eachPath:Start:/music/Adele
second->eachPath:End:/music/Adele
second->eachPath:Start:/music/Alex Clare
second->eachPath:End:/music/Alex Clare
second->OUT
third->IN
third->OUT
last->IN
null
last->OUT
second->Walk:Found
[ '/music/Alex Clare/The Lateness of the Hour' ]
second->Walk:each:1
second->Walk:End
second->Walk:Found
[ '/music/Adele/19',
'/music/Adele/21',
'/music/Adele/Live At The Royal Albert Hall' ]
second->Walk:each:2
second->Walk:each:3
second->Walk:each:4
second->Walk:End
I'll admit this is frustrating. Any help would be greatly appreciated, as I have been rewriting this over and over for the past week in various 'async' forms, and they all jump out of the functions too early, so everything runs out of order.
Thanks for your help or thoughts in advance :)
Mark

It seems that your walk function is asynchronous, so _.each fires off all the walk calls and returns immediately, before any of their callbacks have run. What you want is to fire parallel asynchronous jobs, combine their results, and only then move down the waterfall. You can do that by combining async.waterfall with async.parallel. Your second function may look like this:
function(artistsFound, callback) {
    // some code
    var jobs = [];
    artistsFound.forEach(function(artistPath) {
        jobs.push(function(clb) {
            walk(artistPath, function(err, albumsFound) {
                // some code
                clb(err, albumsFound);
            });
        });
    });
    // some code
    async.parallel(jobs, callback);
}
Side note: you don't need underscore.js just to loop over an array; modern JavaScript has a built-in .forEach method.
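One thing to keep in mind with the async.parallel approach above: the final callback receives one result per job, so with one albumsFound array per artist you end up with an array of arrays. A minimal plain-JavaScript sketch of the combining step (the paths here are illustrative only, borrowed from the question's output):

```javascript
// async.parallel hands its final callback one result per job, so with
// one albumsFound array per artist the combined result is an array of
// arrays. A plain concat flattens it into a single list of album paths:
var perArtist = [
    ['/music/Adele/19', '/music/Adele/21'],
    ['/music/Alex Clare/The Lateness of the Hour']
];
var allAlbums = [].concat.apply([], perArtist);
console.log(allAlbums);
// [ '/music/Adele/19',
//   '/music/Adele/21',
//   '/music/Alex Clare/The Lateness of the Hour' ]
```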

Related

values getting overriden in callbacks inside callback in setInterval method node js

I have a situation where I have an array of employee data and need to process something in parallel for every employee. To achieve this I broke the work into four methods, each with a callback calling the next and returning its callback. I am using
async.eachSeries
to start the process for each element of the employee array.
In the last method I set a setInterval to repeat the task if the required response has not been achieved; the interval is not cleared until the repeating task has run 5 times (it is cleared after the 5th attempt if the desired value has still not been received).
Now, the problem is that the data I am processing inside setInterval is getting overridden by the values of the last employee.
So I am not able to keep track of the processing for all the employee array elements, and the details of processing for the last employee are getting mixed up.
The four methods I use to perform the task save data to Redis and MongoDB and call outside APIs that return their responses via callback.
Can anyone suggest a better way of doing this? I also feel the problem is happening because I am not returning any callback from the setInterval method. But since that method is itself asynchronous, I am unsure how to handle the situation.
EmployeeArray
async.eachSeries() used to process EmployeeArray
For each element I have four callback methods:
async.eachSeries() {
    Callback1() {
        Callback2() {
            Callback3() {
                Callback4() {
                    setInterval(no callback inside this from my side)
                }
            }
        }
    }
}
As far as I know, the async each function does parallel processing. You can also use async waterfall to make your code cleaner. Try something like this:
async.each(openFiles, function(file, callback1) {
    async.waterfall([
        function(callback) {
            callback(null, 'one', 'two');
        },
        function(arg1, arg2, callback) {
            // arg1 now equals 'one' and arg2 now equals 'two'
            callback(null, 'three');
        },
        function(arg1, callback) {
            // arg1 now equals 'three'
            callback(null, 'done');
        }
    ], function (err, result) {
        callback1(err);
    });
}, function(err) {
    // if you get here without an error, your data is processed
});

How to properly get result array in async eachSeries for Node?

I'm trying to use async's eachSeries to compute the report count for every category. Categories and reports are stored in separate collections, so I first fetch the available categories and then perform a count query for each.
This is my code:
Category.find({}, {_id: 0, name: 1}, function (err, foundCategories) {
    async.eachSeries(foundCategories,
        function (item, callback) {
            Report.count({category: item.name}, function (err, count) {
                var name = item.name;
                console.log(count);
                return callback(null, {name: count});
            });
        },
        function (err, results) {
            if (err)
                response.send(err);
            response.send(JSON.stringify(results));
        });
});
The problem is that I'm receiving nothing, the console.log outputs actual numbers there, what am I doing wrong?
The API of eachSeries does not provide any results to the final callback - only an error in the failure case. In the success case, it's just a pure control flow "eachSeries is done" indicator, but does not provide a mechanism for passing values from the worker function. mapSeries does provide the functionality you need.
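To make the difference concrete, here is a minimal hand-rolled version of the mapSeries pattern (a sketch of its contract, not the async library's actual implementation): each worker's result is collected in order and the whole array is handed to the final callback, which is exactly what eachSeries never does.

```javascript
// A minimal sketch of the mapSeries contract: run an async worker on
// each item in sequence, collect the results in order, and pass the
// whole array to the final callback.
function mapSeries(items, worker, done) {
    var results = [];
    (function next(index) {
        if (index === items.length) return done(null, results);
        worker(items[index], function (err, result) {
            if (err) return done(err);  // stop at the first error
            results.push(result);
            next(index + 1);            // only recurse after this worker finishes
        });
    })(0);
}

// Hypothetical usage with the question's collections:
// mapSeries(foundCategories, function (item, callback) {
//     Report.count({category: item.name}, function (err, count) {
//         callback(err, {name: item.name, count: count});
//     });
// }, function (err, results) {
//     if (err) return response.send(err);
//     response.send(JSON.stringify(results));
// });
```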
Similar to Peter's answer: async.waterfall provides waterfall execution of your functions, passing each return value on to the next async function in the chain.

node.js for loop execution in a synchronous manner

I have to implement a program in Node.js that looks like the following code snippet. It has an array through which I have to traverse, matching the values against database table entries. I need to wait till the loop ends and then send the result back to the calling function:
var arr = [];
arr = [one, two, three, four, five];
for (var j = 0; j < arr.length; j++) {
    var str = "/^" + arr[j] + "/";
    // consider collection to be a variable pointing to a database table
    collection.find({value: str}).toArray(function getResult(err, result) {
        // do something in case a match is found in the database...
    });
}
However, because collection.find() executes asynchronously, the loop finishes (and str="/^"+arr[j]+"/", which is actually a regex passed to MongoDB's find function for partial matching, takes its final value) before any of the find callbacks run, so I am unable to traverse the array and get the required output.
I am also having a hard time sending the result back to the calling function, as I have no idea when the loop's asynchronous work will finish.
Try using async's each. This will let you iterate over an array and execute asynchronous functions. Async is a great library that has solutions and helpers for many common asynchronous patterns and problems.
https://github.com/caolan/async#each
Something like this:
var arr = [];
arr = [one, two, three, four, five];
async.each(arr, function (item, callback) {
    var str = "/^" + item + "/";
    // consider collection to be a variable pointing to a database table
    collection.find({value: str}).toArray(function getResult(err, result) {
        if (err) { return callback(err); }
        // do something in case a match is found in the database...
        // whatever logic you want to run on result should go here, then execute callback
        // to indicate that this iteration is complete
        callback(null);
    });
}, function (error) {
    // At this point, the each loop is done and you can continue processing here
    // Be sure to check for errors!
});

Asynchronous Database Queries with PostgreSQL in Node not working

Using Node.js and the node-postgres module to communicate with a database, I'm attempting to write a function that accepts an array of queries and callbacks and executes them all asynchronously using the same database connection. The function accepts a two-dimensional array and calling it looks like this:
perform_queries_async([
    ['SELECT COUNT(id) as count FROM ideas', function(result) {
        console.log("FUNCTION 1");
    }],
    ["INSERT INTO ideas (name) VALUES ('test')", function(result) {
        console.log("FUNCTION 2");
    }]
]);
And the function iterates over the array, creating a query for each sub-array, like so:
function perform_queries_async(queries) {
    var client = new pg.Client(process.env.DATABASE_URL);
    for (var i = 0; i < queries.length; i++) {
        var q = queries[i];
        client.query(q[0], function(err, result) {
            if (err) {
                console.log(err);
            } else {
                q[1](result);
            }
        });
    }
    client.on('drain', function() {
        console.log("drained");
        client.end();
    });
    client.connect();
}
When I ran the above code, I expected to see output like this:
FUNCTION 1
FUNCTION 2
drained
However, the output bizarrely appears like so:
FUNCTION 2
drained
FUNCTION 2
Not only is the second function getting called for both requests, it also seems as though the drain code is getting called before the client's queue of queries is finished running...yet the second query still runs perfectly fine even though the client.end() code ostensibly killed the client once the event is called.
I've been tearing my hair out about this for hours. I tried hardcoding in my sample array (thus removing the for loop), and my code worked as expected, which leads me to believe that there is some problem with my loop that I'm not seeing.
Any ideas on why this might be happening would be greatly appreciated.
The simplest way to properly capture the value of the q variable in a closure in modern JavaScript is to use forEach:
queries.forEach(function(q) {
    client.query(q[0], function(err, result) {
        if (err) {
            console.log(err);
        } else {
            q[1](result);
        }
    });
});
If you don't capture the value, your code sees the last value that q had, because the callback function executes later, in the context of the containing function.
forEach, by using a callback function, isolates and captures the value of q so it can be properly evaluated by the inner callback.
A victim of the famous Javascript closure/loop gotcha. See my (and other) answers here:
I am trying to open 10 websocket connections with nodejs, but somehow my loop doesnt work
Basically, at the time your callback is executed, q is set to the last element of the input array. The way around it is to dynamically generate the closure.
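The gotcha can be reproduced, and fixed, in a few lines of plain JavaScript. This is just an illustration of the capture problem, not the asker's code:

```javascript
// The classic var-in-loop closure gotcha: every closure created in the
// loop shares the same variable, so all of them see its final value.
var shared = [];
var funcs = [];
for (var i = 0; i < 3; i++) {
    funcs.push(function () { shared.push(i); });
}
funcs.forEach(function (f) { f(); });
console.log(shared); // [ 3, 3, 3 ]

// Dynamically generating the closure with an IIFE captures the value
// of the loop variable at each iteration:
var fixed = [];
var funcs2 = [];
for (var j = 0; j < 3; j++) {
    (function (captured) {
        funcs2.push(function () { fixed.push(captured); });
    })(j);
}
funcs2.forEach(function (f) { f(); });
console.log(fixed); // [ 0, 1, 2 ]
```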
It would also be good to implement this using the async module. It will help you reuse the code and make it more readable. I just love the auto function provided by the async module.
Ref: https://github.com/caolan/async

How to know when finished

I'm pretty new to node.js, so I'm wondering how to know when all elements are processed in, let's say:
["one", "two", "three"].forEach(function(item) {
    processItem(item, function(result) {
        console.log(result);
    });
});
...now if I want to do something that can only be done when all items are processed, how would I do that?
You can use the async module. Simple example:
async.map(['one', 'two', 'three'], processItem, function(err, results) {
    // results[0] -> processItem('one');
    // results[1] -> processItem('two');
    // results[2] -> processItem('three');
});
The callback function of async.map will be called when all items are processed. However, you should be careful with processItem; it should look something like this:
function processItem(item, callback) {
    // database call or something:
    db.call(myquery, function() {
        callback(); // Call when the async event is complete!
    });
}
forEach itself is blocking (synchronous), see this post:
JavaScript, Node.js: is Array.forEach asynchronous?
so to call a function when all items are done processing, it can be done inline:
["one", "two", "three"].forEach(function(item) {
    processItem(item, function(result) {
        console.log(result);
    });
});
console.log('finished');
If there is a high IO-bound load for each item to be processed, then take a look at the module Mustafa recommends. There is also a pattern referenced in the post linked above.
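One common form of that pattern is a simple completion counter. This is a hedged sketch, not library code: count down as each asynchronous job finishes and fire a done callback only when the count reaches zero.

```javascript
// A minimal "completion counter": run processItem on every element and
// invoke done(results) only once all of the callbacks have come back.
function processAll(items, processItem, done) {
    var remaining = items.length;
    var results = [];
    if (remaining === 0) return done(results);
    items.forEach(function (item, index) {
        processItem(item, function (result) {
            results[index] = result;  // keep results in input order
            remaining -= 1;
            if (remaining === 0) done(results);
        });
    });
}
```

Because results are stored by index, the output order matches the input order even if the individual callbacks complete out of order.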
Although the other answers are correct, since Node.js supports ES6 now, in my opinion using the built-in Promise library is more stable and tidy.
You don't even need to require anything; ECMA took the Promises/A+ specification and implemented it natively in JavaScript.
Promise.all(["one", "two", "three"].map(processItem))
    .then(function (results) {
        // here we get the results in the same order as the array
    })
    .catch(function (err) {
        // do something with the error if your function throws
    });
As JavaScript is a fairly problematic language to debug (dynamic typing, asynchronous flow), sticking with promises instead of callbacks will save you time in the end.
