How to control serial and parallel control flow with mapped functions? - node.js

I've drawn a simple flow chart for a task that crawls some data from the internet and loads it into a database. Until now I thought I was comfortable with promises, but I have been stuck on this problem for at least three days without making any progress.
Here is the flow chart:
Consider a static string array like this: const courseCodes = ["ATA", "AKM", "BLG",... ].
I have a fetch function that makes an HTTP request, parses the response, and returns an array of objects.
fetch works correctly when it invokes its callback with the expected object array; it also worked with promises, which was much tidier.
fetch should be invoked once for every element of the courseCodes array, with that element as its parameter. These calls should run in parallel, since the separate fetch calls do not affect each other.
As a result, the callback (or the promise's resolve parameter) should receive a results array, which is an array of arrays of objects. I should then invoke loadCourse with the objects in the results array as its parameter. These calls should run serially, because loadCourse queries the database to check whether a similar object exists and adds it if it does not.
How can I perform this kind of task in node.js? I could not maintain the asynchronous flow in a scenario like this. I've failed with the caolan/async library and with the bluebird and q promise libraries.

Try something like this, if you are able to understand this:
const courseCodes = ["ATA", "AKM", "BLG" /* ... */];
// stores the tasks to be performed.
var parallelTasks = [];
var serialTasks = [];
// keeps track of courses fetched & results.
var courseFetchCount = 0;
var results = {};
// your fetch function; call done() once the asynchronous work has finished.
function fetch(course_code, done) {
  // your code to fetch & parse.
  // store result for each course in results object
  results[course_code] = 'whatever result comes from your fetch & parse code...';
  done();
}
// your load function.
function loadCourse(results) {
  for (var index in results) {
    var result = results[index]; // result for single course
    var task = (
      function (result) {
        return function () {
          // nextInSerial starts the next save once this one has finished.
          saveToDB(result, nextInSerial);
        };
      }
    )(result);
    serialTasks.push(task);
  }
  // execute serial tasks for saving results to database or whatever.
  nextInSerial(null);
}
// pseudo function to save a result to database.
function saveToDB(result, callback) {
  // your code to store in db here; call callback(err) when the write completes.
  callback(null);
}
// checks if fetch() is complete for all course codes in your array
// and then starts the serial tasks for saving results to database.
function CheckIfAllCoursesFetched() {
  courseFetchCount++;
  if (courseFetchCount == courseCodes.length) {
    // now process courses serially
    loadCourse(results);
  }
}
// helper function that executes tasks in serial fashion.
function nextInSerial(err) {
  if (err) throw err;
  var nextSerialTask = serialTasks.shift();
  // stop quietly once every task has run
  if (nextSerialTask) nextSerialTask();
}
// build the parallel tasks for fetching.
for (var index in courseCodes) {
  var course_code = courseCodes[index];
  var task = (
    function (course_code) {
      return function () {
        // count this course as fetched only when fetch reports completion
        fetch(course_code, CheckIfAllCoursesFetched);
      };
    }
  )(course_code);
  parallelTasks.push(task);
}
// start executing parallel tasks, after all of them have been queued.
for (var task_index in parallelTasks) {
  parallelTasks[task_index]();
}
Or you may refer to nimble npm module.
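With native promises the same flow can be written more compactly: Promise.all for the parallel fetches, followed by a promise chain for the serial loads. The following is only a sketch under assumptions: fetchCourse and loadCourse are hypothetical stand-ins that simulate the HTTP/parse step and the database step with timers, and the three course codes are sample data.

```javascript
// Hypothetical stand-ins: replace with your real fetch/parse and DB code.
const courseCodes = ["ATA", "AKM", "BLG"];
const loaded = []; // records the order in which courses are "saved"

function fetchCourse(code) {
  // Simulates an HTTP request + parse that resolves with an array of objects.
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve([{ code: code, title: code + " 101" }]);
    }, Math.random() * 20);
  });
}

function loadCourse(courses) {
  // Simulates "insert if not already present"; resolves when the write is done.
  return new Promise(function (resolve) {
    setTimeout(function () {
      courses.forEach(function (c) { loaded.push(c.code); });
      resolve();
    }, 5);
  });
}

// 1. Parallel: start every fetch at once and wait for all of them.
// 2. Serial: chain the loads so each starts only after the previous one ends.
const finished = Promise.all(courseCodes.map(fetchCourse))
  .then(function (results) {
    return results.reduce(function (chain, courses) {
      return chain.then(function () { return loadCourse(courses); });
    }, Promise.resolve());
  })
  .then(function () {
    console.log("loaded in order:", loaded.join(", "));
    return loaded;
  });
```

Promise.all keeps the results in the order of courseCodes regardless of which fetch finishes first, so the serial loads run in array order.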

Related

Render page in express after for loop completes

I have to run mongoose query repeatedly within a for loop and once that completes, I want to render a page in express. Sample code given below. Since mongoose runs asynchronously, how can I make the 'commands/new' page to render only once the 'commands' array has been populated within the for loop?
...
...
var commands = [];
for (var index=0; index<ids.length; index++) {
mongoose.model('Command').find({_id : ids[index]}, function (err, command){
// do some biz logic with the 'command' object
// and add it to the 'commands' array
commands[index] = command;
});
}
res.render('commands/new', {
commands : commands
});
...
...
Your basic for loop does not wait for the callbacks of the asynchronous methods to complete before moving on to the next iteration, so res.render runs before any command has been added. Use something that does wait instead. The node async library fits the bill here and offers better methods for array iteration:
var commands = [];
async.each(ids,function(id,callback) {
mongoose.model("Command").findById(id,function(err,command) {
if (command)
commands.push(command);
callback(err);
});
},function(err) {
// code to run on completion or err
})
So async.each, or a variant such as async.eachLimit that runs only a set number of tasks in parallel, is the better way to control loop iteration here.
NB The .findById() method of Mongoose also helps shorten coding here.
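To make the idea of limited parallelism concrete, here is a rough plain-JavaScript sketch of what a helper like async.eachLimit does. It is not the library's implementation: eachLimitSketch and fakeFindById are hypothetical names, and the fake lookup uses a timer instead of Mongoose.

```javascript
// A rough sketch of limited-concurrency iteration; NOT the async library's code.
function eachLimitSketch(items, limit, iteratee, done) {
  var next = 0;       // index of the next item to start
  var running = 0;    // tasks currently in flight
  var finished = false;

  function launch() {
    if (finished) return;
    if (next >= items.length && running === 0) {
      finished = true;
      return done(null);
    }
    while (running < limit && next < items.length) {
      running++;
      iteratee(items[next++], function (err) {
        running--;
        if (finished) return;
        if (err) { finished = true; return done(err); }
        launch();
      });
    }
  }
  launch();
}

// Hypothetical stand-in for mongoose.model("Command").findById.
var inFlight = 0, maxInFlight = 0;
function fakeFindById(id, callback) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  setTimeout(function () {
    inFlight--;
    callback(null, { _id: id });
  }, 10);
}

var commands = [];
var ids = [1, 2, 3, 4, 5];
var allDone = new Promise(function (resolve, reject) {
  eachLimitSketch(ids, 2, function (id, callback) {
    fakeFindById(id, function (err, command) {
      if (command) commands.push(command);
      callback(err);
    });
  }, function (err) {
    if (err) return reject(err);
    // This is where res.render('commands/new', { commands: commands }) would go.
    resolve(commands);
  });
});
```

With a limit of 2, no more than two lookups are ever in flight, and the final callback fires exactly once after all five have completed.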

How to use promise bluebird in nested for loop?

I need to use bluebird in my code and I have no idea how to use it. My code contains nested loops. When the user logs in, my code will run. It will begin to look for any files under the user, and if there are files then, it will loop through to get the name of the files, since the name is stored in a dictionary. Once it got the name, it will store the name in an array. Once all the names are stored, it will be passed along in res.render().
Here is my code:
router.post('/login', function(req, res){
var username = req.body.username;
var password = req.body.password;
Parse.User.logIn(username, password, {
success: function(user){
var Files = Parse.Object.extend("File");
var object = [];
var query = new Parse.Query(Files);
query.equalTo("user", Parse.User.current());
var temp;
query.find({
success:function(results){
for(var i=0; i< results.length; i++){
var file = results[i].toJSON();
for(var k in file){
if (k ==="javaFile"){
for(var t in file[k]){
if (t === "name"){
temp = file[k][t];
var getname = temp.split("-").pop();
object[i] = getname;
}
}
}
}
}
}
});
console.log(object);
res.render('filename', {title: 'File Name', FIles: object});
console.log(object);
},
error: function(user, error) {
console.log("Invalid username/password");
res.render('logins');
}
})
});
EDIT: The code doesn't work: both the first and the second console.log(object) print an empty array. I am supposed to get one item in that array, because I have one file saved.
JavaScript code is all parsed from top to bottom, but it doesn't necessarily execute in that order with asynchronous code. The problem is that you have the log statements inside of the success callback of your login function, but it's NOT inside of the query's success callback.
You have a few options:
Move the console.log statements inside of the inner success callback so that while they may be parsed at load time, they do not execute until both callbacks have been invoked.
Promisify functions that traditionally rely on and invoke callback functions, and hang then handlers off of the returned value to chain the promises together.
The first option is not using promises at all, but relying solely on callbacks. To flatten your code you will want to promisify the functions and then chain them.
I'm not familiar with the syntax you're using there with the success and error callbacks, nor am I familiar with Parse. Typically you would do something like:
query.find(someArgsHere, function(success, err) {
});
But then you would have to nest another callback inside of that, and another callback inside of that. To "flatten" the pyramid, we make the function return a promise instead, and then we can chain the promises. Assuming that Parse.User.logIn is a callback-style function (as is Parse.Query.find), you might do something like:
var Promise = require('bluebird');
var login = Promise.promisify(Parse.User.logIn);
var find = Promise.promisify(Parse.Query.find);
var outerOutput = [];
return login(yourArgsHere)
.then(function(user) {
return find(user.someValue);
})
.then(function(results) {
var innerOutput = [];
// do something with innerOutput or outerOutput and render it
});
This should look similar to the synchronous code that you might be used to, except that instead of saving the returned value into a variable and then passing that variable to your next function call, you use "then" handlers to chain the promises together. You could either create the entire output variable inside of the second then handler, or you can declare the output variable before starting the promise chain, and then it will be in scope for all of those functions. I have shown you both options above, but obviously you don't need to define both of those variables and assign them values. Just pick the option that suits your needs.
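If a function takes an options object with success and error handlers (as the login call in the question does) rather than a Node-style callback(err, result), a hand-written wrapper is one way to get a promise out of it. The sketch below uses a fake function with that calling convention; fakeLogIn, logInAsync and the shape of the user object are assumptions for illustration, not Parse APIs.

```javascript
// Fake API using the success/error options style from the question.
function fakeLogIn(username, password, options) {
  setTimeout(function () {
    if (password === "secret") {
      options.success({ name: username });
    } else {
      options.error(null, new Error("Invalid username/password"));
    }
  }, 10);
}

// Wrap it so callers get a promise instead of passing callbacks.
function logInAsync(username, password) {
  return new Promise(function (resolve, reject) {
    fakeLogIn(username, password, {
      success: resolve,
      error: function (user, error) { reject(error); }
    });
  });
}

// Chained usage: each then() runs only after the previous step has finished.
var greeting = logInAsync("alice", "secret")
  .then(function (user) {
    return "Hello, " + user.name;
  });

// A failed login skips the then() handler and lands in catch().
var failure = logInAsync("alice", "wrong")
  .then(function () { return "unexpected success"; })
  .catch(function (err) { return err.message; });
```

Anything that must wait for the result, such as rendering the page, belongs inside a then handler or after it in the chain.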
You can also use Bluebird's promisifyAll() function to wrap an entire library with equivalent promise-returning functions. They will all have the same name of the functions in the library suffixed with Async. So assuming the Parse library contains callback-style functions named someFunctionName() and someOtherFunc() you could do this:
var Parse = Promise.promisifyAll(require("Parse"));
var promiseyFunction = function() {
return Parse.someFunctionNameAsync()
.then(function(result) {
return Parse.someOtherFuncAsync(result.someProperty);
})
.then(function(otherFuncResult) {
var something;
// do stuff to assign a value to something
return something;
});
}
I have a few pointers. ... Btw tho, are you trying to use Parse's Promises?
You can get rid of those inner nested loops and a few other changes:
Use some syntax like this to be more elegant:
// Example to filter/retrieve only valid file objs (with dashes in name)
var matchedFiles = results.filter(function _hasJavaFile(item) {
  return item && item.javaFile && item.javaFile.name // NOT NULL
    && item.javaFile.name.indexOf('-') > -1; // and has a dash
});
// You could then use a map function like this to get the files into an array of just their names
var fileNames = matchedFiles.map(function _getJavaFile(item) {
  return item && item.javaFile && item.javaFile.name // NOT NULL
    && item.javaFile.name.split('-')[0]; // RETURN first part of name
});
And here is an example on using Parse's native promises (add code above to line 4/5 below, note the 'then()' function, that's effectively now your 'callback' handler):
var GameScore = Parse.Object.extend("GameScore");
var query = new Parse.Query(GameScore);
query.select("score", "playerName");
query.find().then(function(results) {
// each of results will only have the selected fields available.
});

How to overcome asynchronous non-blocking NOT returning values in times?

I am creating an array of JSON objects which is then stored in mongodb.
Each JSON object contains a number of fields - each being populated before I save the object to mongodb.
Some of the Objects attributes are populated by making API calls to other websites such as last.fm but the returned value is not quick enough to populate the attribute before the object is saved to mongodb.
How can I wait for all attributes of an object to be populated before saving it? I did try async.waterfall but it still falls through without waiting and I end up with a database filled with documents with empty fields..
Any help would be greatly appreciated.
Thanks :)
You have a few options for controlling asynchrony in JavaScript:
Callback pattern: (http://npmjs.org/async) async.parallel([...], function (err) {
Promises: (http://npmjs.org/q) Q.all([...]).then(function () {
Streams: (http://npmjs.org/concat-stream) see also https://github.com/substack/stream-handbook
Since you say you are making multiple API calls to other websites, you may want to try:
async.each(api_requests,
function(api_request, cb) {
request(api_request, function (error, response, body) {
/* code */
/* add to model for Mongo */
cb();
});
},
function(err) {
// continue execution after all cbs are received
/* code */
/* save to Mongo, etc.. */
}
);
The above example is most applicable when you are making numerous requests following the same format. Please review the documentation for Waterfall (https://github.com/caolan/async#waterfall) if the input into your next step depends on the output of the previous step or Parallel (https://github.com/caolan/async#parallel) if you have a bunch of unrelated tasks that don't rely on each other. The great thing about async is that you can nest and string all the functions together to support what you're trying to do.
You'll either want to use promises or some sort of callback mechanism. Here's an example of the promise method with jPromise:
var jPromise = require('jPromise');
var promises = [];
for(var i = 0; i < 10; i++) {
promises.push(someAsyncApiCall(i));
}
jPromise.when(promises).then(function() {
saveThingsToTheDb();
});
Similarly, without the promise library:
var finished = 0;
var toDo = 10;
function allDone() {
saveThingsToTheDb();
}
for(var i = 0; i < toDo; i++) {
someAsyncApiCall(function() {
finished++;
if(finished === toDo) {
allDone();
}
});
}
Personally, I prefer the promise method, but that will only work well if the API you're calling returns some sort of a promise. If it doesn't, you'll have to wrap the callback API with promises somehow (Q does this pretty well).
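If the API only offers Node-style callbacks, util.promisify (built into Node.js 8 and later) is one way to wrap it without an extra library. In this sketch someAsyncApiCall and saveThingsToTheDb are hypothetical stand-ins for the real calls, and the loop count of 3 is arbitrary.

```javascript
const util = require("util");

// Hypothetical callback-style API: calls back with (err, value) after a delay.
function someAsyncApiCall(i, callback) {
  setTimeout(function () { callback(null, "result " + i); }, 10);
}

// Hypothetical save step; here it just returns what it was given.
function saveThingsToTheDb(values) {
  return values;
}

// promisify turns fn(args..., callback(err, value)) into fn(args...) -> Promise.
const someAsyncApiCallAsync = util.promisify(someAsyncApiCall);

const promises = [];
for (let i = 0; i < 3; i++) {
  promises.push(someAsyncApiCallAsync(i));
}

// Promise.all resolves once every call has finished, with results in call order.
const saved = Promise.all(promises).then(saveThingsToTheDb);
```

Because the save step runs inside then, it cannot start until every field-populating call has returned.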

Returning an Array using Firebase

Trying to find the best-use example of returning an array of data in Node.js with Q library (or any similar library, I'm not partial) when using Firebase .on("child_added");
I've tried using Q.all() but it never seems to wait for the promises to fill before returning. This is my current example:
function getIndex()
{
var deferred = q.defer();
deferred.resolve(new FirebaseIndex( Firebase.child('users').child(user.app_user_id).child('posts'), Firebase.child('posts') ) );
return deferred.promise;
}
function getPost( post )
{
var deferred = q.defer();
deferred.resolve(post.val());
return deferred.promise;
}
function getPosts()
{
var promises = [];
getIndex().then( function (posts) {
posts.on( 'child_added', function (_post) {
promises.push( getPost(_post) );
});
});
return q.all(promises);
}
The problem occurs in getPosts(). It pushes a promise into your array inside an async function--that won't work since q.all is called before the promise objects have been added.
Also, child_added is a real-time event notification. You can't use that as a way to grab "all of the data" because there is no such thing as "all"; the data is constantly changing in real-time environments. FirebaseIndex is also using child_added callbacks internally, so that's not going to work with this use case either.
You can grab all of the posts using the 'value' callback (but not a specific subset of records) as follows:
function getPosts() {
var def = q.defer();
Firebase.child('users').once('value', function(snap) {
var records = [];
snap.forEach(function(ss) {
records.push( ss.val() );
});
def.resolve(records);
});
return def.promise;
}
But at this point, it's time to consider things in terms of real-time environments. Most likely, there is no reason "all" data needs to be present before getting to work.
Consider just grabbing each record as they come in and appending them to whatever DOM or Array where they need to be stored, and working from an event driven model instead of a GET/POST centered approach.
With luck, you can bypass this use case entirely.
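As a rough illustration of the event-driven approach, the sketch below uses Node's built-in EventEmitter as a stand-in for a Firebase reference. That substitution is an assumption made purely so the example runs without Firebase; postsRef and the post objects are hypothetical. Each record is handled as it arrives instead of waiting for a complete list.

```javascript
const EventEmitter = require("events");

// Stand-in for a real-time data source; NOT the Firebase API.
const postsRef = new EventEmitter();
const posts = [];

// Handle each record as it arrives and append it where it needs to be stored.
postsRef.on("child_added", function (post) {
  posts.push(post);
  console.log("now showing " + posts.length + " post(s)");
});

// Simulate records arriving over time.
postsRef.emit("child_added", { title: "first" });
postsRef.emit("child_added", { title: "second" });
```

Nothing here ever needs to know when "all" posts have arrived; the handler keeps the stored list current for as long as events keep coming.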

In node.js, how to use node.js and mongodb to store data in multiple levels

I met a weird problem: when I use mongodb to store data, some data is missing, which I think is because of its asynchronous nature.
So for this list the timetable, I would use re
/* Here is the a application, in which by using a train_uid and today,
*/
var today = new Date();
var day = today.getDay();
scheduleModel.findByTrainAndTime(train_uid,today,function(err, doc){
var a = new Object();
if(err){}
else{
if(doc != null)
{
//mongodb database can give me some data about the train_id ,uid
a.train_uid = doc.train_uid;
a.train_id = train_id;
Most importantly, it gives me a train schedule timetable, which is a list (doc.time_schedule) of JSON objects with fields such as arrival, departure and tiploc. However, I need to convert each tiploc to a sanox number, which referenceModel can look up from the tiploc.
//doc.time_schedule
// here is to add a array
So I use async: for each item in the list, I use referenceModel to query the sanox and build an array, a.timeline, that stores each b. When the async each operation has finished, trainModel stores an object containing the array of sanox objects. However, in the mongodb database only the array of sanox objects is empty. I guess this is because of the asynchronous operation, but since I used async, why doesn't it work?
a.train_uid = doc.train_uid; //works
a.train_id = train_id; works
a.timeline = [] // doesn't work
a.timeline = new Array();
var b ;
async.forEachSeries(doc.time_schedule,
function(item,callback){
referenceModel.findStanoxByTicloc(item.tiploc_code,function(err,sanox){
try{
b = new Object();
b.sanox = sanox;
a.time.push(b);
}catch(err2){
}
});
callback();
},
function(err){
trainModel.createNewTrain(a,function(){});
}
}
});
You're calling callback after you fire off the asynchronous find, but before it actually comes back. You need to wait until after you've gotten the data to do that. The following should work better:
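async.forEachSeries(doc.time_schedule,
  function(item,callback){
    referenceModel.findStanoxByTicloc(item.tiploc_code,function(err,sanox){
      try{
        b = new Object();
        b.sanox = sanox;
        // push onto a.timeline, the array declared earlier; a.time is never defined in the code shown
        a.timeline.push(b);
      }catch(err2){
      }
      // signal completion only after the lookup has returned
      callback();
    });
  },
  function(err){
    trainModel.createNewTrain(a,function(){});
  }
);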
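The difference between the two versions can be reproduced without a database. In this sketch forEachSeriesSketch is a simplified, hypothetical stand-in for async.forEachSeries, and findStanox fakes the asynchronous lookup with a timer. The early-callback variant reaches the final callback before any result has been pushed.

```javascript
// Simplified stand-in for async.forEachSeries; NOT the library's code.
function forEachSeriesSketch(items, iteratee, done) {
  var i = 0;
  (function step(err) {
    if (err || i >= items.length) return done(err || null);
    iteratee(items[i++], step);
  })();
}

// Fake asynchronous lookup.
function findStanox(tiploc, callback) {
  setTimeout(function () { callback(null, "stanox-" + tiploc); }, 5);
}

var schedule = ["A", "B", "C"];

function run(callbackTooEarly) {
  return new Promise(function (resolve) {
    var timeline = [];
    forEachSeriesSketch(schedule, function (item, callback) {
      findStanox(item, function (err, stanox) {
        timeline.push({ stanox: stanox });
        if (!callbackTooEarly) callback(err);
      });
      if (callbackTooEarly) callback(); // fires before the lookup returns
    }, function () {
      // Record the length now: this is the moment the object would be saved.
      resolve(timeline.length);
    });
  });
}

var lengths = Promise.all([run(true), run(false)]);
lengths.then(function (l) {
  console.log("early callback saw", l[0], "items; late callback saw", l[1]);
});
// prints "early callback saw 0 items; late callback saw 3"
```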

Resources