Design pattern for many asynchronous tasks in node - node.js

I'm learning node and writing an API. One of my API calls takes a parameter called Tags, which will contain comma-delimited tags, each of which I want to save to disk (I'm using MongoDB + Mongoose). Typically when I save to DB in my API I pass a callback and carry on after the save inside of that callback, but here I have a variable number of objects to save to disk, and I don't know the cleanest way to save all of these tags to disk, then save the object which references them afterward. Can anyone suggest a clean async pattern to use? Thanks!

async is a good Node library for these tasks. It can run multiple async calls in parallel or in series and trigger a single callback once they have all finished:
// each task must call done(err, result) when it finishes
async.parallel([
    function(done){ ... },
    function(done){ ... }
], callback);
async.series([
    function(done){ ... },
    function(done){ ... }
], callback);

This is a common code pattern I often use when I don't want additional dependencies:
var tags = ['tag1', 'tag2', 'tag3'];
var wait = tags.length;
tags.forEach(function (tag) {
doAsyncJob(tag, done);
});
function done() {
if (--wait === 0) allDone();
}
This code runs doAsyncJob(tag, callback) in parallel for each item of the array and calls allDone once every job has completed. If you need to process the items sequentially (one after another), here's another pattern:
(function oneIteration() {
var item = tags.shift();
if (item) {
doAsyncJob(item, oneIteration);
} else {
allDone();
}
})();
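On Node versions with built-in promises, the same "wait for everything, then continue" idea can be sketched with Promise.all, applied to the tags case from the question. This is only a sketch: saveTag and savePost are hypothetical stand-ins for the real Mongoose save calls and simply simulate async work.

```javascript
// Hypothetical stand-in for saving one tag document (e.g. a Mongoose save).
function saveTag(name) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ id: 'id-' + name, name: name });
    }, 10);
  });
}

// Hypothetical stand-in for saving the object that references the tags.
function savePost(title, tagIds) {
  return Promise.resolve({ title: title, tags: tagIds });
}

// Split the comma-delimited parameter, save every tag in parallel,
// then save the referencing object once all tags are stored.
function saveWithTags(title, tagsParam) {
  var names = tagsParam.split(',')
    .map(function (t) { return t.trim(); })
    .filter(function (t) { return t.length > 0; });
  return Promise.all(names.map(saveTag)).then(function (savedTags) {
    var ids = savedTags.map(function (t) { return t.id; });
    return savePost(title, ids);
  });
}

saveWithTags('My post', 'tag1, tag2,tag3').then(function (post) {
  console.log(post.tags);
});
```

Promise.all keeps the results in the same order as the input array, so the saved ids line up with the tag names even though the saves run in parallel.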

Related

How to process a big array applying a async function for each element in nodejs?

I am working with zombie.js to scrape a site, and I must use the callback style to connect to each URL. I have an array of URLs and need to process each one with an async function. This is my first approach:
var urls = ['http...', 'http...'];
function process_url(index)
{
    if (index == urls.length)
        return;
    // pass the current url, not an undefined variable
    async_function(urls[index],
        function() {
            ...
            //parse the url
            ...
            // Process the next url (index++ would pass the same index again)
            process_url(index + 1);
        }
    );
}
process_url(0);
This is the approach I came up with to solve the problem without a third-party Node.js library that makes an async function behave like a sync one or waits for it (wait.for, synchronized, mocha). I don't know what would happen if the array is too big. Is the function released from memory when the next function is called, or do all the functions stay in memory until the end?
Any ideas?
Your scheme will work. I call it "manually sequencing async operations".
A general purpose version of what you're doing would look like this:
function processItem(data, callback) {
// do your async function here
// for example, let's suppose it was an http request using the request module
request(data, callback);
}
function processArray(array, fn) {
var index = 0;
function next() {
if (index < array.length) {
fn(array[index++], function(err, result) {
// process error here
if (err) return;
// process result here
next();
});
}
}
next();
}
processArray(arr, processItem);
As to your specific questions:
I don't know what would happen if the array is too big. Is the
function released from the memory when the next function is called? or
all the functions are in memory until the end?
Memory in JavaScript is released when it is no longer referenced by any running code and the garbage collector gets time to run. Since you are running a series of asynchronous operations here, the garbage collector will likely get a chance to run regularly while waiting for the HTTP response from each async operation, so memory can get cleaned up then. Functions are just another type of object in JavaScript, and they get garbage collected like anything else. When they are no longer referenced by running code, they are eligible for garbage collection.
In your specific code, because you are re-calling process_url() only in an async callback, there is no stack build-up (as in normal recursion). The prior instance of process_url() has already completed BEFORE the async callback is called and BEFORE you call the next iteration of process_url().
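The absence of stack build-up can be checked with a small sketch. Here fakeAsync is a hypothetical stand-in for a real async operation (it uses setImmediate so the callback always fires on a later turn of the event loop), and this processArray takes an extra done callback so the caller can tell when the whole array has been handled. One caveat: if the function you pass ever calls its callback synchronously, next() becomes ordinary recursion and a very large array could overflow the stack.

```javascript
// Hypothetical async operation: reports its result on a later event-loop turn.
function fakeAsync(item, callback) {
  setImmediate(function () {
    callback(null, item * 2);
  });
}

// Process the array one item at a time; call done(err, results) at the end.
function processArray(array, fn, done) {
  var index = 0;
  var results = [];
  function next() {
    if (index >= array.length) return done(null, results);
    fn(array[index++], function (err, result) {
      if (err) return done(err);
      results.push(result);
      // The previous next() call returned long ago, so the stack stays shallow.
      next();
    });
  }
  next();
}

// Far more items than a synchronous recursion could usually handle.
var items = [];
for (var i = 0; i < 50000; i++) items.push(i);

processArray(items, fakeAsync, function (err, results) {
  if (err) return console.error(err);
  console.log('processed', results.length, 'items');
});
```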
In general, management and coordination of multiple async operations is much, much easier using promises which are built into the current versions of node.js and are part of the ES6 ECMAScript standard. No external libraries are required to use promises in current versions of node.js.
For a list of a number of different techniques for sequencing your asynchronous operations on your array, both using promises and not using promises, see:
How to synchronize a sequence of promises?.
The first step in using promises is to "promisify" your async function so that it returns a promise instead of taking a callback.
function async_function_promise(url) {
return new Promise(function(resolve, reject) {
async_function(url, function(err, result) {
if (err) {
reject(err);
} else {
resolve(result);
}
});
});
}
Now, you have a version of your function that returns promises.
If you want your async operations to proceed one at a time so the next one doesn't start until the previous one has completed, then a usual design pattern for that is to use .reduce() like this:
function process_urls(array) {
return array.reduce(function(p, url) {
return p.then(function(priorResult) {
return async_function_promise(url);
});
}, Promise.resolve());
}
Then, you can call it like this:
var myArray = ["url1", "url2", ...];
process_urls(myArray).then(function(finalResult) {
// all of them are done here
}, function(err) {
// error here
});
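On Node versions that support async functions, the same one-at-a-time sequencing can be sketched with a plain loop and await. In this sketch fetchUrl is a hypothetical stand-in for the promisified async_function_promise, and unlike the .reduce() version it collects every result rather than only the last one.

```javascript
// Hypothetical stand-in for a promisified request: resolves after a short delay.
function fetchUrl(url) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('body of ' + url);
    }, 5);
  });
}

// Process the URLs strictly one after another and collect the results.
async function processUrls(urls) {
  const results = [];
  for (const url of urls) {
    // await suspends this function only; the event loop keeps running.
    results.push(await fetchUrl(url));
  }
  return results;
}

processUrls(['url1', 'url2']).then(function (allResults) {
  console.log(allResults);
}, function (err) {
  console.error(err);
});
```

If any call rejects, the loop stops at that point and the returned promise rejects with the same error.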
There are also Promise libraries that have some helpful features that make this type of coding simpler. I, myself, use the Bluebird promise library. Here's how your code would look using Bluebird:
var Promise = require('bluebird');
var async_function_promise = Promise.promisify(async_function);
function process_urls(array) {
return Promise.map(array, async_function_promise, {concurrency: 1});
}
process_urls(myArray).then(function(allResults) {
// all of them are done here and allResults is an array of the results
}, function(err) {
// error here
});
Note that you can change the concurrency value to whatever you want here. For example, you would probably get faster end-to-end performance if you increased it to something between 2 and 5 (the best value depends on how the server handles concurrent requests).
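If you would rather not add a library, a bounded-concurrency map can be sketched with native promises. This is a hand-rolled illustration, not Bluebird's implementation; mapWithLimit and its parameters are names made up for this example, and limit is assumed to be at least 1.

```javascript
// Run worker(item) for every item with at most `limit` calls in flight.
// Resolves with the results in the same order as `items`.
function mapWithLimit(items, limit, worker) {
  var results = new Array(items.length);
  var nextIndex = 0;

  // Each lane claims the next unprocessed index until none remain.
  function runLane() {
    if (nextIndex >= items.length) return Promise.resolve();
    var i = nextIndex++;
    return new Promise(function (resolve) {
      // Wrapping the call turns a synchronous throw into a rejection.
      resolve(worker(items[i]));
    }).then(function (value) {
      results[i] = value;
      return runLane();
    });
  }

  var lanes = [];
  for (var n = 0; n < Math.min(limit, items.length); n++) {
    lanes.push(runLane());
  }
  return Promise.all(lanes).then(function () {
    return results;
  });
}

// Example worker: a hypothetical async job that doubles its input.
function slowDouble(x) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(x * 2); }, 5);
  });
}

mapWithLimit([1, 2, 3, 4, 5], 2, slowDouble).then(function (out) {
  console.log(out);
});
```

With a limit of 1 this behaves like the sequential version; raising the limit trades ordering of execution for throughput while keeping the result order stable.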

Meteor method doesn't work

Assume that I have a collection called Tasks which has a few tasks in it. I call a method to return a task array to the user, but for some reason it doesn't return anything.
Here is a code for example:
if (Meteor.isClient) {
// This code only runs on the client
Template.body.helpers({
tasks: function () {
// Show newest tasks first
Meteor.call("getTasks", function(error, result) {
return result; // Doesn't do anything..
});
}
});
}
Meteor.methods({
getTasks: function() {
return Tasks.find({}, {sort: {createdAt: -1}});
}
});
Any ideas why when I call the method it doesn't return anything?
Tasks.find() returns a cursor, which makes no sense to transmit to the client via DDP.
You probably mean to return Tasks.find().fetch(), but that defeats the purpose of Meteor's very nice data synchronization mechanism.
Have you read Understanding Meteor's publish/subscribe?

Synchronize node.js object

I am using a variable and that is used by many functions at a time. I need to synchronize it. How do I do it?
var x = 0;
var a = function(){
x=x+1;
}
var b = function(){
x=x+2;
}
var c = function(){
var t = x;
return t;
}
This is the simplified logic of my code. To give more insight, x stands in for my MongoDB object, which must be used by only one function at a time. The three functions are like REST API calls, so there is a chance they will be called at the same time.
I need to write getX function which should manage locking and unlocking.
Any suggestions?
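Node runs your JavaScript on a single thread, so the three functions can never execute at the same time, and each one runs to completion without being interrupted. Synchronization problems of the multithreaded kind therefore do not apply. There is one case to watch, though: if a function starts an I/O operation and finishes its update in a callback, other functions can run in between.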
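That I/O case can be sketched as follows. The store below is hypothetical: its reads and writes are asynchronous, as a database call would be, so a read-modify-write can be interleaved with another one and an update can be lost. A small promise-based lock (createLock, a name made up for this example) makes each update wait for the previous one.

```javascript
// Minimal promise-based lock: each task starts only after the previous one settles.
function createLock() {
  var tail = Promise.resolve();
  return function runExclusive(task) {
    var result = tail.then(task);
    // Swallow errors on the internal chain so one failed task does not block later ones.
    tail = result.catch(function () {});
    return result;
  };
}

// Hypothetical store whose reads and writes are asynchronous, like database calls.
function createStore() {
  var value = 0;
  return {
    read: function () {
      return new Promise(function (resolve) {
        setTimeout(function () { resolve(value); }, 5);
      });
    },
    write: function (next) {
      return new Promise(function (resolve) {
        setTimeout(function () { value = next; resolve(next); }, 5);
      });
    }
  };
}

// Read-modify-write that yields to the event loop between the read and the write.
function addTo(store, n) {
  return store.read().then(function (current) {
    return store.write(current + n);
  });
}

var store = createStore();
var withLock = createLock();
Promise.all([
  withLock(function () { return addTo(store, 1); }),
  withLock(function () { return addTo(store, 2); })
]).then(function () {
  return store.read();
}).then(function (value) {
  console.log(value); // 3: the second update waited for the first
});
```

Calling addTo twice without the lock lets both calls read the old value before either writes, so one of the two additions is overwritten.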
You are asking about keeping a single object synchronized as several asynchronous operations modify that object. This is a bit vague (do you need to execute them in order? do they change the same properties?). It's hard to make a catch-all solution, so I suggest that you determine what order, if any, the operations must take place in, and use the async library to handle the control flow.
The async.waterfall method (example below) is useful if you want to pass results down a chain of functions that execute in order. There are many other useful functions included in the library, like async.eachSeries (execute a function once per array item, in order) and async.parallel (execute an array of functions simultaneously). All docs are available at https://github.com/caolan/async
var async = require('async');
function calculateX(callback){
async.waterfall(
[
function(done){
var x = 0;
asyncCall1(x, function(x1){ // add x1=x+1;
done(null, x1);
});
},
function(x1, done){
asyncCall2(x1, function(x2){ // add x2=x1+2;
done(null, x2);
});
},
],
function(err, x2){
var t = x2;
callback(t);
});
};
calculateX(function(x2){
mongo.save(x2, function(err){ // or something idk mongo
if(err){ console.log(err) };
});
});

Returning an Array using Firebase

Trying to find the best-use example of returning an array of data in Node.js with Q library (or any similar library, I'm not partial) when using Firebase .on("child_added");
I've tried using Q.all() but it never seems to wait for the promises to fill before returning. This is my current example:
function getIndex()
{
var deferred = q.defer();
deferred.resolve(new FirebaseIndex( Firebase.child('users').child(user.app_user_id).child('posts'), Firebase.child('posts') ) );
return deferred.promise;
}
function getPost( post )
{
var deferred = q.defer();
deferred.resolve(post.val());
return deferred.promise;
}
function getPosts()
{
var promises = [];
getIndex().then( function (posts) {
posts.on( 'child_added', function (_post) {
promises.push( getPost(_post) );
});
});
return q.all(promises);
}
The problem occurs in getPosts(). It pushes a promise into your array inside an async function--that won't work since q.all is called before the promise objects have been added.
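The timing problem can be reproduced in isolation. This sketch uses the native Promise.all as a stand-in for q.all and a timer as a stand-in for the child_added event; the array is read when the combining call is made, so anything pushed afterwards is never waited on.

```javascript
// The array is still empty when Promise.all reads it.
var promises = [];
var tooEarly = Promise.all(promises);

// Simulates a 'child_added' event arriving on a later tick.
setTimeout(function () {
  promises.push(Promise.resolve('late post'));
}, 10);

tooEarly.then(function (results) {
  console.log(results.length); // 0: the late promise was never seen
});
```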
Also, child_added is a real-time event notification. You can't use that as a way to grab "all of the data" because there is no such thing as "all"; the data is constantly changing in real-time environments. FirebaseIndex is also using child_added callbacks internally, so that's not going to work with this use case either.
You can grab all of the posts using the 'value' callback (but not a specific subset of records) as follows:
function getPosts() {
var def = q.defer();
Firebase.child('users').once('value', function(snap) {
var records = [];
snap.forEach(function(ss) {
records.push( ss.val() );
});
def.resolve(records);
});
return def.promise;
}
But at this point, it's time to consider things in terms of real-time environments. Most likely, there is no reason "all" data needs to be present before getting to work.
Consider just grabbing each record as they come in and appending them to whatever DOM or Array where they need to be stored, and working from an event driven model instead of a GET/POST centered approach.
With luck, you can bypass this use case entirely.

How to wait for all async calls to finish

I'm using Mongoose with Node.js and have the following code, which calls the callback after all the save() calls have finished. However, I feel that this is a very dirty way of doing it and would like to see the proper way to get this done.
function setup(callback) {
// Clear the DB and load fixtures
Account.remove({}, addFixtureData);
function addFixtureData() {
// Load the fixtures
fs.readFile('./fixtures/account.json', 'utf8', function(err, data) {
if (err) { throw err; }
var jsonData = JSON.parse(data);
var count = 0;
jsonData.forEach(function(json) {
count++;
var account = new Account(json);
account.save(function(err) {
if (err) { throw err; }
if (--count == 0 && callback) callback();
});
});
});
}
}
You can clean up the code a bit by using a library like async or Step.
Also, I've written a small module that handles loading fixtures for you, so you just do:
var fixtures = require('./mongoose-fixtures');
fixtures.load('./fixtures/account.json', function(err) {
    // Fixtures loaded, you're ready to go
});
Github:
https://github.com/powmedia/mongoose-fixtures
It will also load a directory of fixture files, or objects.
I did a talk about common asynchronous patterns (serial and parallel) and ways to solve them:
https://github.com/masylum/i-love-async
I hope it's useful.
I've recently created a simpler abstraction called wait.for to call async functions in sync mode (based on Fibers). It's at an early stage but works. It is at:
https://github.com/luciotato/waitfor
Using wait.for, you can call any standard nodejs async function, as if it were a sync function, without blocking node's event loop. You can code sequentially when you need it.
Using wait.for, your code will be:
//in a fiber
function setup(callback) {
    // Clear the DB and load fixtures
    wait.for(Account.remove,{});
    // Load the fixtures
    var data = wait.for(fs.readFile,'./fixtures/account.json', 'utf8');
    var jsonData = JSON.parse(data);
    jsonData.forEach(function(json) {
        var account = new Account(json);
        wait.forMethod(account,'save');
    });
    callback();
}
That's actually the proper way of doing it, more or less. What you're doing there is a parallel loop. You can abstract it into its own "async parallel foreach" function if you want (and many do), but that's really the only way of doing a parallel loop.
Depending on what you intended, one thing that could be done differently is the error handling. Because you're throwing, if there's a single error, that callback will never get executed (count won't be decremented). So it might be better to do:
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
And handle the error in the callback. It's better node-convention-wise.
I would also change another thing to save you the trouble of incrementing count on every iteration:
var jsonData = JSON.parse(data)
, count = jsonData.length;
jsonData.forEach(function(json) {
var account = new Account(json);
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
});
If you are already using underscore.js anywhere in your project, you can leverage the after method. You need to know how many async calls will be out there in advance, but aside from that it's a pretty elegant solution.
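To show the idea without pulling in underscore, here is a hand-rolled after helper that mimics the behaviour of _.after, used to fire a callback once all saves have reported back. It is a sketch under assumptions: saveAll and the save parameter are hypothetical names, and the failed flag is an addition so that the callback is called only once when several saves fail.

```javascript
// Stand-in for underscore's _.after(count, fn): the returned function
// runs fn on its count-th call (and on any later calls).
function after(count, fn) {
  return function () {
    if (--count < 1) return fn.apply(this, arguments);
  };
}

// Save every doc in parallel; call callback(err) exactly once.
// `save(doc, cb)` is whatever async save function you have.
function saveAll(docs, save, callback) {
  if (docs.length === 0) return callback(null);
  var failed = false;
  var done = after(docs.length, function () {
    callback(null);
  });
  docs.forEach(function (doc) {
    save(doc, function (err) {
      if (failed) return; // an error was already reported
      if (err) {
        failed = true;
        return callback(err);
      }
      done();
    });
  });
}

// Hypothetical save that always succeeds after a short delay.
function fakeSave(doc, cb) {
  setTimeout(function () { cb(null); }, 5);
}

saveAll([{ name: 'a' }, { name: 'b' }], fakeSave, function (err) {
  console.log(err ? 'failed' : 'all saved');
});
```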

Resources