Idiomatic way to wait for multiple callbacks in Node.js

Suppose you need to do some operations that depend on some temp file. Since
we're talking about Node here, those operations are obviously asynchronous.
What is the idiomatic way to wait for all operations to finish in order to
know when the temp file can be deleted?
Here is some code showing what I want to do:
do_something(tmp_file_name, function(err) {});
do_something_other(tmp_file_name, function(err) {});
fs.unlink(tmp_file_name);
But if I write it this way, the third call can be executed before the first two
get a chance to use the file. I need some way to guarantee that the first two
calls have already finished (invoked their callbacks) before moving on, without nesting
the calls (which would effectively serialize them).
I thought about using event emitters on the callbacks and registering a counter
as receiver. The counter would receive the finished events and count how many
operations were still pending. When the last one finished, it would delete the
file. But there is the risk of a race condition and I'm not sure this is
usually how this stuff is done.
How do Node people solve this kind of problem?

Update:
Now I would advise having a look at:
Promises
The Promise object is used for deferred and asynchronous computations.
A Promise represents an operation that hasn't completed yet, but is
expected in the future.
A popular promise library is bluebird. I would also advise having a look at why promises.
You should use promises to turn this:
fs.readFile("file.json", function (err, val) {
if (err) {
console.error("unable to read file");
}
else {
try {
val = JSON.parse(val);
console.log(val.success);
}
catch (e) {
console.error("invalid json in file");
}
}
});
Into this:
fs.readFileAsync("file.json").then(JSON.parse).then(function (val) {
    console.log(val.success);
})
.catch(SyntaxError, function (e) {
    console.error("invalid json in file");
})
.catch(function (e) {
    console.error("unable to read file");
});
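Applied to the original question, a minimal sketch (not from the answer above, and assuming do_something and do_something_other take a standard (err) callback so they can be promisified with bluebird) could look like this:
var Promise = require('bluebird');
var fs = Promise.promisifyAll(require('fs'));
// wrap the callback-style operations so they return promises
var doSomethingAsync = Promise.promisify(do_something);
var doSomethingOtherAsync = Promise.promisify(do_something_other);
// both operations run in parallel; the temp file is removed only after both resolve
Promise.all([
    doSomethingAsync(tmp_file_name),
    doSomethingOtherAsync(tmp_file_name)
]).then(function () {
    return fs.unlinkAsync(tmp_file_name);
}).catch(function (err) {
    console.error(err);
});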
Generators: for example, via co.
Generator based control flow goodness for nodejs and the browser,
using promises, letting you write non-blocking code in a nice-ish way.
var co = require('co');
co(function *(){
    // yield any promise
    var result = yield Promise.resolve(true);
}).catch(onerror);
co(function *(){
    // resolve multiple promises in parallel
    var a = Promise.resolve(1);
    var b = Promise.resolve(2);
    var c = Promise.resolve(3);
    var res = yield [a, b, c];
    console.log(res);
    // => [1, 2, 3]
}).catch(onerror);
// errors can be try/catched
co(function *(){
    try {
        yield Promise.reject(new Error('boom'));
    } catch (err) {
        console.error(err.message); // "boom"
    }
}).catch(onerror);
function onerror(err) {
    // log any uncaught errors
    // co will not throw any errors you do not handle!!!
    // HANDLE ALL YOUR ERRORS!!!
    console.error(err.stack);
}
If I understand correctly, I think you should have a look at the very good async library. You should especially have a look at async.series. Here is a copy of the snippets from its GitHub page:
async.series([
    function(callback){
        // do some stuff ...
        callback(null, 'one');
    },
    function(callback){
        // do some more stuff ...
        callback(null, 'two');
    },
],
// optional callback
function(err, results){
    // results is now equal to ['one', 'two']
});
// an example using an object instead of an array
async.series({
    one: function(callback){
        setTimeout(function(){
            callback(null, 1);
        }, 200);
    },
    two: function(callback){
        setTimeout(function(){
            callback(null, 2);
        }, 100);
    },
},
function(err, results) {
    // results is now equal to: {one: 1, two: 2}
});
As a plus, this library also runs in the browser.
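For the original question, async.parallel is arguably the closer fit, since the two operations are independent and can run at the same time. A rough sketch (not from the answer above), assuming do_something and do_something_other take standard (err) callbacks:
var async = require('async');
var fs = require('fs');
async.parallel([
    function(callback) { do_something(tmp_file_name, callback); },
    function(callback) { do_something_other(tmp_file_name, callback); }
], function(err) {
    if (err) return console.error(err);
    // both operations have finished, so the temp file can go
    fs.unlink(tmp_file_name, function(err) {
        if (err) console.error(err);
    });
});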

The simplest way is to increment an integer counter when you start an async operation and then decrement it in the callback. Depending on the complexity, the callback could check the counter for zero and then delete the file.
A little more complex would be to maintain a list of objects, where each object has whatever attributes you need to identify the operation (it could even be the function call) as well as a status code. The callbacks would set the status code to completed.
Then you would have a loop that waits (using process.nextTick) and checks whether all tasks are completed. The advantage of this method over the counter is that, if it is possible for all outstanding tasks to complete before all tasks have been issued, the counter technique would cause you to delete the file prematurely.
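A minimal sketch of the counter approach applied to the question's two operations (an illustration only, reusing the names from the question):
var fs = require('fs');
var pending = 2; // number of operations started
function done(err) {
    if (err) console.error(err);
    // delete the file only after the last callback fires
    if (--pending === 0) {
        fs.unlink(tmp_file_name, function(err) {
            if (err) console.error(err);
        });
    }
}
do_something(tmp_file_name, done);
do_something_other(tmp_file_name, done);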

// simple countdown latch
function CDL(countdown, completion) {
    this.signal = function() {
        if (--countdown < 1) completion();
    };
}
// usage
var latch = new CDL(10, function() {
    console.log("latch.signal() was called 10 times.");
});
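Applied to the original question, the latch could be used like this (a sketch; it assumes fs is already required and omits error handling for brevity):
var latch = new CDL(2, function() {
    // both operations have signalled, so it is safe to delete the temp file
    fs.unlink(tmp_file_name, function(err) {
        if (err) console.error(err);
    });
});
do_something(tmp_file_name, function(err) { latch.signal(); });
do_something_other(tmp_file_name, function(err) { latch.signal(); });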

There is no "native" solution, but there are a million flow control libraries for node. You might like Step:
Step(
    function(){
        do_something(tmp_file_name, this.parallel());
        do_something_else(tmp_file_name, this.parallel());
    },
    function(err) {
        if (err) throw err;
        fs.unlink(tmp_file_name);
    }
);
Or, as Michael suggested, counters could be a simpler solution. Take a look at this semaphore mock-up. You'd use it like this:
do_something1(file, queue('myqueue'));
do_something2(file, queue('myqueue'));
queue.done('myqueue', function(){
fs.unlink(file);
});
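Such a queue helper is essentially a named counter; a minimal sketch of how it could be written (an illustration only, not the original gist):
var pending = {}; // number of outstanding tasks per queue name
var dones = {};   // completion callback per queue name
function queue(name) {
    pending[name] = (pending[name] || 0) + 1;
    // return a callback that marks one task on this queue as finished
    return function() {
        if (--pending[name] === 0 && dones[name]) dones[name]();
    };
}
// register the completion callback; as long as this runs in the same tick
// as the queue() calls, no task can finish before it is registered
queue.done = function(name, fn) {
    dones[name] = fn;
};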

I'd like to offer another solution that utilizes the speed and efficiency of the programming paradigm at the very core of Node: events.
Everything you can do with Promises or modules designed to manage flow-control, like async, can be accomplished using events and a simple state-machine, which I believe offers a methodology that is, perhaps, easier to understand than other options.
For example, assume you wish to sum the lengths of multiple files in parallel:
const EventEmitter = require('events').EventEmitter;
const fs = require('fs'); // needed below by the readPath handler
// simple event-driven state machine
const sm = new EventEmitter();
// running state
let context = {
    tasks: 0,   // number of total tasks
    active: 0,  // number of active tasks
    results: [] // task results
};
const next = (result) => { // must be called when each task chain completes
    if(result) { // preserve result of task chain
        context.results.push(result);
    }
    // decrement the number of running tasks
    context.active -= 1;
    // when all tasks complete, trigger done state
    if(!context.active) {
        sm.emit('done');
    }
};
// operational states
// start state - initializes context
sm.on('start', (paths) => {
    const len = paths.length;
    console.log(`start: beginning processing of ${len} paths`);
    context.tasks = len;  // total number of tasks
    context.active = len; // number of active tasks
    sm.emit('forEachPath', paths); // go to next state
});
// start processing of each path
sm.on('forEachPath', (paths) => {
    console.log(`forEachPath: starting ${paths.length} process chains`);
    paths.forEach((path) => sm.emit('readPath', path));
});
// read contents from path
sm.on('readPath', (path) => {
    console.log(` readPath: ${path}`);
    fs.readFile(path, (err, buf) => {
        if(err) {
            sm.emit('error', err);
            return;
        }
        sm.emit('processContent', buf.toString(), path);
    });
});
// compute length of path contents
sm.on('processContent', (str, path) => {
    console.log(` processContent: ${path}`);
    next(str.length);
});
// when processing is complete
sm.on('done', () => {
    const total = context.results.reduce((sum, n) => sum + n);
    console.log(`The total of ${context.tasks} files is ${total}`);
});
// error state
sm.on('error', (err) => { throw err; });
// ======================================================
// start processing - ok, let's go
// ======================================================
sm.emit('start', ['file1', 'file2', 'file3', 'file4']);
Which will output:
start: beginning processing of 4 paths
forEachPath: starting 4 process chains
readPath: file1
readPath: file2
processContent: file1
readPath: file3
processContent: file2
processContent: file3
readPath: file4
processContent: file4
The total of 4 files is 4021
Note that the ordering of the process chain tasks is dependent upon system load.
You can envision the program flow as:
start -> forEachPath -+-> readPath1 -> processContent1 -+-> done
                      +-> readPath2 -> processContent2 -+
                      +-> readPath3 -> processContent3 -+
                      +-> readPath4 -> processContent4 -+
For reuse, it would be trivial to create a module to support the various flow-control patterns, i.e. series, parallel, batch, while, until, etc.
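For instance, a bare-bones parallel helper along those lines might look like this (a sketch only; it assumes each task is a function taking a single (err, result) callback and completes asynchronously):
const EventEmitter = require('events').EventEmitter;
const fs = require('fs');
// run independent tasks in parallel and emit 'done' with the collected
// results once every task has called back; the first error wins and
// suppresses the 'done' event
function parallel(tasks) {
    const em = new EventEmitter();
    const results = new Array(tasks.length);
    let remaining = tasks.length;
    tasks.forEach((task, i) => {
        task((err, result) => {
            if (err) return em.emit('error', err);
            results[i] = result;
            if (--remaining === 0) em.emit('done', results);
        });
    });
    return em;
}
// usage sketch
parallel([
    (cb) => fs.readFile('file1', cb),
    (cb) => fs.readFile('file2', cb)
]).on('done', (results) => console.log(`${results.length} tasks finished`))
  .on('error', (err) => console.error(err));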

The simplest solution is to run the do_something* and unlink in sequence as follows:
do_something(tmp_file_name, function(err) {
    do_something_other(tmp_file_name, function(err) {
        fs.unlink(tmp_file_name);
    });
});
Unless, for performance reasons, you want to execute do_something() and do_something_other() in parallel, I suggest keeping it simple and going this way.

Wait.for https://github.com/luciotato/waitfor
using Wait.for:
var wait=require('wait.for');
...in a fiber...
wait.for(do_something,tmp_file_name);
wait.for(do_something_other,tmp_file_name);
fs.unlink(tmp_file_name);

With pure Promises it could be a bit more messy, but if you use Deferred Promises then it's not so bad:
Install:
npm install --save @bitbar/deferred-promise
Modify your code:
const DeferredPromise = require('@bitbar/deferred-promise');
const promises = [
    new DeferredPromise(),
    new DeferredPromise()
];
do_something(tmp_file_name, (err) => {
    if (err) {
        promises[0].reject(err);
    } else {
        promises[0].resolve();
    }
});
do_something_other(tmp_file_name, (err) => {
    if (err) {
        promises[1].reject(err);
    } else {
        promises[1].resolve();
    }
});
Promise.all(promises).then( () => {
    fs.unlink(tmp_file_name);
});
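For reference, the same effect can be had with plain Promise constructors and no extra library (a sketch, assuming the same (err) callback signatures as in the question):
const fs = require('fs');
const first = new Promise((resolve, reject) => {
    do_something(tmp_file_name, (err) => err ? reject(err) : resolve());
});
const second = new Promise((resolve, reject) => {
    do_something_other(tmp_file_name, (err) => err ? reject(err) : resolve());
});
Promise.all([first, second]).then(() => {
    fs.unlink(tmp_file_name, (err) => {
        if (err) console.error(err);
    });
}).catch((err) => console.error(err));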

Related

How do I achieve a synchronous requirement using asynchronous NodeJS

I am adding a user validation and data modification page to a node.js application.
In a synchronous universe, in a single function I would:
Lookup the original record in the database
Lookup the user in LDAP to see if they are the owner or admin
Do the logic and write the record.
In an asynchronous universe that won't work. To solve it I've built a series of hand-off functions:
router.post('/writeRecord', jsonParser, function(req, res) {
    var post = req.post;
    var smdb = new AWS.DynamoDB.DocumentClient();
    var params = { ... };
    smdb.query(params, function(err, data) {
        if( err == null ) writeRecordStep2(post, data);
    });
});
function writeRecordStep2( ru, post, data ) {
    var conn = new LDAP();
    conn.search(
        'ou=groups,o=amazon.com',
        { ... },
        function(err, resp) {
            if( err == null ) {
                writeRecordStep3( ru, post, data, ldap1 );
            }
        }
    );
}
function writeRecordStep3( ru, post, data ) {
    var conn = new LDAP();
    conn.search(
        'ou=groups,o=amazon.com',
        { ... },
        function(err, resp) {
            if( err == null ) {
                writeRecordStep4( ru, post, data, ldap1, ldap2 );
            }
        }
    );
}
function writeRecordStep4( ru, post, data, ldap1, ldap2 ) {
    // Do stuff with collected data
}
Additionally, because the LDAP and Dynamo logic are in their own source documents, these functions are scattered tragically around the code.
This strikes me as inefficient, as well as inelegant. I'm eager to find a more natural asynchronous pattern to achieve the same result.
Any promise library should sort your issue out. My preferred choice is bluebird. In summary, promises help you manage asynchronous (non-blocking) operations.
If you haven't heard about bluebird then just use it. It can convert all the functions of a module so that they return promises, which are then-able. Simply put, it promisifies all functions.
Here is the mechanism:
Module1.someFunction() // do your job and finally pass the return object to the next call
    .then()  // use the object returned from the first call, do your job and return the updated value
    .then()  // and so on
    .catch() // do your job when any error occurs
Hope you understand. Here is an example:
var readFile = Promise.promisify(require("fs").readFile);
readFile("myfile.js", "utf8").then(function(contents) {
    return eval(contents);
}).then(function(result) {
    console.log("The result of evaluating myfile.js", result);
}).catch(SyntaxError, function(e) {
    console.log("File had syntax error", e);
//Catch any other error
}).catch(function(e) {
    console.log("Error reading file", e);
});
I could not tell from your pseudo-code exactly which async operations depend upon results from other ones, and knowing that is key to the most efficient way to code a series of asynchronous operations. If two operations do not depend upon one another, they can run in parallel, which generally gets to an end result faster. I also can't tell exactly what data needs to be passed on to later parts of the async requests (too much pseudo-code and not enough real code to show us what you're really attempting to do).
So, without that level of detail, I'll show you two ways to approach this. The first runs each operation sequentially: run the first async operation, and when it's done, run the next one, accumulating all the results into an object that is passed along to the next link in the chain. This is general purpose since all async operations have access to all the prior results.
This makes use of promises built into the AWS.DynamoDB interface and makes our own promise for conn.search() (though if I knew more about that interface, it may already have a promise interface).
Here's the sequential version:
// promisify the search method
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);
// utility function that does a search and adds the result to the object passed in
// returns a promise that resolves to the object
function ldapSearch(data, key) {
    var conn = new LDAP();
    return conn.searchAsync('ou=groups,o=amazon.com', { ... }).then(results => {
        // put our results onto the passed in object
        data[key] = results;
        // resolve with the original object (so we can collect data here in a promise chain)
        return data;
    });
}
router.post('/writeRecord', jsonParser, function(req, res) {
    let post = req.post;
    let smdb = new AWS.DynamoDB.DocumentClient();
    let params = { ... }
    // The latest AWS interface gets a promise with the .promise() method
    smdb.query(params).promise().then(dbresult => {
        return ldapSearch({post, dbresult}, "ldap1");
    }).then(result => {
        // result.dbresult
        // result.ldap1
        return ldapSearch(result, "ldap2")
    }).then(result => {
        // result.dbresult
        // result.ldap1
        // result.ldap2
        // doSomething with all the collected data here
    }).catch(err => {
        console.log(err);
        res.status(500).send("Internal Error");
    });
});
And here's a parallel version that runs all three async operations at once, waits for all three of them to be done, and then has all the results at once:
// if the three async operations you show can be done in parallel
// first promisify things
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);
function ldapSearch(params) {
    var conn = new LDAP();
    return conn.searchAsync('ou=groups,o=amazon.com', { ... });
}
router.post('/writeRecord', jsonParser, function(req, res) {
    let post = req.post;
    let smdb = new AWS.DynamoDB.DocumentClient();
    let params = { ... }
    Promise.all([
        ldapSearch(...),
        ldapSearch(...),
        smdb.query(params).promise()
    ]).then(([ldap1Result, ldap2Result, queryResult]) => {
        // process ldap1Result, ldap2Result and queryResult here
    }).catch(err => {
        console.log(err);
        res.status(500).send("Internal Error");
    });
});
Keep in mind that due to the pseudo-code nature of the code in your question, this is also pseudo-code where implementation details (exactly what parameters you're searching for, what response you're sending, etc...) have to be filled in. This should be illustrative of promise chaining to serialize operations and the use of Promise.all() for parallelizing operations and promisifying a method that didn't have promises built in.

How to wait in node.js for a variable available on the cloud to have a specific value

I'm sorry if this is a basic question, but I am trying to implement a program in node.js that should wait for the value of a variable, available through a request to a cloud API (photon.variable()), to be 1. This variable should not be requested more than once per second. My first attempt is included in the sample code below. Despite knowing it does not work at all, I think it could be useful to show the functionality I would like to implement.
var photondata = 0;
while (photondata < 1)
{
    setTimeout(function () {
        photon.variable("witok", function(err, data) {
            if (!err) {
                console.log("data: ", data.result);
                photondata = data.result;
            }
            else console.log(err);
        });
    }, 1000);
}
Since you couldn't do async stuff in loops before, the traditional approach would be to create a function that adds itself to setTimeout for as long as needed, then calls some other function when it's done. You still need to do this in the browser if not using Babel.
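A sketch of that traditional approach (an illustration only, assuming photon.variable keeps the callback signature shown in the question):
function pollUntilReady(done) {
    photon.variable("witok", function(err, data) {
        if (err) {
            console.log(err);
        } else {
            console.log("data: ", data.result);
            if (data.result >= 1) return done(data.result); // stop polling
        }
        // not there yet: schedule another check in one second
        setTimeout(function() { pollUntilReady(done); }, 1000);
    });
}
pollUntilReady(function(value) {
    // do whatever you need here
});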
These days, you can stop execution and wait for things to happen when using a generator function (which latest versions of Node now support). There are many libraries that will let you do this and I will advertise ours :)
CL.run(function* () {
    var photondata = 0;
    while (true) {
        yield CL.try(function* () {
            var data = yield photon.variable("witok", CL.cb());
            console.log("data: ", data.result);
            photondata = data.result;
        }, function* (err) {
            console.log(err.message);
        });
        if (photondata >= 1) break;
        yield CL.sleep(1000);
    }
    // do whatever you need here
});

nodeJS too many child processes?

I am using node to recursively traverse a file system and make a system call for each file by using child.exec. It works well when tested on a small structure, with a couple of folders and files, but when run on the whole home directory it crashes after a while:
child_process.js:945
throw errnoException(process._errno, 'spawn');
^
Error: spawn Unknown system errno 23
at errnoException (child_process.js:998:11)
at ChildProcess.spawn (child_process.js:945:11)
at exports.spawn (child_process.js:733:9)
at Object.exports.execFile (child_process.js:617:15)
at exports.exec (child_process.js:588:18)
Does this happen because it uses up all resources? How can I avoid this?
EDIT: code (improvement and best-practice suggestions always welcome :)
function processDir(dir, callback) {
    fs.readdir(dir, function (err, files) {
        if (err) {...}
        if (files) {
            async.each(files, function (file, cb) {
                var filePath = dir + "/" + file;
                var stats = fs.statSync(filePath);
                if (stats) {
                    if (stats.isFile()) {
                        processFile(dir, file, function (err) {
                            if (err) {...}
                            cb();
                        });
                    } else if (stats.isDirectory()) {
                        processDir(filePath, function (err) {
                            if (err) {...}
                            cb();
                        });
                    }
                }
            }, function (err) {
                if (err) {...}
                callback();
            });
        }
    });
}
The issue can be caused by having too many files open simultaneously.
Consider using the async module's eachLimit to solve it:
https://github.com/caolan/async#eachLimit
async.eachLimit(
    files,
    20,
    function(file, callback){
        //process file here and call callback
    },
    function(err){
        //done
    }
);
In the current example you will process 20 files at a time.
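Slotted into the question's processDir, it might look roughly like this (a sketch; it assumes processFile keeps the (dir, file, cb) signature from the question and also swaps fs.statSync for the asynchronous fs.stat):
async.eachLimit(files, 20, function (file, cb) {
    var filePath = dir + "/" + file;
    fs.stat(filePath, function (err, stats) {
        if (err) return cb(err);
        if (stats.isFile()) {
            processFile(dir, file, cb);
        } else if (stats.isDirectory()) {
            processDir(filePath, cb);
        } else {
            cb();
        }
    });
}, function (err) {
    callback(err);
});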
Well, I don't know the reason for the failure, but if this is what you expect (using up all of the resources) or, as others say, too many files open, you could try multitasking. JXcore (a fork of Node.JS) offers such a thing: it allows you to run a task in a separate instance, but still inside one single process.
While a Node.JS app as a process has its limitations, JXcore with its sub-instances multiplies those limits: a single process with even one extra instance (or task, or sub-thread) doubles them!
So, let's say you run each of your spawn() calls in a separate task. Or, since tasks are not running in the main thread any more, you can even use the sync method that jxcore offers: cmdSync().
Probably the best illustration is given by these few lines of code:
jxcore.tasks.setThreadCount(4);
var task = function(file) {
    var your_cmd = "do something with " + file;
    return jxcore.utils.cmdSync(your_cmd);
};
jxcore.tasks.addTask(task, "file1.txt", function(ret) {
    console.log("the exit code:", ret.exitCode);
    console.log("output:", ret.out);
});
Let me repeat: the task will not block the main thread, since it is running in a separate instance!
Multitasking API is documented here: Multitasking.
As has been established in comments, you are likely running out of file handles because you are running too many concurrent operations on your files. So, a solution is to limit the number of concurrent operations that run at once so too many files aren't in use at the same time.
Here's a somewhat different implementation that uses Bluebird promises to control both the async aspects of the operation and the concurrency aspects of the operation.
To make the management of the concurrency aspect easier, this collects the entire list of files into an array first and then processes the array of filenames rather than processing as you go. This makes it easier to use a built-in concurrency capability in Bluebird's .map() (which works on a single array) so we don't have to write that code ourselves:
var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));
var path = require("path");
// recurse a directory, call a callback on each file (that returns a promise)
// run a max of numConcurrent callbacks at once
// returns a promise for when all work is done
function processDir(dir, numConcurrent, fileCallback) {
    var allFiles = [];
    function listDir(dir, list) {
        var dirs = [];
        return fs.readdirAsync(dir).map(function(file) {
            var filePath = path.join(dir, file);
            return fs.statAsync(filePath).then(function(stats) {
                if (stats.isFile()) {
                    allFiles.push(filePath);
                } else if (stats.isDirectory()) {
                    return listDir(filePath);
                }
            }).catch(function() {
                // ignore errors on .stat - file could just be gone now
                return;
            });
        });
    }
    return listDir(dir, allFiles).then(function() {
        return Promise.map(allFiles, function(filename) {
            return fileCallback(filename);
        }, {concurrency: numConcurrent});
    });
}
// example usage:
// pass the initial directory,
// the number of concurrent operations allowed at once
// and a callback function (that returns a promise) to process each file
processDir(process.cwd(), 5, function(file) {
    // put your own code here to process each file
    // this is code to cause each callback to take a random amount of time
    // for testing purposes
    var rand = Math.floor(Math.random() * 500) + 500;
    return Promise.delay(rand).then(function() {
        console.log(file);
    });
}).catch(function(e) {
    // error here
}).finally(function() {
    console.log("done");
});
FYI, I think you'll find that proper error propagation and proper error handling from many async operations is much, much easier with promises than the plain callback method.

How I can stop async.queue after the first fail?

I want to stop execution of my async.queue after the first task error occurs. I need to perform several similar actions in parallel with a concurrency restriction, but stop all the actions after the first error. How can I do that, or what should I use instead?
Assume you fired 5 parallel functions, each taking 5 seconds, and in the 3rd second function 1 failed. How can you stop the execution of the rest?
It depends on what those functions do; you may poll using setInterval. However, if your question is how to stop further tasks from being pushed to the queue, you may do this:
q.push(tasks, function (err) {
    if (err && !called) {
        // This prevents async from pushing more tasks to the queue; however, note that
        // whatever was already pushed to the queue will be processed anyway.
        q.kill();
        // This will not allow double calling of the final callback
        called = true;
        // This is the main process callback, the final callback
        main(err, results);
    }
});
Here is a full working example:
var async = require('async');
/*
This function is the actual work you are trying to do.
Please note, for example, that if you are running child processes
here, doing q.kill will not stop the execution of those processes,
so you actually need to keep track of the spawned processes and
kill them yourself when you call q.kill in the 'pushCb' function.
In case of just a long-running function, you may poll using setInterval.
*/
function worker(task, wcb) {
    setTimeout(function workerTimeout() {
        if (task === 11 || task === 12 || task === 3) {
            return wcb('error in processing ' + task);
        }
        wcb(null, task + ' got processed');
    }, Math.floor(Math.random() * 100));
}
/*
This function pushes the tasks to async.queue,
and then hands them to your worker function
*/
function process(tasks, concurrency, pcb) {
    var results = [], called = false;
    var q = async.queue(function qWorker(task, qcb) {
        worker(task, function wcb(err, data) {
            if (err) {
                return qcb(err); // Here is how we propagate the error to qcb
            }
            results.push(data);
            qcb();
        });
    }, concurrency);
    /*
    The trick is in this function. Note that checking q.tasks.length
    does not work; q.kill, introduced in async 0.7.0, just sets
    the drain function to null and the tasks length to zero
    */
    q.push(tasks, function qcb(err) {
        if (err && !called) {
            q.kill();
            called = true;
            pcb(err, results);
        }
    });
    q.drain = function drainCb() {
        pcb(null, results);
    };
}
var tasks = [];
var concurrency = 10;
for (var i = 1; i <= 20; i += 1) {
    tasks.push(i);
}
process(tasks, concurrency, function pcb(err, results) {
    console.log(results);
    if (err) {
        return console.log(err);
    }
    console.log('done');
});
The async documentation on the GitHub page is either outdated or incorrect; inspecting the queue object returned by the async.queue() method, I do not see a kill() method.
Nevertheless there is a way around it: the queue object has a tasks property which is an array, and simply assigning it a reference to an empty array did the trick for me.
queue.push( someTasks, function ( err ) {
    if ( err ) queue.tasks = [];
});

How to wait for all async calls to finish

I'm using Mongoose with Node.js and have the following code that will call the callback after all the save() calls have finished. However, I feel that this is a very dirty way of doing it and would like to see the proper way to get this done.
function setup(callback) {
    // Clear the DB and load fixtures
    Account.remove({}, addFixtureData);
    function addFixtureData() {
        // Load the fixtures
        fs.readFile('./fixtures/account.json', 'utf8', function(err, data) {
            if (err) { throw err; }
            var jsonData = JSON.parse(data);
            var count = 0;
            jsonData.forEach(function(json) {
                count++;
                var account = new Account(json);
                account.save(function(err) {
                    if (err) { throw err; }
                    if (--count == 0 && callback) callback();
                });
            });
        });
    }
}
You can clean up the code a bit by using a library like async or Step.
Also, I've written a small module that handles loading fixtures for you, so you just do:
var fixtures = require('./mongoose-fixtures');
fixtures.load('./fixtures/account.json', function(err) {
    //Fixtures loaded, you're ready to go
});
Github:
https://github.com/powmedia/mongoose-fixtures
It will also load a directory of fixture files, or objects.
I did a talk about common asynchronous patterns (serial and parallel) and ways to solve them:
https://github.com/masylum/i-love-async
I hope it's useful.
I've recently created a simpler abstraction called wait.for to call async functions in sync mode (based on Fibers). It's at an early stage but works. It is at:
https://github.com/luciotato/waitfor
Using wait.for, you can call any standard nodejs async function as if it were a sync function, without blocking node's event loop. You can code sequentially when you need it.
Using wait.for, your code will be:
//in a fiber
function setup(callback) {
    // Clear the DB and load fixtures
    wait.for(Account.remove, {});
    // Load the fixtures
    var data = wait.for(fs.readFile, './fixtures/account.json', 'utf8');
    var jsonData = JSON.parse(data);
    jsonData.forEach(function(json) {
        var account = new Account(json);
        wait.forMethod(account, 'save');
    });
    callback();
}
That's actually the proper way of doing it, more or less. What you're doing there is a parallel loop. You can abstract it into its own "async parallel foreach" function if you want (and many do), but that's really the only way of doing a parallel loop.
Depending on what you intended, one thing that could be done differently is the error handling. Because you're throwing, if there's a single error, that callback will never get executed (count won't be decremented). So it might be better to do:
account.save(function(err) {
    if (err) return callback(err);
    if (!--count) callback();
});
And handle the error in the callback. It's better node-convention-wise.
I would also change another thing to save you the trouble of incrementing count on every iteration:
var jsonData = JSON.parse(data)
  , count = jsonData.length;
jsonData.forEach(function(json) {
    var account = new Account(json);
    account.save(function(err) {
        if (err) return callback(err);
        if (!--count) callback();
    });
});
If you are already using underscore.js anywhere in your project, you can leverage the after method. You need to know how many async calls will be out there in advance, but aside from that it's a pretty elegant solution.
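A sketch of how _.after could replace the manual counter in the setup() function above (assuming underscore is available as _):
var _ = require('underscore');
var jsonData = JSON.parse(data);
// `done` only runs its wrapped function after it has been called
// jsonData.length times, i.e. after every save() has completed
var done = _.after(jsonData.length, function () {
    if (callback) callback();
});
jsonData.forEach(function (json) {
    new Account(json).save(function (err) {
        if (err) return callback(err);
        done();
    });
});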
