I have the following piece of code where I iterate through a collection, run another db query for each item, and construct an object within its callback. Finally I save that object to another collection.
I wish to call another function after all items have been saved, but can't figure out how. I tried using the async library, specifically async.whilst while the item is not null, but that just throws me into an infinite loop.
Is there a way to identify when all items have been saved?
Thanks!
var cursor = db.collection('user_apps').find({}, {timeout: false});
cursor.each(function (err, item) {
    if (err) {
        throw err;
    }
    if (item) {
        var appList = item.appList;
        var uuid = item.uuid;
        db.collection('app_categories').find({schema_name: {$in: appList}}).toArray(function (err, result) {
            if (err) throw err;
            var catCount = _.countBy(result, function (obj) {
                return obj.category;
            });
            catObj['_id'] = uuid;
            catObj['total_app_num'] = result.length;
            catObj['app_breakdown'] = catCount;
            db.collection('audiences').insert(catObj, function (err) {
                if (err) console.log(err);
            });
        });
    } else {
        // do something here after all items have been saved
    }
});
The key here is to use something that is going to respect the callback signal when performing the "loop" operation. The .each() as implemented here will not do that, so you need an "async" loop control that signals when each iteration has completed, with its own callback within the callback.
Provided your underlying MongoDB driver is at least version 2, there is a .forEach() which takes a callback that is called when the loop is complete. This is better than .each(), but it still does not solve the problem of knowing when the inner "async" .insert() operations have completed.
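For reference, a minimal sketch of that .forEach() form — the final callback fires when iteration ends, but any inserts started inside may still be pending:

var cursor = db.collection('user_apps').find({});
cursor.forEach(function (item) {
    // fires once per document; async work started here is not awaited
}, function (err) {
    if (err) throw err;
    // iteration complete, but inner inserts may still be in flight
});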
So a better approach is to use the stream interface returned by .find(), where more flow control is allowed. There is a .stream() method for backwards compatibility, but modern drivers will just return the interface by default:
var stream = db.collection('user_apps').find({});

stream.on("error", function (err) {
    throw err;
});

stream.on("data", function (item) {
    stream.pause();     // pause processing of the stream

    var appList = item.appList;
    var uuid = item.uuid;

    db.collection('app_categories').find({schema_name: {"$in": appList}}).toArray(function (err, result) {
        if (err) throw err;
        var catCount = _.countBy(result, function (obj) {
            return obj.category;
        });

        var catObj = {};    // always re-init
        catObj['_id'] = uuid;
        catObj['total_app_num'] = result.length;
        catObj['app_breakdown'] = catCount;

        db.collection('audiences').insert(catObj, function (err) {
            if (err) console.log(err);
            stream.resume();    // resume stream processing
        });
    });
});

stream.on("end", function () {
    // stream complete and processing done
});
The .pause() method on the stream stops further events being emitted so that each object result is processed one at a time. When the callback from the .insert() is called, then the .resume() method is called, signifying that processing is complete for that item and a new call can be made to process the next item.
When the stream is complete, then everything is done so the "end" event hook is called to continue your code.
That way, each iteration is explicitly signalled to completion before moving to the next, and there is a defined "end" event for the complete end of processing. Because the control sits "inside" the .insert() callback, those operations are respected for completion as well.
As a side note, you might consider including your "category" information in the source collection, since it seems likely your results could be returned more efficiently with .aggregate() if all the required data lived in a single collection.
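Purely as an illustration — and assuming a hypothetical schema where each user_apps document embedded an apps array carrying a category field, which is not your current layout — the breakdown could then be computed server-side:

// Hypothetical: assumes user_apps documents embed an "apps" array
// where each element has a "category" field.
db.collection('user_apps').aggregate([
    { "$unwind": "$apps" },
    { "$group": {
        "_id": { "uuid": "$uuid", "category": "$apps.category" },
        "count": { "$sum": 1 }
    }}
], function (err, results) {
    if (err) throw err;
    // results hold the per-user category counts in one round trip
});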
Related
I have a function that processes an array of data (first parameter) and, once the processing is finished, invokes a callback function (second parameter) exactly once. I'm using forEach to process the data item by item; the processing of each item consists of some checks and storing it in the database. The function storeInDB() does the storing work and invokes a callback (its second parameter) when the item has been stored.
A first approach to the code is the following:
function doWork(data, callback) {
    data.forEach(function (item) {
        // Do some check on item
        ...
        storeInDB(item, function(err) {
            // check error etc.
            ...
            callback();
        });
    });
}
However, this is wrong, as the callback function will be invoked several times (as many times as there are elements in the data array).
I'd like to know how to refactor my code in order to achieve the desired behaviour, i.e. only one invocation of callback once the storing work is finished. I guess that async could help with this task, but I haven't found the right pattern yet to combine async + forEach.
Any help is appreciated!
You can use a library such as async to do this, although I would recommend using promises if possible. For your immediate problem you can use a counter to determine how many storage calls have completed, and invoke the callback when all of them are done.
let counter = 0;

data.forEach(function (item) {
    // Do some check on item
    ...
    storeInDB(item, function(err) {
        // check error etc.
        counter++;
        if (counter === data.length) {
            callback();
        }
    });
});
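Since the question mentions the async library: async.each maps onto this shape directly — a minimal sketch, assuming storeInDB takes a node-style callback as shown in the question:

var async = require('async');

function doWork(data, callback) {
    async.each(data, function (item, done) {
        // Do some check on item
        storeInDB(item, done); // done(err) signals this item is finished
    }, function (err) {
        // runs once, after every item has been stored (or on the first error)
        callback(err);
    });
}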
You can also utilize the three arguments (value, index, array) that forEach passes to its iterator. Note, though, that this fires the callback when the item at the last index finishes, which is not guaranteed to be the last async call to complete:
function doWork(data, callback) {
    data.forEach(function (value, idx, arr) {
        // Do some check on item
        ...
        storeInDB(arr[idx], function(err) {
            // check error etc.
            ...
            if ((idx + 1) === arr.length) {
                callback();
            }
        });
    });
}
If the storeInDB function returns a promise, you can push all of the resulting promises into an array and use Promise.all. Once every task has completed successfully, it will invoke the callback function.
Hope this helps you.
function doWork(data, callback) {
    let arr = data.map(function(itm) {
        // Do some check on item
        ...
        return storeInDB(itm);
    });
    Promise.all(arr)
        .then(function(res) {
            callback();
        });
}
I am adding user validation to a data modification page on a node.js application.
In a synchronous universe, in a single function I would:
1. Look up the original record in the database
2. Look up the user in LDAP to see if they are the owner or an admin
3. Do the logic and write the record.
In an asynchronous universe that won't work. To solve it I've built a series of hand-off functions:
router.post('/writeRecord', jsonParser, function(req, res) {
    var post = req.post;
    var smdb = new AWS.DynamoDB.DocumentClient();
    var params = { ... };
    smdb.query(params, function(err, data) {
        if (err == null) writeRecordStep2(post, data);
    });
});

function writeRecordStep2(post, data) {
    var conn = new LDAP();
    conn.search(
        'ou=groups,o=amazon.com',
        { ... },
        function(err, ldap1) {
            if (err == null) {
                writeRecordStep3(post, data, ldap1);
            }
        }
    );
}

function writeRecordStep3(post, data, ldap1) {
    var conn = new LDAP();
    conn.search(
        'ou=groups,o=amazon.com',
        { ... },
        function(err, ldap2) {
            if (err == null) {
                writeRecordStep4(post, data, ldap1, ldap2);
            }
        }
    );
}

function writeRecordStep4(post, data, ldap1, ldap2) {
    // Do stuff with collected data
}
Additionally, because the LDAP and Dynamo logic are in their own source documents, these functions are scattered tragically around the code.
This strikes me as inefficient, as well as inelegant. I'm eager to find a more natural asynchronous pattern to achieve the same result.
Any promise library should sort your issue out. My preferred choice is bluebird. In summary, promises help you compose and sequence asynchronous operations.
If you haven't heard about bluebird, just use it. It can convert every function of a module so that it returns a promise, which is then-able. Simply put, it promisifies all functions.
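For example, bluebird's promisifyAll adds an *Async promise-returning version of each function on a module — a minimal sketch:

var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));

// readFileAsync is the promisified form of fs.readFile
fs.readFileAsync("myfile.js", "utf8").then(function (contents) {
    console.log("read " + contents.length + " characters");
});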
Here is the mechanism:
Module1.someFunction()  // do your job and finally pass the return object to the next call
    .then()             // use the object returned by the first call, do your job and return the updated value
    .then()             // and so on
    .catch()            // do your job when any error occurs
Hope you understand. Here is an example:
var Promise = require("bluebird");
var readFile = Promise.promisify(require("fs").readFile);

readFile("myfile.js", "utf8").then(function(contents) {
    return eval(contents);
}).then(function(result) {
    console.log("The result of evaluating myfile.js", result);
}).catch(SyntaxError, function(e) {
    console.log("File had syntax error", e);
}).catch(function(e) {
    // Catch any other error
    console.log("Error reading file", e);
});
I could not tell from your pseudo-code exactly which async operations depend upon results from the other ones, and knowing that is key to the most efficient way to code a series of asynchronous operations. If two operations do not depend upon one another, they can run in parallel, which generally gets to an end result faster. I also can't tell exactly what data needs to be passed on to later parts of the async requests (too much pseudo-code and not enough real code to show us what you're really attempting to do).
So, without that level of detail, I'll show you two ways to approach this. The first runs each operation sequentially: run the first async operation, and when it's done run the next one, accumulating all the results into an object that is passed along to the next link in the chain. This is general purpose since all async operations have access to all the prior results.
This makes use of the promises built into the AWS.DynamoDB interface and makes our own promise for conn.search() (though if I knew more about that interface, it may already have a promise interface).
Here's the sequential version:
// promisify the search method
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);

// utility function that does a search and adds the result to the object passed in
// returns a promise that resolves to the object
function ldapSearch(data, key) {
    var conn = new LDAP();
    return conn.searchAsync('ou=groups,o=amazon.com', { ... }).then(results => {
        // put our results onto the passed in object
        data[key] = results;
        // resolve with the original object (so we can collect data here in a promise chain)
        return data;
    });
}
router.post('/writeRecord', jsonParser, function(req, res) {
    let post = req.post;
    let smdb = new AWS.DynamoDB.DocumentClient();
    let params = { ... };

    // The latest AWS interface gets a promise with the .promise() method
    smdb.query(params).promise().then(dbresult => {
        return ldapSearch({post, dbresult}, "ldap1");
    }).then(result => {
        // result.dbresult
        // result.ldap1
        return ldapSearch(result, "ldap2");
    }).then(result => {
        // result.dbresult
        // result.ldap1
        // result.ldap2
        // doSomething with all the collected data here
    }).catch(err => {
        console.log(err);
        res.status(500).send("Internal Error");
    });
});
And here's a parallel version that runs all three async operations at once, waits for all three of them to finish, and then has all the results at once:
// if the three async operations you show can be done in parallel
// first promisify things
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);

function ldapSearch(params) {
    var conn = new LDAP();
    return conn.searchAsync('ou=groups,o=amazon.com', { ... });
}

router.post('/writeRecord', jsonParser, function(req, res) {
    let post = req.post;
    let smdb = new AWS.DynamoDB.DocumentClient();
    let params = { ... };

    Promise.all([
        ldapSearch(...),
        ldapSearch(...),
        smdb.query(params).promise()
    ]).then(([ldap1Result, ldap2Result, queryResult]) => {
        // process ldap1Result, ldap2Result and queryResult here
    }).catch(err => {
        console.log(err);
        res.status(500).send("Internal Error");
    });
});
Keep in mind that due to the pseudo-code nature of the code in your question, this is also pseudo-code where implementation details (exactly what parameters you're searching for, what response you're sending, etc...) have to be filled in. This should be illustrative of promise chaining to serialize operations and the use of Promise.all() for parallelizing operations and promisifying a method that didn't have promises built in.
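As a footnote, on Node 8+ the sequential chain above can also be written with async/await. This is the same sketch, reusing the ldapSearch helper defined earlier and keeping the { ... } placeholders as pseudo-code:

router.post('/writeRecord', jsonParser, async function(req, res) {
    try {
        let post = req.post;
        let smdb = new AWS.DynamoDB.DocumentClient();
        let params = { ... };
        // accumulate results on one object, as in the promise-chain version
        let result = { post, dbresult: await smdb.query(params).promise() };
        result = await ldapSearch(result, "ldap1");
        result = await ldapSearch(result, "ldap2");
        // doSomething with all the collected data here
    } catch (err) {
        console.log(err);
        res.status(500).send("Internal Error");
    }
});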
Here is the situation:
I am new to node.js. I have a 40MB file containing a multilevel json structure like:
[{},{},{}] — an array of objects (~7000 objects). Each object has properties, and one of those properties is also an array of objects.
I wrote a function to read the content of the file and iterate over it. I succeeded in getting what I wanted in terms of content, but not usability. I thought I had written an async function that would allow node to serve other web requests while iterating the array, but that is not the case. I would be very thankful if anyone can point me to what I've done wrong and how to rewrite it so I can have a non-blocking iteration. Here's the function that handles the situation:
function getContents(callback) {
    fs.readFile(file, 'utf8', function (err, data) {
        if (err) {
            console.log('Error: ' + err);
            return;
        }
        js = JSON.parse(data);
        callback();
        return;
    });
}

getContents(iterateGlobalArr);
var count = 0;

function iterateGlobalArr() {
    if (count < js.length) {
        innerArr = js.nestedProp;
        //iterate nutrients
        innerArr.forEach(function(e, index) {
            //some simple if condition here
        });
        var schema = {
            //.....get props from forEach iteration
        };
        Model.create(schema, function(err, post) {
            if (err) {
                console.log('\ncreation error\n', err);
                return;
            }
            if (!post) {
                console.log('\nfailed to create post for schema:\n' + schema);
                return;
            }
        });
        count++;
        process.nextTick(iterateGlobalArr);
    }
    else {
        console.log("\nIteration finished");
        next();
    }
}
Just so it is clear how I've tested the above situation: I open two tabs, one loading this iteration (which takes some time) and a second hitting another node route, which does not load until the iteration is over. So essentially I've written blocking code, but I'm not sure how to refactor it! I suspect that because everything is happening in the callback I am unable to release the event loop to handle another request...
Your code is almost correct. What you are doing is inadvertently adding ALL the items to the very next tick... which still blocks.
The important piece of code is here:
Model.create(schema, function(err, post) {
    if (err) {
        console.log('\ncreation error\n', err);
        return;
    }
    if (!post) {
        console.log('\nfailed to create post for schema:\n' + schema);
        return;
    }
});

// add EVERYTHING to the very same next tick!
count++;
process.nextTick(iterateGlobalArr);
Let's say you are in tick A of the event loop when getContents() runs and count is 0. You enter iterateGlobalArr and call Model.create. Because Model.create is async, it returns immediately, and process.nextTick() adds the processing of item 1 to the next tick, let's say B. iterateGlobalArr is then called again, doing the same thing and adding item 2 to the next tick, which is still B. Then item 3, and so on.
What you need to do is move the count increment and process.nextTick() into the callback of Model.create(). This makes sure the current item is fully processed before nextTick is invoked, which means the next item is only scheduled AFTER the model item has been created, giving your app time to handle other things in between. The fixed version of iterateGlobalArr is here:
function iterateGlobalArr() {
    if (count < js.length) {
        innerArr = js.nestedProp;
        //iterate nutrients
        innerArr.forEach(function(e, index) {
            //some simple if condition here
        });
        var schema = {
            //.....get props from forEach iteration
        };
        Model.create(schema, function(err, post) {
            // schedule our next item to be processed immediately.
            count++;
            process.nextTick(iterateGlobalArr);

            // then move on to handling this result.
            if (err) {
                console.log('\ncreation error\n', err);
                return;
            }
            if (!post) {
                console.log('\nfailed to create post for schema:\n' + schema);
                return;
            }
        });
    }
    else {
        console.log("\nIteration finished");
        next();
    }
}
Note also that I would strongly suggest passing your js and counter into each call to iterateGlobalArr, as it will make iterateGlobalArr a lot easier to debug, among other things, but that's another story.
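For illustration, that parameterized shape might look like this (a sketch; getContents, Model.create and next() as in your code):

function iterateGlobalArr(js, count) {
    if (count >= js.length) {
        console.log("\nIteration finished");
        return next();
    }
    var schema = {
        //.....get props from js[count] as before
    };
    Model.create(schema, function(err, post) {
        // schedule the next item only after this one has been handled
        process.nextTick(function() {
            iterateGlobalArr(js, count + 1);
        });
        if (err) console.log('\ncreation error\n', err);
    });
}

getContents(function() {
    iterateGlobalArr(js, 0);
});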
Cheers!
Node is single-threaded so async will only help you if you are relying on another system/subsystem to do the work (a shell script, external database, web service etc). If you have to do the work in Node you are going to block while you do it.
It is possible to create one node process per core. That way only one of the node processes would block, leaving the rest free to service your requests, but this feature is still listed as experimental: http://nodejs.org/api/cluster.html.
A single instance of Node runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node processes to handle the load.
The cluster module allows you to easily create child processes that all share server ports.
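For reference, a minimal sketch of that cluster pattern — assuming your server code lives in a hypothetical server.js:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
    // fork one worker per CPU core
    os.cpus().forEach(function() {
        cluster.fork();
    });
} else {
    // each worker runs the server; they all share the listening port
    require('./server');
}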
I have a problem, but I have no idea how one would go about solving this.
I'm using loopback, but I think I would've faced the same problem with mongodb sooner or later. Let me explain what I am doing:
1. I fetch entries from another REST service, then prepare the entries for my API response (the entries are not ready yet, because they don't have an id from my database).
2. Before I send the response I check whether each entry exists in the database, determined by source_id. If it doesn't, I create it; if it does, I use it and update it to the newer version.
3. I send the response with the entries (which now have database ids assigned to them).
This seems okay and easy to implement, but it's not, as far as my knowledge goes. I will try to explain further in code:
//This will not work since there are many async calls, and fixedResults will be empty at the end
var fixedResults = [];

//results is an array of entries
results.forEach(function(item) {
    Entry.findOne({where: {source_id: item.source_id}}, function(err, res) {
        //Did we find it in the database?
        if (res === null) {
            //Create object, another async call here
            fixedResults.push(newObj);
        } else {
            //Update object, another async call here
            fixedResults.push(updatedObj);
        }
    });
});

callback(null, fixedResults);
Note: I left some of the code out, but I think it's pretty self-explanatory if you read through it.
So I want to iterate through all objects, create or update them in database, then when all are updated/created, use them. How would I do this?
You can use promises. A promise represents the eventual result of an asynchronous operation and lets you chain a follow-up action to run once it has completed. Here's an example of chaining together promises: https://coderwall.com/p/ijy61g.
The q library is a good one - https://github.com/kriskowal/q
This question how to use q.js promises to work with multiple asynchronous operations gives a nice code example of how you might build these up.
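A minimal sketch of that shape with q — upsertEntry here is a hypothetical helper that wraps your find-then-create-or-update logic in a promise:

var Q = require('q');

Q.all(results.map(function(item) {
    // upsertEntry is hypothetical: it resolves with the
    // created-or-updated entry for this item
    return upsertEntry(item);
})).then(function(fixedResults) {
    callback(null, fixedResults);
}, function(err) {
    callback(err);
});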
This pattern is generically called an 'async map'
var fixedResults = [];
var outstanding = 0;

//results is an array of entries
results.forEach(function(item, i) {
    outstanding++; // count this item before its async work starts
    Entry.findOne({where: {source_id: item.source_id}}, function(err, res) {
        //Did we find it in the database?
        if (res === null) {
            //Create object, another async call here
            DoCreateObject(function (err, result) {
                if (err) return callback(err);
                fixedResults[i] = result;
                if (--outstanding === 0) callback(null, fixedResults);
            });
        } else {
            //Update object, another async call here
            DoOtherCall(function (err, result) {
                if (err) return callback(err);
                fixedResults[i] = result;
                if (--outstanding === 0) callback(null, fixedResults);
            });
        }
    });
});
You could use async.map for this. For each element in the array, run the iterator function doing what you want to each element, then call its callback with the result (instead of fixedResults.push), triggering the map callback when all are done. Each iteration and database call would then run in parallel:
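A sketch of that shape — DoCreateObject and DoOtherCall are the placeholder calls from the answer above:

var async = require('async');

async.map(results, function(item, done) {
    Entry.findOne({where: {source_id: item.source_id}}, function(err, res) {
        if (err) return done(err);
        if (res === null) {
            //Create object, another async call here
            DoCreateObject(done);
        } else {
            //Update object, another async call here
            DoOtherCall(done);
        }
    });
}, function(err, fixedResults) {
    callback(err, fixedResults);
});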
Mongo's update has an option called upsert.
http://docs.mongodb.org/manual/reference/method/db.collection.update/
It does exactly what you ask for without needing the checks. You can fire all the requests async and just validate that the result comes back as true. No need for additional processing.
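A minimal sketch of that at the MongoDB driver level — the entries collection name is illustrative, and loopback models expose their own equivalent (e.g. upsert):

db.collection('entries').update(
    { source_id: item.source_id },  // match on source_id
    { $set: item },                 // apply the newer version
    { upsert: true },               // insert if no match exists
    function(err, result) {
        if (err) return callback(err);
        // result reports whether an insert or an update happened
    }
);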
First post here, so apologies if I get things wrong...
I'm creating some standalone code to read a folder structure and return all .mp3 files in an Array. Once this has been returned I then loop through the array and for each item I create a Mongoose Object and populate the fields, before saving the object with .save()
I am looping through the array using async.forEach - and while it does loop through all items in the array, they do not save, and there is no error produced to help me identify what is wrong.
If I move the logic of the loop elsewhere then the MP3s are stored in the mongodb database - with the example shown, nothing is saved.
var saveMP3s = function(MP3Files, callback) {
    console.log('start loop -> saving MP3s');
    async.forEach(MP3Files, function(mp3file, callback) {
        newTrack = new MP3Track();
        newTrack.title = mp3file.title;
        newTrack.track = mp3file.track;
        newTrack.disk = mp3file.disk;
        newTrack.metadata = mp3file.metadata;
        newTrack.path = mp3file.path;

        console.log('....:> Song Start: ');
        console.log(newTrack.title);
        console.log(newTrack.track);
        console.log(newTrack.disk);
        console.log(newTrack.metadata);
        console.log(newTrack.path);
        console.log('....:> Song End: ');

        newTrack.save(function (err) {
            if (err) {
                console.log(err);
            } else {
                console.log('saving Track: ' + newTrack.title);
                callback();
            }
        });
    }, function(err) {
        if (err) { console.log(err); }
    });
    console.log('end loop -> finished saving MP3s');
};
The trouble I have is that when the code is NOT in the async loop it works and the MP3 is saved in the MongoDB database; inside the async loop nothing is saved and no errors are given as to why.
I did try (in an earlier incarnation of the code) creating the objects as soon as I had read the metadata of the MP3 files, but for some reason it would not save the last 2 objects in the list (of 12)... So I rewrote it to scan all the items first and then populate mongoDB using mongoose from an array, just to split things up. But I've had no luck in finding out why nothing happens and why there are no errors on the .save().
Any help will be greatly appreciated.
Regards,
Mark
var saveMP3s = function(MP3Files, callback) {
    console.log('start loop -> saving MP3s');
    async.forEach(MP3Files, function(mp3file, done) {
        var newTrack = new MP3Track(); // <--- USE VAR HERE
        newTrack.title = mp3file.title;
        newTrack.track = mp3file.track;
        newTrack.disk = mp3file.disk;
        newTrack.metadata = mp3file.metadata;
        newTrack.path = mp3file.path;

        console.log('....:> Song Start: ');
        console.log(newTrack.title);
        console.log(newTrack.track);
        console.log(newTrack.disk);
        console.log(newTrack.metadata);
        console.log(newTrack.path);
        console.log('....:> Song End: ');

        newTrack.save(function (err) {
            if (err) {
                console.log(err);
                return done(err); // <--- report the error to async.forEach
            }
            console.log('saving Track: ' + newTrack.title);
            done(); // <--- signal this item is finished
        });
    }, function(err) {
        if (err) { console.log(err); }
        console.log('end loop -> finished saving MP3s');
        callback(err); // <--- invoke the outer callback once all saves complete
    });
};
I suspect that the missing var keyword is declaring a global variable and you are overriding it on each iteration of the loop. Meaning, before the first newTrack can complete its async save operation, you've already moved on in the loop and overridden that variable with the next instance.
Also, in async.forEach you MUST invoke the callback when the operation is completed. You are only calling it if the record saves successfully. You should also be calling it if an error occurs and pass in the error.
Finally, the callback argument to your saveMP3s function never gets called at all. The call to callback() inside the newTrack.save function only refers to the callback argument passed into the anonymous function by async.forEach.
SOLVED
Hi,
I ended up solving it by moving the async loop higher up in the logic, so the saveMP3s function was called from within the async loop itself.
Thank you all for your thoughts and suggestions.
Enjoy your day.
mark
(not allowed to Accept my answer until tomorrow, so will update the question status then)