Complex sequencing of promises - nested - node.js

After a lot of googling I have not been able to confirm the correct approach to this problem. The following code runs as expected but I have a grave feeling that I am not approaching this in the correct way, and I am setting myself up for problems.
The following code is initiated by the main app.js file and is passed a location to start loading XML files from and processing into a mongoDB
exports.processProfiles = function(path) {
    var deferrer = q.defer();

    q(dataService.deleteProfiles()) // simple mongodb call to empty the Profiles collection
        .then(function(deleteResult) {
            return loadFilenames(path); // method to load all filenames in the given path using fs
        })
        .then(function(filenames) {
            // now we have all the file names, let's load and save
            filenames.forEach(function(filename) {
                // Here is where i think the problem is!
                // kick off another promise chain for the dynamically sized array of files to process
                q(loadFileContent(path, filename)) // first we load the data in the file
                    .then(function(inboundFile) {
                        // then parse XML structure to my new shiny JSON structure
                        // and ask Mongo to store it for me
                        return dataService.createProfile(processProfileXML(filename, inboundFile));
                    })
                    .done(function(result) {
                        console.log(result);
                    });
            });
        })
        .catch(function(err) {
            deferrer.reject('Unable to Process Profile records : ' + err);
        })
        .done(function() {
            deferrer.resolve('Profile Processing Completed');
        });

    return deferrer.promise;
}
Whilst this code works these are my main concerns but cannot solve them on my own after a few hours of Google and reading.
1) Is this blocking? The console output makes it difficult to tell whether this is running asynchronously as I want it to. I think it is, but advice on whether I am doing something fundamentally wrong would be great.
2) Is having a nested promise a bad idea? Should I be linking it to the outer promise? I have tried, but could not get anything to compile or run.

I haven't used Q in a really long time, but I think what you need to do is let it know you're about to hand back an array of promises that must all be satisfied before moving on.
Additionally, since you're waiting for multiple promises in one section of code, rather than nesting further, throw the 'set' of promises back up once they're all satisfied.
q(dataService.deleteProfiles()) // simple mongodb call to empty the Profiles collection
    .then(function (deleteResult) {
        return loadFilenames(path); // method to load all filenames in the given path using fs
    })
    .then(function (filenames) {
        return q.all(
            filenames.map(function (filename) {
                return q(loadFileContent(path, filename)); // do stuff with each file here
            })
        );
    })
    .then(function (resultsOfLoadFileContentsPromises) {
        console.log('I did stuff with all the things');
    })
    .catch(function (err) {});
What you have is not 'blocking'. What you're really doing with promises is moving things into new 'block'-ing sections. The more blocks you have, the more async-ish your code will appear; if nothing else is running apart from this promise chain, it will still appear procedural.
Inner promises must still resolve before the parent promises resolve after them.
Inner promises like yours aren't inherently bad. Personally I break them out into separate files to make them easier to reason about, but I wouldn't call them 'bad' unless there's no need for the inner promise to exist. Where possible (and in your example here) I've adjusted things so the next set of promises is thrown back up for a new section to deal with the data once it has been fetched.
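Putting that together with your original wrapper, the whole function could hand the chain straight back to the caller instead of managing a separate deferred. A rough, untested sketch, assuming your existing loadFilenames, loadFileContent, processProfileXML and dataService helpers behave as in the question:

exports.processProfiles = function (path) {
    return q(dataService.deleteProfiles())          // empty the Profiles collection
        .then(function () {
            return loadFilenames(path);             // load all filenames in the given path
        })
        .then(function (filenames) {
            // map every filename to its own load-and-save promise and wait for all of them
            return q.all(filenames.map(function (filename) {
                return q(loadFileContent(path, filename))
                    .then(function (inboundFile) {
                        return dataService.createProfile(processProfileXML(filename, inboundFile));
                    });
            }));
        })
        .then(function () {
            return 'Profile Processing Completed';
        })
        .catch(function (err) {
            throw new Error('Unable to Process Profile records : ' + err);
        });
};

Because the chain itself is returned, the caller can attach its own .then/.catch, and a rejection anywhere in the chain propagates up automatically without the explicit deferred.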
(I'm not great with Q though, this code will probably require a little further tweaking).

Related

Difficulty processing CSV file, browser timeout

I was asked to import a csv file from a server daily and parse the respective header to the appropriate fields in mongoose.
My first idea was to make it to run automatically with a scheduler using the cron module.
const CronJob = require('cron').CronJob;
const fs = require("fs");
const csv = require("fast-csv")
new CronJob('30 2 * * *', async function() {
await parseCSV();
this.stop();
}, function() {
this.start()
}, true);
Next, the parseCSV() function code is as follow:
(I have simplify some of the data)
function parseCSV() {
    let buffer = [];
    let stream = fs.createReadStream("data.csv");

    csv.fromStream(stream, {headers: ["lot", "order", "cwotdt"], trim: true})
        .on("data", async (data) => {
            let row = { "order": data.order, "lot": data.lot, "date": data.cwotdt };
            // Only add a product that fulfils the following condition
            if (data.cwotdt !== "000000") {
                let product = { "order": data.order, "lot": data.lot };
                // Check whether product exists in the database or not
                await db.Product.find(product, function(err, foundProduct) {
                    if (foundProduct && foundProduct.length !== 0) {
                        console.log("Product exists");
                    } else {
                        buffer.push(product);
                        console.log("Product not exists");
                    }
                });
            }
        })
        .on("end", function() {
            db.Product.find({}, function(err, productAvailable) {
                // Check whether the database is empty or not
                if (productAvailable.length !== 0) {
                    // console.log("Database Exists");
                    // Add subsequent records onward
                    db.Product.insertMany(buffer);
                    buffer = [];
                } else {
                    // Add for the first time
                    db.Product.insertMany(buffer);
                    buffer = [];
                }
            });
        });
}
It is not a problem with just a few rows in the CSV file, but at only around 2k rows I ran into a problem. The culprit is the condition check inside the "data" event handler: it needs to query the database for every single row to see whether the data is already there or not.
The reason I'm doing this is that the CSV file will have new data added to it, and I need to add all the data the first time if the database is empty, or otherwise look at every single row and only add the new data into mongoose.
The 1st approach I took from here (as in the code) was using async/await to make sure that all the data had been read before proceeding to the "end" event handler. This helps, but I see from time to time (with mongoose.set("debug", true);) that some data is queried twice, which I have no idea why.
The 2nd approach was not to use async/await. This has a downside: the data was not fully queried, so execution proceeded straight to the "end" event handler and insertMany only received the part of the data that had made it into the buffer.
If I stick with the current approach it is not an issue, but the query takes 1 to 2 minutes, not to mention even more as the database keeps growing. During those few minutes of querying, the event queue gets blocked, and therefore requests sent to the server time out.
I used stream.pause() and stream.resume() before this code, but I can't get it to work as it just jumps straight to the "end" event handler first. This causes the buffer to be empty every single time, since the "end" event handler runs before the "data" event handler.
I can't remember the links that I used, but the fundamentals I got came from this:
Import CSV Using Mongoose Schema
I saw these threads:
Insert a large csv file, 200'000 rows+, into MongoDB in NodeJS
Can't populate big chunk of data to mongodb using Node.js
which seem similar to what I need, but they're a bit too complicated for me to understand what is going on. It seems to involve sockets or maybe a child process? Furthermore, I still need to check the condition before adding to the buffer.
Anyone care to guide me on this?
Edit: await is removed from console.log as it is not asynchronous
Forking a child process approach (a minimal sketch follows this list):
When the web service gets a request with a CSV data file, save it somewhere in the app
Fork a child process -> child process example
Pass the file URL to the child process to run the insert checks
When the child process finishes processing the CSV file, delete the file
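A minimal sketch of that hand-off, assuming a hypothetical csv-worker.js that contains the parsing and insert logic from above (the file names and message shape are illustrative, not from the original post):

// parent.js - receives the upload, then delegates the heavy CSV work to a child process
const { fork } = require("child_process");
const fs = require("fs");

function processUploadedCsv(filePath) {
    const worker = fork("./csv-worker.js");    // runs csv-worker.js in a separate Node process

    worker.send({ filePath });                 // pass the file location to the child

    worker.on("message", (msg) => {
        if (msg.done) {
            fs.unlink(filePath, () => {});     // delete the file once the child reports success
        }
    });
}

// csv-worker.js - does the parsing and inserts without blocking the web process
process.on("message", async ({ filePath }) => {
    await parseCSV(filePath);                  // the same parseCSV logic as above, parameterised by path
    process.send({ done: true });
    process.exit(0);
});

This keeps the long-running query/insert loop out of the web server's event loop, so incoming requests no longer time out while an import is in progress.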
As Joe said, indexing the DB will speed up the processing time by a lot when there are lots (millions) of rows.
If you create an index on order and lot, the query should be very fast.
db.Product.createIndex({ order: 1, lot: 1 })
Note: This is a compound index and may not be the ideal solution. Index strategies
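If the collection is defined through a Mongoose schema, the same compound index can also be declared on the schema itself. A sketch (the field types and model name here are assumptions):

const mongoose = require("mongoose");

const productSchema = new mongoose.Schema({
    order: String,
    lot: String,
    date: String
});

// compound index on order + lot, built by Mongoose when the model is first used
productSchema.index({ order: 1, lot: 1 });

module.exports = mongoose.model("Product", productSchema);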
Also, your await on console.log is odd and may be causing your timing issues: console.log is not asynchronous. Additionally, the enclosing function is not marked async.
// removing await from console.log
let product = {"order": data.order, "lot": data.lot}
// Check whether product exist in database or not
await db.Product.find(product, function(err, foundProduct){
if(foundProduct && foundProduct.length !== 0){
console.log("Product exists")
} else{
buffer.push(product);
console.log("Product not exists")
}
})
I would try removing the await on console.log (that may be a red herring if the console.log is just there for Stack Overflow and is hiding the actual async method). If that is the case, be sure to mark the function with async.
Lastly, if the problem still exists, I would look into a two-tiered approach (a rough sketch follows):
Insert all lines from the CSV file into a mongo collection.
Process that mongo collection after the CSV has been parsed, removing the CSV from the equation.
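A rough sketch of that two-tiered idea, assuming a separate staging collection (db.RawRow is a hypothetical model) that the parsed CSV rows are bulk-inserted into before any per-row checks happen:

async function importCsvTwoTiered(rows) {
    // tier 1: dump every parsed CSV row into a staging collection in one go
    await db.RawRow.insertMany(rows);

    // tier 2: process the staging collection afterwards, outside the CSV stream
    const raw = await db.RawRow.find({ cwotdt: { $ne: "000000" } }).lean();

    for (const r of raw) {
        // upsert so existing order/lot pairs are left alone and new ones are created
        await db.Product.updateOne(
            { order: r.order, lot: r.lot },
            { $setOnInsert: { order: r.order, lot: r.lot } },
            { upsert: true }
        );
    }

    await db.RawRow.deleteMany({});   // clear the staging collection for the next import
}

With the compound index from above in place, the upserts stay fast even as the Product collection grows.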

How to return promise to the router callback in NodeJS/ExpressJS

I am new to nodejs/expressjs and mongodb. I am trying to create an API that exposes data to my mobile app that I am trying to build using Ionic framework.
I have a route setup like this
router.get('/api/jobs', (req, res) => {
JobModel.getAllJobsAsync().then((jobs) => res.json(jobs)); //IS THIS THe CORRECT WAY?
});
I have a function in my model that reads data from Mongodb. I am using the Bluebird promise library to convert my model functions to return promises.
const JobModel = Promise.promisifyAll(require('../models/Job'));
My function in the model
static getAllJobs(cb) {
MongoClient.connectAsync(utils.getConnectionString()).then((db) => {
const jobs = db.collection('jobs');
jobs.find().toArray((err, jobs) => {
if(err) {
return cb(err);
}
return cb(null, jobs);
});
});
}
The promisifyAll(myModule) converts this function to return a promise.
What I am not sure is,
If this is the correct approach for returning data to the route callback function from my model?
Is this efficient?
Is using promisifyAll slow? It loops through all the functions in the module and creates a copy of each with an Async suffix that returns a promise. When does it actually run? This is a more generic question related to node require statements. See the next point.
When do all require statements run? When I start the nodejs server? Or when I make a call to the api?
Your basic structure is more-or-less correct, although your use of Promise.promisifyAll seems awkward to me. The basic issue for me (and it's not really a problem - your code looks like it will work) is that you're mixing and matching promise-based and callback-based asynchronous code. Which, as I said, should still work, but I would prefer to stick to one as much as possible.
If your model class is your code (and not some library written by someone else), you could easily rewrite it to use promises directly, instead of writing it for callbacks and then using Promise.promisifyAll to wrap it.
Here's how I would approach the getAllJobs method:
static getAllJobs() {
    // connect to the Mongo server
    return MongoClient.connectAsync(utils.getConnectionString())
        // ...then do something with the collection
        .then((db) => {
            // get the collection of jobs
            const jobs = db.collection('jobs');

            // I'm not that familiar with Mongo - I'm going to assume that
            // the call to `jobs.find().toArray()` is asynchronous and only
            // available in the "callback flavored" form.
            //
            // Returning a new Promise here (in the `then` block) allows you
            // to add the results of the asynchronous call to the chain of
            // `then` handlers. The promise will be resolved (or rejected)
            // when the results of the `jobs.find().toArray()` method are
            // known.
            return new Promise((resolve, reject) => {
                jobs.find().toArray((err, jobs) => {
                    if (err) {
                        reject(err);
                        return;
                    }
                    resolve(jobs);
                });
            });
        });
}
This version of getAllJobs returns a promise which you can chain then and catch handlers to. For example:
JobModel.getAllJobs()
.then((jobs) => {
// this is the object passed into the `resolve` call in the callback
// above. Do something interesting with it, like
res.json(jobs);
})
.catch((err) => {
// this is the error passed into the call to `reject` above
});
Admittedly, this is very similar to the code you have above. The only difference is that I dispensed with the use of Promise.promisifyAll - if you're writing the code yourself & you want to use promises, then do it yourself.
One important note: it's a good idea to include a catch handler. If you don't, your error will be swallowed up and disappear, and you'll be left wondering why your code is not working. Even if you don't think you'll need it, just write a catch handler that dumps it to console.log. You'll be glad you did!
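In the route handler, that would look something like this. A sketch; responding with a 500 is just one reasonable way to surface the failure to the client:

router.get('/api/jobs', (req, res) => {
    JobModel.getAllJobs()
        .then((jobs) => res.json(jobs))
        .catch((err) => {
            console.log(err);   // at minimum, make the failure visible
            res.status(500).json({ error: 'Unable to load jobs' });
        });
});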

How do I make a large but unknown number of REST http calls in nodejs?

I have an orientdb database. I want to use nodejs with RESTfull calls to create a large number of records. I need to get the #rid of each for some later processing.
My pseudo code is:
for each record
write.to.db(record)
when the async of write.to.db() finishes
process based on #rid
carryon()
I have landed in serious callback hell from this. The version that was closest used a tail recursion in the .then function to write the next record to the db. However, I couldn't carry on with the rest of the processing.
A final constraint is that I am behind a corporate proxy and cannot use any other packages without going through the network administrator, so using the native nodejs packages is essential.
Any suggestions?
With a completion callback, the general design pattern for this type of problem makes use of a local function for doing each write:
var records = ....; // array of records to write
var index = 0;

function writeNext(r) {
    write.to.db(r, function(err) {
        if (err) {
            // error handling
        } else {
            ++index;
            if (index < records.length) {
                // only start the next write once this one has completed
                writeNext(records[index]);
            }
        }
    });
}

writeNext(records[0]);
The key here is that you can't use synchronous iterators like .forEach() because they won't iterate one at a time and wait for completion. Instead, you do your own iteration.
If your write function returns a promise, you can use the .reduce() pattern that is common for iterating an array.
var records = ...; // some array of records to write
records.reduce(function(p, r) {
    return p.then(function() {
        return write.to.db(r);
    });
}, Promise.resolve()).then(function() {
    // all done here
}, function(err) {
    // error here
});
This solution chains promises together, waiting for each one to resolve before executing the next save.
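Since you also need the #rid of each record for later processing, the same chaining pattern can accumulate results as it goes. A sketch, assuming write.to.db resolves with the new record's #rid and carryon is your follow-up step from the pseudo code:

var records = [/* records to write */];

records.reduce(function(p, r) {
    return p.then(function(rids) {
        return write.to.db(r).then(function(rid) {
            rids.push(rid);   // remember this record's #rid
            return rids;      // hand the accumulator to the next step in the chain
        });
    });
}, Promise.resolve([])).then(function(rids) {
    carryon(rids);            // all writes finished; process based on the collected #rids
}, function(err) {
    // error here
});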
It's kind of hard to tell which function would be best for your scenario without more detail, but I almost always use asyncjs for this kind of thing.
From what you say, one way to do it would be with async.map:
var recordsToCreate = [...];
function functionThatCallsTheApi(record, cb){
// do the api call, then call cb(null, rid)
}
async.map(recordsToCreate, functionThatCallsTheApi, function(err, results){
// here, err will be if anything failed in any function
// results will be an array of the rids
});
You can also check out the other functions, such as async.mapLimit, to enable throttling, which is probably a good idea.
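For throttling, async.mapLimit works the same way as async.map but caps how many API calls are in flight at once. A sketch; the limit of 10 is arbitrary:

var async = require('async');

var recordsToCreate = [/* ... */];

function functionThatCallsTheApi(record, cb) {
    // do the api call, then call cb(null, rid)
}

// at most 10 concurrent calls to the API
async.mapLimit(recordsToCreate, 10, functionThatCallsTheApi, function(err, rids) {
    // err is set if any call failed; rids is an array of the #rids in input order
});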

MongoDB NodeJS driver, how to know when .update() 's are complete

As the code is too large to post in here, I've put it in my GitHub repo: https://github.com/DiegoGallegos4/Mongo
I am trying to use the NodeJS driver to update some records fulfilling one criterion, but first I have to find some records fulfilling another criterion. In the update part, the records found and filtered by the find operation are used. That is:
file: weather1.js
MongoClient.connect(some url, function(err,db){
db.collection(collection_name).find({},{},sort criteria).toArray(){
.... find the data and append to an array
.... this data inside a for loop
db.collection(collection_name).update(data[i], {$set...}, callback)
}
})
That's the structure used to solve the problem. Regarding when to close the connection: it is when the length of the data array equals the number of callbacks from the update operation. For more details you can refer to the repo.
file: weather.js
In the other approach, .each is used instead of toArray to iterate over the cursor.
I've been looking for a solution to this for a week now on several forums.
I've read about pooling connections, but I want to know what the conceptual error in my code is. I would appreciate a deep insight on this topic.
The way you pose your question is very misleading. All you want to know is "When is the processing complete so I can close?".
The answer to that is that you need to respect the callbacks, and generally only move through the cursor of results once each update is complete.
The simple way, without other dependencies, is to use the stream interface supported by the driver:
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/data', function(err, db) {
    if (err) throw err;

    var coll = db.collection('weather');
    console.log('connection established');

    var stream = coll.find().sort([['State', 1], ['Temperature', -1]]);

    stream.on('error', function(err) {
        throw err;
    });

    stream.on('end', function() {
        db.close();
    });

    var month_highs = [];
    var state = '';
    var length = 0;

    stream.on('data', function(doc) {
        stream.pause();              // pause processing documents

        if (doc) {
            length = month_highs.length;
            if (state != doc['State']) {
                month_highs.push(doc['State']);
                //console.log(doc);
            }
            state = doc['State'];

            if (month_highs.length > length) {
                coll.update(doc, { $set: { 'month_high': true } }, function(err, updated) {
                    if (err) throw err;
                    console.log(updated);
                    stream.resume(); // resume processing documents
                });
            } else {
                stream.resume();
            }
        } else {
            stream.resume();
        }
    });
});
That's just a copy of the code from your repo, refactored to use a stream. So all the important parts are where the word "stream" appears, and most importantly where they are being called.
In a nutshell, the "data" event is emitted for each document from the cursor results. First you call .pause() so new documents do not overrun the processing. Then you do your .update(), and within its callback, once it returns, you call .resume(), and the flow continues with the next document.
Eventually "end" is emitted when the cursor is depleted, and that is where you call db.close().
That is basic flow control. For other approaches, look at the node async library as a good helper. But do not loop arrays with no async control, and do not use .each() which is DEPRECATED.
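For comparison, the same flow control with the async library would look roughly like this. A sketch, assuming you are happy to pull the whole result set into memory with toArray first; coll and db are the same variables as in the stream example above:

var async = require('async');

coll.find().sort([['State', 1], ['Temperature', -1]]).toArray(function(err, docs) {
    if (err) throw err;

    var state = '';

    // eachSeries waits for each iterator's callback before moving to the next
    // document, which gives the same one-update-at-a-time control as the stream
    async.eachSeries(docs, function(doc, callback) {
        if (doc['State'] !== state) {
            state = doc['State'];
            // first (highest temperature) document for this state
            coll.update(doc, { $set: { 'month_high': true } }, callback);
        } else {
            callback();   // nothing to update for this document, move on
        }
    }, function(err) {
        if (err) throw err;
        db.close();       // everything processed, safe to close
    });
});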
At any rate, you need to signal when the .update() callback is complete before moving on to a new "loop iteration". That is the basic, no-additional-dependency approach.
P.S. I am a bit suspicious about the general logic of your code, especially testing whether the length of something is greater when you read it without possibly changing that length. But this is all about how to implement "flow control", not about fixing the logic in your code.

Block function whilst waiting for response

I've got a NodeJS app I'm building (using Sails, but I guess that's irrelevant).
In my action, I have a number of requests to other services, data sources etc. that I need to load up. However, because of the huge dependency on callbacks, my code is still executing long after the action has returned the HTML.
I must be missing something silly (or not quite getting the whole async thing), but how on earth do I stop my action from finishing until I have all my data ready to render the view?!
Cheers
I'd recommend getting very intimate with the async library
The docs are pretty good with that link above, but it basically boils down to a bunch of very handy calls like:
async.parallel([
function(){ ... },
function(){ ... }
], callback);
async.series([
function(){ ... },
function(){ ... }
]);
Node is inherently async, you need to learn to love it.
It's hard to tell exactly what the problem is, but here is a guess. Assuming you have only one external call, your code should look like this:
exports.myController = function(req, res) {
longExternalCallOne(someparams, function(result) {
// you must render your view inside the callback
res.render('someview', {data: result});
});
// do not render here as you don't have the result yet.
}
If you have more than one external call, your code will look like this:
exports.myController = function(req, res) {
longExternalCallOne(someparams, function(result1) {
longExternalCallTwo(someparams, function(result2) {
// you must render your view inside the most inner callback
data = {some combination of result1 and result2};
res.render('someview', {data: data });
});
// do not render here since you don't have result2 yet
});
// do not render here either as you don't have neither result1 nor result2 yet.
}
As you can see, once you have more than one long-running async call, things start to get tricky. The code above is just for illustration purposes. If your second callback depends on the first one then you need something like it, but if longExternalCallOne and longExternalCallTwo are independent of each other you should be using a library like async to help parallelize the requests: https://github.com/caolan/async
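If the two calls really are independent, the parallel version might look like this. A sketch built on the same hypothetical longExternalCallOne/longExternalCallTwo helpers as above:

var async = require('async');

exports.myController = function(req, res) {
    async.parallel({
        one: function(cb) {
            longExternalCallOne(someparams, function(result1) { cb(null, result1); });
        },
        two: function(cb) {
            longExternalCallTwo(someparams, function(result2) { cb(null, result2); });
        }
    }, function(err, results) {
        // both calls have finished; only now do we render
        var data = { one: results.one, two: results.two };
        res.render('someview', { data: data });
    });
};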
You cannot stop your code. All you can do is check in all callbacks if everything is completed. If yes, go on with your code. If no, wait for the next callback and check again.
You should not stop your code, but rather render your view in your other resource's callback, so that you wait for the resource to be fetched before rendering. That's the common pattern in node.js.
If you have to wait for several callbacks to be called, you can manually check, each time one is called, whether the others have been called too (with simple booleans, for example), and call your render function if so. Or you can use async or other handy libraries which make the task easier. Promises (with the bluebird library) could be an option too.
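The manual version of that bookkeeping is just a counter (or a couple of booleans) checked in every callback. A bare-bones illustration, reusing the same hypothetical helpers from the answer above:

var pending = 2;          // number of callbacks we are waiting for
var results = {};

function done() {
    pending -= 1;
    if (pending === 0) {
        // every callback has reported in, safe to render
        res.render('someview', { data: results });
    }
}

longExternalCallOne(someparams, function(result1) { results.one = result1; done(); });
longExternalCallTwo(someparams, function(result2) { results.two = result2; done(); });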
I am guessing here, since there is no code example, but you might be running into something like this:
// let's say you have a function, you pass it an argument and a callback
function myFunction(arg, callback) {
    // now you do something asynchronous with the argument
    doSomethingAsyncWithArg(arg, function() {
        // now you've got your arg formatted or whatever, render the result
        res.render('someView', {arg: arg});
        // now do the callback
        callback();
        // but you also have stuff here!
        doSomethingElse();
    });
}
So, after you render, your code keeps running. How to prevent it? return from there.
return callback();
Now your inner function will stop processing after it calls callback.
