Using node.js and promise to fetch paginated data - node.js

Please keep in mind that I am new to node.js and I am used with android development.
My scenario is like this:
Run a query against the database that returns either null or a value
Call a web service with that database value, that offers info paginated, meaning that on a call I get a parameter to pass for the next call if there is more info to fetch.
After all the items are retrieved, store them in a database table
If everything is well, for each item received previously, I need to make another web call and store the retrieved info in another table
if fetching any of the data set fails, all data must be reverted from the database
So far, I've tried this:
getAllData: function(){
self.getMainWebData(null)
.then(function(result){
//get secondary data for each result row and insert it into database
}
}
getMainWebData: function(nextPage){
return new Promise(function(resolve, reject) {
module.getWebData(nextPage, function(errorReturned, response, values) {
if (errorReturned) {
reject(errorReturned);
}
nextPage = response.nextPageValue;
resolve(values);
})
}).then(function(result) {
//here I need to insert the returned values in database
//there's a new page, so fetch the next set of data
if (nextPage) {
//call again getMainWebData?
self.getMainWebData(nextPage)
}
})
There are a few things missing, from what I've tested, getAllData.then fires only one for the first set of items and not for others, so clearly handling the returned data in not right.
LATER EDIT: I've edited the scenario. Given some more research my feeling is that I could use a chain or .then() to perform the operations in a sequence.

Yes it is happening as you are resolving the promise on the first call itself. You should put resolve(value) inside an if statement which checks if more data is needed to be fetched. You will also need to restructure the logic as node is asynchronous. And the above code will not work unless you do change the logic.
Solution 1:
You can either append the paginated response to another variable outside the context of the calls you are making. And later use that value after you are done with the response.
getAllData: function(){
self.getMainWebData(null)
.then(function(result){
// make your database transaction if result is not an error
}
}
function getList(nextpage, result, callback){
module.getWebData(nextPage, function(errorReturned, response, values) {
if(errorReturned)
callback(errorReturned);
result.push(values);
nextPage = response.nextPageValue;
if(nextPage)
getList(nextPage, result, callback);
else
callback(null, result);
})
}
getMainWebData: function(nextPage){
return new Promise(function(resolve, reject) {
var result = [];
getList(nextpage, result, function(err, results){
if(err)
reject(err);
else{
// Here all the items are retrieved, you can store them in a database table
// for each item received make your web call and store it into another variable or result set
// suggestion is to make the database transaction only after you have retrieved all your data
// other wise it will include database rollback which will depend on the database which you are using
// after all this is done resolve the promise with the returning value
resolve(results);
}
});
})
}
I have not tested it but something like this should work. If problem persists let me know in comments.
Solution 2:
You can remove promises and try the same thing with callback as they are easier to follow and will make sense to the programmers who are familiar with structural languages.

Looking at your problem, I have created a code that would loop through promises.
and would only procede if there is more data to be fetched, the stored data would still be available in an array.
I hope this help. Dont forget to mark if it helps.
let fetchData = (offset = 0, limit= 10) => {
let addresses = [...Array(100).keys()];
return Promise.resolve(addresses.slice(offset, offset + limit))
}
// o => offset & l => limit
let o = 0, l = 10;
let results = [];
let process = p => {
if (!p) return p;
return p.then(data => {
// Process with data here;
console.log(data);
// increment the pagination
o += l;
results = results.concat(data);
// while there is data equal to limit set then fetch next page
// otherwise return the collected result
return (data.length == l)? process(fetchAddress(o, l)).then(data => data) : results;
})
}
process(fetchAddress(o, l))
.then(data => {
// All the fetched data will be here
}).catch(err => {
// Handle Error here.
// All the retrieved data from database will be available in "results" array
});
if You want to do it more often I have also created a gist for reference.
If You dont want to use any global variable, and want to do it in very functional way. You can check this example. However it requires little more complication.

Related

How to get a specific item from JSON

I'm new to node.js and DialogFlow. I am using DynamoDB to store data and I'm creating skills on Google. I'm trying to write a code to retrieve a specific item on that table.
I've got it working to show all items where ID is equal = 1, but how would I make it so I can just get attribute 'name'?
My idea is a user provides an id then the code will retrieve name where id was 1 and store that as a variable and use agent.add('hello $(name)'); to display it back as speech to the user.
function readdata(agent){
let dbread = new aws.DynamoDB.DocumentClient();
const id = agent.parameters.id;
let read = function(){
var parameters = {
TableName:"Dynamodb",
Key:{
"id":id
}
};
dbread.get(parameters, function(err,data){
if(err){
console.log("error",JSON.stringify(data,null,2));
}else{
console.log("success",JSON.stringify(data,null,2));
}
});
agent.add(`hello ${name}`);
};
read();
}
Once you have the data back from the get() call, the data object will contain an Item attribute. The value of this attribute will be another object that contains the attribute/value pairs for this record, or be empty if the record isn't found.
The debugging you have in place that shows JSON.stringify(data) should show this.
Assuming you knew all the fields were there, you could do something like
const name = data.Item.name;
a more robust way using current JavaScript would be to make sure everything was assigned, otherwise return undefined at any point. So something like this would work
const name = data && data.Item && data.Item.name;
However - you will have a problem doing this with Dialogflow
You don't show which Dialogflow library you're using, but most of them require you to return a Promise to indicate that it needs to wait for asynchronous calls (such as the call to DynamoDB) to complete. You're using get() with a callback function instead of a Promise. So you need to do one of the following:
Wrap the call in a Promise
Since get() returns an AWS.Request you can use the promise() method of this to get a Promise that you can return and which has then portions that generate the response - similar to how you're doing your callbacks now.
Under this scheme, your call might look something like this (untested):
return dbread.get(parameters).promise()
.then( data => {
console.log("success",JSON.stringify(data,null,2));
const name = data && data.Item && data.Item.name;
if( name ){
agent.add( `Hello ${name}` );
} else {
agent.add( "I don't know who you are." );
}
})
.catch( err => {
console.log("error",JSON.stringify(data,null,2));
agent.add( "There was an error" );
});

NodeJS wait for variable to be assigned from sql.get before continuing

I know the sqlite (not sqlite3) module does requests in batches. I need to grab a value out of a database, assign it, then do some processing with it. However the function that fetches the value from the database isn't returning until after all the processing has taken place.
I need one of 3 things to happen:
The function to return right away
Halt the code until myEvent.id has been assigned
Make sql.get return right away
myEvent.id = generateEventID();
///do stuff with myEvent.id
function generateEventID() {
return sql.get('SELECT * FROM settings WHERE name = "eventID"').then(row => {
if (!row) return message.reply("a database error occured while generating an ID");
currentID = row.intValue + 1;
console.log("New eventID created: " + currentID);
sql.run(`UPDATE settings SET intValue = ${row.intValue + 1} WHERE name = "eventID"`);
return currentID;
});
}
Unfortunately you can't return an Id straight away from the function because sql.get(...) is asynchronous.
One possible way to get around this would be to pass in a callback function as a parameter to generateEventID() and then make the caller to wait for the callback function to be invoked before progressing further.
Therefore your would look like this.
var doStuffWithEventId = function(eventId) {
// Do more stuff.
}
generateEventID(doStuffWithEventId);
function generateEventID(callback) {
return sql.get('SELECT * FROM settings WHERE name = "eventID"').then(row => {
if (!row) return message.reply("a database error occured while generating an ID");
currentID = row.intValue + 1;
console.log("New eventID created: " + currentID);
sql.run(`UPDATE settings SET intValue = ${row.intValue + 1} WHERE name = "eventID"`);
return callback(currentID); // Call function doStuffWithEventId with value of currentID variable
});
}
This is one way of accomplishing what you're trying to do.
Side note here.
You generally would wait for the update operation to go through before returning anything.
You are anticipating that the Id would always be an incremental of 1. Databases don't always do that. So you can get caught off guard.
Your code assumes that the database operations are going to be always successful. You typically would want to handle potential exception that may raised when making external calls.
As you already said, your code should provide three main functionalities in a particular order:
Obtain a result from your data storage (SQL database in this case).
Perform a particular operation over the data obtained (referred as
//do more stuffin your sniippet)
Perform an UPDATE query over your data storage.
Therefore, the implementation logic you provide is wrong as it fails to perform the three operations in that particular order. Don't forget that all Node calls are asynchronous by default and, in this case, you are failing to enforce the synchronization you need. To fix this, you should perform the following steps:
Obtain a result from your data storage. (Already done).
Perform a particular operation over the data obtained. This must be done inside the sql.get() callback or once the promise has resolved (since you are using promises in this case).
The method that performs operation over your data (i.e step 2) should return a new promise. Once this promise is resolved you will be able to call your UPDATE operation and in this way you will ensure that your implementation will meet all your requirements.
Now, this is how I would do it:
generateEventID();
function doMoreStuff(eventId){
return new Promise(function(resolve, reject){
//do your stuff
if(err){
reject(err)
}else{
resolve()
}
})
}
function generateEventID() {
return sql.get('SELECT * FROM settings WHERE name = "eventID"').then(row => {
if (!row) return message.reply("a database error occured while generating an ID");
currentID = row.intValue + 1;
console.log("New eventID created: " + currentID);
doMoreStuff(currentID).then(function(){ //At this point your operations with currentId will have finished already
sql.run(`UPDATE settings SET intValue = ${row.intValue + 1} WHERE name = "eventID"`);
});
});
}
I really hope this helps!
Edit: Don't forget to implement the error handling for promises with .catch()

Google Cloud Datastore, how to query for more results

Straight and simple, I have the following function, using Google Cloud Datastore Node.js API:
fetchAll(query, result=[], queryCursor=null) {
this.debug(`datastoreService.fetchAll, queryCursor=${queryCursor}`);
if (queryCursor !== null) {
query.start(queryCursor);
}
return this.datastore.runQuery(query)
.then( (results) => {
result=result.concat(results[0]);
if (results[1].moreResults === _datastore.NO_MORE_RESULTS) {
return result;
} else {
this.debug(`results[1] = `, results[1]);
this.debug(`fetch next with queryCursor=${results[1].endCursor}`);
return this.fetchAll(query, result, results[1].endCursor);
}
});
}
The Datastore API object is in the variable this.datastore;
The goal of this function is to fetch all results for a given query, notwithstanding any limits on the number of items returned per single runQuery call.
I have not yet found out about any definite hard limits imposed by the Datastore API on this, and the documentation seems somewhat opaque on this point, but I only noticed that I always get
results[1] = { moreResults: 'MORE_RESULTS_AFTER_LIMIT' },
indicating that there are still more results to be fetched, and the results[1].endCursor remains stuck on constant value that is passed on again on each iteration.
So, given some simple query that I plug into this function, I just go on running the query iteratively, setting the query start cursor (by doing query.start(queryCursor);) to the endCursor obtained in the result of the previous query. And my hope is, obviously, to obtain the next bunch of results on each successive query in this iteration. But I always get the same value for results[1].endCursor. My question is: Why?
Conceptually, I cannot see a difference to this example given in the Google Documentation:
// By default, google-cloud-node will automatically paginate through all of
// the results that match a query. However, this sample implements manual
// pagination using limits and cursor tokens.
function runPageQuery (pageCursor) {
let query = datastore.createQuery('Task')
.limit(pageSize);
if (pageCursor) {
query = query.start(pageCursor);
}
return datastore.runQuery(query)
.then((results) => {
const entities = results[0];
const info = results[1];
if (info.moreResults !== Datastore.NO_MORE_RESULTS) {
// If there are more results to retrieve, the end cursor is
// automatically set on `info`. To get this value directly, access
// the `endCursor` property.
return runPageQuery(info.endCursor)
.then((results) => {
// Concatenate entities
results[0] = entities.concat(results[0]);
return results;
});
}
return [entities, info];
});
}
(except for the fact, that I don't specify a limit on the size of the query result by myself, which I have also tried, by setting it to 1000, which does not change anything.)
Why does my code run into this infinite loop, stuck on each step at the same "endCursor"? And how do I correct this?
Also, what is the hard limit on the number of results obtained per call of datastore.runQuery()? I have not found this information in the Google Datastore documentation thus far.
Thanks.
Looking at the API documentation for the Node.js client library for Datastore there is a section on that page titled "Paginating Records" that may help you. Here's a direct copy of the code snippet from the section:
var express = require('express');
var app = express();
var NUM_RESULTS_PER_PAGE = 15;
app.get('/contacts', function(req, res) {
var query = datastore.createQuery('Contacts')
.limit(NUM_RESULTS_PER_PAGE);
if (req.query.nextPageCursor) {
query.start(req.query.nextPageCursor);
}
datastore.runQuery(query, function(err, entities, info) {
if (err) {
// Error handling omitted.
return;
}
// Respond to the front end with the contacts and the cursoring token
// from the query we just ran.
var frontEndResponse = {
contacts: entities
};
// Check if more results may exist.
if (info.moreResults !== datastore.NO_MORE_RESULTS) {
frontEndResponse.nextPageCursor = info.endCursor;
}
res.render('contacts', frontEndResponse);
});
});
Maybe you can try using one of the other syntax options (instead of Promises). The runQuery method can take a callback function as an argument, and that callback's parameters include explicit references to the entities array and the info object (which has the endCursor as a property).
And there are limits and quotas imposed on calls to the Datastore API as well. Here are links to official documentation that address them in detail:
Limits
Quotas

Node.js: async.map getting slower

Hello,
I use Node.js to provide an API for storing data on a MongoDB database.
I ran multiple tests on a read method, which takes ids and returns the corresponding documents. The point is that I must return these documents in the specified order. To ensure that, I use the following code:
// Sequentially fetch every element
function read(ids, callback) {
var i = 0;
var results = [];
function next() {
db.findOne(ids[i], function (err, doc) {
results.push(err ? null : doc);
if (ids.length > ++i) {
return next();
}
callback(results);
});
}
next();
}
This way, documents are fetched one-by-one, in the right order. It takes about 11s on my laptop to retrieve 27k documents.
However, I thought that it was possible to improve this method:
// Asynchronously map the whole array
var async = require('async');
function read(ids, callback) {
async.map(ids, db.findOne.bind(db), callback):
}
After running a single test, I was quite satisfied seeing that the 27k documents were retrieved in only 8s using simpler code.
The problem happens when I repeat the same request: the response time keeps growing (proportionally to the number of elements retrieved): 9s 10s 11s 12s.... This problem does not happen in the sequential version.
I tried two versions of Node.js, v6.2.0 and v0.10.29. The problem is the same. What causes this latency and how could I suppress it?
Try to use async.mapLimit to prevent overload. You need some tests to tune limit value with your environment.
But find({_id: {$in: list}}) is always better, because single database request instead of multiple.
I suggest you to try to perform restore of original order client-side.
Something like this:
function read(ids, cb) {
db.find(
{_id: {$in: ids.map(id => mongoose.Types.ObjectId(id))}},
process
);
function process(err, docs) {
if (err) return cb(err);
return cb(null, docs.sort(ordering))
}
function ordering(a, b) {
return ids.indexOf(b._id.toString()) - ids.indexOf(a._id.toString());
}
}
May be, find query needs to be corrected, I can't to know what exact mongodb driver you use.
This code is first-try, more manual sorting can improve performance alot. [].indexOf is heavy too(O(n)).
But I'm almost sure, even as-is now, it will work much faster.
Possible ordering replacement:
var idHash = {};
for(var i = 0; i < ids.length; i++)
idHash[ids[i]] = i;
function ordering(a, b) {
return idHash[b._id.toString()] - idHash[a._id.toString()];
}
Any sort algorithm has O(nlogn) in best case, but we already know result position of each found document, so, we can restore original order by O(n):
var idHash = ids.reduce((c, id, i) => (c[id] = i, c), {});
function process(err, docs) {
if (err) return cb(err);
return cb(null,
docs.reduce(
(c, doc) => (c[idHash[doc._id.toString()]] = doc, c),
ids.map(id => null))) //fill not_found docs by null
}
Functional style makes code flexier. For example this code can be easy modified to use async.reduce to be less sync-blocking.

Iterate through Array, update/create Objects asynchronously, when everything is done call callback

I have a problem, but I have no idea how would one go around this.
I'm using loopback, but I think I would've face the same problem in mongodb sooner or later. Let me explain what am I doing:
I fetch entries from another REST services, then I prepare entries for my API response (entries are not ready yet, because they don't have id from my database)
Before I send response I want to check if entry exist in database, if it doesn't:
Create it, if it does (determined by source_id):
Use it & update it to newer version
Send response with entries (entries now have database ids assigned to them)
This seems okay, and easy to implement but it's not as far as my knowledge goes. I will try to explain further in code:
//This will not work since there are many async call, and fixedResults will be empty at the end
var fixedResults = [];
//results is array of entries
results.forEach(function(item) {
Entry.findOne({where: {source_id: item.source_id}}, functioN(err, res) {
//Did we find it in database?
if(res === null) {
//Create object, another async call here
fixedResults.push(newObj);
} else {
//Update object, another async call here
fixedResults.push(updatedObj);
}
});
});
callback(null, fixedResults);
Note: I left some of the code out, but I think its pretty self explanatory if you read through it.
So I want to iterate through all objects, create or update them in database, then when all are updated/created, use them. How would I do this?
You can use promises. They are callbacks that will be invoked after some other condition has completed. Here's an example of chaining together promises https://coderwall.com/p/ijy61g.
The q library is a good one - https://github.com/kriskowal/q
This question how to use q.js promises to work with multiple asynchronous operations gives a nice code example of how you might build these up.
This pattern is generically called an 'async map'
var fixedResults = [];
var outstanding = 0;
//results is array of entries
results.forEach(function(item, i) {
Entry.findOne({where: {source_id: item.source_id}}, functioN(err, res) {
outstanding++;
//Did we find it in database?
if(res === null) {
//Create object, another async call here
DoCreateObject(function (err, result) {
if (err) callback(err);
fixedResults[i] = result;
if (--outstanding === 0) callback (null, fixedResults);
});
} else {
//Update object, another async call here
DoOtherCall(function (err, result) {
if(err) callback(err);
fixedResults[i] = result;
if (--outstanding === 0) callback (null, fixedResults);
});
}
});
});
callback(null, fixedResults);
You could use async.map for this. For each element in the array, run the array iterator function doing what you want to do to each element, then run the callback with the result (instead of fixedResults.push), triggering the map callback when all are done. Each iteration ad database call would then be run in parallel.
Mongo has a function called upsert.
http://docs.mongodb.org/manual/reference/method/db.collection.update/
It does exactly what you ask for without needing the checks. You can fire all three requests asnc and just validate the result comes back as true. No need for additional processing.

Resources