Google Cloud Datastore, how to query for more results

Google Cloud Datastore, how to query for more results - node.js

Straight and simple, I have the following function, using Google Cloud Datastore Node.js API:
fetchAll(query, result=[], queryCursor=null) {
this.debug(`datastoreService.fetchAll, queryCursor=${queryCursor}`);
if (queryCursor !== null) {
query.start(queryCursor);
}
return this.datastore.runQuery(query)
.then( (results) => {
result=result.concat(results[0]);
if (results[1].moreResults === _datastore.NO_MORE_RESULTS) {
return result;
} else {
this.debug(`results[1] = `, results[1]);
this.debug(`fetch next with queryCursor=${results[1].endCursor}`);
return this.fetchAll(query, result, results[1].endCursor);
}
});
}
The Datastore API object is in the variable this.datastore;
The goal of this function is to fetch all results for a given query, notwithstanding any limits on the number of items returned per single runQuery call.
I have not yet found out about any definite hard limits imposed by the Datastore API on this, and the documentation seems somewhat opaque on this point, but I only noticed that I always get
results[1] = { moreResults: 'MORE_RESULTS_AFTER_LIMIT' },
indicating that there are still more results to be fetched, and the results[1].endCursor remains stuck on constant value that is passed on again on each iteration.
So, given some simple query that I plug into this function, I just go on running the query iteratively, setting the query start cursor (by doing query.start(queryCursor);) to the endCursor obtained in the result of the previous query. And my hope is, obviously, to obtain the next bunch of results on each successive query in this iteration. But I always get the same value for results[1].endCursor. My question is: Why?
Conceptually, I cannot see a difference to this example given in the Google Documentation:
// By default, google-cloud-node will automatically paginate through all of
// the results that match a query. However, this sample implements manual
// pagination using limits and cursor tokens.
function runPageQuery (pageCursor) {
let query = datastore.createQuery('Task')
.limit(pageSize);
if (pageCursor) {
query = query.start(pageCursor);
}
return datastore.runQuery(query)
.then((results) => {
const entities = results[0];
const info = results[1];
if (info.moreResults !== Datastore.NO_MORE_RESULTS) {
// If there are more results to retrieve, the end cursor is
// automatically set on `info`. To get this value directly, access
// the `endCursor` property.
return runPageQuery(info.endCursor)
.then((results) => {
// Concatenate entities
results[0] = entities.concat(results[0]);
return results;
});
}
return [entities, info];
});
}
(except for the fact, that I don't specify a limit on the size of the query result by myself, which I have also tried, by setting it to 1000, which does not change anything.)
Why does my code run into this infinite loop, stuck on each step at the same "endCursor"? And how do I correct this?
Also, what is the hard limit on the number of results obtained per call of datastore.runQuery()? I have not found this information in the Google Datastore documentation thus far.
Thanks.

Looking at the API documentation for the Node.js client library for Datastore there is a section on that page titled "Paginating Records" that may help you. Here's a direct copy of the code snippet from the section:
var express = require('express');
var app = express();
var NUM_RESULTS_PER_PAGE = 15;
app.get('/contacts', function(req, res) {
var query = datastore.createQuery('Contacts')
.limit(NUM_RESULTS_PER_PAGE);
if (req.query.nextPageCursor) {
query.start(req.query.nextPageCursor);
}
datastore.runQuery(query, function(err, entities, info) {
if (err) {
// Error handling omitted.
return;
}
// Respond to the front end with the contacts and the cursoring token
// from the query we just ran.
var frontEndResponse = {
contacts: entities
};
// Check if more results may exist.
if (info.moreResults !== datastore.NO_MORE_RESULTS) {
frontEndResponse.nextPageCursor = info.endCursor;
}
res.render('contacts', frontEndResponse);
});
});
Maybe you can try using one of the other syntax options (instead of Promises). The runQuery method can take a callback function as an argument, and that callback's parameters include explicit references to the entities array and the info object (which has the endCursor as a property).
And there are limits and quotas imposed on calls to the Datastore API as well. Here are links to official documentation that address them in detail:
Limits
Quotas

Related

Poor Performance Writing to Firebase Realtime Database from Google Cloud Function

I have been developing a game where on the user submitting data, the client writes some data to a Firebase Realtime Database. I then have a Google Cloud Function which is triggered onUpdate. That function checks the submissions from various players in a particular game and if certain criteria are met, the function writes an update to the DB which causes the clients to move to the next round of the game.
This is all working, however, I have found performance to be quite poor.
I've added logging to the function and can see that the function takes anywhere from 2-10ms to complete, which is acceptable. The issue is that the update is often written anywhere from 10-30 SECONDS after the function has returned the update.
To determine this, my function obtains the UTC epoch timestamp from just before writing the update, and stores this as a key with the value being the Firebase server timestamp.
I then have manually checked the two timestamps to arrive at the time between return and database write:
The strange thing is that I have another cloud function which is triggered by an HTTP request, and found that the update from that function is typically from 0.5-2 seconds after the function calls the DB update() API.
The difference between these two functions, aside from how they are triggered, is how the data is written back to the DB.
The onUpdate() function writes data by returning:
return after.ref.update(updateToWrite);
Whereas the HTTP request function writes data by calling the update API:
dbRef.update({
// object to write
});
I've provided a slightly stripped out version of my onUpdate function here (same structure but sanitised function names) :
exports.onGameUpdate = functions.database.ref("/games/{gameId}")
.onUpdate(async(snapshot, context) => {
console.time("onGameUpdate");
let id = context.params.gameId;
let after = snapshot.after;
let updatedSnapshot = snapshot.after.val();
if (updatedSnapshot.serverShouldProcess && !isFinished) {
processUpdate().then((res)=>{
// some logic to check res, and if criteria met, move on to promises to determine update data
determineDataToWrite().then((updateToWrite)=>{
// get current time
var d = new Date();
let triggerTime = d.getTime();
// I use triggerTime as the key, and firebase.database.ServerValue.TIMESTAMP as the value
if(updateToWrite["timestamps"] !== null && updateToWrite["timestamps"] !== undefined){
let timestampsCopy = updateToWrite["timestamps"];
timestampsCopy[triggerTime] = firebase.database.ServerValue.TIMESTAMP;
updateToWrite["timestamps"][triggerTime] = firebase.database.ServerValue.TIMESTAMP;
}else{
let timestampsObj = {};
timestampsObj[triggerTime] = firebase.database.ServerValue.TIMESTAMP;
updateToWrite["timestamps"] = timestampsObj;
}
// write the update to the database
return after.ref.update(updateToWrite);
}).catch((error)=>{
// error handling
})
}
// this is just here because ES Linter complains if there's no return
return null;
})
.catch((error) => {
// error handling
})
}
});
I'd appreciate any help! Thanks :)

Compare API response against itself

I am trying to:
Poll a public API every 5 seconds
Store the resulting JSON in a variable
Store the next query to this same API in a second variable
Compare the first variable to the second
Print the second variable if it is different from the first
Else: Print the phrase: 'The objects are the same' if they haven't changed
Unfortunately, the comparison part appears to fail. I am realizing that this implementation is probably lacking the appropriate variable scoping but I can't put my finger on it. Any advice would be highly appreciated.
data: {
chatters: {
viewers: {
},
},
},
};
//prints out pretty JSON
function prettyJSON(obj) {
console.log(JSON.stringify(obj, null, 2));
}
// Gets Users from Twitch API endpoint via axios request
const getUsers = async () => {
try {
return await axios.get("http://tmi.twitch.tv/group/user/sixteenbitninja/chatters");
} catch (error) {
console.error(error);
}
};
//Intended to display
const displayViewers = async (previousResponse) => {
const usersInChannel = await getUsers();
if (usersInChannel.data.chatters.viewers === previousResponse){
console.log("The objects are the same");
} else {
if (usersInChannel.data.chatters) {
prettyJSON(usersInChannel.data.chatters.viewers);
const previousResponse = usersInChannel.data.chatters.viewers;
console.log(previousResponse);
intervalFunction(previousResponse);
}
}
};
// polls display function every 5 seconds
const interval = setInterval(function () {
// Calls Display Function
displayViewers()
}, 5000);```

The issue is that you are using equality operator === on objects. two objects are equal if they have the same reference. While you want to know if they are identical. Check this:
console.log({} === {})
For your usecase you might want to store stringified version of the previousResponse and compare it with stringified version of the new object (usersInChannel.data.chatters.viewers) like:
console.log(JSON.stringify({}) === JSON.stringify({}))
Note: There can be issues with this approach too, if the order of property changes in the response. In which case, you'd have to check individual properties within the response objects.

May be you can use npm packages like following
https://www.npmjs.com/package/#radarlabs/api-diff

Using node.js and promise to fetch paginated data

Please keep in mind that I am new to node.js and I am used with android development.
My scenario is like this:
Run a query against the database that returns either null or a value
Call a web service with that database value, that offers info paginated, meaning that on a call I get a parameter to pass for the next call if there is more info to fetch.
After all the items are retrieved, store them in a database table
If everything is well, for each item received previously, I need to make another web call and store the retrieved info in another table
if fetching any of the data set fails, all data must be reverted from the database
So far, I've tried this:
getAllData: function(){
self.getMainWebData(null)
.then(function(result){
//get secondary data for each result row and insert it into database
}
}
getMainWebData: function(nextPage){
return new Promise(function(resolve, reject) {
module.getWebData(nextPage, function(errorReturned, response, values) {
if (errorReturned) {
reject(errorReturned);
}
nextPage = response.nextPageValue;
resolve(values);
})
}).then(function(result) {
//here I need to insert the returned values in database
//there's a new page, so fetch the next set of data
if (nextPage) {
//call again getMainWebData?
self.getMainWebData(nextPage)
}
})
There are a few things missing, from what I've tested, getAllData.then fires only one for the first set of items and not for others, so clearly handling the returned data in not right.
LATER EDIT: I've edited the scenario. Given some more research my feeling is that I could use a chain or .then() to perform the operations in a sequence.

Yes it is happening as you are resolving the promise on the first call itself. You should put resolve(value) inside an if statement which checks if more data is needed to be fetched. You will also need to restructure the logic as node is asynchronous. And the above code will not work unless you do change the logic.
Solution 1:
You can either append the paginated response to another variable outside the context of the calls you are making. And later use that value after you are done with the response.
getAllData: function(){
self.getMainWebData(null)
.then(function(result){
// make your database transaction if result is not an error
}
}
function getList(nextpage, result, callback){
module.getWebData(nextPage, function(errorReturned, response, values) {
if(errorReturned)
callback(errorReturned);
result.push(values);
nextPage = response.nextPageValue;
if(nextPage)
getList(nextPage, result, callback);
else
callback(null, result);
})
}
getMainWebData: function(nextPage){
return new Promise(function(resolve, reject) {
var result = [];
getList(nextpage, result, function(err, results){
if(err)
reject(err);
else{
// Here all the items are retrieved, you can store them in a database table
// for each item received make your web call and store it into another variable or result set
// suggestion is to make the database transaction only after you have retrieved all your data
// other wise it will include database rollback which will depend on the database which you are using
// after all this is done resolve the promise with the returning value
resolve(results);
}
});
})
}
I have not tested it but something like this should work. If problem persists let me know in comments.
Solution 2:
You can remove promises and try the same thing with callback as they are easier to follow and will make sense to the programmers who are familiar with structural languages.

Looking at your problem, I have created a code that would loop through promises.
and would only procede if there is more data to be fetched, the stored data would still be available in an array.
I hope this help. Dont forget to mark if it helps.
let fetchData = (offset = 0, limit= 10) => {
let addresses = [...Array(100).keys()];
return Promise.resolve(addresses.slice(offset, offset + limit))
}
// o => offset & l => limit
let o = 0, l = 10;
let results = [];
let process = p => {
if (!p) return p;
return p.then(data => {
// Process with data here;
console.log(data);
// increment the pagination
o += l;
results = results.concat(data);
// while there is data equal to limit set then fetch next page
// otherwise return the collected result
return (data.length == l)? process(fetchAddress(o, l)).then(data => data) : results;
})
}
process(fetchAddress(o, l))
.then(data => {
// All the fetched data will be here
}).catch(err => {
// Handle Error here.
// All the retrieved data from database will be available in "results" array
});
if You want to do it more often I have also created a gist for reference.
If You dont want to use any global variable, and want to do it in very functional way. You can check this example. However it requires little more complication.

Node.js: async.map getting slower

Hello,
I use Node.js to provide an API for storing data on a MongoDB database.
I ran multiple tests on a read method, which takes ids and returns the corresponding documents. The point is that I must return these documents in the specified order. To ensure that, I use the following code:
// Sequentially fetch every element
function read(ids, callback) {
var i = 0;
var results = [];
function next() {
db.findOne(ids[i], function (err, doc) {
results.push(err ? null : doc);
if (ids.length > ++i) {
return next();
}
callback(results);
});
}
next();
}
This way, documents are fetched one-by-one, in the right order. It takes about 11s on my laptop to retrieve 27k documents.
However, I thought that it was possible to improve this method:
// Asynchronously map the whole array
var async = require('async');
function read(ids, callback) {
async.map(ids, db.findOne.bind(db), callback):
}
After running a single test, I was quite satisfied seeing that the 27k documents were retrieved in only 8s using simpler code.
The problem happens when I repeat the same request: the response time keeps growing (proportionally to the number of elements retrieved): 9s 10s 11s 12s.... This problem does not happen in the sequential version.
I tried two versions of Node.js, v6.2.0 and v0.10.29. The problem is the same. What causes this latency and how could I suppress it?

Try to use async.mapLimit to prevent overload. You need some tests to tune limit value with your environment.
But find({_id: {$in: list}}) is always better, because single database request instead of multiple.
I suggest you to try to perform restore of original order client-side.
Something like this:
function read(ids, cb) {
db.find(
{_id: {$in: ids.map(id => mongoose.Types.ObjectId(id))}},
process
);
function process(err, docs) {
if (err) return cb(err);
return cb(null, docs.sort(ordering))
}
function ordering(a, b) {
return ids.indexOf(b._id.toString()) - ids.indexOf(a._id.toString());
}
}
May be, find query needs to be corrected, I can't to know what exact mongodb driver you use.
This code is first-try, more manual sorting can improve performance alot. [].indexOf is heavy too(O(n)).
But I'm almost sure, even as-is now, it will work much faster.
Possible ordering replacement:
var idHash = {};
for(var i = 0; i < ids.length; i++)
idHash[ids[i]] = i;
function ordering(a, b) {
return idHash[b._id.toString()] - idHash[a._id.toString()];
}
Any sort algorithm has O(nlogn) in best case, but we already know result position of each found document, so, we can restore original order by O(n):
var idHash = ids.reduce((c, id, i) => (c[id] = i, c), {});
function process(err, docs) {
if (err) return cb(err);
return cb(null,
docs.reduce(
(c, doc) => (c[idHash[doc._id.toString()]] = doc, c),
ids.map(id => null))) //fill not_found docs by null
}
Functional style makes code flexier. For example this code can be easy modified to use async.reduce to be less sync-blocking.

collection.all().limit(n) not working in FOXX

I have this code inside my FOXX play application
var geodata = new Geodata(
applicationContext.collection('geodata'),
{model: Geodatum}
);
/** Lists of all geodata.
*
* This function simply returns the list of all Geodatum.
*/
controller.get('/', function (req, res) {
var parameters = req.parameters;
var limit = parameters['limit'];
var result = geodata.all().limit(10);
if (limit != "undefined") {
result = result.slice(0, limit);
}
res.json(_.map(result, function (model) {
return model.forClient();
}));
});
According to the docs I should be able to use pagination here - I want to limit the search results by the given 'limit' parameter but this gives me an error
2016-05-16T14:17:58Z [6354] ERROR TypeError: geodata.all(...).limit is not a function
https://docs.arangodb.com/SimpleQueries/Pagination.html

The documentation refers to collections. You seem to be using a Foxx repository. Foxx repositories are wrappers around collections that provide most of the same methods but instead of returning plain documents (or cursors) they wrap the results in Foxx models.
In your case it looks like you probably don't want to use Foxx models at all (you're just converting them back to documents, likely just removing a few attributes like _rev and _id) so you could simply forego the repository completely and use the collection you're passing into it directly:
var geodata = applicationContext.collection('geodata');
/** Lists of all geodata.
*
* This function simply returns the list of all Geodatum.
*/
controller.get('/', function (req, res) {
var parameters = req.parameters;
var limit = parameters['limit'];
var result = geodata.all().limit(10);
if (limit != "undefined") {
result = result.slice(0, limit);
}
res.json(_.map(result, function (doc) {
return _.omit(doc, ['_id', '_rev']);
}));
});
You're not the first person to be confused by the distinction between repositories and collections, which is why repositories and models will go away in the upcoming 3.0 release (but you can still use them in legacy 2.8-compatible services if you need to).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Google Cloud Datastore, how to query for more results - node.js

Related

Poor Performance Writing to Firebase Realtime Database from Google Cloud Function

Compare API response against itself

Using node.js and promise to fetch paginated data

Node.js: async.map getting slower

collection.all().limit(n) not working in FOXX

Categories

Resources