How to improve a view with map/reduce in CouchDB and Node.js

I'm using Node.js with the cradle module to talk to a CouchDB server, and I'd like to understand the reduce step well enough to improve a view query.
For example, to get a user's data by ID, I use a view like this:
map: function (doc) { emit(null, doc); }
And in node.js (with cradle):
db.view('users/getUserByID', function (err, resp) {
  var found = false;
  resp.forEach(function (key, row, id) {
    if (id == userID) {
      found = true;
      userData = row;
    }
  });
  if (found) {
    // good, works
  }
});
As you can see, this scales badly with a large number of documents (users in the database). I'd like to improve the view with a reduce function, but I don't know how because I don't understand how reduce works. Thank you.

First of all, you're using views the wrong way. Views are indexes first and foremost, so you shouldn't use them for full-scan operations; that is inefficient. Use the power of the B-tree index: emit the field you want to search on as the key, then query with the key, startkey and endkey parameters.
Second, your example can be reduced to:
db.get(userID, function (err, body) {
  if (!err) {
    // found!
  }
});
Your loop only compares each row's document ID with userID, so the loop is unnecessary: you can request the document by its ID directly.
Third, if userID does not match the document's ID, your view should be:
function (doc) { emit(doc.userID, null); }
and your code will look like this:
db.view('users/getUserByID', {key: userID}, function (err, resp) {
  if (!err) {
    // resp holds the matching rows (empty if nothing matched)
  }
});
Simple. Effective. Fast. If you need the matched document, add the include_docs: true query parameter to fetch it.
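To see why emitting the search field as the key helps, here is a minimal in-memory sketch of a keyed index. It is not CouchDB's engine: buildIndex and lookup are hypothetical helpers, and emit is passed in as an argument only so the snippet runs on its own (in a real CouchDB map function, emit is a global).

```javascript
// Hypothetical stand-in for a view index: maps each emitted key to its rows.
function buildIndex(docs, mapFn) {
  const index = new Map();
  for (const doc of docs) {
    // Collect every (key, value) pair the map function emits for this doc.
    mapFn(doc, (key, value) => {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push({ id: doc._id, key: key, value: value });
    });
  }
  return index;
}

// Keyed lookup: no loop over all documents at query time.
function lookup(index, key) {
  return index.get(key) || [];
}

// Same shape as the view above, with emit injected for the sketch.
const byUserID = (doc, emit) => emit(doc.userID, null);

const users = [
  { _id: 'a1', userID: 'alice' },
  { _id: 'b2', userID: 'bob' }
];
const index = buildIndex(users, byUserID);
console.log(lookup(index, 'bob'));   // rows emitted for bob only
console.log(lookup(index, 'carol')); // empty array: nothing matched
```

The index is built once from the map function; each query then goes straight to the rows for one key, which is the role the key parameter plays in a real view query.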

Related

Google Datastore NodeJS combine (union) multiple sets of keys only results

I am working with NodeJS on Google App Engine with the Datastore database.
Note that this question is an extension of (not a duplicate of) my original post.
Because Datastore does not support the OR operator, I need to run multiple queries and combine the results.
Here is my approach (based on the selected answer from my original post):
1. Use keys-only queries in the first stage.
2. Combine the keys obtained into a single list (including de-duplication).
3. Obtain the entities by key lookup.
I have achieved #1 by running two separate queries with the async parallel module. I need help with step 2.
Question: How to combine (union) two lists of entity keys into a single list (including de-duplication) efficiently?
The code I have below successfully performs both the queries and returns two objects getEntities.requesterEntities and getEntities.dataownerEntities.
//Requirement: Get entities for Transfer Requests that match either the Requester OR the Dataowner
async.parallel({
  requesterEntities: function(callback) {
    getEntitiesByUsername('TransferRequest', 'requester_username', loggedin_username, (treqs_by_requester) => {
      //Callback pass in response as parameter
      callback(null, treqs_by_requester)
    });
  },
  dataownerEntities: function(callback) {
    getEntitiesByUsername('TransferRequest', 'dataowner_username', loggedin_username, (treqs_by_dataowner) => {
      //Callback pass in response as parameter
      callback(null, treqs_by_dataowner)
    });
  }
}, function(err, getEntities) {
  console.log(getEntities.requesterEntities);
  console.log(getEntities.dataownerEntities);
  //***HOW TO COMBINE (UNION) BOTH OBJECTS CONTAINING DATASTORE KEYS?***//
});
function getEntitiesByUsername(kind, property_type, loggedin_username, getEntitiesCallback) {
  //Create datastore query
  const treq_history = datastore.createQuery(kind);
  //Set query conditions
  treq_history.filter(property_type, loggedin_username);
  treq_history.select('__key__');
  //Run datastore query
  datastore.runQuery(treq_history, function(err, entities) {
    if(err) {
      console.log('Transfer Request History JSON unable to return data results for Transfer Request. Error message: ', err);
    } else {
      getEntitiesCallback(entities);
    }
  });
}
I was able to combine the two separate sets of entity keys by iterating over the arrays and comparing the ID values for each entity key and creating a new array with the unique keys.
Note: The complete solution is posted as an answer to my original post.
//Union of two arrays of objects entity keys
function unionEntityKeys(arr1, arr2) {
  var arr3 = [];
  for (var i in arr1) {
    var shared = false;
    for (var j in arr2) {
      if (arr2[j][datastore.KEY]['id'] == arr1[i][datastore.KEY]['id']) {
        shared = true;
        break;
      }
    }
    if (!shared) {
      arr3.push(arr1[i])
    }
  }
  arr3 = arr3.concat(arr2);
  return arr3;
}
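The nested loops above compare every pair of keys. As an alternative, here is a minimal sketch of a single-pass union that tracks the keys already seen in a Map. unionByKey and keyIdentity are hypothetical helpers, and the sample objects only assume that a key carries a kind plus either an id or a name; they are not real Datastore entities.

```javascript
// Union of two arrays, de-duplicated in one pass over both inputs.
// keyOf is a caller-supplied function that returns a string identity per item.
function unionByKey(arr1, arr2, keyOf) {
  const seen = new Map();
  for (const item of arr1.concat(arr2)) {
    const k = keyOf(item);
    if (!seen.has(k)) seen.set(k, item); // keep the first occurrence only
  }
  return Array.from(seen.values());
}

// Assumed key shape: { kind, id } or { kind, name }.
function keyIdentity(key) {
  return key.kind + '/' + (key.id !== undefined ? 'id:' + key.id : 'name:' + key.name);
}

const requesterKeys = [
  { kind: 'TransferRequest', id: '1' },
  { kind: 'TransferRequest', id: '2' }
];
const dataownerKeys = [
  { kind: 'TransferRequest', id: '2' },
  { kind: 'TransferRequest', id: '3' }
];
const combined = unionByKey(requesterKeys, dataownerKeys, keyIdentity);
console.log(combined.map(keyIdentity));
```

For entities returned by a keys-only query, the accessor would presumably read the key from the entity first, for example (entity) => keyIdentity(entity[datastore.KEY]). Including the name in the identity also covers keys that have no numeric id.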

Is there a way to know if there are more documents available for pagination in MongoDB?

I am writing a Node.JS app with MongoDB. One of the things I need to implement is the listing of the objects. I've already implemented the pagination using the skip() and limit() functions, thanks to this answer:
myModel.find()
  .skip(offset)
  .limit(limit)
  .exec(function (err, doc) {
    // do something with objects
  })
The thing is, I want my endpoint to return metadata, and one of the fields I need is a boolean indicating whether more objects can be loaded.
The simplest way to implement this is to load one extra object (to determine whether there are more to display) and not show it to the user, something like this:
myModel.find()
  .skip(offset)
  .limit(limit + 1)
  .exec(function (err, doc) {
    var moreAvailable; // are there more documents?
    if (doc.length > limit) {
      moreAvailable = true;
      doc.length = limit; // don't show the last object to the user
    } else {
      moreAvailable = false;
    }
  })
But I'm pretty sure there should be a cleverer way to do this. Is there?
Use db.collection.find(<query>).count(), documented at https://docs.mongodb.com/manual/reference/method/cursor.count/, and compare the total with the current position:
var total = db.collection.find(<query>).count();
var is_more = total > skip + limit;
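For comparison, the fetch-one-extra approach from the question can be wrapped in a small helper, which avoids running a second count query. This is a minimal sketch: splitPage is a hypothetical name, and docs stands for the array returned by a query that was run with limit + 1.

```javascript
// Turn the result of a `limit + 1` query into a page plus a "more" flag.
function splitPage(docs, limit) {
  const moreAvailable = docs.length > limit;
  return {
    items: docs.slice(0, limit), // never expose the extra probe document
    moreAvailable: moreAvailable
  };
}

// Simulated query results for limit = 2.
console.log(splitPage(['a', 'b', 'c'], 2)); // probe document present
console.log(splitPage(['a', 'b'], 2));      // last page
```

Unlike truncating the array with doc.length = limit, slice leaves the original result untouched.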

How to query data from different collections and sort by date?

My search results are a mix of data stored in different collections (posts, venues, etc.). Currently I make a separate request for each collection, so the results come back grouped by type (a posts array, a venues array) rather than in one sorted list.
How can I query multiple collections (posts and venues) and sort the combined results by date or another parameter (via mongoose)?
Or maybe there is a better solution?
Thanks
I believe it's not possible with Mongoose. In the meantime, you can do something like this:
var async = require('async');
function getPosts(cb) {
  Post.find({"foo": "bar"}, function (err, posts) {
    cb(err, posts);
  });
}
function getVenues(cb) {
  Venue.find({"foo": "bar"}, function (err, venues) {
    cb(err, venues);
  });
}
async.parallel([getPosts, getVenues], function (err, results) {
  if (err) {
    return next(err); // stop here so no response is sent after an error
  }
  // results is [posts, venues]; flatten it into one array before sorting
  var merged = results[0].concat(results[1]);
  res.send(merged.sort(function (a, b) {
    // if default sorting is not enough you can change it here
    return a.date < b.date ? -1 : a.date > b.date ? 1 : 0;
  }));
});
This code assumes you are inside an Express route and that both Post and Venue documents share a common attribute, date. If you named the date attributes differently, you would have to adapt the sort comparator.
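If the two collections do name their date fields differently, one option is to pass the comparator an accessor that knows where each document keeps its date. A minimal sketch with plain objects: mergeByDate is a hypothetical helper, and the createdAt and openedAt field names are made up for illustration.

```javascript
// Merge several result arrays and sort them by a date read from each item.
// dateOf is a caller-supplied accessor, so each collection may use its own field name.
function mergeByDate(lists, dateOf) {
  const merged = [].concat(...lists);
  // Subtracting two Date objects compares their millisecond timestamps.
  return merged.sort((a, b) => new Date(dateOf(a)) - new Date(dateOf(b)));
}

// Hypothetical documents: posts use `createdAt`, venues use `openedAt`.
const posts = [{ title: 'Post', createdAt: '2020-03-01' }];
const venues = [{ name: 'Venue', openedAt: '2019-07-15' }];

const sorted = mergeByDate([posts, venues], (doc) => doc.createdAt || doc.openedAt);
console.log(sorted.map((doc) => doc.title || doc.name));
```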

Creating incrementing numbers with mongoDB

We have an order system where every order has an id. For accounting purposes we need a way to generate invoices with incrementing numbers. What is the best way to do this without using an SQL database?
We are using node to implement the application.
http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field
The first approach is keeping counters in a side document:
One can keep a counter of the current _id in a side document, in a
collection dedicated to counters. Then use FindAndModify to atomically
obtain an id and increment the counter.
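A minimal sketch of the counter approach, assuming the official MongoDB Node.js driver, whose findOneAndUpdate method performs a find-and-modify. The nextSequence name and the { _id, seq } counter document shape are assumptions for illustration. The returnDocument option assumes driver 4.x or later, and the shape of the resolved value differs between driver versions, so the sketch accepts both forms.

```javascript
// Atomically increment a named counter and return its new value.
// `counters` is assumed to be a collection dedicated to counter documents.
async function nextSequence(counters, name) {
  const result = await counters.findOneAndUpdate(
    { _id: name },                            // one counter document per sequence
    { $inc: { seq: 1 } },                     // atomic increment on the server
    { upsert: true, returnDocument: 'after' } // create on first use, return the updated doc
  );
  // Some driver versions resolve to the document itself, others wrap it in { value: doc }.
  const doc = result && result.value ? result.value : result;
  return doc.seq;
}
```

An invoice would then be saved with the number returned by something like nextSequence(db.collection('counters'), 'invoice'), where the collection and counter names are placeholders.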
The other approach is to loop optimistically: attempt the insert, and if it fails with the duplicate-key error code 11000, increment the id and try again. Collisions should be an edge case, so this works well unless a specific collection receives highly concurrent writes.
One can do it with an optimistic concurrency "insert if not present"
loop.
But be aware of the warning on that page:
Generally in MongoDB, one does not use an auto-increment pattern for
_id's (or other fields), as this does not scale up well on large database clusters. Instead one typically uses Object IDs.
Other things to consider:
Timestamp - a unique long, but not incrementing (based on the epoch)
Hybrid Approach - apps don't necessarily have to pick one storage option.
Come up with your id mechanism based on things like customer, date/time parts etc... that you generate and handle collisions for. Depending on the scheme, collisions can be much less likely. Not necessarily incrementing but is unique and has a well defined readable pattern.
I did not find any working solution, so I implemented the "optimistic loop" in node.js to get auto-incrementing integer ID fields. It uses the async module to implement the while loop.
// Insert the document into targetCollection. Use auto-incremented integer IDs instead of UIDs.
function insertDocument(targetCollection, document, callback) {
  var keepRunning = true;
  var seq = 1;
  // $type 16/18: integer values (32-bit and 64-bit)
  var isNumericQuery = {$or : [{"_id" : { $type : 16 }}, {"_id" : { $type : 18 }}]};
  async.whilst(testFunction, mainFunction, afterFinishFunction);
  // Called before each execution of mainFunction(). Works like the stop criterion of a while loop.
  function testFunction() {
    return keepRunning;
  }
  // Called each time testFunction() passes. It is passed a function (next) which must be called after it has completed.
  function mainFunction(next) {
    findCursor(targetCollection, findCursorCallback, isNumericQuery, { _id: 1 });
    function findCursorCallback(cursor) {
      cursor.sort( { _id: -1 } ).limit(1);
      cursor.each(cursorEachCallback);
    }
    function cursorEachCallback(err, doc) {
      if (err) console.error("ERROR: " + err);
      if (doc != null) {
        seq = doc._id + 1;
        document._id = seq;
        targetCollection.insert(document, insertCallback);
      }
      if (seq === 1) {
        document._id = 1;
        targetCollection.insert(document, insertCallback);
      }
    }
    function insertCallback(err, result) {
      if (err && err.code !== 11000) {
        // Not a duplicate-key collision: stop the loop and report the error.
        return next(err);
      }
      if (err) {
        console.dir(err); // duplicate key (11000): loop again with a fresh id
      }
      else {
        keepRunning = false;
      }
      next();
    }
  }
  // Called once after the testFunction() fails and the loop has ended.
  function afterFinishFunction(err) {
    callback(err, null);
  }
}
// Call find() with optional query and projection criteria and return the cursor object.
function findCursor(collection, callback, optQueryObject, optProjectionObject) {
  if (optProjectionObject === undefined) {
    optProjectionObject = {};
  }
  var cursor = collection.find(optQueryObject, optProjectionObject);
  callback(cursor);
}
Call with
insertDocument(db.collection(collectionName), documentToSave, function (err) { if (err) console.error(err); });

How can I pass parameters to a view using cradle (CouchDB)

Using cradle, how am I able to pass parameters to a view in CouchDB?
Update
Say I want to return documents which match other properties than _key (the default)...
// document format
{
_key,
postHeading,
postBody,
postDate
}
What if I wanted to match documents against the postHeading property... How would I go about this? What would the view look like, and how would I pass a search string to that view?
At the moment I'm doing this...
database.get("980f2ba66d5c8f9c91b9204a4d00022a", function (error, document)
{
});
I would like to access a view instead, and instead of the 40 character long auto-generated key, I'd like to pass a string, matching another property.
Something along the lines of this...
database.save("_design/posts", {
single: {
map: function (document)
{
if (document.postHeading == PARAMETER_PASSED_GOES_HERE)
emit(null, document);
}
}
});
database.view("posts/single", function (error, documents)
{
});
If you are querying a view, pass an options object with your settings as the second parameter, for example:
db.view('characters/all', {descending: true}, function (err, res) {
  res.forEach(function (row) {
    sys.puts(row.name + " is on the " +
             row.force + " side of the force.");
  });
});
Also be aware of this:
Some query string parameters' values have to be JSON-encoded.
EDIT:
As far as I know, you can't create a view in CouchDB that takes a custom parameter for use inside the map/reduce function code. Instead, emit keys from your map function and query the view with parameters such as startkey and endkey. Have a look at the article "Database Queries the CouchDB Way".
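As an illustration of querying by emitted keys, here is a minimal sketch that builds startkey and endkey options for a prefix search on string keys. prefixRange is a hypothetical helper. Appending the high Unicode character '\ufff0' to form the upper bound is a common CouchDB idiom. The plain JavaScript string comparisons at the end only illustrate the idea, since CouchDB applies its own collation rules.

```javascript
// Build view query options that match every string key starting with `prefix`.
function prefixRange(prefix) {
  return {
    startkey: prefix,
    endkey: prefix + '\ufff0' // sorts after any ordinary continuation of the prefix
  };
}

const options = prefixRange('Hello');

// Rough illustration of which keys fall inside the range.
const inRange = (key) => options.startkey <= key && key <= options.endkey;
console.log(inRange('Hello world'));
console.log(inRange('Goodbye'));
```

With a view whose map function calls emit(document.postHeading, null), these options could be passed as the second argument to database.view. An exact match would use an options object with a single key property instead.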
db.get('vader', function (err, doc) {
doc.name; // 'Darth Vader'
assert.equal(doc.force, 'dark');
});
It looks like the searched value (parameter) here is 'dark' out of all force keys?
Cradle is also able to fetch multiple documents if you have a list of ids, just pass an array to get:
db.get(['luke', 'vader'], function (err, doc) { ... });
