Count related documents in CouchDB - couchdb

I'm pretty new to CouchDB and I still have some problems wrapping my head around the whole MapReduce way of querying my data...
To stay with the traditional "Blog" example, let's say I have 2 types of documents: post and comment... each comment document has a post_id field...
Is there a way I can get a list of posts with the number of comments for each of these posts with only 1 query? Let's say I want to display a list of post titles with the number of comments for each post like this:
My First Post: 4 comments
My Second Post: 6 comments
....
I know I can do the following:
function(doc) {
if(doc.type == "comment") {
emit(doc.post_id, 1);
}
}
and then reduce it like this:
function (key, values, rereduce) {
return sum(values);
}
which gives me a list of blog post ids, with the number of comments for each post. But then I need to fetch the blog post titles separately, since the only thing I have right now is their id...
So, is there a way I could retrieve a list of blog post titles, with the number of comments for each post, in only 1 query?

Have a look at View Collation:
http://wiki.apache.org/couchdb/View_collation?action=show&redirect=ViewCollation

You could do something like this:
function(doc) {
if(doc.type == "post") {
emit([doc._id, 'title', doc.title], 0);
}
if(doc.type == "comment") {
emit([doc.post_id, 'comments'], 1);
}
}
Then, combined with the sum reduce from the question and group=true, you'd get a view where each post gets two rows, one with the title and one with the comment count.
You can merge the rows together on the client, or you can use a "list" function to merge these groups of rows together within couchdb:
http://wiki.apache.org/couchdb/Formatting_with_Show_and_List
function list(head, req) {
var post;
var row;
var outputRow = function() {
if(post) { send(JSON.stringify(post) + "\n"); } // send() expects a string, not an object
}
while(row = getRow()) {
if(!post || row.key[0] != post.id) {
outputRow();
post = {id:row.key[0]};
}
/* If key is a triple, use part 3 as the value, otherwise assume it's a count */
var value = row.key.length === 3 ? row.key[2] : row.value;
post[row.key[1]] = value;
}
outputRow();
}
Note: not tested code!
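For the client-side option mentioned above, a minimal sketch might look like this. It assumes the rows arrive in key order with keys shaped as in the map function ([postId, 'title', title] and [postId, 'comments']). The name mergePostRows and the sample rows are made up for illustration.

```javascript
// Merge collated view rows into one object per post.
// Works whether each comment is its own row (value 1) or the rows
// were already reduced to a single count per post.
function mergePostRows(rows) {
  var posts = [];
  var current = null;
  rows.forEach(function (row) {
    var postId = row.key[0];
    if (!current || current.id !== postId) {
      current = { id: postId, title: null, comments: 0 };
      posts.push(current);
    }
    if (row.key[1] === "title") {
      current.title = row.key[2];
    } else if (row.key[1] === "comments") {
      current.comments += row.value;
    }
  });
  return posts;
}

// Hypothetical sample rows, in the order CouchDB would collate them
var sampleRows = [
  { key: ["post-1", "comments"], value: 1 },
  { key: ["post-1", "comments"], value: 1 },
  { key: ["post-1", "title", "My First Post"], value: 0 },
  { key: ["post-2", "title", "My Second Post"], value: 0 }
];
var merged = mergePostRows(sampleRows);
```

Summing the values rather than assigning them keeps the merge correct even if the view is queried without a reduce.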

My experience is that in most "normal" cases you are better off having one big document containing both the post and the comments.
Of course, I am aware that it's not a good idea if you have thousands of comments. That's why I said "most normal cases". Don't throw out this option right off, as "improper".
You get all kinds of goodies: you can count the comments in the map function, retrieve the whole page from the database in one request, get ACID guarantees per post (with its comments), etc. Plus, you don't need to think about tricks like view collation right now.
If it gets slow, you can always transform your data structure later on (hell, we used to do it every day with RDBMS).
If your use case is not totally unsuitable for this, I really advise you to try it. It works remarkably well.
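As a rough sketch of the single-document approach, assuming each post document carries its comments in an embedded array (the field name comments is an assumption, not something from the question), the map function could emit the count directly. The emit function below is only a stand-in so the snippet runs outside CouchDB.

```javascript
// Stand-in for CouchDB's built-in emit(), for illustration only.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: one row per post, with the comment count as the value.
function postsWithCommentCount(doc) {
  if (doc.type === "post") {
    var comments = doc.comments || []; // hypothetical embedded array
    emit(doc.title, comments.length);
  }
}

// Hypothetical sample document
postsWithCommentCount({
  _id: "p1",
  type: "post",
  title: "My First Post",
  comments: [{ text: "Nice" }, { text: "+1" }]
});
```

With this layout no reduce function is needed: each row of the view already holds a title and its comment count.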

Related

Reduce output must shrink more rapidly -- Reducing to a list of documents

I have a few documents in my couch db with json as below. The cId will change for each. And I have created a view with map/reduce function to filter out few documents and return a list of json documents.
Document structure -
{
"_id": "ccf8a36e55913b7cf5b015d6c50009f7",
"_rev": "8-586130996ad60ccef54775c51599e73f",
"cId": 1,
"Status": true
}
Here is the sample map:
function(doc) {
if(doc.Key && doc.Value && doc.Status == true)
emit(null, doc);
}
Here is the sample reduce:
function(key, values, rereduce){
var kv = [];
values.forEach(function(value){
if(value.cId != <some_val>){
kv.push({"k": value.cId, "v" : value});
}
});
return kv;
}
If there are two documents and the reduce output is a list containing 1 document, this works fine. But if I add one more document (with cId = 2), it throws the error "reduce output must shrink more rapidly". Why does this happen? And how can I achieve what I intend to do?
The cause of the error is that the reduce function does not actually reduce anything (it just collects objects). The documentation mentions this:
The way the B-tree storage works means that if you don’t actually
reduce your data in the reduce function, you end up having CouchDB
copy huge amounts of data around that grow linearly, if not faster
with the number of rows in your view.
CouchDB will be able to compute the final result, but only for views
with a few rows. Anything larger will experience a ridiculously slow
view build time. To help with that, CouchDB since version 0.10.0 will
throw an error if your reduce function does not reduce its input
values.
It is unclear to me what you intend to achieve.
Do you want to retrieve a list of docs based on certain criteria? In this case, a view without reduce should suffice.
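A map-only view along those lines might look like the sketch below. Keying on cId and filtering the excluded value on the client are assumptions about what the question wants, and emit is a stand-in so the snippet runs outside CouchDB.

```javascript
// Stand-in for CouchDB's built-in emit(), for illustration only.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function without a reduce: one row per active document.
function activeDocsByCId(doc) {
  if (doc.Status === true) {
    emit(doc.cId, null); // null value; fetch bodies with include_docs=true
  }
}

// Hypothetical sample documents
[
  { _id: "a", cId: 1, Status: true },
  { _id: "b", cId: 2, Status: true },
  { _id: "c", cId: 3, Status: false }
].forEach(activeDocsByCId);

// Client-side exclusion of one cId (the <some_val> of the question).
function excludeCId(viewRows, excluded) {
  return viewRows.filter(function (row) {
    return row.key !== excluded;
  });
}
var remaining = excludeCId(rows, 1);
```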
Edit: If the desired result depends on a value stored in a certain document, CouchDB has a feature called list functions. A list is a design document function that iterates over the rows of a given view; if you pass include_docs=true, each row also carries its document.
A list URL follows this pattern:
/db/_design/foo/_list/list-name/view-name
Like views, lists are defined in a design document:
{
"_id" : "_design/foo",
"lists" : {
"bar" : "function(head, req) {
var row;
while (row = getRow()) {
if (row.doc._id === 'baz') { /* Do stuff based on a certain doc */ }
}
}"
},
... // views and other design functions
}

How to return multiple Mongoose collections in one get request?

I am trying to generate a response that returns the same collection sorted by 3 different columns. Here's the code I currently have:
var findRoute = router.route("/find")
findRoute.get(function(req, res) {
Box.find(function(err, boxes) {
res.json(boxes)
}).sort("-itemCount");
});
As you can see, we're making a single get request, querying for the Boxes, and then sorting them by itemCount at the end. This does not work for me because the request only returns a single JSON collection that is sorted by itemCount.
What can I do if I want to return two more collections sorted by, say, name and size properties -- all in the same request?
Create an object to encapsulate the information and chain your find queries, like:
var findRoute = router.route("/find");
findRoute.get(function(req, res) {
var json = {}; // created per request so concurrent requests do not share state
Box.find(function(err, boxes) {
json.boxes = boxes;
Collection2.find(function (error, coll2) {
json.coll2 = coll2;
Collection3.find(function (error, coll3) {
json.coll3 = coll3;
res.json(json);
}).sort("-size");
}).sort("-name");
}).sort("-itemCount");
});
Just make sure to do the appropriate error checking.
This is kind of ugly and makes your code kind of difficult to read. Try to adapt this logic using modules like async or even promises (Q and bluebird are good examples).
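A promise-based variant could look roughly like the sketch below. It relies on a query's exec() returning a promise, which is assumed here. The helper findSortedBy and the in-memory makeFakeModel are hypothetical names introduced only so the example runs without a database.

```javascript
// Hypothetical in-memory stand-in for a Mongoose model (illustration only).
function makeFakeModel(docs) {
  return {
    find: function () {
      var sortSpec = "";
      var query = {
        sort: function (spec) { sortSpec = spec; return query; },
        exec: function () {
          var desc = sortSpec.charAt(0) === "-";
          var field = desc ? sortSpec.slice(1) : sortSpec;
          var sorted = docs.slice().sort(function (a, b) {
            if (a[field] < b[field]) return desc ? 1 : -1;
            if (a[field] > b[field]) return desc ? -1 : 1;
            return 0;
          });
          return Promise.resolve(sorted);
        }
      };
      return query;
    }
  };
}

// Run one query per sort spec in parallel and collect the results
// into a single object keyed by the sort spec.
function findSortedBy(Model, sortSpecs) {
  var queries = sortSpecs.map(function (spec) {
    return Model.find().sort(spec).exec();
  });
  return Promise.all(queries).then(function (results) {
    var out = {};
    sortSpecs.forEach(function (spec, i) { out[spec] = results[i]; });
    return out;
  });
}
```

Inside the route handler this would be used as findSortedBy(Box, ["-itemCount", "-name", "-size"]).then(function (json) { res.json(json); }), with a catch handler for errors.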
If I understand correctly, you want something like this: Return several collections with mongodb
Tell me if that helps.
Bye.
Have you tried this?
Box.find().sort("-itemCount").exec(function(err, boxes) {
res.json(boxes)
});
Also, to sort your results by 2 or more fields you can use:
.sort({name: 1, size: -1})
Let me know if that helps.

How can I query multiple key criteria?

Using couchdb, with the following json:
{"total_rows":3,"offset":0,"rows":[ {"id":"bc26e5eae7f8c8c3486818e7e7971df0","key":{"user":"lili#abc.com","pal":["igol ≠ eagle"],"fecha":"10/5/2014"},"value":null},{"id":"cf0dc2e2874776958c59f2f544b5a750","key":{"user":"lili#abc.com","pal":["kat ≠cat"],"fecha":"10/6/2014"},"value":null},{"id":"df4ec96088ed52096db064f2ebd2310b","key":{"user":"dum#ghi.com","pal":["dok ≠ duck"],"fecha":"10/7/2014"},"value":null}]}
I would like to query for specific user AND specific date:
for example:
?user="lili#def.com"&fecha:"10/6/2014"
I also tried:
?user%3Dlili%40def.com%26fecha%3A10%2F6%2F2014
Needless to say, it isn't currently working as I expected (all results are shown, not only the record I need).
my view func is:
function(doc) {
if (doc.USER){
emit({user:doc.USER, pal:doc.palabras, fecha:doc.fecha});
}
}
Regards.
Remember that CouchDB views are simply key/value lookups that are built at index-time, not query time. At the minute you are emitting a key with no value. If you want to look something up by two values, you'll need to emit a composite key (an array):
function(doc) {
if (doc.USER) {
emit([doc.USER, doc.fecha], doc);
}
}
Then you can look up matching documents by passing the array as the key:
?key=%5B%22lili%40def.com%22%2C%20%2210%2F6%2F2014%22%5D
There are optimisations you can make to this (e.g. emitting a null value and using include_docs to reduce the size of the view) but this should set you off on the right track.
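Building that query string by hand is error-prone, so a small helper can do the JSON encoding and URL escaping. This is only a sketch; the function name compositeKeyQuery is made up for illustration.

```javascript
// Build the ?key=... query string for a composite [user, fecha] key.
// The key must be valid JSON, then URL-encoded.
function compositeKeyQuery(user, fecha) {
  return "?key=" + encodeURIComponent(JSON.stringify([user, fecha]));
}

var query = compositeKeyQuery("lili@def.com", "10/6/2014");
// query is "?key=%5B%22lili%40def.com%22%2C%2210%2F6%2F2014%22%5D"
```

The result differs from the hand-written example above only in omitting the optional space after the comma.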
I do the same thing as Ant P but I tend to use strings.
function ( doc ) {
if ( doc.USER ) {
emit( 'user-' + doc.USER + '-' + doc.fecha, doc );
}
}
I would also highly recommend emitting null instead of doc as a value.
Remember, you can always emit more than once depending on what kind of queries you need.
For example, if you're looking for all posts by a specific user between two dates, you could do the following view.
function ( doc ) {
if ( doc.type == "post" ) {
emit( 'user-' + doc.nombre, null );
emit( 'fecha-' + doc.fecha, null );
}
}
Then you would query the view twice: _view/posts?key="user-miUsario" and _view/posts?start_key="fecha-1413040000000"&end_key="fecha-1413049452904". Once you have the ids from both queries, you take the intersection and use _all_docs to get your original documents.
You end up making three requests but it saves disk space in the view, the payloads are smaller because you return null, and your code is simpler because you can query the same view multiple ways.
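The intersection step could be sketched as follows. It assumes each view row has the usual id field; the helper name intersectRowIds and the sample rows are hypothetical. The resulting object is the kind of body that can be POSTed to _all_docs with a keys list.

```javascript
// Return the document ids that appear in both sets of view rows.
function intersectRowIds(rowsA, rowsB) {
  var seen = Object.create(null);
  rowsA.forEach(function (row) { seen[row.id] = true; });
  var ids = [];
  rowsB.forEach(function (row) {
    if (seen[row.id] && ids.indexOf(row.id) === -1) {
      ids.push(row.id);
    }
  });
  return ids;
}

// Hypothetical rows from the two queries
var byUser = [{ id: "d1" }, { id: "d2" }, { id: "d3" }];
var byDate = [{ id: "d2" }, { id: "d3" }, { id: "d4" }];

// Request body for fetching the matching documents in one call
var allDocsBody = { keys: intersectRowIds(byUser, byDate) };
```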

How do I $inc an entire object full of properties without building the query with a loop?

I have a collection of documents of the form:
{ name:String, groceries:{ apples:Number, cherries:Number, prunes:Number } }
Now, on every query I have to increment each element in "groceries" by a positive and/or negative value. It does not matter which keys or how many; I just added some examples.
I could do:
var dataToBeIncremented = stuff;
var $inc = {};
for ( var index in dataToBeIncremented )
{
$inc[ "groceries." + index ] = dataToBeIncremented[ index ];
}
then
db.update( { _id:targetID }, { $inc : $inc } )
However, I might have thousands of grocery elements, and I find doing this loop at each update ugly and unoptimized.
I would like to know how to avoid this or why it can't be optimized.
Actually there is no way to avoid it, because there is no command that can increment all the values inside a subdocument.
So the only way to do it is to do something like you have done:
{
"$inc": {
"groceries.apples" : 1,
"groceries.cherries" : 1,
"groceries.prunes" : 1
}
}
Because you do not know exactly which fields exist, you need to find them beforehand and build the $inc statement. There is one good thing about these updates: no matter how many elements you have, you will still need only 2 queries (one to find what to update and one to actually perform the update).
I also thought about how to achieve better results with a different schema, but apparently you have to cope with what you have.
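For reference, building the $inc statement from an object of deltas can be kept to a few lines. This is a sketch under the assumption that the deltas arrive as a plain object keyed by grocery name; the helper name buildIncUpdate is made up.

```javascript
// Turn { apples: 2, cherries: -1 } into
// { $inc: { "groceries.apples": 2, "groceries.cherries": -1 } }.
function buildIncUpdate(deltas) {
  var inc = {};
  Object.keys(deltas).forEach(function (name) {
    inc["groceries." + name] = deltas[name];
  });
  return { $inc: inc };
}

var update = buildIncUpdate({ apples: 2, cherries: -1 });
```

The loop is linear in the number of changed fields and runs in application memory, so it is unlikely to be the expensive part of the update.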

emit doc twice with different key in couchdb

Say I have a doc to save with couchDB and the doc looks like this:
{
"email": "lorem#gmail.com",
"name": "lorem",
"id": "lorem",
"password": "sha1$bc5c595c$1$d0e9fa434048a5ae1dfd23ea470ef2bb83628ed6"
}
and I want to be able to query the doc either by 'id' or 'email'. So when I save this as a view, I write:
db.save('_design/users', {
byId: {
map: function(doc) {
if (doc.id && doc.email) {
emit(doc.id, doc);
emit(doc.email, doc);
}
}
}
});
And then I could query like this:
db.view('users/byId', {
key: key
}, function(err, data) {
if (err || data.length === 0) return def.reject(new Error('not found'));
data = data[0] || {};
data = data.value || {};
self.attrs = _.clone(data);
delete self.attrs._rev;
delete self.attrs._id;
def.resolve(data);
});
And it works just fine. I could load the data either by id or email. But I'm not sure if I should do so.
I have another solution: saving the same doc in two different views, like byId and byEmail. But that way I save the same doc twice, which obviously costs database space.
Not sure which solution is better.
The canonical solution would be to have two views, one by email and one by id. To not waste space on the document, you can just emit null as the value and then use the include_docs=true query parameter when you query the view.
Also, you might want to use _id instead of id. That way, CouchDB ensures that the ID will be unique and you don't have to use a view to look up documents.
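A design document with the two views might be sketched like this. The view names byId and byEmail come from the question; the emit stand-in and the sample user exist only so the snippet runs outside CouchDB, and the functions are serialized to strings when actually stored in the database.

```javascript
// Stand-in for CouchDB's built-in emit(), for illustration only.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

var usersDesignDoc = {
  _id: "_design/users",
  views: {
    byId: {
      map: function (doc) {
        if (doc.id) { emit(doc.id, null); }
      }
    },
    byEmail: {
      map: function (doc) {
        if (doc.email) { emit(doc.email, null); }
      }
    }
  }
};

// Hypothetical sample user run through one of the map functions
var sampleUser = { email: "lorem@example.com", name: "lorem", id: "lorem" };
usersDesignDoc.views.byEmail.map(sampleUser);
```

Because the emitted value is null, each index stores only the key and the document id, and include_docs=true pulls in the document body at query time.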
I'd change to the two separate views. That's explicit and clear. When you emit the same doc twice in a single view, by an id and by an e-mail, you're effectively combining the 2 views into one. You may think of it as a search tree with 2 root branches. I don't see any reason for doing that, and would suggest leaving the data access and storage optimization job to the database.
Combining the views may also yield tricky bugs when, for some reason, you confuse an id and an e-mail.
There is absolutely nothing wrong with emitting the same document multiple times with a different key. It's about what makes most sense for your application.
If id and email are always valid and interchangeable ways to identify a user then a single view is perfect. For example, when id is some sort of unique account reference and users are allowed to use that or their (more memorable) email address to login.
However, if you need to differentiate between the two values, e.g. id is only meant for application administrators, then separate views are probably better. (You could probably use a complex key instead ... but that's another answer.)

Resources