CouchDB reduce function useful in this scenario? - couchdb

I want to store votes in CouchDB. To get round the problem of incrementing a field in one document and having millions of revisions, each vote will be a seperate document:
{
_id: "xyz"
type: "thumbs_up"
vote_id: "test"
}
So the actual document itself is the vote. The result I'd like is basically an array of: vote_id, sumOfThumbsUp, sumOfThumbsDown
Now I think my map function would need to look like:
if(type=="thumbs_up" | type =="thumbs_down"){
emit(vote_id, type)
}
Now here's the bit I can't figure out what to do, should I build a reduce function to somehow sum the vote types, keeping in mind there's two types of votes.
Or should I just take what's been emited from the map function and put it straight into an array to work on, ignoring the reduce function completely?

This is a perfect case for map-reduce! Having each document represent a vote is the right way to go in my opinion, and will work with CouchDB's strengths.
I would recommend a document structure like this:
Documents
UPVOTE
{
"type": "vote",
"vote_id": "test",
"vote": 1
}
DOWNVOTE
{
"type": "vote",
"vote_id": "test",
"vote": -1
}
I would use a document type of "vote", so you can have other document types in your database (like the vote category information, user information, etc)
I kept "vote_id" the same
I made the value field called "vote", and just used 1/-1 instead of "thumbs_up" or "thumbs_down" (really doesn't matter, you can do whatever you want and it will work just fine)
View
Map
function (doc) {
if (doc.type === "vote") {
emit(doc.vote_id, doc.vote);
}
}
Reduce
_sum
You end up with a result like this for your map function:
And if you reduce it:
As you add more vote documents with more vote_id variety, you can query for a specific vote_id by using: /:db/_design/:ddoc/_view/:view?reduce=true&group=true&key=":vote_id"

Related

Conditionally update an array in mongoose [duplicate]

Currently I am working on a mobile app. Basically people can post their photos and the followers can like the photos like Instagram. I use mongodb as the database. Like instagram, there might be a lot of likes for a single photos. So using a document for a single "like" with index seems not reasonable because it will waste a lot of memory. However, I'd like a user add a like quickly. So my question is how to model the "like"? Basically the data model is much similar to instagram but using Mongodb.
No matter how you structure your overall document there are basically two things you need. That is basically a property for a "count" and a "list" of those who have already posted their "like" in order to ensure there are no duplicates submitted. Here's a basic structure:
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3")
"photo": "imagename.png",
"likeCount": 0
"likes": []
}
Whatever the case, there is a unique "_id" for your "photo post" and whatever information you want, but then the other fields as mentioned. The "likes" property here is an array, and that is going to hold the unique "_id" values from the "user" objects in your system. So every "user" has their own unique identifier somewhere, either in local storage or OpenId or something, but a unique identifier. I'll stick with ObjectId for the example.
When someone submits a "like" to a post, you want to issue the following update statement:
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": { "$ne": ObjectId("54bb2244a3a0f26f885be2a4") }
},
{
"$inc": { "likeCount": 1 },
"$push": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
Now the $inc operation there will increase the value of "likeCount" by the number specified, so increase by 1. The $push operation adds the unique identifier for the user to the array in the document for future reference.
The main important thing here is to keep a record of those users who voted and what is happening in the "query" part of the statement. Apart from selecting the document to update by it's own unique "_id", the other important thing is to check that "likes" array to make sure the current voting user is not in there already.
The same is true for the reverse case or "removing" the "like":
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": ObjectId("54bb2244a3a0f26f885be2a4")
},
{
"$inc": { "likeCount": -1 },
"$pull": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
The main important thing here is the query conditions being used to make sure that no document is touched if all conditions are not met. So the count does not increase if the user had already voted or decrease if their vote was not actually present anymore at the time of the update.
Of course it is not practical to read an array with a couple of hundred entries in a document back in any other part of your application. But MongoDB has a very standard way to handle that as well:
db.photos.find(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
},
{
"photo": 1
"likeCount": 1,
"likes": {
"$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") }
}
}
)
This usage of $elemMatch in projection will only return the current user if they are present or just a blank array where they are not. This allows the rest of your application logic to be aware if the current user has already placed a vote or not.
That is the basic technique and may work for you as is, but you should be aware that embedded arrays should not be infinitely extended, and there is also a hard 16MB limit on BSON documents. So the concept is sound, but just cannot be used on it's own if you are expecting 1000's of "like votes" on your content. There is a concept known as "bucketing" which is discussed in some detail in this example for Hybrid Schema design that allows one solution to storing a high volume of "likes". You can look at that to use along with the basic concepts here as a way to do this at volume.

Reduce output must shrink more rapidly -- Reducing to a list of documents

I have a few documents in my couch db with json as below. The cId will change for each. And I have created a view with map/reduce function to filter out few documents and return a list of json documents.
Document structure -
{
"_id": "ccf8a36e55913b7cf5b015d6c50009f7",
"_rev": "8-586130996ad60ccef54775c51599e73f",
"cId": 1,
"Status": true
}
Here is the sample map:
function(doc) {
if(doc.Key && doc.Value && doc.Status == true)
emit(null, doc);
}
Here is the sample reduce:
function(key, values, rereduce){
var kv = [];
values.forEach(function(value){
if(value.cId != <some_val>){
kv.push({"k": value.cId, "v" : value});
}
});
return kv;
}
If there are two documents and reduce output has list containing 1 document, this works fine. But if I add one more document (with cId = 2), it throws the errors - "reduce output must shrink more rapidly". Why is this caused? And how can I achieve what I intend to do?
The cause of the error is, that the reduce function does not actually reduce anything (it rather is collecting objects). The documentation mentions this:
The way the B-tree storage works means that if you don’t actually
reduce your data in the reduce function, you end up having CouchDB
copy huge amounts of data around that grow linearly, if not faster
with the number of rows in your view.
CouchDB will be able to compute the final result, but only for views
with a few rows. Anything larger will experience a ridiculously slow
view build time. To help with that, CouchDB since version 0.10.0 will
throw an error if your reduce function does not reduce its input
values.
It is unclear to me, what you intend to achieve.
Do you want to retrieve a list of docs based on certain criteria? In this case, a view without reduce should suffice.
Edit: If the desired result depends on a value stored in a certain document, then CouchDB has a feature called list. It is a design function, that provides access to all docs of a given view, if you pass include_docs=true.
A list URL follow this pattern:
/db/_design/foo/_list/list-name/view-name
Like views, lists are defined in a design document:
{
"_id" : "_design/foo",
"lists" : {
"bar" : "function(head, req) {
var row;
while (row = getRow()) {
if (row.doc._id === 'baz') // Do stuff based on a certain doc
}
}"
},
... // views and other design functions
}

Sorting CouchDB result by value

I'm brand new to CouchDB (and NoSQL in general), and am creating a simple Node.js + express + nano app to get a feel for it. It's a simple collection of books with two fields, 'title' and 'author'.
Example document:
{
"_id": "1223e03eade70ae11c9a3a20790001a9",
"_rev": "2-2e54b7aa874059a9180ac357c2c78e99",
"title": "The Art of War",
"author": "Sun Tzu"
}
Reduce function:
function(doc) {
if (doc.title && doc.author) {
emit(doc.title, doc.author);
}
}
Since CouchDB sorts by key and supports a 'descending=true' query param, it was easy to implement a filter in the UI to toggle sort order on the title, which is the key in my results set. Here's the UI:
List of books with link to sort title by ascending or descending
But I'm at a complete loss on how to do this for the author field.
I've seen this question, which helped a poster sort by a numeric reduce value, and I've read a blog post that uses a list to also sort by a reduce value, but I've not seen any way to do this on a string value without a reduce.
If you want to sort by a particular property, you need to ensure that that property is the key (or, in the case of an array key, the first element in the array).
I would recommend using the sort key as the key, emitting a null value and using include_docs to fetch the full document to allow you to display multiple properties in the UI (this also keeps the deserialized value consistent so you don't need to change how you handle the return value based on sort order).
Your map functions would be as simple as the following.
For sorting by author:
function(doc) {
if (doc.title && doc.author) {
emit(doc.author, null);
}
}
For sorting by title:
function(doc) {
if (doc.title && doc.author) {
emit(doc.title, null);
}
}
Now you just need to change which view you call based on the selected sort order and ensure you use the include_docs=true parameter on your query.
You could also use a single view for this by emitting both at once...
emit(["by_author", doc.author], null);
emit(["by_title", doc.title], null);
... and then using the composite key for your query.

In a MongoDB will an index help when a field is just being tested on its length?

I am creating a routine to check for interrupted processing and to carry on, during the startup I'm performing the following search:
.find({"DocumentsPath": {$exists: true, $not: {$size: 0}}})
I want it to be as fast as possible, however the documentation suggests that the index is for scanning within the data. I never need to search within the "DocumentsPath" just use it if its there. Creating an index seems like an overhead I don't want. However having the index might speed up the size test.
My question is whether this field should be indexed within the DB?
Thought of commenting but this does deserve an answer. Should this be indexed? Well probably, but for other purposes. Does this make a difference here? No it does not.
The big point to make is your query terms are redundant ( or could be better ) in this case. Let's look at the example:
{ "DocumentsPath": { "$exists": true } }
That will tell you if there is actually an element in a document that matches the property specified. No it does not an cannot use an index. You can use a "sparse" index though and not even need to call that.
{ "DocumentsPath": { "$not": { "$size" : 0 } } }
This is cute one. Yes it tests the length of an array, but what you are really asking here is "I don't want the array to be empty".
So for the better solution.
Use a "sparse" index:
db.collection.ensureIndex({ "DocumentsPath": 1 }, { "sparse": true })
Query for the zeroth element of an index
{ "DocumentsPath.0": { "$exists": true } }
Still no index for "matching" really, but at least the "sparse" index sorted out some of that my excluding documents and the "dot notation" form here is actually more efficient than evaluating via $size.

emit doc twice with different key in couchdb

Say I have a doc to save with couchDB and the doc looks like this:
{
"email": "lorem#gmail.com",
"name": "lorem",
"id": "lorem",
"password": "sha1$bc5c595c$1$d0e9fa434048a5ae1dfd23ea470ef2bb83628ed6"
}
and I want to be able to query the doc either by 'id' or 'email'. So when save this as a view I write so:
db.save('_design/users', {
byId: {
map: function(doc) {
if (doc.id && doc.email) {
emit(doc.id, doc);
emit(doc.email, doc);
}
}
}
});
And then I could query like this:
db.view('users/byId', {
key: key
}, function(err, data) {
if (err || data.length === 0) return def.reject(new Error('not found'));
data = data[0] || {};
data = data.value || {};
self.attrs = _.clone(data);
delete self.attrs._rev;
delete self.attrs._id;
def.resolve(data);
});
And it works just fine. I could load the data either by id or email. But I'm not sure if I should do so.
I have another solution which by saving the same doc with two different view like byId and byEmail, but in this way I save the same doc twice and obviously it will cost space of the database.
Not sure which solution is better.
The canonical solution would be to have two views, one by email and one by id. To not waste space for the document, you can just emit null as the value and then use the include_docs=true query paramter when you query the view.
Also, you might want to use _id instead of id. That way, CouchDB ensures that the ID will be unique and you don't have to use a view to loop up documents.
I'd change to the two separate views. That's explicit and clear. When you emit the same doc twice in a single view – by an id and e-mail you're effectively combining the 2 views into one. You may think of it as a search tree with the 2 root branches. I don't see any reason of doing that, and would suggest leaving the data access and storage optimization job to the database.
The views combination may also yield tricky bugs, when for some reason you confuse an id and an e-mail.
There is absolutely nothing wrong with emitting the same document multiple times with a different key. It's about what makes most sense for your application.
If id and email are always valid and interchangeable ways to identify a user then a single view is perfect. For example, when id is some sort of unique account reference and users are allowed to use that or their (more memorable) email address to login.
However, if you need to differentiate between the two values, e.g. id is only meant for application administrators, then separate views are probably better. (You could probably use a complex key instead ... but that's another answer.)

Resources