How can I display CouchDB Complex output like this in one request? - couchdb

I'm just a beginner with CouchDB, so I may be misunderstanding some point of view; feel free to teach and discuss with me.
Doc types:
- User
- Topic
- Comment
Requirements:
- I want to build a web board
- One request to get this complex document
The output I need is KEY "topic-id", VALUE:

{
  "_id": "topic-id",
  "created_at": "2011-05-30 19:50:22",
  "title": "Hello World",
  "user": { "_id": "user-1", "type": "user", "username": "dominixz", "signature": "http://dominixz.com" },
  "comments": [
    { "_id": "comment-1", "text": "Comment 1", "created_at": "2011-05-30 19:50:22",
      "user": { "_id": "user-1", "type": "user", "username": "dominixz", "signature": "http://dominixz.com" } },
    { "_id": "comment-2", "text": "Comment 2", "created_at": "2011-05-30 19:50:23",
      "user": { "_id": "user-2", "type": "user", "username": "dominixz2", "signature": "http://dominixz1.com" } },
    { "_id": "comment-3", "text": "Comment 3", "created_at": "2011-05-30 19:50:24",
      "user": { "_id": "user-3", "type": "user", "username": "dominixz3", "signature": "http://dominixz2.com" } }
  ]
}
I have "user" data like this
{_id:"user-1",type:"user",username:"dominixz",signature:"http://dominixz.com"}
{_id:"user-2",type:"user",username:"dominixz2",signature:"http://dominixz1.com"}
{_id:"user-3",type:"user",username:"dominixz3",signature:"http://dominixz2.com"}
"Topic" data like this {_id : "topic-id",created_at:"2011-05-30
19:50:22",title:"Hello World",user:"user-1"}
"Comment" data like this {_id:"comment-1",type:"comment" ,
text:"Comment 1", created_at:"2011-05-30 19:50:22" , user: "user-1" ,
topic:"topic-id"} {_id:"comment-2",type:"comment" , text:"Comment 2",
created_at:"2011-05-30 19:50:23" , user: "user-2" , topic:"topic-id"}
{_id:"comment-3",type:"comment" , text:"Comment 3",
created_at:"2011-05-30 19:50:24" , user: "user-3" , topic:"topic-id"}
How can I write the map, reduce, and list functions to achieve this complex output? And how do I get the equivalent of LIMIT and OFFSET, as in a relational database?
Thanks in advance.

It's a bit hard to tell what you're looking for here, but I think you're asking for a classic CouchDB join as documented in this web page.
I'd recommend reading the whole thing, but the punchline looks something like this (translated for your data):
function (doc) {
  // Note: your sample topic doc has no "type" field; add type: "topic"
  // to topic docs so the first branch matches.
  if (doc.type === 'topic') {
    emit([doc._id, 0, doc.created_at], null);
  } else if (doc.type === 'comment') {
    // Key on the parent topic's ID (doc.topic), not the comment's own _id,
    // so comments collate right after their topic in the index.
    emit([doc.topic, 1, doc.created_at], null);
  }
}
That map returns each topic ID followed by all of its comments in chronological order. Emitting null as the value keeps the index from getting too large; you can always add include_docs=true to your request to pull the full docs when you need them, or follow index best practices and emit just the interesting fields as the value.
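For example (a sketch; the design-document and view names here are hypothetical), if the map above is saved as a view named topic_comments in a design document named board, a single GET returns the topic row followed by its comment rows:

GET /db/_design/board/_view/topic_comments?startkey=["topic-id"]&endkey=["topic-id",{}]&include_docs=true

As for LIMIT and OFFSET: CouchDB views accept limit and skip query parameters, though for deep paging it is more efficient to pass the last row's key as the next request's startkey than to use a large skip.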

CouchDB is a document database, not a relational database. As such, it is best suited to documents that encompass all the related data. While you can normalize your schema relational-style like you did, I'd argue that this isn't the best use case for Couch.
If I were to design your CMS in Couch, I'd keep the topic, its content, and its comments all in a single document. That would directly solve your problem.
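A minimal sketch of what such a document might look like (one reasonable shape based on your field names, not the only one):

{
  "_id": "topic-id",
  "type": "topic",
  "title": "Hello World",
  "created_at": "2011-05-30 19:50:22",
  "user": { "_id": "user-1", "username": "dominixz", "signature": "http://dominixz.com" },
  "comments": [
    { "text": "Comment 1", "created_at": "2011-05-30 19:50:22", "user": "user-1" },
    { "text": "Comment 2", "created_at": "2011-05-30 19:50:23", "user": "user-2" }
  ]
}

A plain GET /db/topic-id then returns the whole thread in one request, no view needed.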
You're free of course to use document stores to emulate relational databases, but that's not their natural use case, which leads to questions like this one.

Related

How do I get the top 20 most-searched queries in Elasticsearch?

I have stored sentences in Elasticsearch for autosuggestion, in this format:

{
  "text": "what is temperature in chicago"
}

It suggests correctly when "w" or "wha" or "what" is typed, but I am wondering if there is any way to fetch the most-searched sentences from Elasticsearch.
Sounds like what you need is a terms aggregation.
Your request body should look something like this:
{
  "query": {
    // your query
  },
  "aggs": {
    "common": {
      "terms": { "field": "text.keyword", "size": 20 }
    }
  }
}
If I get your question correctly, you want the most common searches made for a given input query. A simple solution can be implemented:
Just track what the user finally selects (which ES document) and increment a counter for it, keyed by _id.
A batch system that periodically syncs/indexes this data back into ES means each document carries its counter value.
Use this when serving suggestions, i.e. sort by the count field.
This will start working properly as users start using the system.
Your ES document would look like:

{
  "text": "what is temperature in chicago",
  "count": 10
}
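A minimal sketch of the two calls involved (the index name suggestions and the field names are assumptions):

Increment the counter whenever a user picks a suggestion:

POST /suggestions/_update/<doc_id>
{
  "script": { "source": "ctx._source.count += 1" }
}

Then serve suggestions sorted by popularity:

GET /suggestions/_search
{
  "query": { "match_phrase_prefix": { "text": "what" } },
  "sort": [ { "count": "desc" } ]
}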
Admittedly this is a very raw solution and there can be many others, but it's a nice one to start with.

How can I reduce the number of calls to a MongoDB instance when using a role-based application?

I'm specifically talking about NodeJS with MongoDB (I know MongoDB is schema-less, but let's be realistic about the importance of structuring data for a moment).
Is there some magic solution to minimising the number of queries to a database in regards to authenticating users? For example, if the business logic of my application needs to ensure that a user has the necessary privileges to update/retrieve data from a certain Document or Collection, is there any way of doing this without two calls to the database? One to check the user has the rights, and the other to retrieve the necessary data?
EDIT:
Another question closed by the trigger-happy SO moderators. I agree the question is abstract, but I don't see how it is "not a real question". To put it simply:
What is the best way to reduce the number of calls to a database in role-based applications, specifically in the context of NodeJS + MongoDB? Can it be done? Or is role-based access control for NodeJS + MongoDB inefficient and clumsy?
Obviously, you know which document holds which rights. I would guess that it is a field in the document, like:

{ 'foo': 'bar',
  'canRead': 'sales' }
At the start of the session you could query the roles a user has. Say:

{ 'user': 'shennan',
  'roles': ['users', 'west coast', 'sales'] }
You could store that list of roles in the user's session. With that in hand, all that's left to do is add the roles with an $in operator, like this:

db.test.find({'canRead': {'$in': ['users', 'west coast', 'sales']}})
Where the value for the $in operator is taken from the user's session. Here is code to try it out on your own, in the mongo console :
db.test.insert( { 'foo':'bar', 'canRead':'sales' })
db.test.insert( { 'foo2':'bar2', 'canRead':['hr','sales'] })
db.test.insert( { 'foo3':'bar3', 'canRead':'hr' })
> db.test.find({}, {_id:0})
{ "foo" : "bar", "canRead" : "sales" }
{ "foo2" : "bar2", "canRead" : [ "hr", "sales" ] }
{ "foo3" : "bar3", "canRead" : "hr" }
The document with 'foo3' can't be read by someone in sales:
> db.test.find({'canRead':{'$in':['users','west coast','sales']}}, {_id:0})
{ "foo" : "bar", "canRead" : "sales" }
{ "foo2" : "bar2", "canRead" : [ "hr", "sales" ] }
Definitely doable, but without more context it's hard to determine what's best.
One simple solution that comes to mind is to cache users and their permissions in memory so no DB lookup is required. At that point you can just issue the query for documents whose permissions match and...
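A minimal sketch of that idea in Node.js with the native driver (the collection names and the session shape are assumptions):

// At login: fetch the user's roles once and cache them in the session.
function cacheRoles(req, db, userId, callback) {
  db.collection('users').findOne({ user: userId }, function (err, user) {
    if (err) return callback(err);
    req.session.roles = user.roles; // e.g. ['users', 'west coast', 'sales']
    callback(null);
  });
}

// Afterwards: one query both enforces permissions and fetches the data.
function getReadableDocs(req, db, callback) {
  db.collection('test').find({ canRead: { $in: req.session.roles } }).toArray(callback);
}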
Let me know if you need a few more ideas.

Show message owner

Help me understand MongoDB, please.
I have three collections: threads, messages, and users.
thread
{ "title" : "1212", "message" : "12121", "user_id" : "50ffdfa42437e00223000001", "date" : ISODate("2013-04-11T19:48:36.878Z"), "_id" : ObjectId("51671394e5b854b042000003") }
message
{ "message" : "text", "image" : null, "thread_id" : "51671394e5b854b042000003", "user_id" : "516d08a7772d141766000001", "date" : ISODate("2013-04-17T15:58:07.021Z"), "_id" : ObjectId("516ec68fb91b762476000001") }
user
{ "user" : "admin", "date" : ISODate("2013-04-16T08:15:35.497Z"), "status" : 1, "_id" : ObjectId("516d08a7772d141766000001") }
How can I display all messages for the current thread and get the user name (for each comment) from the users collection?
This code gets only the messages, without the user name:
exports.getMessages = function(id, skip, callback) {
  skip = parseInt(skip);
  messages.find({thread_id: id}).sort({date: 1}).skip(skip).limit(20).toArray(
    function(e, res) {
      if (e) callback(e);
      else callback(null, res);
    });
};
I'm using Node.js and the native mongo driver.
Generally, Mongo uses embedded documents or references to maintain relationships. Here is a link from the mongo docs worth reading.
What you are currently doing is storing a manual reference to the user collection within your message collection. Manual references in Mongo require additional queries to get the referenced data. In this case a reference-based relationship will work, but it invites the N+1 query problem: an additional query for every message you wish to display, plus the original query for the messages. References are explained in further detail here. One option would be DBRefs, which require language-specific driver support.
Another alternative would be to use embedded documents. In this case you would store the related user object embedded within the message object. Here is another link to the mongo docs with a great example. A single query then returns all of the messages with each related user object embedded inside. Although embedded documents encourage duplicate data, in many cases they provide performance benefits. All of this is explained in the mongo docs, which are worth reading to understand Mongo's data modeling.
Additionally, the mongoose library is pretty awesome and has a populate function which is helpful for references.
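If you keep the manual references, you can still avoid N+1 by batching the user lookup into one extra query. A sketch against your getMessages (the users collection handle and the ObjectID import are assumptions; note your messages store user_id as a string while users._id is an ObjectId):

exports.getMessages = function(id, skip, callback) {
  messages.find({thread_id: id}).sort({date: 1}).skip(parseInt(skip)).limit(20).toArray(function(e, msgs) {
    if (e) return callback(e);
    // Collect the referenced user ids and fetch all users in a single query.
    var ids = msgs.map(function(m) { return new ObjectID(m.user_id); });
    users.find({_id: {$in: ids}}).toArray(function(e2, us) {
      if (e2) return callback(e2);
      var namesById = {};
      us.forEach(function(u) { namesById[String(u._id)] = u.user; });
      // Attach each user's name to its message.
      msgs.forEach(function(m) { m.user_name = namesById[m.user_id]; });
      callback(null, msgs);
    });
  });
};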

MongoDB query by foreign key

I have two collections:
USERS:
{ id:"aaaaaa" age:19 , sex:"f" }
{ id:"bbbbbb" age:30 , sex:"m" }
REVIEWS:
{ id:777777 , user_id:"aaaaaa" , text:"some review data" }
{ id:888888 , user_id:"aaaaaa" , text:"some review data" }
{ id:999999 , user_id:"bbbbbb" , text:"some review data" }
I would like to find all REVIEWS where sex = "f" and age > 18.
(I don't want to nest reviews inside users, because the reviews collection will be huge.)
You should include the user's data in each review (a.k.a. denormalizing):
{ id: 777777, user: { id: "aaaaaa", age: 19, sex: "f" }, text: "some review data" }
{ id: 888888, user: { id: "aaaaaa", age: 19, sex: "f" }, text: "some other review data" }
{ id: 999999, user: { id: "bbbbbb", age: 30, sex: "m" }, text: "some review data" }
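With that shape, your requirement becomes a single query (a sketch in mongo-shell syntax, assuming the collection is named reviews):

db.reviews.find({ "user.sex": "f", "user.age": { $gt: 18 } })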
Here, read this link on MongoDB Data Modeling:
A Note on Denormalization

Relational purists may be feeling uneasy already, as if we were violating some universal law. But let's bear in mind that MongoDB collections are not equivalent to relational tables; each serves a unique design objective. A normalized table provides an atomic, isolated chunk of data. A document, however, more closely represents an object as a whole. In the case of a social news site, it can be argued that a username is intrinsic to the story being posted.

What about updates to the username? It's true that such updates will be expensive; happily, in this case, they'll be rare. The read savings achieved in denormalizing will surely outweigh the costs of the occasional update. Alas, this is not a hard and fast rule: ultimately, developers must evaluate their applications for the appropriate level of normalization.
Unless you denormalize the REVIEWS collection with your search attributes, MongoDB does not support querying another collection in a single query. See this post.

What's the best way of saving a document with revisions in a key-value store?

I'm new to key-value stores and I need your recommendation. We're working on a system that manages documents and their revisions, a bit like a wiki does. We're thinking about saving this data in a key-value store.
Please don't just recommend the database you prefer; we want to keep it generic enough to work with many different key-value databases. We're using node.js, so we can easily work with JSON.
My question is: what should the structure of the database look like? We have metadata for each document (timestamp, lasttext, id, latestrevision) and we have data for each revision (the change, the author, timestamp, etc.). So, which key/value structure do you recommend?
Thanks in advance.
Cribbed from the MongoDB groups. It is somewhat specific to MongoDB; however, the approach is pretty generic.
Most of these history implementations break down to two common strategies.
Strategy 1: embed history
In theory, you can embed the history of a document inside of the document itself. This can even be done atomically.
> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs.update( {_id: doc._id}, { $set : { text : 'New Text' }, $push : { hist : doc.text } } )
> db.docs.find()
{ "_id" : 1, "hist" : [ "Original Text" ], "text" : "New Text" }
Strategy 2: write history to separate collection
> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs_hist.insert ( { orig_id : doc._id, ts : Math.round((new Date()).getTime() / 1000), data : doc } )
> db.docs.update( {_id:doc._id}, { $set : { text : 'New Text' } } )
Here you'll see that I do two writes: one to the master collection and one to the history collection.
To get fast history lookup, just grab the original ID:
> db.docs_hist.ensureIndex( { orig_id : 1, ts : 1 })
> db.docs_hist.find( { orig_id : 1 } ).sort( { ts : -1 } )
- Both strategies can be enhanced by storing and displaying only diffs.
- You could hybridize by adding a link from the history collection back to the original collection.
What's the best way of saving a document with revisions in a key-value store?
It's hard to say there is a "best way"; there are obviously some trade-offs being made here.
Embedding:
- atomic changes on a single doc
- can result in large documents and may break reasonable size limits
- you probably have to enhance your code to avoid returning the full history when it isn't needed

Separate collection:
- easier to write queries
- not atomic; needs two operations (do you have transactions?)
- more storage space (extra indexes on original docs)
I'd keep a hierarchy of the real data under each document with the revision data attached, for instance (the array is named "revisions" here for concreteness):

{
  "revisions": [
    {
      "timestamp": "2011040711350621",
      "data": { ... the real data here ... }
    },
    {
      "timestamp": "2011040711350716",
      "data": { ... the real data here ... }
    }
  ]
}
Then use the push operation to add new versions, and periodically remove the old ones. Use a last (or first) filter to get only the latest copy at any given time.
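In mongo-shell syntax, that might look like this (a sketch, assuming the "revisions" array above; $slice: -10 caps the array at the ten newest entries):

// Append a new revision, keeping only the last 10.
db.docs.update({_id: 1}, {$push: {revisions: {$each: [{timestamp: "2011040711350802", data: {}}], $slice: -10}}})

// Fetch only the latest revision.
db.docs.find({_id: 1}, {revisions: {$slice: -1}})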
I think there are multiple approaches, and this question is old, but I'll give my two cents, as I was working on this earlier this year. I have been using MongoDB.
In my case, I had a User account that had Profiles on different social networks. We wanted to track changes to the social network profiles and keep revisions of them, so we created two structures to test. Both methods had a User object that pointed to foreign objects; we did not want to embed objects from the get-go.
A User looked something like:

User {
  "tags": [Tags],
  "notes": "Notes",
  "facebook_profile": <combo_foreign_key>,
  "linkedin_profile": <same as above>
}
and then, for the combo_foreign_key, we used this pattern (using Ruby interpolation syntax for simplicity):

combo_foreign_key = "#{User.key}__#{new_profile.last_updated_at}"

facebook_profiles {
  combo_foreign_key: facebook_profile,
  ... and you keep adding your foreign objects in this pattern
}
This gave us O(1) lookup of a User's latest FacebookProfile, but required us to keep the latest FK stored in the User object. If we wanted all of the FacebookProfiles, we would ask for all keys in the facebook_profiles collection with the prefix "#{User.key}__", which was O(N)...
The second strategy we tried was storing an array of those FacebookProfile keys on the User object so the structure of the User object changed from
"facebook_profile" : <combo_foreign_key>
to
"facebook_profile" : [<combo_foreign_key>]
Here we'd just append the new combo key whenever we added a profile variation. Then we'd sort the "facebook_profile" array and take the largest entry to get the latest profile copy. This method has to sort M strings and then fetch the FacebookProfile keyed by the largest item in that sorted list: a little slower for grabbing the latest copy, but it gave us every version of a User's FacebookProfile in one swoop, and we did not have to worry about ensuring the stored foreign key was really the latest profile object.
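A minimal sketch of that lookup (hypothetical names; it works because the combo keys end in a fixed-width timestamp, so lexicographic order matches chronological order):

// user.facebook_profile is an array of combo keys like "user-1__20110407113507"
var keys = user.facebook_profile.slice().sort(); // oldest first
var latestKey = keys[keys.length - 1];           // the newest profile's key
// Fetch the latest profile by its combo key from the profiles store.
facebookProfiles.findOne({_id: latestKey}, function (err, profile) {
  // ... render the latest profile ...
});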
At first our revision counts were pretty small and they both worked pretty well. I think I prefer the first one over the second now.
Would love input from others on how they solved this issue. The Git idea suggested in another answer actually sounds really neat to me, and for our use case it would work quite well.
