I have two collections:
USERS:
{ id:"aaaaaa" , age:19 , sex:"f" }
{ id:"bbbbbb" , age:30 , sex:"m" }
REVIEWS:
{ id:777777 , user_id:"aaaaaa" , text:"some review data" }
{ id:888888 , user_id:"aaaaaa" , text:"some review data" }
{ id:999999 , user_id:"bbbbbb" , text:"some review data" }
I would like to find all REVIEWS where sex = "f" and age > 18.
(I don't want to nest the reviews inside the users, because the reviews collection will be huge.)
You should include the user's data in each review (a.k.a. denormalizing):
{ id:777777 , user: { id:"aaaaaa", age:19 , sex:"f" } , text:"some review data" }
{ id:888888 , user: { id:"aaaaaa", age:19 , sex:"f" } , text:"some other review data" }
{ id:999999 , user: { id:"bbbbbb", age:30 , sex:"m" } , text:"some review data" }
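With the user embedded in each review, the original question becomes a single query on dotted field paths. Here is a minimal sketch, assuming the denormalized shape above and a collection named reviews. The collection name and the findReviewsByUser helper are illustrative assumptions, and the in-memory array only stands in for the collection:

```javascript
// In the mongo shell the query would look roughly like this
// (assumes a collection called `reviews`):
//   db.reviews.find({ "user.sex": "f", "user.age": { $gt: 18 } })

// Plain-JavaScript stand-in so the logic can be run without a database.
const reviews = [
  { id: 777777, user: { id: "aaaaaa", age: 19, sex: "f" }, text: "some review data" },
  { id: 888888, user: { id: "aaaaaa", age: 19, sex: "f" }, text: "some other review data" },
  { id: 999999, user: { id: "bbbbbb", age: 30, sex: "m" }, text: "some review data" }
];

// Hypothetical helper: keeps reviews whose embedded user has the given sex
// and is older than `olderThan`.
function findReviewsByUser(allReviews, sex, olderThan) {
  return allReviews.filter(function (review) {
    return review.user.sex === sex && review.user.age > olderThan;
  });
}

const matches = findReviewsByUser(reviews, "f", 18);
console.log(matches.map(function (r) { return r.id; })); // the two reviews by user "aaaaaa"
```

The price of this layout is that a change to a user's age or sex has to be written to every review by that user.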
Here, read this link on MongoDB Data Modeling:
A Note on Denormalization
Relational purists may be feeling uneasy already, as if we were
violating some universal law. But let's bear in mind that MongoDB
collections are not equivalent to relational tables; each serves a
unique design objective. A normalized table provides an atomic,
isolated chunk of data. A document, however, more closely represents
an object as a whole. In the case of a social news site, it can be
argued that a username is intrinsic to the story being posted.
What about updates to the username? It's true that such updates will
be expensive; happily, in this case, they'll be rare. The read savings
achieved in denormalizing will surely outweigh the costs of the
occasional update. Alas, this is not hard and fast rule: ultimately,
developers must evaluate their applications for the appropriate level
of normalization.
A MongoDB find() cannot filter on fields that live in another collection, so unless you denormalize the REVIEWS collection with the attributes you search on, you cannot do this in a single query. See this post.
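If denormalizing is not an option, the usual fallback is two queries joined in the application: first collect the ids of the matching users, then fetch their reviews with $in. This is a sketch under the assumption that the collections are called users and reviews. The findReviewsTwoStep helper is hypothetical, and the arrays only stand in for the collections:

```javascript
// Mongo shell equivalent (collection names are assumptions):
//   var ids = db.users.find({ sex: "f", age: { $gt: 18 } }).toArray()
//                     .map(function (u) { return u.id; });
//   db.reviews.find({ user_id: { $in: ids } })

const users = [
  { id: "aaaaaa", age: 19, sex: "f" },
  { id: "bbbbbb", age: 30, sex: "m" }
];
const normalizedReviews = [
  { id: 777777, user_id: "aaaaaa", text: "some review data" },
  { id: 888888, user_id: "aaaaaa", text: "some review data" },
  { id: 999999, user_id: "bbbbbb", text: "some review data" }
];

// Hypothetical helper performing the two steps in memory.
function findReviewsTwoStep(allUsers, allReviews) {
  // Step 1: ids of users matching sex = "f" and age > 18.
  const ids = new Set(
    allUsers
      .filter(function (u) { return u.sex === "f" && u.age > 18; })
      .map(function (u) { return u.id; })
  );
  // Step 2: reviews whose user_id is in that set (what $in does server-side).
  return allReviews.filter(function (r) { return ids.has(r.user_id); });
}

const twoStepMatches = findReviewsTwoStep(users, normalizedReviews);
console.log(twoStepMatches.map(function (r) { return r.id; }));
```

This keeps the reviews small, but the id list sent in the second query grows with the number of matching users.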
I am new to CouchDB and the NoSQL scene, coming from a SQL background. I have some questions on referential integrity. For example, I have a product document as below:
{
type: "product",
name: "Sweet Necklace",
category: "necklace"
}
Each category has its own document:
{
type: "category",
name: "necklace",
custom_attr: ".."
}
Just for the sake of argument, what happens when a stakeholder chooses to rename the category from "necklace" to "accessories"? What should happen to the products that have the category field set to "necklace"? Do I do a bulk update on all products with category equal to "necklace"? (I don't think CouchDB allows us to perform an "UPDATE ALL WHERE" kind of statement.)
What is the best practice on handling such situation?
P/S: I chose to save the category name in the product document instead of a category ID since NoSQL encourages denormalization anyway.
If you're maintaining a separate document for the category then you've not denormalized your data at all. In fact, by doing what you're doing, you're getting the worst of both worlds - no normalization and no denormalization.
Try something like this:
Product document:
{
_id:"product:first_product",
name:"First Product",
category:"category:category_1"
}
Category document:
{
_id:"category:category_1",
name:"Category 1",
custom_attr: {}
}
This way, when you change the name of the category, you're still referring to the correct document from all the product documents that have this category.
Note: you can still have a type field and let the _id remain as it is currently.
Edit:
To get the product/category info, you can define a map function like so:
function(doc){
  if(doc._id.indexOf('product:') === 0){
    // or if(doc.type === 'product') if you use the type field
    emit(doc, {'_id': doc.category});
  }
}
Now whenever you use this view and you set include_docs to true, the category information will be included in your results.
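To see what such a view returns, here is a small in-memory simulation of a map function like the one above queried with include_docs=true. It relies on CouchDB's linked-documents behaviour: when an emitted value contains an _id, include_docs fetches that document instead of the emitting one. The runView helper and the way emit is passed in are illustrative assumptions, not CouchDB APIs:

```javascript
const docs = [
  { _id: "product:first_product", name: "First Product", category: "category:category_1" },
  { _id: "category:category_1", name: "Category 1", custom_attr: {} }
];

// Map function for products, with `emit` passed in so it can run outside CouchDB.
function mapProducts(doc, emit) {
  if (doc._id.indexOf("product:") === 0) {
    emit(doc, { _id: doc.category });
  }
}

// Hypothetical stand-in for querying the view with include_docs=true.
function runView(allDocs, mapFn) {
  const byId = new Map(allDocs.map(function (d) { return [d._id, d]; }));
  const rows = [];
  allDocs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      // A value carrying _id makes include_docs resolve the linked document.
      const linkedId = value && value._id ? value._id : doc._id;
      rows.push({ id: doc._id, key: key, value: value, doc: byId.get(linkedId) || null });
    });
  });
  return rows;
}

const rows = runView(docs, mapProducts);
console.log(rows[0].key.name, "->", rows[0].doc.name); // First Product -> Category 1
```

Because the product only stores the category's _id, renaming the category means editing one document; every row of the view picks up the new name on the next query.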
I'm just starting out using MongoDB and I'm a bit lost at how to structure my documents to tackle the following problem (note this is just a basic example that matches what I'm having difficulty with).
The Problem:
Let's say I have a top-level document called Family, these documents are stored in a collection called families and contain some basic information about a family, i.e.
{
_id: ObjectId("foobar"),
familyName: "Simpson",
address: "743 Evergreen Terrace, SpringField",
children: [
{
_id: ObjectId("barfoo"),
firstName: "Bartholomew",
preferredName: "Bart",
middleName: "JoJo"
}
// ... etc.
]
}
Now, let's say I'm adding another top-level document to my application: School, stored in a collection called schools. I need to relate each child to their school, and each child may only attend one school at any point in time. How would I approach this in Mongo? I've come from a very heavy RDBMS background and I'm having a bit of difficulty figuring this out. The main issues that I've come up against in my solutioning revolve around the fact that I'll need to efficiently handle the following use-cases:
View a school and be able to see all Children enrolled there
View a family and see all of the Schools that their children are enrolled in
What I've tried:
Storing the child references in the `School`
The first solution I went with was to make an enrollments array in my School document, which referenced the _id of a child as well as their full name for convenience, i.e.
{
_id: ObjectId("asdadssa"),
name: "Springfield Elementary",
enrollments: [
{
child_id: ObjectId("barfoo"), // Bart Simpson
fullName: "Bart Simpson" // concatenation of preferredName and familyName
}
]
}
This seemed fantastic for the first use-case, which just needed to display all of the students enrolled at a particular school.
However when I turned to the second use-case I realised I may have made a mistake. How on earth would you figure out which school each child in a Family belonged to? The only way I could see would be to actually traverse every single school in the schools collection, drill down into their enrollments and see if the child_id matched a child in the family...doesn't seem very efficient does it? That led to my next attempt.
Storing a reference to the school in a child object
Because each child can only belong to one school I figured I could maybe just store a reference to the School document in each child sub-doc, i.e. Bart's document would now become:
{
_id: ObjectId("barfoo"),
firstName: "Bartholomew",
preferredName: "Bart",
middleName: "JoJo",
school_id: ObjectId("asdadssa")
}
Now the second use-case is happy, but the first is unsatisfied.
Conclusion
The only way I can see both use-cases being satisfied is if I employ both solutions simultaneously, i.e. store the school_id in the child sub-doc and also store the child_id in the enrollments array.
This just seems clunky to me: it means you'll need to do at least two writes per enrollment change (one to remove the child from the school and one to change the child). As far as I'm aware, MongoDB only has atomic writes and no transaction support, so this looks like a place where data integrity could potentially suffer.
If any MongoDB gurus could propose an alternate solution that'd be great. I'm aware that this particular problem really screams "RDBMS!!!!", but this is only a small part of the application and some of the other data really lends itself to a document store.
I'm only in the planning stage now so I'm not 100% committed to Mongo, but I thought I'd give it a crack since I've been hearing some good things about it.
For the small use case that you've described, I would switch up the families collection to a people or students collection
{
"_id" : ObjectId("barfoo"),
"family_id" : ObjectId("spamneggs"),
"name" : {
"first" : "Bartholomew",
"last" : "Simpson"
},
"school_id" : ObjectId("asdadssa")
}
that stores students as separate documents but unites them with a common family_id (I also snuck in another way to store names). Schools can just have school information without enrollments. I'll give example code in the mongo shell for your two use cases. To find all the children enrolled in Bart's school and Bart's school's document:
> db.students.find({ "school_id" : ObjectId("asdadssa") })
> db.schools.find({ "_id" : ObjectId("asdadssa") })
To find all of the schools Bart's family has children enrolled in:
> var schools = []
> db.students.find({ "family_id" : ObjectId("spamneggs") }, { "_id" : 0, "school_id" : 1 }).forEach(function(doc) {
schools.push(doc.school_id)
})
> db.schools.find({ "_id" : { "$in" : schools } })
Both of these are simple application-side joins and should work fine in your case because one family won't have zillions of children. Indexes on school_id and family_id will help.
For writes, only the student document needs to be updated with the proper school_id.
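The same two lookups can be sketched in plain JavaScript to make the application-side join explicit. The sample students and schools below are invented for illustration, ids are plain strings instead of ObjectIds, and the helper names are hypothetical:

```javascript
// Sample data invented for illustration; arrays stand in for the collections.
const students = [
  { _id: "barfoo", family_id: "spamneggs", name: { first: "Bartholomew", last: "Simpson" }, school_id: "asdadssa" },
  { _id: "child2", family_id: "spamneggs", name: { first: "Second", last: "Simpson" }, school_id: "school2" },
  { _id: "child3", family_id: "otherfamily", name: { first: "Third", last: "Other" }, school_id: "asdadssa" }
];
const schools = [
  { _id: "asdadssa", name: "Springfield Elementary" },
  { _id: "school2", name: "Second School" }
];

// Use case 1: all children enrolled at a school.
function studentsAtSchool(schoolId) {
  return students.filter(function (s) { return s.school_id === schoolId; });
}

// Use case 2: all schools attended by the children of one family.
function schoolsForFamily(familyId) {
  const schoolIds = new Set(
    students
      .filter(function (s) { return s.family_id === familyId; })
      .map(function (s) { return s.school_id; })
  );
  // Equivalent of the $in lookup on the schools collection.
  return schools.filter(function (school) { return schoolIds.has(school._id); });
}

// An enrollment change is a single write to the student document.
function changeSchool(studentId, newSchoolId) {
  const student = students.find(function (s) { return s._id === studentId; });
  if (student) { student.school_id = newSchoolId; }
}

console.log(studentsAtSchool("asdadssa").map(function (s) { return s._id; }));
console.log(schoolsForFamily("spamneggs").map(function (s) { return s.name; }));
```

Since the relationship is stored in exactly one place, there is no second document that can drift out of sync when a child changes school.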
In a contact management app, each user will have his own database. When users wish to share certain categories of contacts with others, a backend will initiate a replication. Each contact is its own document, but also has various children documents such as notes and appointments.
Here is an example...
Contact:
{
"_id": 123,
"type": "contact",
"owner": "jimmy",
"category": "customer",
"name": "Bob Jones",
"email": "bob@example.com"
}
Note:
{
"_id": 456,
"type": "note",
"owner": "jimmy",
"contact_id": 123,
"timestamp": 1383919278,
"content": "This is a note about Bob Jones"
}
So let's say Jimmy wants to share only his customers with sales manager Kevin, while his personal contacts remain private. When the note passes through the replication filter, is it possible to access the linked contact's category field?
Or do I have to duplicate the category field in every single child of a contact? I would prefer not to have to do this, as each contact may have many children which I would have to update manually every time the category changes.
Here is some pseudo-code for the filter function:
function(doc, req)
{
  if(doc.type == "contact") {
    if(doc.category == req.query.category) {
      return true;
    }
  }
  else if(doc.contact_id) {
    if(doc.contact.category == req.query.category) {
      return true;
    }
  }
  return false;
}
If this is possible, please describe how to do it. Thanks!
There are some other options.
There's a not-so-well-known JOIN trick in CouchDB. Instead of using replication, however, you'll have to share the results of a MapReduce view; unfortunately, you can't use a view as a filter for replication. If you're using Cloudant (disclaimer: I'm employed by Cloudant), you can use chained MapReduce to output the result to another database that you could then replicate from...
Additionally, I think this SO post/answer on document structures and this join trick could be helpful: Modeling relationships on CouchDB between documents?
No, this is not possible. Each document must be self-contained, so it has no explicit relations to other documents. Using the contact_id value as a reference is just a convention on your side; CouchDB isn't aware of it.
For this to work, the category has to be present in the document the filter sees, e.g. by nesting the related data within the contact so that the filter function has a single document to process. This is also a good solution when you need the contact document to stay in a consistent state.
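Because a filter function only ever sees one document at a time, one workable option is the duplication mentioned in the question: copy the contact's category onto each child document. This is a sketch, assuming every note and appointment carries its own category field (the original child documents do not have one) and that the application updates the copies whenever the contact's category changes:

```javascript
// CouchDB filter function: called once per document with the request object.
function categoryFilter(doc, req) {
  // Contacts and their children are matched on their own `category` field.
  // ASSUMPTION: the category was copied onto each child when it was written.
  if (doc.type === "contact" || doc.contact_id !== undefined) {
    return doc.category === req.query.category;
  }
  return false;
}

// Local check with sample documents (no CouchDB involved).
const req = { query: { category: "customer" } };
const contact = { _id: 123, type: "contact", owner: "jimmy", category: "customer", name: "Bob Jones" };
const sharedNote = { _id: 456, type: "note", owner: "jimmy", contact_id: 123, category: "customer",
                     content: "This is a note about Bob Jones" };
const privateNote = { _id: 789, type: "note", owner: "jimmy", contact_id: 124, category: "personal",
                      content: "A note about a personal contact" };

console.log([contact, sharedNote, privateNote].map(function (d) {
  return categoryFilter(d, req);
})); // [ true, true, false ]
```

The trade-off is the one described in the question: every category change on a contact has to be propagated to all of its children.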
I'm specifically talking about NodeJS with MongoDB (I know MongoDB is schema-less, but let's be realistic about the importance of structuring data for a moment).
Is there some magic solution to minimising the number of queries to a database in regards to authenticating users? For example, if the business logic of my application needs to ensure that a user has the necessary privileges to update/retrieve data from a certain Document or Collection, is there any way of doing this without two calls to the database? One to check the user has the rights, and the other to retrieve the necessary data?
EDIT:
Another question closed by the trigger-happy SO moderators. I agree the question is abstract, but I don't see how it is "not a real question". To put it simply:
What is the best way to reduce the number of calls to a database in role-based applications, specifically in the context of NodeJS + MongoDB? Can it be done? Or is role-based access control for NodeJS + MongoDB inefficient and clumsy?
Presumably you know which document requires which rights. I would guess that it is a field in the document, like:
{ 'foo':'bar',
'canRead':'sales' }
At the start of the session you could query the roles a user has. Say
{ 'user':'shennan',
'roles':[ 'users','west coast','sales'] }
You could store that list of roles in the user's session. With that in hand, all that's left to do is add the roles with an $in operator, like this :
db.test.find({'canRead':{'$in':['users','west coast','sales']}})
Where the value for the $in operator is taken from the user's session. Here is code to try it out on your own, in the mongo console :
db.test.insert( { 'foo':'bar', 'canRead':'sales' })
db.test.insert( { 'foo2':'bar2', 'canRead':['hr','sales'] })
db.test.insert( { 'foo3':'bar3', 'canRead':'hr' })
> db.test.find({}, {_id:0})
{ "foo" : "bar", "canRead" : "sales" }
{ "foo2" : "bar2", "canRead" : [ "hr", "sales" ] }
{ "foo3" : "bar3", "canRead" : "hr" }
Document with 'foo3' can't be read by someone in sales :
> db.test.find({'canRead':{'$in':['users','west coast','sales']}}, {_id:0})
{ "foo" : "bar", "canRead" : "sales" }
{ "foo2" : "bar2", "canRead" : [ "hr", "sales" ] }
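On the Node.js side, the idea is to build the query from the roles cached in the session, so that the permission check and the data retrieval happen in the same call. The sketch below is illustrative only: buildReadQuery and canRead are hypothetical helpers, and canRead merely emulates in memory how $in matches a field that holds either one role or an array of roles:

```javascript
// Roles fetched once at login and kept in the session (shape taken from above).
const session = { user: "shennan", roles: ["users", "west coast", "sales"] };

// Hypothetical helper: the filter object to pass to find().
function buildReadQuery(userSession) {
  return { canRead: { $in: userSession.roles } };
}

// In-memory emulation of the $in match: `canRead` may be a single role or an
// array of roles, and one overlap with the user's roles is enough.
function canRead(doc, roles) {
  const allowed = Array.isArray(doc.canRead) ? doc.canRead : [doc.canRead];
  return allowed.some(function (role) { return roles.indexOf(role) !== -1; });
}

const testDocs = [
  { foo: "bar", canRead: "sales" },
  { foo2: "bar2", canRead: ["hr", "sales"] },
  { foo3: "bar3", canRead: "hr" }
];

const readable = testDocs.filter(function (doc) { return canRead(doc, session.roles); });
console.log(JSON.stringify(buildReadQuery(session)));
console.log(readable.length); // 2
```

The roles in the session go stale if a user's permissions change mid-session, so the session has to be refreshed or invalidated when that happens.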
Definitely do-able, but w/o more context it's hard to determine what's best.
One simple solution that comes to mind is to cache users and their permissions in memory so no DB lookup is required. At this point you can just issue the query for documents where permission match and...
Let me know if you need a few more ideas.
I'm just a beginner with CouchDB, so I may be misunderstanding some things; feel free to teach me and discuss.
Doc Type
- User
- Topic
- Comment
Requirement
- I want to build a web board
- One request should return this complex doc
Output I need: KEY "topic-id", VALUE
{
  _id: "topic-id",
  created_at: "2011-05-30 19:50:22",
  title: "Hello World",
  user: {_id: "user-1", type: "user", username: "dominixz", signature: "http://dominixz.com"},
  comments: [
    {_id: "comment-1", text: "Comment 1", created_at: "2011-05-30 19:50:22", user: {_id: "user-1", type: "user", username: "dominixz", signature: "http://dominixz.com"}},
    {_id: "comment-2", text: "Comment 2", created_at: "2011-05-30 19:50:23", user: {_id: "user-2", type: "user", username: "dominixz2", signature: "http://dominixz1.com"}},
    {_id: "comment-3", text: "Comment 3", created_at: "2011-05-30 19:50:24", user: {_id: "user-3", type: "user", username: "dominixz3", signature: "http://dominixz2.com"}}
  ]
}
I have "user" data like this:
{_id:"user-1",type:"user",username:"dominixz",signature:"http://dominixz.com"}
{_id:"user-2",type:"user",username:"dominixz2",signature:"http://dominixz1.com"}
{_id:"user-3",type:"user",username:"dominixz3",signature:"http://dominixz2.com"}
"Topic" data like this:
{_id:"topic-id",created_at:"2011-05-30 19:50:22",title:"Hello World",user:"user-1"}
"Comment" data like this:
{_id:"comment-1",type:"comment",text:"Comment 1",created_at:"2011-05-30 19:50:22",user:"user-1",topic:"topic-id"}
{_id:"comment-2",type:"comment",text:"Comment 2",created_at:"2011-05-30 19:50:23",user:"user-2",topic:"topic-id"}
{_id:"comment-3",type:"comment",text:"Comment 3",created_at:"2011-05-30 19:50:24",user:"user-3",topic:"topic-id"}
How can I write the map, reduce and list functions to produce this complex data? And what about when I want to use LIMIT and OFFSET like in a SQL database?
Thanks in advance
It's a bit hard to tell what you're looking for here, but I think you're asking for a classic CouchDB join as documented in this web page.
I'd recommend reading the whole thing, but the punchline looks something like this (translated for your data):
function (doc) {
  if (doc.type === 'topic') {
    emit([doc._id, 0, doc.created_at], null);
  } else if (doc.type === 'comment') {
    // key on the parent topic's ID so comments sort right after their topic
    emit([doc.topic, 1, doc.created_at], null);
  }
}
That map will return the topic ID followed by all of its comments in chronological order. The null value keeps the index from getting too large; you can always add include_docs=true to your request to pull full docs when you need them, or you can follow the index best practice of emitting only the bits you are interested in.
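As for LIMIT and OFFSET: CouchDB views accept limit and skip query parameters, and startkey/endkey restrict the rows to one topic. The sketch below simulates that in memory. It assumes topic documents also carry type: "topic" (the sample topic in the question has no type field) and that comments are keyed on doc.topic so they sort under their parent. The queryTopic helper and the simplified key comparison are illustrative stand-ins, not CouchDB's real collation:

```javascript
const forumDocs = [
  { _id: "comment-2", type: "comment", text: "Comment 2", created_at: "2011-05-30 19:50:23", user: "user-2", topic: "topic-id" },
  { _id: "topic-id", type: "topic", created_at: "2011-05-30 19:50:22", title: "Hello World", user: "user-1" },
  { _id: "comment-1", type: "comment", text: "Comment 1", created_at: "2011-05-30 19:50:22", user: "user-1", topic: "topic-id" },
  { _id: "comment-3", type: "comment", text: "Comment 3", created_at: "2011-05-30 19:50:24", user: "user-3", topic: "topic-id" }
];

// Join-style map function, with `emit` passed in so it can run outside CouchDB.
function mapTopicAndComments(doc, emit) {
  if (doc.type === "topic") {
    emit([doc._id, 0, doc.created_at], null);
  } else if (doc.type === "comment") {
    emit([doc.topic, 1, doc.created_at], null);
  }
}

// Simplified key comparison: element by element, assuming the same type at
// each position (real CouchDB collation handles mixed types as well).
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Hypothetical stand-in for a view request such as
//   ?startkey=["topic-id"]&endkey=["topic-id",{}]&skip=N&limit=M
function queryTopic(docs, topicId, skip, limit) {
  const rows = [];
  docs.forEach(function (doc) {
    mapTopicAndComments(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows
    .filter(function (row) { return row.key[0] === topicId; })
    .sort(function (a, b) { return compareKeys(a.key, b.key); })
    .slice(skip, skip + limit);
}

const firstPage = queryTopic(forumDocs, "topic-id", 0, 3);
console.log(firstPage.map(function (r) { return r.id; })); // [ 'topic-id', 'comment-1', 'comment-2' ]
```

In a real database, skip gets slower as the offset grows, so for deep paging it is usually better to remember the last key of a page and use it as the next startkey.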
CouchDB is a document database, not a relational database. As such, it is best suited to documents that encompass all the related data. While you can normalize your schema relational-style like you did, I'd argue that this isn't the best use case for Couch.
If I were to design your CMS in Couch I'd keep the topic, content and comments all in a single document. That would directly solve your problem.
You're free of course to use document stores to emulate relational databases, but that's not their natural use case, which leads to questions like this one.