Mongoose - "object" in "object" [duplicate] - node.js

I want to design a question structure with some comments. Which relationship should I use for comments: embed or reference?
A question with some comments, like stackoverflow, would have a structure like this:
Question
title = 'aaa'
content = 'bbb'
comments = ???
At first, I thought of using embedded comments (I think embed is recommended in MongoDB), like this:
Question
title = 'aaa'
content = 'bbb'
comments = [ { content = 'xxx', createdAt = 'yyy'},
{ content = 'xxx', createdAt = 'yyy'},
{ content = 'xxx', createdAt = 'yyy'} ]
It is clear, but I'm worried about this case: If I want to edit a specified comment, how do I get its content and its question? There is no _id to let me find one, nor question_ref to let me find its question. (Is there perhaps a way to do this without _id and question_ref?)
Do I have to use ref rather than embed? Do I then have to create a new collection for comments?

This is more an art than a science. The Mongo Documentation on Schemas is a good reference, but here are some things to consider:
Put as much in as possible
The joy of a document database is that it eliminates lots of joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure (meaning you can take just the part of the document that you need, so document size shouldn't worry you much), there is no immediate need to normalize data like you would in SQL. In particular, any data that is not useful apart from its parent document should be part of the same document.
Separate data that can be referred to from multiple places into its own collection.
This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places.
Document size considerations
MongoDB imposes a 4MB size limit on a single document (16MB as of version 1.8). In a world of GBs of data this sounds small, but it is also 30 thousand tweets, 250 typical Stack Overflow answers, or 20 Flickr photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases concern about document sizes will be premature optimization.
Complex data structures:
MongoDB can store arbitrarily deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well.)
It has also been pointed out that it is impossible to return a subset of elements in a document. If you need to pick and choose a few bits of each document, it will be easier to separate them out.
Data Consistency
MongoDB makes a trade-off between efficiency and consistency. The rule is: changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic, using for example a "lock" field). When you design your schema, consider how you will keep your data consistent. Generally, the more you keep in a document the better.
For what you are describing, I would embed the comments and give each comment an _id field with an ObjectID. The ObjectID has a timestamp embedded in it, so you can use that instead of a createdAt field if you like.
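To illustrate that last point: the first 4 bytes of a 12-byte ObjectId encode the creation time as a Unix timestamp in seconds, so the creation date can be recovered from the id's hex string without a separate createdAt field. A minimal sketch in plain JavaScript (no driver needed; the sample id is made up):

```javascript
// The first 4 bytes (8 hex chars) of an ObjectId are a Unix
// timestamp in seconds; decode them to get the creation Date.
function objectIdToDate(hexId) {
  var seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// 0x4f90d13a = 1334890810 seconds since the epoch
var created = objectIdToDate('4f90d13a2cd9a4d1f4e2bc21');
console.log(created.toISOString()); // 2012-04-20T03:00:10.000Z
```

Drivers expose the same thing directly (e.g. `ObjectId#getTimestamp()` in the Node driver), but this is all that call does under the hood.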

In general, embed is good if you have one-to-one or one-to-many relationships between entities, and reference is good if you have many-to-many relationships.

Well, I'm a bit late but still would like to share my way of schema creation.
I have schemas for everything that can be described by a word, like you would do it in the classical OOP.
E.G.
Comment
Account
User
Blogpost
...
Every schema can be saved as a Document or Subdocument, so I declare this for each schema.
Document:
Can be used as a reference. (E.g. the user made a comment -> comment has a "made by" reference to user)
Is a "Root" in your application. (E.g. the blogpost -> there is a page about the blogpost)
Subdocument:
Can only be used once / is never a reference. (E.g. Comment is saved in the blogpost)
Is never a "Root" in your application. (The comment just shows up on the blogpost page, but the page is still about the blogpost)

I came across this small presentation while researching this question on my own. I was surprised at how well it was laid out, both the info and the presentation of it.
http://openmymind.net/Multiple-Collections-Versus-Embedded-Documents
It summarized:
As a general rule, if you have a lot of [child documents] or if they are large, a separate collection might be best.
Smaller and/or fewer documents tend to be a natural fit for embedding.

Actually, I'm quite curious why nobody has mentioned the UML specifications. A rule of thumb is that if you have an aggregation, you should use references; but if it is a composition, the coupling is stronger, and you should use embedded documents.
And you will quickly understand why this is logical. If an object can exist independently of the parent, then you will want to access it even if the parent doesn't exist. As you just can't embed it in a non-existing parent, you have to make it live in its own data structure. And if a parent exists, just link them together by adding a reference to the object in the parent.
Not sure what the difference between the two relationships is?
Here is a link explaining them:
Aggregation vs Composition in UML

If I want to edit a specified comment, how to get its content and its question?
You can query by sub-document: db.question.find({'comments.content' : 'xxx'}).
This will return the whole Question document. To edit the specified comment, you then have to find the comment on the client, make the edit and save that back to the DB.
In general, if your document contains an array of objects, you'll find that those sub-objects will need to be modified client side.
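A minimal sketch of that round trip in plain JavaScript (the question object below stands in for the document the find above would return; the save call is only indicated in a comment, since no database is involved here):

```javascript
// Stand-in for the document returned by
// db.question.find({'comments.content': 'xxx'}).
var question = {
  title: 'aaa',
  content: 'bbb',
  comments: [
    { content: 'xxx', createdAt: 'yyy' },
    { content: 'zzz', createdAt: 'yyy' }
  ]
};

// 1. Locate the sub-document client side...
var comment = question.comments.find(function (c) {
  return c.content === 'xxx';
});

// 2. ...edit it in place...
comment.content = 'edited text';

// 3. ...then save the whole document back, e.g. with
//    db.question.save(question) (not executed here).
console.log(question.comments[0].content); // edited text
```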

Yes, we can use references in a document and populate the referenced documents, much like SQL joins. MongoDB doesn't have joins to map one-to-many relationships between documents; instead, we can use populate for this scenario.
var mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var personSchema = Schema({
  _id: Number,
  name: String,
  age: Number,
  stories: [{ type: Schema.Types.ObjectId, ref: 'Story' }]
});

var storySchema = Schema({
  _creator: { type: Number, ref: 'Person' },
  title: String,
  fans: [{ type: Number, ref: 'Person' }]
});
Population is the process of automatically replacing the specified paths in a document with the document(s) from other collection(s). We may populate a single document, multiple documents, a plain object, multiple plain objects, or all objects returned from a query.
For more information, visit: http://mongoosejs.com/docs/populate.html
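Conceptually, populate is just a second query plus an in-memory substitution. A rough sketch of that substitution in plain JavaScript (the data is made up for illustration; no database involved):

```javascript
// A story referencing its creator by _id, and the (hypothetical)
// result of a second query against the people collection.
var story = { _creator: 0, title: 'Once upon a time...' };
var people = [{ _id: 0, name: 'Aaron', age: 100 }];

// What populate('_creator') effectively does: swap the stored id
// for the matching document from the other collection.
function populateCreator(story, people) {
  var byId = {};
  people.forEach(function (p) { byId[p._id] = p; });
  story._creator = byId[story._creator];
  return story;
}

populateCreator(story, people);
console.log(story._creator.name); // Aaron
```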

I know this is quite old, but if you are looking for the answer to the OP's question on how to return only the specified comment, you can use the $ (positional projection) operator like this:
db.question.find({'comments.content': 'xxx'}, {'comments.$': 1})

MongoDB gives you the freedom to be schema-less, and this feature can result in pain in the long term if not thought through and planned well.
There are 2 options: either Embed or Reference. I will not go through the definitions, as the above answers have defined them well.
When embedding, you should ask one question: is your embedded document going to grow, and if yes, by how much? (Remember, there is a limit of 16 MB per document.) So if you have something like comments on a post, what is the limit on the comment count if that post goes viral and people start adding comments? In such cases, a reference could be a better option (but even an array of references can grow and reach the 16 MB limit).
So how do you balance it? The answer is a combination of different patterns. Check these links and create your own mix and match based on your use case:
https://www.mongodb.com/blog/post/building-with-patterns-a-summary
https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

If I want to edit a specified comment, how do I get its content and its question?
If you had kept track of the number of comments and the index of the comment you wanted to alter, you could use the dot operator (SO example).
For example, you could do (note the $set, without which the update would replace the whole document):
db.questions.update(
  { "title": "aaa" },
  { $set: { "comments.0.content": "new text" } }
)
(as another way to edit the comments inside the question)

Related

How should I structure my MongoDB database? A lot of small documents or fewer embedded documents?

I am new to Python and very new to MongoDB. I made an application to store trivia questions, currently in a JSON file. This is what the overall structure looks like:
This is an example of single-answer questions in the art category:
And this is an example of multiple-choice questions in the art category:
As you can see, in both cases I use the question itself as the key and its answer as the value. So, to get the answer to a question I would just do: answers = dictionary["multiple"]["art"]["What is a sitar?"] and I would get:
["Instrument",
"Food",
"Insect",
"Temple"]
My application runs as you would expect. When I get a new question, I know its subject (art, biology, etc.). If the question doesn't exist, I just add it in the right category.
I want to move all my saved questions and answers into a MongoDB database. But if I add the whole JSON as a single document in a collection, whenever I do a query to look for a question : answer pair, the whole document is returned, since it is the only one.
If I try to make 2 documents ("single", "multiple"), it will still return the whole "single" document. If I go even lower and make just "art", "biology", etc. documents, I will have duplicates, since I have "art" for both singleAnswer and multipleChoice. Should I just name the documents "single.art", "multiple.art"? If so, what would a query for the below condition look like?
if not "What is a sitar?" in dictionary["multiple"]["art"]:
    dictionary["multiple"]["art"]["What is a sitar?"] = ["Instrument", "Food", "Insect", "Temple"]
I have done all of the scenarios mentioned above except the last one, and I found that every time I query, the whole object is returned when all I need is a single question and its answer (if it exists). Am I missing something, or am I perhaps expecting this to still work like a JSON dictionary? Thank you!
Edit:
Found this in the MongoDB documentation. Would my scenario qualify as a hierarchical relationship? Meaning that every question belongs to a certain subject and every subject to its own category (single, multiple):
Documents can be nested to express hierarchical relationships and to store structures such as arrays.
I don't think your design is very smart. Dynamic field names are usually difficult to handle, the queries are complex and it is very hard to index them.
I would propose a data model like this:
{
  type: "multiple",
  category: "art",
  questions: [
    {
      question: "What is a sitar?",
      choice: ["Instrument", "Food", "Insect", "Temple", "Village"],
      answers: ["Instrument", "Village"]
    },
    {
      question: ...,
      choice: ...
    }
  ]
}
Or even one document for each question.
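With that model, fetching a single answer becomes a match on questions.question instead of a dynamic field lookup. A sketch of the client-side part in plain JavaScript, run against an in-memory stand-in for one document of the collection (the findAnswers helper is illustrative, not from the original post):

```javascript
// In-memory stand-in for one document of the proposed collection.
var doc = {
  type: "multiple",
  category: "art",
  questions: [
    {
      question: "What is a sitar?",
      choice: ["Instrument", "Food", "Insect", "Temple"],
      answers: ["Instrument"]
    }
  ]
};

// Client-side equivalent of picking the matched sub-document out of
// a find() on {type, category, 'questions.question'}.
function findAnswers(doc, questionText) {
  var match = doc.questions.find(function (q) {
    return q.question === questionText;
  });
  return match ? match.answers : null;
}

console.log(findAnswers(doc, "What is a sitar?")); // [ 'Instrument' ]
console.log(findAnswers(doc, "Unknown question")); // null
```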

Which mongoose model would be more efficient?

I am new to NoSQL. I am trying to build a simple e-commerce app in Node.js. For the products I need to build CRUD operations so that only the owner can edit them and the rest have read-only access. The main question is: which would be the better implementation?
The current code that I have looks like this:
var mongoose = require('mongoose');

module.exports = mongoose.model('product', new mongoose.Schema({
  owner: { type: String },
  title: { type: String },
  ...
}));
The owner is actually the _id from my user model, so basically something like a foreign key. Is this a valid way to go about it, or should I add an array inside the user model to store the list of products that he owns?
I would also like to know whether what I just did for owner, storing the user ID as a String, is best practice, or whether I should do something else to reference the user model.
Thanks in advance for helping me out.
The whole point of document databases is that you shouldn't have foreign relationships; all the data your document needs should be denormalized into the document.
So inside your product document, you should be duplicating all the owner details you need. You can store their _id as well for lookup, but don't use a string for this, use an actual ObjectId().
For more about denormalization see The Little MongoDB Book
Yet another alternative to using joins is to denormalize your data. Historically, denormalization was reserved for performance-sensitive code, or when data should be snapshotted (like in an audit log). However, with the ever-growing popularity of NoSQL, many of which don't have joins, denormalization as part of normal modeling is becoming increasingly common. This doesn't mean you should duplicate every piece of information in every document. However, rather than letting fear of duplicate data drive your design decisions, consider modeling your data based on what information belongs to what document.
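As a sketch of what that denormalization might look like for the product case (the field names and values here are illustrative, not from the original code):

```javascript
// A hypothetical denormalized product document: the owner's display
// data is copied in, alongside the id kept for lookups.
var product = {
  _id: 'productObjectId',   // stand-in for a real ObjectId
  title: 'Blue widget',
  owner: {
    _id: 'ownerObjectId',   // keep the reference for lookups...
    name: 'Jane Seller'     // ...but duplicate what you display
  }
};

// Rendering a product page now needs no second query:
var caption = product.title + ' sold by ' + product.owner.name;
console.log(caption); // Blue widget sold by Jane Seller
```

The trade-off is on write: if the owner changes their name, every product document carrying the copy must be updated.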

MongoDB query comments along with user information

I am creating an application with Node.js and the MongoDB native driver (not Mongoose). I have a problem that has given me a headache for a few days; please, anyone, suggest a way to solve this!
I have a MongoDB design like this:
post {
  _id: ObjectId(...),
  picture: 'some_url',
  comments: [
    {
      _id: ObjectId(...),
      user_id: ObjectId('123456'),
      body: "some content"
    },
    {
      _id: ObjectId(...),
      user_id: ObjectId('...'),
      body: "other content"
    }
  ]
}
user {
  _id: ObjectId('123456'),
  name: 'some name',      // changeable at any time
  username: 'some_name',  // changeable at any time
  picture: 'url_link'     // changeable at any time
}
I want to query the post along with all the user information, so the result will look like this:
[{
  _id: ObjectId(...),
  picture: 'some_url',
  comments: [
    {
      _id: ObjectId(...),
      user_id: ObjectId('123456'),
      user_data: {
        _id: ObjectId('123456'),
        name: 'some name',
        username: 'some_name',
        picture: 'url_link'
      },
      body: "some content"
    },
    {
      _id: ObjectId(...),
      user_id: ObjectId('...'),
      body: "other content"
    }
  ]
}]
I tried to use a loop to manually get the user data and add it to each comment, but it proved to be difficult and not achievable with my coding skill :(
Anybody with any suggestion, I would really appreciate it.
P.S. I am trying another approach in which I would embed all the user data into the comment, and whenever users update their username, name, or picture, the change would be applied to all their comments as well.
The problem(s)
As written before, there are several problems when over-embedding:
Problem 1: BSON size limit
As of the time of this writing, BSON documents are limited to 16MB. If that limit is reached, MongoDB would throw an exception and you simply could not add more comments and in worst case scenarios not even change the (user-)name or the picture if the change would increase the size of the document.
Problem 2: Query limitations and performance
It is not easily possible to query or sort the comments array under certain conditions. Some things would require a rather costly aggregation, others rather complicated statements.
While one could argue that once the queries are in place, this isn't much of a problem, I beg to differ. First, the more complicated a query is, the harder it is to optimize, both for the developer and subsequently for MongoDB's query optimizer. I have had the best results with simplifying data models and queries, speeding up responses by a factor of 100 in one instance.
When scaling, the resources needed for complicated and/or costly queries might even add up to whole machines when compared to a simpler data model and its corresponding queries.
Problem 3: Maintainability
Last but not least you might well run into problems maintaining your code. As a simple rule of thumb
The more complicated your code becomes, the harder it is to maintain. The harder code is to maintain, the more time it needs to maintain the code. The more time it needs to maintain code, the more expensive it gets.
Conclusion: Complicated code is expensive.
In this context, "expensive" both refers to money (for professional projects) and time (for hobby projects).
(My!) Solution
It is pretty easy: simplify your data model. Consequently, your queries will become less complicated and (hopefully) faster.
Step 1: Identify your use cases
That's going to be a wild guess for me, but the important thing here is to show you the general method. I'd define your use cases as follows:
For a given post, users should be able to comment
For a given post, show the author and the comments, along with the commenters and authors username and their picture
For a given user, it should be easily possible to change the name, username and picture
Step 2: Model your data accordingly
Users
First of all, we have a straightforward user model
{
  _id: new ObjectId(),
  name: "Joe Average",
  username: "HotGrrrl96",
  picture: "some_link"
}
Nothing new here, added just for completeness.
Posts
{
  _id: new ObjectId(),
  title: "A post",
  content: "Interesting stuff",
  picture: "some_link",
  created: new ISODate(),
  author: {
    username: "HotGrrrl96",
    picture: "some_link"
  }
}
And that's about it for a post. There are two things to note here: first, we store the author data we immediately need when displaying a post, since this saves us a query for a very common, if not ubiquitous, use case. Why don't we store the comments and commenters' data accordingly? Because of the 16MB size limit, we avoid keeping an ever-growing array of references inside a single document. Instead, each comment document stores a reference to its post:
Comments
{
  _id: new ObjectId(),
  post: someObjectId,
  created: new ISODate(),
  commenter: {
    username: "FooBar",
    picture: "some_link"
  },
  comment: "Awesome!"
}
As with posts, we have all the necessary data for displaying a comment.
The queries
What we have achieved now is that we circumvented the BSON size limit and we don't need to refer to the user data in order to be able to display posts and comments, which should save us a lot of queries. But let's come back to the use cases and some more queries
Adding a comment
That's totally straightforward now.
Getting all or some comments for a given post
For all comments
db.comments.find({post:objectIdOfPost})
For the 3 latest comments
db.comments.find({post:objectIdOfPost}).sort({created:-1}).limit(3)
So for displaying a post and all (or some) of its comments including the usernames and pictures we are at two queries. More than you needed before, but we circumvented the size limit and basically you can have an indefinite number of comments for every post. But let's get to something real
Getting the latest 5 posts and their latest 3 comments
This is a two step process. However, with proper indexing (will come back to that later) this still should be fast (and hence resource saving):
var posts = db.posts.find().sort({created: -1}).limit(5);
posts.forEach(
  function(post) {
    doSomethingWith(post);
    var comments = db.comments.find({"post": post._id}).sort({created: -1}).limit(3);
    doSomethingElseWith(comments);
  }
);
Get all posts of a given user sorted from newest to oldest and their comments
var posts = db.posts.find({"author.username": "HotGrrrl96"}, {_id: 1}).sort({"created": -1});
var postIds = [];
posts.forEach(
  function(post) {
    postIds.push(post._id);
  }
);
var comments = db.comments.find({post: {$in: postIds}}).sort({post: 1, created: -1});
Note that we have only two queries here. Although you need to "manually" make the connection between posts and their respective comments, that should be pretty straightforward.
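That manual connection can be a single grouping pass; a sketch in plain JavaScript over stand-ins for the two result sets (no database involved; the sample data is made up):

```javascript
// Stand-ins for the two query results: the user's posts, and the
// comments fetched with {post: {$in: postIds}}.
var posts = [
  { _id: 'p1', title: 'A post' },
  { _id: 'p2', title: 'Another post' }
];
var comments = [
  { post: 'p1', comment: 'Awesome!' },
  { post: 'p2', comment: 'Nice.' },
  { post: 'p1', comment: 'Agreed.' }
];

// Group the comments by post id, then attach each group to its post.
function attachComments(posts, comments) {
  var byPost = {};
  comments.forEach(function (c) {
    (byPost[c.post] = byPost[c.post] || []).push(c);
  });
  posts.forEach(function (p) {
    p.comments = byPost[p._id] || [];
  });
  return posts;
}

attachComments(posts, comments);
console.log(posts[0].comments.length); // 2
```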
Change a username
This presumably is a rarely executed use case. However, it isn't very complicated with said data model.
First, we change the user document
db.users.update(
  { username: "HotGrrrl96" },
  {
    $set: { username: "Joe Cool" },
    $push: { oldUsernames: "HotGrrrl96" }
  },
  {
    writeConcern: { w: "majority" }
  }
);
We push the old username to an according array. This is a security measure in case something goes wrong with the following operations. Furthermore, we set the write concern to a rather high level in order to make sure the data is durable.
db.posts.update(
  { "author.username": "HotGrrrl96" },
  { $set: { "author.username": "Joe Cool" } },
  {
    multi: true,
    writeConcern: { w: "majority" }
  }
)
Nothing special here. The update statement for the comments looks pretty much the same. While those queries take some time, they are rarely executed.
The indices
As a rule of thumb, one can say that MongoDB can only use one index per query. While this is not entirely true since there are index intersections, it is easy to deal with. Another thing is that the leading fields of a compound index (index prefixes) can be used independently. So an easy approach to index optimization is to find the query with the most fields used in operations which make use of indices and create a compound index of them. Note that the order of the fields in the index matters. So, let's go ahead.
Posts
db.posts.createIndex({"author.username":1,"created":-1})
Comments
db.comments.createIndex({"post":1, "created":-1})
Conclusion
A fully embedded document per post admittedly is the fastest way of loading it and its comments. However, it does not scale well, and due to the nature of the possibly complex queries necessary to deal with it, this performance advantage may be reduced or even eliminated.
With the above solution, you trade some speed (if!) against basically unlimited scalability and a much more straightforward way of dealing with the data.
Hth.
You are following the normalized data model approach. With this model, you have to write another query to get the user info. If you use the embedded document approach instead, then all the comments' copies of the user data must change whenever the user document is updated.
http://docs.mongodb.org/v3.0/reference/database-references/ read this link for more information.

Mongo DB relations between documents in different collections

I'm not yet ready to let this go, which is why I re-thought the problem and edited the Q (original below).
I am using mongoDB for a weekend project and it requires some relations in the DB, which is what the misery is all about:
I have three collections:
Users
Lists
Texts
A user can have texts and lists - lists 'contain' texts. Texts can be in multiple lists.
I decided to go with separate collections (not embeds) because child documents don't always appear in context of their parent (eg. all texts, without being in a list).
So what needs to be done is reference the texts that belong into certain lists with exactly those lists. There can be unlimited lists and texts, though lists will be less in comparison.
In contrast to what I first thought of, I could also put the reference in every single text-document and not all text-ids in the list-documents. It would actually make a difference, because I could get away with one query to find every snippet in a list. Could even index that reference.
var TextSchema = new Schema({
  _id: Number,
  name: String,
  inListID: { type: Array, "default": [] },
  [...]
It is also rather seldom the case that texts will be in MANY lists, so the array would not really explode. The question still remains, though: is there a chance this scales, or is there a better way of implementing it with MongoDB? Would it help to limit the number of lists a text can be in (probably)? Is there a recipe for few:many relations?
It would even be awesome to get references to projects where this has been done and how it was implemented (few:many relations). I can't believe everybody shies away from mongo DB as soon as some relations are needed.
Original Question
I'll break it down in two problems I see so far:
1) Let's assume a list consists of 5 texts. How do I reference the texts contained in a list? Just open an array and store the text's _ids in there? Seems like those arrays might grow to the moon and back, slowing the app down? On the other hand texts need to be available without a list, so embedding is not really an option. What if I want to get all texts of a list that contains 100 texts.. sounds like two queries and an array with 100 fields :-/. So is this way of referencing the proper way to do it?
var ListSchema = new Schema({
  _id: Number,
  name: String,
  textids: { type: Array, "default": [] },
  [...]
Problem 2) What I see with this approach is cleaning up the references if a text is deleted. Its reference would still be in every list that contained the text, and I wouldn't want to iterate through all the lists to clean out those dead references. Or would I? Is there a smart way to solve this? Just making the texts hold the reference (which lists they are in) merely moves the problem around, so that's not an option.
I guess I'm not the first with this sort of problem but I was also unable to find a definitive answer on how to do it 'right'.
I'm also interested in general thoughts on best-practice for this sort of referencing (many-to-many?) and especially scalability/performance.
Relations are usually not a big problem, though certain operations involving relations might be. That depends largely on the problem you're trying to solve, and very strongly on the cardinality of the result set and the selectivity of the keys.
I have written a simple testbed that generates data following a typical long-tail distribution to play with. It turns out that MongoDB is usually better at relations than people believe.
After all, there are only three differences to relational databases:
Foreign key constraints: You have to manage these yourself, so there's some risk for dead links
Transaction isolation: Since there are no multi-document transactions, there's some likelihood for creating invalid foreign key constraints even if the code is correct (in the sense that it never tries to create a dead link), but merely interrupted at runtime. Also, it is hard to check for dead links because you could be observing a race condition
Joins: MongoDB doesn't support joins, though a manual subquery with $in does scale well up to several thousand items in the $in-clause, provided the reference values are indexed, of course
Iff you need to perform large joins, i.e. if your queries are truly relational and you need large amount of the data joined accordingly, MongoDB is probably not a good fit. However, many joins required in relational databases aren't truly relational, they are required because you had to split up your object to multiple tables, for instance because it contains a list.
An example of a 'truly' relational query could be "Find me all customers who bought products that got >4 star reviews by customers that ranked high in turnover in June". Unless you have a very specialized schema that essentially was built to support this query, you'll most likely need to find all the orders, group them by customer ids, take the top n results, use these to query ratings using $in and use another $in to find the actual customers. Still, if you can limit yourself to the top, say 10k customers of June, this is three round-trips and some fast $in queries.
That will probably be in the range of 10-30ms on typical cloud hardware as long as your queries are supported by indexes in RAM and the network isn't completely congested. In this example, things get messy if the data is too sparse, i.e. the top 10k users hardly wrote >4 star reviews, which would force you to write program logic that is smart enough to keep iterating the first step which is both complicated and slow, but if that is such an important scenario, there is probably a better suited data structure anyway.
Using MongoDB with references is a gateway to performance issues. This is a perfect example of what not to do: an m:n kind of relation where m and n can scale to millions. MongoDB works well where we have 1:n(few), 1:n(many), m(few):n(many), but not in situations where you have m(many):n(many). It will obviously result in 2 queries and a lot of housekeeping.
I am not sure whether this question is still relevant, but I have similar experience.
First of all, I want to state what the official Mongo documentation tells us:
Use embedded data models when you have a one-to-one or one-to-many model.
For many-to-many models, use relationships with document references.
I think that is the answer :) but this answer brings a lot of problems, because:
As was mentioned, Mongo doesn't provide transactions at all.
And you don't have foreign key constraints.
Even if you have references (DBRefs) between documents, you will be faced with the amazing problem of how to dereference these documents.
Each of these items is a huge piece of responsibility, even if you are working on a weekend project, and it might mean that you have to write a lot of code to provide simple behaviour for your system (for example, you can see how to realize a transaction in Mongo here).
I have no idea how to implement foreign key constraints, and I haven't seen anything in this direction in the Mongo documentation; that's why I think it is an amazing challenge (and a risk for the project).
And lastly, Mongo references are not MySQL joins: you don't receive all the data from the parent collection together with the data from the child collection (like all fields from a table plus all fields from a joined table in MySQL). You receive just a REFERENCE to another document in another collection, and you need to do something with this reference (dereference it).
This can be easily achieved in Node with a callback, but only in the case when you need just one text from one list. If you need all the texts in one list, it's terrible, and if you need all the texts in more than one list, it becomes a nightmare...
Perhaps this is not the best experience... but I think you should think about it...
Using arrays in MongoDB is generally not preferable, and generally not advised by experts.
Here is a solution that came to my mind :
Each document in Users is always unique. There can be Lists and Texts for each individual document in Users. Therefore, Lists and Texts have a field for USER ID, which will be the _id of the Users document.
Lists always have an owner in Users, so they are stored as they are.
The owner of Texts can be either a User or a List, so you should also keep a field for LIST ID in it, which will be the _id of the Lists document.
Now mind that Texts cannot have both USER ID and LIST ID set, so you will have to enforce the condition that exactly one of the two is set and the other is null, so that we can easily know who the primary owner of the Texts is.
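That "exactly one owner" rule can be checked in application code before saving; a small hypothetical sketch (the field names userId and listId are illustrative, not from the original answer):

```javascript
// A text must be owned by exactly one of: a user, or a list.
function hasExactlyOneOwner(text) {
  var owners = [text.userId, text.listId].filter(function (id) {
    return id !== null && id !== undefined;
  });
  return owners.length === 1;
}

console.log(hasExactlyOneOwner({ userId: 'u1', listId: null })); // true
console.log(hasExactlyOneOwner({ userId: 'u1', listId: 'l1' })); // false
console.log(hasExactlyOneOwner({ userId: null, listId: null })); // false
```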
Writing an answer as I want to explain how I will proceed from here.
Taking into consideration the answers here and my own research on the topic, it might actually be fine storing those references (not really relations) in an array, trying to keep it relatively small: fewer than 1000 entries is very likely in my case.
Especially because I can get away with one query (which I first thought I couldn't) that doesn't even require using $in so far, I'm confident that the approach will scale. After all it's 'just a weekend project', so if it doesn't and I end up re-writing - that's fine.
With a text-schema like this:
var textSchema = new Schema({
  _id: { type: Number, required: true, index: { unique: true } },
  ...
  inList: { type: [Number], "default": [], index: true }
});
I can simply get all texts in a list with this query, where inList is an indexed array containing the _ids of the lists the text belongs to:
Text.find({inList: listID}, function(err, texts) {
  ...
});
I will still have to deal with foreign key constraints and write my own "clean-up" functions that take care of removing references if a list is removed - removing the reference in every text that was in the list.
Luckily this will happen very rarely, so I'm okay with going through every text once in a while.
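That clean-up boils down to one multi-update, something like Text.update({inList: listID}, {$pull: {inList: listID}}, {multi: true}); its effect, simulated in plain JavaScript over an in-memory stand-in for the texts collection:

```javascript
// Stand-in for the texts collection; inList mirrors the schema above.
var texts = [
  { _id: 1, inList: [10, 20] },
  { _id: 2, inList: [20] },
  { _id: 3, inList: [30] }
];

// Effect of a $pull multi-update when list 20 is removed: strip the
// id from every text's inList array.
function removeListRefs(texts, listID) {
  texts.forEach(function (t) {
    t.inList = t.inList.filter(function (id) { return id !== listID; });
  });
}

removeListRefs(texts, 20);
console.log(texts.map(function (t) { return t.inList; }));
// [ [ 10 ], [], [ 30 ] ]
```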
On the other hand I don't have to care about deleting references in a list-document if a text is removed, because I only store the reference on one side of the relation (in the text-document). Quite an important point in my opinion!
@mnemosyn: thanks for the link and for pointing out that this is indeed not a large join, or in other words just a very simple relation. Some numbers on how long those complex operations take (hardware-dependent, of course) are also a big help.
PS: Greetings from Bielefeld.
What I found most helpful during my own research was this video, in which Alvin Richards also talks about many-to-many relations at around minute 17. This is where I got the idea of making the relation one-sided to save myself some work cleaning up dead references.
Thanks for the help guys

What is the best practice for mongoDB to handle 1-n n-n relationships?

In a relational database, 1-n and n-n relationships mean two or more tables.
But in MongoDB, since it is possible to store those things directly in one model like this:
Article {
    content: String,
    uid: String,
    comments: [Comment]
}
I am getting confused about how to manage these relations. For example, in the article-comments model, should I store all the comments directly in the article model and read the entire article object out as JSON every time? But what if the comments grow really large? If there are 1,000 comments in an article object, will such a strategy make every GET very slow?
I am by no means an expert on this, however I've worked through similar situations before.
From the few demos I've seen, yes, you should store all the comments directly inline. This is going to give you the best performance (unless you're expecting some ridiculous number of comments). This way you have everything in your document.
In the future, if things start going great and you do notice things slowing down, you could do a few things. You could store only the latest (insert arbitrary number here) comments along with a reference to where the other comments are stored, then map-reduce old comments out into a "bucket" to keep loading times quick.
However initially I'd store it in one document.
So you would have a model that looked something like this:
Article {
    content: String,
    uid: String,
    comments: [
        { "comment": "hi", "user": "jack" },
        { "comment": "hi", "user": "jack" }
    ],
    "oldCommentsIdentifier": 12345
}
Then only populate oldCommentsIdentifier if you have actually moved comments out of your comments array; I really wouldn't do this for fewer than 1,000 comments, and maybe even more. It would take a bit of testing to see where the "sweet spot" is.
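The "spill old comments into a bucket" idea above can be sketched in a few lines. This is a minimal sketch, not the author's implementation: the threshold, the function name, and the way the bucket id is recorded are all illustrative. Oldest comments are assumed to sit at the front of the array.

```javascript
// Sketch: keep only the newest MAX_EMBEDDED comments embedded in the article
// and return the overflow, which would then be written to a separate bucket
// document identified by bucketId. All names and numbers are illustrative.
const MAX_EMBEDDED = 2;

function spillOldComments(article, bucketId) {
  if (article.comments.length <= MAX_EMBEDDED) return [];
  // Remove the oldest comments (front of the array) in one splice.
  const overflow = article.comments.splice(0, article.comments.length - MAX_EMBEDDED);
  article.oldCommentsIdentifier = bucketId; // points at the bucket document
  return overflow;
}

const article = { comments: [{ c: 'a' }, { c: 'b' }, { c: 'c' }] };
const overflow = spillOldComments(article, 12345);
console.log(overflow.length);         // 1
console.log(article.comments.length); // 2
```

In a real system this would run as a periodic job or a hook that fires when the embedded array crosses the threshold.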
I think a large part of the answer depends on how many comments you are expecting. Having a document that contains an array that could grow to an arbitrarily large size is a bad idea, for a couple reasons. First, the $push operator tends to be slow because it often increases the size of the document, forcing it to be moved. Second, there is a maximum BSON size of 16MB, so eventually you will not be able to grow the array any more.
If you expect each article to have a large number of comments, you could create a separate "comments" collection, where each document has an "article_id" field that contains the _id of the article that it is tied to (or the uid, or some other field unique to the article). This would make retrieving all comments for a specific article easy, by querying the "comments" collection for any documents whose "article_id" field matches the article's _id. Indexing this field would make the query very fast.
The link that limelights posted as a comment on your question is also a great reference for general tips about schema design.
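The referenced design described in this answer can be sketched as follows. In Mongoose it would be roughly `const commentSchema = new Schema({ article_id: { type: Schema.Types.ObjectId, index: true }, ... })` queried with `Comment.find({ article_id: article._id })`; the plain-JavaScript lookup below (with made-up sample data) mirrors what that indexed query returns.

```javascript
// Sketch: comments live in their own collection, each carrying the parent
// article's _id. Finding an article's comments is one filter on that field.
function commentsForArticle(comments, articleId) {
  return comments.filter(c => c.article_id === articleId);
}

const comments = [
  { article_id: 'a1', text: 'first' },
  { article_id: 'a2', text: 'other' },
  { article_id: 'a1', text: 'second' },
];
console.log(commentsForArticle(comments, 'a1').length); // 2
```

With an index on article_id, the real query stays fast no matter how many comments accumulate, which is the point of this design.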
But if I solve this problem by linking articles and comments via _id, won't it kind of go back to relational database design? And somehow lose the essence of NoSQL?
Not really. NoSQL isn't all about embedding models. In fact, embedding should be considered carefully in your scenario.
It is true that the aggregation framework solves quite a few of the problems you can get from embedding objects that you need to use as documents themselves. I define subdocuments that need to be used as documents as:
Documents that need to be paged in the interface
Documents that might exist across multiple root documents
Documents that require advanced sorting within their group
Documents that, when grouped, will exceed the root document's 16 MB limit
As I said, the aggregation framework does solve this a little; however, you're still looking at performing a query that, in real time or close to it, behaves much like performing the same query in SQL over the same number of documents.
This effect is not always desirable.
You can achieve paging (sort of) of subdocuments with normal querying using the $slice operator, but then this can run into pretty much the same problems as using skip() and limit() over large result sets, which again is undesirable since you cannot fix it so easily with a range query (the aggregation framework would be required again). Even with 1,000 subdocuments I have seen speed problems, and not just me but other people too.
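The $slice-style paging mentioned above looks roughly like `Article.find({ _id: id }, { comments: { $slice: [skip, limit] } })`, which projects only one window of the embedded array. As a sketch, the helper below computes the same window in plain JavaScript; the function name and page numbering are assumptions for illustration.

```javascript
// Sketch: compute one page of an embedded comments array, mirroring what a
// { comments: { $slice: [skip, limit] } } projection would return.
function pageOfComments(comments, page, pageSize) {
  const skip = page * pageSize;
  return comments.slice(skip, skip + pageSize);
}

const all = ['c0', 'c1', 'c2', 'c3', 'c4'];
console.log(pageOfComments(all, 1, 2)); // ['c2', 'c3']
```

Note that, like skip()/limit(), this still walks past the skipped entries, which is exactly the scaling concern raised above for large result sets.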
So let's get back to the original question: how to manage the schema.
Now the answer, which you're not going to like, is: it all depends.
Do your comments satisfy the needs that suggest they should be separate? If so, then that is probably a good bet.
There is no single best way to do this. In MongoDB you should design your collections according to the application that is going to use them.
If your application needs to display comments together with the article, then it is better to embed those comments in the article collection. Otherwise, you will end up with several round trips to your database.
There is one scenario where embedding does not work: document size is limited to 16 MB in MongoDB. That is actually quite large, but if you think your document size could exceed this limit, it is better to have a separate collection.

Resources