In the example below, I want to retrieve a post document populated with its corresponding comments.
Do I need to hold references to the comments in the post as an array? Holding an array of comment references would mean that the post document gets updated quite regularly, whenever a new comment is created.
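var mongoose = require('mongoose');
var Schema = mongoose.Schema;
// A post only knows its author; it holds no list of comments.
var Post = new Schema({
  content: String,
  author: { type: Schema.ObjectId, ref: 'User' },
  createdOn: Date
});
// Each comment points back to the post it belongs to.
var Comment = new Schema({
  post: { type: Schema.ObjectId, ref: 'Post' },
  commentText: String,
  author: { type: Schema.ObjectId, ref: 'User' },
  createdOn: Date
});
mongoose.model('Post', Post);
mongoose.model('Comment', Comment);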
If I store the comments' _id values in Post as an array (say, in a field named comments), then in Mongoose, where Post is the model, I can populate it like this:
Post.findById(postId).populate('comments').exec(callback);
But I do not want to update the post document whenever a new comment is created, which is what pushing the comment id into the post document requires. Comment already has a reference to the Post it belongs to, so technically it is possible to retrieve the comments belonging to a particular post. But if I want to retrieve or send a single JSON document, is that possible without an array of comment references in the post document?
As always with data modelling, there are multiple ways to do things, each with advantages and disadvantages. Often more than one way is good enough for your app.
The question you are asking is about a typical 1:n relationship, so you may:
1. store back-references from the children to the parent
2. store an array of references to the children on the parent
3. store two-way references, doing both 1. and 2.
4. store the comments directly inside the post, as an array of sub-documents (comment objects) inside the post document.
Each of these is correct and works. Which one is best really depends on how your app uses the data. Examples:
If many comments are created concurrently on one post, then 1. is probably a good choice, because you don't need to update the post object concurrently. But you will always need two queries to display a post with its comments.
If you never display a comment without its post, and each comment always belongs to exactly one post, 4. may be a good choice, and probably the typical choice with MongoDB (a NoSQL choice): only one query is needed to display a post with its comments.
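If individual posts can have lots of comments and it is important for your performance to be able to load only a subset of them, 4. is a bad choice (MongoDB also caps a single document at 16 MB, so an unbounded embedded array can eventually hit that limit).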
3. is probably best if you need full flexibility, i.e. whenever you cannot predict the usage and performance is not an issue.
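To make the trade-off concrete, here is a minimal in-memory sketch in plain JavaScript, with arrays standing in for collections (all data and function names are hypothetical, not Mongoose APIs). Option 1 needs two lookups but can still return a single JSON document, which is what the question asks for; option 4 needs one lookup, and loading only a subset of comments means slicing the embedded array (MongoDB's `$slice` projection does this server-side). In Mongoose specifically, a virtual populate on the post schema, `schema.virtual('comments', { ref: 'Comment', localField: '_id', foreignField: 'post' })`, automates option 1 without storing any array on the post; the virtual is included in JSON output only if the schema sets `toJSON: { virtuals: true }`.

```javascript
// Hypothetical in-memory "collections"; real code would query MongoDB instead.
const posts = [{ _id: 'p1', content: 'Hello', author: 'u1' }];
const comments = [
  { _id: 'c1', post: 'p1', commentText: 'First!', author: 'u2' },
  { _id: 'c2', post: 'p1', commentText: 'Nice post', author: 'u3' },
  { _id: 'c3', post: 'p2', commentText: 'Other post', author: 'u2' },
];

// Option 1: children reference the parent. Two lookups (the post, then its
// comments), merged into one JSON-ready object. Adding a comment never
// touches the post.
function getPostWithComments(postId) {
  const post = posts.find((p) => p._id === postId); // query 1
  if (!post) return null;
  const postComments = comments.filter((c) => c.post === postId); // query 2
  return { ...post, comments: postComments };
}

// Option 4: comments embedded in the post document.
const postsWithEmbeddedComments = [
  {
    _id: 'p1',
    content: 'Hello',
    author: 'u1',
    comments: [
      { commentText: 'First!', author: 'u2' },
      { commentText: 'Nice post', author: 'u3' },
    ],
  },
];

// One lookup returns everything; `limit` mimics loading only a subset.
function getEmbeddedPost(postId, limit = Infinity) {
  const post = postsWithEmbeddedComments.find((p) => p._id === postId);
  if (!post) return null;
  return { ...post, comments: post.comments.slice(0, limit) };
}
```

`JSON.stringify(getPostWithComments('p1'))` then yields the single document the question asks for, even though the post itself stores no comment IDs.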
Further info here:
http://docs.mongodb.org/manual/core/data-model-design/
Related
I am working on a web application that uses a MongoDB database and Express/Node.js. I want to create a project in which I have users, and users can have posts, which have many attributes, such as title, creator, and date. I am confused about how to do this so that I avoid duplication in my database. I tried references by storing the IDs of all of a user's posts in a list, like this: [postID1, postID2, postID3, ...]. The problem is that I want to be able to query all of a user's posts and display them in an EJS template, but I don't know how to do that. How would I use references? What should I do to make this modeling system optimal for relationships?
Any help would be greatly appreciated!
Thank you!
This is a classic parent-child relationship, and your problem is that you're storing the relationship in the wrong record :-). The parent should never contain the references to the children. Instead, each child should have a reference to the parent. Why not the other way around? It's a bit of a historical quirk: a classic relational table can't have multiple values in a single field, so you can't easily store multiple child IDs in a relational table, whereas since each child only ever has one parent, it's easy to set up a single field in the child. A Mongo document can hold multiple values in a single field by using arrays, but unless you have a really good reason to do so, it's better to follow the historical paradigm.
How does this apply in your situation? What you're trying to do is to store references to all the children (i.e. the post IDs) as a list in the parent (i.e. an array in the user document). This is not the usual way to do this. Instead, in each child (i.e. in each post), have a field called user_id, and store the userID there.
Next, make sure you create an index on the user_id field.
With that setup, it's easy to take a post and figure out who the user was (just look at the user_id field). And if you want to find all of a user's posts, just do posts.find({user_id: 'XXXX'}). If you have an index on that field, the find will execute quickly.
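Below is a toy model in plain JavaScript of what the user_id index buys you: without it, finding a user's posts scans every post; with it, the lookup goes straight to the matching entries. MongoDB's real indexes are B-trees maintained by the server, not a `Map`, and all names here are hypothetical. In Mongoose you would declare the index in the post schema with `user_id: { type: Schema.Types.ObjectId, ref: 'User', index: true }` and query with `Post.find({ user_id: userId })`.

```javascript
// Hypothetical posts, each holding a reference to its parent user.
const posts = [
  { _id: 'p1', title: 'Title-1', user_id: 'U123' },
  { _id: 'p2', title: 'Title-2', user_id: 'U123' },
  { _id: 'p3', title: 'Title-3', user_id: 'U456' },
];

// Without an index: a full scan over every post.
function findPostsByUserScan(userId) {
  return posts.filter((post) => post.user_id === userId);
}

// A toy secondary index on user_id: user_id -> list of posts.
function buildUserIdIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.user_id)) index.set(doc.user_id, []);
    index.get(doc.user_id).push(doc);
  }
  return index;
}

const userIdIndex = buildUserIdIndex(posts);

// With the index: jump straight to the matching posts.
function findPostsByUser(userId) {
  return userIdIndex.get(userId) || [];
}

// Going the other way (post -> user) needs no index: just read the field.
function userOfPost(postId) {
  const post = posts.find((p) => p._id === postId);
  return post ? post.user_id : null;
}
```

In an Express route you would then hand the result to the EJS template with `res.render('posts', { posts: userPosts })`, where the view name `'posts'` is only an example.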
Storing parent references in the child is almost always better than storing child references in the parent. Even though Mongo is flexible enough to allow you to structure it either way, it's not preferred unless you have a real reason for it.
EDIT
If you do have a valid reason for storing the child references in the parent, then assuming a structure like this:
user = {posts: [postID1, postID2, postID3, ...]}
You can find the user for a specific post with user.find({posts: "XXXX"}). MongoDB is smart enough to know that you're searching for a user whose posts array contains the element "XXXX". And if you create an index on the posts field, the query should be pretty quick.
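The array-matching behaviour described above can be mimicked in plain JavaScript: an equality condition on an array field matches when any element equals the value. This is a simplified model of MongoDB's matching (it ignores nested paths, ObjectId comparison and whole-array equality, which MongoDB also supports), and the names are hypothetical. On the server side, an index on an array field like posts is a multikey index, with one index entry per array element.

```javascript
// Hypothetical users whose documents hold an array of post IDs.
const users = [
  { _id: 'U1', name: 'alice', posts: ['p1', 'p2'] },
  { _id: 'U2', name: 'bob', posts: ['p3'] },
  { _id: 'U3', name: 'carol' }, // no posts field at all
];

// Simplified equality match: a scalar field must equal the value;
// an array field matches if ANY element equals the value.
function fieldMatches(fieldValue, value) {
  if (Array.isArray(fieldValue)) return fieldValue.includes(value);
  return fieldValue === value;
}

// Rough analogue of users.find({ [field]: value }).
function find(collection, field, value) {
  return collection.filter((doc) => fieldMatches(doc[field], value));
}
```

So `find(users, 'posts', 'p2')` returns alice's document, just as `{posts: "p2"}` would match it in MongoDB.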
I would like to point out that there is nothing wrong with a parent containing child references, at least in NoSQL databases. It all depends on what suits your needs.
You have a one-to-many relationship between users and posts, and you can model your data in the following 3 ways:
Embedded Data Model
{
user: "username",
post: [
{
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24")
},
{
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24")
}
]
}
Parent containing child references
{
user: "username",
posts: [123456789, 234567890, ...]
}
{
_id: 123456789,
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24")
}
{
_id: 234567890,
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24")
}
Child containing parent reference
{
_id: "U123",
name: "username"
}
{
_id: 123456789,
title: "Title-1",
creator: "creator",
published_date: ISODate("2010-09-24"),
user: "U123"
}
{
_id: 234567890,
title: "Title-2",
creator: "creator",
published_date: ISODate("2010-09-24"),
user: "U123"
}
According to the MongoDB docs (I have adapted the paragraph below to your case):
When using references, the growth of the relationships determine where to store the reference. If the number of posts per user is small with limited growth, storing the post reference inside the user document may sometimes be useful. Otherwise, if the number of posts per user is unbounded, this data model would lead to mutable, growing arrays.
Reference: https://docs.mongodb.com/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
Now you have to decide what is best for your project, keeping in mind that your model should satisfy all of your use cases.
Peace,
I am using Node.js, Express and MongoDB with Mongoose to build out a ReSTful API. I have 2 questions regarding ReSTful best practice in conjunction with MongoDB:
I have 2 collections: Chapters and Pages. The Chapters collection contains an embedded array of section documents. The (simplified) schema is as follows:
var sectionSchema = new Schema({
  name: { type: String },
  // The usual way to declare an array of references: `ref` sits on the element type.
  pages: [{ type: Schema.Types.ObjectId, ref: 'Page' }]
});
var chapterSchema = new Schema({
name: { type: String },
sections: [sectionSchema]
});
The contents of each section are irrelevant for now. My question is: would it make sense in my scenario to create a controller or group of routes for getting/updating an individual section element? My intuition is that I would be breaking some kind of ReSTful API best practice, but perhaps not. Data access in my case would frequently need to get and update individual section elements from the sections array in the chapters collection.
Following on from my first question: as mentioned above, I have a 'pages' collection. If I build out routes for sections, I want to make use of Mongoose's pseudo-joins (populate), or perhaps MongoDB's $lookup, to return a single section element together with the pages referenced in its pages array. Would that make my API somehow less ReSTful, considering I am accessing multiple collections? For reference, the pages schema is as follows:
var pageSchema = new Schema({
chapterId: { type: Schema.Types.ObjectId },
name: { type: String },
text: { type: String }
});
Of course, I can accomplish all this with two controllers (Chapters and Pages) and no joins, and simply fetch the referenced resources with multiple calls from the client side, but that requires more work from the API consumer, more requests over the wire and possibly more database hits.
In case you're wondering why I have a chapterId in each page document it's because a page belongs to a chapter and is unassigned before it is shoved somewhere into a section's array of pages. It's also a quick and easy way to get all the pages belonging to a chapter without having to query 2 collections. Feel free to comment on this aspect of my design too. Pages also require their own collection because 1. individual pages are accessed frequently and 2. pages could potentially get rather large and cumbersome.
Mostly I am using this project as an introduction into the node.js/mongodb world and web programming in general. Thanks in advance!
I'm just learning NoSQL, specifically MongoDB, and more specifically Mongoose under Node; but this is a somewhat agnostic design question.
What I'm seeing in various tutorials is a data design that has a two-way linkage between the child and parent, and the parent stores a collection of the children as an ObjectId array. Mongoose can then pull in the actual child objects with populate(). For example:
var mongoose = require('mongoose');
var PostSchema = new mongoose.Schema({
title: String,
comments: [{type: mongoose.Schema.Types.ObjectId, ref: 'Comment'}]
});
var CommentSchema = new mongoose.Schema({
comment: String,
post: {type: mongoose.Schema.Types.ObjectId, ref: 'Post'}
});
To me this seems to create the following problems:
1) Inserting a new comment now also requires an additional update to the Post record to add the comment id to its comments array. The same is true for deleting a comment.
2) There is no referential integrity, the burden is on the application itself to ensure that no comments get orphaned and no posts contain invalid comment ids.
3) The populate() method is part of mongoose, not MongoDB, so if I need to access this data with something else, how do I get the child objects out?
I always (perhaps mis-)understood the benefit of NoSQL was that you could just store a whole object graph as one entity. So without looking at these tutorials, I would have naively just stored the "comments" as the full objects along with the post, and used a projection to avoid loading them when I didn't need them. Now having played with it, I don't understand why you wouldn't want to do it that way. I ask my fellow StackOverflowians for edification.
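Point 3 has a fairly direct answer: `populate()` is essentially a second query. Any driver can fetch the children with an `$in` query on the stored IDs (in the shell, `db.comments.find({ _id: { $in: post.comments } })`), or in a single round trip with the aggregation `$lookup` stage. The sketch below mimics that second step in plain JavaScript, including putting the results back in the order of the ID array, since an `$in` query does not guarantee that order. The data and function names are hypothetical.

```javascript
// Hypothetical data in the two-way-reference layout from the tutorials.
const post = { _id: 'post1', title: 'Hello', comments: ['c2', 'c1', 'c404'] };
const comments = [
  { _id: 'c1', comment: 'First', post: 'post1' },
  { _id: 'c2', comment: 'Second', post: 'post1' },
];

// Step 1 (what an $in query does): fetch every comment whose _id is in the array.
function findByIds(collection, ids) {
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc._id));
}

// Step 2: restore the order of the ID array and drop IDs that no longer
// exist (dangling references, i.e. problem 2 in the list above).
// Returns a new object; the parent document is left unchanged.
function manualPopulate(parent, field, collection) {
  const byId = new Map(findByIds(collection, parent[field]).map((d) => [d._id, d]));
  const resolved = parent[field].filter((id) => byId.has(id)).map((id) => byId.get(id));
  return { ...parent, [field]: resolved };
}
```

The dangling `'c404'` is silently skipped here; whether to skip, log or clean up such IDs is exactly the application-level integrity burden described in point 2.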
I'm using mongoose with Node.js for an application. I have a Document class, which has a Review subdocument. I also have a User class.
I want the user to be able to see all the reviews they've written, while I also want the Document to be able to easily get all of its reviews. Searching through all the documents and all their reviews to find ones matching a user seems horribly inefficient. So, how do I allow the Review to be owned by both a Document and a User?
If this is impossible, how else can I efficiently have two documents know about one subdocument?
If you don't want to deal with consistency issues I don't think there's any way except for normalization to assign two parents for a document. Your issue is a common one for social networks, when developers have to deal with friends, followers, etc. Usually the best solution depends on what queries you are gonna run, what data is volatile and what is not and how many children a document might have. Usually it turns out to be a balance between embedding and referencing. Here's what I would do if I were you:
Let's assume Documents usually have 0-5 Reviews. That's only a few, so we might consider embedding Reviews in Documents. We would also often need to display the reviews every time a Document is queried, which is one more reason for embedding. Now we need a way to query all of a User's Reviews efficiently. Assume we don't run this query as often as the first one, but it is still important. Let's also assume that when we query a User's Reviews we just want to display the Review titles as links to the Review page, or even to the Document page, since it's probably hard to read a review without seeing the actual Document. So the best way here would be to store, on the User, a list of { document_id, review_id, reviewTitle } entries. reviewTitle should not be volatile in this case, since it is duplicated. So now when you have a User object, you can easily query for reviews: using document_id you will filter out most Documents, and it will be super fast. Then you can extract the required Reviews either on the client side or by using MapReduce to turn the Reviews into a separate list of documents.
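The layout described above can be sketched in plain JavaScript (hypothetical data and function names; the `/documents/...` link format is made up for illustration). The copied `reviewTitle` lets you render a user's review links with no further query, and full reviews come from fetching only the referenced Documents and picking out the embedded reviews, which is the client-side variant of the last step. In current MongoDB versions the server-side route would more typically be an aggregation pipeline (`$match` on the document IDs, then `$unwind` on `reviews`) than MapReduce.

```javascript
// Hypothetical data: reviews embedded in their Document, and each User
// keeps lightweight pointers to the reviews they wrote.
const documents = [
  { _id: 'd1', title: 'Doc 1', reviews: [
    { _id: 'r1', reviewer: 'u1', title: 'Great', body: 'text' },
    { _id: 'r2', reviewer: 'u2', title: 'Meh', body: 'text' },
  ] },
  { _id: 'd2', title: 'Doc 2', reviews: [
    { _id: 'r3', reviewer: 'u1', title: 'Solid', body: 'text' },
  ] },
];

const user = {
  _id: 'u1',
  name: 'alice',
  // One { document_id, review_id, reviewTitle } entry per review written.
  reviews: [
    { document_id: 'd1', review_id: 'r1', reviewTitle: 'Great' },
    { document_id: 'd2', review_id: 'r3', reviewTitle: 'Solid' },
  ],
};

// Listing a user's reviews as links needs no extra query: the titles are copied.
function reviewLinks(u) {
  return u.reviews.map((r) => ({
    href: `/documents/${r.document_id}#${r.review_id}`, // hypothetical URL scheme
    text: r.reviewTitle,
  }));
}

// Full reviews: fetch only the referenced Documents (an $in on _id),
// then pick this user's embedded reviews out of them.
function fullReviews(u, docs) {
  const docIds = new Set(u.reviews.map((r) => r.document_id));
  const reviewIds = new Set(u.reviews.map((r) => r.review_id));
  return docs
    .filter((d) => docIds.has(d._id))
    .flatMap((d) => d.reviews.filter((r) => reviewIds.has(r._id)));
}
```

The price of this design is visible in `user.reviews`: if a review's title changes, both copies have to be updated, which is why the title is assumed to be non-volatile.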
This example contains many assumptions, so it might not be exactly what you need, but my goal was to show you the most important things to consider while designing your collections and the logic you should follow. So just to sum up: consider QUERIES, HOW VOLATILE SOME DATA IS and HOW MANY CHILDREN A DOCUMENT IS GONNA HAVE, and find a balance between embedding and referencing.
Hope it helps!
This is an old question, but here is a solution that, I think, works well:
var mongoose = require('mongoose'),
Schema = mongoose.Schema;
var DocumentSchema = new Schema({
title: String,
// ... other document fields
});
mongoose.model('Document', DocumentSchema);
var UserSchema = new Schema({
name: String,
// ... other user fields
});
mongoose.model('User', UserSchema);
var ReviewSchema = new Schema({
created: {
type: Date,
default: Date.now
},
document: {
type: Schema.ObjectId,
ref: 'Document'
},
reviewer: {
type: Schema.ObjectId,
ref: 'User'
},
title: String,
body: String
});
mongoose.model('Review', ReviewSchema);
Then, you could efficiently query the Reviews to get all reviews for a User, or for a Document.
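With reviews in their own collection, "reviews by user" and "reviews for a document" are each a single query on one field, and adding a review is a single insert that touches neither parent. In MongoDB each of those queries should be backed by an index (in Mongoose, for example, `ReviewSchema.index({ reviewer: 1 })` and `ReviewSchema.index({ document: 1 })`). The plain-JavaScript sketch below uses hypothetical data, with toy `Map`-based indexes standing in for MongoDB's.

```javascript
// Hypothetical review documents, each referencing both of its "parents".
const reviews = [
  { _id: 'r1', document: 'd1', reviewer: 'u1', title: 'Great' },
  { _id: 'r2', document: 'd1', reviewer: 'u2', title: 'Meh' },
  { _id: 'r3', document: 'd2', reviewer: 'u1', title: 'Solid' },
];

// Build a toy index: value of `field` -> reviews with that value.
function indexBy(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

const byReviewer = indexBy(reviews, 'reviewer'); // ~ index on { reviewer: 1 }
const byDocument = indexBy(reviews, 'document'); // ~ index on { document: 1 }

const reviewsForUser = (userId) => byReviewer.get(userId) || [];
const reviewsForDocument = (docId) => byDocument.get(docId) || [];
```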
Hope this helps someone.
I plan to use a MongoDB NoSQL database for a video game, but there are some things I don't understand clearly and I haven't found answers to so far.
I understand that it is possible to store a document instance (a car, for example) inside another document instance (a user), but how does that work? If it's a copy by value, then when I update my car, the user will have a car that is not up to date! So I guess it's a copy by reference. Or maybe it's not a copy at all but some kind of link, like the ones we used to make with an ID field in relational DBMSs.
Another thing: if I update my schemas (and I certainly will), the new fields, or the OLD fields that previously existed, won't be updated in the existing data... It looks like this is a known problem and there are some solutions; do you have any good links that explain how to deal with it? I'm just thinking ahead here: my DB isn't written yet and I want to make the best design choices. I have never used NoSQL before and I'm trying to design it, but I still have a lot of misunderstandings and "bad" practices from relational DBMSs.
By the way, MongoDB is a security hole (no password by default, etc.). Do you have links on how to secure a MongoDB database? Thanks.
I am just learning Mongo myself, but I hope I can provide some help. Note that Mongo is a schema-less database, which means that one User may have a car, another may have no car, and others may have a different car. So, if you want to update the car definition, you need to modify the existing user documents accordingly. There is no central definition of the car, i.e. no relationship to a central car table as in an RDBMS.
You can add some structure to Mongo using Mongoose Schemas. This allows some flexibility when the schema changes: for example, you can add new properties with a default value, meaning you don't need to update existing documents:
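The difference between embedding and referencing can be seen in a few lines of plain JavaScript (hypothetical data; `JSON.parse(JSON.stringify(...))` stands in for the fact that an embedded sub-document is stored as its own copy inside the user document). Updating the car's own record does not touch the embedded copy, whereas a stored ID always resolves to the current car, at the cost of a second lookup.

```javascript
// A hypothetical "cars" collection.
const cars = [{ _id: 'car1', model: 'Roadster', color: 'red' }];

// Embedding: the user document stores its own copy of the car.
const embeddedUser = {
  _id: 'user1',
  car: JSON.parse(JSON.stringify(cars[0])), // stored by value
};

// Referencing: the user document stores only the car's ID.
const referencingUser = { _id: 'user2', car_id: 'car1' };

// Resolve a reference at read time (a second query in a real database).
function carOf(user) {
  return cars.find((c) => c._id === user.car_id) || null;
}

// Later, the car record itself is updated.
cars[0].color = 'blue';
```

After the update, `embeddedUser.car.color` is still `'red'` while `carOf(referencingUser).color` is `'blue'`: embedded data is a copy, referenced data is a link you resolve yourself.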
BEFORE
var Book = new Mongoose.Schema({
title: {type: String}
});
AFTER
var Book = new Mongoose.Schema({
title: String,
category: {type: String, default: 'fiction'}
});
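Roughly what the default buys you can be sketched in plain JavaScript: when an old document without `category` is read, the schema default fills the gap in memory, while the stored document itself stays unchanged until it is saved again. Mongoose applies a default only when the path is strictly `undefined`, and this sketch follows the same rule. It is a simplified model with hypothetical names, not Mongoose's internals, and it does nothing about OLD fields, which still need an explicit migration to remove.

```javascript
// Hypothetical stored documents: the first predates the `category` field.
const storedBooks = [
  { _id: 1, title: 'Old book' },
  { _id: 2, title: 'New book', category: 'history' },
];

// Defaults declared alongside the schema, mirroring the AFTER version above.
const bookDefaults = { category: 'fiction' };

// Fill in defaults only where a field is strictly undefined, returning a new
// object so the stored document itself is not modified.
function hydrate(doc, defaults) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(defaults)) {
    if (result[field] === undefined) result[field] = value;
  }
  return result;
}

const books = storedBooks.map((doc) => hydrate(doc, bookDefaults));
```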