I've got a complex Mongoose population issue that I'm trying to sort out, and wondered if someone could shed some light (yeah yeah, I know, I could use an RDBMS, but most of the other bits of the schema lend themselves nicely to Mongo).
I've got two models: a Study and a Participant.
Study:
var StudySchema = new mongoose.Schema({
name: String,
checklist: [
{
order: Number,
text: String
}
]
});
Participant:
var ParticipantSchema = new mongoose.Schema({
name: String,
checklist_items: [
{
isComplete: Boolean,
item: {
type: mongoose.Schema.Types.ObjectId
}
}
]
});
When a participant is created (they're always part of a study), the checklist is copied over onto the participant, so we can keep track of that checklist on the individual participant. I'm simply pushing IDs into Participant.checklist_items.item to link those back to the items on the Study. (These are referenced, not copied wholesale, so that text changes to the study checklist propagate down naturally.)
I want to populate this model when retrieving a participant. When I get them, I want item on checklist_items to be populated with the corresponding item from the study. Hope that makes sense.
I've tried things like:
Participant.findById(req.params.id)
.populate({path: 'checklist_items.item', populate: {model: 'Study', path: 'checklist'}})
.exec()
But no dice. I've monkeyed around with this for a while, and I'm not sure I'm grokking how to do this child-to-child type of population.
Any ideas? Is this possible?
Edit: clarified title with correct terms
It appears this isn't possible with Mongoose, and represents a bit of antipattern. Leaving a reference to this issue for folks with this question in the future: https://github.com/Automattic/mongoose/issues/2772
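For anyone landing here later: since populate can't reach into another document's subdocument array, one workaround is to load the study separately and resolve the references in application code. Below is a minimal sketch with plain objects standing in for the loaded documents. It assumes you have some way of knowing which study a participant belongs to (the schemas above don't show such a field) and that the checklist entries carry the _id that Mongoose assigns to array subdocuments by default. resolveChecklist is a hypothetical helper, not part of Mongoose.

```javascript
// Hypothetical helper: joins a participant's checklist_items to the
// matching entries of a study's checklist, comparing ids as strings.
function resolveChecklist(participant, study) {
  const byId = new Map(
    study.checklist.map((entry) => [String(entry._id), entry])
  );
  return participant.checklist_items.map((ci) => ({
    isComplete: ci.isComplete,
    // null when the study no longer contains the referenced entry
    item: byId.get(String(ci.item)) || null
  }));
}

// Plain objects standing in for documents (made-up sample data)
const study = {
  name: 'Sleep study',
  checklist: [
    { _id: 'a1', order: 1, text: 'Sign consent form' },
    { _id: 'a2', order: 2, text: 'Complete intake survey' }
  ]
};
const participant = {
  name: 'Pat',
  checklist_items: [
    { isComplete: true, item: 'a1' },
    { isComplete: false, item: 'a2' }
  ]
};

const resolved = resolveChecklist(participant, study);
console.log(resolved.map((r) => r.item.text + ': ' + r.isComplete));
```

Because the text is looked up at read time, edits to the study checklist still show up on every participant, which was the point of referencing instead of copying.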
You can simply try this:
Participant.findById(req.params.id)
.populate({path: 'checklist_items.item', model: 'Study'})
.exec()
This will fetch the required participant and populate every item inside checklist_items.
See if it works for you.
Related
Let's assume I have mongoose models for books and pages like this:
mongoose.model("Book", new Schema({
title: String
}));
and this
mongoose.model("Page", new Schema({
pageNumber: Number,
_bookId: {type: ObjectId, ref: "Book"}
}));
Every page keeps track of which book it belongs to. Now I want to have an array of books that have a page with pageNumber 500.
I could do the following:
Page.find({pageNumber: 500})
.populate("_bookId")
.then(function (pages) {
var books = [];
pages.forEach(function (page) {
books.push(page._bookId); // page._bookId now contains a Book document
});
return q(books);
}).then(function (books) {
// Do something with the books
});
Yet, the part where I loop over the pages seems cumbersome and that kind of extraction could probably be done by mongo. My question is how that would work.
Is using populate even the best way to go here?
I would like to keep the schemas the way they are though.
I think your schema design is the issue here. Why is pages a separate schema? You should use Mongo's embedding capabilities to make Page an array in Book:
mongoose.model("Book", new Schema({
title: String,
pages: [...]
}));
Then you can search for books that have a page #N.
Additionally, if a page is nothing more than a page number and an associated book, you can just make pages a number representing the total number of pages.
Edit: If such a schema is just a simplification of your use case and you really can't do embedding, then you're out of luck. The abstraction you're looking for is called a join, and Mongo doesn't support that because it's not what Mongo is going for. If that's really a primary use case of yours, you should look at using a relational database (or change your schema).
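If the schemas do have to stay as they are, the loop from the question can at least be tightened up, and it can drop duplicates when one book has several matching pages. A rough sketch with plain objects standing in for populated pages; booksFromPages is a hypothetical helper and the sample data is made up:

```javascript
// Hypothetical helper: returns each populated book once, in first-seen order.
function booksFromPages(pages) {
  const seen = new Map();
  for (const page of pages) {
    const book = page._bookId; // populated Book, or null if the ref is dangling
    if (book && !seen.has(String(book._id))) {
      seen.set(String(book._id), book);
    }
  }
  return Array.from(seen.values());
}

const dune = { _id: 'b1', title: 'Dune' };
const ulysses = { _id: 'b2', title: 'Ulysses' };
const pages = [
  { pageNumber: 500, _bookId: dune },
  { pageNumber: 500, _bookId: ulysses },
  { pageNumber: 500, _bookId: dune } // same book matched twice
];

const books = booksFromPages(pages);
console.log(books.map((b) => b.title));
```

Within Mongoose itself, I believe the same result can be had in two queries without populate: Page.distinct('_bookId', {pageNumber: 500}) to collect the ids, then Book.find({_id: {$in: ids}}). I haven't benchmarked one against the other.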
Here is my Mongoose Schema:
var SchemaA = new Schema({
field1: String,
.......
fieldB : { type: Schema.Types.ObjectId, ref: 'SchemaB' }
});
var SchemaB = new Schema({
field1: String,
.......
fieldC : { type: Schema.Types.ObjectId, ref: 'SchemaC' }
});
var SchemaC = new Schema({
field1: String,
.......
.......
.......
});
When I query SchemaA with find, I want the fields of SchemaA together with those of SchemaB and SchemaC, the same way a join works in an SQL database.
This is my approach:
SchemaA.find({})
.populate('fieldB')
.exec(function (err, result){
SchemaB.populate(result.fieldC,{path:'fieldB'},function(err, result){
.............................
});
});
The above code is working perfectly, but the problem is:
I want the fields of SchemaC through SchemaA, and I don't want to populate the fields of SchemaB.
The reason is that the extra population slows the query down unnecessarily.
Long story short:
I want to populate SchemaC through SchemaA without populating SchemaB.
Can you please suggest any way/approach?
As an avid mongodb fan, I suggest you use a relational database for highly relational data - that's what it's built for. You are losing all the benefits of mongodb when you have to perform 3+ queries to get a single object.
Buuuuuut, I know that comment will fall on deaf ears. Your best bet is to be as conscious as you can about performance. Your first step is to limit the fields to the minimum required. This is just good practice even with basic queries and any database engine - only get the fields you need (eg. SELECT * FROM === bad... just stop doing it!). You can also try doing lean queries to help save a lot of post-processing work mongoose does with the data. I didn't test this, but it should work...
SchemaA.find({}, 'field1 fieldB', { lean: true })
.populate({
path: 'fieldB',
select: 'fieldC',
options: { lean: true }
}).exec(function (err, result) {
// not sure how you are populating "result" in your example, as it should be an array,
// but you said your code works... so I'll let you figure out what goes here.
});
Also, a very "mongo" way of doing what you want is to save a reference in SchemaC back to SchemaA. When I say "mongo" way of doing it, you have to break away from your years of thinking about relational data queries. Do whatever it takes to perform fewer queries on the database, even if it requires two-way references and/or data duplication.
For example, if I had a Book schema and an Author schema, I would likely save the author's first and last name in the Books collection, along with an _id reference to the full profile in the Authors collection. That way I can load my Books in a single query, still display the author's name, and then generate a hyperlink to the author's profile: /author/{_id}. This is known as "data denormalization", and it has been known to give people heartburn. I try to use it on data that doesn't change very often, like people's names. On the occasion that a name does change, it's trivial to write a function to update all the names in multiple places.
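To give an idea of what that update function amounts to, here is a small in-memory sketch. The field layout (an author object with _id, firstName and lastName embedded in each book) is my own assumption for illustration, and syncAuthorName is a hypothetical helper:

```javascript
// Hypothetical helper: copies an author's current name into every book that
// carries a denormalized copy of it. Returns the number of books changed.
function syncAuthorName(library, author) {
  let changed = 0;
  for (const book of library) {
    if (String(book.author._id) === String(author._id)) {
      book.author.firstName = author.firstName;
      book.author.lastName = author.lastName;
      changed += 1;
    }
  }
  return changed;
}

// Made-up sample data: two books share one author
const library = [
  { title: 'First Book', author: { _id: 'u1', firstName: 'Sam', lastName: 'Smith' } },
  { title: 'Second Book', author: { _id: 'u1', firstName: 'Sam', lastName: 'Smith' } },
  { title: 'Other Book', author: { _id: 'u2', firstName: 'Lee', lastName: 'Wong' } }
];

const changedCount = syncAuthorName(library, {
  _id: 'u1',
  firstName: 'Sam',
  lastName: 'Jones'
});
console.log(changedCount, library[0].author.lastName);
```

Against a real collection this would presumably be a single multi-document update (something like Book.updateMany with a $set on the embedded name fields) and not a loop in application code.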
SchemaA.find({})
.populate({
path: "fieldB",
populate:{path:"fieldC"}
}).exec(function (err, result) {
// result is an array of SchemaA documents with fieldB and fieldC both populated
// example: result[0].fieldB.fieldC._id (the _id of the SchemaC document)
});
Why not add a ref to SchemaC on SchemaA? As your schemas stand, there is no way to bridge from SchemaA to SchemaC without SchemaB, unless you populate SchemaB with no other data than a ref to SchemaC.
As explained in the docs under Field Selection, you can restrict what fields are returned.
.populate('fieldB') becomes .populate('fieldB', 'fieldC -_id'). The -_id is required to omit the _id field, just like when using select().
I think this is not possible. When a document in A refers to a document in B, and that document refers to another document in C, how can the document in A know which document in C to refer to without any help from B?
I have a model "Category". Collection categories contains several objects.
I also have a model "Post". Collection posts may contain a lot of objects with users' posts. A "Post" object may relate to one or more categories. How do I link a "Post" object to one or more "Category" objects without placing the "Post" object inside the "Category" object as a subdocument? Certainly, I need to have an option to find all posts related to a certain category.
One of the ways I can imagine is to store in the "Post" object the obj_id of every category it's related to. Something like this:
var postSchema = mongoose.Schema({
title: String,
description: String,
category: [ObjectId],
created_time: Number,
})
and add category later...
post.category.push(obj_id);
But is it really the Mongoose way? Which way is correct? Thanks.
P.S. I've also read about the population methods in the Mongoose docs. Might they be useful in my case? It's still not completely clear to me what they do.
Populate is a better tool for this since you are creating a many to many relationship between posts and categories. Subdocuments are appropriate when they belong exclusively to the parent object. You will need to change your postSchema to use a reference:
var postSchema = mongoose.Schema({
title: String,
description: String,
category: [{ type: Schema.Types.ObjectId, ref: 'Category' }],
created_time: Number,
});
You can add categories by pushing documents onto the array:
post.category.push(category1);
post.save(callback);
Then rehydrate them during query using populate:
Post.findOne({ title: 'Test' })
.populate('category')
.exec(function (err, post) {
if (err) return handleError(err);
console.log(post.category);
});
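As for finding all posts related to a certain category: with the array of references above, matching on the array field should do it, e.g. Post.find({ category: categoryId }), since a query on an array field matches documents whose array contains the value. The in-memory sketch below shows the same matching rule on plain objects; postsInCategory and the sample ids are made up for illustration:

```javascript
// Hypothetical helper: keeps the posts whose category array contains the id.
// Ids are compared as strings so ObjectId-like values and strings both work.
function postsInCategory(posts, categoryId) {
  return posts.filter((post) =>
    post.category.some((id) => String(id) === String(categoryId))
  );
}

// Made-up sample data: a post may sit in several categories
const posts = [
  { title: 'Intro to Mongoose', category: ['c1', 'c2'] },
  { title: 'Schema design', category: ['c2'] },
  { title: 'Unrelated', category: ['c3'] }
];

const inC2 = postsInCategory(posts, 'c2');
console.log(inC2.map((p) => p.title));
```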
I am trying to create an array of nested objects. I am following an example from a book that does the following:
// Creates the Schema for a Phone, embedded in Person below
var Phone = new Schema({
number: { type: Number, required: false },
...
personId: {type: Schema.Types.ObjectId}
}
);
// Creates the Schema for a Person, with an embedded array of phones
var Person = new Schema({
name: { type: String },
phones: [Phone]
}
);
var Person = mongoose.model('Person', Person);
Which works just fine when storing multiple phone numbers for a person. However, I am not sure if there is a good/fast way to get a Phone object by _id. Since Phone is not a Mongoose model, you cannot go directly to Phone.findOne({...}). Right now I am stuck with getting a person by _id, then looping over that person's phones and checking whether the id matches.
Then I stumbled upon this link:
http://mongoosejs.com/docs/populate.html
Is one way more right than the other? Currently when I delete a person his/her phones go away as well. Not really sure that works with 'populate', seems like I would need to delete Person and Phones.
Anyone want to attempt to explain the differences?
Thanks in advance
The general rule is that if you need to independently query Phones, then you should keep them in a separate collection and use populate to look them up from People when needed. Otherwise, embedding them is typically a better choice as it simplifies updates and deletion.
When using an embedded approach like you are now, note that Mongoose arrays provide an id method you can use to more easily look up an element by its _id value.
var phone = person.phones.id(id);
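If you only have the phone's _id and not the person's, a query on the embedded field such as Person.findOne({ 'phones._id': id }) should return the owning person, after which .id() picks out the element. The lookup itself boils down to the following, sketched here on plain objects; findPhone is a hypothetical helper and the sample data is made up:

```javascript
// Hypothetical helper: scans people for an embedded phone with the given id.
// Returns the owner together with the phone, or null when nothing matches.
function findPhone(people, phoneId) {
  for (const person of people) {
    const phone = person.phones.find(
      (p) => String(p._id) === String(phoneId)
    );
    if (phone) {
      return { person, phone };
    }
  }
  return null;
}

// Made-up sample data
const people = [
  { name: 'Ann', phones: [{ _id: 'p1', number: 5550100 }] },
  { name: 'Bob', phones: [{ _id: 'p2', number: 5550101 }, { _id: 'p3', number: 5550102 }] }
];

const hit = findPhone(people, 'p3');
console.log(hit.person.name, hit.phone.number);
```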
I have two collections:
Users
Uploads
Each upload has a User associated with it and I need to know their details when an Upload is viewed. Is it best practice to duplicate this data inside the Uploads record, or use populate() to pull in these details from the Users collection referenced by _id?
OPTION 1
var UploadSchema = new Schema({
_id: { type: Schema.ObjectId },
_user: { type: Schema.ObjectId, ref: 'users'},
title: { type: String },
});
OPTION 2
var UploadSchema = new Schema({
_id: { type: Schema.ObjectId },
user: {
name: { type: String },
email: { type: String },
avatar: { type: String },
//...etc
},
title: { type: String },
});
With 'Option 2' if any of the data in the Users collection changes I will have to update this across all associated Upload records. With 'Option 1' on the other hand I can just chill out and let populate() ensure the latest User data is always shown.
Is the overhead of using populate() significant? What is the best practice in this common scenario?
If you need to query on your users, keep users in their own collection. If you need to query on your uploads, keep uploads in their own collection.
Other questions you should ask yourself: every time I need this data, do I need the embedded objects too (and vice versa)? How often will this data be updated? How often will it be read?
Think about a friendship request:
Each time you need the request you also need the user who made it, so embed the request inside the user document.
You will be able to create an index on the embedded object too, and your search will be a single query, fast and consistent.
Just a link to my previous reply on a similar question:
Mongo DB relations between objects
I think this post will be right for you http://www.mongodb.org/display/DOCS/Schema+Design
Use Cases
Customer / Order / Order Line-Item
Orders should be a collection. customers a collection. line-items should be an array of line-items embedded in the order object.
Blogging system.
Posts should be a collection. post author might be a separate collection, or simply a field within posts if only an email address. comments should be embedded objects within a post for performance.
Schema Design Basics
Kyle Banker, 10gen
http://www.10gen.com/presentation/mongosf2011/schemabasics
Indexing & Query Optimization
Alvin Richards, Senior Director of Enterprise Engineering
http://www.10gen.com/presentation/mongosf-2011/mongodb-indexing-query-optimization
These two videos are the best I have seen on MongoDB, imho.
Populate() is just a query. So the overhead is whatever the query is, which is a find() on your model.
Also, best practice for MongoDB is to embed what you can. It will result in a faster query. It sounds like you'd be duplicating a ton of data though, which puts relations (linking) in a good spot.
"Linking" is just putting an ObjectId in a field from another model.
Here is the Mongo Best Practices http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-SummaryofBestPractices
Linking/DBRefs http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-SimpleDirect%2FManualLinking