Structuring one-to-one interdocument relationship MongoDB


I'm just starting out using MongoDB and I'm a bit lost at how to structure my documents to tackle the following problem (note this is just a basic example that matches what I'm having difficulty with).
The Problem:
Let's say I have a top-level document called Family, these documents are stored in a collection called families and contain some basic information about a family, i.e.
{
_id: ObjectId("foobar"),
familyName: "Simpson",
address: "743 Evergreen Terrace, SpringField",
children: [
{
_id: ObjectId("barfoo"),
firstName: "Bartholomew",
preferredName: "Bart",
middleName: "JoJo"
}
// ... etc.
]
}
Now, let's say I'm adding another top-level document to my application: School, stored in a collection called schools. I need to relate each child to their school, and each child may only attend one school at any point in time. How would I approach this in Mongo? I've come from a very heavy RDBMS background and I'm having a bit of difficulty figuring this out. The main issues that I've come up against in my solutioning revolve around the fact that I'll need to efficiently handle the following use-cases:
View a school and be able to see all Children enrolled there
View a family and see all of the Schools that their children are enrolled in
What I've tried:
Storing the child references in the `School`
The first solution I went with was to make an enrollments array in my School document which referenced the _id of a child as well as their full name for convenience, i.e.
{
_id: ObjectId("asdadssa"),
name: "Springfield Elementary",
enrollments: [
{
child_id: ObjectId("barfoo"), // Bart Simpson
fullName: "Bart Simpson" // concatenation of preferredName and familyName
}
]
}
This seemed fantastic for the first use-case, which just needed to display all of the students enrolled at a particular school.
However when I turned to the second use-case I realised I may have made a mistake. How on earth would you figure out which school each child in a Family belonged to? The only way I could see would be to actually traverse every single school in the schools collection, drill down into their enrollments and see if the child_id matched a child in the family...doesn't seem very efficient does it? That led to my next attempt.
Storing a reference to the school in a child object
Because each child can only belong to one school I figured I could maybe just store a reference to the School document in each child sub-doc, i.e. Bart's document would now become:
{
_id: ObjectId("barfoo"),
firstName: "Bartholomew",
preferredName: "Bart",
middleName: "JoJo",
school_id: ObjectId("asdadssa")
}
Now the second use-case is happy, but the first is unsatisfied.
Conclusion
The only way I can see both use-cases being satisfied is if I employ both solutions simultaneously, i.e. store the school_id in the child sub-doc and also store the child_id in the enrollments array.
This just seems clunky to me: it means you'll need to do at least two writes per enrollment change (one to remove the child from the school and one to change the child). As far as I'm aware, MongoDB only has atomic writes and no transaction support, so this looks like a place where data integrity could potentially suffer.
If any MongoDB gurus could propose an alternate solution that'd be great. I'm aware that this particular problem really screams "RDBMS!!!!", but this is only a small part of the application and some of the other data really lends itself to a document store.
I'm only in the planning stage now so I'm not 100% committed to Mongo, but I thought I'd give it a crack since I've been hearing some good things about it.

For the small use case that you've described, I would switch up the families collection to a people or students collection
{
"_id" : ObjectId("barfoo"),
"family_id" : ObjectId("spamneggs"),
"name" : {
"first" : "Bartholomew",
"last" : "Simpson"
},
"school_id" : ObjectId("asdadssa")
}
that stores students as separate documents but unites them with a common family_id (I also snuck in another way to store names). Schools can just have school information without enrollments. I'll give example code in the mongo shell for your two use cases. To find all the children enrolled in Bart's school and Bart's school's document:
> db.students.find({ "school_id" : ObjectId("asdadssa") })
> db.schools.find({ "_id" : ObjectId("asdadssa") })
To find all of the schools Bart's family has children enrolled in:
> var schools = []
> db.students.find({ "family_id" : ObjectId("spamneggs") }, { "_id" : 0, "school_id" : 1 }).forEach(function(doc) {
schools.push(doc.school_id)
})
> db.schools.find({ "_id" : { "$in" : schools } })
Both of these are simple application-side joins and should work fine in your case because one family won't have zillions of children. Indexes on school_id and family_id will help.
For writes, only the student document needs to be updated with the proper school_id.
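As a rough illustration of the two lookups above outside the shell, here is a minimal in-memory JavaScript sketch. The arrays stand in for the students and schools collections; the plain string ids, the sample data and the helper names studentsAtSchool and schoolsForFamily are made up for this example. A real application would issue the find calls shown above instead of filtering arrays.

```javascript
// In-memory stand-ins for the students and schools collections (sample data is hypothetical).
const students = [
  { _id: "bart", family_id: "simpson", name: { first: "Bartholomew", last: "Simpson" }, school_id: "elementary" },
  { _id: "lisa", family_id: "simpson", name: { first: "Lisa", last: "Simpson" }, school_id: "elementary" },
  { _id: "milhouse", family_id: "vanhouten", name: { first: "Milhouse", last: "Van Houten" }, school_id: "elementary" },
];
const schools = [
  { _id: "elementary", name: "Springfield Elementary" },
  { _id: "prep", name: "Springfield Preparatory" },
];

// Use case 1: everyone enrolled at a school (mirrors find({ school_id: ... })).
function studentsAtSchool(schoolId) {
  return students.filter((s) => s.school_id === schoolId);
}

// Use case 2: collect the family's school ids, then fetch those schools
// (mirrors find({ family_id: ... }) followed by find({ _id: { $in: ids } })).
function schoolsForFamily(familyId) {
  const ids = new Set(
    students.filter((s) => s.family_id === familyId).map((s) => s.school_id)
  );
  return schools.filter((school) => ids.has(school._id));
}

console.log(studentsAtSchool("elementary").map((s) => s.name.first));
console.log(schoolsForFamily("simpson").map((s) => s.name));
```

Because each student carries a single school_id, changing enrollment touches one document, which is the point of the write note above.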

Related

Get number of products from each category in mongodb database

I'm new to mongodb and to overall databases side of development.
I'm trying to make a product listing site where all the categories would be displayed with the number of products within that particular category and when clicked on a particular category, it would get me all the products in that category.
Some things to note are:
every product will have only one category
each category will have multiple products
I don't know how to go about this problem and tried searching it online but couldn't exactly find what I was looking for. I've also tried making the schema for this but I do not know if it's the right approach or not and this is how it looks:
const productsSchema = {
category: String,
name: String,
price: String,
description: String,
thumbnail: String,
};
Side note: I'm using MERN stack.(if its of any help)

If I've understood your question correctly, you can use something like this:
db.collection.aggregate([
{
"$match": {
"category": "category1"
}
},
{
"$count": "total"
}
])
With this query you will get the total count for the matched category (category1 in this example).
Example here
In your frontend you will need a call for every category.
If your DB has a lot of different categories this may not be a good approach, but if the number is not too large you can run this query once per category and get the result you want.
MongoDB Documentation reference here
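If one call per category becomes a burden, a single $group stage can return every category's count at once. This is a sketch rather than part of the answer above: the pipeline uses the standard $group and $sum operators, while countByCategory and the sample products are hypothetical and included only to show the shape of the result.

```javascript
// Aggregation pipeline that groups products by category and counts each group.
// It would be passed to db.collection.aggregate(...) like the query above.
const countPerCategoryPipeline = [
  { $group: { _id: "$category", total: { $sum: 1 } } },
];

// Hypothetical in-memory equivalent, only to illustrate the output shape.
function countByCategory(products) {
  const totals = new Map();
  for (const product of products) {
    totals.set(product.category, (totals.get(product.category) || 0) + 1);
  }
  return Array.from(totals, ([category, total]) => ({ _id: category, total }));
}

const sampleProducts = [
  { category: "category1", name: "A" },
  { category: "category1", name: "B" },
  { category: "category2", name: "C" },
];
console.log(countByCategory(sampleProducts));
```

Each result document has the category name in _id and the number of products in total, so the frontend can render the whole category list from one response.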

I would say you should have a product schema and a product category schema, where the product category schema has an array of product ids that belong to that category.
In the product schema, you could also have a pointer to the category object that a product is linked to (as opposed to just the name of the category as a string).
Maybe take a look at mongoose populate https://mongoosejs.com/docs/populate.html

Mongoose populating subdocument to subdocument

I've got a complex Mongoose population issue that I'm trying to sort out, and wondered if someone could shed some light (yeah yeah, I know, could use a RDBMS, but most the other bits of the schema lend themselves nicely to Mongo).
I've got two models: a Study and a Participant.
Study:
var StudySchema = new mongoose.Schema({
name: String,
checklist: [
{
order: Number,
text: String
}
]
});
Participant:
var ParticipantSchema = new mongoose.Schema({
name: String,
checklist_items: [
{
isComplete: Boolean,
item: {
type: Schema.Types.ObjectId
}
}
]
});
When a participant is created (they're always part of a study), the checklist is copied over onto the participant, so we can keep track of that checklist on the individual participant. I'm simply pushing IDs into the Participant.checklist_items.item to link those back to the items on the Study. (These are referenced, not wholesale copied, so that text changes to the study checklist are propagated down naturally)
I want to populate this model when retrieving a participant. When I get them, I want item on checklist_items to be populated with the corresponding item from the study. Hope that makes sense.
I've tried things like:
Participant.findById(req.params.id)
.populate({path: 'checklist_items.item', populate: {model: 'Study', path: 'checklist'}})
.exec()
But no dice. I've monkeyed around with this for awhile, and I'm not sure I'm grokking how to do this child-to-child type population.
Any ideas? Is this possible?
Edit: clarified title with correct terms

It appears this isn't possible with Mongoose, and it represents a bit of an antipattern. Leaving a reference to this issue for folks with this question in the future: https://github.com/Automattic/mongoose/issues/2772
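One possible workaround is to resolve the references in application code once both documents are loaded. The sketch below works on plain objects, assumes the participant's study has already been fetched by some means not shown in the question, and uses a hypothetical helper name, resolveChecklist, with made-up sample data.

```javascript
// Replace each checklist_items[].item id with the matching study checklist entry.
// Ids are compared as strings so ObjectId values and plain strings both work.
function resolveChecklist(participant, study) {
  const byId = new Map(study.checklist.map((entry) => [String(entry._id), entry]));
  return {
    ...participant,
    checklist_items: participant.checklist_items.map((ci) => ({
      isComplete: ci.isComplete,
      item: byId.get(String(ci.item)) || null, // null when the study item no longer exists
    })),
  };
}

// Hypothetical sample data shaped like the schemas in the question.
const study = {
  name: "Sleep study",
  checklist: [
    { _id: "c1", order: 1, text: "Sign consent form" },
    { _id: "c2", order: 2, text: "Complete intake survey" },
  ],
};
const participant = {
  name: "Pat",
  checklist_items: [
    { isComplete: true, item: "c1" },
    { isComplete: false, item: "c2" },
  ],
};

console.log(resolveChecklist(participant, study).checklist_items);
```

The function returns a new object and leaves the loaded participant untouched, so text changes on the study checklist still show up the next time the participant is resolved.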

You can simply try this:
Participant.findById(req.params.id)
.populate({path: 'checklist_items.item', model: 'Study'})
.exec()
This will fetch the required participant and populate all the items inside checklist_items.
See if it works for you.

CouchDB Referential Integrity

I am new to CouchDB and the NoSQL scene and coming from a SQL background. I have some questions on referential integrity, for example I have a product document as below
{
type: "product",
name: "Sweet Necklace",
category: "necklace"
}
And each category has its own document
{
type: "category",
name: "necklace",
custom_attr: ".."
}
Just for the sake of argument, what happens when the stakeholder chooses to rename the category from "necklace" to "accessories"? What should happen to the products that have the category field set to "necklace"? Do I do a bulk update on all products with category equal to necklace? (I don't think CouchDB allows us to perform an "UPDATE ALL WHERE" kind of statement.)
What is the best practice on handling such situation?
P/S: I chose to save the category name in the product document instead of a category ID since NoSQL encourages denormalization anyway.

If you're maintaining a separate document for the category then you've not denormalized your data at all. In fact, by doing what you're doing, you're getting the worst of both worlds - no normalization and no denormalization.
Try something like this:
Product document:
{
_id:"product:first_product",
name:"First Product",
category:"category:category_1"
}
Category document:
{
_id:"category:category_1",
name:"Category 1",
custom_attr: {}
}
This way, when you change the name of the category, you're still referring to the correct document from all the product documents that have this category.
Note: you can still have a type field and let the _id remain as it is currently.
Edit:
To get the product/category info, you can define a map function like so:
function(doc){
if(doc._id.indexOf('product:') === 0){
// or if(doc.type === 'product') if you use the type field
emit(doc, {'_id': doc.category});
}
}
Now whenever you use this view and you set include_docs to true, the category information will be included in your results.

MongoDB Relational Data Structures with array of _id's

We have been using MongoDB for some time now and there is one thing I just can't wrap my head around. Let's say I have a collection of Users that have a Watch List or Favorite Items List like this:
usersCollection = [
{
_id: 1,
name: "Rob",
itemWatchList:[
"111111",
"222222",
"333333"
]
}
];
and a separate Collection of Items
itemsCollection = [
{
_id:"111111",
name: "Laptop",
price:1000.00
},
{
_id:"222222",
name: "Bike",
price:123.00
},
{
_id:"333333",
name: "House",
price:500000.00
}
];
Obviously we would not want to insert the whole item obj inside the itemWatchList array because the items data could change i.e. price.
Let's say we pull that user to the GUI and want to display a grid of the user's itemWatchList. We can't, because all we have is a list of IDs. Is the only option to do a second collection.find([itemWatchList]) and then, in the results callback, manipulate the user record to display the current items? The problem with that is: what if I return an array of multiple Users, each with its own itemWatchList array? That would be a callback nightmare to try and keep the results straight. I know Map Reduce or the Aggregation framework can't traverse multiple collections.
What is the best practice here and is there a better data structure that should be used to avoid this issue all together?

You have 3 different options with how to display relational data. None of them are perfect, but the one you've chosen may not be the best option for your use case.
Option 1 - Reference the IDs
This is the option you've chosen. Keep a list of Ids, generally in an array of the objects you want to reference. Later to display them, you do a second round-trip with an $in query.
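For several users at once, the second round-trip can still be a single query: gather every id first, fetch once, then map the results back. The sketch below is in-memory only; fetchItems stands in for one $in query, and all helper names and the second sample user are hypothetical.

```javascript
// Hypothetical sample data shaped like the collections in the question.
const users = [
  { _id: 1, name: "Rob", itemWatchList: ["111111", "222222"] },
  { _id: 2, name: "Ann", itemWatchList: ["222222", "333333"] },
];
const items = [
  { _id: "111111", name: "Laptop", price: 1000.0 },
  { _id: "222222", name: "Bike", price: 123.0 },
  { _id: "333333", name: "House", price: 500000.0 },
];

// Step 1: gather every distinct item id across all users.
function collectItemIds(userList) {
  return [...new Set(userList.flatMap((u) => u.itemWatchList))];
}

// Step 2 stand-in: in MongoDB this would be one find({ _id: { $in: ids } }) call.
function fetchItems(ids) {
  return items.filter((item) => ids.includes(item._id));
}

// Step 3: swap each id for its item document using a lookup map.
function attachItems(userList, fetched) {
  const byId = new Map(fetched.map((item) => [item._id, item]));
  return userList.map((u) => ({
    ...u,
    itemWatchList: u.itemWatchList.map((id) => byId.get(id)).filter(Boolean),
  }));
}

const hydrated = attachItems(users, fetchItems(collectItemIds(users)));
console.log(JSON.stringify(hydrated, null, 2));
```

Keeping the lookup in a map means the number of users does not change the number of queries, which avoids nesting one callback per user.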
Option 2 - Subdocuments
This is probably a bad solution for your situation. It means putting the entire array of documents that are stored in the items collection into your user collection as a sub-document. This is great if only one user can own an item at a time. (For example, different shipping and billing addresses.)
Option 3 - A combination
This may be the best option for you, but it'll mean changing your schema. For example, let's say that your items have 20 properties, but you really only care about the name and price for the majority of your screens. You then have a schema like this:
usersCollection = [
{
_id: 1,
name: "Rob",
itemWatchList:[
{
_id:"111111",
name: "Laptop",
price:1000.00
},
{
_id:"222222",
name: "Bike",
price:123.00
},
{
_id:"333333",
name: "House",
price:500000.00
}
]
}
];
itemsCollection = [
{
_id:"111111",
name: "Laptop",
price:1000.00,
otherAttributes: ...
},
{
_id:"222222",
name: "Bike",
price:123.00,
otherAttributes: ...
},
{
_id:"333333",
name: "House",
price:500000.00,
otherAttributes: ...
}
];
The difficulty is that you then have to keep these items in sync with each other. (This is what is meant by eventual consistency.) If you have a low-stakes application (not banking, health care, etc.) this isn't a big deal. You can have the two update queries happen successively: first update the item, then update the users that have that item to the new price. You'll notice this sort of latency on some websites if you pay attention. eBay, for example, often has different prices on the search results pages than the actual price once you open the actual page, even if you return and refresh the search results.
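The two successive updates can be sketched in memory as follows. The arrays stand in for the two collections, and updateItemPrice and syncEmbeddedPrice are hypothetical helper names; the window between the two calls is where a reader can see a stale price.

```javascript
// Hypothetical in-memory stand-ins for the two collections in Option 3.
const itemsCollection = [{ _id: "111111", name: "Laptop", price: 1000.0 }];
const usersCollection = [
  { _id: 1, name: "Rob", itemWatchList: [{ _id: "111111", name: "Laptop", price: 1000.0 }] },
  { _id: 2, name: "Ann", itemWatchList: [] },
];

// Write 1: update the authoritative item document.
function updateItemPrice(itemId, price) {
  const item = itemsCollection.find((i) => i._id === itemId);
  if (item) item.price = price;
}

// Write 2: update every embedded copy. Until this runs, users see the old price.
// Returns how many embedded copies were changed.
function syncEmbeddedPrice(itemId, price) {
  let updated = 0;
  for (const user of usersCollection) {
    for (const entry of user.itemWatchList) {
      if (entry._id === itemId) {
        entry.price = price;
        updated += 1;
      }
    }
  }
  return updated;
}

updateItemPrice("111111", 950.0);
const staleBefore = usersCollection[0].itemWatchList[0].price; // embedded copy not synced yet
const changed = syncEmbeddedPrice("111111", 950.0);
console.log(staleBefore, changed, usersCollection[0].itemWatchList[0].price);
```

In MongoDB itself, the second write could plausibly be a single multi-document update, for example db.users.updateMany({ "itemWatchList._id": "111111" }, { $set: { "itemWatchList.$.price": 950 } }), assuming each item appears at most once per watch list.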
Good luck!

Whats the best way of saving a document with revisions in a key value store?

I'm new to Key-Value Stores and I need your recommendation. We're working on a system that manages documents and their revisions. A bit like a wiki does. We're thinking about saving this data in a key value store.
Please don't give me a recommendation that is the database you prefer because we want to hack it so we can use many different key value databases. We're using node.js so we can easily work with json.
My question is: what should the structure of the database look like? We have metadata for each document (timestamp, lasttext, id, latestrevision) and we have data for each revision (the change, the author, timestamp, etc.). So, which key/value structure do you recommend?
thx

Cribbed from the MongoDB groups. It is somewhat specific to MongoDB, however, it is pretty generic.
Most of these history implementations break down to two common strategies.
Strategy 1: embed history
In theory, you can embed the history of a document inside of the document itself. This can even be done atomically.
> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs.update( {_id: doc._id}, { $set : { text : 'New Text' }, $push : { hist : doc.text } } )
> db.docs.find()
{ "_id" : 1, "hist" : [ "Original Text" ], "text" : "New Text" }
Strategy 2: write history to separate collection
> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs_hist.insert ( { orig_id : doc._id, ts : Math.round((new Date()).getTime() / 1000), data : doc } )
> db.docs.update( {_id:doc._id}, { $set : { text : 'New Text' } } )
Here you'll see that I do two writes. One to the master collection and one to the history collection.
To get fast history lookup, just grab the original ID:
> db.docs_hist.ensureIndex( { orig_id : 1, ts : 1 })
> db.docs_hist.find( { orig_id : 1 } ).sort( { ts : -1 } )
Both strategies can be enhanced by only displaying diffs
You could hybridize by adding a link from history collection to original collection
Whats the best way of saving a document with revisions in a key value store?
It's hard to say there is a "best way". There are obviously some trade-offs being made here.
Embedding:
atomic changes on a single doc
can result in large documents, may break the reasonable size limits
probably have to enhance code to avoid returning full hist when not necessary
Separate collection:
easier to write queries
not atomic, needs two operations (do you have transactions?)
more storage space (extra indexes on original docs)
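Since the question asks for a layout that works across key-value stores, here is a hedged sketch of both strategies with a plain Map standing in for the store. The key formats (doc:, meta:, rev:) and the function names are assumptions made for this example, not a convention of any particular database.

```javascript
// A plain Map stands in for any key-value store; key names are hypothetical.
const store = new Map();

// Strategy 1: the history lives inside the document's own value.
function saveEmbedded(id, text) {
  const key = `doc:${id}`;
  const current = store.get(key);
  const hist = current ? [...current.hist, current.text] : [];
  store.set(key, { text, hist }); // a single write to a single key
}

// Strategy 2: each revision gets its own key; the metadata points at the latest one.
function saveSeparate(id, text, author) {
  const metaKey = `meta:${id}`;
  const meta = store.get(metaKey) || { latestRevision: 0 };
  const revision = meta.latestRevision + 1;
  store.set(`rev:${id}:${revision}`, { text, author, timestamp: Date.now() }); // write 1
  store.set(metaKey, { latestRevision: revision, lastText: text }); // write 2
  return revision;
}

function getRevision(id, revision) {
  return store.get(`rev:${id}:${revision}`);
}

saveEmbedded(1, "Original Text");
saveEmbedded(1, "New Text");
saveSeparate(2, "Original Text", "alice");
saveSeparate(2, "New Text", "bob");
console.log(store.get("doc:1"), store.get("meta:2"), getRevision(2, 1));
```

The embedded variant needs one write per change but its value grows with every revision; the separate variant keeps values small at the cost of two writes that a store without transactions cannot make atomic.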

I'd keep a hierarchy of the real data under each document with the revision data attached, for instance:
{
[
{
"timestamp" : "2011040711350621",
"data" : { ... the real data here .... }
},
{
"timestamp" : "2011040711350716",
"data" : { ... the real data here .... }
}
]
}
Then use the push operation to add new versions and periodically remove the old versions. You can use the last (or first) filter to only get the latest copy at any given time.

I think there are multiple approaches and this question is old but I'll give my two cents as I was working on this earlier this year. I have been using MongoDB.
In my case, I had a User account that then had Profiles on different social networks. We wanted to track changes to social network profiles and wanted revisions of them so we created two structures to test out. Both methods had a User object that pointed to foreign objects. We did not want to embed objects from the get-go.
A User looked something like:
User {
"tags" : [Tags]
"notes" : "Notes"
"facebook_profile" : <combo_foreign_key>
"linkedin_profile" : <same as above>
}
and then, for the combo_foreign_key we used this pattern (Using Ruby interpolation syntax for simplicity)
combo_foreign_key = "#{User.key}__#{new_profile.last_updated_at}"
facebook_profiles {
combo_foreign_key: facebook_profile
... and you keep adding your foreign objects in this pattern
}
This gave us O(1) lookup of the latest FacebookProfile of a User but required us to keep the latest FK stored in the User object. If we wanted all of the FacebookProfiles we would then ask for all keys in the facebook_profiles collection with the prefix of "#{User.key}__" and this was O(N)...
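A minimal in-memory sketch of this first strategy, with Maps standing in for the users and facebook_profiles collections. The function names and sample profiles are hypothetical, and the prefix scan here walks every key, whereas a store with ordered keys could do a range scan instead.

```javascript
// Hypothetical in-memory stand-ins for the users and facebook_profiles collections.
const userStore = new Map();
const profileStore = new Map();

// Build the combined key from the user key and the profile's update time.
function comboKey(userKey, lastUpdatedAt) {
  return `${userKey}__${lastUpdatedAt}`;
}

// Store a new profile version and point the user at it.
function addProfileVersion(userKey, profile) {
  const key = comboKey(userKey, profile.last_updated_at);
  profileStore.set(key, profile);
  const user = userStore.get(userKey) || {};
  userStore.set(userKey, { ...user, facebook_profile: key });
  return key;
}

// Latest version: a single key lookup through the pointer kept on the user.
function latestProfile(userKey) {
  const user = userStore.get(userKey);
  return user ? profileStore.get(user.facebook_profile) : undefined;
}

// All versions: scan keys for the user's prefix.
function allProfiles(userKey) {
  const prefix = `${userKey}__`;
  return [...profileStore.entries()]
    .filter(([key]) => key.startsWith(prefix))
    .map(([, profile]) => profile);
}

addProfileVersion("u1", { last_updated_at: 1000, bio: "first" });
addProfileVersion("u1", { last_updated_at: 2000, bio: "second" });
addProfileVersion("u2", { last_updated_at: 1500, bio: "other" });
console.log(latestProfile("u1"), allProfiles("u1").length);
```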
The second strategy we tried was storing an array of those FacebookProfile keys on the User object so the structure of the User object changed from
"facebook_profile" : <combo_foreign_key>
to
"facebook_profile" : [<combo_foreign_key>]
Here we'd just append the new combo_key when we added a new profile variation. Then we'd do a quick sort of the "facebook_profile" attribute and look up the largest key to get our latest profile copy. This method had to sort M strings and then fetch the FacebookProfile for the largest item in that sorted list. It was a little slower for grabbing the latest copy, but it gave us the advantage of knowing every version of a User's FacebookProfile in one swoop, and we did not have to worry about ensuring that foreign_key was really the latest profile object.
At first our revision counts were pretty small and they both worked pretty well. I think I prefer the first one over the second now.
Would love input from others on ways they went about solving this issue. The Git idea suggested in another answer actually sounds really neat to me and for our use case would work quite well... Cool.
