CouchDB Referential Integrity

I am new to CouchDB and the NoSQL scene, coming from a SQL background. I have some questions about referential integrity. For example, I have a product document like the one below:
{
type: "product",
name: "Sweet Necklace",
category: "necklace"
}
And each category has its own document:
{
type: "category",
name: "necklace",
custom_attr: ".."
}
Just for the sake of argument, what happens when a stakeholder chooses to rename the category from "necklace" to "accessories"? What should happen to the products that have the category field set to "necklace"? Do I do a bulk update on all products with category equal to necklace? (I don't think CouchDB allows us to perform an "UPDATE ALL WHERE" kind of statement.)
What is the best practice on handling such situation?
P/S: I chose to save the category name in the product document instead of a category ID since NoSQL encourages denormalization anyway.

If you're maintaining a separate document for the category then you've not denormalized your data at all. In fact, by doing what you're doing, you're getting the worst of both worlds - no normalization and no denormalization.
Try something like this:
Product document:
{
_id:"product:first_product",
name:"First Product",
category:"category:category_1"
}
Category document:
{
_id:"category:category_1",
name:"Category 1",
custom_attr: {}
}
This way, when you change the name of the category, you're still referring to the correct document from all the product documents that have this category.
Note: you can still have a type field and let the _id remain as it is currently.
Edit:
To get the product/category info, you can define a map function like so:
function(doc){
if(doc._id.indexOf('product:') === 0){
// or if(doc.type === 'product') if you use the type field
// the {'_id': ...} value tells CouchDB which document to load when include_docs=true
emit(doc, {'_id': doc.category});
}
}
Now whenever you use this view and you set include_docs to true, the category information will be included in your results.
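To see why a rename no longer needs a bulk update, here is a minimal sketch that runs the map function locally. It is not CouchDB itself: `queryWithIncludeDocs` is a hypothetical stand-in for querying the view with include_docs=true, `emit` is passed in as a parameter (CouchDB normally supplies it), and the sketch emits the product _id as the key instead of the whole document, which is a variation on the map function above.

```javascript
// Map function as it could appear in a design document.
// `emit` is passed in here only so the logic can run outside CouchDB.
function mapProducts(doc, emit) {
  if (typeof doc._id === "string" && doc._id.indexOf("product:") === 0) {
    // A value of the form {_id: ...} asks CouchDB to load that document
    // (instead of the emitting one) when the view is queried with include_docs=true.
    emit(doc._id, { _id: doc.category });
  }
}

// Hypothetical local stand-in for GET .../_view/<name>?include_docs=true
function queryWithIncludeDocs(docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  const rows = [];
  for (const doc of docs) {
    mapProducts(doc, (key, value) => {
      rows.push({ id: doc._id, key, value, doc: byId.get(value._id) || null });
    });
  }
  return rows;
}

const docs = [
  { _id: "product:first_product", name: "First Product", category: "category:category_1" },
  { _id: "category:category_1", name: "Category 1", custom_attr: {} },
];

const nameBefore = queryWithIncludeDocs(docs)[0].doc.name;

// Renaming the category touches one document; no product document changes.
docs[1].name = "Accessories";
const nameAfter = queryWithIncludeDocs(docs)[0].doc.name;
```

Because each product stores only the category _id, the single write to the category document is enough for every product row to show the new name on the next query.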

Related

Get number of products from each category in mongodb database

I'm new to MongoDB and to the database side of development in general.
I'm trying to make a product listing site where all the categories are displayed with the number of products in each category, and clicking a category shows all the products in that category.
Some things to note are:
every product will have only one category
each category will have multiple products
I don't know how to go about this problem and tried searching it online but couldn't exactly find what I was looking for. I've also tried making the schema for this but I do not know if it's the right approach or not and this is how it looks:
const productsSchema = {
category: String,
name: String,
price: String,
description: String,
thumbnail: String,
};
Side note: I'm using the MERN stack (if it's of any help).

If I've understood your question correctly, you can use something like this:
db.collection.aggregate([
{
"$match": {
"category": "category1"
}
},
{
"$count": "total"
}
])
With this query you will get the total count for the category given in $match.
Example here
In your frontend you will need one call per category.
If your DB has a lot of different categories this may not be a good approach, but if the number is small you can run this query a few times and get the result you want.
MongoDB Documentation reference here
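If one call per category becomes too many, a single $group stage can return every category with its count. The following is a sketch under assumptions: the collection name `products` is not given in the question, and `countPerCategory` is only a plain-JavaScript mirror of the pipeline for checking the expected result shape, not part of MongoDB.

```javascript
// Pipeline intended for db.products.aggregate(countPerCategoryPipeline);
// "products" is an assumed collection name.
const countPerCategoryPipeline = [
  { $group: { _id: "$category", total: { $sum: 1 } } }, // one output document per category
  { $sort: { _id: 1 } },                                // stable, alphabetical order
];

// In-memory equivalent of the pipeline, for illustration only.
function countPerCategory(products) {
  const totals = new Map();
  for (const p of products) {
    totals.set(p.category, (totals.get(p.category) || 0) + 1);
  }
  return [...totals.entries()]
    .map(([category, total]) => ({ _id: category, total }))
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const sampleProducts = [
  { category: "rings", name: "Band" },
  { category: "necklaces", name: "Chain" },
  { category: "rings", name: "Signet" },
];
const counts = countPerCategory(sampleProducts);
```

Listing the products of one category after a click is then an ordinary find on the category field.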

I would say you should have a product schema and a product category schema, where the product category schema has an array of product ids that belong to that category.
In the product schema, you could also have a pointer to the category object that a product is linked to (as opposed to just the name of the category as a string).
Maybe take a look at mongoose populate https://mongoosejs.com/docs/populate.html

Structuring one-to-one interdocument relationship MongoDB

I'm just starting out using MongoDB and I'm a bit lost at how to structure my documents to tackle the following problem (note this is just a basic example that matches what I'm having difficulty with).
The Problem:
Let's say I have a top-level document called Family, these documents are stored in a collection called families and contain some basic information about a family, i.e.
{
_id: ObjectId("foobar"),
familyName: "Simpson",
address: "743 Evergreen Terrace, SpringField",
children: [
{
_id: ObjectId("barfoo"),
firstName: "Bartholomew",
preferredName: "Bart",
middleName: "JoJo"
}
// ... etc.
]
}
Now, let's say I'm adding another top-level document to my application: School, stored in a collection called schools. I need to relate each child to their school, and each child may only attend one school at any point in time. How would I approach this in Mongo? I've come from a very heavy RDBMS background and I'm having a bit of difficulty figuring this out. The main issues that I've come up against in my solutioning revolve around the fact that I'll need to efficiently handle the following use-cases:
View a school and be able to see all Children enrolled there
View a family and see all of the Schools that their children are enrolled in
What I've tried:
Storing the child references in the `School`
The first solution I went with was to make an enrollments array in my School document which referenced the _id of a child as well as their full name for convenience, i.e.
{
_id: ObjectId("asdadssa"),
name: "Springfield Elementary",
enrollments: [
{
child_id: ObjectId("barfoo"), // Bart Simpson
fullName: "Bart Simpson" // concatenation of preferredName and familyName
}
]
}
This seemed fantastic for the first use-case, which just needed to display all of the students enrolled at a particular school.
However when I turned to the second use-case I realised I may have made a mistake. How on earth would you figure out which school each child in a Family belonged to? The only way I could see would be to actually traverse every single school in the schools collection, drill down into their enrollments and see if the child_id matched a child in the family...doesn't seem very efficient does it? That led to my next attempt.
Storing a reference to the school in a child object
Because each child can only belong to one school I figured I could maybe just store a reference to the School document in each child sub-doc, i.e. Bart's document would now become:
{
_id: ObjectId("barfoo"),
firstName: "Bartholomew",
preferredName: "Bart",
middleName: "JoJo",
school_id: ObjectId("asdadssa")
}
Now the second use-case is happy, but the first is unsatisfied.
Conclusion
The only way I can see both use-cases being satisfied is if I employ both solutions simultaneously, i.e. store the school_id in the child sub-doc and also store the child_id in the enrollments array.
This just seems clunky to me, it means you'll need to do at least two writes per enrollment change (to remove from the school and change the child). As far as I'm aware MongoDB only has atomic writes and no transaction support so this looks like a place where data integrity could potentially suffer.
If any MongoDB gurus could propose an alternate solution that'd be great. I'm aware that this particular problem really screams "RDBMS!!!!", but this is only a small part of the application and some of the other data really lends itself to a document store.
I'm only in the planning stage now so I'm not 100% committed to Mongo, but I thought I'd give it a crack since I've been hearing some good things about it.

For the small use case that you've described, I would switch up the families collection to a people or students collection
{
"_id" : ObjectId("barfoo"),
"family_id" : ObjectId("spamneggs"),
"name" : {
"first" : "Bartholomew",
"last" : "Simpson"
},
"school_id" : ObjectId("asdadssa")
}
that stores students as separate documents but unites them with a common family_id (I also snuck in another way to store names). Schools can just have school information without enrollments. I'll give example code in the mongo shell for your two use cases. To find all the children enrolled in Bart's school and Bart's school's document:
> db.students.find({ "school_id" : ObjectId("asdadssa") })
> db.schools.find({ "_id" : ObjectId("asdadssa") })
To find all of the schools Bart's family has children enrolled in:
> var schools = []
> db.students.find({ "family_id" : ObjectId("spamneggs") }, { "_id" : 0, "school_id" : 1 }).forEach(function(doc) {
schools.push(doc.school_id)
})
> db.schools.find({ "_id" : { "$in" : schools } })
Both of these are simple application-side joins and should work fine in your case because one family won't have zillions of children. Indexes on school_id and family_id will help.
For writes, only the student document needs to be updated with the proper school_id.

How should I model my MongoDB collection for nested documents?

I'm managing a MongoDB database for a building products store. The most immediate collection is products, right?
There are quite a few products; however, they all belong to one of 5-8 categories and then to one subcategory among a small set of subcategories.
For example:
-Electrical
*Wires
p1
p2
..
*Tools
p5
pn
..
*Sockets
p11
p23
..
-Plumber
*Pipes
..
*Tools
..
PVC
..
I will use Angular at web site client side to show whole products catalog, I think about AJAX for querying the right subset of products I want.
So I wonder whether I should manage only one collection, like:
{
MainCategory1: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
},
MainCategory2: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
},
MainCategoryn: {
SubCategory1: {
{},{},{},{},{},{},{}
}
SubCategory2: {
{},{},{},{},{},{},{}
}
SubCategoryn: {
{},{},{},{},{},{},{}
}
}
}
Or a single collection per category? The number of documents will probably not exceed 500. However, I care about a balance between:
quick DB answer,
easy server side DB querying, and
client-side Angular code for rendering results to html.
I'm using mongodb node.js module, not Mongoose now.
What CRUD operations will I do?
Inserting products. I'd also like a way to obtain autogenerated ids (maybe sequential) for each new record. Naturally, though, I wouldn't expose the _id to the user.
Querying the whole documents set of a subcategory. Maybe just obtaining a few attributes at first.
Querying whole or a specific subset of attributes of a document (product) in particular.
Modifying a product's attributes values.

I agree the client side should get the result that is easiest to render. However, nesting products and categories into one structure like this is still a bad idea. The trade-off is that once you want to change, for example, the name of a category, it will be a disaster. And think about the possible use cases, for example:
list all categories
find all subcategories of a certain category
find all products in a certain category
You'll find it hard to do these things with your data structure.
I had the same situation in my current project, so here's what I did, for your reference.
First, categories should be in a separate collection. DON'T nest categories into each other, as it will complicate the procedure to find all subcategories. The traditional way for finding all subcategories is to maintain an idPath property. For example, your categories are divided into 3 levels:
{
_id: 100,
name: "level1 category",
parentId: 0, // means it's the top category
idPath: "0-100"
}
{
_id: 101,
name: "level2 category",
parentId: 100,
idPath: "0-100-101"
}
{
_id: 102,
name: "level3 category",
parentId: 101,
idPath: "0-100-101-102"
}
Note with idPath, parentId is not necessary anymore. It's for you to understand the structure easier.
Once you need to find all subcategories of category 100, simply do the query:
db.collection("category").find({idPath: /^0-100-/}).toArray(function(err, docs) {
// whatever you want to do
})
With category stored in a separate collection, in your product you'll need to reference them by _id, just like when we use RDBMS. For example:
{
... // other fields of product
categories: [100, 101, 102, ...]
}
Now if you want to find all products in a certain category:
db.collection("category").find({idPath: new RegExp("^" + idPath + "-")}).toArray(function(err, categories) {
var cateIds = _.pluck(categories, "_id"); // I'm using underscore to pluck category ids
db.collection("product").find({categories: { $in: cateIds }}).toArray(function(err, products) {
// products are here
});
});
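The prefix match on idPath can be sketched without a database. In the sketch below, `findDescendants` is a hypothetical in-memory stand-in for the find on the category collection, and `escapeRegExp` is an extra safety step that the answer does not include; it only matters if an idPath could ever contain regular-expression metacharacters.

```javascript
// Escape regex metacharacters so the idPath is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pattern for "every category strictly below idPath".
// The trailing "-" keeps "0-100" from also matching "0-1000".
function descendantPattern(idPath) {
  return new RegExp("^" + escapeRegExp(idPath) + "-");
}

// In-memory stand-in for find({ idPath: descendantPattern(idPath) }).
function findDescendants(categories, idPath) {
  const pattern = descendantPattern(idPath);
  return categories.filter((c) => pattern.test(c.idPath));
}

const sampleCategories = [
  { _id: 100, idPath: "0-100" },
  { _id: 101, idPath: "0-100-101" },
  { _id: 102, idPath: "0-100-101-102" },
  { _id: 1000, idPath: "0-1000" },
];
const below100 = findDescendants(sampleCategories, "0-100").map((c) => c._id);
// below100 is [101, 102]
```

Note that the pattern excludes category 100 itself, so products attached directly to the parent category need its _id added to the list before the $in query.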
Fortunately, the category collection is usually very small, with only hundreds (or at most thousands) of records, and it doesn't change much. So you can always keep a live copy of the categories in memory, and it can be built as nested objects like:
[{
id: 100,
name: "level 1 category",
... // other fields
subcategories: [{
id: 101,
... // other fields
subcategories: [...]
}, {
id: 103,
... // other fields
subcategories: [...]
},
...]
}, {
// another top-level category
}, ...]
You may want to refresh this copy periodically (here, every hour), so:
setInterval(function() {
// refresh your memory copy of categories.
}, 3600000);
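Building the nested in-memory copy from the flat category documents could look like the sketch below. The function name `buildCategoryTree` is hypothetical, and it assumes each document carries `_id`, `name`, and `parentId` as in the examples above, with parentId 0 marking a top-level category.

```javascript
// Turn flat category documents into the nested structure shown above.
function buildCategoryTree(flatCategories) {
  // First pass: create one node per category.
  const nodes = new Map();
  for (const c of flatCategories) {
    nodes.set(c._id, { id: c._id, name: c.name, subcategories: [] });
  }
  // Second pass: attach each node to its parent, or to the root list.
  const roots = [];
  for (const c of flatCategories) {
    const node = nodes.get(c._id);
    const parent = nodes.get(c.parentId);
    if (parent) {
      parent.subcategories.push(node);
    } else {
      roots.push(node); // parentId 0 (or an unknown parent) means top level
    }
  }
  return roots;
}

const flat = [
  { _id: 100, name: "level1 category", parentId: 0 },
  { _id: 101, name: "level2 category", parentId: 100 },
  { _id: 102, name: "level3 category", parentId: 101 },
];
const tree = buildCategoryTree(flat);
```

The periodic refresh would then reload the flat list from the database and replace the cached tree with a freshly built one.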
That's all I have in mind right now. Hope it helps.
EDIT:
To provide an integer ID for each user, $inc and findAndModify are very useful. You can keep an idSeed collection:
{
_id: ...,
seedValue: 1,
forCollection: "user"
}
When you want to get a unique ID:
db.collection("idSeed").findAndModify({forCollection: "user"}, {}, {$inc: {seedValue: 1}}, {}, function(err, doc) {
var newId = doc.seedValue;
});
findAndModify is an atomic operation provided by MongoDB. The find and the modify happen as a single step, as if in a "transaction", so concurrent callers cannot receive the same value.
Your 2nd question is already covered in my answer above.
Querying a subset of properties is described in the MongoDB manual; the Node.js API is almost the same. Read the documentation for the projection parameter.
Updating a subset of properties is supported by MongoDB's $set operator.

CouchDB reduce function useful in this scenario?

I want to store votes in CouchDB. To get round the problem of incrementing a field in one document and having millions of revisions, each vote will be a separate document:
{
_id: "xyz",
type: "thumbs_up",
vote_id: "test"
}
So the actual document itself is the vote. The result I'd like is basically an array of: vote_id, sumOfThumbsUp, sumOfThumbsDown
Now I think my map function would need to look like:
function (doc) {
if (doc.type == "thumbs_up" || doc.type == "thumbs_down") {
emit(doc.vote_id, doc.type);
}
}
Now here's the bit I can't figure out what to do, should I build a reduce function to somehow sum the vote types, keeping in mind there's two types of votes.
Or should I just take what's been emitted from the map function and put it straight into an array to work on, ignoring the reduce function completely?

This is a perfect case for map-reduce! Having each document represent a vote is the right way to go in my opinion, and will work with CouchDB's strengths.
I would recommend a document structure like this:
Documents
UPVOTE
{
"type": "vote",
"vote_id": "test",
"vote": 1
}
DOWNVOTE
{
"type": "vote",
"vote_id": "test",
"vote": -1
}
I would use a document type of "vote", so you can have other document types in your database (like the vote category information, user information, etc)
I kept "vote_id" the same
I made the value field called "vote", and just used 1/-1 instead of "thumbs_up" or "thumbs_down" (really doesn't matter, you can do whatever you want and it will work just fine)
View
Map
function (doc) {
if (doc.type === "vote") {
emit(doc.vote_id, doc.vote);
}
}
Reduce
_sum
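The map function alone gives you one row per vote document, with the vote_id as the key and 1 or -1 as the value.
If you reduce it with group=true, you get one row per vote_id whose value is the sum of its votes.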
As you add more vote documents with more vote_id variety, you can query for a specific vote_id by using: /:db/_design/:ddoc/_view/:view?reduce=true&group=true&key=":vote_id"
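The question asked for separate thumbs-up and thumbs-down totals, while the 1/-1 scheme yields only the net score. One possible variation, sketched below, emits a two-element array [up, down] per vote and keeps _sum as the reduce; as far as I know the built-in _sum adds arrays of numbers element by element, but verify that against your CouchDB version. `runView` is a hypothetical local stand-in for querying the view with group=true, and `emit` is passed in only so the code runs outside CouchDB.

```javascript
// Map function variant: value is [thumbsUp, thumbsDown] for this single vote.
function mapVote(doc, emit) {
  if (doc.type === "vote") {
    emit(doc.vote_id, doc.vote > 0 ? [1, 0] : [0, 1]);
  }
}

// Local stand-in for the view queried with reduce=true&group=true,
// assuming the reduce sums the emitted arrays element by element.
function runView(docs) {
  const totals = new Map();
  for (const doc of docs) {
    mapVote(doc, (key, value) => {
      const current = totals.get(key) || [0, 0];
      totals.set(key, current.map((n, i) => n + value[i]));
    });
  }
  // CouchDB returns rows ordered by key; mimic that for string keys.
  return [...totals.entries()]
    .map(([key, value]) => ({ key, value }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const voteDocs = [
  { type: "vote", vote_id: "test", vote: 1 },
  { type: "vote", vote_id: "test", vote: 1 },
  { type: "vote", vote_id: "test", vote: -1 },
  { type: "vote", vote_id: "other", vote: -1 },
  { type: "user", name: "not a vote" },
];
const voteRows = runView(voteDocs);
```

Each row then carries the vote_id as its key and [sumOfThumbsUp, sumOfThumbsDown] as its value, which is the shape the question asked for.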

MongoDB query by foreign key

I have two collections:
USERS:
{ id:"aaaaaa" , age:19 , sex:"f" }
{ id:"bbbbbb" , age:30 , sex:"m" }
REVIEWS:
{ id:777777 , user_id:"aaaaaa" , text:"some review data" }
{ id:888888 , user_id:"aaaaaa" , text:"some review data" }
{ id:999999 , user_id:"bbbbbb" , text:"some review data" }
I would like to find all REVIEWS where the user has sex=f and age>18.
(I don't want to nest because the reviews collection will be huge.)

You should include the user's data in each review (a.k.a. denormalizing):
{ id:777777 , user: { id:"aaaaaa", age:19 , sex:"f" } , text:"some review data" }
{ id:888888 , user: { id:"aaaaaa", age:19 , sex:"f" } , text:"some other review data" }
{ id:999999 , user: { id:"bbbbbb", age:30 , sex:"m" } , text:"some review data" }
Here, read this link on MongoDB Data Modeling:
A Note on Denormalization
Relational purists may be feeling uneasy already, as if we were
violating some universal law. But let's bear in mind that MongoDB
collections are not equivalent to relational tables; each serves a
unique design objective. A normalized table provides an atomic,
isolated chunk of data. A document, however, more closely represents
an object as a whole. In the case of a social news site, it can be
argued that a username is intrinsic to the story being posted.
What about updates to the username? It's true that such updates will
be expensive; happily, in this case, they'll be rare. The read savings
achieved in denormalizing will surely outweigh the costs of the
occasional update. Alas, this is not hard and fast rule: ultimately,
developers must evaluate their applications for the appropriate level
of normalization.

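Unless you denormalize the REVIEWS collection with your search attributes, you cannot do this with a single find: a MongoDB query reads from one collection only. (The $lookup aggregation stage, added in MongoDB 3.2, can now join a second collection inside an aggregation pipeline.) See this post.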
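Without denormalizing, the usual fallback is two queries joined in the application. The sketch below uses in-memory arrays in place of the two collections so it can run on its own; the comments show the MongoDB queries each step is meant to correspond to, and the function names are hypothetical.

```javascript
const users = [
  { id: "aaaaaa", age: 19, sex: "f" },
  { id: "bbbbbb", age: 30, sex: "m" },
];
const reviews = [
  { id: 777777, user_id: "aaaaaa", text: "some review data" },
  { id: 888888, user_id: "aaaaaa", text: "some review data" },
  { id: 999999, user_id: "bbbbbb", text: "some review data" },
];

// Step 1: corresponds to db.users.find({ sex: "f", age: { $gt: 18 } }, { id: 1 })
function matchingUserIds(allUsers) {
  return allUsers.filter((u) => u.sex === "f" && u.age > 18).map((u) => u.id);
}

// Step 2: corresponds to db.reviews.find({ user_id: { $in: ids } })
function reviewsByUsers(allReviews, ids) {
  const wanted = new Set(ids);
  return allReviews.filter((r) => wanted.has(r.user_id));
}

const ids = matchingUserIds(users);
const matchedReviews = reviewsByUsers(reviews, ids);
```

This keeps the reviews collection free of copied user data, at the cost of a second round trip and an $in list that grows with the number of matching users.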
