mongoDB + mongoose: choosing the right schema for hierarchical data

mongoDB + mongoose: choosing the right schema for hierarchical data - node.js

I'm new to mongoDB and in a node project i'm using mongoose to create a schema for my database. I have come to understand that in my case i should be using embedded data instead of by reference. (http://docs.mongodb.org/manual/core/data-modeling-introduction/)
The structure that i need to store in the database is something like this:
book
| title
| | chapter
| | | content (url to a html file)
Coming from a mySQL world, i'm trying to understand the noSQL database concepts and i was wondering how one would design the mongoose schema.
Thx,

You can nest documents, and document arrays in a MongoDB document:
db.books.findOne();
can return a JSON:
{
isbn:"253GHST78F6",
title:"some book",
author:"Earnest Hemmingway",
chapters:[
{title:"chapter 1",content:"http://api.com/chapters/1.html"},
{title:"chapter 2",content:"http://api.com/chapters/2.html"}
]
}
But, there are a few more things to keep in mind when modeling a collection in MongoDB:
Model data as close as possible to what will be asked for: There are no Joins in MongoDB, so, try to keep all such data together (pre joined), that is supposed be queried together in future.
Nesting is good for queries but nested update and search is bad:
Indexing and searching inside nested documents, especially arrays, will be expensive.
Avoid placing any data in nested arrays, that is very frequently updated.
There are no Foriegn Keys in MongoDB. So, if you have multiple copies of same document nested under different collections, keeping all of them updated is your responsibility.

First take a look at this Martin Fowlers excellent video:
I think there is no better authority that can explain nosql, then Fowler. Especially knowing his sql background.
Second, MongoDB follows json schema. If you know how to work with json, you will know how to work with nosql (mongodb). Basic thing that needs to be understand is that Mongo schema is expressed in a language that is using it.
So, if you are using NodeJS with Mongo, you still have objects and array to work with. In simple words, Mongo is not forcing any particular schema. It is on developer to create his scheme based on his language/mongo driver.
So how would you express you data logic in your language ? If it is in form of JS object, then move that form to db.
I really like MongoDB, becuse it can be combined with some great JS tools like Underscore.js for all kind of data manipulations.

Related

MongoDB - When to add SubDocuments and when to Ref

Im using MongoDB for storing information for a nodeJS application and a doubt came to my mind, after finding that it is possible to use ObjectID to ref another document. As it is known, MongoDB is a no-SQL db, so there is no need for consistency whatsoever and information can be repeated.
So, lets say, I have a collection for users and one of their field values is 'friends', which is an array of this user friends (another users). What is the best practice, saving all the user info there (thus repeating the same thing over and over again throughout the DB) or saving only the ObjectID of the friendUser (makes way more sense to me, but it sounds kinda SQL mindset). I'm not really getting when should I use each of the options, so a professional opinion would be very appreciated.

To model relationships between connected data, you can reference a document or embed it in another document as a subdocument.
Referencing a document does not create a “real” relationship between these two documents as does with a relational database.
Referencing documents is also known as normalization. It is good for data consistency but creates more queries in your system.
Embedding documents is also known as denormalization.
The benefit of Embedding approach is getting all the data you need about a document and it’s sub-document(s) with a single query. Therefore, this approach is very fast. The drawback is that data may not stay as consistent in the database.
Important
If one document is to be used by many documents then better create a referenced doc.
i. Will Save Space.
ii. if any change required, we will have to update only the referenced doc
instead of updating many docs.
Create sub doc(embedded)
i. If another document is not dependent on the subdocument.
Source: https://vegibit.com/mongoose-relationships-tutorial/
Recommended reading:
MongoDB Applied Design Patterns by Rick Copeland
To Embed or Reference

Using Mongoose Rather than the MongoDB Driver to Find Document

We are in the process of writing tests for our Node/MongoDB backend and I have a question about finding documents.
My understanding is it's preferable to use Mongoose to get your documents as opposed to the MongoDB driver. In other words, doing Customer.findOne().exec() instead of setting up a db connection and then doing db.collection("customers").findOne().
Other than the first option (using Mongoose to find the doc) being slightly less verbose, I'm curious what the other reasons are. Is a straight MongoDB lookup a bigger drag on the database?

One of the great feature of mongoose is the built in validation mechanism. Also the Populate method to get data from multiple collections is an awesome characteristic of Mongoose.
In terms of query performance, here is a good read:
https://medium.com/#bugwheels94/performance-difference-in-mongoose-vs-mongodb-60be831c69ad
Hope this helps :)

MongoDB - various document types in one collection

MongoDB is schemaless, which means a collection (table in relational DB) can contain documents (rows) of different structure - having different fields, for instance.
I'm new to Mongo, so I decided to use Mongoose which should make things a bit easier. Reading the guide:
Defining your schema
Everything in Mongoose starts with a Schema. Each schema maps to a
MongoDB collection and defines the shape of the documents within that
collection.
Notice at the last sentence. Doesn't it conflict with the schemaless philosophy of MongoDB? Or maybe it's that in 99% of cases, I want a collection of documents of the same structure, so in the introductory guide only that scenario is discussed? Does Mongoose even allow me to create schemaless collection?

MongoDB does not require a schema, but that confuses a lot of people from a standard SQL background so Mongoose is aimed at trying to bridge the gap between SQL and NoSQL. If you want to maintain a collection with different document types, than by all means do not use Mongoose.
If you're okay with the schemaless nature of MongoDB there is no reason to add additional abstractions and overhead to MongoDB which is what Mongoose surely applies.

The purpose of Mongoose is to use a Schema, there are other database drivers you can use to take advantage of MongoDBs Schemaless nature such as Mongoskin.
If you want to utilize the Mongoose's Schema Design and make an exeception you can use: Mongoose Strict.
According to the docs:
The strict option, (enabled by default), ensures that values passed to our model constructor that were not specified in our schema do not get saved to the db.

NoSQL doesn't mean no schema. It means, the database doesn't control schema. For instance, with MongoDB, you can look hard to find anything that determines a field in a document is a string, or a number or a date. The database doesn't care. You could store a number in a field in one document and in another document in the same collection, and in the same field, you could store a string. But, from a coding perspective, that can become quite hairy and would be bad practice. This is why you still have to define data types. So, you still need a schema of sorts and why Mongoose offers and, in fact, enforces this functionality.
Going a conceptual level higher now, the major concept of NoSQL is to put schema inside your code and not in some file of SQL commands i.e. not telling the DB what to expect in terms of data types and schema to be controlled by the database. So, instead of needing to have migration files/paths and versioning on database schema, you just have your code. ORMs, for example, try to bridge this issue too, where they often have automated migration systems.
ORMs also try to avoid the Object Relational Impedance Mismatch problem, which MongoDB avoids completely. Well, it doesn't have relationships per se, so the problem is avoided out of necessity.
Getting back to schema, with MongoDB and Mongoose, if you or one of your team make a change to the schema in the code, all your other team members need to do to get the database to work with it is pull in that new code. Voila, the schema is up-to-date and will work. No need to also pull in a copy of the newer migration file (to determine the new schema of the DB) to then have to run it on a (copy of the) db to update it too, just to continue programming. There is no need to make changes in schema in multiple places.
So, in the end, if you can imagine your schema is always in your code (only), making changes to an application with a database persisting state like MongoDB is a good bit simpler and even safer. (Safer, because code and schema can't get out of sync, as it's the one and the same.)

Using and populating (real) DBRef arrays with Mongoose / mongoose-dbref

Mongoose doesn't appear to support Mongo DBRefs. Apparently they released "DBRef" support but it was actually just plain references (no ability to reference documents from different collections). I've finally managed to craft a schema that allows me to hold an array of ObjectID references and populate them, which is great for certain parts of my schema, but it would be extremely convenient if I could use proper DBRefs to create an array that lets me refer to documents from a number of collections.
Luckily(?) there's a module that can monkey patch DBRef support into mongoose: https://github.com/goulash1971/mongoose-dbref
Unluckily, I can't make any sense of the documents. The best I can tell is that there is no ability to use DBRefs in an array (there is a 'fetch' method to dereference, but it takes a single dbref); 'populate' doesn't seem to be patched to fill in DBRefs, and I can't tell how I'm supposed to assign a DBRef given a source document [collection.items.push(?????)].
From the internet, it appears that I can assign an object of the form { $id: document._id, $ref: 'Collection' } -- when logging the result, it appears to have "taken" as a DBRef data type, but I am unsure if this is correct since I cannot seem to do anything useful with it (turn the ref back into a document).
What I really want is a way to represent an ordered list of items from multiple collections; any solution to this is fine by me, but so far DBRefs are the best I've got. Help?

A DBRef (as explained in detail here) is a tuple containing the ObjectId, collection name, and possibly the database container name of a referenced object in another collection.
Internally in the MongoDB server these serve no purpose and are just data within a document. The point is for use in some drivers and ODM implementations to allow for some sort of automatic expansion by issuing additional queries to the server in order to have the data that is elsewhere appear to be an ordinary sub-document part of the referencing document. This can be automatic or a lazy load depending on the implementation, but is always done over the wire and processed on the client side. The server will do nothing to traverse or join this data.
Additionally, MongoDB collections are schemaless, so there is nothing as in the relational sense that says all documents in a collection have to have the same structure.
In the case of Mongoose, there are built in functions to do this sort of loading for you as a convenience, and while not strictly a DBRef and utilizing documents with a different schema in the same collection is the same means as storing the documents external to the referencing document.
It is important to consider the data access patterns of your application and not to simply opt for the same sort of relational design you are used to. Keeping in mind that you are only ever reading from one collection at a time, it is most desirable to get at the data you need in a single read or write, without multiple operations over the wire, which will slow things down considerably.
In short, you should always consider embedding sub-documents first, and then use external references any your best supported form only when you absolutely have to. Your application users will thank you in the end.

creating and querying a collection in another collection. mongodb node.js

Is there a way to create and query a collection thats inside another collection...
Can any one direct me to literature that can teach me how to do this?
I've checked the mongodb
docs tutorial and i havent come across this scenario
No code example because i don't know where to start.
Here is what im trying to do.
Im modelling an online shop. So shop object details are in a collection holding many shops.
Now each shop has got products and each product its own unique details. I want to a void using references and rather append but from tutorial from mongodb docs its not quite clear how to do it.
Im fairly new to mongodb
Thanks.

There isn't any such thing as a collection inside another collection with mongo. A collection is a collection of documents.
There are a couple of ways to link documents to other documents, or collections of other data. The correct one to choose is up to you, your data, and your application.
Any document can point to another ObjectId from any field
You can have an array inside a document
Combine the two, and you can have an array of document ids
Mongo is a document store, not a relational database. My suggestion is don't try to use it as a relational database. If what you need is an RDBMS, use one.
Arrays:
http://docs.mongodb.org/manual/reference/operator/query-array/
http://docs.mongodb.org/manual/reference/operator/update-array/
ObjectId:
http://docs.mongodb.org/manual/core/document/

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string