Mongoose Schema Design approach - node.js

I am new to NoSQL databases. I am building a project and am stuck on whether to choose a SQL or a NoSQL database for it.
The requirement is that a legal firm has many clients, and each client can have a different matter type such as Immigration, Conveyancing, Family, etc. Each matter type can in turn have different fields, which are never constant and may well change in the future.
Because of this, I thought NoSQL databases might be a good choice: they are document based, so I can add new fields to the document structure instead of dynamically adding new columns to a SQL table, which is not a good approach (at least I think so).
Can anyone suggest an approach, or refer me to an article that can help me decide?
To clarify my question, here is an example:
For a client named xyz with matter type Immigration, I might have fields such as firstName, lastName and dob at the moment, but later on I might have to add dependants and their details for the same client.
For a client named def with matter type Conveyancing, I would have different fields, and those fields should also be added dynamically depending on the matter type.
Thank you in advance
Regards
Anand

In my opinion, you shouldn't consider this feature alone when deciding between NoSQL and an RDBMS.
This flexibility sounds very good, but it can be dangerous: as systems grow, things can get out of hand.
I have a system where I use MongoDB, but even so, I decided to define a schema for my collections.
I would suggest you finish modeling your data first, and only then conclude whether it's really necessary to use NoSQL.
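If it helps, here is a minimal Mongoose sketch of what I mean by defining a schema anyway, applied to your matter example; the model and field names are assumptions, and the per-matter fields live in a loosely typed details property while the common fields stay fixed:

const mongoose = require('mongoose');

// Common, stable fields are declared explicitly; the fields that vary per
// matter type (Immigration, Conveyancing, ...) go into a schemaless `details`
// object. All names here are hypothetical.
const matterSchema = new mongoose.Schema({
  clientName: { type: String, required: true },
  matterType: { type: String, required: true },   // e.g. "Immigration"
  details:    { type: mongoose.Schema.Types.Mixed, default: {} },
}, { timestamps: true });

const Matter = mongoose.model('Matter', matterSchema);

// Later you can add new per-matter fields without any migration:
// await Matter.create({
//   clientName: 'xyz',
//   matterType: 'Immigration',
//   details: { firstName: 'A', lastName: 'B', dob: '1990-01-01', dependants: [] },
// });

One caveat: Mongoose does not track changes inside a Mixed path, so you need to call doc.markModified('details') before saving in-place edits to it.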

I would suggest you look into PostgreSQL if you are expecting large datasets. It offers some advantages of NoSQL databases, such as support for key-value pairs (hstore) and JSON documents (jsonb), while keeping the rigid data structure of a SQL database where you want it. Following are links to a few articles which may help you decide which approach to choose:
NoSql vs Sql
postgres vs mongodb
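As a rough illustration of that middle ground, here is a hedged sketch using the node-postgres (pg) driver with a jsonb column holding the variable matter fields; the table and column names are made up for the example:

const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the usual PG* environment variables

async function demo() {
  // Fixed columns for what never changes, a jsonb column for what does.
  await pool.query(`
    CREATE TABLE IF NOT EXISTS matters (
      id          serial PRIMARY KEY,
      client_name text NOT NULL,
      matter_type text NOT NULL,
      details     jsonb NOT NULL DEFAULT '{}'
    )`);

  await pool.query(
    'INSERT INTO matters (client_name, matter_type, details) VALUES ($1, $2, $3)',
    ['xyz', 'Immigration', { firstName: 'A', lastName: 'B', dob: '1990-01-01' }]
  );

  // jsonb operators let you filter on the flexible part.
  const { rows } = await pool.query(
    "SELECT * FROM matters WHERE matter_type = $1 AND details->>'lastName' = $2",
    ['Immigration', 'B']
  );
  console.log(rows);
}

demo().catch(console.error);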

Related

DynamoDB joins with multiple tables

I am developing an ERP system using AWS Lambda (Node) with DynamoDB. I am new to DynamoDB, and we are using multiple tables.
One small use case is this:
I have a company table in which the company's address is stored. The address has city, state and country fields, which hold the primary keys of the master tables (City, State, Country).
I need to fetch the city, state and country as well when I fetch the company details, in a single query. Filtering and sorting should also work against the master tables.
Can anyone tell me whether there is a way to implement this? If possible, please also provide some Node.js query code for reference.
If possible, provide some official docs so I can review them.
Thanks for any help :)
DynamoDB is a fantastic technology. However, it is completely different than SQL databases. As such, you'll need to approach the entire process differently than you would with a SQL database.
Entire books have been written on the subject, so I won't try to cover all the differences here. However, I'll pass along a few pointers to get you in the right direction.
Single table design is considered a best practice in DynamoDB. Having multiple tables isn't necessarily an anti-pattern, but spend some time investigating why this is the case. A good place to start: DynamoDB does not have a SQL join operation!
The DynamoDB Book is the best book on this topic, hands down. Spending just a few hours reading this book will drastically reduce the learning curve.
Watch AWS Re:Invent talks on DynamoDB. This talk from 2019 is a great place to start. (This talk is from Alex DeBrie, the author of the DynamoDB Book).
The rules of SQL databases do not apply when working with NoSQL databases. In other words, forget everything you know about SQL databases when working with NoSQL. It's a paradigm shift, and I find it best to set aside any preconceived notions about databases and approach the topic with an open mind. Easier said than done, but it's my advice anyway :)
DynamoDB requires that you build your data model to match the use cases of your application, aka your application "access patterns". Throwing your data into DynamoDB before thinking about how you need to access the data is the wrong approach. Instead, ask yourself "what are all the ways my application needs to get data?" and design your data model accordingly.
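As a hedged sketch of what this can look like from Node.js (the table name, key attribute names and item layout below are invented for illustration, not a recommendation for your exact schema): with a single-table design, the company profile and its address data can share a partition key, so one Query call returns everything without a join.

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function getCompanyWithAddress(companyId) {
  // All items for one company live under the same partition key, e.g.
  //   { pk: 'COMPANY#1', sk: 'PROFILE',   name: 'Acme Ltd' }
  //   { pk: 'COMPANY#1', sk: 'ADDRESS#1', city: 'Pune', state: 'Maharashtra', country: 'India' }
  const result = await docClient.query({
    TableName: 'ErpTable', // hypothetical table name
    KeyConditionExpression: 'pk = :pk',
    ExpressionAttributeValues: { ':pk': 'COMPANY#' + companyId },
  }).promise();
  return result.Items;
}

Whether you denormalize the city/state/country names onto the address item (as above) or model them differently depends entirely on your access patterns, which is exactly what the book and talks above walk you through.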
NoSQL data modeling has a steep learning curve, but it's awesome when you get it right! Good luck!

Usage of the DISTINCT keyword through haskell persist

Is there some way in Haskell persistent to perform a select distinct (column_name_1, column_name_2) ...? Note that I do not mean unique; I really want to select the distinct records for these columns. I could of course perform some filter magic afterwards, but I would like the database (in my case Postgres) to solve it, and I did not really find this in the documentation.
Kasper
A search of the persistent repo shows that the DISTINCT keyword is not used in any meaningful context, meaning that persistent does not support DISTINCT queries at all.
The reason for this is because an explicit design goal of persistent is to be backend-agnostic, and many non-SQL backends do not natively support certain SQL features, such as distinct queries and joins.
I opened a GitHub issue to ask about this, and Matt Parsons, a persistent maintainer, responded recommending the Esqueleto package, which is written on top of persistent and aims to provide SQL-specific functionality.

MongoDB (noSQL) when to split collections

So I'm writing an application in NodeJS & ExpressJS. It's my first time using a NoSQL database like MongoDB, and I'm trying to figure out how to set up my data model.
At the start of our project we wrote everything down in relational database terms, but since we recently switched from Laravel to ExpressJS, I'm a bit stuck on what to do with all my different table layouts.
So far I have figured out that it's better to denormalize your schema, but it does have to end somewhere, right? In the end you could end up storing all your data in one collection. Well, not entirely, but you get the point.
1. So is there a rule or standard that defines where to cut to make multiple collections?
I have a relational database with users (who can be either clients or store users), stores, products, purchases, categories, subcategories, and so on.
2. Is it bad to define a relationship in a noSQL database?
Every product has a category, for example, but I want to relate to the category by an id (parent does the job in MongoDB). Is that a bad thing? Or is this where you choose performance vs structure?
3. Is NoSQL/MongoDB meant to be used for such large databases, which have many relationships (if they were modelled in MySQL)?
Thanks in advance
As already written, there are no rules like the second normal form for SQL.
However, there are some best practices and common pitfalls related to optimization for MongoDB which I will list here.
Overuse of embedding
The BSON limit
Contrary to popular belief, there is nothing wrong with references. Assume you have a library of books and you want to track the rentals. You could begin with a model like this:
{
  // We use the ISBN for its uniqueness
  _id: "9783453031456",
  title: "Schismatrix",
  author: "Bruce Sterling",
  rentals: [
    {
      name: "Markus Mahlberg",
      start: "2015-05-05T03:22:00Z",
      due: "2015-05-12T12:00:00Z"
    }
  ]
}
While there are several problems with this model, the most important isn't obvious: the number of rentals is limited, because BSON documents have a size limit of 16MB.
The document migration problem
The other problem with storing rentals in an array is that it would cause relatively frequent document migrations, which are rather costly operations. BSON documents are never partitioned; they are created with some additional space allocated in advance, which is used when they grow. This additional space is called padding. When the padding is exceeded, the document is moved to another location in the data files and new padding space is allocated. So frequent additions of data cause frequent document migrations.
Hence, it is best practice to prevent frequent updates increasing the size of the document and use references instead.
So for the example, we would change our single model and create a second one. First, the model for the book
{
  _id: "9783453031456",
  title: "Schismatrix",
  author: "Bruce Sterling"
}
The second model for the rental would look like this
{
  _id: new ObjectId(),
  book: "9783453031456",
  rentee: "Markus Mahlberg",
  start: ISODate("2015-05-05T03:22:00Z"),
  due: ISODate("2015-05-05T12:00:00Z"),
  returned: ISODate("2015-05-05T11:59:59.999Z")
}
The same approach of course could be used for author or rentee.
The problem with over normalization
Let's look back in time. A developer would identify the entities involved in a business case, define their properties and relations, write the corresponding entity classes, bang his head against the wall for a few hours to get the triple inner-outer-above-and-beyond JOIN required for the use case working, and all lived happily ever after. So why use NoSQL in general and MongoDB in particular? Because nobody lived happily ever after. This approach scales horribly, and practically the only way to scale it is vertically.
But the main difference of NoSQL is that you model your data according to the questions you need to get answered.
That being said, let's look at a typical n:m relation and take the relation from authors to books as our example. In SQL, you'd have 3 tables: two for your entities (books and authors) and one for the relation (Who is the author of which book?). Of course, you could take those tables and create their equivalent collections. But, since there are no JOINs in MongoDB, you'd need three queries (one for the first entity, one for its relations and one for the related entities) to find the related documents of an entity. This wouldn't make sense, since the three table approach for n:m relations was specifically invented to overcome the strict schemas SQL databases enforce.
Since MongoDB has a flexible schema, the first question would be where to store the relation, keeping the problems arising from overuse of embedding in mind. Since an author might write quite a few books in the years to come, but the authorship of a book rarely, if ever, changes, the answer is simple: we store references to the authors in the book data
{
  _id: "9783453526723",
  title: "The Difference Engine",
  authors: ["idOfBruceSterling", "idOfWilliamGibson"]
}
And now we can find the authors of that book by doing two queries:
var book = db.books.findOne({title:"The Difference Engine"})
var authors = db.authors.find({_id: {$in: book.authors}})
I hope the above helps you to decide when to actually "split" your collections and to get around the most common pitfalls.
Conclusion
As to your questions, here are my answers:
1. As written before: no, but keeping the technical limitations in mind should give you an idea of when it could make sense.
2. It is not bad, as long as it fits your use case(s). If you have a given category and its _id, it is easy to find the related products. When loading a product, you can easily get the categories it belongs to, even efficiently so, as _id is indexed by default (see the sketch after this list).
3. I have yet to find a use case which can't be done with MongoDB, though some things can get a bit more complicated. What you should do, imho, is take the sum of your functional and non-functional requirements and check whether the advantages outweigh the disadvantages. My rule of thumb: if "scalability" or "high availability/automatic failover" is on your list of requirements, MongoDB is worth more than a look.
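To make point 2 concrete, here is a minimal mongo shell sketch using hypothetical collection and field names for your products and categories:

// Products reference their category by _id, mirroring the book/author pattern above.
var category = db.categories.findOne({ name: "Electronics" });
var products = db.products.find({ category: category._id }).toArray();

// Going the other way: load a product, then its category via the indexed _id.
var product = db.products.findOne({ sku: "ABC-123" });
var productCategory = db.categories.findOne({ _id: product.category });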
The very "first" thing to consider when choosing an "NoSQL" solution for storage over an "Relational" solution is that things "do not work in the same way" and therefore respond differently by design.
More specifically, solutions such as MongoDB are "not meant" to "emulate" the "relational join" structure that is present in many SQL and therefore "relational" backends, and that they are moreover intended to look at data "joins" in a very different way.
This arrives at your "questions" as follows:
There really is no set "rule", and understand that the "rules" of normalization do not apply here, for the basic reason of why NoSQL solutions exist. And that is to offer something "different" that may work well for your situation.
Is it bad? Is it Good? Both are subjective. Considering point "1" here, there is the basic consideration that "non-relational" or "NoSQL" databases are designed to do things "differently" than a relational system is. So therefore there is usually a "penalty" to "emulating joins" in a relational manner. Specifically for MongoDB this means "additional requests". But that does not mean you "cannot" or "should not" do that. Rather it is all about how your usage pattern works for your application.
Re-capping on the basic points made above, NoSQL in general is designed to solve problems that do not suit the traditional SQL and/or "relational" design pattern, and therefore replace them with something else. The "ultimate goal" here is for you to "rethink your data access patterns" and evolve your application to use a storage model that is more suited to how you access it in your application usage.
In short, there are no strict rules, and that is also part of the point in moving away from "nth-normal-form" rules. NoSQL solutions such as MongoDB allow for "nested structure" storage that typical SQL/Relational solutions do not provide in an efficient form.
Another side of this is considering that operations such as "joins" do not "scale" well over "big data" forms, therefore there exists the different way to "join" by offering concepts such as "embedded data structures", such as MongoDB does.
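As a small, hypothetical illustration of such an embedded structure (collection and field names are invented), the related data simply lives inside the parent document and comes back in a single read:

// The category data is embedded (denormalized) in the product document.
db.products.insertOne({
  name: "Gibson Les Paul",
  price: 2499,
  category: { name: "Guitars", parent: "Instruments" }
})

// One query returns the product together with its category data; no join needed.
db.products.findOne({ name: "Gibson Les Paul" })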
You would do well to read some guides on how the many NoSQL solutions approach storing and accessing data. This is ultimately what you need to decide on to determine which is best for you and your application.
At the end of the day, it should be about realising when a SQL/Relational model does not meet your needs, and then choosing something else.

PouchDB structure

I am new to the NoSQL concept, so when I started to learn PouchDB I found this conversion chart. My confusion is: how does PouchDB handle it if, let's say, I have multiple tables? Does that mean I need to create multiple databases? Because from my understanding, in PouchDB a database can store a lot of documents, but does a document correspond to a row in SQL, or have I misunderstood?
The answer to this question seems to be surprisingly under-documented. While #llabball clearly gave a decent answer, I don't think that views are always the way to go.
As you can read here in the section When not to use map/reduce, Nolan explains that for simpler applications, the key is to abuse _ids, and leverage the power of allDocs().
In other words, if you had two separate types (say artists, and albums), then you could prefix the id of each type to obtain an easily searchable data set. For example _id: 'artist_name' & _id: 'album_title', would allow you to easily retrieve artists in name order.
Laying out the data this way will result in better performance due to not requiring extra indexes, and less code. Clearly however, if your data requirements are more complex, then views are the way to go.
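For example, here is a minimal sketch of that prefixed-_id technique (the database name and the documents are made up):

var db = new PouchDB('music');

// Prefixing the _id by type keeps each "table" in its own contiguous key range.
db.bulkDocs([
  { _id: 'artist_Bruce Sterling' },
  { _id: 'album_Schismatrix', artist: 'artist_Bruce Sterling' }
]).then(function () {
  // Fetch only artists, already sorted by name, with a single allDocs() call.
  return db.allDocs({
    include_docs: true,
    startkey: 'artist_',
    endkey: 'artist_\ufff0' // the usual "everything with this prefix" trick
  });
}).then(function (result) {
  console.log(result.rows.map(function (row) { return row.doc; }));
});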
... does it mean that i need to create multiple databases?
No.
... a document mean a row in sql or am i misunderstood?
That's right. The SQL table defines the column headers (name and type); those are the JSON property names of the doc.
So all docs (rows) with the same properties (a so-called "schema") are the equivalent of your SQL table. You can have as many different schemas in one database as you want (visit json-schema.org for some inspiration).
How do you request them separately? Create CouchDB views! You can get all/some "rows" of your tabular data (docs with the same schema) with one request, just as you know it from SQL.
To write such views easily, a type property is very common in CouchDB docs. The table name you know from SQL can become your type, e.g. doc.type: "animal".
Your view names might then be animalByName or animalByWeight, depending on your needs.
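A minimal sketch of such a view in PouchDB/CouchDB (the design doc, view and field names are illustrative; db is a PouchDB instance as above):

// A design document holding a map function that indexes docs by type and name.
db.put({
  _id: '_design/animals',
  views: {
    animalByName: {
      map: "function (doc) { if (doc.type === 'animal') { emit(doc.name); } }"
    }
  }
}).then(function () {
  // All "rows" of the animal "table", sorted by name, in one request.
  return db.query('animals/animalByName', { include_docs: true });
}).then(function (result) {
  console.log(result.rows.map(function (row) { return row.doc; }));
});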
Sometimes a multiple-database setup is a good option, like a database per user or even a database per user-feature. Take a look at this conversation on the CouchDB mailing list.

Querying with Redis?

I've been learning Node.js, so I decided to make a simple ad network, but I can't seem to decide on a database to use. I've been messing around with Redis, but I can't seem to find a way to query the database by specific criteria; instead, I can only get the value of a key, or a list or set stored at a key.
Am I missing something, or should I be using a more robust database like MongoDB?
I would recommend reading this tutorial about Redis in order to understand its concepts and data types. I also had trouble understanding why there is no querying support similar to other (No)SQL databases until I read a few articles and tried to test and compare Redis with other solutions. Maybe it isn't the right database for your use case: although it is very fast and supports advanced data structures, it lacks querying, which is crucial for you. If you are looking for a database which allows you to query your data, then you should try MongoDB or maybe Riak.
Redis is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
If you can (i.e. it is easy to implement), you should use these primitives (strings, hashes, lists, sets and sorted sets). The main advantage of Redis is that it is lightning fast, but it is a rather primitive key-value store (Redis is a little bit more advanced than that). This also means that it cannot be queried like, for example, SQL.
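To give an idea of what "querying" looks like when you build it yourself from those primitives, here is a hedged sketch using the ioredis client and a sorted set as a hand-rolled index (the key and field names are invented):

const Redis = require('ioredis');
const redis = new Redis();

async function demo() {
  // Ad details live in hashes; a sorted set acts as a manual index on bid price.
  await redis.hset('ad:42', 'title', 'Buy widgets', 'bid', 2.5);
  await redis.hset('ad:43', 'title', 'Buy gadgets', 'bid', 0.8);
  await redis.zadd('ads:by_bid', 2.5, 'ad:42', 0.8, 'ad:43');

  // "Query": all ads with a bid of at least 1.0, highest first.
  const ids = await redis.zrevrangebyscore('ads:by_bid', '+inf', 1.0);
  const ads = await Promise.all(ids.map((id) => redis.hgetall(id)));
  console.log(ads);
}

demo().catch(console.error);

Every query shape you need means maintaining another such index by hand, which is exactly the overhead being described here.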
It would probably be easier to use a more advanced store, like for example MongoDB, which is a document-oriented database. The trade-off you make in this case is performance, but I believe you should only tackle that if it actually becomes a problem, which it probably will not, because MongoDB is also pretty fast and has the advantage that it can be queried. I think it would be advisable to have proper indexes for your queries (read > write) to make it fast.
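For completeness, a tiny mongo shell sketch of what "proper indexes for your queries" means here (collection and field names are invented):

// A compound index matching the query shape: filter by category, sort by bid descending.
db.ads.createIndex({ category: 1, bid: -1 })

// This query can now be served from the index instead of a collection scan.
db.ads.find({ category: "widgets" }).sort({ bid: -1 }).limit(10)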
I think that the main answer comes from the data structure. Check this article about NoSQL data modelling; for me it was very helpful: NoSql Data Modelling.
Another good article about data modeling, which compares SQL and NoSQL, is the following: The Relational model anti pattern.
