DynamoDB joins with multiple tables - node.js

I am developing ERP system. I am using AWS lambda(Node) with Dynamo DB. I am new in dynamo. We are using multiple tables.
My one small use case is like this,
I have one company table in which company's address will be store. In that address fields there is city, state, country fields which will have primary key of master tables(City, state, country).
So I need to fetch city,state and country also when i fetch company details in single query. Also filter, sorting should be done from master table.
So can anyone help me with this that there is any way to implement this? And if possible also provide some node js query code for reference.
If possible provide some official doc so I can review that.
Thanks for any help :)

DynamoDB is a fantastic technology. However, it is completely different than SQL databases. As such, you'll need to approach the entire process differently than you would with a SQL database.
Entire books have been written on the subject, so I won't try to cover all the differences here. However, I'll pass along a few pointers to get you in the right direction.
Single table design is considered a best practice in DynamoDB. Having multiple tables isn't necessarily an anti-pattern, but spend some time investigating why this is the case. A good place to start: DynamoDB does not have a SQL join operation!
The DynamoDB Book is the best book on this topic, hands down. Spending just a few hours reading this book will drastically reduce the learning curve.
Watch AWS Re:Invent talks on DynamoDB. This talk from 2019 is a great place to start. (This talk is from Alex DeBrie, the author of the DynamoDB Book).
The rules of SQL databases do not apply when working with NoSQL databases. In other words, forget everything you know about SQL databases when working with NoSQL. It's a paradigm shift, and I find it best to set aside any preconceived notions about databases and approach the topic with an open mind. Easier said than done, but it's my advice anyway :)
DynamoDB requires that you build your data model to match the use cases of your application, aka your application "access patterns". Throwing your data into DynamoDB before thinking about how you need to access the data is the wrong approach. Instead, ask yourself "what are all the ways my application needs to get data?" and design your data model accordingly.
NoSQL data modeling has a steep learning curve, but it's awesome when you get it right! Good luck!

Related

Is Cassandra just a storage engine?

I've been evaluating Cassandra to replace MySQL in our microservices environment, due to MySQL being the only portion of the infrastructure that is not distributed. Our needs are both write and read intensive as it's a platform for exchanging raw data. A type of "bus" for lack of better description. Our selects are fairly simple and should remain that way, but I'm already struggling to get past some basic filtering due to the extreme limitations of select queries.
For example, if I need to filter data it has to be in the key. At that point I can't change data in the fields because they're part of the key. I can use a SASI index but then I hit a wall if I need to filter by more than one field. The hope was that materialized views would help with this but in another post I was told to avoid them, due to some instability and problematic behavior.
It would seem that Cassandra is good at storage but realistically, not good as a standalone database platform for non-trivial applications beyond very basic filtering (i.e. a single field.) I'm guessing I'll have to accept the use of another front-end like Elastic, Solr, etc. The other option might be to accept the idea of filtering data within application logic, which is do-able, as long as the data sets coming back remain small enough.
Apache Cassandra is far more than just a storage engine. Its design is a distributed database oriented towards providing high availability and partition tolerance which can limit query capability if you want good and reliable performance.
It has a query engine, CQL, which is quite powerful, but it is limited in a way to guide user to make effective queries. In order to use it effectively you need to model your tables around your queries.
More often than not, you need to query your data in multiple ways, so users will often denormalize their data into multiple tables. Materialized views aim to make that user experience better, but it has had its share of bugs and limitations as you indicated. At this point if you consider using them you should be aware of their limitations, although that is generally good idea for evaluating anything.
If you need advanced querying capabilities or do not have an ahead of time knowledge of what the queries will be, Cassandra may not be a good fit. You can build these capabilities using products like Spark and Solr on top of Cassandra (such as what DataStax Enterprise does), but it may be difficult to achieve using Cassandra alone.
On the other hand there are many use cases where Cassandra is a great fit, such as messaging, personalization, sensor data, and so on.

Mongoose Schema Design approach

I am new to NoSql databases. I am trying to build a project and stuck with the approach of whether to choose sql databases or NoSql Databases for the project.
The requirements of my project are a legal firm would have many clients and each client can have different matter Type such as Immigration, Conveyancing, Family and etc and each MatterType can also have different fields which are never constant and they can fairly change in future.
Due to this nature I thought Nosql databases might be a good choice as they are document based and I can add any new fields to the document structure instead of always adding new columns to a sql data table dynamically which is not a good approach ( atleast i think)
Can anyone please kindly suggest me or refer me to an article which can assist me in deciding my approach
To give my clarity into my question let me explain with an example
For a client name xyz and matterType Immigration I can have fields such as firstName,lastName,Dob at this moment but later on for the same client I might have to add Dependants and their details
For a client name def and matterType conveyancing I would have different fields but those fields should also be added dynamically depending on the matter Type
Thank you in advance
Regards
Anand
In my opinion, you shouldn't only consider this feature in other to decide between NoSql or RBMDS.
In fact, this flexibility sounds very good, but it might be dangerous, once systems tend to raise, then things can get out of hand.
I have a system where I use MongoDB, but even though, I decided creating a schema for my collections.
I would suggest you finish modeling, then after that, conclude if it's really necessary to use NoSql.
I would like to suggest you to look into postgres sql if you are expecting large datasets. It offers the advantages of no sql databases such as support for key value pair and also keeping a rigid data structure like sql databases. Following are links to a few articles which may help you decide which approach to choose:
NoSql vs Sql
postgres vs mongodb

How to arrange my Data in NoSQL (Invoices)

i'm walking my first steps with nosql databases, but so far my knowledge is very basic. I try to set up a database for a small invoice system.
In SQL i'd create 4 Tables: Products, Customers , Invoices, and a match table for Invoice and the produts.
But how to do this with nosql? Do i even build relations or just build 1 document for each invoice.
You should keep in mind that NoSQL design is not only based on data structure but also strongly on data function. So you should first ask yourself what kind of queries you need to do over your data and take it from there.
First figure out how far you want to go with denormalization and aggregation. For instance: what sets of data will often require to query or update at once? And try to keep that to a single document even if it means duplicating data from other entities (i.e. Storing customer data along with the invoice data).
So ask yourself why you want to use non relational databases, and how will you use that data. Then decide which modeling techniques to apply and how far. The highly scalable blog has a great article about NoSQL data modeling if you care to give it a read.
... or just build 1 document for each invoice.
Yes, do that for the beginning. Imagine your data in the CouchDB as read-only copy of your data in the relational database. The docs are like the result of your SQL queries.
Do i even build relations?
Of course you can, its the same as in your SQL tables. You including ids of foreign docs and name the property regarding to the relation you want to express e.g. doc.customer_id in an invoice doc can point to the doc._id of a customer doc.
Its helpful you imagine the CouchDB views as "relations" e.g. you can create a view called InvoicesByCustomer with the example above.
But summarized i would recommend to begin with the 1 document for each invoice.-approach and follow #JavoSN hint ...
So you should first ask yourself what kind of queries you need to do over your data and take it from there
... when you know that clearly its time to dig deeper into your possibilities of document designs.

Cassandra or mongodb or something else for big online sales site

Currently we are using mongodb as our primary store for big online sales site, and currently we are focusing ourselves on big scalability among multiple machines.
Site backend is written in node.js and we are using mongoose as ODM.
I can see many blog posts which are writing about awesome cassandra DB, and I am starting to think about switching to cassandra. But still I am not sure if this is a really good decision, because I didn't found any good ODM/ORM lib for cassandra and node.js (and writing raw queries can be pain. Also writing good tested ORM/ODM can be time consuming task). So I am not sure how much benefit will I have after this switch. We are using elasticsearch as search engine, and it works excellent in combination with mongodb, and I am asking my self will do also good with cassandra.
If you have any experiance with this, it will be very helpfull.
Thank you!
Cassandra is a very nicely designed database, which can fulfill a lot of scenarios. MongoDB is also a really good DB engine. So let me just compare couple of main bullet points for you.
Always on system
Cassandra is really great when you need to provide 24x7 operations in multiple data centers. If you got more then one datacenter with multiple servers in each of them then Cassandra is great for you. Cassandra can sync writes to more than one datacenter and maintain desired data consistency across complex set ups. Recovery and re-sync is also quite easy.
On the other note MongoDB is easy to operate. If you got one data center and only couple of servers it might be a perfect fit (although global write lock might be a pain over time). In simple deployments it's easy to maintain and monitor.
Scalability
To continue the above statements - Cassandra is linearly scalable. There is, literally, no limit of how big the cluster will be. Your writes will always stay fast, while reads might become more complicated over time - depending on the structure of your data.
Denormalization of data
With Cassandra your writes and reads can be extremely fast if you will create a structure that will reflect what you need to get from your data. There is no query language (well, there is, but it's not exactly SQL) that you can use to reorganize your result set using aggregates, groupings, etc. Yes, some things are doable and some not - that is very specific to Cassandra data model. You will have to implement a lot of things on your own and write the result to the DB - i.e. counters for aggregation, different groupings, etc.
In comparison MongoDB is easy to use, easier to learn and more flexible - both for development (as knowledge curve/efforts goes) and for implementation of business logic (as time/effort is considered). That is - kind of - a reason why there are ORM engines for MongoDB and only couple (very limited) for Cassandra.
To summarize - both DBs are really good... if you will embrace their limitations. If you got only 100GB of data and you need flexible, easy to implement DB engine I would stick to MongoDB, alternatively take a look RethinkDB which have a very similar model and way better (in my personal opinion) clustering/data center replication implementation.
Cassandra is a great option for you if you will need to store TBs of data soon, deploying your apps across multiple data centers while accepting the cost of additional efforts to implement the same features and maintaining similar capabilities.
Don't take it personally that I have used the word only while describing your data set. Yes, it's not big - my company stores more than 20 TB these days... so yeah, 100GB is really not that much...
To stop everyone from pointing that I should compare some other features or point out some other differences between those two - it's just a rough, high level overview on the things I consider relevant to the problem, not a full comparison or analysis of the problem. But feel free to point out what I have missed and I will be happy to include new stuff in this answer...

Querying with Redis?

I've been learning Node.js so I decided to make a simple ad network, but I can't seem to decide on a database to use. I've been messing around with Redis but I can't seem to find a way to query the database by specific criteria, instead I can only get the value of a key or a list or set inside a key.
Am I missing something, or should I be using a more robust database like MongoDB?
I would recommend to read this tutorial about Redis in order to understand its concepts and data types. I also had problems to understand why there is no querying support similar to other (no) SQL databases until I read few articles and try to test and compare Redis with other solutions. Maybe it isn't the right database for your use case, although it is very fast and supports advanced data structures, but lacks querying which is crucial for you. If you are looking for a database which allows you to query your data then you should try mongodb or maybe riak.
Redis is often referred to as a data
structure server since keys can
contain strings, hashes, lists, sets
and sorted sets.
If able(easy to implement) you should use these primitives(strings,hashes,lists,set and sorted sets). The main advantage of Redis is that is lightning fast, but that it is rather primitive key-value store(redis is a little bit more advanced). This also means that it can not be queried like for example SQL.
It would probably be easier to use a more advanced store, like for example Mongodb, which is a document-oriented database. The trade-off you make in this case is PERFORMANCE, but I believe you should only tackle that if that is becoming a problem, which it probably will not be because Mongodb is also pretty fast and has the advantage that it can be queried. I think it would be advisable to have proper indexes for your queries(read>write) to make it fast.
I think that the main answer comes from the data structure. Check this article about NoSQL Data Modelling, for me it was very helpful: NoSql Data Modelling.
A second good article ever about Data Modeling, and making a comparison between SQL and NoSQL is the following: The Relational model anti pattern.

Resources