Use Mongoose transactions across multiple databases - node.js

I am creating a Node.js API consisting of multiple Microservices.
Each Microservice is responsible for one or more features of my application. However, my data is structured into multiple databases which each have multiple collections.
Now I need one service to perform atomic operations across multiple databases. If everything happened in the same database, I'd use a normal transaction. However, I don't know how to do this with multiple databases, or whether it is even possible.
Example:
One of the Microservices takes care of creating users. A user must be
created in two databases. However, this must happen atomically:
if the user is created, it must end up in both databases or in neither.
UPDATE: MongoDB's official docs state the following:
With distributed transactions, transactions can be used across
multiple operations, collections, databases, documents, and shards.
I haven't found anything on how to perform distributed transactions with mongoose though.
I would be extremely glad if someone could give me some clarification on this topic.

You need to use the SAGA pattern of the microservice architecture.
The SAGA pattern is divided into two types:
Choreography-based saga
Orchestration-based saga
If you want to manage distributed transactions from a single service, then you can use Orchestration-based saga (2).
With this pattern, you can implement a distributed transaction that either executes the whole chain of actions or rolls back along the chain using compensating transactions.
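A minimal sketch of what an orchestration-based saga can look like in Node.js (the authDb/profileDb services and their methods are hypothetical placeholders for your two user stores):

```js
// Minimal sketch of an orchestration-based saga. Each step has an action and a
// compensating action; if a step fails, the steps that already succeeded are
// undone in reverse order.
async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.action();
      completed.push(step);
    }
  } catch (err) {
    for (const step of completed.reverse()) {
      await step.compensate(); // compensating transaction
    }
    throw err;
  }
}

// Hypothetical usage: create the user in both databases, undo on failure.
async function createUserEverywhere(user) {
  await runSaga([
    { action: () => authDb.createUser(user),       compensate: () => authDb.deleteUser(user.id) },
    { action: () => profileDb.createProfile(user), compensate: () => profileDb.deleteProfile(user.id) },
  ]);
}
```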
I also recommend studying the microservice architecture patterns on that site and reading the accompanying book.

EDIT: Mongoose supports distributed transactions, because it is a client of the MongoDB server. From Mongoose's point of view, a distributed transaction is just a transaction.
According to this video on distributed transactions in MongoDB, distributed transactions are handled at a level above mongoose, so mongoose can simply use them.
The MongoDB documentation says:
Distributed Transactions and Multi-Document Transactions Starting in
MongoDB 4.2, the two terms are synonymous. Distributed transactions
refer to multi-document transactions on sharded clusters and replica
sets. Multi-document transactions (whether on sharded clusters or
replica sets) are also known as distributed transactions starting in
MongoDB 4.2.
Here is how I would try to solve this (Divide-and-conquer):
Try a simple example of distributed transactions with plain MongoDB.
Then try a simple mongoose transaction (there may be no difference between distributed and non-distributed transactions as far as mongoose is concerned, because the transaction is handled at a higher level – see the video).
Then combine the two solutions and see if it works (a sketch follows below).
If it does not work with mongoose, I would implement distributed transactions with the MongoDB driver directly; the video implies a lot of effort went into this feature, and mongoose only lets you do things you can also do with MongoDB alone. Moving from mongoose to the plain driver may not be simple, but implementing distributed transactions yourself is much harder.
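For step 2/3, a rough sketch of what a transaction spanning two databases could look like with mongoose (assuming mongoose 6+, a replica set or sharded cluster, and example database/model names; the key point is that useDb reuses the same underlying connection, so one session can cover both databases):

```js
// Sketch: a transaction that spans two databases on the same MongoDB cluster.
// Requires a replica set or sharded cluster; run inside an async context.
const mongoose = require('mongoose');

const conn = await mongoose.createConnection('mongodb://localhost:27017/auth').asPromise();
const profilesDb = conn.useDb('profiles'); // second database, same connection pool

const AuthUser = conn.model('User', new mongoose.Schema({ email: String }));
const Profile  = profilesDb.model('Profile', new mongoose.Schema({ email: String, name: String }));

const session = await conn.startSession();
try {
  await session.withTransaction(async () => {
    // Both writes commit or abort together, even though they hit different databases.
    await AuthUser.create([{ email: 'a@b.c' }], { session });
    await Profile.create([{ email: 'a@b.c', name: 'Alice' }], { session });
  });
} finally {
  await session.endSession();
}
```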

Related

Transaction mongodb

I need to write into two different MongoDB collections using an 'all or nothing' process. FYI, I use Node.js on the backend.
As far as I know, MongoDB provides atomicity for operations on a single document, but not when we need to write into multiple collections.
So I'd like to know how to emulate a transaction in Node.js/MongoDB, in order to avoid writing into one collection if the other write failed, and to be able to 'roll back' if the second operation fails.
Thank you guys!
Starting from version 4.0, MongoDB adds support for multi-document transactions. Transactions in MongoDB will work like transactions in relational databases.
For details visit this link:
https://www.mongodb.com/blog/post/multi-document-transactions-in-mongodb?jmp=community
I wrote a library that implements the two-phase commit system mentioned above. It might help in this scenario: Fawn - Transactions for MongoDB.
Multi-document transactions were introduced in MongoDB 4.0!
https://docs.mongodb.com/manual/core/transactions
In MongoDB (prior to 4.0) there is no way to fully implement transactions at the database level. However, there are some mechanisms which provide some transaction-like functionality. You can read about them in the documentation.
Since MongoDB 4.0, transactions are supported. Very little change is needed in your current code to support them. There's a new section in the documentation fully dedicated to the subject.
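For reference, a minimal sketch with the official MongoDB Node.js driver (4.x API, MongoDB 4.0+ on a replica set; database, collection and document names are just examples):

```js
// Both writes below succeed or neither does.
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();

const session = client.startSession();
try {
  await session.withTransaction(async () => {
    const db = client.db('shop');
    await db.collection('orders').insertOne({ item: 'book', qty: 1 }, { session });
    await db.collection('inventory').updateOne(
      { item: 'book' },
      { $inc: { stock: -1 } },
      { session }
    );
  });
} finally {
  await session.endSession();
  await client.close();
}
```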

What are best practices for partitioning data in MongoDB?

I'm creating a social site using mean stack and I need some suggestions regarding mongoDB and mongoose.
I'm part of a startup and we decided to use these amazing technologies to fulfil our task.
Basically, I need some suggestions.
Currently, I have finished creating simple CRUD and implemented local Passport.js authentication. I have a single collection in my MongoDB called users.
Our social site will have a blog, marketplace and many other pages (features) that will be related to a single user.
Since I have never worked with MongoDB before, I'm curious whether everything should live in one collection or whether there should be separate collections for each feature.
To clarify, let's say I use a User model for user registration, a Blog model for blogs, and so on.
It would mean a lot to me if you could briefly explain how to structure my mongoose models: should all the data sit in a single collection, or should each feature get its own collection? And if you recommend multiple collections, how do I link them together and make sure all of a user's data stays associated with that user?
Thanks a lot in advance!
I will explain partitioning/dividing at two levels.
Of course, you're going to create different collections for different models, such as Users, Blogs, Messages, etc.
Now comes the second part: if we are talking about millions of records, how do you partition them for faster lookups?
For example, say you have 1M users in one big 'Users' collection. If you look for a user whose first name is 'Imdad' and whose age is 28, the query has to scan through all 1M items in that single collection, which takes a good amount of time.
To solve this, the collection can be divided horizontally into multiple partitions (Users1 for ages 10-20, Users2 for ages 20-30, Users3 for ages 30-40). Based on the query predicate, MongoDB then only needs to look into the relevant partition(s). This is the idea MongoDB applies with sharding, much like partitioning in SQL databases: you don't have to direct the query to the right chunk explicitly, MongoDB takes care of that itself (see the sketch after the links below).
Shard key generation
Mongoose shard key
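For completeness, a rough sketch of what this looks like with MongoDB's built-in sharding (mongo shell commands; the database name and the choice of age as a ranged shard key are only illustrative, not a recommendation):

```js
// In practice the "horizontal partition" is done by sharding one collection on a
// shard key, rather than creating Users1/Users2/... collections yourself.
sh.enableSharding("mydb")
sh.shardCollection("mydb.users", { age: 1 })  // ranged shard key; queries filtering
                                              // on age are routed only to the
                                              // relevant shards
```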
If you are using MongoDB as a backend for a REST interface, the best practice is to create one collection per resource. For example, if you intend to have a /api/users endpoint, you should have a users collection that contains everything you intend to return on that endpoint.
If you are using Node to compile server-side templates, the structure can be more flexible. In this case the above still applies (as you will probably eventually want to expose a REST service), but there is more room to manoeuvre. In fact, if a many-to-many style relationship is appropriate, it is easier to keep these collections separate and load them together on the same page (see the sketch below).
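A small sketch of that layout with mongoose, assuming hypothetical User and Blog resources linked by an ObjectId reference:

```js
// One model (and collection) per resource, linked back to the user via a reference.
const mongoose = require('mongoose');
const { Schema } = mongoose;

const userSchema = new Schema({ username: String, email: String });

const blogSchema = new Schema({
  title: String,
  body: String,
  author: { type: Schema.Types.ObjectId, ref: 'User' }, // link to the users collection
});

const User = mongoose.model('User', userSchema);
const Blog = mongoose.model('Blog', blogSchema);

// Later, inside an async function (userId is a placeholder):
const blogs = await Blog.find({ author: userId });
const blogsWithAuthor = await Blog.find({ author: userId }).populate('author');
```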
As an aside, you mention having users and a marketplace. A bigger issue than the separation of data into collections is the use of transactions. Any time you intend to perform a (marketplace) transaction, it should be performed within a SQL transaction; there is no notion of transactions in MongoDB. This is by design, as MongoDB is built to be a fast, scalable data store. It is not unreasonable to combine SQL and NoSQL data stores in a case like this.

Is MongoDB's lack of transactions a deal breaker?

I've been doing some research but have reached the point where I think MongoDB/Mongoose (on Node.js) is not the right tool for the job. Here is the scenario...
Two documents: Account (money) information and Inventory information
Check if user's account has enough money
If so, check and deduct inventory
Deduct funds from Account Information
It seems like I really need a transaction system to prevent other events from altering the data in between steps.
Am I correct, or can this still be handled in MongoDB/Mongoose? If not, is there a NoSQL database I should check out, preferably with Node.js support?
Implementing transactional safety is usually tricky and requires more than just transactions on the database, e.g. if you need to communicate with external parties in a reliable fashion or if the transaction runs over minutes, hours or even days. But that's leading too far.
Anyhow, on the db side you can do transactions in MongoDB using two-phase commits, but it's not exactly trivial.
There are a ton of NoSQL databases with transaction support, e.g. Redis, Cassandra (lightweight transactions using the Paxos protocol) and FoundationDB.
However, this seems rather random to me because the idea of NoSQL databases is to use one that fits your particular problem. If you just need 'anything' with transactions, an SQL db might do the job, right?
You can always implement your own locking mechanism within your application to lock out other sections of the app while you are making your account and inventory checks and updates. That, combined with findAndModify() (http://docs.mongodb.org/manual/reference/command/findAndModify/#dbcmd.findAndModify), may be enough for your transaction needs while also maintaining the flexibility of a NoSQL solution.
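A sketch of that conditional-update idea using mongoose's findOneAndUpdate (the Account and Inventory models and field names are assumptions, not your actual schema):

```js
// Each findOneAndUpdate applies its query predicate and update atomically on a
// single document, so the balance can never go below zero and stock never negative.
async function purchase(accountId, itemId, price) {
  const account = await Account.findOneAndUpdate(
    { _id: accountId, balance: { $gte: price } }, // only matches if there is enough money
    { $inc: { balance: -price } },
    { new: true }
  );
  if (!account) throw new Error('Insufficient funds');

  const item = await Inventory.findOneAndUpdate(
    { _id: itemId, stock: { $gte: 1 } },          // only matches if something is in stock
    { $inc: { stock: -1 } },
    { new: true }
  );
  if (!item) {
    // No cross-document rollback without real transactions: compensate manually.
    await Account.updateOne({ _id: accountId }, { $inc: { balance: price } });
    throw new Error('Out of stock');
  }
}
```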
For the distributed lock I'd look at Warlock (https://www.npmjs.org/package/node-redis-warlock). I've not used it myself, but it's Node.js based and built on top of Redis, although implementing your own lock via Redis is not that hard to begin with.

Rate limiting - using CouchDB with Redis or CouchDB on its own

I've written an application with a CouchDB backend. I have invested a lot of time into CouchDB and so I'm reluctant to move everything over to a different NoSQL database (like Redis).
The problem is that I now need to implement a rate limiting (based on IP address) feature.
There are plenty of examples of how good Redis is for this kind of task; however, because I don't want to drop CouchDB for other tasks, I would essentially be running (and supporting) two databases (one for most data, one for rate limiting), and so...
Is running CouchDB in tandem with Redis unheard of?
Is CouchDB itself suitable for handling rate limiting itself?
Is running CouchDB in tandem with Redis unheard of?
Redis is commonly used in combination with other storage solutions (MySQL, PostgreSQL, MongoDB, CouchDB, etc.). Like many other NoSQL solutions, Redis is not suited to every kind of workload or situation. The authors of Redis are pragmatic and open people, and they routinely suggest using other solutions rather than Redis when those are better suited to the situation.
Redis is therefore a good team player, and it is generally easy to integrate in an existing infrastructure.
Here is an example of usage of Redis with CouchDB.
Is CouchDB itself suitable for handling rate limiting itself?
CouchDB has a number of useful features for implementing the rate-limiting strategy described in Chris O'Hara's article. For instance, it supports bulk operations on several documents (with optional atomicity). A "bucket span" can be stored in a single document. In-place incrementing of counters can be covered by update handlers (sketched below).
IMO, the main missing feature is automatic item expiration (which CouchDB does not provide, AFAIK), so you would have to design a clever mechanism to get rid of obsolete data on top of CouchDB.
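For illustration, a sketch of such an update handler stored in a design document (the document id layout and names are just examples):

```js
// CouchDB design document with an update handler that increments a counter in place.
// Update handlers receive the existing doc (or null) and the request, and return
// [docToWrite, responseBody].
const designDoc = {
  _id: '_design/ratelimit',
  updates: {
    hit: `function (doc, req) {
      if (!doc) { doc = { _id: req.id, count: 0 }; }
      doc.count += 1;
      return [doc, JSON.stringify({ count: doc.count })];
    }`,
  },
};
// Invoked per request with e.g.:
//   PUT /mydb/_design/ratelimit/_update/hit/1.2.3.4:2016-01-01T10:15
```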
The main problem is that CouchDB is not really designed for this kind of workload: it is a log-structured, document-oriented database. Each time a counter has to be incremented, it involves JSON unpacking/packing, some JavaScript code being executed, and a new revision of the whole document being written to append-only files. You can find a good article describing how CouchDB stores its data here.
I suspect a rate-limiting strategy implemented on top of CouchDB would not scale very well (too many I/Os, too much CPU consumption, an inefficient network protocol for the purpose). For instance, CouchDB is a RESTful server; I would not feel comfortable initiating HTTP operations (REST queries to CouchDB) in order to rate-limit each incoming HTTP request of my own system.
Redis is much better suited to this kind of workload (fast, in-memory, no I/O, an efficient client protocol, no JSON parsing/formatting, increments are native atomic operations, etc.).
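For comparison, a minimal fixed-window limiter sketch using the node-redis v4 client (the key layout, limit and window size are assumptions):

```js
const { createClient } = require('redis');
const redis = createClient();
// Somewhere at startup: await redis.connect();

async function allowRequest(ip, limit = 100, windowSeconds = 60) {
  const bucket = Math.floor(Date.now() / (windowSeconds * 1000));
  const key = `ratelimit:${ip}:${bucket}`;
  const count = await redis.incr(key);        // native atomic increment
  if (count === 1) {
    await redis.expire(key, windowSeconds);   // old buckets expire on their own
  }
  return count <= limit;
}
```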
You can do rate limiting with Memcached: it has a nice counter-increment command as you mention, plus obsolete data is automatically purged from the cache in due course, so it has all the benefits of Redis for this application without the duplication of capability (and complexity) that running Redis alongside CouchDB would bring.
http://simonwillison.net/2009/jan/7/ratelimitcache/
You could add Memcached to your own setup easily enough, or you could investigate Couchbase, whose current server product integrates a CouchDB-derived database with Memcached compatibility baked in:
http://www.couchbase.com/memcached
Personally I dislike the way Couchbase forked from CouchDB, but for your application it might be a perfect fit.

What is the difference between Cassandra and CouchDB?

I'm looking at both projects and I can't really see the difference
From the Cassandra site:
Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store...Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
From the CouchDB site:
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.
That said, I can see specific differences between the projects (access methods, implementation languages, etc.), but to give an example: when you talk about Solr or Sphinx, you know both are indexers with big differences, yet in the end both are indexers.
Can I say that Cassandra and CouchDB are both non-relational databases, and that in some cases one can replace the other?
CouchDB is a document store. You put documents (JSON objects) in it and define views (indexes) over them. The objects can be arbitrarily complex with potentially deep structure. Further, they are not constrained to following some consistent schema.
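For example, a sketch of a CouchDB design document defining a view over such free-form documents (field names are assumptions):

```js
// Documents are free-form JSON; views build indexes over them via map functions.
const designDoc = {
  _id: '_design/users',
  views: {
    by_email: {
      map: `function (doc) {
        if (doc.type === 'user' && doc.email) {
          emit(doc.email, { name: doc.name });
        }
      }`,
    },
  },
};
// Queried with GET /mydb/_design/users/_view/by_email?key="alice@example.com"
```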
Cassandra is a ragged-table key-value store. It just stores rows, each of which has a set of named columns grouped into families, with values. It sounds quite close to BigTable; like BigTable (and unlike an SQL database), it doesn't require each row to have the same structure. The values may have some structure, but this kind of store doesn't know anything about that -- they're just strings/byte sequences.
Yes, they are both non-relational databases, and there is probably a fair amount of overlap in their applicability, but they do have distinctly different data organization models. Each can probably be forced into emulating the other, but each model will map best to a different set of problems.
CouchDB has a feature present in very few open source database technologies: offline replication. CouchDB is designed so that applications can be run at the edge of the network. These applications are available even when internet connectivity fails.
Offline replication can also be leveraged to build large clusters, but CouchDB is designed to be robust and simple whether it is running on a single server, in a datacenter, or even on a smartphone.
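As an illustration, replication is triggered through CouchDB's HTTP API; a sketch using Node 18+'s built-in fetch (host and database names are examples):

```js
// The same mechanism works laptop-to-server, server-to-server, or phone-to-server.
await fetch('http://localhost:5984/_replicate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    source: 'appdata',                                   // local database
    target: 'http://central.example.com:5984/appdata',   // remote copy
    continuous: true,                                    // keep syncing as changes arrive
  }),
});
```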

Resources