NodeJS framework to properly handle ACID transactions and database concurrency

I am a beginner in NodeJS world coming from several years working with relational databases in Java / Hibernate.
I would like to use Node for a project, and have spent some time researching frameworks / ORMs that handle database transactions and concurrency properly, i.e. that:
Ensure ACID transaction blocks (a set of operations is either executed completely or not at all)
Deal with concurrency, i.e. leverage strategies such as optimistic / pessimistic locking
I've looked into Sequelize and Waterline as the most promising ORMs.
Waterline looks good, but lacks both features mentioned above.
Sequelize looks much more comprehensive, with proper ACID transaction handling, but support for locking and concurrency appears to be absent.
I would like to ask NodeJS experts about specific patterns or strategies, as well as any modules implementing them, for dealing with a highly concurrent load at the database level, cleanly retrying failed transactions, and ensuring data integrity in an HA system.
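For what it's worth, Sequelize's managed transactions cover the ACID requirement, and it does expose row-level locks via transaction.LOCK. A minimal sketch of a managed transaction with pessimistic locking and a naive retry loop; the Account model, its balance column, and the retry count are hypothetical:

```javascript
// Sketch only: managed Sequelize transaction with SELECT ... FOR UPDATE
// and a naive retry loop. The Account model and MAX_RETRIES are placeholders.
async function transferWithRetry(sequelize, Account, fromId, toId, amount) {
  const MAX_RETRIES = 3;
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      // The callback form commits on resolve and rolls back on reject.
      return await sequelize.transaction(async (t) => {
        const from = await Account.findByPk(fromId, {
          transaction: t,
          lock: t.LOCK.UPDATE, // pessimistic row lock (SELECT ... FOR UPDATE)
        });
        const to = await Account.findByPk(toId, {
          transaction: t,
          lock: t.LOCK.UPDATE,
        });
        await from.decrement('balance', { by: amount, transaction: t });
        await to.increment('balance', { by: amount, transaction: t });
      });
    } catch (err) {
      if (attempt === MAX_RETRIES) throw err; // give up after the last attempt
      // Otherwise fall through and retry the whole transaction.
    }
  }
}
```

Sequelize also supports optimistic locking via the version: true model option, which adds a version column and rejects stale writes.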

Related

Use Mongoose-Transactions over multiple databases

I am creating a Node.js API consisting of multiple Microservices.
Each Microservice is responsible for one or more features of my application. However, my data is structured into multiple databases, each of which has multiple collections.
Now I need one service to perform atomic operations across multiple databases. If everything happened in the same database, I'd use a normal transaction. However, I don't know how to do this across multiple databases, or whether it is even possible.
Example:
One of the Microservices takes care of creating users. A user must be created inside two databases. However, this should happen atomically, i.e. if the user is created, it must be created in both databases.
UPDATE: MongoDB's official docs state the following:
With distributed transactions, transactions can be used across multiple operations, collections, databases, documents, and shards.
I haven't found anything on how to perform distributed transactions with mongoose though.
I would be extremely glad if someone could give me some clarification on this topic.
You need to use the SAGA pattern of the microservice architecture.
The SAGA pattern is divided into two types:
Choreography-based saga
Orchestration-based saga
If you want to manage distributed transactions from a single service, you can use the orchestration-based saga (2).
With this pattern, you implement a distributed transaction that either executes a chain of actions to completion or rolls back along the chain using compensating transactions.
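To make the orchestration variant concrete, here is a minimal sketch; the step/compensation functions in the usage comment are hypothetical placeholders:

```javascript
// Sketch only: an orchestration-based saga. Each step pairs an action with a
// compensating transaction; on failure, completed steps are undone in reverse.
async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.action();
      completed.push(step);
    }
  } catch (err) {
    for (const step of completed.reverse()) {
      await step.compensate(); // compensating transaction
    }
    throw err;
  }
}

// Hypothetical usage for the "create user in two databases" example:
// await runSaga([
//   { action: () => createUserInDbA(user), compensate: () => deleteUserFromDbA(user) },
//   { action: () => createUserInDbB(user), compensate: () => deleteUserFromDbB(user) },
// ]);
```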
I also recommend studying the patterns of microservice architecture on this site, as well as the book.
EDIT: Mongoose supports distributed transactions, because it is a client to the MongoDB server. From Mongoose's point of view, a distributed transaction is just a transaction.
According to this video on distributed transactions in MongoDB, distributed transactions are implemented outside of Mongoose itself, so Mongoose can use them.
In the MongoDB documentation, they say:
Distributed Transactions and Multi-Document Transactions: Starting in MongoDB 4.2, the two terms are synonymous. Distributed transactions refer to multi-document transactions on sharded clusters and replica sets. Multi-document transactions (whether on sharded clusters or replica sets) are also known as distributed transactions starting in MongoDB 4.2.
Here is how I would try to solve this (divide and conquer):
Try a simple example of distributed transactions with MongoDB.
Then try plain Mongoose with transactions; see the sketch after this list (it may be that there is no difference between distributed and non-distributed transactions as far as Mongoose is concerned, because distribution is handled at a higher level – see the video).
Then try to combine the two solutions and see if this works.
If it does not work with Mongoose, I would implement distributed transactions with the MongoDB driver directly, as the video implies a lot of effort went into them, and Mongoose just lets you do things that you can also do with MongoDB alone. Moving from Mongoose to MongoDB may not be so simple, but implementing distributed transactions yourself is very hard.
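A minimal sketch of the Mongoose step, assuming a MongoDB 4.2+ replica set or sharded cluster; the database names, schema, and models are hypothetical:

```javascript
// Sketch only: one Mongoose session spanning writes to two databases on the
// same cluster. Database names, schema, and models are placeholders.
// Assumes mongoose.connect(CLUSTER_URI) has already been called.
const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({ name: String, email: String });

// useDb() reuses the existing connection, so both models share one client.
const dbA = mongoose.connection.useDb('databaseA');
const dbB = mongoose.connection.useDb('databaseB');
const UserA = dbA.model('User', userSchema);
const UserB = dbB.model('User', userSchema);

async function createUserInBothDatabases(userDoc) {
  const session = await mongoose.startSession();
  try {
    // withTransaction() commits if the callback resolves, aborts if it
    // rejects, and retries on transient transaction errors.
    await session.withTransaction(async () => {
      await UserA.create([userDoc], { session });
      await UserB.create([userDoc], { session });
    });
  } finally {
    session.endSession();
  }
}
```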

Relational Database User trying to understand Non-Relational and how to implement CRUD

I'm currently involved in an app project, and I'm in charge of setting up the backend.
What I'm used to is a MySQL database + PHP for cleaning and managing the data sent to and from the front end, which I have much more experience in. However, because of certain preferences of my bosses, on this project I've found myself looking at IBM's Bluemix and Cloudant. Cloudant is a NoSQL database (like CouchDB), and my experience with NoSQL is severely lacking. All I've managed to do so far is create a few JSON documents and some basic views.
What I need to figure out is how to perform the CRUD (create, read, update, delete) actions on a NoSQL database, or at least what they would look like.
In addition to this, I need to know whether there are ways to implement security measures (security and anti-hacking functions) on a NoSQL database without an external source, or whether I'll need to learn how to route the data through some sort of PHP function first, if I want it cleaned, before sending it to the Cloudant server where my database sits.
Let me know if my attempt to explain my problem is lacking in clarity. I'll try my best to state it a different way if need be.
Generally speaking, there is no equivalent of the ANSI SQL standard for NoSQL databases. In other words, NoSQL databases are not as standardized as SQL databases; standards are only starting to appear. You can think of it as a technology still in the making.
What you have, in general, is an API with methods such as put_record or delete_record, or a REST interface that is logically equivalent. Also, in general you CRUD the whole record, not parts of the record.
Take a look at the reference: Cloudant - Reading and Writing
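For illustration, a minimal sketch of those CRUD calls against a CouchDB-compatible HTTP API such as Cloudant's; the account URL, database name, and credentials are placeholders, and it assumes Node 18+ for the global fetch:

```javascript
// Sketch only: document CRUD over Cloudant's CouchDB-style REST interface.
// BASE_URL, DB and the credentials are placeholders.
const BASE_URL = 'https://ACCOUNT.cloudant.com';
const DB = 'mydb';
const headers = {
  'Content-Type': 'application/json',
  Authorization: 'Basic ' + Buffer.from('user:password').toString('base64'),
};

// Create (or update, when the body carries the current _rev): PUT the whole document.
async function putRecord(id, doc) {
  const res = await fetch(`${BASE_URL}/${DB}/${id}`, {
    method: 'PUT', headers, body: JSON.stringify(doc),
  });
  return res.json(); // { ok, id, rev } on success
}

// Read: GET the whole document back as JSON.
async function getRecord(id) {
  const res = await fetch(`${BASE_URL}/${DB}/${id}`, { headers });
  return res.json();
}

// Delete: must supply the current revision, or the server reports a conflict.
async function deleteRecord(id, rev) {
  const res = await fetch(`${BASE_URL}/${DB}/${id}?rev=${rev}`, {
    method: 'DELETE', headers,
  });
  return res.json();
}
```

Note how this matches the "whole record" point above: updates replace the entire document rather than patching individual fields.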
Having said that, in your case I would recommend abstracting away from the specific NoSQL implementation if you care about avoiding vendor lock-in. So I would suggest wrapping the CRUD functions in PHP functions that can later be replaced if you want to change the NoSQL database flavor.
This approach has the additional advantage of providing an abstraction where you can implement your own security. Some important NoSQL databases have no concept of multi-tenancy, or have only just implemented it. Again, it is a technology in the making.
When your mindset is the relational one, you tend to think of the database as something that will help you guarantee data consistency as much as possible. But NoSQL databases are not like that. Think of them as a simple repository of documents (in a JSON or XML structure, for instance), without cross references.
Then the obvious question is perhaps: why would anyone want such a thing? One of the possible answers is because NoSQL databases may hold an aggregate of consolidated data. You can then retrieve aggregates to save time reprocessing or re-retrieving data unnecessarily.
As for security, most (if not all) NoSQL databases have some pretty good authentication mechanisms.

Wiring up Hapi / node.js & Postgresql in non-ORM enterprise scenario

I'm using Hapi to implement a backend for an AngularJS-based web application providing access to non-trivial workflows.
The database behind the backend is PostgreSQL. Because the data is also used by other components, I have limited control over the schema (I can add tables, views, columns etc., but I cannot restructure everything to fit an ORM). The workflows must be atomic, so I need to be able to do SELECT ... FOR UPDATE to avoid transactionality / locking issues. Optimistic locking would also be an option, but doesn't seem necessary so far.
In my ideal world, there would be a hapi plugin providing
- generic reading of JavaScript objects from query results: I do a SELECT * FROM myview, I get a JavaScript object for every result row, and the columns have miraculously turned into fields
- a way to save me from worrying about parameter escaping - e.g. I do 'WHERE column=%s' and a parameter and it just works
- requests are automagically wrapped in pg transactions - and there is a hook to retry if the commit went wrong (especially for optimistic locking)
I have looked at all the nodejs and hapi modules/plugins I could find and that seemed relevant to the issue, but they all seem to either
- leave me having to worry about a lot of low-level stuff
- require me to deal with any errors at every single db call
- not support ALL pg features I might require (views, stored procedures - I don't really want to be limited to a subset of SQL)
But then - I lack practical experience with this scenario so far (though I've built plenty of backends with PostgreSQL, usually in conjunction with either Java EE or Python/Django).
I like my business logic to deal with business logic - and not be interspersed with low-level stuff. These things should be separated in a clean architecture.
What is a good way to achieve that in the described scenario?
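Not a Hapi plugin, but the plain pg module already covers much of this wishlist; a minimal sketch, with a hypothetical workflows table and columns:

```javascript
// Sketch only: a generic transaction wrapper over node-postgres (pg).
// The workflows table and its columns are hypothetical.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Runs fn on one client inside a transaction; commits on success, rolls back on error.
async function withTransaction(fn) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err; // a caller-side hook could retry here
  } finally {
    client.release();
  }
}

// $1-style placeholders handle escaping; each result row comes back as a plain
// object whose fields are the column names; FOR UPDATE takes the row lock.
async function advanceWorkflow(id) {
  return withTransaction(async (client) => {
    const { rows } = await client.query(
      'SELECT * FROM workflows WHERE id = $1 FOR UPDATE',
      [id]
    );
    // ... apply business logic to rows[0], then persist the new state:
    await client.query(
      'UPDATE workflows SET state = $1 WHERE id = $2',
      ['done', id]
    );
    return rows[0];
  });
}
```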

Are relational databases a poor fit for Node.js?

Recently I've been playing around with Node.js a little bit. In my particular case I wound up using MongoDB, partly because it made sense for that project (it was very simple), and partly because Mongoose seemed to be an extremely simple way to get started with it.
I've noticed that there seems to be a degree of antipathy towards relational databases when using Node.js. They seem to be poorly supported compared to non-relational databases within the Node.js ecosystem, but I can't seem to find a concise reason for this.
So, my question is, is there a solid technical reason why relational databases are a poorer fit for working with Node.js than alternatives such as MongoDB?
EDIT: Just want to clarify a few things:
I'm specifically not looking for details relating to a specific application I'm building
Nor am I looking for non-technical reasons (for example, I'm not after answers like "Node and MongoDB are both new so developers use them together")
What I am looking for is entirely technical reasons, ONLY. For instance, if there were a technical reason why relational databases performed unusually poorly when used with Node.js, then that would be the kind of thing I'm looking for (note that from the answers so far it doesn't appear that is the case)
No, there isn't a technical reason. It's mostly just opinion and using NoSQL with Node.js is currently a popular choice.
Granted, Node's ecosystem is largely community-driven. Everything beyond Node's core API requires community involvement. And, certainly, people will be more likely to support what aligns with their personal preferences.
But, many still use and support relational databases with Node.js. Some notable projects include:
mysql
pg
sequelize
I love Node.js, but with Node it actually makes more sense to use an RDBMS, as opposed to a non-relational DB. With a NoSQL/non-relational solution you often need to do manual joins in your Node.js code, and you sometimes have to work without transactions, the commit/rollback feature RDBMSs provide. Here are some potential problems with using non-relational DBs + Node.js servers:
(a) the joins are slower and responses are slower, because Node is not C/C++
(b) the expensive joins block your event loop, because the join is happening in your Node.js code, not on some database server
(c) manually writing joins is often difficult and error-prone; your NoSQL queries could easily be incorrect, or your join code might be incorrect or suboptimal; optimized joins have been done before by the masters of RDBMSs, and joins in RDBMSs are mathematically proven to be correct in most cases
(d) some non-relational databases, like MongoDB, do not support transactions - in my team's case, that means we have to use an external distributed lock so that multiple queries can be grouped together into an atomic transaction. It would be somewhat easier if we could just use transactions and avoid application-level locks.
With a more powerful relational database system that can do optimized joins in C/C++ on the database server rather than in your Node.js code, you let your Node.js server do what it's best at.
With that being said, I think it's pretty f*ing stupid that many major NoSQL vendors don't support joins. Complete de-normalization is only a dream as far as I can see. And the lack of transactions can be a bit weird: without transactions, only a single query is atomic; you cannot make multiple queries atomic without an application-level locking mechanism :/
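To make point (c) concrete, here is what a manual join looks like, with hypothetical users and orders collections; the RDBMS equivalent is a single server-side query:

```javascript
// Sketch only: a manual application-level join over two hypothetical MongoDB
// collections. Two round trips plus an in-process merge on the event loop.
async function usersWithOrders(db) {
  const users = await db.collection('users').find().toArray();
  const orders = await db.collection('orders')
    .find({ userId: { $in: users.map((u) => u._id) } })
    .toArray();

  // The join itself: group orders by userId, then attach them to each user.
  const byUser = new Map();
  for (const o of orders) {
    const key = String(o.userId);
    if (!byUser.has(key)) byUser.set(key, []);
    byUser.get(key).push(o);
  }
  return users.map((u) => ({ ...u, orders: byUser.get(String(u._id)) || [] }));
}

// The RDBMS does the same thing in one optimized, server-side query:
//   SELECT * FROM users u LEFT JOIN orders o ON o.user_id = u.id;
```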
Take-aways:
If you want non-relational persistence, why not simply de-normalize a relational database? Nobody is forcing you to use a traditional database in a relational manner.
If you use a relational DB with Node.js I recommend this ORM:
https://github.com/typeorm/typeorm
As an aside, I prefer the term "non-relational" as opposed to "noSQL".
In my experience Node tends to be popular with databases that have a stateless API, which fits very nicely into Node's async nature. Most relational databases utilize stateful connections for transactions, and this minimizes the primary advantages of async non-blocking I/O.
Can you explain exactly what specific problems you are facing with your chosen database and node.js?
A few reasons why MongoDB could be more popular than relational databases:
MongoDB is essentially a JSON object store, so it translates very well to a JavaScript application. MongoDB functions are JavaScript functions.
I am just guessing here, but since NoSQL databases are newer and have more enthusiastic programmers experimenting with them, you probably see more involvement in those NPM modules.
Apart from this, Node.js technically is a perfect choice for any sort of database application. I have personally worked on a small Node.js/MySQL application and I didn't face any hurdles.
But back to my main point, we could talk about this all day, and that is not what this forum is for. If you have any specific issues in any code with Node.js and your database of choice, please ask those questions instead.
Edit: Strictly technical reasons, apart from the JSON compatibility on both sides: There are none.
For anyone wondering about the same question in 2021:
Node has nothing to do with the type of database you choose.
You can pick whichever database fits your requirements.
If you need to maintain a strict data structure, choose a relational DB; otherwise you can go for NoSQL.
There are NPM packages for PostgreSQL, MySQL and other databases that are non-blocking; these DB clients will not block the Node process while performing queries.

Rate limiting - using CouchDB with Redis or CouchDB on its own

I've written an application with a CouchDB backend. I have invested a lot of time into CouchDB and so I'm reluctant to move everything over to a different NoSQL database (like Redis).
The problem is that I now need to implement a rate limiting (based on IP address) feature.
There are plenty of examples of how good Redis is for this kind of task. However, because I don't want to drop CouchDB for other tasks, this would mean essentially running (and supporting) two databases (one for most data, one for rate limiting), and so...
Is running CouchDB in tandem with Redis unheard of?
Is CouchDB itself suitable for handling rate limiting itself?
Is running CouchDB in tandem with Redis unheard of?
Redis is commonly used to complement other storage solutions (MySQL, PostgreSQL, MongoDB, CouchDB, etc.). Like many other NoSQL solutions, Redis is not adapted to all kinds of workloads or situations. The authors of Redis are pragmatic and open people, and they routinely suggest using other solutions rather than Redis when those are better adapted to the situation.
Redis is therefore a good team player, and it is generally easy to integrate in an existing infrastructure.
Here is an example of usage of Redis with CouchDB.
Is CouchDB itself suitable for handling rate limiting itself?
CouchDB has a number of useful features to implement the rate limiting strategy described in Chris O'Hara's article. For instance, it supports bulk operations on several documents (with optional atomicity). A "bucket span" can be stored in a single document. In-place incrementation of counters can be covered by using update handlers.
IMO, the main missing feature is automatic item expiration (which CouchDB does not provide, AFAIK). So you would have to design a clever mechanism to get rid of obsolete data on top of CouchDB.
The main problem is that CouchDB is not really designed for this kind of workload: it is a log-structured, document-oriented database. Each time a counter has to be incremented, it involves JSON unpacking/packing operations, some JavaScript code execution, and writing a new revision of the whole document to append-only files. You can find a good article describing how CouchDB stores its data here.
I suspect a rate limiting strategy implemented on top of CouchDB would not scale very well (too many I/Os, too much CPU consumption, an inefficient network protocol). For instance, CouchDB is a RESTful server; I would not feel comfortable initiating client HTTP operations (REST queries to CouchDB) to rate limit each incoming HTTP query of my system.
Redis is much better adapted to this kind of workload (fast, in-memory, no disk I/O, an efficient client protocol, no JSON parsing/formatting, native atomic increment operations, etc.).
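For instance, a minimal fixed-window sketch with the node redis client; the window size and request limit are arbitrary placeholders:

```javascript
// Sketch only: fixed-window rate limiting with Redis INCR + EXPIRE.
const { createClient } = require('redis');
const client = createClient();
// (call `await client.connect()` once at startup)

const WINDOW_SECONDS = 60; // placeholder window
const MAX_REQUESTS = 100;  // placeholder limit

// Returns true if the request from `ip` is allowed in the current window.
async function allowRequest(ip) {
  const bucket = Math.floor(Date.now() / 1000 / WINDOW_SECONDS);
  const key = `ratelimit:${ip}:${bucket}`;
  const count = await client.incr(key); // INCR is natively atomic in Redis
  if (count === 1) {
    await client.expire(key, WINDOW_SECONDS); // old buckets expire on their own
  }
  return count <= MAX_REQUESTS;
}
```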
You can do rate limiting with Memcached: it has a nice counter-increment command, as mentioned above, plus obsolete data is automatically purged from the cache in due course, so it has all the benefits of Redis for this application without the annoying duplication of capability (and complexity) that running Redis on top of CouchDB would bring.
http://simonwillison.net/2009/jan/7/ratelimitcache/
You could add Memcached to your own setup easily enough, or you could investigate Couchbase, whose current server product integrates a CouchDB-derived database with Memcached compatibility baked in:
http://www.couchbase.com/memcached
Personally I dislike the way Couchbase forked from CouchDB, but for your application it might be a perfect fit.
