Is database access a blocking operation with Node.js?

I went through this article and the following passage raised a question for me:
QUEUED INPUTS: If you're receiving a high amount of concurrent data,
your database can become a bottleneck. As depicted above, Node.js can
easily handle the concurrent connections themselves. But because
database access is a blocking operation (in this case), we run into
trouble.
Isn't DB access an asynchronous operation in Node.js? For example, I usually perform all possible data transformations using MongoDB aggregation to minimize the impact on Node.js. Or am I getting things wrong?

That is why callbacks came into the picture. Callbacks exist precisely because we don't know how long the database will take to process the aggregation: the driver issues the query as non-blocking I/O and invokes your callback when the result arrives, so the event loop stays free in the meantime. DB access in Node.js is asynchronous, and callbacks are how your code gets notified.
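To illustrate, a minimal sketch using the callback-style API of the official mongodb driver (v4 or earlier; newer versions are promise-only). Collection and field names are invented. The query does not block the event loop; the callback fires when MongoDB responds.

    const { MongoClient } = require('mongodb');

    MongoClient.connect('mongodb://localhost:27017', (err, client) => {
      if (err) throw err;
      const users = client.db('test').collection('users');

      // Non-blocking: the driver sends the query and returns immediately.
      users.aggregate([{ $group: { _id: '$country', n: { $sum: 1 } } }])
        .toArray((err, results) => {
          if (err) throw err;
          console.log(results); // runs later, when the result arrives
          client.close();
        });

      console.log('query issued'); // logs before the results arrive
    });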

Related

Is it possible to force Node.js/Express to process requests sequentially?

I took over a project where the developers were not fully aware of how Node.js works, so they created code accessing MongoDB with Mongoose that would leave inconsistent data in the database whenever concurrent requests reached the same endpoint or modified the same data. The project uses the Express web framework.
I already instructed them to implement a fix for this (basically, to use Mongoose transaction support with automatically managed retriable transactions), but due to the size of the project they will take a lot of time to fix it.
I need to put this in production ASAP, so I thought I could try to do it if I'm able to guarantee sequential processing of the incoming requests. I'm completely aware that this is a bad thing to do, but it would be just a temporary solution (with a low count of concurrent users) until a proper fix is in place.
So is there any way to make Node.js process incoming requests sequentially? Basically, I don't want code from different requests to run interleaved; putting it another way, I don't want non-blocking operations (.then()/await) to yield to another task, but instead to block until the asynchronous operation ends, so every request is processed entirely before attending to another request.
I have an NPM package that can do this: https://www.npmjs.com/package/async-await-queue
Create a queue limited to 1 concurrent task and enclose the code that calls Mongo between wait() and end().
Or you can also use an async mutex; there are a few NPM packages for that as well. A sketch of the mutex approach follows below.
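For illustration, a minimal hand-rolled promise mutex in an Express app; Account is a hypothetical Mongoose model, and the async-await-queue package's wait()/end() follows the same acquire/release pattern. This is a sketch under those assumptions, not a drop-in fix.

    const express = require('express');
    const app = express();
    app.use(express.json());

    // Each acquire() chains onto the previous holder's release, so the
    // guarded sections run strictly one at a time.
    class AsyncMutex {
      constructor() { this._last = Promise.resolve(); }
      acquire() {
        let release;
        const next = new Promise((resolve) => { release = resolve; });
        const acquired = this._last.then(() => release);
        this._last = next;
        return acquired; // await this to get a release() function
      }
    }

    const mutex = new AsyncMutex();

    app.post('/transfer', async (req, res) => {
      const release = await mutex.acquire();
      try {
        // Hypothetical Mongoose calls; nothing from another request
        // interleaves between them while the mutex is held.
        const doc = await Account.findById(req.body.id);
        doc.balance -= req.body.amount;
        await doc.save();
        res.json(doc);
      } finally {
        release(); // always release, even if the handler throws
      }
    });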

MongoDB + NodeJS: MapReduce or manual calculation

I am creating a REST API in Node.js that connects to MongoDB, runs a MapReduce, and stores the results in a different collection.
The code is pretty simple. It takes a user ID, gets all other users who are related to this user through some algorithm, and then calculates a likeness percentage for each one. Assuming there are 50k users in the test database, this MapReduce takes around 200-800 ms, which is ideal for me. If this were to get popular and receive hundreds of concurrent requests like this, I'm pretty sure that would no longer be the case. I understand that MongoDB might need to be sharded as needed.
The other scenario is to just do a normal find(), loop over the cursor, and do the same logic. It takes the same amount of time as MapReduce, mind you. However, I thought about this as a way to put the heavy lifting of the calculations on the client side (Node.js) and not on the server side like MapReduce. Does this idea even have merit? I thought that this way, I can scale the APIs horizontally behind a load balancer or something.
It would be better to keep heavy lifting off of the server that processes each request and put it onto the database.
If you have 1000 requests and 200 of them require the calculation, the other 800 can be processed as normal by the server, so long as Mongo does the calculation with mapReduce or aggregation.
If you instead run the calculations manually on your Node server, all requests will be affected by the server having to do the heavy lifting.
Mongo is quite efficient at aggregation for sure, and I would imagine at mapReduce as well.
I recently moved a ton of logic from my server onto mongoDB where I could and it made a world of difference.
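As a rough illustration of pushing the calculation into MongoDB, a sketch using the aggregation pipeline; the collection and field names are invented, and $merge (which writes the output to another collection) assumes MongoDB 4.2+.

    const { MongoClient } = require('mongodb');

    async function computeLikeness(userId) {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const users = client.db('test').collection('users');

      // The whole computation runs inside mongod; Node never loads the
      // raw documents over the wire.
      await users.aggregate([
        { $match: { relatedTo: userId } },          // candidate users only
        { $project: {
            _id: 1,
            // toy score: shared-interest count scaled to a percentage
            likeness: { $multiply: [{ $size: '$sharedInterests' }, 2] }
        } },
        { $merge: { into: 'likeness_results' } }    // store in another collection
      ]).toArray(); // $merge yields no documents, but the cursor must be iterated to execute

      await client.close();
    }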

node js performance for the individual user

A lot has been said about the high performance of node js, when compared to the multi-threaded model. But from the perspective of a single user, an http request is made, a db operation is issued, and the user will have to wait till the response from the database arrives. Sure, the event loop can in the meantime listen to other requests, but how does the individual user feel the speed benefit when it is the IO operation that takes most of the time?
From the user's perspective, the performance of a single request will be about the same. But imagine a site that receives millions of requests: a thread-per-request server would get stuck, because the resources of the server are limited. Node makes better use of the same server, because it multiplexes all connections on one event loop with an event queue instead of dedicating a thread (and its memory) to each request, so memory usage stays low and the CPU keeps working on other requests while I/O is pending.
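A minimal self-contained demo of that overlap, with setTimeout standing in for a database call:

    // While one "request" awaits simulated I/O, the event loop is free
    // to start handling another.
    const fakeDbQuery = (ms) =>
      new Promise((resolve) => setTimeout(() => resolve('rows'), ms));

    async function handleRequest(id) {
      console.log(`request ${id} started`);
      const rows = await fakeDbQuery(1000); // non-blocking wait
      console.log(`request ${id} finished with ${rows}`);
    }

    // Both finish after ~1 s total, not ~2 s: the waits overlap.
    handleRequest(1);
    handleRequest(2);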

When is blocking code acceptable in node.js?

I know that blocking code is discouraged in node.js because it is single-threaded. My question is asking whether or not blocking code is acceptable in certain circumstances.
For example, if I was running an Express webserver that requires a MongoDB connection, would it be acceptable to block the event loop until the database connection was established? This is assuming that all pages served by Express require a database query (which would fail if MongoDB was not initialized).
Another example would be an application that requires the contents of a configuration file before it can initialize. Is there any benefit in using fs.readFile over fs.readFileSync in this case?
Is there a way to work around this? Is wrapping all the code in a callback or promise the best way to go? How would that be different from using blocking code in the above examples?
It is really up to you to decide what is acceptable. And you would do that by determining what the consequences of blocking would be ... on a case-by-case basis. That analysis would take into account:
how often it occurs,
how long the event loop is likely to be blocked, and
the impact that blocking in that context will have on usability¹.
Obviously, there are ways to avoid blocking, but these tend to add complexity to your application. Really, you need to decide ... on a case-by-case basis ... whether that added complexity is warranted.
Bottom line: *you* need to decide what is acceptable based on your understanding of your application and your users.
¹ For example, in a game it would be more acceptable to block the UI while switching "levels" than during active play. Or for a general web service, "once off" blocking while a config file is loaded or a DB connection is established during webserver startup is more acceptable than if it happened on every request.
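That "once off" startup case is, for instance, the familiar pattern of reading a config file synchronously before the server starts listening (file name assumed):

    const fs = require('fs');

    // Blocks the event loop, but only once, at startup, before any
    // request can arrive, so no user ever waits on it.
    const config = JSON.parse(fs.readFileSync('./config.json', 'utf8'));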
In my experience, most tasks should be handled in a callback or by returning a promise. You do not want to block code in a Node application; that's what makes it so nice! With MongoDB, the app will usually crash on startup if there is no connection, so this won't really affect an API call: your server will be dead anyway!
Source: I'm a developer at a bootcamp that teaches the MEAN stack.
Your two examples are completely different. The distinction actually answers the question in and of itself.
Grabbing data from a database is dependent on being connected to that database. Any code that is dependent upon that data is then dependent upon that connection. These things have to happen serially for the app to function and be meaningful.
On the other hand, readFileSync will block ALL code, not just code that is reliant on it. You could start reading a CSV file while simultaneously establishing a database connection, and once both are done, add that CSV data to the database, as sketched below.
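A sketch of that overlap, assuming Mongoose and an invented file name and connection string:

    const { readFile } = require('fs/promises');
    const mongoose = require('mongoose');

    async function start() {
      // Both operations begin immediately and run concurrently; neither
      // blocks the event loop.
      const [csv] = await Promise.all([
        readFile('./data.csv', 'utf8'),
        mongoose.connect('mongodb://localhost/app'),
      ]);
      // Once both are done, load the CSV data into the database...
    }

    start();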

Minimize nodejs response time with large number of database calls

One of my APIs in Node.js does a lot of CRUD operations and finally returns; sometimes it takes 10-15 seconds to respond. How can I offload this work somehow so the response time is quick? And how do I handle an error that occurs in the offloaded task, given that the response would already have been sent?
Currently we are using the async module for making parallel calls to Cassandra (roughly the pattern sketched below). Some people suggest using the compute-cluster module in Node.js, but as I understand it that is helpful for computation-intensive tasks. I think my problem is heavy on I/O, not computation.
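For reference, roughly the parallel-call pattern described above; client is a hypothetical cassandra-driver client, and the queries, userId, next, and res are invented placeholders:

    const async = require('async');

    // Run independent Cassandra queries concurrently and collect the
    // results once all of them have completed (or any one fails).
    async.parallel(
      [
        (cb) => client.execute('SELECT * FROM users WHERE id = ?', [userId], cb),
        (cb) => client.execute('SELECT * FROM orders WHERE user_id = ?', [userId], cb),
      ],
      (err, results) => {
        if (err) return next(err);   // hypothetical error handler
        const [users, orders] = results;
        res.json({ users, orders }); // respond once everything is in
      }
    );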
