How do I avoid blocking an Express REST service? - node.js

When making a REST service using Express in Node, how do I prevent a blocking task from blocking the entire REST service? Take the following Express REST service as an example:
const express = require('express');
const app = express();

app.get('/', (req, res) => res.send('Hello, World'));

const blockService = async function () {
  return new Promise((resolve, reject) => {
    const end = Date.now() + 20000;
    while (Date.now() < end) {
      const doSomethingHeavyInJavaScript = 1 + 2 + 3;
    }
    resolve('I am done');
  });
};

const blockController = function (req, res) {
  blockService().then((val) => {
    res.send(val);
  });
};

app.get('/block', blockController);

app.listen(3000, () => console.log('app listening on port 3000'));
In this case, a call to /block will render the entire service unreachable for 20 seconds. This is a big problem if there are many clients using the service, since no other client will be able to access it during that time. This is obviously a problem of the while loop being blocking code and thus hanging the main thread. The code might be confusing, since, despite using a promise in blockService, the main thread still hangs. How do I ensure that blockService runs on a worker thread and not on the event loop?

By default, node.js runs your JavaScript code in a single thread. So, if you really have CPU-intensive code in a request handler (like you show above), then that is indeed a problem. Your options are as follows:
Start up a Worker Thread and run the CPU-intensive code in a worker thread. Since version 10, node.js has had worker threads for this purpose. You then communicate back the result to the main thread with messaging.
Start up any other process that runs node.js code or any type of code and compute the result in that other process. You then communicate back the result to the main thread with messaging.
Use node clustering to start N processes so that if one process is stuck with a CPU-intensive operation, at least one of the others is hopefully free to run other requests.
Please note that a lot of the things servers do, like reading files, networking, and making requests to databases, are all asynchronous and non-blocking, so it's not incredibly common to actually have lots of CPU-intensive code. So, if this is just a made-up example for your own curiosity, you should make sure you actually have a CPU-intensive problem in your server before you go designing threads or clusters.

Node.js uses an event-based model with a single runtime thread. For the reasons you've discovered, Node.js is not a good choice for CPU-bound tasks (or synchronously blocking tasks). Node.js works best for coordinating I/O asynchronously.
Worker threads (the worker_threads module) became stable in Node.js v12. They allow you to use another thread for blocking tasks. They are relatively simple to use and could work if you absolutely need to offload blocking tasks.

Related

Why is making many requests in NodeJS slow?

I set up a local express server with:
const express = require('express');
const app = express();

app.get('/test', (request, response) => {
  response.sendStatus(200);
});

const port = 3000;
app.listen(port, () => {});
Then I ran a script with:
const axios = require('axios');

async function main() {
  console.time('time');
  const requests = Array(5000).fill().map(() => axios.get('http://localhost:3000/test'));
  await Promise.all(requests);
  console.timeEnd('time');
}

main();
And my question is: why does this script take 3 seconds on my machine?
I'd expect it to take a few milliseconds just like with any other for loop of 5000 iterations.
Because I'm running the server locally and calling it via localhost, I expect no latency, therefore, the waiting time for the promises should be almost 0.
Can anyone explain to me what's going on?
Also, how can I do many requests at the same time faster?
EDIT
Looking here https://stressgrid.com/blog/webserver_benchmark/ I'd expect my single process node server to be able to handle at least 20k requests concurrently without any delay.
So I'm guessing there is some configuration missing on my machine. Maybe some flag when starting the node server?
3 things:
That benchmark is not properly setup.
Express is the slowest of all NodeJS web frameworks.
Your machine might be misconfigured.
You can find better benchmarks and a comparison of different frameworks here: https://www.fastify.io/benchmarks/
Their github repo explains all the setup they've done, so you can compare your machine against theirs too.
1. Benchmarking
To put it plainly, the benchmark you set up is not valid. It doesn't reproduce any real world scenario, and is not optimized for the synthetic scenario it creates.
Just to exemplify: since on Node everything is single-threaded, you'd get better performance running the requests serially so that connections can be reused (you would also need to switch to a request library that can reuse connections). HTTP/1.1 doesn't reuse connections if you issue requests in parallel, and your client isn't set up to reuse connections anyway.
Let's take a look at what the results look like after fixing that. On my machine, the benchmark you posted doesn't even run: node crashes if you try to open that many connections simultaneously on the same port. This version has about the same theoretical performance as your benchmark, and it runs:
const axios = require("axios");

async function main() {
  console.info(process.hrtime.bigint() / 1000000n + "ms");
  for (let i = 0; i < 5000; ++i) {
    await axios.get("http://localhost:3000/test");
  }
  console.info(process.hrtime.bigint() / 1000000n + "ms");
}

main();
That takes around 3 seconds on my machine (about the same time as yours). Now let's reuse connections:
const axios = require("axios");
const http = require("http");

async function main() {
  const httpAgent = new http.Agent({ keepAlive: true });
  console.info(process.hrtime.bigint() / 1000000n + "ms");
  for (let i = 0; i < 5000; ++i) {
    await axios.get("http://localhost:3000/test", { httpAgent });
  }
  console.info(process.hrtime.bigint() / 1000000n + "ms");
}

main();
This takes 800ms.
There's a lot of other details like this that your benchmark misses. I can't summarize all of them. You can compare your benchmark to Fastify's (linked above) to see how each difference impacts your measurement.
2. Frameworks
Express owes its popularity to being simple, but it is not a fast framework. Take a look at more modern ones such as Koa or Fastify. Note that your app will likely do much more than just serve an empty page, so the performance of your web framework is likely not important. That said, I don't think anyone should be using Express in 2021 if they have a choice, since its developer experience is also outdated (e.g. there's no support for awaiting a request within a middleware).
3. Local Machine
It could also just be that your computer is slow, etc. That's another reason to start by rerunning a standardized benchmark instead of creating your own.
Define 'slow' to begin with. Array(5000).fill() can be read as "reserve 5000 slots in memory": in other words, you do a loop of 5000 iterations to build the array, and then you do 5000 requests, so that means 10,000 iterations of looping. Do the same 10,000 iterations of looping in Java and compare, then tell me whether JavaScript is slow.
Also, keep in mind that axios performs quite a few internal validations.

Concurrency in node js express app for get request with setTimeout

const express = require('express');
const app = express();
const port = 4444;

app.get('/', async (req, res) => {
  console.log('got request');
  await new Promise(resolve => setTimeout(resolve, 10000));
  console.log('done');
  res.send('Hello World!');
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
If I hit GET http://localhost:4444 three times concurrently, it returns the logs below:
got request
done
got request
done
got request
done
Shouldn't it return the output in the way below, because of Node's event loop and callback queues, which are external to the process thread? (Maybe I am wrong, but I need some understanding of Node's internals and of external APIs in Node; please find the attached image of the JavaScript runtime environment.)
got request
got request
got request
done
done
done
Thanks to https://stackoverflow.com/users/5330340/phani-kumar
I got the reason why it is blocking. I was testing this in Chrome: I was making GET requests from the Chrome browser, and when I tried the same in Firefox it worked as expected.
Reason is because of this
Chrome locks the cache and waits to see the result of one request before requesting the same resource again.
Chrome stalls when making multiple requests to same resource?
Node.js is an event-driven language. To understand the concurrency, you should look at how Node executes this code. Node is single-threaded (though internally it uses multiple threads) and accepts requests as they come. In this case, Node accepts the request and assigns a callback for the promise; in the meantime, while it is waiting for the event loop to execute that callback, it will accept as many requests as it can handle (limited by memory, CPU, etc.). Since there is a timer queue in the event loop, all these setTimeout callbacks will be registered there, and once each timer completes, the event loop will run its callback.
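The interleaving the asker expected can be reproduced outside the browser (so Chrome's request coalescing cannot interfere) with a small sketch; the 100 ms delay and the ids are arbitrary:

```javascript
// Three "requests" start at once; each awaits a 100 ms timer. Because the
// timers wait in the event loop rather than on the thread, all the
// "got request" lines print before any "done" line, and the total time is
// ~100 ms rather than 300 ms.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handle(id, log) {
  log.push(`got request ${id}`);
  await sleep(100);
  log.push(`done ${id}`);
}

async function main() {
  const log = [];
  await Promise.all([handle(1, log), handle(2, log), handle(3, log)]);
  console.log(log.join('\n'));
}

main();
```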
Single-Threaded Event Loop Model processing steps:
Clients send requests to the Node.js server.
Node.js internally maintains a limited (configurable) thread pool to provide services to client requests.
Node.js receives those requests and places them into a queue known as the "Event Queue".
Node.js internally has a component known as the "Event Loop". It got this name because it uses an indefinite loop to receive requests and process them.
The event loop uses a single thread only. It is the heart of the Node.js processing model.
The event loop checks whether any client request is placed in the event queue. If not, it waits for incoming requests indefinitely.
If yes, it picks up one client request from the event queue and starts processing it.
If that client request does not require any blocking I/O operations, it processes everything, prepares the response, and sends it back to the client.
If that client request requires blocking I/O operations (like interacting with a database, the file system, or external services), it follows a different approach:
It checks thread availability in the internal thread pool, picks up one thread, and assigns the client request to it.
That thread is responsible for taking the request, processing it, performing the blocking I/O operations, preparing the response, and sending it back to the event loop.
You can check here for more details (very well explained).

In Node js, what happens if a new request arrives and event loop is already busy processing a request?

I have this file named index.js:
const express = require('express')
const app = express()
const port = 3000

app.get('/home', (req, res) => {
  res.send('Hello World!')
})

app.get('/route1', (req, res) => {
  var num = 0;
  for (var i = 0; i < 1000000; i++) {
    num = num + 1;
    console.log(num);
  }
  res.send('This is Route1 ' + num)
})

app.listen(port, () => console.log(`Example app listening on port ${port}!`))
I first call the endpoint /route1 and then immediately the endpoint /home. /route1 has a for loop and takes some time to finish, and then /home runs and finishes. My question is: while the app was busy processing /route1, how was the request to /home handled, given that node js is single threaded?
The incoming request will be queued in the nodejs event queue until nodejs gets a chance to process the next event (when your long running event handler is done).
Since nodejs is an event-driven system, it gets an event from the event queue, runs that event's callback until completion, then gets the next event, runs it to completion and so on. The internals of nodejs add things that are waiting to be run to the event queue so they are queued up ready for the next cycle of the event loop.
Depending upon the internals of how nodejs does networking, the incoming request might be queued in the OS for a bit and then later moved to the event queue until nodejs gets a chance to serve that event.
My question is while app was busy processing /route1, how was the request to /home handled, given node js is single threaded?
Keep in mind that node.js runs your Javascript as single threaded (though we do now have Worker Threads if you want), but it does use threads internally to manage things like file I/O and some other types of asynchronous operations. It does not need threads for networking, though. That is managed with actual asynchronous interfaces from the OS.
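The queueing described above can be observed without HTTP at all. In this sketch, a 0 ms timer registered before a long synchronous task (standing in for /route1's loop; the 200 ms duration is arbitrary) still has to wait for the task to finish:

```javascript
// The timer is due immediately, but its callback sits in the event queue
// until the synchronous busy-loop on the main thread completes.
const start = Date.now();

setTimeout(() => {
  console.log(`timer ran after ${Date.now() - start} ms`); // ~200, not 0
}, 0);

const end = Date.now() + 200;
while (Date.now() < end) {} // stands in for /route1's long loop
```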
Node.js has an event loop, which allows it to perform non-blocking I/O operations. Each event loop iteration is called a tick, and the loop cycles through several phases.
First is the timers phase; since there are no timers in your script, the event loop moves on.
When you hit route /route1, the request is placed in a FIFO queue, and the event loop moves on to the poll phase.
The poll phase waits for pending I/O, which here is route /route1. The event loop checks whether any client request is placed in the event queue; if not, it waits for incoming requests indefinitely.
Meanwhile, the next I/O event arrives in the FIFO queue, which is route /home.
FIFO means first in, first out. Therefore, /route1 will execute first, and then the route /home.
A Node.js application runs on a single thread, and the event loop also runs on that same thread.
Node.js internally uses the libuv library, which is responsible for handling operating-system-level tasks like asynchronous I/O, networking, and concurrency.
More info
Node has an internal thread pool from which a thread is assigned when a blocking (I/O, memory, or network) request is made. Otherwise, the request is processed and the response sent back directly. If the thread pool is full, the request waits in the queue. Refer to How, in general, does Node.js handle 10,000 concurrent requests? for clearer answers.

Why don't clusters work when requesting the same route at the same time in Express Node JS?

I wrote a simple Express application example handling 2 GET routes. The first route contains a while loop, which represents a blocking operation lasting 5 seconds.
The second route simply returns a "Hello world" text.
Also I set up a cluster following the simple guide on Node JS documentation.
Result of what I've tried:
Make 2 requests to 2 different routes at the same time => They work independently as expected. Route / took 5 seconds and route /hello took several ms.
Make 2 requests to the same route / at the same time => They work synchronously, one responds after 5 seconds and the other after 10 seconds.
const cluster = require("cluster");
const express = require("express");
const app = express();

if (cluster.isMaster) {
  cluster.fork();
  cluster.fork();
} else {
  function doWork(duration) {
    const start = Date.now();
    while (Date.now() - start < duration) {}
  }

  app.get("/", (req, res) => {
    doWork(5000);
    res.send("Done");
  });

  app.get("/hello", (req, res) => {
    res.send("Hello world");
  });

  app.listen(3000);
}
I expect it would handle 2 requests of the same route in parallel. Can anyone explain what is going on?
I expect it would handle 2 requests of the same route in parallel. Can
anyone explain what is going on?
This is not the case, as you have created two instances of the server (two event loops, via cluster.fork()), so the two requests get executed in different event loops (server instances): /hello gives you a prompt response, while / still waits 5 seconds before responding.
Now, if you had not created a cluster, the / request would have blocked the event loop, and /hello would not have executed until / finished (sent its response to the browser).
/ will take 5 seconds to execute because you are blocking the event loop it runs in, so whether you create a single event loop or two (using fork()), it will respond after 5 seconds.
I tried your scenario in two different browsers and both requests took 5.05 seconds (both executed by different worker processes at the same time):
const cluster = require("cluster");
const express = require("express");
const app = express();

if (cluster.isMaster) {
  cluster.fork();
  cluster.fork();
} else {
  function doWork(duration) {
    const start = Date.now();
    while (Date.now() - start < duration) {}
  }

  app.get("/", (req, res) => {
    console.log("Cluster ID", cluster.worker.id); // publish the worker id
    doWork(5000);
    res.send("Done");
  });

  app.listen(3000);
}
But with the same browser, the requests always went to one worker, which executed the second request only after it had finished the first. So I guess it is all about how the requests are distributed among the workers created by cluster.fork().
As quoted from node docs
The cluster module supports two methods of distributing incoming
connections.
The first one (and the default one on all platforms except Windows),
is the round-robin approach, where the master process listens on a
port, accepts new connections and distributes them across the workers
in a round-robin fashion, with some built-in smarts to avoid
overloading a worker process.
The second approach is where the master process creates the listen
socket and sends it to interested workers. The workers then accept
incoming connections directly.
Node.js does not provide routing logic. It is, therefore important to
design an application such that it does not rely too heavily on
in-memory data objects for things like sessions and login.
I ran your code: the first response came after 5 seconds and the other after 8 seconds, so the clusters are working. Find out the number of cores of your machine using the code below. If it is one, then there is only one main thread.
const cpuCount = require('os').cpus().length;
This happens due to the cleverness of modern browsers. If you make the same request in two different tabs at the same time, the browser notices and waits for the first request to finish so it can use its cached data to answer the second, no matter whether you use clusters or how many times you call fork().
To get rid of this, simply disable the cache in the network tab, as shown below:
Disable Cache

NodeJS server run individual process for each request against queue

See this example node.js code:
const http = require('http');

const server = http.createServer(function (req, res) {
  if (req.url === '/loop') {
    console.log('LOOP');
    while (true) {}
  }
  res.write('Hello World');
  res.end();
});

server.listen(3000);
server.listen(3000);
In my script, each request takes 3 to 5 seconds to process. while (true) {} is just an example.
But here Node.js does not process another request while one request is in progress.
I want to run multiple requests at the same time, but the server is running only one request at a time.
NOTE: I don't want to spawn a cluster worker or child_process for each request, because Node.js takes another 65 ms to start a cluster worker or child_process.
When you create a server (and it listens), Node.js creates an event loop in which it processes requests. You will not be able to use an infinite loop in it, since that would block the event loop your server runs on.
I hope you are not dealing with an actual infinite loop but with a process that takes time; for that, you can make use of modules like async.
In the request/response handler, use the async module like this:
async.map(['param1', 'param2', 'param3'], task, function (err, results) {
  // results of task function
});
What it does is make use of the already-running event loop to run the task.
Points to note:
Most JavaScript VMs are single-threaded (including Node.js), hence you can also make use of the setTimeout function instead of an infinite while loop.
You will not be able to create a thread in Node.js; instead, use a process-based solution like cluster or child_process (single-threaded VM).
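The setTimeout suggestion in the note above can be made concrete by splitting the long loop into chunks and yielding to the event loop between them; heavyTask and the chunk size here are hypothetical names and values, for illustration only:

```javascript
// Break a long-running loop into chunks, yielding between chunks with
// setImmediate so queued requests and timers get a turn on the event loop.
function heavyTask(totalIterations, done) {
  let i = 0;
  function chunk() {
    const stop = Math.min(i + 100000, totalIterations);
    for (; i < stop; i++) {
      // heavy per-iteration work goes here
    }
    if (i < totalIterations) {
      setImmediate(chunk); // let the event loop breathe, then continue
    } else {
      done();
    }
  }
  chunk();
}
```

In the question's server, the /loop branch would then call heavyTask(someCount, () => { res.write('Hello World'); res.end(); }) instead of spinning in a while loop.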
