Node.js asynchronous call handling and multi-core scaling

It is well known that node.js handles asynchronous calls internally, so the programmer never needs to care about what is going on backstage. As far as I know, even though everyone says that node.js is single-threaded, internally the v8/libuv libraries spawn threads to execute the async parts of the program.
My question is: if those threads are spawned, do they scale across multicore architectures? I mean, if I have a CPU with 4 cores and my main node thread is running on one of them, will those internally spawned threads spread to the other three cores rather than stay on the same one? Theoretically they should, but since everyone says node.js does not use multiple cores out of the box, I thought this was worth asking.

Node.js runs your JavaScript on one thread per process. To make it scale out to multiple cores, you need to run multiple Node.js server processes, one per core, and split request traffic between them.
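A minimal sketch of the one-process-per-core idea, using the built-in cluster module. The function names and the response text are made up for illustration, and a real deployment would also want to restart workers that die.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// One worker per logical core is the usual rule of thumb.
function workerCount() {
  return Math.max(1, os.cpus().length);
}

// In the primary process, fork the workers; in a worker, start the server.
// `port` is a hypothetical value supplied by the caller.
function start(port) {
  // cluster.isPrimary exists in newer Node.js releases, isMaster in older ones.
  const isPrimary = cluster.isPrimary ?? cluster.isMaster;
  if (isPrimary) {
    for (let i = 0; i < workerCount(); i++) {
      cluster.fork();
    }
  } else {
    // Workers share the listening port; connections are spread among them.
    http.createServer((req, res) => {
      res.end(`handled by process ${process.pid}\n`);
    }).listen(port);
  }
}
```

Calling start(3000) at the bottom of the file would fork the workers; each worker re-runs the same file and takes the else branch.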

Related

Will a single Node process gain any performance running on a multi-core container?

I understand that a Node.js application runs on a single thread but will mostly hand off async operations (e.g. I/O) to the OS, which could run them in multiple threads.
My question is could these multi-threads operations be running on multi-cores as well? If that's the case, does this mean I can still gain performance by running a single node process on a multi-core container? Is there any merit of running a single node application on a multi-core container?

My question is could these multi-threads operations be running on multi-cores as well?
Nodejs does have some native threads that could benefit from multiple cores, particularly for things like crypto operations.
Network I/O is all natively asynchronous so it doesn't really benefit.
Disk I/O uses a thread pool, but the threads are mostly blocked waiting for system I/O once a disk operation is started so there might be a small benefit, but not a large one. The main reason for the thread pool is to give a synchronous OS operation an asynchronous interface to preserve the asynchronous I/O model of nodejs.
If that's the case, does this mean I can still gain performance by running a single node process on a multi-core container? Is there any merit of running a single node application on a multi-core container?
It really depends upon what your app is doing. If performance is mostly limited by your own CPU-intensive Javascript execution or by network I/O, then you won't really benefit much from other CPU cores because of the single-threadedness of Javascript execution and the natively asynchronous networking I/O. If, on the other hand, you were doing a bunch of crypto operations that nodejs pushes off to threads, then you might benefit significantly.
If your container contains other processes such as a database, then that likely will benefit from other cores.
In a generic sense, you would probably get a small benefit just from the few things that nodejs uses native threads for that might see some benefit, but I wouldn't expect it to be a large effect.
In the end, the true answer to this question for your specific application can only be answered by testing. Create a reproducible specific load on your server that is representative of the types of things your app does and then test with a single-core container vs. a multi-core container.
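As one possible starting point for such a test, here is a rough sketch that times a batch of crypto.pbkdf2 calls, which Node.js runs on its native thread pool. The password, salt, and iteration count are placeholders, not values from the question; a representative test would exercise what your app actually does.

```javascript
const crypto = require('crypto');

// Run `count` pbkdf2 derivations at once and resolve with the elapsed
// wall-clock time in milliseconds. Each derivation runs on a thread-pool
// thread, not on the JavaScript thread.
function timePbkdf2(count, iterations = 100000) {
  const start = process.hrtime.bigint();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve, reject) => {
      // 'password' and 'salt' are placeholder inputs for the measurement.
      crypto.pbkdf2('password', 'salt', iterations, 64, 'sha512', (err, key) => {
        if (err) reject(err); else resolve(key);
      });
    }));
  }
  return Promise.all(jobs).then(
    () => Number(process.hrtime.bigint() - start) / 1e6);
}
```

Comparing timePbkdf2(1) with timePbkdf2(4) in a single-core container and again in a multi-core container should give a feel for how much the thread pool gains from extra cores.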

When is it better to use clustering or worker_threads?

I have been reading about multi-processing in NodeJS to understand it as well as I can and to get good performance from my code in heavy environments.
Although I understand the basic purpose and concept of the different ways to take advantage of the resources to handle the load, some questions arise as I go deeper, and I can't seem to find the answers in the documentation.
NodeJS in a single thread:
NodeJS runs a single thread that we call the event loop, although in the background the OS and Libuv handle the default worker pool for asynchronous I/O tasks.
We are supposed to use a single core for the event loop, although the workers might be using different cores. I guess they are placed by the OS scheduler in the end.
NodeJS as multi-threaded:
When using the "worker_threads" library, different instances of v8/Libuv run for each thread in the same single process. Thus, they share the same context and communicate among threads with "message port" and the rest of the API.
Each worker thread runs its own event loop. Threads are supposed to be wisely balanced among CPU cores, improving performance. I guess they are placed by the OS scheduler in the end.
Question 1: When a worker uses the default I/O worker pool, are the very same threads shared with the other workers' pools somehow? Or does each worker have its own default worker pool?
NodeJS in multi-processing:
When using the "cluster" library, we split the work among different processes. Each process is set on a different core to balance the load... well, it is the main event loop that in the end is set on a different core, so it doesn't share a core with another heavy event loop. Sounds smart to do it that way.
Here I would communicate using some IPC mechanism.
Question 2: And what about the default worker pool of each NodeJS process? Where do those threads run? Are they balanced among the rest of the cores, as expected in the first case? Then they might be on the same cores as the other worker pools of the cluster, I guess. Wouldn't it be better to say that we are balancing main threads (event loops) rather than "the process"?
With all this said, the main question:
Question 3: Is it better to use clustering or worker_threads? If both are used in the same code, how can the two libraries cooperate for the best performance? Or can they simply conflict? Or is it the OS that takes control in the end?

Each worker thread has its own main loop (libuv etc). So does each cloned Node.js process when you use clustering.
Clustering is a way to load-balance incoming requests to your Node.js server over several copies of that server.
Worker threads are a way for a single Node.js process to offload long-running functions to a separate thread, to avoid blocking its own main loop.
Which is better? It depends on the problem you're solving. Worker threads are for long-running functions. Clustering makes a server able to handle more requests, by handling them in parallel. You can use both if you need to: have each Node.js cluster process use a worker thread for long-running functions.
As a first approximation for your decision-making: only use worker threads when you know you have long-running functions.
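To make the worker-thread case concrete, here is a small sketch that moves a deliberately slow function off the main loop. The naive Fibonacci and the inline worker source are stand-ins chosen for the example; real code would normally keep the worker in its own file.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept inline so the sketch fits in a single file.
// The naive recursive fib stands in for any long-running function.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.postMessage(fib(workerData));
`;

// Compute fib(n) on a separate thread; the main event loop stays free.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

While fibInWorker(40) is running, timers and incoming requests on the main thread keep being served, which is the whole point of offloading.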
The node processes (whether from clustering or worker threads) don't get tied to specific cores (or Intel processor threads) on the host machine; the host's OS scheduling assigns cores as needed. The host OS scheduler minimizes context-switch overhead when assigning cores to runnable processes. If you have too many active Javascript instances (cluster instances + worker threads) the host OS will give them timeslices according to its scheduling algorithms. Other than avoiding too many Javascript instances, there's very little point in trying to second-guess the OS scheduler.
Edit: Each Node.js process, together with any worker threads it starts, uses a single libuv thread pool; the main thread shares that pool with all of its worker threads. If your Node.js program uses many worker threads, you may or may not need to set the UV_THREADPOOL_SIZE environment variable to a value greater than the default of 4.
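One way to set that variable, sketched here under the assumption that you launch the server from a small wrapper script: pass it in the environment of the child process, so it is already in place when the pool is first created. The helper names and the 'server.js' entry point are hypothetical.

```javascript
const { spawn } = require('child_process');

// Build an environment for a child Node.js process with a larger libuv pool.
function envWithPoolSize(size, baseEnv = process.env) {
  return { ...baseEnv, UV_THREADPOOL_SIZE: String(size) };
}

// Launch `script` (a hypothetical entry point such as 'server.js') with the
// variable already set before the child creates its thread pool.
function launchWithPoolSize(script, size) {
  return spawn(process.execPath, [script], {
    env: envWithPoolSize(size),
    stdio: 'inherit',
  });
}
```

For example, launchWithPoolSize('server.js', 8) would start the server with eight pool threads instead of four.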
Node.js's cluster functionality uses the underlying OS's fork/exec scheme to create a new OS process for each cluster instance. So, each cluster instance has its own libuv pool.
If you're running stuff at scale, let's say with more than ten host machines running your Node.js server, then you can spend time optimizing Javascript instances.
Don't forget nginx if you use it as a reverse proxy to handle your https work. It needs some processor time too, but it spreads that work across its own event-driven worker processes, so you won't have to worry about it unless you have huge traffic.

Why run one Node.js process per core?

According to https://nodejs.org/api/cluster.html#cluster_cluster, one should run the same number of Node.js processes in parallel as the number of cores on the machine.
The supposed reasoning behind this is that Node.js is single threaded.
However, is this really true? Sure, the JavaScript code and the event loop run on one thread, but Node also has a worker thread pool. The default number of threads in this pool is 4. So why does it make sense to run one Node process per core?

This article has an extensive review of the threading mechanism of node.js; it is worth a read.
In short, the main point is that in plain node.js only a few function calls use the thread pool (DNS and FS calls). Your code mostly runs on the event loop only. So, for example, if you wrote a web app in which each request takes 100 ms of synchronous work, you are capped at 10 req/s. The thread pool won't be involved. The way to increase throughput on a multicore system is to use the other cores.
Then there are asynchronous or callback functions. While they give you a sense of parallelization, what really happens is that the async work finishes in the background so that the event loop can work on another function call in the meantime. Afterwards, the callback code still has to run on the event loop. All the code you write therefore still runs on the one and only event loop, and so cannot harness the power of multi-core systems.
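The effect of synchronous work on the event loop can be seen with a short sketch like the following. The busy-wait stands in for a synchronous 100 ms request handler; both function names are made up for the example.

```javascript
// Busy-wait for `ms` milliseconds, standing in for a synchronous handler.
function blockFor(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) { /* spin on the JavaScript thread */ }
}

// Schedule a 0 ms timer, then block. Resolves with how late the timer fired.
function timerDelayWhileBlocked(ms) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    blockFor(ms); // the timer callback cannot run until this returns
  });
}
```

With timerDelayWhileBlocked(100) the timer fires at least 100 ms late, no matter how many cores or pool threads are idle, because its callback has to wait for the one JavaScript thread.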

The said document clearly states that Node is single-threaded:
A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load.
So a Node process has a single thread, unless more threads or processes are created with the respective APIs, such as child_process, cluster, native add-ons, or the several built-in modules that use the libuv threadpool:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's threadpool is used to create asynchronous node APIs based on synchronous system APIs. Node.js APIs that use the threadpool are:
all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
crypto.pbkdf2()
crypto.randomBytes(), unless it is used without a callback
crypto.randomFill()
dns.lookup()
all zlib APIs, other than those that are explicitly synchronous
A single thread uses one CPU core. To use the available resources to the fullest extent on a multicore CPU there should be several threads, and the number of cores is used as a rule of thumb.
If cluster processes occupy 100% CPU and it's known there are other threads or external processes (database service) that would fight over CPU cores with cluster processes, the number of cluster processes can be decreased.

Node.js thread pool and core usage

I've read tons of articles and Stack Overflow questions, and I saw a lot of information about the thread pool, but no one talks about physical CPU core usage. I believe this question is not a duplicate.
Given that I have a quad-core computer and libuv thread pool size of 4, will Node.js utilize all those 4 cores when processing lots of i/o requests(maybe more than thousands)?
I'm also curious that which i/o request uses thread pool. No one gives a clear and full list of requests. I know that the Node.js event loop is single-threaded but uses a thread pool to handle I/O such as accessing disk and db.

I'm also curious that which i/o request uses thread pool.
Disk I/O uses the thread pool.
Network I/O is async from the beginning and does not use threads.
With disk I/O, the individual disk I/O calls still present to Javascript as non-blocking and asynchronous even though they use threads in their native code implementation. When you have more disk I/O calls in flight than the size of the thread pool, the extra calls are queued, and when one of the threads frees up, the next disk I/O call in the queue runs on that now available thread. Since the Javascript for the disk I/O is all non-blocking and assumes a completion callback will get called sometime in the future, the queuing of requests when the thread pool is all busy just means it will take longer to get to the later I/O requests, but otherwise the Javascript programming interface is not affected.
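A small sketch of what that looks like from the Javascript side, using fs.promises.stat as the disk call. The helper name is made up, and the count is whatever the caller chooses.

```javascript
const fs = require('fs');

// Start `count` stat calls at once. Every call returns a promise right away;
// calls beyond the pool size simply wait in the queue for a free thread.
function statMany(path, count) {
  const pending = [];
  for (let i = 0; i < count; i++) {
    pending.push(fs.promises.stat(path));
  }
  // Resolves once every queued call has completed.
  return Promise.all(pending);
}
```

Calling statMany(__filename, 100) with the default pool of 4 issues all 100 calls without blocking; the later ones just complete later.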
Given that I have a quad-core computer and libuv thread pool size of 4, will Node.js utilize all those 4 cores when processing lots of i/o requests(maybe more than thousands)?
This is not up to node.js and is hard to answer in the absolute for that reason. The first referenced article below says that on Linux, the I/O thread pool will use multiple cores and offers a small demo app that shows that.
This is up to the specific OS implementation and the thread scheduler that it uses. node.js just happily creates the threads and uses them and the OS then decides how to make use of the CPU given what it is being asked to do overall on the system. Since threads in the same process often have to communicate with one another in some way, using a separate CPU for different threads in the same process is a lot more complicated.
There are a couple of node.js design patterns that are guaranteed to take advantage of multiple cores (in any modern OS):
Cluster your app and create as many cluster processes as you have processor cores. This also has the advantage that each cluster process has its own I/O thread pool that can work independently, and each can execute its own Javascript independently. With only one node.js process and multiple cores, you never get more than one thread of Javascript execution (this is where node.js is referred to as single threaded, even though it does use threads in its library implementations). But, with clustering, you get independent Javascript execution for each clustered server process.
For individual tasks that might be CPU-intensive (for example, image processing), you can create a work queue and a pool of child worker processes that you hand work off to. This has some benefits in common with clustering, but it is more special purpose where you know exactly where the CPU bottleneck is and you want to attack it specifically.
Other related answers/articles:
how libuv threads in nodejs utilize multi core cpu
Node.js on multi-core machines
Taking Advantage of Multi-Processor Environments in node.js
When is the thread pool used?

If nodejs is multithreaded, why should I use the cluster module to utilize a multicore CPU?

If nodejs is multithreaded (see this article), and threads are managed by the OS, which can run them on the same core or on another core of a multicore CPU (see this question), then nodejs will automatically utilize a multicore CPU.
So why should I use cluster.fork to create separate node processes to utilize multiple cores, as shown in this example in the node docs?
I know that multiple processes have the advantage that when one process fails there is still another process to respond to requests, unlike with threads. I need to know whether multiple cores can be utilized by just spawning a process for each core, or whether it's an OS task that I can't control.

It depends.
Work that happens asynchronously and by Node itself, such as IO operations, is multithreaded. Your JavaScript application runs in a single thread.
In my opinion, the only time you need to fire off multiple processes is if the vast majority of your work is done in straight JavaScript. Node was designed around the fact that this is rarely the case, and is built for applications that primarily block on disk and network.
So, if you have a typical Node application where your JavaScript isn't the bulk of the work, then firing off multiple processes will not help you utilize multiple CPUs/cores.
However, if you have a special application where you do lots of work in your main loop, then multiple processes may be for you.
The easiest way to know is to monitor CPU utilization while your application runs. You will have to decide on a per-application basis what is best.
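Besides external tools, the process can sample its own CPU use. The following is only a sketch built on process.cpuUsage(); the function name and the sampling interval are assumptions for the example.

```javascript
// Resolve with the fraction of one core this process used over `intervalMs`.
// process.cpuUsage() reports CPU time for the whole process in microseconds,
// so busy native threads can push the result above 1.0.
function sampleCpuUtilization(intervalMs) {
  const startUsage = process.cpuUsage();
  const startTime = process.hrtime.bigint();
  return new Promise((resolve) => {
    setTimeout(() => {
      const used = process.cpuUsage(startUsage); // delta since startUsage
      const elapsedUs = Number(process.hrtime.bigint() - startTime) / 1000;
      resolve((used.user + used.system) / elapsedUs);
    }, intervalMs);
  });
}
```

A value that sits near 1.0 under load suggests the JavaScript thread itself is the bottleneck, which is the case where multiple processes may help; a low value suggests the app is mostly waiting on disk or network.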

Node is not multi-threaded from the developer's point of view. Threads are used in a very different way than they are used by, for example, Apache's worker MPM.
I believe this answer will clear things up.
