How different is NodeJs Architecture? - node.js

I have some questions regarding NodeJs architecture:
I have read that although NodeJs is single threaded, internally it uses the libuv library's thread pool. Is that right?
Are all non-blocking requests handled by the main thread and all blocking requests handled by the libuv thread pool? Some say there is no such thing as a main thread. Is that right, or a misconception?
If yes, then what happens if the thread pool size is 4 and blocking requests are
Does request no. 5 have to wait until a thread is available?
If point 3 is the case, then how is NodeJs different from Java when the number of blocking requests exceeds the thread pool size?

1.
In general, both libuv and v8 are allowed to use (and in fact use) threads.
As a rule of thumb, note that a single threaded JavaScript runtime environment doesn't mean that the underlying libraries cannot use threads.
2.
You can refer to the documentation of libuv to know what will be dispatched on threads.
I cite it:
File system operations
DNS functions
User specified code via uv_queue_work()
3.
It is said that you can queue work on the thread pool.
So, yes, if you queue more work than the pool can run at once, the extra requests wait their turn to run.
4.
A thread pool is a concept that abstracts away from the language.
At the end of the day, libuv and thus node are well targeted for I/O bound applications where you do a lot of networking and the API clearly states it.
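A minimal sketch of point 3, assuming the default pool size of 4 and using crypto.pbkdf2(), which the Node.js documentation lists among the calls that run on the libuv thread pool. The helper name runHashes and the iteration count are illustrative only. It starts more jobs than there are pool threads and records when each one finishes.

```javascript
const { pbkdf2 } = require("node:crypto");

// Start `count` key derivations at once and report when each one finishes.
// Each derivation occupies one thread of the libuv pool while it runs.
function runHashes(count, iterations = 100000) {
  const start = Date.now();
  const jobs = [];
  for (let i = 1; i <= count; i++) {
    jobs.push(
      new Promise((resolve, reject) => {
        pbkdf2("password", "salt", iterations, 64, "sha512", (err) => {
          if (err) return reject(err);
          resolve({ job: i, elapsedMs: Date.now() - start });
        });
      })
    );
  }
  return Promise.all(jobs);
}

// Five jobs against a pool of four threads: one job has to queue.
runHashes(5).then((results) => console.table(results));
```

On a machine with at least four idle cores, one job usually reports a visibly larger elapsed time because it had to wait for a free thread. The exact figures depend on the hardware and on the configured pool size.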

Related

How does Node.js schedule Workers on a limited resource system

I would like to know how to take full advantage of the Worker class in nodejs' worker_threads, specifically, on a 1 or 2 cpu system, do tasks get scheduled better than if I had just blocked in a for-loop in a regular nodejs program (without making use of any worker api)? Are they just delegated to the OS?
Also, can I block inside a Worker? I assumed that's what they are for.
How does Node.js schedule Workers on a limited resource system
Nodejs worker threads use underlying OS threads so worker threads are scheduled by the OS, not by nodejs. If you have more active threads than you have CPU cores, then the underlying OS will time slice (e.g. share) the cores among the active threads. In general, you shouldn't write a blocked for loop in the main nodejs event loop thread, but for more specifics on that part of the question, we would need to see the actual code you're talking about, what the precise context is and what the alternatives are.
Also, can I block inside a Worker? I assumed that's what they are for.
Yes, you can. It will not have any adverse effect on the main event loop thread. You will, of course, not be able to do anything else in the worker thread while it is blocked. Also, you may want to know that worker threads in nodejs are not lightweight things (in terms of memory usage). Each one comes with a separate V8 interpreter environment. So, in a low resource system, you will have to very carefully plan out your memory usage as nodejs + multiple worker threads do not make for low memory usage.
Keep in mind that nodejs also has a thread pool in the libuv engine, used for things like crypto operations and file operations, which allows blocking OS system calls to present an asynchronous interface to the JS engine. According to the libuv documentation, this pool is global to the process and shared across all event loops, so workers do not each get a pool of their own. So, in addition to your Javascript threads, there are also these libuv threads involved in some nodejs APIs.
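A small sketch of the "can I block inside a Worker?" point, assuming Node's built-in worker_threads module. The function names blockInWorker and demo are made up for illustration, and the worker source is passed inline with the eval option only to keep the example in one file. The worker busy-waits while a timer on the main thread keeps counting.

```javascript
const { Worker } = require("node:worker_threads");

// Start a worker that blocks its own thread for `ms` milliseconds.
function blockInWorker(ms) {
  const source = `
    const { parentPort, workerData } = require("node:worker_threads");
    const end = Date.now() + workerData;
    while (Date.now() < end) {} // busy-wait: blocks only this worker thread
    parentPort.postMessage("done");
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: ms });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

// Count timer ticks on the main thread while the worker is blocked.
async function demo(ms = 200) {
  let ticks = 0;
  const timer = setInterval(() => { ticks++; }, 10);
  const result = await blockInWorker(ms);
  clearInterval(timer);
  return { result, ticks };
}

demo().then((outcome) => console.log(outcome));
```

If the same busy-wait ran on the main thread instead, the timer could not fire until the loop ended, so ticks would stay at 0 for the duration.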

Why run one Node.js process per core?

According to https://nodejs.org/api/cluster.html#cluster_cluster, one should run the same number of Node.js processes in parallel as the number of cores on the machine.
The supposed reasoning behind this is that Node.js is single threaded.
However, is this really true? Sure the JavaScript code and the event loop run on one thread but Node also has a worker thread pool. The default number of thread in this pool is 4. So why does it make sense to run one Node process per core?
This article has an extensive review of the threading mechanism of node.js, worth a read.
In short, the main point is that in plain node.js only a few function calls use the thread pool (DNS and FS calls). Your code mostly runs on the event loop only. So, for example, if you wrote a web app in which each request takes 100ms synchronously, you are bound to 10 req/s. The thread pool won't be involved. The way to increase throughput on a multicore system is to use the other cores.
Then come asynchronous or callback functions. While they give you a sense of parallelization, what really happens is that the async work finishes in the background so that the event loop can work on another function call in the meantime. Afterwards, the callback code still has to run in the event loop, so all the code you write still runs in the one and only event loop and thus cannot harness the power of multi-core systems.
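The 100ms-per-request arithmetic can be sketched as follows. This is an illustration only: busyWait stands in for a synchronous request handler, and runHandlers is a hypothetical helper that schedules several of them on the event loop and measures the total time.

```javascript
// Stand-in for a request handler that does `ms` of synchronous work.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {}
}

// Schedule `count` handlers on the event loop and resolve with the total time.
function runHandlers(count, ms) {
  return new Promise((resolve) => {
    const start = Date.now();
    let finished = 0;
    for (let i = 0; i < count; i++) {
      setImmediate(() => {
        busyWait(ms); // nothing else can run on this thread meanwhile
        finished += 1;
        if (finished === count) resolve(Date.now() - start);
      });
    }
  });
}

runHandlers(5, 100).then((elapsed) => {
  console.log(`5 handlers of 100ms took ${elapsed}ms in total`);
});
```

Because the handlers run one after another on the single JavaScript thread, the total is never less than count × ms, which is where the bound of 10 requests per second for 100ms handlers comes from.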
The said document clearly states that Node is single-threaded:
A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load.
This way a Node process has a single thread, unless additional threads or processes are created with the respective APIs like child_process, cluster, native add-ons or several built-in modules that use the libuv threadpool:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's threadpool is used to create asynchronous node APIs based on synchronous system APIs. Node.js APIs that use the threadpool are:
all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
crypto.pbkdf2()
crypto.randomBytes(), unless it is used without a callback
crypto.randomFill()
dns.lookup()
all zlib APIs, other than those that are explicitly synchronous
A single thread uses one CPU core. To use the available resources to the fullest extent and utilize a multicore CPU, there should be several threads; the number of cores is used as a rule of thumb.
If cluster processes occupy 100% CPU and it's known there are other threads or external processes (database service) that would fight over CPU cores with cluster processes, the number of cluster processes can be decreased.
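A hedged sketch of that rule of thumb using the built-in cluster module. The names pickWorkerCount and start, the reservedCores parameter and port 3000 are assumptions made for this example, not part of any Node.js API. The start function is defined but not called, so loading the file does not fork anything.

```javascript
const cluster = require("node:cluster");
const http = require("node:http");
const os = require("node:os");

// One process per core, minus any cores left free for other services
// (for example a database on the same machine). Always at least one.
function pickWorkerCount(cores = os.cpus().length, reservedCores = 0) {
  return Math.max(1, cores - reservedCores);
}

// Fork the workers in the primary process; serve HTTP in each worker.
function start(port = 3000, reservedCores = 0) {
  // isPrimary is the current name; isMaster is the older alias.
  const isPrimary = cluster.isPrimary ?? cluster.isMaster;
  if (isPrimary) {
    const count = pickWorkerCount(os.cpus().length, reservedCores);
    for (let i = 0; i < count; i++) cluster.fork();
    cluster.on("exit", (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
  } else {
    http
      .createServer((req, res) => res.end(`handled by ${process.pid}\n`))
      .listen(port);
  }
}

// Call start() from the entry script to launch the cluster.
```

Each forked worker is a full Node.js process with its own event loop, which is why this scales JavaScript execution across cores in a way the thread pool alone does not.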

Why is Node.js called single threaded when it maintains threads in thread pool?

Node.js maintains an event loop but then it also has by default four threads for the complicated requests. How this is single threaded when there are more threads available in the thread pool?
Also, the threads assigned by the event loop for the complicated task are the dedicated threads then how it's different from other multithreading concepts?
In the context to which you're referring, "single threaded" means that your Javascript runs as a single thread. No two pieces of Javascript are ever running at the same time either literally or time sliced (note: as of 2020 node.js does now have WorkerThreads, but those are something different from this original discussion). This massively simplifies Javascript development because there is no need to do thread synchronization for Javascript variables which are shared between different pieces of Javascript because only one piece of Javascript can ever be running at the same time.
All that said, node.js does use threads internal to its implementation. The default four threads you mention are used in a thread pool for disk I/O. Because disk I/O is normally a synchronous operation at the OS level that blocks the calling thread and node.js has a design where all I/O operations should be offered as asynchronous operations, the node.js designers decided to fulfill the asynchronous interface by using a pool of threads in order to implement (in native code), the fs module disk I/O interface (yes there are non-blocking disk I/O operations in some operating systems, but the node.js designers decided not to use them). This all happens under the covers in native code and does not affect the fact that your Javascript runs only in a single thread.
Here's a summary of how a disk I/O call works in node.js. Let's assume there's already an open file handle.
Javascript code calls fs.write() on an existing file handle.
fs module packages the arguments to the function and then calls native code.
Native code gets a thread from the thread pool and initiates the OS call to write data to that file.
Native code returns from the function.
fs module returns from the fs.write() call.
Javascript continues to execute (whatever statements came after the fs.write() call).
Some time later the native code fs.write() call on a thread finishes. It obtains a mutex protecting the event loop and inserts an event in the event queue.
When the Javascript engine is done executing whatever stream of Javascript it was running, it checks the event queue to see if there are any other events to run.
When it finds an event in the event queue, it removes it from the event queue and executes the callback associated with that event, starting a new stream of running Javascript.
Because a new event is never acted upon until the current stream of Javascript is done executing, this is where Javascript gets its event-driven, single threaded nature even though native code threads may be used to implement some library functions. Those threads are used to make a blocking operation into a non-blocking operation, but do not affect the single threaded-ness of Javascript execution itself.
The key here is that node.js is event driven. Every new operation that triggers some Javascript to run is serialized through the event queue and the next event is not serviced until the current stream of Javascript has finished executing.
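The sequence described above can be observed with a short sketch. It uses fs.writeFile() on a temporary file rather than fs.write() on an open handle, purely to keep the example short; the function name demo and the file name are made up for illustration.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Record the order in which synchronous code and the I/O callback run.
function demo() {
  const order = [];
  const file = path.join(os.tmpdir(), `event-loop-demo-${process.pid}.txt`);
  return new Promise((resolve, reject) => {
    fs.writeFile(file, "hello", (err) => {
      // Runs only after the current stream of JavaScript has finished.
      order.push("write callback");
      fs.unlink(file, () => (err ? reject(err) : resolve(order)));
    });
    order.push("after fs.writeFile() call");
    // Keep the JavaScript thread busy; the write may complete on a pool
    // thread during this time, but its callback still has to wait.
    const end = Date.now() + 50;
    while (Date.now() < end) {}
    order.push("synchronous code finished");
  });
}

demo().then((order) => console.log(order));
```

The callback entry always comes last in the recorded order, however quickly the write itself completes, because the event is only picked up once the running JavaScript returns to the event loop.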
In the node.js architecture the only way to get two pieces of Javascript to run independently and at the same time is to use a separate node.js process for each. Then, they will run as two completely separate operations and the OS will manage them separately. If your computer has at least two cores, then they can literally run at the same time, each on their own core. If your computer has only one core, they will essentially be in their own process thread and the OS will time slice them (sharing the one CPU between them).
I will explain it in a clear and simple way to clear up the confusion:
The Node event loop is SINGLE-THREADED, but the other parts are not.
The confusion comes from the C++ that Node uses under the hood (NodeJs is about 30% JS + 70% C++). So, by default, the JS part of NodeJs is single-threaded, BUT it uses a thread pool implemented in native code. So we have a single JS thread, which is the event loop of NodeJs, plus 4 native threads used when needed for asynchronous I/O operations.
It is also important to know that the event loop is like a traffic organizer: every request goes through the loop (which is single-threaded), and the loop hands work to the pool threads if blocking I/O is needed. So if you have a computation-heavy app that does heavy lifting such as image processing, video editing, audio processing or 3D graphics (which most apps do not need), NodeJs will be a bottleneck for that app and the traffic organizer will be unhappy.
NodeJS shines for I/O-bound apps (most apps), like apps dealing with databases and the filesystem.
Again: by default, NodeJs uses a pool of 4 threads (PLUS one thread for the event loop itself), so roughly 5 in total by default because of the underlying C++ system (V8 also uses some helper threads of its own).
As a general idea, a CPU can contain one or more cores, depending on your server (money).
Each core can run threads. Watch your Activity Monitor to discover how many threads you are using.
Each process can have multiple threads.
The multi-threading of Node comes from the fact that Node depends on V8 and libuv (a C library).
So, long story short:
Node is single-threaded for the event loop itself, but many operations are done outside the event loop, like crypto and file system (fs) operations. If you have two calls to crypto, each of them goes to its own THREAD (imagine 3 calls to crypto and 1 to fs: these calls are distributed one per thread across the pool of 4 threads).
Finally: it is very easy to increase the number of threads in the libuv (C library) thread pool, which is 4 by default, by setting the UV_THREADPOOL_SIZE environment variable (process.env.UV_THREADPOOL_SIZE) before the pool is first used. You can also use clustering (PM2 recommended) to clone the event loop, that is, to have multiple event loops in case a single-threaded one is not enough for your high-load app.
What usually goes unexplained is that the thread pool is a native (C/C++) thing that NodeJs mostly controls itself, not the developer, which is why people keep asking how Node can be single-threaded while having a thread pool!
Hope that simplifies that advanced topic.
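As a sketch of changing the pool size: libuv reads the variable under the upper-case name UV_THREADPOOL_SIZE, and the most dependable way to set it is in the environment of the process at launch (for example UV_THREADPOOL_SIZE=8 node app.js in a POSIX shell). The helper below, with the made-up name runWithPoolSize, does the same thing from JavaScript by starting a child Node.js process with the variable set.

```javascript
const { execFileSync } = require("node:child_process");

// Run a short script in a child Node.js process whose environment
// carries the requested thread pool size, and return what it printed.
function runWithPoolSize(size, script) {
  return execFileSync(process.execPath, ["-e", script], {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(size) },
    encoding: "utf8",
  }).trim();
}

// The child sees the variable from its very first instruction, so the
// pool is sized before any fs, dns or crypto work can start.
console.log(runWithPoolSize(8, "console.log(process.env.UV_THREADPOOL_SIZE)"));
```

Assigning process.env.UV_THREADPOOL_SIZE inside an already running script is less reliable, since it only has a chance of taking effect if it happens before the pool is first used.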
By default, the execution of your JavaScript code runs on a single thread.
However, node.js tries to make most long-running calls async. For some that just involves doing async OS calls, but for some others node.js will execute the call itself on a secondary thread, while continuing to run other JS code. Once the async call has terminated, the JS callback or Promise handler will run.
Node.js was created explicitly as an experiment in async processing. The belief was that, under typical web loads, more performance and scalability can be achieved by doing async processing on a single thread than with the typical thread-based implementation.

In node js, what is libuv and does it use all core?

As far as I know, all IO requests and other asynchronous tasks are done by libuv in nodejs.
I want to know if libuv is using threading. If it is, is it using all available core or not?
First of all, what is libuv. As mentioned in the documentation, it's a multi-platform support library with a focus on asynchronous I/O.
libuv doesn't use threads for asynchronous tasks, but for those that aren't asynchronous by nature.
As an example, it doesn't use threads to deal with sockets, it uses threads to make synchronous fs calls asynchronous.
When threads are involved, libuv uses a thread pool whose size defaults to 4 and can be changed at startup by setting the UV_THREADPOOL_SIZE environment variable.
node.js bundles libuv, so the same environment variable applies to a node.js process.
It goes without saying that it has nothing to do with the number of cores of your chip.
I'm tempted to affirm that you can safely ignore the topic, for libuv and thus node.js don't use threads intensively for their purposes (unless you are using them in a really perverse way or if you are running a high number of libuv work requests).
Feel free to run an instance of node.js per core if you need as most of the users do.
The design overview section of libuv is also clear enough about this point:
The I/O (or event) loop is the central part of libuv. It establishes the content for all I/O operations, and it’s meant to be tied to a single thread. One can run multiple event loops as long as each runs in a different thread.
The libuv module has a responsibility that is relevant for some particular functions in the standard library. For SOME standard library function calls, the Node C++ side and libuv decide to do expensive work outside of the event loop entirely. They use something called a thread pool: a set of four threads that can be used for running computationally intensive tasks such as hashing functions.
By default libuv creates four threads in this thread pool, and the pool is managed by libuv. That means that, in addition to the thread used for the event loop, there are four other threads that can be used to offload expensive work that needs to occur inside our application. Many of the functions included in the Node standard library automatically make use of this thread pool.
Network I/O covers things like API requests, and file system I/O is the fs module. The single node.js thread delegates that heavy work to libuv.
If there are enough of these function calls in flight, the pool threads can be spread across all of the cores. Extra CPU cores do not actually speed up an individual function call; they just allow some amount of concurrency in the work that you are doing.
From here:
A single instance of Node.js runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node.js processes to handle the load.
The cluster module allows easy creation of child processes that all share server ports.
Multiple processes could be better than multithreading in some cases. Some people even think threads are evil. Maybe node.js is designed in such a way as to take advantage of processes better than threads.

Node.js thread pool and core usage

I've read tons of articles and Stack Overflow questions, and I saw a lot of information about the thread pool, but no one talks about physical CPU core usage. I believe this question is not a duplicate.
Given that I have a quad-core computer and libuv thread pool size of 4, will Node.js utilize all those 4 cores when processing lots of i/o requests(maybe more than thousands)?
I'm also curious which i/o requests use the thread pool. No one gives a clear and full list of requests. I know that the Node.js event loop is single threaded but uses a thread pool to handle i/o such as accessing disk and db.
I'm also curious which i/o requests use the thread pool.
Disk I/O uses the thread pool.
Network I/O is async from the beginning and does not use threads.
With disk I/O, the individual disk I/O calls still present to Javascript as non-blocking and asynchronous even though they use threads in their native code implementation. When you have more disk I/O calls in process than the size of the thread pool, the extra calls are queued, and when one of the threads frees up, the next disk I/O call in the queue will run using that now available thread. Since the Javascript for the disk I/O is all non-blocking and assumes a completion callback will get called sometime in the future, the queuing of requests when the thread pool is all busy just means it will take longer to get to the later I/O requests, but otherwise the Javascript programming interface is not affected.
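A brief sketch of that last point, assuming the promise-based fs API and using the system temporary directory only as a path that is sure to exist. The helper name statMany is hypothetical. It issues far more file system calls than a default pool of 4 threads can run at once, and the calling code looks no different from issuing one.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");

// Issue `count` file system calls at once. Calls beyond the pool size
// wait in libuv's queue, but the calling code does not need to care.
async function statMany(count) {
  const calls = Array.from({ length: count }, () => fs.stat(os.tmpdir()));
  const results = await Promise.all(calls);
  return results.length;
}

statMany(100).then((done) => console.log(`${done} stat calls completed`));
```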
Given that I have a quad-core computer and libuv thread pool size of 4, will Node.js utilize all those 4 cores when processing lots of i/o requests(maybe more than thousands)?
This is not up to node.js and is hard to answer in the absolute for that reason. The first referenced article below says that on Linux, the I/O thread pool will use multiple cores and offers a small demo app that shows that.
This is up to the specific OS implementation and the thread scheduler that it uses. node.js just happily creates the threads and uses them and the OS then decides how to make use of the CPU given what it is being asked to do overall on the system. Since threads in the same process often have to communicate with one another in some way, using a separate CPU for different threads in the same process is a lot more complicated.
There are a couple of node.js design patterns that are guaranteed to take advantage of multiple cores (in any modern OS):
Cluster your app and create as many cluster processes as you have processor cores. This also has the advantage that each clustered process has its own I/O thread pool that can work independently and each can execute its own Javascript independently. With only one node.js process and multiple cores, you never get more than one thread of Javascript execution (this is where node.js is referred to as single threaded, even though it does use threads in its library implementations). But, with clustering, you get independent Javascript execution for each clustered server process.
For individual tasks that might be CPU-intensive (for example, image processing), you can create a work queue and a pool of child worker processes that you hand work off to. This has some benefits in common with clustering, but it is more special purpose where you know exactly where the CPU bottleneck is and you want to attack it specifically.
Other related answers/articles:
how libuv threads in nodejs utilize multi core cpu
Node.js on multi-core machines
Taking Advantage of Multi-Processor Environments in node.js
When is the thread pool used?

Resources