How does Node.js schedule Workers on a limited resource system

How does Node.js schedule Workers on a limited resource system - node.js

I would like to know how to take full advantage of the Worker class in nodejs' worker_threads, specifically, on a 1 or 2 cpu system, do tasks get scheduled better than if I had just blocked in a for-loop in a regular nodejs program (without making use of any worker api)? Are they just delegated to the OS?
Also, can I block inside a Worker? I assumed that's what they are for.

How does Node.js schedule Workers on a limited resource system
Nodejs worker threads use underlying OS threads so worker threads are scheduled by the OS, not by nodejs. If you have more active threads than you have CPU cores, then the underlying OS will time slice (e.g. share) the cores among the active threads. In general, you shouldn't write a blocked for loop in the main nodejs event loop thread, but for more specifics on that part of the question, we would need to see the actual code you're talking about, what the precise context is and what the alternatives are.
Also, can I block inside a Worker? I assumed that's what they are for.
Yes, you can. It will not have any adverse effect on the main event loop thread. You will, of course, not be able to do anything else in the worker thread while it is blocked. Also, you may want to know that worker threads in nodejs are not lightweight things (in terms of memory usage). Each one comes with a separate V8 interpreter environment. So, in a low resource system, you will have to very carefully plan out your memory usage as nodejs + multiple worker threads do not make for low memory usage.
Keep in mind that each V8 interpreter instance also creates its own thread pool for the libuv engine to use for things like crypto operations and file operations to allow blocking OS system calls to present an asynchronous interface to the JS engine. So, in addition to your Javascript threads, there are also these libuv threads involved in some nodejs APIs.

Related

When a workerThread is created in nodejs, does it utilize the same core in which nodejs process is running?

Let's assume i have a nodejs serverProgram with one api and it does some manipulations on the video file, sent via the http request.
const saveVideoFile=(req,res)=>{
processAndSaveVideoFile(); // can run for minimum of 10 minutes
res.send({status: "video is being processed"})
}
i decided to to make use of a workerThread to do this processing as my machine has 3 cores (core1,core2,core3) and there is no hyperthreading enabled here
Assume that my nodejs program is running on core1. When i fire up a single workerThread, will the workerThread run on core2/core3 or core1?
i read that workerThread is not the same as childProcess. ChildProcess will fork a new process which will facilitate the childProcess to choose from available free cores (core2 or core3).
i read that workerThread shares memory with the mainThread. Let's assume that i create 2 workerThreads (wt1,wt2). Will my nodejs program, wt1, wt2 run on the same core i.e core1 ?
Also, in nodejs we have eventloop (mainthread) and otherThreads doing the background operations i.e I/O. is it correct to assume that all of these are utilizing the resources available in a single core (core1). if this is the case, is creating and using additional workerThread's an overkill on the nodejs server?
Below is an excerpt from this blog
We can run things in parallel in Node.js. However, we need not to
create threads. The operating system and the virtual machine
collectively run the I/O in parallel and the JS code then runs in a
single thread when it is time to send the data back to the JavaScript
code.
i keep reading this same information about nodejs in many articles and video presentations. But what i do not understand is this,
The operating system and the virtual machine collectively run the I/O in parallel
How can the operating system run the I/O requests from nodejs program in parallel without using any of the childProcess or threads spawned from nodejs? if those I/O requests from nodejs program is running in parallel, does it mean that all 3 cores (core1,core2,core3) will be utilized?
There are lot of contents on nodejs, but it doesn't clear doubts related to my above questions. if you have idea on how these things actually work, please share the detail.

A worker thread in node.js is an actual OS thread running in a different instance of V8. As such, it's totally up to the operating system to decide how to allocate it among available CPU cores. If there are cores with available time, then it will not generally be run on the same core as the main nodejs thread when that thread is busy because the OS will allocate busy threads across the various cores.
But, again this is entirely up to the OS and is not something that nodejs controls and the exact strategy for which cores are used will vary by OS. But, in all modern operating systems, the design goal is that available cores are used for threads that are currently executing. Now, if there are more threads active at once than there are cores, the threads will be time-sliced and all the cores will be active.
Also, in nodejs we have eventloop (mainthread) and otherThreads doing the background operations i.e I/O. is it correct to assume that all of these are utilizing the resources available in a single core (core1). if this is the case, is creating and using additional workerThread's an overkill on the nodejs server?
No, it is not correct to assume those threads all use the same core.
A workerThread in nodejs has its own event loop. For the most part, it does not share memory. In fact, if you want to share memory, you have to very specifically allocated SharedMemory and pass that to the workerThread.
Is it overkill? Well, it depends upon what you're doing. There are very useful things to do with workerThreads and there are things that they would not be necessary for.
The operating system and the virtual machine collectively run the I/O in parallel
I/O in node.js is either asynchronous at the OS level (such as networking) or run in separate threads (such as disk I/O). That means it runs separately from the main thread in node.js that runs your Javascript and can run in parallel with it, synchronizing only at the completion of an event. "Parallel" in this case means that both make progress at the same time. If there are multiple cores, then they can truly be running at exactly the same time. If there was only one core, then the OS will timeslice between the various threads and they will be both make progress (in an interleaved fashion that will seem to be parallel, but really they are taking turns).
How can the operating system run the I/O requests from nodejs program in parallel without using any of the childProcess or threads spawned from nodejs? if those I/O requests from nodejs program is running in parallel, does it mean that all 3 cores (core1,core2,core3) will be utilized?
The OS has its own threads for managing things like a network interface or a disk interface. The job of those threads is to interface with the hardware and bring data to an appropriate application or take data from the application and send it to the hardware. These are OS-level threads that exists independent of node.js. Yes, other cores can be used by those OS-level threads. It is important to realize that many operations such as networking are inherently non-blocking. Thus, if you're waiting for some data to arrive on a network interface, you don't need to have a thread doing something the whole time.
I want to add that it appears in your questions that you've combined questions about a several different things. Mentioned in your questions are:
Worker Threads
Internal node.js threads
Operating system threads
These are all different things.
A worker thread is a new thread you can start to run specific pieces of Javascript in another thread so you can have more than one Javascript thread running at the same time. In node.js, this is done by creating a whole new instance of V8, setting up a whole new global environment and loaded modules environment and using almost entirely separate memory.
Internal node.js threads are used by node.js as part of implementing its event loop and its standard library. Specifically, disk I/O and some crypto operations are run in internal native threads and they communicate with your Javascript via events/callbacks through the event loop.
Operating system threads are threads that the OS uses to implement it's own system APIs. Since the OS is responsible for lots of things, these threads ca have many different uses. Depending upon native implementations, they may be used to facilitate things like disk I/O or networking I/O. These threads are the responsibility of the OS to create and use and are not directly controlled by node.js.
Some additional questions asked in comments:
what is the difference b/w workerThread & childProcess concept in nodejs? is childProcess = workerThread without sharedMemory ?
A child process can be any type of program - it does not have to be a node.js program. A worker thread is node.js code.
A worker thread can share memory if sharedMemory is specifically allocated and shared with the worker thread and if it is carefully managed for concurrency issues.
It is more efficient to copy memory back and forth between worker thread and main thread than with child process.
If main program exits, worker threads will exit. If main program exits, child process can be configured to exit or to continue.
If worker thread calls process.exit(), the main thread will exit too. If child program exits, it cannot cause main program to exit without main program's cooperation.
how nodejs is able to magically interact with the os level thread without nodejs itself creating any threads?, i need additional details on this, your explanation is the common one present in most places including the blog i shared?
nodejs just calls an OS API. It's the OS API that manages communicating with its own threads (if threads are needed for that specific OS API). How it does that communication internally is implementation dependent and will vary by OS. It will even vary by OS which OS APIs use threads and which don't.

When is better using clustering or worker_threads?

I have been reading about multi-processing on NodeJS to get the best understanding and try to get a good performance in heavy environments with my code.
Although I understand the basic purpose and concept for the different ways to take profit of the resources to handle the load, some questions arise as I go deeper and it seems I can't find the particular answers in the documentation.
NodeJS in a single thread:
NodeJS runs a single thread that we call event loop, despite in background OS and Libuv are handling the default worker pool for I/O asynchronous tasks.
We are supossed to use a single core for the event-loop, despite the workers might be using different cores. I guess they are sorted in the end by OS scheduler.
NodeJS as multi-threaded:
When using "worker_threads" library, in the same single process, different instances of v8/Libuv are running for each thread. Thus, they share the same context and communicate among threads with "message port" and the rest of the API.
Each worker thread runs its Event loop thread. Threads are supposed to be wisely balanced among CPU cores, improving the performance. I guess they are sorted in the end by OS scheduler.
Question 1: When a worker uses I/O default worker pool, are the very same
threads as other workers' pool being shared somehow? or each worker has its
own default worker pool?
NodeJS in multi-processing:
When using "cluster" library, we are splitting the work among different processes. Each process is set on a different core to balance the load... well, the main event loop is what in the end is set in a different core, so it doesn't share core with another heavy event loop. Sounds smart to do it that way.
Here I would communicate with some IPC tactic.
Question 2: And the default worker pool for this NodeJS process? where
are they? balanced among the rest of cores as expected in the first
case? Then they might be on the same cores as the other worker pools
of the cluster I guess. Shouldn't it be better to say that we are balancing main threads (event loops) rather than "the process"?
Being all this said, the main question:
Question 3: Whether is better using clustering or worker_threads? If both are being used in the same code, how can both libraries agree the best performance? or they
just can simply get in conflict? or at the end is the OS who takes
control?

Each worker thread has its own main loop (libuv etc). So does each cloned Node.js process when you use clustering.
Clustering is a way to load-balance incoming requests to your Node.js server over several copies of that server.
Worker threads are a way for a single Node.js process to offload long-running functions to a separate thread, to avoid blocking its own main loop.
Which is better? It depends on the problem you're solving. Worker threads are for long-running functions. Clustering makes a server able to handle more requests, by handling them in parallel. You can use both if you need to: have each Node.js cluster process use a worker thread for long-running functions.
As a first approximation for your decision-making: only use worker threads when you know you have long-running functions.
The node processes (whether from clustering or worker threads) don't get tied to specific cores (or Intel processor threads) on the host machine; the host's OS scheduling assigns cores as needed. The host OS scheduler minimize context-switch overhead when assigning cores to runnable processes. If you have too many active Javascript instances (cluster instances + worker threads) the host OS will give them timeslices according to its scheduling algorithms. Other than avoiding too many Javascript instances, there's very little point in trying second-guess the OS scheduler.
Edit Each Node.js instance, with any worker threads, uses a single libuv thread pool. A main Node.js process shares a single libuv thread pool with all its worker threads. If your Node.js program uses many worker threads, you may, or may not, need to set the UV_THREADPOOL_SIZE environment variable to a value greater than the default 4.
Node.js's cluster functionality uses the underlying OS's fork/exec scheme to create a new OS process for each cluster instance. So, each cluster instance has its own libuv pool.
If you're running stuff at scale, lets say with more than ten host machines running your Node.js server, then you can spend time optimizing Javascript instances.
Don't forget nginx if you use it as a reverse proxy to handle your https work. It needs some processor time too, but it uses fine-grain multithreading so you won't have to worry about it unless you have huge traffic.

Why run one Node.js process per core?

According to https://nodejs.org/api/cluster.html#cluster_cluster, one should run the same number of Node.js processes in parallel as the number of cores on the machine.
The supposed reasoning behind this is that Node.js is single threaded.
However, is this really true? Sure the JavaScript code and the event loop run on one thread but Node also has a worker thread pool. The default number of thread in this pool is 4. So why does it make sense to run one Node process per core?

This article has an extension review on the threading mechanism of node.js, worth a read.
In short, the main point is in plain node.js only a few function calls uses thread pool (DNS and FS calls). Your call mostly runs on the event loop only. So for example if you wrote a web app that each request takes 100ms synchronously, you are bound to 10req/s. Thread pool won't be involved. And to increase throughput on a multicore system is to use other cores.
Then it comes asynchronous or callback functions. While it does give you a sense of parallelization, what really happens is it waits for the async code to finish in background so that event loop can work on another function call. Afterwards, the callback codes still has to run in event loop, therefore all your written code are still ran in the one and only one event loop, thus won't be able to harness multi-core systems' power.

The said document clearly states that Node is single-threaded:
A single instance of Node.js runs in a single thread. To take advantage of multi-core systems, the user will sometimes want to launch a cluster of Node.js processes to handle the load.
This way Node process has a single thread, unless new threads are created with respective APIs like child_process, cluster, native add-ons or several built-in modules that use libuv treadpool:
Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's threadpool is used to create asynchronous node APIs based on synchronous system APIs. Node.js APIs that use the threadpool are:
all fs APIs, other than the file watcher APIs and those that are
explicitly synchronous
crypto.pbkdf2()
crypto.randomBytes(), unless it is used without a callback
crypto.randomFill()
dns.lookup()
all zlib APIs, other than those that are explicitly synchronous
A single thread uses 1 CPU core, in order to use available resources to the fullest extent and utilize multicore CPU, there should be several threads, the number of cores is used as a rule of thumb.
If cluster processes occupy 100% CPU and it's known there are other threads or external processes (database service) that would fight over CPU cores with cluster processes, the number of cluster processes can be decreased.

In node js, what is libuv and does it use all core?

As far as I know, all IO requests and other asynchronous tasks are done by libuv in nodejs.
I want to know if libuv is using threading. If it is, is it using all available core or not?

First of all, what is libuv. As mentioned in the documentation, it's a multi-platform support library with a focus on asynchronous I/O.
libuv doesn't use thread for asynchronous tasks, but for those that aren't asynchronous by nature.
As an example, it doesn't use threads to deal with sockets, it uses threads to make synchronous fs calls asynchronous.
When threads are involved, libuv uses a thread pool the size of which you can change at compile-time using UV_THREADPOOL_SIZE.
node.js is provided with a precompiled version of libuv and thus a fixed UV_THREADPOOL_SIZE parameter.
It goes without saying that it has nothing to do with the number of cores of your chip.
I'm tempted to affirm that you can safely ignore the topic, for libuv and thus node.js don't use threads intensively for their purposes (unless you are using them in a really perverse way or if you are running an high number of libuv work requests).
Feel free to run an instance of node.js per core if you need as most of the users do.
The design overview section of libuv is also clear enough about this point:
The I/O (or event) loop is the central part of libuv. It establishes the content for all I/O operations, and it’s meant to be tied to a single thread. One can run multiple event loops as long as each runs in a different thread.

The libuv module has a responsibility that is relevant for some particular functions in the standard library. for SOME standard library function calls, the node C++ side and libuv decide to do expensive calculations outside of the event loop entirely.They make something called a thread pool that thread pool is a series of four threads that can be used for running computationally intensive tasks such as hashing functions.
By default libuv creates four threads in this thread pool. Thread Pool in the picture is organized by the Libuv So that means that in addition to that thread used for the event loop there are four other threads that can be used to offload expensive calculations that need to occur inside of our application. Many of the functions include in the node standard library will automatically make use of this thread pool.
Network (Network IO) is responsible for api requests, File system (File IO) is fs module. so node.js single thread delegates those heavy work to the libuv
If you have too many function calls, It will use all of the cores. CPU cores do not actually speed up the processing function calls, they just allow for some amount of concurrency inside of the work that you are doing.

From here:
A single instance of Node.js runs in a single thread. To take
advantage of multi-core systems the user will sometimes want to launch
a cluster of Node.js processes to handle the load.
The cluster module allows easy creation of child processes that all
share server ports.
Multiple processes could be better than multithreading in some cases. Some people even think theads are evil. Maybe node.js is designed in such a way to take advantage of processes better than threads.

Node.js thread pool and core usage

I've read tons of articles and stackoverflow questions, and I saw a lot of information about thread pool, but no one talks about physical CPU core usage. I believe this question is not duplicated.
Given that I have a quad-core computer and libuv thread pool size of 4, will Node.js utilize all those 4 cores when processing lots of i/o requests(maybe more than thousands)?
I'm also curious that which i/o request uses thread pool. No one gives clear and full list of request. I know that Node.js event loop is single threaded but uses a thread pool to handle i/o such as accessing disk and db.

I'm also curious that which i/o request uses thread pool.
Disk I/O uses the thread pool.
Network I/O is async from the beginning and does not use threads.
With disk I/O, the individual disk I/O calls still present to Javascript as non-blocking and asynchronous even though they use threads in their native code implementation. When you exceed more disk I/O calls in process than the size of the thread pool, the disk I/O calls are queued and when one of the threads frees up, the next disk I/O call in the queue will run using that now available thread. Since the Javascript for the disk I/O is all non-blocking and assumes a completion callback will get called sometime in the future, the queuing of requests when the thread pool is all busy just means it will take longer to get to the later I/O requests, but otherwise the Javascript programming interface is not affected.
Given that I have a quad-core computer and libuv thread pool size of 4, will Node.js utilize all those 4 cores when processing lots of i/o requests(maybe more than thousands)?
This is not up to node.js and is hard to answer in the absolute for that reason. The first referenced article below says that on Linux, the I/O thread pool will use multiple cores and offers a small demo app that shows that.
This is up to the specific OS implementation and the thread scheduler that it uses. node.js just happily creates the threads and uses them and the OS then decides how to make use of the CPU given what it is being asked to do overall on the system. Since threads in the same process often have to communicate with one another in some way, using a separate CPU for different threads in the same process is a lot more complicated.
There are a couple node.js design patterns that are guaranteed to take advantage of multiple cores (in any modern OS)
Cluster your app and create as many clusters as you have processor cores. This also has the advantage that each cluster has its own I/O thread pool that can work independently and each can execute it's own Javascript independently. With only one node.js process and multiple cores, you never get more than one thread of Javascript execution (this is where node.js is referred to as single threaded - even though it does use threads in its library implementations). But, with clustering, you get independent Javascript execution for each clustered server process.
For individual tasks that might be CPU-intensive (for example, image processing), you can create a work queue and a pool of child worker processes that you hand work off to. This has some benefits in common with clustering, but it is more special purpose where you know exactly where the CPU bottleneck is and you want to attack it specifically.
Other related answers/articles:
how libuv threads in nodejs utilize multi core cpu
Node.js on multi-core machines
Taking Advantage of Multi-Processor Environments in node.js
When is the thread pool used?

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string