Node.js cluster performance improvement?

I was watching this video about Node.js clustering where you can spawn off several child processes. He said that the parent cluster takes care of the child cluster in a round-robin approach giving a chunk of time to each child process. Why would that be any different from running a single thread? Unless each child can handle itself, I do not see much benefit of doing so. However, in the end he pulls some benchmark that is really good, which makes me wonder why people even use a non-clustered app if having clusters improves performance by that much.

This statement: "the parent cluster takes care of the child cluster in a round-robin approach giving a chunk of time to each child process" is not entirely correct.
Each cluster process is a separate OS process. The operating system takes care of sharing the CPU among processes, not the parent process. The parent process takes each incoming http request and splits them among the child processes, but that is the extent of the parent process' involvement.
If there is only one CPU core in the server hardware (not usually the case these days unless you're on a hosting plan that only gives you access to one CPU core), then that single CPU core is shared among all of your processes/threads that are active and wish to be running.
If there is more than one CPU core, then individual processes can be assigned to separate cores and your processes can be running truly in parallel. This is where the biggest advantage comes from with nodejs clustering. Then, you get multiple requests truly running in parallel rather than in a single thread as it would be with a single nodejs process.
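For illustration, here is a minimal sketch of that setup (the port 8000 and the trivial request handler are placeholders, not something from the question or the video):

    // Minimal cluster sketch: one worker per CPU core, all serving the same port.
    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      // The parent only forks workers and hands incoming connections to them.
      for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
      cluster.on('exit', (worker) => {
        console.log(`worker ${worker.process.pid} died, forking a replacement`);
        cluster.fork();
      });
    } else {
      // Each worker is a separate OS process with its own event loop,
      // so requests can run truly in parallel on separate cores.
      http.createServer((req, res) => {
        res.end(`handled by pid ${process.pid}\n`);
      }).listen(8000);
    }

Hitting the port repeatedly should show responses from different pids, which is the request sharing described above.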
However, in the end he pulls some benchmark that is really good, which makes me wonder why people even use a non-clustered app if having clusters improves performance by that much.
Clustering adds some level of complication. For example, if you have any server-side state that is kept in memory within your server process and that all incoming requests need access to, then that won't work with clustered processes, because the clustered processes don't have access to the same memory (they each have their own). In that case, you typically have to either remove any server-side state or move all the server-side state to its own process (usually a database) that all the clustered processes can then access. This is very doable, but adds an extra level of complication and has its own performance implications.
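As a hedged illustration of that pitfall (the hit counter here is hypothetical, not something from the answer): with clustering, each worker keeps its own copy of the variable, so clients see different counts depending on which worker received their request.

    const http = require('http');

    // Per-process state: under clustering, every worker has its own hitCount,
    // so there is no single shared total across the whole server.
    let hitCount = 0;

    http.createServer((req, res) => {
      hitCount++;
      res.end(`hits seen by pid ${process.pid}: ${hitCount}\n`);
    }).listen(8000);

The usual fix is to move that state into a single external store (a database, Redis, etc.) that all the workers query.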
In addition, clustering takes more memory on your server. It may also require more monitoring and logging to make sure all the processes stay healthy.

Related

When is it better to use clustering or worker_threads?

I have been reading about multi-processing in NodeJS to understand it as well as possible and to get good performance from my code in heavy-load environments.
Although I understand the basic purpose and concept of the different ways to take advantage of the resources to handle the load, some questions arise as I go deeper, and I can't seem to find the particular answers in the documentation.
NodeJS in a single thread:
NodeJS runs a single thread that we call the event loop, although in the background the OS and libuv handle the default worker pool for asynchronous I/O tasks.
We are supposed to use a single core for the event loop, although the workers might be using different cores. I guess they are ultimately assigned by the OS scheduler.
NodeJS as multi-threaded:
When using "worker_threads" library, in the same single process, different instances of v8/Libuv are running for each thread. Thus, they share the same context and communicate among threads with "message port" and the rest of the API.
Each worker thread runs its Event loop thread. Threads are supposed to be wisely balanced among CPU cores, improving the performance. I guess they are sorted in the end by OS scheduler.
Question 1: When a worker thread uses the default I/O worker pool, is it somehow sharing the very same threads as the other workers' pools, or does each worker have its own default worker pool?
NodeJS in multi-processing:
When using "cluster" library, we are splitting the work among different processes. Each process is set on a different core to balance the load... well, the main event loop is what in the end is set in a different core, so it doesn't share core with another heavy event loop. Sounds smart to do it that way.
Here I would communicate with some IPC tactic.
Question 2: And the default worker pool for this NodeJS process? Where are those threads? Balanced among the rest of the cores, as expected in the first case? Then they might be on the same cores as the other worker pools of the cluster, I guess. Wouldn't it be better to say that we are balancing main threads (event loops) rather than "the process"?
All this being said, the main question:
Question 3: Is it better to use clustering or worker_threads? If both are used in the same code, how can both libraries agree on the best performance? Or can they simply get in conflict? Or, in the end, is it the OS that takes control?
Each worker thread has its own main loop (libuv etc). So does each cloned Node.js process when you use clustering.
Clustering is a way to load-balance incoming requests to your Node.js server over several copies of that server.
Worker threads are a way for a single Node.js process to offload long-running functions to a separate thread, to avoid blocking its own main loop.
Which is better? It depends on the problem you're solving. Worker threads are for long-running functions. Clustering makes a server able to handle more requests, by handling them in parallel. You can use both if you need to: have each Node.js cluster process use a worker thread for long-running functions.
As a first approximation for your decision-making: only use worker threads when you know you have long-running functions.
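A rough sketch of that combined approach, under some assumptions: the file names, the port, and the deliberately slow fib() function are all made up for illustration, and in a real server you would likely reuse a pool of threads rather than spawn one per request.

    // --- long-task.worker.js (hypothetical) ---
    const { parentPort, workerData } = require('worker_threads');

    function fib(n) {               // stand-in for any long-running, CPU-bound function
      return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    parentPort.postMessage(fib(workerData.n));

    // --- server.js (hypothetical) ---
    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');
    const { Worker } = require('worker_threads');

    if (cluster.isMaster) {
      os.cpus().forEach(() => cluster.fork());   // one clustered server per core
    } else {
      http.createServer((req, res) => {
        // Offload the CPU-heavy part to a worker thread so this worker's
        // event loop stays free to accept other requests in the meantime.
        const worker = new Worker('./long-task.worker.js', { workerData: { n: 40 } });
        worker.once('message', (result) => res.end(`result: ${result}\n`));
        worker.once('error', (err) => { res.statusCode = 500; res.end(err.message); });
      }).listen(8000);
    }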
The node processes (whether from clustering or worker threads) don't get tied to specific cores (or Intel processor threads) on the host machine; the host OS's scheduler assigns cores as needed and tries to minimize context-switch overhead when assigning cores to runnable processes. If you have too many active Javascript instances (cluster instances + worker threads), the host OS will give them timeslices according to its scheduling algorithms. Other than avoiding too many Javascript instances, there's very little point in trying to second-guess the OS scheduler.
Edit: Each Node.js instance, along with any worker threads it creates, uses a single libuv thread pool; a main Node.js process shares one libuv thread pool with all its worker threads. If your Node.js program uses many worker threads, you may, or may not, need to set the UV_THREADPOOL_SIZE environment variable to a value greater than the default of 4.
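For example (the value 16 is an arbitrary example, not a recommendation), the size has to be set before anything schedules work on the libuv pool:

    // From the shell:  UV_THREADPOOL_SIZE=16 node server.js
    // Or at the very top of the entry script, before any fs/dns/crypto/zlib
    // work touches the pool (libuv creates the pool lazily on first use):
    process.env.UV_THREADPOOL_SIZE = 16;   // 16 is just an example value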
Node.js's cluster functionality uses the underlying OS's fork/exec scheme to create a new OS process for each cluster instance. So, each cluster instance has its own libuv pool.
If you're running stuff at scale, let's say with more than ten host machines running your Node.js server, then it is worth spending time optimizing your Javascript instances.
Don't forget nginx if you use it as a reverse proxy to handle your https work. It needs some processor time too, but it uses fine-grained multithreading, so you won't have to worry about it unless you have huge traffic.

How to apply clustering/spawning child process techniques to a Node.js application having both IO-bound and CPU-bound tasks?

I'm working on an IoT project where the Node.js application performs the following tasks:
1. Reading a stream of messages using an asynchronous messaging library (IO bound)
2. Sending the messages to a web service where machine learning happens based on the messages sent by the Node.js application (IO bound, as only an API call is involved)
3. Receiving the pattern generated as a result of the machine learning from the web service (using a REST API)
4. Comparing the pattern against the real-time streaming messages (CPU intensive, as complex pattern-matching algorithms are involved)
5. Logging stack traces (IO bound)
A node.js application is going to be developed with these functionalities as separate tasks running under a single thread by default. Given that spawning child processes is useful only for CPU-intensive tasks, how do we do clustering for a node.js process doing both IO-bound and CPU-bound tasks? Do we need to apply clustering only partially to this node.js application?
Can anyone please suggest an effective architecture for this node.js application?
If you have ANY CPU-intensive tasks, then use clustering for all requests.
The fact that a clustered process is also doing some I/O intensive stuff won't hurt you, but you will want the clustered process for the CPU intensive stuff. So, just make your server clustered and let each cluster handle the whole load of a request (both the I/O and the CPU stuff).
In a nutshell, CPU-intensive work is the primary driver for clustering. It doesn't hurt anything if the clustered processes are also doing non-blocking I/O; in fact, clustering up to the number of CPUs available can even help I/O-bound workloads somewhat in high-load situations (though not nearly as much as it helps CPU-intensive ones).
An alternative, though it may be a more complicated implementation, is to use child processes or the new Worker threads only for the CPU-intensive parts of your request handling. In that case, you'd create some sort of work queue and a set of child processes or Worker threads to perform the operations in the queue, and your master process would distribute tasks from the queue to them. Using this scheme, you can decide exactly which code is executed via the work queue and which code stays in the main process, though you now have to coordinate between the two using some sort of interprocess communication.
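A very rough sketch of that queue-plus-workers shape, using Worker threads (child_process.fork with process.send would follow the same pattern); the file names and the trivial matchPattern() placeholder are assumptions for illustration:

    // --- match.worker.js (hypothetical) ---
    const { parentPort } = require('worker_threads');

    function matchPattern(message, pattern) {   // placeholder for the real CPU-heavy algorithm
      return message.includes(pattern);
    }

    parentPort.on('message', ({ message, pattern }) => {
      parentPort.postMessage(matchPattern(message, pattern));
    });

    // --- main.js (hypothetical) ---
    // I/O (messaging, REST calls, logging) stays on the main event loop;
    // only the CPU-heavy matching goes through the queue to the thread pool.
    const { Worker } = require('worker_threads');
    const os = require('os');

    const queue = [];      // pending { message, pattern } jobs
    const idle = [];       // workers currently waiting for a job

    for (let i = 0; i < os.cpus().length; i++) {
      const w = new Worker('./match.worker.js');
      w.on('message', (result) => {
        console.log('match result:', result);    // stand-in for real result handling
        idle.push(w);
        drain();
      });
      idle.push(w);
    }

    function drain() {
      while (queue.length && idle.length) {
        idle.pop().postMessage(queue.shift());
      }
    }

    // Call this from the streaming side whenever a message arrives.
    function enqueueMatch(message, pattern) {
      queue.push({ message, pattern });
      drain();
    }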

Node: one core, many processes

I have looked up online and all I seem to find are answers related to the question of "how does Node benefit from running in a multi core cpu?"
But: if you have a machine with just one core, you can only be running one process at any given time (I am taking task scheduling into account here), and node uses a single-threaded model.
My question: is there any scenario in which it makes sense to run multiple node processes in one core? And if the process is a web server that listens on a port, how can this ever work given that only one process can listen?
My question: is there any scenario in which it makes sense to run multiple node processes in one core?
Yes, there are some scenarios. See details below.
And if the process is a web server that listens on a port, how can this ever work given that only one process can listen?
The node.js clustering module creates a scenario where there is one listener on the desired port, but incoming requests are shared among all the clustered processes. More details to follow...
You can use the clustering module to run more than one process, all configured to handle incoming requests on the same port. If you want to know how incoming requests are shared among the different clustered processes, you can read the explanation in this blog post. In a nutshell, there ends up being only one listener on the desired port, and the incoming requests are shared among the various clustered processes.
As to whether you could benefit from more processes than you have cores, the answer is that it depends on what type of benefit you are looking for. If you have a properly written server with all async I/O, then adding more processes than cores will likely not improve your overall throughput (as measured by requests per second that your server could process).
But, if you have any CPU-heavy processing in your requests, having a few more processes may provide a bit fairer scheduling between simultaneous requests because the OS will "share" the CPU among each of your processes. This will likely slightly decrease overall throughput (because of the added overhead of task switching the CPU between processes), but it may make request processing more even when there are multiple requests to be processed together.
If your requests don't have much CPU-heavy processing and are really just waiting for I/O most of the time, then there's probably no benefit to adding more processes than cores.
So, it really depends on what you want to optimize for and what your situation is.

Node.js child process limits

I know that node is a single-threaded system, and I was wondering whether a child process uses its own thread or its parent's. Say, for example, I have an AMD E-350 CPU with two threads. If I ran a node server that spawned ten child instances which all work continuously, would it allow it, or would it fail because the hardware itself is not sufficient?
I can say from own experience that I successfully spawned 150 child processes inside an Amazon t2.micro with just one core.
The reason? I was DoS-ing myself to test my core server's limits.
The attack stayed alive for 8 hours, until I gave up, but it could've been working for much longer.
My code was simply running an HTTP client pool and as soon as one request was done, another one spawned. This doesn't need a lot of CPU. It needs lots of network, though.
Most of the time, the processes were just waiting for requests to finish.
However, in a high-concurrency application, the performance will be awful if you share memory between so many processes.

How does Cluster keep up with Node's single-thread concept?

When you fork, or start multiple workers using something like Cluster:
Are multiple threads or instances of the Node process being created? Does this break Node's single-thread concept?
How are the requests handled between workers? Does Cluster provide some intelligent mechanism to load-balance all requests across multiple workers?
Cluster uses fork, and yes, it gets balanced automatically:
The worker processes are spawned using the child_process.fork method, so that they can communicate with the parent via IPC and pass server handles back and forth.
[...]
When multiple processes are all accept()ing on the same underlying resource, the operating system load-balances across them very efficiently. There is no routing logic in Node.js, or in your program, and no shared state between the workers. Therefore, it is important to design your program such that it does not rely too heavily on in-memory data objects for things like sessions and login.
You might think that this breaks node.js's single-thread concept if you count a new node.js instance as another thread; however, keep in mind that all callbacks for a given request are going to be handled by the same node.js instance that accepted the original request. There are no race conditions and no shared data, only fairly safe interprocess communication.
See the Cluster documentation for more information.
Cluster was developed to compensate for node.js's single-thread architecture. Modern processors have multiple cores, and a single-threaded process cannot take advantage of all the available cores. It does deviate from the single-thread architecture, but sticking to it was never the plan; the main concept was asynchronous, event-driven execution.
Cluster uses fork to create processes. A forked process really is its own process with its own address space - there is nothing that the child can do (normally) to affect its parent's or siblings' address space (unlike a thread). In addition to having all the methods in a normal ChildProcess instance, the returned object has a communication channel built in. All forked processes can communicate using this channel.
Notice the subtle difference here: it is not multi-threaded; it just forks to create new, independent processes. See Threads vs Processes in Linux for a comparison. Each worker still assumes a single-threaded architecture, as before, so it does not break node's single-thread concept.
How the load is balanced depends on your code itself (since each process is independent) and on the OS. The load is balanced among all the forked processes and the original process alike by the OS.
But if you wish to do it differently, that is also possible: you can use the master process differently from the workers, or have each worker specialize in different tasks (compressing, ffmpeg, etc.), as sketched below.
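For instance, here is a small sketch of that kind of specialization, with made-up roles ('compress' and 'transcode') and placeholder work; the master routes tasks over the IPC channel that fork() sets up:

    const cluster = require('cluster');

    if (cluster.isMaster) {
      // Two specialized workers instead of identical clones of the server.
      const compressor = cluster.fork({ ROLE: 'compress' });
      const transcoder = cluster.fork({ ROLE: 'transcode' });

      // Distribute different kinds of work over the built-in IPC channel.
      compressor.send({ file: 'logs.txt' });
      transcoder.send({ file: 'clip.mp4' });

      cluster.on('message', (worker, msg) => {
        console.log(`worker ${worker.process.pid} finished:`, msg);
      });
    } else {
      process.on('message', (task) => {
        // ROLE (set by the master at fork time) decides what this worker does.
        console.log(`${process.env.ROLE} worker handling ${task.file}`);
        process.send({ done: task.file });    // placeholder for the real work
      });
    }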

Resources