How to make use of multiple processors with Node.js? [duplicate]

Node.js looks interesting, but I must be missing something: isn't Node.js tuned to run only on a single process and thread?
Then how does it scale for multi-core CPUs and multi-CPU servers? After all, it is great to make a single-threaded server as fast as possible, but for high loads I would want to use several CPUs. And the same goes for making applications faster: these days the way to do it seems to be using multiple CPUs and parallelizing the tasks.
How does Node.js fit into this picture? Is the idea to somehow distribute multiple instances, or what?

[This post is up-to-date as of 2012-09-02 (newer than above).]
Node.js absolutely does scale on multi-core machines.
Yes, Node.js is one-thread-per-process. This is a very deliberate design decision and eliminates the need to deal with locking semantics. If you don't agree with this, you probably don't yet realize just how insanely hard it is to debug multi-threaded code. For a deeper explanation of the Node.js process model and why it works this way (and why it will NEVER support multiple threads), read my other post.
So how do I take advantage of my 16 core box?
Two ways:
For big, heavy compute tasks like image encoding, Node.js can fire up child processes or send messages to additional worker processes. In this design, you'd have one thread managing the flow of events and N processes doing heavy compute tasks, chewing up the other 15 CPUs (see the sketch after this list).
For scaling throughput on a webservice, you should run multiple Node.js servers on one box, one per core and split request traffic between them. This provides excellent CPU-affinity and will scale throughput nearly linearly with core count.
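A minimal sketch of that first approach, using the built-in child_process module over IPC; worker.js and doHeavyCompute are hypothetical stand-ins for your actual compute script:

// main.js -- the event-loop process hands heavy work to a child over IPC.
var fork = require('child_process').fork;

var worker = fork('./worker.js'); // hypothetical compute script, see below

worker.on('message', function(result) {
  console.log('heavy task finished:', result);
});

// Hand the job off; the event loop stays free to serve other requests.
worker.send({ file: 'photo.raw' });

// worker.js -- receives a job over IPC, computes, and replies.
function doHeavyCompute(job) {
  // Trivial stand-in for e.g. image encoding.
  return 'encoded ' + job.file;
}

process.on('message', function(job) {
  process.send(doHeavyCompute(job));
});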
Scaling throughput on a webservice
Since v0.6.x Node.js has included the cluster module straight out of the box, which makes it easy to set up multiple node workers that can listen on a single port. Note that this is NOT the same as the older learnboost "cluster" module available through npm.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // Each worker shares the same listening port.
  http.Server(function(req, res) {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);
}
Workers will compete to accept new connections, and the least loaded process is most likely to win. It works pretty well and can scale up throughput quite well on a multi-core box.
If you have enough load to care about multiple cores, then you are going to want to do a few more things too:
Run your Node.js service behind a web-proxy like Nginx or Apache - something that can do connection throttling (unless you want overload conditions to bring the box down completely), rewrite URLs, serve static content, and proxy other sub-services.
Periodically recycle your worker processes (see the sketch after this list). For a long-running process, even a small memory leak will eventually add up.
Set up log collection / monitoring.
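A rough sketch of the recycling idea, assuming the cluster master shown earlier is already forking workers; the one-hour interval is an arbitrary choice:

// In the master process: replace one worker every hour to flush slow leaks.
setInterval(function() {
  var ids = Object.keys(cluster.workers);
  if (ids.length === 0) return;
  var old = cluster.workers[ids[0]];
  cluster.fork();    // start the replacement before taking one down
  old.disconnect();  // stop accepting new connections, exit when idle
}, 60 * 60 * 1000);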
PS: There's a discussion between Aaron and Christopher in the comments of another post (as of this writing, it's the top post). A few comments on that:
A shared socket model is very convenient for allowing multiple processes to listen on a single port and compete to accept new connections. Conceptually, you could think of preforked Apache doing this with the significant caveat that each process will only accept a single connection and then die. The efficiency loss for Apache is in the overhead of forking new processes and has nothing to do with the socket operations.
For Node.js, having N workers compete on a single socket is an extremely reasonable solution. The alternative is to set up an on-box front-end like Nginx and have that proxy traffic to the individual workers, alternating between workers for assigning new connections. The two solutions have very similar performance characteristics. And since, as I mentioned above, you will likely want to have Nginx (or an alternative) fronting your node service anyways, the choice here is really between:
Shared Ports: nginx (port 80) --> Node_workers x N (sharing port 3000 w/ Cluster)
vs
Individual Ports: nginx (port 80) --> {Node_worker (port 3000), Node_worker (port 3001), Node_worker (port 3002), Node_worker (port 3003) ...}
There are arguably some benefits to the individual ports setup (potential to have less coupling between processes, have more sophisticated load-balancing decisions, etc.), but it is definitely more work to set up and the built-in cluster module is a low-complexity alternative that works for most people.
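For comparison, a minimal sketch of the individual-ports setup; here the cluster module is used only as a convenient launcher, and the base port 3000 is an arbitrary choice:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    // Give each worker its own port via the environment.
    cluster.fork({ PORT: 3000 + i });
  }
} else {
  var port = Number(process.env.PORT);
  http.createServer(function(req, res) {
    res.writeHead(200);
    res.end('worker on port ' + port + '\n');
  }).listen(port);
}

An on-box nginx would then be configured to round-robin between ports 3000 through 3000 + N - 1.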

One method would be to run multiple instances of node.js on the server and then put a load balancer (preferably a non-blocking one like nginx) in front of them.

Ryan Dahl answers this question in the tech talk he gave at Google last summer. To paraphrase, "just run multiple node processes and use something sensible to allow them to communicate. e.g. sendmsg()-style IPC or traditional RPC".
If you want to get your hands dirty right away, check out the spark2 module. It makes spawning multiple node processes trivially easy. It handles setting up port sharing, so they can each accept connections to the same port, and also auto-respawning if you want to make sure a process is restarted if/when it dies.
UPDATE - 10/11/11: Consensus in the node community seems to be that Cluster is now the preferred module for managing multiple node instances per machine. Forever is also worth a look.

You can use the cluster module. Check this example:
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Workers can share any TCP connection.
  // In this case it is an HTTP server.
  http.createServer(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}

Node.js supports clustering to take full advantage of your CPU. If you are not running it with a cluster, then you are probably wasting your hardware capabilities.
Clustering in Node.js allows you to create separate processes which can share the same server port. For example, if we run one HTTP server on port 3000, it is one server running on a single thread on a single core of the processor.
The code shown below allows you to cluster your application. It is adapted from the official Node.js documentation.
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  Object.keys(cluster.workers).forEach(function(id) {
    console.log("I am running with ID : " + cluster.workers[id].process.pid);
  });

  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Do further processing (e.g. start your HTTP server here).
}
Check this article for the full tutorial.

As mentioned above, Cluster will scale and load-balance your app across all cores.
Adding something like
cluster.on('exit', function () {
  cluster.fork();
});
will restart any failing workers.
These days, a lot of people also prefer PM2, which handles the clustering for you and also provides some cool monitoring features.
Then, add Nginx or HAProxy in front of several machines running with clustering and you have multiple levels of failover and a much higher load capacity.

The cluster module allows you to utilise all cores of your machine. In fact, you can take advantage of this in just 2 commands and without touching your code, using the very popular process manager pm2.
npm i -g pm2
pm2 start app.js -i max

Multi-node harnesses all the cores that you may have.
Have a look at http://github.com/kriszyp/multi-node.
For simpler needs, you can start up multiple copies of node on different port numbers and put a load balancer in front of them.

Future versions of node will allow you to fork a process and pass messages to it, and Ryan has stated he wants to find some way to also share file handles, so it won't be a straightforward Web Worker implementation.
At this time there is not an easy solution for this, but it's still very early, and node is one of the fastest-moving open source projects I've ever seen, so expect something awesome in the near future.

Spark2 is based on Spark, which is no longer maintained. Cluster is its successor, and it has some cool features, like spawning one worker process per CPU core and respawning dead workers.

You may run your node.js application on multiple cores by using the cluster module in combination with the os module, which can be used to detect how many CPUs you have.
For example, let's imagine that you have a server module that runs a simple HTTP server on the backend and you want to run it across several CPUs:
// Dependencies.
const server = require('./lib/server'); // This is our custom server module.
const cluster = require('cluster');
const os = require('os');

// If we're on the master process, start the forks.
if (cluster.isMaster) {
  // Fork one process per CPU.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // If we're not on the master process, start the server.
  server.init();
}

I'm using Node worker to run processes in a simple way from my main process. Seems to be working great while we wait for the official way to come around.

The new kid on the block here is LearnBoost's "Up".
It provides "Zero-downtime reloads" and additionally creates multiple workers (by default the number of CPUs, but it is configurable) to provide the best of all worlds.
It is new, but seems to be pretty stable, and I'm using it happily in one of my current projects.

💢 IMPORTANT DIFFERENCE - ROLLING RESTART
I have to add an important difference between using node's built-in cluster mode and a process manager like PM2's cluster mode.
PM2 allows zero-downtime reloads while you are running.
pm2 start app.js -i 2 --wait-ready
In your code, add the following:
process.send('ready');
When you call pm2 reload app after code updates, PM2 will reload the first instance of the app, wait for the 'ready' call, then move on to reload the next instance, ensuring you always have an app active to respond to requests.
If you use nodejs' cluster instead, there will be downtime when you restart, since you are restarting all the workers together and there is only one instance of the app, so you have to wait for the server to be ready again.
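A minimal sketch of where that ready signal belongs, assuming a plain http server started with the pm2 start app.js -i 2 --wait-ready command from above:

// app.js
var http = require('http');

var server = http.createServer(function(req, res) {
  res.end('ok\n');
});

server.listen(3000, function() {
  // Tell PM2 this instance is ready to receive traffic; with --wait-ready,
  // PM2 waits for this message before reloading the next instance.
  if (process.send) process.send('ready');
});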

I was searching for how to clusterize an app across all available CPU cores and found myself here. I found the keyword in the pm2 examples command:
pm2 examples
This is what I found:
Clusterize an app to all CPU cores available:
$ pm2 start -i max
If you need to install pm2, use one of these commands:
npm install -g pm2
yarn global add pm2

It's also possible to design the web service as several standalone servers that listen on unix sockets, so that you can push functions like data processing into separate processes.
This is similar to most scripting/database web server architectures, where a CGI process handles business logic and then pushes and pulls the data via a unix socket to a database.
The difference is that the data processing is written as a node webserver listening on a port.
It's more complex, but ultimately it's where multi-core development has to go: a multi-process architecture using multiple components for each web request.
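A minimal sketch of one such standalone data-processing server, assuming a hypothetical socket path; the front-end process would forward requests to this socket instead of doing the work itself:

// data-processor.js -- a standalone node webserver on a unix socket.
var http = require('http');

var SOCKET_PATH = '/tmp/data-processor.sock'; // hypothetical path
// In real use, unlink a stale socket file before listening.

http.createServer(function(req, res) {
  // Stand-in for the real data-processing logic.
  res.end('processed\n');
}).listen(SOCKET_PATH);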

It's possible to scale NodeJS out to multiple boxes using a pure TCP load balancer (HAProxy) in front of multiple boxes running one NodeJS process each.
If you then have some common knowledge to share between all instances, you could use a central Redis store or similar, which can then be accessed from all process instances (e.g. from all boxes).
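A tiny sketch of that shared-state idea, assuming the redis npm package (v3-style callback API) and a Redis host reachable from every box; the host name is hypothetical:

var redis = require('redis');
var client = redis.createClient(6379, 'redis.internal'); // hypothetical host

// Any instance on any box can write shared state...
client.set('active:sessions', '42');

// ...and any other instance can read the same value.
client.get('active:sessions', function(err, value) {
  console.log('shared session count:', value);
});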

Related

Why does my Node express pm2 primary process in cluster mode never handle incoming requests?

I am running a node express app in pm2 cluster mode. Everything is working fine; however, I have noticed that incoming connections to my express routes only ever hit the forked worker app instances and never the primary (master) process.
In the pm2 documentation (https://pm2.keymetrics.io/docs/usage/cluster-mode/) on cluster mode they say
Under the hood, this uses the Node.js cluster module
In the "how it works" section on the Node.js website (https://nodejs.org/api/cluster.html#cluster_how_it_works) it says
The cluster module supports two methods of distributing incoming
connections. The first one (and the default one on all platforms
except Windows) is the round-robin approach, where the primary process
listens on a port, accepts new connections and distributes them across
the workers in a round-robin fashion, with some built-in smarts to
avoid overloading a worker process.
Does this mean the primary process will never actually handle any incoming requests? That can't be!! That would make the entire primary process a glorified load balancer and essentially a dead weight with a bunch of code and a full CPU never really getting used.
If the above IS accurate does that mean that the primary process is a bottleneck for all incoming express connections?
What am I understanding incorrectly, or doing wrong, such that the primary (master) process never actually handles any requests?
After I completely removed and reinstalled pm2 and then re-added all my node apps in cluster mode via the CLI, the first instance (app 0) started receiving messages. I didn't change any code, so I'm not exactly sure what the issue was. Thank you to #JonePolvora for your time with comments that led me to troubleshoot more.

nodejs cluster distributing connection

In nodejs api doc, it says
The cluster module supports two methods of distributing incoming
connections.
The first one (and the default one on all platforms except Windows),
is the round-robin approach, where the master process listens on a
port, accepts new connections and distributes them across the workers
in a round-robin fashion, with some built-in smarts to avoid
overloading a worker process.
The second approach is where the master process creates the listen
socket and sends it to interested workers. The workers then accept
incoming connections directly.
The second approach should, in theory, give the best performance. In
practice however, distribution tends to be very unbalanced due to
operating system scheduler vagaries. Loads have been observed where
over 70% of all connections ended up in just two processes, out of a
total of eight.
I know PM2 is using the first one, but why doesn't it use the second? Just because of the unbalanced distribution? Thanks.
The second approach may add CPU load when every child process is trying to 'grab' the socket the master sent.
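For reference, a short sketch of how to choose between the two approaches yourself with the built-in cluster module; the policy must be set before any workers are forked:

var cluster = require('cluster');

// SCHED_RR: the master accepts connections and distributes them
// round-robin (the first approach, default everywhere except Windows).
// SCHED_NONE: workers accept directly on the shared socket (the second).
cluster.schedulingPolicy = cluster.SCHED_RR; // or cluster.SCHED_NONE

// The same choice can also be made via the NODE_CLUSTER_SCHED_POLICY
// environment variable, set to 'rr' or 'none'.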

Node js avoid pyramid of doom and memory increases at the same time

I am writing a socket.io based server and I'm trying to avoid the pyramid of doom and to keep the memory low.
I wrote this client - http://jsfiddle.net/QUDXU/1/ - which I run with node client-cluster 1000. So 1000 connections that are making continuous requests.
For the server side I tried 3 different solutions. The results in terms of RAM used by the server, after I let everything run for an hour, are:
Simple callbacks - http://jsfiddle.net/DcWmJ/ - 112MB
Q module - http://jsfiddle.net/hhsja/1/ - 850MB and increasing
Async module - http://jsfiddle.net/SgemT/ - 1.2GB and increasing
The server and clients are on different machines. (Softlayer cloud instances). Node 0.10.12 and Socket.io 0.9.16
Why is this happening? How can I keep the memory low and use some kind of library which allows to keep the code readable?
Option 1. You can use the cluster module and gracefully kill your workers from time to time (make sure you disconnect() first). You can check process.memoryUsage().rss > 130000000 in the master and kill the workers when they exceed 130MB, for example :) (a sketch follows after option 3 below)
Option 2. NodeJS has the habit of using memory and rarely doing rigorous cleanups. As V8 reaches the maximum memory limit, GC calls are more aggressive. So you could lower the maximum heap size a node process can take up by running node --max-old-space-size <megabytes>. I do this when running node on embedded devices (often with less than 64 MB of RAM available).
Option 3. If you really want to keep the memory low, use weak references where possible (anywhere except in long-running calls): https://github.com/TooTallNate/node-weak . This way, the objects will get garbage collected sooner. Extensive tests to make sure everything works are needed, though. Good luck if you use this one :)
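A minimal sketch of option 1, with the 130MB threshold from above; each worker reports its own RSS to the master over IPC, and the master replaces any worker that crosses the line:

var cluster = require('cluster');

function forkWorker() {
  var worker = cluster.fork();
  worker.on('message', function(msg) {
    if (msg.rss > 130000000) { // ~130MB, as suggested above
      worker.disconnect();     // graceful: let in-flight work finish
      forkWorker();            // start a replacement
    }
  });
}

if (cluster.isMaster) {
  forkWorker();
} else {
  // Worker: report memory usage to the master every 30 seconds.
  setInterval(function() {
    process.send({ rss: process.memoryUsage().rss });
  }, 30000);
  // ... the actual socket.io server would be started here ...
}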
It seems like the problem was on the client script, not on the server one. I ran 1000 processes, each of them emitting messages to the server every second. I think the server was getting very busy resolving all of those requests and thus using all of that memory. I rewrote the client side, spawning a number of processes proportional to the number of processors, each of them connecting multiple times like this:
client = io.connect(selectedEnvironment, { 'force new connection': true, 'reconnect': false });
Notice the 'force new connection' flag, which allows connecting multiple clients using the same instance of socket.io-client.
The part that actually solved my problem was how the requests were made: a client would make another request one second after receiving the acknowledgement of the previous request, not every second.
Connecting 1000 clients makes my server use ~100MB RSS. I also used async on the server script, which seems very elegant and easier to understand than Q.
The bad part is that I've been running the server for about 2-3 days and the memory has risen to 250MB RSS, and I don't know why.

What's the limit of spawning child_processes?

I have to serve a calculation via an algorithm, and I've been advised to use a child process per each opened socket. What I am about to do is something like this:
var spawn = require('child_process').spawn;
var child = spawn('node', ['algorithem.js']);
I know how to send arguments to the algorithm process and how to receive results.
What I am concerned about is how many sockets (each socket will spawn a process) I can have.
How can I resolve this with my cloud hosting provider, so that my app gets auto-scaled?
What's the recommended node js cloud hosting provider?
Finally, is this a good approach in using child processes?
Yes, this is a fair approach when you have to do some heavy processing in node. However, starting a new process introduces some overhead, so be aware. The number of sockets (file descriptors) you can open is limited by your operating system. On Linux, the limits can be seen using, for example, the ulimit utility.
One alternative approach, that would remove the number of sockets/processes worry, is to run a separate algorithm/computation-server. This server could spawn N worker threads and would listen on a socket. When a computation request is received, this can for example be queued and processed by the first available thread. An advantage of this approach is that your computation server can run on any machine, freeing up resources for your node instance.
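A minimal sketch of such a computation server, reusing algorithem.js from the question as the worker script; the port and pool size are arbitrary choices, and for simplicity it assumes one request per 'data' event:

// computation-server.js -- a fixed pool of child processes and a FIFO queue.
var net = require('net');
var fork = require('child_process').fork;

var POOL_SIZE = require('os').cpus().length;
var idle = [];   // children waiting for work
var queue = [];  // jobs waiting for a child

for (var i = 0; i < POOL_SIZE; i++) {
  idle.push(fork('./algorithem.js')); // the worker script from the question
}

function runJob(child, job) {
  child.once('message', function(result) {
    job.conn.end(JSON.stringify(result)); // reply to the caller
    idle.push(child);                     // return the child to the pool
    dispatch();
  });
  child.send(job.request);
}

function dispatch() {
  while (idle.length > 0 && queue.length > 0) {
    runJob(idle.pop(), queue.shift());
  }
}

net.createServer(function(conn) {
  conn.on('data', function(data) {
    queue.push({ conn: conn, request: data.toString() });
    dispatch();
  });
}).listen(9000); // arbitrary port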

How does the cluster module work in Node.js?

Can someone explain in detail how the core cluster module works in Node.js?
How are the workers able to listen on a single port?
As far as I know, the master process does the listening, but how can it know which ports to listen on, since workers are started after the master process? Do they somehow communicate that back to the master by using the child_process.fork communication channel? And if so, how is the incoming connection to the port passed from the master to the worker?
Also, I'm wondering what logic is used to determine which worker an incoming connection is passed to.
I know this is an old question, but this is now explained at nodejs.org here:
The worker processes are spawned using the child_process.fork method,
so that they can communicate with the parent via IPC and pass server
handles back and forth.
When you call server.listen(...) in a worker, it serializes the
arguments and passes the request to the master process. If the master
process already has a listening server matching the worker's
requirements, then it passes the handle to the worker. If it does not
already have a listening server matching that requirement, then it
will create one, and pass the handle to the worker.
This causes potentially surprising behavior in three edge cases:
server.listen({fd: 7}) -
Because the message is passed to the master,
file descriptor 7 in the parent will be listened on, and the handle
passed to the worker, rather than listening to the worker's idea of
what the number 7 file descriptor references.
server.listen(handle) -
Listening on handles explicitly will cause the
worker to use the supplied handle, rather than talk to the master
process. If the worker already has the handle, then it's presumed that
you know what you are doing.
server.listen(0) -
Normally, this will cause servers to listen on a
random port. However, in a cluster, each worker will receive the same
"random" port each time they do listen(0). In essence, the port is
random the first time, but predictable thereafter. If you want to
listen on a unique port, generate a port number based on the cluster
worker ID.
When multiple processes are all accept()ing on the same underlying
resource, the operating system load-balances across them very
efficiently. There is no routing logic in Node.js, or in your program,
and no shared state between the workers. Therefore, it is important to
design your program such that it does not rely too heavily on
in-memory data objects for things like sessions and login.
Because workers are all separate processes, they can be killed or
re-spawned depending on your program's needs, without affecting other
workers. As long as there are some workers still alive, the server
will continue to accept connections. Node does not automatically
manage the number of workers for you, however. It is your
responsibility to manage the worker pool for your application's needs.
NodeJS uses a round-robin decision to load-balance between the child processes. It gives incoming connections to an idle process, based on the RR algorithm.
The children and the parent do not actually share anything; the whole script is executed from beginning to end in each process. That is the main difference from a normal C fork: a traditional C forked child continues executing from the instruction where it left off, not from the beginning as in NodeJS. So if you want to share anything, you need to connect to a cache like Memcached or Redis.
So the code below produces 6 6 6 (no evil means) on the console.
var cluster = require("cluster");
var a = 5;
a++;
console.log(a);
if ( cluster.isMaster){
worker = cluster.fork();
worker = cluster.fork();
}
Here is a blog post that explains this
As an update to #OpenUserX03's answer: nodejs no longer uses the operating system's load balancing but a built-in one. From this post:
To fix that Node v0.12 got a new implementation using a round-robin algorithm to distribute the load between workers in a better way. This is the default approach Node uses since then including Node v6.0.0
