How many child_processes should I fork() in node.js? - node.js

My question is quite simple, though it may require several variables to answer (I guess).
I'm playing around with Node.js and I'm thinking about how to use it on a multi-core architecture.
The latest version provides the child_process.fork() and child_process.spawn() methods for multi-process programming. I've read this very good (but dated) article about using Node.js as a large-scale Comet server. Now that Node.js supports multi-process programming, I really have no idea how many processes I should spawn to serve a large number of requests (assuming my server runs on just one machine). Is there a way of choosing the 'best' (or at least a good) number of child processes doing the same job?
Any link to a getting-started guide would be much appreciated.
Thank you

The Node.js official documentation has an example of how to use cluster:
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // Older docs used a 'death' event; current Node emits 'exit'.
  cluster.on('exit', function(worker) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Worker processes each run an HTTP server.
  http.Server(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}
As you can see in the code above, you should fork as many workers as you have CPU cores, so that all cores can work at the same time (the work is distributed across your processor's cores).

In the Node.js example, they want each core to run a single process only. There is no technical restriction on the number of workers per core, so you can run as many as your system can handle.
However, running more than one process per core does not improve the performance of the app.
Hope it helps!

Related

Do all workers (child processes) process the same set of work?

Hi, I'm learning Node.js and I'm a bit confused by the cluster module. Okay, to the point: the master creates workers. In my case I'm using a 32-bit Windows operating system, so I'm provided with "2 workers". Consider the following simple program:
var cluster = require('cluster');
var os = require('os');
var numCPUs = os.cpus().length;

console.log("start");
if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; ++i) {
    cluster.fork();
  }
}
console.log(cluster.isMaster ? "I'm Master" : "I'm worker");
Output
start
I'm Master
start
I'm worker
start
I'm worker
By googling I found that the master will create workers and allocate incoming requests to the available workers. Here is my question: if two workers are available at all times, will every user request be handled twice? Thanks in advance.
The cluster module handles requests and routes each one to a single worker.
Only one worker will ever receive a given request, even if every worker is available all of the time.
Sources and good reading material: http://stackabuse.com/setting-up-a-node-js-cluster/ and https://nodejs.org/api/cluster.html

To take advantage of multiple processors in Node.js, why is the number of clusters we fork equal to the CPU core count?

I know this may be a stupid question; I'm not good with OS knowledge.
Usually the number of clusters/processes we fork equals the CPU core count, as the Node.js official docs show:
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}
But I'm confused: what will happen if we fork more clusters than the number of CPU cores?
Is it an empirical value, a best practice, or is there another reason?
But I'm confused: what will happen if we fork more clusters than the number of CPU cores?
Nothing will happen; everything will still work. But running more than one CPU-intensive thread or process per core will not make the application any faster. If anything, it may only make it slower, because the core will waste some portion of its time on context switches between the threads.
If you have CPU-bound operations, then the optimal number of threads/processes per core is always 1. For I/O-bound operations it shouldn't really matter, as most of the time the processes will not be doing any computation anyway.
Think about an ideal situation in which all you run on your system is the Node-based application server.
This is not practical, as there would be a minimum of 5-10 processes running on your machine to house-keep your system environment.
Also assume that the event loop thread (aka application thread, aka main thread) is the one which performs the most useful work in the Node.js process.
Again, this is not quite true, as there are a few helper threads in the Node.js process to support the asynchronous, event-driven programming model.
Again, assume that the throughput of your server, measured through the eyes of connected clients, is a direct function of how well each process can make use of free CPUs, and that all the actions performed by the process as part of the request-response cycle are purely CPU bound.
This may or may not be TRUE. If the server makes further connections to external servers such as DBs, web services, etc., those get initiated on the I/O channel, and the thread goes ahead and addresses some other request. But because all I/O operations are made non-blocking, and all potentially blocking operations are made asynchronous, the main thread will be running only relatively CPU-bound actions, with only one blocking call in the whole circuit: the polling function, which blocks when there is no ready I/O.
The cluster module suggests using the number of CPUs for the cluster member count because the above hypothetical scenario is assumed; it gives relatively optimal performance in the absence of any better, more comprehensive reasoning.
Hope this helps.

Scalable architecture for socket.io

I am new to socket.io and Node.js and I am trying to build a scalable application with a high number of simultaneous socket connections (10,000+).
Currently, I started with a model where my server creates child processes, and every child process listens on a specific port with a socket.io instance attached. Once a client connects, it is redirected to a specific port.
The big question is: does having several socket.io instances on several ports increase the number of possible connections?
Here is my code, just in case :
Server
var server = http.createServer(app);
server.childList = [];
for (var i = 0; i < app.portList.length; i++) {
  server.childList[i] = require('child_process').fork('child.js');
}
server.listen(3443, () => {
  for (var i = 0; i < app.portList.length; i++) {
    server.childList[i].send({ message: 'createServer', port: app.portList[i] });
  }
});
child.js :
var app = require('./app');
var http = require('http');
var socket_io = require("socket.io");

process.on('message', (m) => {
  if (m.message === 'createServer') {
    var childServ = http.createServer(app);
    childServ.listen(m.port, () => {
      console.log("childServ listening on port " + m.port);
    });
    var io = socket_io();
    io.attach(childServ);
    io.sockets.on('connection', function (socket) {
      console.log("A client just connected to my socket_io server on port " + m.port);
    });
  }
});
Feel free to release the kraken if I did something horrible there
First off, what you need to optimize depends on how busy your socket.io connections are and whether the activity is mostly asynchronous I/O operations or whether it's CPU-intensive stuff. As you may already know, node.js scales really well already for asynchronous I/O stuff, but it needs multiple processes to scale well for CPU-intensive stuff. Further, there are some situations where the garbage collector gets too busy (lots and lots of small requests being served) and you also need to go to multiple processes for that reason.
More server instances (up to at least the number of CPUs you have in the server) will give you more CPU processing power (if that's what you need). It won't necessarily increase the number of max connections you can support on a box if most of them are idle. For that, you have to custom tune your server to support lots and lots of connections.
Usually, you would NOT want N socket.io servers each listening on a different port. That puts the burden on the clients to somehow select a port and the client has to know exactly what ports to choose from (e.g. how many server instances you have).
Usually, you don't do it this way. Usually, you have N processes all listening on the same port and you use some sort of loadbalancer to distribute the load among them. This makes the server infrastructure transparent to the clients which means you can scale the servers up or down without changing the client behavior at all. In fact, you can even add more than one physical server box and increase capacity even further that way.
Here's an article from the socket.io doc on using multiple nodes with a load balancer to increase capacity: Socket.io - using multiple nodes (updated link). There's also explicit support by redis for a combination of multiple socket.io instances and redis so you can communicate with any socket.io instance regardless of process.
Does having several socket.io instances on several ports increase the number of possible connections?
Yes, you have built a simple load balancer, which is a pretty common practice. There are several good tutorials about different ways of scaling Node.js.
Horizontally scale socket.io with redis
http://goldfirestudios.com/blog/136/Horizontally-Scaling-Node.js-and-WebSockets-with-Redis
Your load balancer will speed up your code to a point, because you utilize multiple cores, but I read in another thread a while ago that a rule of thumb is to start with around 2-3 processes per CPU core. More than that causes more overhead than it helps, but that is highly dependent on the situation.

How does Node.js guarantee that each worker will be on a different CPU core?

In the code, cluster.fork() is called.
How does Node.js guarantee that one worker will start on core 1 and a second on core 2?
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    // What's happening here? numCPUs is just a length, no reference to any core
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);
}
And How client requests divided between workers?
Node does not make any guarantees that a worker will run on a particular CPU core. That is all handled and scheduled by the OS.
Per the docs, client requests are divided in one of two ways:
The cluster module supports two methods of distributing incoming connections.
The first one (and the default one on all platforms except Windows), is the round-robin approach, where the master process listens on a port, accepts new connections and distributes them across the workers in a round-robin fashion, with some built-in smarts to avoid overloading a worker process.
The second approach is where the master process creates the listen socket and sends it to interested workers. The workers then accept incoming connections directly.
The second approach should, in theory, give the best performance. In practice however, distribution tends to be very unbalanced due to operating system scheduler vagaries. Loads have been observed where over 70% of all connections ended up in just two processes, out of a total of eight.

Node.js is single-threaded: does this mean we cannot run multiple Node.js instances in different threads?

I've read in an article that Node.js is single-threaded. The question is: what if we run multiple Node.js files on different ports? Do they each have their own thread, or will all of them be under the main Node.js thread?
Could someone shed some light on the subject? I'm in the dark here.
The question is: what if we run multiple Node.js files on different ports? Do they each have their own thread, or will all of them be under the main Node.js thread?
From your question, it sounds to me like you are actually starting up multiple processes of Node.js. In those cases, they will work like any other set of multiple processes on your system, and your OS will attempt to balance the load out between all the cores and/or CPUs.
I've read in an article that Node.js is single threaded.
This is a bit more complicated. While the V8 JavaScript engine Node.js uses runs your JavaScript in a single thread, many of the Node.js libraries call out to native code which can use as many threads as it likes. These internals of Node.js use a thread pool and multithreading for disk and network I/O, among other tasks.
The applications Node.js really shines in are those that are typically IO bound. In these cases, you get much of the benefit of multithreading without having to write any code for it. For example, when you make multiple requests to disk Node.js will use multiple threads to handle the buffering and management of that data while not blocking your main JavaScript thread.
In many of my applications, I have found that I can fully utilize an 8-core box without writing any code to fire up child processes. Node's internal multithreading does all the work for me. Your mileage will vary from application to application.
I've written a different explanation on a past question you might find helpful: https://stackoverflow.com/a/19324665/362536
http://nodejs.org/api/cluster.html
A single instance of Node runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node processes to handle the load.
The cluster module allows you to easily create child processes that all share server ports.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Workers can share any TCP connection
  // In this case it's an HTTP server
  http.createServer(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}
Which means you have to architect it yourself: if you want to listen on different ports from different processes, use either cluster or child_process.
