Do all workers (child processes) process the same set of work? - node.js

Hi, I'm learning Node.js and I'm a bit confused by the cluster module. To the point: the master creates workers. In my case I'm using a 32-bit Windows operating system with two cores, so I'm provided with two workers. Consider the following simple program:
var cluster = require('cluster');
var os = require('os');
var numCPUs = os.cpus().length;

console.log("start");

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; ++i) {
        cluster.fork();
    }
}

console.log(cluster.isMaster ? "I'm Master" : "I'm worker");
Output
start
I'm Master
start
I'm worker
start
I'm worker
From googling I found that the master creates the workers and allocates each incoming request to an available worker. My question is: if two workers are available at all times, will every user request be handled twice? Thanks in advance.

The cluster module handles requests and routes them to a single worker.
Only one worker will ever receive a single request, even if every worker is available all of the time.
Sources and good reading material: http://stackabuse.com/setting-up-a-node-js-cluster/ and https://nodejs.org/api/cluster.html
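The round-robin dispatch described above can be sketched as a tiny simulation, showing that each request is assigned to exactly one worker. The `makeRoundRobin` helper and the worker ids are illustrative, not part of the cluster API:

```javascript
// Sketch: simulate the master's round-robin dispatch to show that
// every incoming request is handed to exactly one worker.
function makeRoundRobin(workerIds) {
  let next = 0;
  return function dispatch() {
    const id = workerIds[next];          // pick the next worker in turn
    next = (next + 1) % workerIds.length; // wrap around
    return id;
  };
}

const dispatch = makeRoundRobin([1, 2]); // two workers, as in the question
const assignments = Array.from({ length: 4 }, () => dispatch());
console.log(assignments); // [1, 2, 1, 2] -- each request handled once
```

No worker ever sees a request that was already routed to its sibling; the real master does the equivalent bookkeeping over live worker processes.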

Related

Handling cluster modules in Node.js

I'm trying to learn the cluster module and I came across this piece of code that I just can't get my mind around. First it forks children with the child_process module, and then it uses cluster.fork().process. I've used both the cluster module and child_process separately in an Express web server, and I know the cluster module works as a load balancer.
But I can't get the idea of using them together. And there's something else: cluster listens to those worker processes, and whenever a disconnect (and possibly exit) event is emitted to the master, it re-forks a process. Here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork the email worker? I mean, shouldn't it pass an id, which I can't see in this code?
var cluster = require("cluster");
const numCPUs = require("os").cpus().length;

if (cluster.isMaster) {
    // fork child processes for the notif/sms/email workers
    global.smsWorker = require("child_process").fork("./smsWorker");
    global.emailWorker = require("child_process").fork("./emailWorker");
    global.notifiWorker = require("child_process").fork("./notifWorker");

    // fork application workers
    for (var i = 0; i < numCPUs; i++) {
        var worker = cluster.fork().process;
        console.log("worker started. process id %s", worker.pid);
    }

    // if an application worker gets disconnected, start a new one
    cluster.on("disconnect", function(worker) {
        console.error("Worker disconnect: " + worker.id);
        var newWorker = cluster.fork().process;
        console.log("Worker started. Process id %s", newWorker.pid);
    });
} else {
    callback(cluster);
}
but here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork email? I mean, shouldn't it pass an id which I can't see in this code.
The disconnect event it is listening to comes from the cluster-specific code, not a generic process listener. So, that disconnect event only fires when one of the cluster child processes exits. If you have some other child processes processing email, then when one of those crashes, it would not trigger this disconnect event. You would have to monitor that child_process yourself separately from within the code that started it.
You can see where the monitoring is for the cluster.on('disconnect', ...) event here in the cluster source code.
Also, I should mention that the cluster module is for when you want pure horizontal scaling, where all new processes share the exact same work, each taking new incoming connections in turn. The cluster module is not for firing up a specific worker to carry out a specific task. For that, you would use either the Worker Threads module (to fire up a thread) or the child_process module (to fire up a new child process with a specific purpose).

Should I have a new MongoDB connection per thread

When using the Node.js cluster library, should a connection to MongoDB be made in the master thread or in each child thread?
Firstly, can multiple threads use the same connection?
Secondly, would it be more performance-effective to use the same connection or separate ones?
In my experience, each child needs its own connection. I use the following pattern in the app code, for example:
const cluster = require('cluster');
const mongoose = require('mongoose');
...

if (cluster.isMaster) { // Parent, only creates clusters
    global.processId = 'Master';
    for (let i = 0; i < 2; ++i) {
        cluster.fork();
    }
    ...
} else { // Child cluster
    // connect
    mongoose.connect('mongodb://localhost/myDB');
    ...
}
The question suggests that the cluster library uses threads, but it does not, it uses processes.
Each process MUST have its own connection. Connections cannot be shared across processes.
Every process should have its own connection.
Don't mix sessions; use connect-mongo for sessions.
Once a client establishes a session on one of the workers, it should not use any other instance for operations related to that client; this way you can cache clients on their respective server instances.
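A minimal sketch of why session stickiness (or a shared store such as connect-mongo) matters: each worker is a separate process with its own memory, so an in-memory session written in one worker is invisible to the other. Two plain objects stand in for two workers' memory spaces here; the `user42` key is just an illustrative session id:

```javascript
// Sketch: two cluster workers each have their own address space.
const workerA = { sessions: {} };
const workerB = { sessions: {} };

// A login request lands on worker A and stores the session in memory.
workerA.sessions['user42'] = { cart: ['book'] };

// If the next request is routed to worker B, the session is simply
// not there -- B never shared A's memory.
console.log('user42' in workerA.sessions); // true
console.log('user42' in workerB.sessions); // false
```

Either pin each client to one worker (sticky sessions) or move the session data into a store all workers can reach, which is what connect-mongo provides.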

How does node.js guarantee that each worker will be on a different CPU core?

In the code below, cluster.fork() is called once per CPU. How does node.js guarantee that one worker will start on core 1 and the second on core 2?
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        // What's happening here? numCPUs is just a length, no reference to any core
        cluster.fork();
    }

    cluster.on('exit', (worker, code, signal) => {
        console.log(`worker ${worker.process.pid} died`);
    });
} else {
    http.createServer((req, res) => {
        res.writeHead(200);
        res.end('hello world\n');
    }).listen(8000);
}
And how are client requests divided between the workers?
Node does not make any guarantees that a worker will run on a particular cpu core. That is all handled and scheduled by the OS.
Per the docs, client requests are divided in one of two ways:
The cluster module supports two methods of distributing incoming connections.
The first one (and the default one on all platforms except Windows), is the round-robin approach, where the master process listens on a port, accepts new connections and distributes them across the workers in a round-robin fashion, with some built-in smarts to avoid overloading a worker process.
The second approach is where the master process creates the listen socket and sends it to interested workers. The workers then accept incoming connections directly.
The second approach should, in theory, give the best performance. In practice however, distribution tends to be very unbalanced due to operating system scheduler vagaries. Loads have been observed where over 70% of all connections ended up in just two processes, out of a total of eight.

Node.js is single-threaded; does this mean we cannot run multiple Node.js instances in different threads?

I've read in an article that Node.js is single-threaded. The question is: what if we run multiple Node.js files on different ports? Do they each have their own thread, or will all of them run under the main Node.js thread?
Could someone shed some light on the subject? I'm in the dark here.
The question is: what if we run multiple Node.js files on different ports? Do they have their own thread, or will all of them run under the main Node.js thread?
From your question, it sounds to me like you are actually starting up multiple processes of Node.js. In those cases, they will work like any other set of multiple processes on your system, and your OS will attempt to balance the load out between all the cores and/or CPUs.
I've read in an article that Node.js is single threaded.
This is a bit more complicated. While the V8 JavaScript engine Node.js uses does run your JavaScript in a single thread, much of the Node.js libraries call out to native code which can use as many threads as it likes. These internals of Node.js use a thread pool and multithreading for disk and network IO, among other tasks.
The applications Node.js really shines in are those that are typically IO bound. In these cases, you get much of the benefit of multithreading without having to write any code for it. For example, when you make multiple requests to disk Node.js will use multiple threads to handle the buffering and management of that data while not blocking your main JavaScript thread.
In many of my applications, I have found that I can fully utilize an 8-core box without writing any code to fire up child processes. Node's internal multithreading does all the work for me. Your mileage will vary from application to application.
I've written a different explanation on a past question you might find helpful: https://stackoverflow.com/a/19324665/362536
http://nodejs.org/api/cluster.html
A single instance of Node runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node processes to handle the load.
The cluster module allows you to easily create child processes that all share server ports.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else {
    // Workers can share any TCP connection.
    // In this case it's an HTTP server.
    http.createServer(function(req, res) {
        res.writeHead(200);
        res.end("hello world\n");
    }).listen(8000);
}
This means you have to architect it yourself: if you want to listen on different ports in different processes, use either cluster or child_process.

How many child_processes should I fork() in node.js?

My question is quite simple, though it may require several variables to answer (I guess).
I'm playing around with Node.js and I'm thinking about how to use it on a multi-core architecture.
The latest version provides the child_process.fork() and child.spawn() methods for multi-process programming. I've read a very good (but dated) article about using Node.js as a large-scale Comet server. Now that Node.js provides multi-process programming, I really have no idea how many processes I should spawn to serve a large number of requests (assuming my server runs on just one machine). Is there a way of choosing the 'best' (or at least a good) number of child processes doing the same job?
Any link to a starting guide would be much appreciated.
Thank you
The Node.js official documentation has an example of how to use cluster:
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    // note: the 'death' event from old Node versions was renamed to 'exit'
    cluster.on('exit', function(worker) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else {
    // Worker processes each run an http server.
    http.Server(function(req, res) {
        res.writeHead(200);
        res.end("hello world\n");
    }).listen(8000);
}
As you can see in the code above, you should fork as many workers as you have CPUs, so that all cores work at the same time (distributing the work across your processor's cores).
In the Node.js example, they want each core to run a single process only. There is no technical restriction on the number of workers per core, so you can run as many as your system can handle.
However, running more than one process per core does not improve the performance of the app.
hope it helps!
