Handling cluster modules in Node.js

I'm trying to learn the cluster module, and I came across this piece of code that I just can't get my head around. First it forks child processes with the child_process module, and then it uses cluster.fork().process. I've used both the cluster module and child_process separately in an Express web server, and I know the cluster module works as a load balancer.
But I can't see the idea behind using them together. There is also something else: the master listens to those worker processes, and whenever a disconnect (and possibly exit) event is emitted, it re-forks a process. Here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork email? Shouldn't it pass an ID, which I can't see in this code?
const cluster = require("cluster");
const { fork } = require("child_process");
const numCPUs = require("os").cpus().length;
if (cluster.isMaster) {
  // fork dedicated child processes for the notif/sms/email workers
  // (plain child_process forks, NOT cluster workers)
  global.smsWorker = fork("./smsWorker");
  global.emailWorker = fork("./emailWorker");
  global.notifiWorker = fork("./notifWorker");
  // fork one application (cluster) worker per CPU
  for (let i = 0; i < numCPUs; i++) {
    const worker = cluster.fork().process;
    console.log("worker started. process id %s", worker.pid);
  }
  // if an application worker gets disconnected, start a new one
  cluster.on("disconnect", function (worker) {
    console.error("Worker disconnect: " + worker.id);
    const newWorker = cluster.fork().process;
    console.log("Worker started. Process id %s", newWorker.pid);
  });
} else {
  // `callback` is not defined in this snippet; presumably it is the
  // application's entry point (e.g. the Express server setup)
  callback(cluster);
}

> But here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork email? Shouldn't it pass an ID, which I can't see in this code?
The disconnect event it is listening to is emitted by the cluster module, not by a generic process listener. So that event only fires when one of the cluster workers (the processes created with cluster.fork()) disconnects. If some other child process is handling email, a crash there does not trigger this disconnect event. You would have to monitor that child_process yourself, separately, from within the code that started it.
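To make the "how does it know to fork email" part concrete: one way to monitor the child_process workers yourself is to keep each one under a name and re-fork it from its own exit handler, so the handler already knows which script to restart. This is a minimal sketch assuming the script paths from the question (./smsWorker, ./emailWorker, ./notifWorker); superviseChildren, its forkFn parameter (injectable so the sketch can be exercised without spawning real processes) and the maxRestarts cap are hypothetical names, not part of any Node.js API.

```javascript
const { fork } = require("child_process");

// Hypothetical helper: forks one child per named script and re-forks a
// child when it exits. Each exit handler is created inside start(name),
// so it already knows which script belongs to the process that died;
// no ID has to be passed around.
function superviseChildren(scripts, forkFn = fork, maxRestarts = 5) {
  const children = {}; // name -> current ChildProcess
  const restarts = {}; // name -> number of restarts so far

  function start(name) {
    const child = forkFn(scripts[name]);
    children[name] = child;
    child.on("exit", (code, signal) => {
      console.error(`${name} worker exited (code=${code}, signal=${signal})`);
      restarts[name] = (restarts[name] || 0) + 1;
      // Cap restarts so a script that crashes on startup can't loop forever
      if (restarts[name] <= maxRestarts) {
        start(name);
      }
    });
  }

  Object.keys(scripts).forEach((name) => start(name));
  return { children, restarts };
}
```

In the question's code, the three global.*Worker = fork(...) lines could be replaced by something like const { children } = superviseChildren({ sms: "./smsWorker", email: "./emailWorker", notif: "./notifWorker" }); inside the cluster.isMaster branch, leaving cluster.on("disconnect", ...) to deal only with the cluster workers.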
You can see where the cluster.on('disconnect', ...) event is monitored in the Node.js cluster source code.
Also, I should mention that the cluster module is for when you want pure horizontal scaling, where all new processes share exactly the same work and each one takes new incoming connections in turn. The cluster module is not for firing up a specific worker to carry out a specific task. For that, you would use either the worker_threads module (to start a thread) or the child_process module (to start a new child process with a specific purpose).

Related

Process vs Worker vs Thread vs Task vs Pool in Node.js

What are Process, Worker, Thread, Task and Pool in Node.js from a programmer's point of view?
I went through a lot of material, but it was difficult for a beginner programmer to understand quickly. Here is a quick summary:
A Node.js process is created when you run a Node.js program such as node app.js (or when a child process is created through the child_process or cluster modules). Each process has its own memory and resources.
Worker: worker_threads is a built-in Node.js module whose Worker class takes your module (.js file) as input and creates a Worker object that runs that module on a separate thread inside the same process, asynchronously from the main thread.
// app.js
const { Worker } = require('worker_threads');
// Run task_processor.js on a separate thread in this process
const worker = new Worker('./task_processor.js');
const workerMaxLifetime = 10000; // ms
// Send a message to the worker; task_processor.js expects an object { a, b }
worker.postMessage({ a: 1, b: 2 });
// Receive messages from the worker
worker.on('message', (message) => { console.log('App:', message); });
// Terminate the worker after workerMaxLifetime ms
setTimeout(() => { worker.terminate(); }, workerMaxLifetime);
Task is your module (.js file) where you write the code to run as a thread. Strictly speaking, it should be called a 'Task Processor'.
// task_processor.js
const { parentPort } = require('worker_threads');
// Receive a task from the App, e.g. { a: 1, b: 2 }
parentPort.on('message', (task_input) => {
  // Send the result back to the App
  parentPort.postMessage(task_input.a + task_input.b);
});
Thread is nothing but the worker in execution.
Pool is a wrapper .js file which creates and terminates worker objects and handles communication between the App and the workers. A worker pool is not mandatory, though most real-world applications that use worker threads implement one.
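A minimal sketch of such a pool, assuming the task_processor.js logic above (it adds a and b). To keep it self-contained, the task processor is inlined as a string with the Worker eval: true option instead of loading './task_processor.js'; WorkerPool, run() and close() are hypothetical names. Tasks are queued while every worker is busy, and a worker that crashes is not replaced in this sketch.

```javascript
const { Worker } = require("worker_threads");

// Inlined stand-in for task_processor.js (assumption: same a + b logic)
const TASK_SOURCE = `
  const { parentPort } = require("worker_threads");
  parentPort.on("message", (t) => parentPort.postMessage(t.a + t.b));
`;

// Fixed-size pool: creates `size` workers up front, hands each task to an
// idle worker, and queues tasks while all workers are busy.
class WorkerPool {
  constructor(size) {
    this.workers = [];
    this.idle = [];
    this.queue = []; // pending { task, resolve, reject }
    for (let i = 0; i < size; i++) {
      const w = new Worker(TASK_SOURCE, { eval: true });
      this.workers.push(w);
      this.idle.push(w);
    }
  }

  // Returns a promise for the worker's reply to `task`
  run(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this._dispatch();
    });
  }

  _dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { task, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off("error", onError);
        this.idle.push(worker); // worker is free again
        resolve(result);
        this._dispatch(); // pick up any queued task
      };
      const onError = (err) => {
        worker.off("message", onMessage);
        reject(err); // the crashed worker is not returned to the pool
      };
      worker.once("message", onMessage);
      worker.once("error", onError);
      worker.postMessage(task);
    }
  }

  // Terminate all workers so the process can exit
  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}
```

Usage: const pool = new WorkerPool(2); pool.run({ a: 1, b: 2 }).then(console.log); and call pool.close() when the App no longer needs the workers. Because each worker handles only one task at a time, the next 'message' from a worker always belongs to the task it was just given.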
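Node.js module: a .js file.
App: the main (or default) thread in a process is also referred to as the App.
Process vs Worker: each process has its own memory and resources, whereas a worker thread runs inside the process that created it and shares that process's resources. Each worker still gets its own V8 instance, JavaScript heap and event loop; memory is shared between threads only explicitly, for example through SharedArrayBuffer.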

Node cluster: Ensure request completes before disconnecting worker

I am running a cluster of workers with Node.js. There is a memory leak which, because the codebase is old and unfamiliar, I have decided to work around by periodically killing workers and replacing them with new ones rather than diagnosing it. So I'm essentially using
setTimeout(() => {worker.disconnect();}, INTERVAL);
when we spawn a worker.
However, I want to make sure that when a worker is killed, it completes any request it is currently processing prior to being disconnected, so that requests aren't dropped. From experimenting with the library, calling worker.disconnect() drops a currently-processing request, causing an "empty reply from server" error. I would rather not manually implement logic to detect if a server is currently processing a request (e.g. by maintaining a set of active requests or something), due to edge cases. Is there a "standard" way of telling a cluster worker to "wait until the current request completes, and then exit"?
So I have discovered something which seems to work, as far as I can tell. The strategy is not to have the master shut the worker down, but to let the worker shut itself down after it has closed its server. The master sends it a message, which the worker responds to. Something like this:
const cluster = require('cluster');
const http = require('http');
const PORT = 3000; // example port
if (cluster.isMaster) {
  const worker = cluster.fork();
  // in 10 seconds, tell the worker to shut down
  setTimeout(() => { worker.send('shutdown'); }, 10000);
} else {
  const server = http.createServer((req, res) => { res.end('ok'); }); // whatever server setup
  server.listen(PORT);
  process.on('message', (msg) => {
    if (msg === 'shutdown') {
      // stop accepting new connections; the callback runs once all
      // existing connections have ended
      server.close(() => {
        // disconnect from the cluster after the server closes
        cluster.worker.disconnect();
      });
    }
  });
}
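One caveat with server.close(): its callback only runs once every connection has ended, so an idle keep-alive connection can delay the worker's shutdown even after its last request has finished. Below is a sketch of a more defensive close, assuming Node.js 18.2 or later for server.closeIdleConnections() and server.closeAllConnections() (the typeof checks skip them on older versions); gracefulClose and timeoutMs are hypothetical names.

```javascript
const http = require("http");

// Sketch of a graceful close for an http.Server:
// 1. stop accepting new connections (server.close)
// 2. drop idle keep-alive sockets so they don't hold close() open
// 3. after timeoutMs, force-close whatever is still open
function gracefulClose(server, timeoutMs = 10000) {
  return new Promise((resolve) => {
    const force = setTimeout(() => {
      if (typeof server.closeAllConnections === "function") {
        server.closeAllConnections();
      }
    }, timeoutMs);
    server.close(() => {
      clearTimeout(force); // everything ended in time
      resolve();
    });
    // In-flight requests are not idle, so they are allowed to finish
    if (typeof server.closeIdleConnections === "function") {
      server.closeIdleConnections();
    }
  });
}
```

In the worker branch above, server.close(() => { cluster.worker.disconnect(); }) would become gracefulClose(server).then(() => cluster.worker.disconnect()). The timeout is a trade-off: a request still running after timeoutMs is dropped, so it should be longer than the slowest expected request.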

Do all workers (child processes) process same sets of work

Hi, I'm learning Node.js and I'm a bit confused by the cluster module. To the point: the master creates workers, and on my machine (32-bit Windows, where os.cpus() reports 2 CPUs) I get 2 workers. Consider the following simple program:
var cluster = require('cluster');
var os = require('os');
var numCPUs = os.cpus().length;
console.log("start");
if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; ++i) {
    cluster.fork();
  }
}
console.log(cluster.isMaster ? "I'm Master" : "I'm worker");
Output
start
I'm Master
start
I'm worker
start
I'm worker
From googling, I found that the master creates workers and distributes incoming requests to the available workers. My question is: if two workers are available all the time, will every user request be handled twice? Thanks in advance.
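The cluster module accepts incoming connections and routes each one to a single worker (round-robin by default on every platform except Windows).
Only one worker will ever receive a given request, even if every worker is available all of the time. The reason "start" appears three times in your output is different: cluster.fork() runs the same script again in each new worker process, so every top-level statement executes once in the master and once in each worker, and cluster.isMaster is what lets each process take a different path.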
Sources and good reading material: http://stackabuse.com/setting-up-a-node-js-cluster/ and https://nodejs.org/api/cluster.html

How to wait for a Redis connection?

I'm currently trying to use Node.js Kue for processing jobs in a queue, but I believe I'm not doing it right.
Indeed, the way I'm working now, I have two different services (which in this case I'm running with Docker Compose): one Web API built with Express which sends jobs to the queue, and one processing module. The issue here is with the processing module.
I've coded it as follows:
var kue = require('kue');
var config = require('./config');
var queue = kue.createQueue({
  prefix: config.redis.queuePrefix,
  redis: {
    port: config.redis.port,
    host: config.redis.host
  }
});
queue.process('jobType', function (job, done) {
  // do processing here...
});
When we run this with Node, it sits there waiting for things to be placed on the queue to do the processing.
There are two issues, however:
It needs Redis to be available before this module runs. If we run it without Redis already available, it crashes because the host is not reachable, and the process ends.
If Redis suddenly becomes unavailable, the processing module also crashes because it cannot establish the connection, and the process is killed.
How can I avoid these problems?
My guess is that I should somehow make the code "wait" for Redis, but I have no idea on how to do this.
How can this be done in this case?
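You can use a promise to wait until Redis is available, and only then start your module. Here loadRedis stands for a function you write that returns a promise resolving once Redis can be reached:
loadRedis().then(() => {
  // load your module
});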
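Or you can use a generator to "stop" until Redis is loaded. A generator does not wait by itself, though: it has to be driven by a runner such as the co library. In current Node.js, an async function with await does the same job without an extra library.
function* main() {
  // yield a promise; the runner resumes the generator once it resolves
  const redisLoaded = yield loadRedis();
  // load your module
}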
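Here is a sketch of what loadRedis could look like without any Redis client library: it simply retries a plain TCP connection to the Redis host and port until one succeeds. waitForPort, retries and delayMs are hypothetical names. A successful connection only shows that something is accepting connections on that port, not that Redis is fully ready, and the second issue (Redis going away later) still needs separate error handling.

```javascript
const net = require("net");

// Hypothetical helper: resolves with the attempt number once a TCP
// connection to host:port succeeds; retries every delayMs and rejects
// with the last error after `retries` failed attempts.
function waitForPort(host, port, { retries = 30, delayMs = 1000 } = {}) {
  return new Promise((resolve, reject) => {
    let attempt = 0;
    const tryOnce = () => {
      attempt++;
      const socket = net.createConnection({ host, port });
      socket.once("connect", () => {
        socket.end(); // we only wanted to know the port is open
        resolve(attempt);
      });
      socket.once("error", (err) => {
        socket.destroy();
        if (attempt >= retries) {
          reject(err);
        } else {
          setTimeout(tryOnce, delayMs); // not up yet; try again later
        }
      });
    };
    tryOnce();
  });
}
```

In the question's processing module, the kue.createQueue(...) and queue.process(...) calls would then move inside waitForPort(config.redis.host, config.redis.port).then(() => { ... }).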

What is the best way to watch and signal file changes to a node.js app?

I periodically rsync code changes to my production server. I have the following cluster code that creates workers for my main app.
var cluster = require('cluster');
function startWorker() {
  var worker = cluster.fork();
  console.log('CLUSTER: Worker %d started', worker.id);
}
if (cluster.isMaster) {
  require('os').cpus().forEach(function () {
    startWorker();
  });
  // log any workers that disconnect; if a worker disconnects, it
  // should then exit, so we'll wait for the exit event to spawn
  // a new worker to replace it
  cluster.on('disconnect', function (worker) {
    console.log('CLUSTER: Worker %d disconnected from the cluster.', worker.id);
  });
  // when a worker dies (exits), create a worker to replace it
  cluster.on('exit', function (worker, code, signal) {
    console.log('CLUSTER: Worker %d died with exit code %d (%s)', worker.id, code, signal);
    startWorker();
  });
} else {
  // start our app on worker; see meadowlark.js
  require('./app.js')();
}
This article shows how to listen for a signal (in the article's case SIGUSR2) to restart the workers. In a nutshell, the author does something like:
process.on("SIGUSR2", function () {
  // Code to disconnect workers, delete cache and restart workers
});
My questions are:
- Is SIGUSR2 the best way to signal to the process that it should reload the workers? What if I want to send some additional information?
- Who sends the signal? Is it some OS feature that you can set up to watch a file or directory? How do I do that?
Note:
- I would rather use an OS-level feature that is not tied to Node.js. For example, some posts suggest Node.js-specific modules like "naught", but I would rather not use "naught". :)
The Linux kernel provides a file system notification subsystem called inotify. You can use it from Node.js with the node-inotify module: https://github.com/c4milo/node-inotify
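Node.js's built-in fs.watch is backed by inotify on Linux (and by the platform's equivalent elsewhere), so the master process can also watch the deploy directory itself instead of waiting for a signal. A minimal sketch; watchForReload and debounceMs are hypothetical names, and the debounce is there because a single rsync run typically fires many events. Without the recursive option, only the top level of dir is watched.

```javascript
const fs = require("fs");

// Watch `dir` and call onChange(files) once per burst of changes:
// each new event restarts a quiet-period timer of debounceMs.
function watchForReload(dir, onChange, debounceMs = 500) {
  const changed = new Set(); // file names seen during the current burst
  let timer = null;
  const watcher = fs.watch(dir, (eventType, filename) => {
    if (filename) changed.add(String(filename));
    clearTimeout(timer); // restart the quiet-period countdown
    timer = setTimeout(() => {
      const files = [...changed];
      changed.clear();
      onChange(files);
    }, debounceMs);
  });
  return {
    // Stop watching and cancel any pending notification
    close() {
      clearTimeout(timer);
      watcher.close();
    },
  };
}
```

In the master branch of the question's code, onChange could disconnect the workers (for example for (const id in cluster.workers) cluster.workers[id].disconnect();) and let the existing cluster.on('exit', ...) handler fork fresh ones, which load the new code. Disconnecting them all at once can leave a brief gap in service; doing it one worker at a time avoids that. If a signal is still preferred, note that signals carry no payload, so additional information would have to travel another way, for example through worker.send().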
