Node cluster: Ensure request completes before disconnecting worker - node.js

I am running a cluster of workers with Node.js. There is a memory leak which, because the codebase is old and unfamiliar, I have decided to work around by periodically killing workers and replacing them with new ones rather than diagnosing it. So, whenever we spawn a worker, I'm essentially doing
setTimeout(() => { worker.disconnect(); }, INTERVAL);
However, I want to make sure that when a worker is killed, it completes any request it is currently processing before being disconnected, so that requests aren't dropped. From experimenting with the library, calling worker.disconnect() drops a currently-processing request, causing an "empty reply from server" error. I would rather not manually implement logic to detect whether a server is currently processing a request (e.g. by maintaining a set of active requests), due to edge cases. Is there a "standard" way of telling a cluster worker to "wait until the current request completes, and then exit"?

So I have discovered something which seems to work, as far as I can tell. The strategy is not to have the master shut the worker down, but instead to let the worker shut itself down after it has closed its server. The master sends it a message which the worker responds to. Something like:
const cluster = require('cluster');

if (cluster.isMaster) {
  const worker = cluster.fork();
  // in 10 seconds, tell the worker to shut down
  setTimeout(() => { worker.send('shutdown'); }, 10000);
} else {
  const server = createServer(); // whatever server setup
  server.listen(PORT);
  process.on('message', (msg) => {
    if (msg === 'shutdown') {
      // stop accepting new connections, then disconnect from the
      // cluster once in-flight requests have finished
      server.close(() => {
        cluster.worker.disconnect();
      });
    }
  });
}
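Since the whole point is recycling, the master can also fork a replacement whenever a worker exits. A possible extension (inside the isMaster branch; not part of the original answer):

cluster.on('exit', (worker, code, signal) => {
  // replace the departed worker and schedule its own recycling
  const replacement = cluster.fork();
  setTimeout(() => { replacement.send('shutdown'); }, 10000);
});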

Related

Process vs Worker vs Thread vs Task vs Pool in Node.js

What are Process, Worker, Thread, Task, and Pool in Node.js from a programmer's point of view?
I went through a lot of material, but it is difficult for a beginner programmer to understand quickly. Here is a quick summary:
A Node.js process is created when you run a Node.js program like node app.js (or when a child process is created through the child_process or cluster modules). Each process has its own memory and resources.
Worker is a class from the built-in worker_threads module; it takes your module (.js) as input and creates a worker object, inside a process, that executes asynchronously.
//app.js
//TODO add modules
const { Worker } = require('worker_threads');
//TODO wrap the below code into your code
const worker = new Worker('./task_processor.js');
const workerMaxLifetime = 10000;
//Send message to worker (task_processor.js below expects { a, b })
worker.postMessage({ a: 1, b: 2 });
//Receive message from worker
worker.on('message', (message) => { console.log(' App:', message); });
//Terminate worker
setTimeout(() => { worker.terminate(); }, workerMaxLifetime);
Task is your module (.js) where you write the code to run as a Thread. Actually, we should call it a 'Task Processor':
//task_processor.js
//TODO add modules
const { parentPort } = require('worker_threads');
//TODO wrap the below code into your code
//Receive message from App
parentPort.on('message', (task_input) => {
  //Send message to App
  parentPort.postMessage(task_input.a + task_input.b);
});
Thread is nothing but the worker in execution.
Pool is a wrapper .js file which creates/terminates worker objects and facilitates communication between the App and the workers. A worker pool is not mandatory, though most real-world scenarios that use worker threads implement one (a minimal sketch follows this list).
A Node.js module is a .js file.
App: the main (or default) thread in a process is also referred to as the App.
Process vs Worker: each process has its own memory and resources, whereas a worker shares the memory and resources of the process that created it.
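A minimal pool sketch, assuming the task_processor.js above; the Pool class, its size, and the FIFO queueing policy are illustrative choices, not from the original summary:

//pool.js
const { Worker } = require('worker_threads');

class Pool {
  constructor(size, modulePath) {
    this.idle = [];
    this.queue = [];
    for (let i = 0; i < size; i++) {
      this.idle.push(new Worker(modulePath));
    }
  }
  //Run one task on the next idle worker; queue it if all are busy
  run(task_input) {
    return new Promise((resolve) => {
      const job = { task_input, resolve };
      const worker = this.idle.pop();
      if (worker) this._dispatch(worker, job);
      else this.queue.push(job);
    });
  }
  _dispatch(worker, job) {
    worker.once('message', (result) => {
      job.resolve(result);
      const next = this.queue.shift();
      if (next) this._dispatch(worker, next);
      else this.idle.push(worker);
    });
    worker.postMessage(job.task_input);
  }
  //Terminate all idle workers
  destroy() {
    for (const w of this.idle) w.terminate();
  }
}

const pool = new Pool(2, './task_processor.js');
pool.run({ a: 1, b: 2 }).then((sum) => {
  console.log(' App:', sum);
  pool.destroy();
});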

handling cluster modules in nodejs

I'm trying to learn the cluster module and I came across this piece of code I just can't get my mind around. First it forks children with the child_process module, and then it uses cluster.fork().process. I've used both the cluster module and child_process in an Express web server separately; I know the cluster module works as a load balancer.
But I can't get the idea of using them together. And there is also something else: cluster is listening to those worker processes, and whenever a disconnect (and possibly exit) event is emitted to the master, it reforks a process. But here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork email? I mean, shouldn't it pass an id, which I can't see in this code?
var cluster = require("cluster");
const numCPUs = require("os").cpus().length;

if (cluster.isMaster) {
  // fork child process for notif/sms/email worker
  global.smsWorker = require("child_process").fork("./smsWorker");
  global.emailWorker = require("child_process").fork("./emailWorker");
  global.notifiWorker = require("child_process").fork("./notifWorker");

  // fork application workers
  for (var i = 0; i < numCPUs; i++) {
    var worker = cluster.fork().process;
    console.log("worker started. process id %s", worker.pid);
  }

  // if application worker gets disconnected, start new one.
  cluster.on("disconnect", function(worker) {
    console.error("Worker disconnect: " + worker.id);
    var newWorker = cluster.fork().process;
    console.log("Worker started. Process id %s", newWorker.pid);
  });
} else {
  callback(cluster);
}
But here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork email? I mean, shouldn't it pass an id, which I can't see in this code?
The disconnect event it is listening to comes from the cluster-specific code, not a generic process listener. So, that disconnect event only fires when one of the cluster child processes exits. If you have some other child processes processing email, then when one of those crashes, it would not trigger this disconnect event. You would have to monitor that child_process yourself separately from within the code that started it.
You can see where the monitoring is for the cluster.on('disconnect', ...) event here in the cluster source code.
Also, I should mention that the cluster module is for when you want pure horizontal scaling, where all new processes share the exact same work, each taking new incoming connections in turn. The cluster module is not for firing up a specific worker to carry out a specific task. For that, you would use either the Worker Threads module (to fire up a thread) or the child_process module (to fire up a new child process with a specific purpose).
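For the special-purpose workers, you monitor each child process yourself. A sketch of what that might look like for the email worker from the question's code (the forkEmailWorker helper is hypothetical):

function forkEmailWorker() {
  const worker = require("child_process").fork("./emailWorker");
  // the parent knows which script to restart because it did the
  // forking itself; refork this specific worker when it dies
  worker.on("exit", function(code, signal) {
    console.error("emailWorker exited (code %s), reforking", code);
    global.emailWorker = forkEmailWorker();
  });
  return worker;
}
global.emailWorker = forkEmailWorker();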

How to handle connection timeout in ZeroMQ.js properly?

Consider a Node.js application with a few processes:
a single main process sitting in memory and working like a web server;
system users' commands that can be run through the CLI and that exit when they are done.
I want to implement something like IPC between the main and CLI processes, and it seems that the ZeroMQ bindings for Node.js are quite a good candidate for that. I've chosen version 6.0.0-beta.4:
Version 6.0.0 (in beta) features a brand new API that solves many fundamental issues and is recommended for new projects.
Using Request/Reply I was able to achieve what I wanted: the CLI process notifies the main process about some event (and optionally receives some data as a response) and continues its execution. The problem I have right now is that my CLI process hangs if the main process is down (not available). If it's unable to establish a connection to the socket, the command should still execute and exit without notifying the main process.
Here is a simplified code snippet of my CLI running in an asynchronous method:
const { Request } = require('zeromq');

async function notify() {
  let parsedResponse;
  try {
    const message = { event: 'hello world' };
    const socket = new Request({ connectTimeout: 500 });
    socket.connect('tcp://127.0.0.1:33332');
    await socket.send(JSON.stringify(message));
    const response = await socket.receive();
    parsedResponse = JSON.parse(response.toString());
  } catch (e) {
    console.error(e);
  }
  return parsedResponse;
}

(async () => {
  const response = await notify();
  if (response) {
    console.log(response);
  } else {
    console.log('Nothing is received.');
  }
})();
I set the connectTimeout option but wonder how to use it. The docs state:
Sets how long to wait before timing-out a connect() system call. The connect() system call normally takes a long time before it returns a time out error. Setting this option allows the library to time out the call at an earlier interval.
Looking at connect, one sees that it's not asynchronous:
Connects to the socket at the given remote address and returns immediately. The connection will be made asynchronously in the background.
OK, probably the socket's send method will wait for connection establishment and reject its promise on connection timeout... but nothing happens there. The send method executes and the code gets stuck resolving receive, waiting for a reply from the main process that will never come. So the main question is: "How to use the connectTimeout option to handle the socket's connection timeout?" I found an answer to a similar question related to C++, but it doesn't actually answer the question (or I can't understand it). I can't believe that this option is useless and was added to the API just so that nobody can use it.
I would also be happy with some kind of workaround, and I found the receiveTimeout option. Changing the socket creation to
const socket = new Request({ receiveTimeout: 500 });
leads to a rejection in the receive method and the following output:
{ [Error: Socket temporarily unavailable] errno: 11, code: 'EAGAIN' }
Nothing is received.
The code executes, but the process doesn't exit in this case. It seems some resources are busy and not freed. When the main process is up everything works fine: the process exits and I get the following reply in the output:
{ status: 'success' }
So another question is: "How to exit the process gracefully when the receive method rejects due to receiveTimeout?" Calling process.exit() is not an option here!
P.S. My environment is:
Kubuntu 18.04.1;
Node 10.15.0;
ZeroMQ bindings are installed this way:
$ yarn add zeromq#6.0.0-beta.4 --zmq-shared
ZeroMQ decouples the socket connection mechanics from message delivery. As the documentation states, connectTimeout only influences the timeout of the connect() system call and does not affect the timeouts of sending/receiving messages.
For example:
const zmq = require("zeromq")

async function run() {
  const socket = new zmq.Dealer({ connectTimeout: 2000 })
  socket.events.on("connect:retry", event => {
    console.log(new Date(), event.type)
  })
  socket.connect("tcp://example.com:12345")
}

run()
The connect:retry event occurs every ~2 seconds:
> node test.js
2019-11-25T13:35:53.375Z connect:retry
2019-11-25T13:35:55.536Z connect:retry
2019-11-25T13:35:57.719Z connect:retry
If we change connectTimeout to 200, you can see the event occurs much more frequently. The timeout is not the only thing influencing the delay between the events, but it should be clear that it happens much quicker.
> node test.js
2019-11-25T13:36:05.271Z connect:retry
2019-11-25T13:36:05.531Z connect:retry
2019-11-25T13:36:05.810Z connect:retry
Hope this clarifies the effect of connectTimeout.
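As for the second question, exiting gracefully after a receiveTimeout rejection: closing the socket should release its resources so the event loop can drain and the process can exit on its own. A sketch of the CLI with that workaround, assuming the 6.0.0-beta API:

const { Request } = require('zeromq');

async function notify() {
  // receiveTimeout makes receive() reject with EAGAIN instead of hanging
  const socket = new Request({ receiveTimeout: 500 });
  socket.connect('tcp://127.0.0.1:33332');
  try {
    await socket.send(JSON.stringify({ event: 'hello world' }));
    const response = await socket.receive();
    return JSON.parse(response.toString());
  } catch (e) {
    console.error(e);
  } finally {
    // free the underlying resources so the process can exit
    socket.close();
  }
}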

keep a nodejs request in waiting until first is completed

I have a situation with a Node.js API. What I want to achieve is: when the same user hits the same API at the same time, I want to block or queue his second request until the first is completed.
PS: I want to apply this for the same user only.
Thanks in advance.
I am not sure doing anything on the server side (like semaphores) will solve this issue if the app is both stateless and going to be scaled horizontally in production over two or more replicas.
All the pods (app servers) would have to maintain the same semaphore value for the endpoint being used.
I think you can achieve the same mechanism with a database flag, or use Redis to indicate that the operation is in progress on one of the app servers.
It is as good as having sessions (in terms of maintaining a certain state) per client request.
You will also need a recovery mechanism to reset the semaphore (or flag) if the operation carried out by that endpoint fails or crashes the thread.
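A minimal sketch of the Redis-flag idea, assuming the node-redis v4 client; the key prefix, TTL, and helper names are placeholders:

const { createClient } = require('redis');

const redis = createClient(); // call redis.connect() once at startup

// Mark the user's operation as in progress. NX = set only if absent;
// PX = expire after 30s, which doubles as the recovery mechanism if
// a handler crashes without releasing the flag.
async function tryAcquire(userId) {
  const reply = await redis.set('inflight:' + userId, '1', { NX: true, PX: 30000 });
  return reply === 'OK';
}

async function release(userId) {
  await redis.del('inflight:' + userId);
}

module.exports = { redis, tryAcquire, release };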
You can do this by using a semaphore. The semaphore is per client: each client has only one semaphore. While a request is being processed, the locking mechanism stops the server from accepting another request from that client, and after responding, the lock is released.
Demo:
const semaphore = require('semaphore');
let clientSemaphores = {};

var server = require('http').createServer(function(req, res) {
  var client = req.url.split("/")[1]; //client id to specify
  console.log(client, " request received");
  if (!clientSemaphores[client] || clientSemaphores[client].current < clientSemaphores[client].capacity) {
    clientSemaphores[client] = clientSemaphores[client] || semaphore(1);
    clientSemaphores[client].take(function() {
      setTimeout(() => {
        res.write(client + " Then good day, madam!\n");
        res.end(client + " We hope to see you soon for tea.");
        clientSemaphores[client].leave();
      }, 5000);
    });
  } else {
    res.end(client + " Request already processing... please wait...");
  }
});

server.listen(8000);
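To try the demo, hit the same client id twice in quick succession, e.g. open http://localhost:8000/alice in two tabs: the first request responds after 5 seconds, while the second is immediately refused with the "Request already processing" message; once leave() runs, a new request from that client is accepted again.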
OR
HTTP Pipelining
Persistent HTTP allows us to reuse an existing connection between multiple application requests, but it implies a strict first in, first out (FIFO) queuing order on the client: dispatch request, wait for the full response, dispatch next request from the client queue. HTTP pipelining is a small but important optimization to this workflow, which allows us to relocate the FIFO queue from the client (request queuing) to the server (response queuing).
Reference: HTTP Pipelining

Efficient HTTP shutdown with keepalives?

This Node.js server will shutdown cleanly on a Ctrl+C when all connections are closed.
var http = require('http');

var app = http.createServer(function (req, res) {
  res.end('Hello');
});

process.on('SIGINT', function() {
  console.log('Closing...');
  app.close(function () {
    console.log('Closed.');
    process.exit();
  });
});

app.listen(3000);
The problem with this is that it includes keepalive connections. If you open a tab to this app in Chrome and then try to Ctrl+C it, it won't shut down for about 2 minutes, until Chrome finally releases the connection.
Is there a clean way of detecting when there are no more HTTP requests, even if some connections are still open?
By default there's no socket timeout, which means that connections stay open until the client closes them. If you want to set a timeout, use this function: socket.setTimeout.
If you try to close the server, you simply can't while there are active connections, so a graceful shutdown will hang. The only way is to set a timeout and, when it expires, kill the app.
If you have workers it's not as simple as killing the app with process.exit(), so I made a module that does exactly what you're asking: grace.
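A sketch of that timeout idea, using http.Server#setTimeout (the 5-second value is arbitrary):

// destroy sockets that stay idle for 5s, so keepalive connections
// stop holding the server open; http.Server destroys timed-out
// sockets when no 'timeout' listener is registered
app.setTimeout(5000);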
You can hack some request tracking with the finish event on response:
var reqCount = 0;

var app = http.createServer(function (req, res) {
  reqCount++;
  res.on('finish', function() { reqCount--; });
  res.end('Hello');
});
Allowing you to check whether reqCount is zero when you come to close the server.
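For instance, a sketch of the shutdown side; the socket set and the polling interval are illustrative, not part of the original answer:

var sockets = new Set();
app.on('connection', function (socket) {
  sockets.add(socket);
  socket.on('close', function () { sockets.delete(socket); });
});

process.on('SIGINT', function () {
  console.log('Closing...');
  app.close(function () {
    console.log('Closed.');
    process.exit();
  });
  // once no requests are in flight, destroy the remaining keepalive
  // sockets so close() can fire its callback
  var timer = setInterval(function () {
    if (reqCount === 0) {
      clearInterval(timer);
      sockets.forEach(function (socket) { socket.destroy(); });
    }
  }, 100);
});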
The correct thing to do, though, is probably to not care about the old server and just start a new one. Usually the restart is to get new code, so you can start a fresh process without waiting for the old one to end, optionally using the child_process module to have a top-level script managing the whole thing. Or even use the cluster module, allowing you to start the new process before you've even shut down the old one (since cluster manages balancing traffic between its child instances).
One thing I haven't actually tested very far is whether it's guaranteed safe to start a new server as soon as server.close() returns. If not, then the new server could potentially fail to bind. There's an example in the server.listen() docs about how to handle such an EADDRINUSE error.
