Node.js Kue - Pause workers in app with multiple instances

When using Kue in an app that has multiple instances (say, multiple Docker containers) all sharing the same Redis database, if you pause a worker, do you need to pause that worker on every instance, or is that handled at the Redis level and hence handled for you?
https://github.com/Automattic/kue#pause-processing
queue.process('email', function(job, ctx, done){
  ctx.pause( 5000, function(err){
    console.log("Worker is paused... ");
    setTimeout( function(){ ctx.resume(); }, 10000 );
  });
});
It'd be great if we didn't have to use any instance-to-instance communication to get all workers to pause.

From reading through the code, it appears that pausing a worker simply cleans up the Redis client for that worker and stops listening for events from Redis.
Here is the relevant code.
https://github.com/Automattic/kue/blob/master/lib/queue/worker.js#L288-L302
So there is nothing that would prevent another instance of the same application from continuing to process messages. Therefore, if you have multiple instances of the same application running, and you want to pause a worker, you must signal to every instance of the application that a pause event has occurred.
We're using the pub/sub functionality in Redis to signal to all workers that they should pause/resume.
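For illustration, here is a minimal sketch of that approach, assuming the node_redis v3 callback API; the 'kue:control' channel name and the 'pause'/'resume' message strings are made up for this example:

var kue = require('kue');
var redis = require('redis');

var queue = kue.createQueue();
var subscriber = redis.createClient();

var workerCtx; // captured from the process callback once the first job arrives

queue.process('email', function (job, ctx, done) {
  workerCtx = ctx; // keep a reference so the subscriber can pause/resume this worker
  // ... process the job ...
  done();
});

subscriber.subscribe('kue:control'); // hypothetical channel name
subscriber.on('message', function (channel, message) {
  if (!workerCtx) return; // no job processed yet, nothing to pause
  if (message === 'pause') {
    workerCtx.pause(5000, function (err) {
      if (err) console.error(err);
    });
  } else if (message === 'resume') {
    workerCtx.resume();
  }
});

// Any instance (or an admin script) can then broadcast:
//   redis.createClient().publish('kue:control', 'pause');

Note that the worker context is only captured once the first job arrives, so a pause published before then is a no-op; a real implementation would need to handle that window.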

Related

In Node.js, what happens if a new request arrives while the event loop is already busy processing a request?

I have this file named index.js:
const express = require('express')
const app = express()
const port = 3000

app.get('/home', (req, res) => {
  res.send('Hello World!')
})

app.get('/route1', (req, res) => {
  var num = 0;
  for (var i = 0; i < 1000000; i++) {
    num = num + 1;
    console.log(num);
  }
  res.send('This is Route1 ' + num)
})

app.listen(port, () => console.log(`Example app listening on port ${port}!`))
I first call the endpoint /route1 and then immediately the endpoint /home. The /route1 handler has a for loop and takes some time to finish; only then does /home run and finish. My question is: while the app was busy processing /route1, how was the request to /home handled, given that Node.js is single threaded?
The incoming request will be queued in the nodejs event queue until nodejs gets a chance to process the next event (when your long running event handler is done).
Since nodejs is an event-driven system, it gets an event from the event queue, runs that event's callback until completion, then gets the next event, runs it to completion and so on. The internals of nodejs add things that are waiting to be run to the event queue so they are queued up ready for the next cycle of the event loop.
Depending upon the internals of how nodejs does networking, the incoming request might be queued in the OS for a bit and then later moved to the event queue until nodejs gets a chance to serve that event.
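Here is a tiny runnable demo of that queueing, with a two-second busy-wait standing in for your long-running handler:

setTimeout(() => console.log('queued event runs now'), 0);

const start = Date.now();
while (Date.now() - start < 2000) { /* simulate a long synchronous handler */ }
console.log('blocking work done after', Date.now() - start, 'ms');

// Output order: "blocking work done..." first, then "queued event runs now",
// because the timer callback sits in the event queue until the loop finishes.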
My question is while app was busy processing /route1, how was the request to /home handled, given node js is single threaded?
Keep in mind that Node.js runs your JavaScript as a single thread (though we do now have Worker Threads if you want them), but it does use threads internally to manage things like file I/O and some other types of asynchronous operations. It does not need threads for networking, though; that is managed with actual asynchronous interfaces from the OS.
Node.js has an event loop, and the event loop is what allows Node.js to perform non-blocking I/O operations. Each event loop iteration is called a tick, and each tick passes through several phases.
First is the timers phase; since there are no timers in your script, the event loop moves on.
When you hit /route1, the request is placed in a FIFO event queue, and the event loop proceeds to the poll phase.
The poll phase processes pending I/O, which here is the request to /route1. The event loop checks whether any client request is waiting in the event queue; if none is, it waits indefinitely for incoming requests.
Meanwhile the next request, /home, arrives in the FIFO queue.
FIFO means first in, first out: /route1 is executed first, and only after it completes does /home run.
A Node.js application runs on a single thread, and the event loop runs on that same thread.
Node.js internally uses the libuv library, which is responsible for operating-system-level tasks: asynchronous I/O, networking, and concurrency.
Node has an internal thread pool from which a thread is assigned when blocking work (file I/O, DNS lookups, some crypto) is submitted; otherwise the request is processed on the main thread and the response sent back directly. If the thread pool is busy, the work waits in a queue. Refer to How, in general, does Node.js handle 10,000 concurrent requests? for more detailed answers.
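As a sketch of the Worker Threads approach mentioned above, the CPU-bound loop from the question can be moved off the main thread so that /home stays responsive. The single-file layout and port 3000 are just for illustration:

const express = require('express');
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const app = express();

  app.get('/home', (req, res) => res.send('Hello World!'));

  app.get('/route1', (req, res) => {
    // Run this same file as a worker; the loop below executes off the main thread.
    const worker = new Worker(__filename);
    worker.once('message', (num) => res.send('This is Route1 ' + num));
    worker.once('error', (err) => res.status(500).send(err.message));
  });

  app.listen(3000, () => console.log('Example app listening on port 3000!'));
} else {
  // Worker side: the CPU-bound loop no longer blocks /home.
  let num = 0;
  for (let i = 0; i < 1000000; i++) num += 1;
  parentPort.postMessage(num);
}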

Handling cluster modules in Node.js

I'm trying to learn the cluster module, and I came across this piece of code I just can't get my mind around. First it forks child processes with the child_process module, and then it uses cluster.fork().process. I've used the cluster module and child_process separately in an Express web server; I know the cluster module works as a load balancer.
But I can't get the idea of using them together. And there's also something else: cluster listens to those worker processes, and whenever a disconnect (and possibly exit) event is emitted to the master, it re-forks a process. But here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork email? I mean, shouldn't it pass an id, which I can't see in this code?
var cluster = require("cluster");
const numCPUs = require("os").cpus().length;

if (cluster.isMaster) {
  // fork child process for notif/sms/email worker
  global.smsWorker = require("child_process").fork("./smsWorker");
  global.emailWorker = require("child_process").fork("./emailWorker");
  global.notifiWorker = require("child_process").fork("./notifWorker");

  // fork application workers
  for (var i = 0; i < numCPUs; i++) {
    var worker = cluster.fork().process;
    console.log("worker started. process id %s", worker.pid);
  }

  // if application worker gets disconnected, start new one.
  cluster.on("disconnect", function(worker) {
    console.error("Worker disconnect: " + worker.id);
    var newWorker = cluster.fork().process;
    console.log("Worker started. Process id %s", newWorker.pid);
  });
} else {
  callback(cluster);
}
but here is the question: let's assume the email worker crashes and the master is going to fork it again. How does it know it should fork email? I mean, shouldn't it pass an id, which I can't see in this code?
The disconnect event it is listening to comes from the cluster-specific code, not a generic process listener. So, that disconnect event only fires when one of the cluster child processes exits. If you have some other child processes processing email, then when one of those crashes, it would not trigger this disconnect event. You would have to monitor that child_process yourself separately from within the code that started it.
You can see where the monitoring is for the cluster.on('disconnect', ...) event here in the cluster source code.
Also, I should mention that the cluster module is for when you want pure horizontal scaling, where all the processes share the exact same work, each taking new incoming connections in turn. The cluster module is not for firing up a specific worker to carry out a specific task. For that, you would use either the Worker Threads module (to fire up a thread) or the child_process module (to fire up a new child process with a specific purpose).
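As a sketch of that separate monitoring, assuming the ./emailWorker script from the question: because the master knows which script each child_process runs, it can restart exactly the one that died.

const { fork } = require('child_process');

function startEmailWorker() {
  const worker = fork('./emailWorker');
  console.log('email worker started, pid %s', worker.pid);

  // We know which script this child runs, so we know exactly what to restart.
  worker.on('exit', (code, signal) => {
    console.error('email worker exited (code %s, signal %s), restarting', code, signal);
    global.emailWorker = startEmailWorker();
  });

  return worker;
}

global.emailWorker = startEmailWorker();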

Node cluster: Ensure request completes before disconnecting worker

I am running a cluster of workers with Node.js. There is a memory leak which, due to an old and unfamiliar codebase, I have decided to work around by periodically killing workers and replacing them with new ones rather than diagnosing it. As such I'm essentially using
setTimeout(() => {worker.disconnect();}, INTERVAL);
when we spawn a worker.
However, I want to make sure that when a worker is killed, it completes any request it is currently processing before being disconnected, so that requests aren't dropped. From experimenting with the library, calling worker.disconnect() drops a currently-processing request, causing an "empty reply from server" error. I would rather not manually implement logic to detect whether a server is currently processing a request (e.g. by maintaining a set of active requests), due to edge cases. Is there a "standard" way of telling a cluster worker to "wait until the current request completes, and then exit"?
So I have discovered something which seems to work, as far as I can tell. The strategy is not to have the master shut the worker down, but instead to let the worker shut itself down after it has closed its server. The master sends it a message, which the worker responds to. Something like (pseudocode):
if (cluster.isMaster) {
  var worker = cluster.fork();
  // in 10 seconds, tell the worker to shut down
  setTimeout(() => { worker.send('shutdown'); }, 10000);
} else {
  var server = createServer(); // whatever server setup
  server.listen(PORT);
  process.on('message', (msg) => {
    if (msg === 'shutdown') {
      // disconnect from the cluster after the server closes
      server.close(() => {
        cluster.worker.disconnect();
      });
    }
  });
}
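Here is a runnable version of the same idea, using Node's http module so that createServer and PORT are concrete; the two-second response delay and port 3000 are just to make the shutdown observable. One caveat: server.close() only fires its callback once existing connections have ended, so keep-alive connections can delay the disconnect.

const cluster = require('cluster');
const http = require('http');

if (cluster.isMaster) {
  const worker = cluster.fork();
  // in 10 seconds, tell the worker to shut down
  setTimeout(() => worker.send('shutdown'), 10000);
} else {
  const server = http.createServer((req, res) => {
    // simulate a slow request so a shutdown can arrive mid-flight
    setTimeout(() => res.end('slow response\n'), 2000);
  });
  server.listen(3000);

  process.on('message', (msg) => {
    if (msg === 'shutdown') {
      // stop accepting new connections; the callback fires once
      // existing connections have finished
      server.close(() => cluster.worker.disconnect());
    }
  });
}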

Are Node.js worker threads and RabbitMQ consumers the same?

This is to gain more knowledge on how RabbitMQ queuing and Node.js master/worker processes combine.
Node.js master/worker processes are different from RabbitMQ queuing: RabbitMQ provides the facility to store tasks in a queue so that they can be consumed by a worker process when one is free. Combining the two has very specific use cases and is generally not needed.
A couple of things are required for a combined implementation, mainly the node-amqp client and cluster. Cluster is a built-in Node module that provides the API for master and worker processes. Without RabbitMQ you would generally distribute tasks from one master process, i.e., the master sends tasks to the worker processes, and the workers listen to receive them.
Now, since you want to use RabbitMQ, you first bind a queue to an exchange and subscribe to it to listen for tasks; when you receive a task, you pass it to a worker process. Below is a small snippet to give the gist of the explanation.
connection.on('ready', function() {
  connection.exchange('exchange-name', function(exchange) {
    _exchange = exchange;
    connection.queue('queue-name', function(queue) {
      _queue = queue;
      // Bind to the exchange
      queue.bind('exchange-name', 'routing-key');
      // Subscribe to the queue
      queue
        .subscribe(function(message) {
          // When receiving the message call the worker thread to complete the task
          console.log('Got message', message);
          queue.shift(false, false);
        })
        .addCallback(function(res) {
          // Hold on to the consumer tag so we can unsubscribe later
          _consumerTag = res.consumerTag;
        });
    });
  });
});
Message exchange between master and worker: instead of sending a message directly to the master, the worker should put its success message on a queue. The master listens to that queue to receive the acknowledgements and success messages.
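As a sketch of that hand-off, assume the subscribe callback from the snippet above calls a dispatch() helper in place of console.log; the round-robin strategy and the 'task'/'task:done' message shapes are made up for this example:

const cluster = require('cluster');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  const workers = [];
  for (let i = 0; i < numCPUs; i++) workers.push(cluster.fork());

  let next = 0;
  // Call this from the queue.subscribe() handler above.
  function dispatch(message) {
    workers[next].send({ type: 'task', payload: message });
    next = (next + 1) % workers.length;
  }

  // Workers report completion back to the master (or publish to a result queue).
  cluster.on('message', (worker, msg) => {
    if (msg && msg.type === 'task:done') {
      console.log('worker %s finished a task', worker.id);
    }
  });
} else {
  process.on('message', (msg) => {
    if (msg && msg.type === 'task') {
      // ... do the actual work here ...
      process.send({ type: 'task:done' });
    }
  });
}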

How to wait for a Redis connection?

I'm currently trying to use Node.js Kue for processing jobs in a queue, but I believe I'm not doing it right.
Indeed, the way I'm working now, I have two different services (which in this case I'm running with Docker Compose): a Web API built with Express, which sends jobs to the queue, and a processing module. The issue here is with the processing module.
I've coded it as follows:
var kue = require('kue');
var config = require('./config');

var queue = kue.createQueue({
  prefix: config.redis.queuePrefix,
  redis: {
    port: config.redis.port,
    host: config.redis.host
  }
});

queue.process('jobType', function (job, done) {
  // do processing here...
});
When we run this with Node, it sits there waiting for things to be placed on the queue to do the processing.
There are two issues, however:
It requires Redis to be available before the module starts; if we run it without Redis already available, it crashes because the host is not accessible, and the process ends.
If Redis suddenly becomes unavailable, the processing module also crashes because it cannot establish the connection, and the process is killed.
How can I avoid these problems?
My guess is that I should somehow make the code "wait" for Redis, but I have no idea how to do this.
How can this be done in this case?
You can use a promise to wait until Redis is loaded, then run your module:

loadRedis().then(() => {
  // load your module
});

Or you can use a generator (driven by a runner such as co) to "pause" until Redis is loaded:

function* init() {
  const redisLoaded = yield loadRedis();
  // load your module
}

(loadRedis here stands for whatever function establishes your Redis connection and returns a promise.)
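Neither snippet addresses the crashes themselves, though. A sketch of a more direct fix, based on two documented hooks: Kue's queue-level 'error' event, which stops a lost connection from killing the process, and node_redis's retry_strategy option combined with Kue's createClientFactory, which keep reconnecting. The back-off delays are illustrative:

var kue = require('kue');
var redis = require('redis');
var config = require('./config');

var queue = kue.createQueue({
  prefix: config.redis.queuePrefix,
  redis: {
    createClientFactory: function () {
      return redis.createClient({
        port: config.redis.port,
        host: config.redis.host,
        retry_strategy: function (options) {
          // back off a little more on each attempt, capped at 3 seconds
          return Math.min(options.attempt * 100, 3000);
        }
      });
    }
  }
});

// Without this handler, a lost connection raises an uncaught exception
// and kills the process.
queue.on('error', function (err) {
  console.error('Queue error, will keep retrying:', err.message);
});

queue.process('jobType', function (job, done) {
  // do processing here...
  done();
});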
