Close a Redis connection based on the Node.js event loop

Issue / how I ran into it
I'm writing a batch-processing Lambda function in Node.js that makes calls to Redis in two different places. Some batch items may never reach the second Redis call. This all happens asynchronously, so if I close the connection as soon as the batch queue is empty, any future Redis calls would fail. How do I close the connection?
What I've tried
process.on('beforeExit', callback)
-- Doesn't get called, as the event loop still contains the Redis connection.
client.unref()
-- Closes the connection if no commands are pending, but doesn't handle future calls.
client.on('idle', callback)
-- Works, but is deprecated and may still miss future calls.
What I'm currently doing
Once the batch queue is empty, I call:
intervalId = setInterval(closeRedis, 1000);
I close the Redis connection and clear the interval in the callback after a timeout:
function closeRedis() {
  redis.client('list', (err, result) => {
    var idle = parseClientList(result, 'idle');
    if (idle > timeout) {
      redis.quit();
      clearInterval(intervalId);
    }
  });
}
This approach mostly works, but since it only checks an idle timeout, there is still a chance that other work is in progress and a Redis call will be made later. I'd like to close the connection only when the idle connection is the last thing keeping the event loop alive. Is there a way to do this?

I ended up using process._getActiveHandles(). Once the batch queue is empty, I set an interval to check every half second whether only the minimum number of handles remains. If so, I unref the Redis client.
redisIntervalId = setInterval(closeRedis, 500);

// Close the Redis client connection if it's the last required process
function closeRedis() {
  // 2 core processes, plus Redis and the interval
  var minimumProcesses = 4;
  if (process._getActiveHandles().length > minimumProcesses)
    return;
  clearInterval(redisIntervalId);
  redisClient.unref();
}
The advantage of this approach is that I can be sure the Redis client will not close the connection while other important work is running, and that the client won't keep the event loop alive after all the important work has completed.
The downside is that _getActiveHandles() is an undocumented Node function, so it may change or be removed later. Also, unref() is experimental and doesn't consider some Redis commands when closing the connection.
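
If relying on Node internals feels too fragile, an alternative is to count outstanding Redis users yourself and quit once the count drops to zero. A minimal sketch (batchQueueIsEmpty and the acquire/release bookkeeping are assumptions about the batch code, not part of the original):

var pendingRedisUsers = 0;

function acquireRedis() {
  pendingRedisUsers++;
}

function releaseRedis() {
  pendingRedisUsers--;
  // Quit only when no batch item can still issue a Redis command
  if (pendingRedisUsers === 0 && batchQueueIsEmpty()) {
    redisClient.quit(); // QUIT waits for pending replies, then closes
  }
}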

Related

How should a NodeJs "graceful shutdown" handle setInterval?

I have a Node backend taking HTTP requests using Express. I am shutting down gracefully like this:
process.on('SIGINT', function() {
  console.log("SIGINT signal received.");
  server.close(function(err) {
    if (err) {
      console.error(err);
      process.exit(1);
    }
    // Stop reoccurring tasks
    // Close database connection
    process.exit(0);
  });
});
What I have is working fine, but I am concerned about my "Stop reoccurring tasks" step. Elsewhere in my code, I call a function that looks like this:
export async function launchSectionFinalizer() {
  finalizeSections();
  // 1 hr * 60 min/hr * 60 s/min * 1,000 ms/s = 3,600,000 ms
  return setInterval(finalizeSections, 3_600_000);
}
Where finalizeSections is an async function that performs a series of database operations (a Postgres database).
My question is about the nature and behavior of setInterval. How can I make sure that finalizeSections isn't in the middle of its execution when I receive SIGINT? I'm worried that if my program receives SIGINT and closes the server at the wrong time, it could catch finalizeSections in the middle of its operations. If that happens, I could end up with those database operations partially complete (i.e. if I execute a series of SQL commands one after another, insert1, insert2, and insert3, I do not want to execute 1 and 2 without also executing 3).
I have done some googling and read that Node will wait for all of its processes and events to complete before closing. Would that include waiting for my call to finalizeSections to complete?
Also, I am aware of clearInterval, but I am not sure if that function only stops the timer, or if it will also cause Node to wait for finalizeSections to complete.
Calling clearInterval will only cancel the timer and not wait for finalizeSections to finish.
Because your graceful shutdown calls process.exit(0), it will not wait for pending asynchronous tasks to finish and will exit immediately:
Calling process.exit() will force the process to exit as quickly as possible even if there are still asynchronous operations pending that have not yet completed fully, including I/O operations to process.stdout and process.stderr
One way to solve this without using any packages is to save a reference to the promise returned by finalizeSections() and the intervalId returned by setInterval():
intervalId = setInterval(() => {
  finalizeSectionsPromise = finalizeSections();
}, 3_600_000);
Then, in the shutdown code:
clearInterval(intervalId);
if (finalizeSectionsPromise) {
  await finalizeSectionsPromise;
}
...
process.exit(0);
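
Put together with the SIGINT handler from the question, the whole thing might look like the sketch below (server and finalizeSections are the names from the question; error handling for the awaited promise is left out):

let intervalId;
let finalizeSectionsPromise = null;

export function launchSectionFinalizer() {
  finalizeSectionsPromise = finalizeSections();
  intervalId = setInterval(() => {
    finalizeSectionsPromise = finalizeSections();
  }, 3_600_000);
}

process.on('SIGINT', function() {
  console.log("SIGINT signal received.");
  server.close(async (err) => {
    if (err) {
      console.error(err);
      process.exit(1);
    }
    clearInterval(intervalId);       // stop scheduling new runs
    if (finalizeSectionsPromise) {
      await finalizeSectionsPromise; // let any in-flight run finish
    }
    // Close database connection here
    process.exit(0);
  });
});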
If you are able to use other packages, I would use a job scheduling library like Agenda or Bull, or even cron jobs:
https://github.com/OptimalBits/bull
https://github.com/agenda/agenda
Also take a look at stoppable or terminus to gracefully shut down servers without killing requests that are in flight:
https://www.npmjs.com/package/stoppable
https://github.com/godaddy/terminus

Are Database connections asynchronous in Node?

Node is single-threaded, but there are a lot of functions (in modules like http and fs) that allow us to run a background task, with the event loop taking care of executing the callbacks.
However, is this true for a database connection?
Let's say I have the following code.
const mysql = require('mysql');

function callDatabase(id) {
  var result;
  var connection = mysql.createConnection({
    host     : '192.168.1.14',
    user     : 'root',
    password : '',
    database : 'test'
  });
  connection.connect();
  var queryString = 'SELECT name FROM test WHERE id = 1';
  connection.query(queryString, function(err, rows, fields) {
    if (err) throw err;
    for (var i in rows) {
      result = rows[i].name;
    }
    connection.end();
    return result;
  });
}
Do mysql.createConnection, connection.connect, connection.query, and connection.end spin up a new thread to execute in the background, leaving Node to run the remaining synchronous code?
If so, in which queue will the callback be enqueued, and how should I write this sort of code so that a background task is initiated?
Anything that may block (file system operations, network connections, etc.) is generally asynchronous in Node, in order to avoid blocking the main thread. That these functions take a parameter for a callback function is a sure hint that you have asynchronous operations (or "background tasks") in progress.
You don't show it in your sample code, but connect() and end() do take callback functions so you know when a connection is actually made or ends. It looks like the mysql library, however, also maintains an internal queue to make sure you can't attempt a query until a connection has been made and that only one operation at a time can be executed.
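For example, the callback forms look roughly like this (a sketch; both callbacks are part of the mysql module's documented API):

connection.connect(function(err) {
  if (err) {
    console.error('connect failed:', err);
    return;
  }
  // Connection established; queries issued earlier were queued until now.
});

connection.end(function(err) {
  // All queued queries have finished and the connection is closed.
});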
Note that createConnection() does not have a callback function. All it does is create a new data structure (connection) that gets used. It doesn't do any I/O itself, so doesn't need to run asynchronously.
Also note that you don't generally "spin up" your own threads. Node takes care of thread management for you, largely by running your code on the single main thread, and hides how the underlying threads work from most developers. You typically hear that Node is "single-threaded", and you should treat it that way.
Modern Node code makes extensive use of async/await and Promises to do this sort of thing. Slightly older code uses callback functions. Even older code uses Node events. In reality - if you dig far enough down, they're all using events and possibly presenting the simplified (more modern) interfaces.
The mysql module appears to date from the "callback" era and hasn't yet been updated for Promises/async/await. Under the covers, as noted, it uses Node events to track network (or unix domain socket) connections and transfers.
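
If you want async/await ergonomics on top of a callback-era API like this, util.promisify can bridge the gap. A minimal sketch (getName and the connection options are illustrative, not from the question):

const util = require('util');
const mysql = require('mysql');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  database: 'test'
});

// Bind so the promisified query() keeps the connection as its `this`
const query = util.promisify(connection.query).bind(connection);

async function getName(id) {
  const rows = await query('SELECT name FROM test WHERE id = ?', [id]);
  return rows.length ? rows[0].name : null;
}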

NodeJs: How to handle a very high amount of timers?

I am using socket.io to send packets via WebSockets. They seem to disappear from time to time, so I have to implement some sort of acknowledgement system. My idea was to immediately respond to a packet with an ACK packet. If the server does not receive this ACK within a given time, it will resend the packet (up to 3 times, then disconnect the socket).
My first thought was to start a timer (setTimeout) after sending a packet. If the timeout fires, the packet has to be sent again. If the ACK arrives, the timeout is cleared. Quite easy and short.
var io = require('socket.io').listen(80);

// ... connection handling ...

function sendData(someData, socket) {
  // TODO: Some kind of counter to stop after 3 tries.
  socket.emit("someEvent", someData);
  var timeout = setTimeout(function() { sendData(someData, socket); }, 2000);
  socket.on("ack", function() {
    // Everything went ok.
    clearTimeout(timeout);
  });
}
But I will have 1k-3k clients connected, with a lot of traffic. I can't imagine that 10k timers running at the same time are manageable by Node.js. Even worse: I read that Node.js will not fire the event if there is no time for it.
How to implement a good working and efficient packet acknowledge system?
If socket.io is not reliable enough for you, you might want to consider implementing your own websocket interface instead of adding a layer on top of socket.io. But to answer your question, I don't think running 10k timers is going to be a big deal. For example, the following code ran in under 3 seconds for me and printed out the expected result of 100000:
var x = 0;
for (var i = 0; i < 100000; i++) {
  setTimeout(function() { x++; }, 1000);
}
setTimeout(function() { console.log(x); }, 2000);
There isn't actually that much overhead for a timeout; it essentially just gets put in a queue until it's time to execute it.
I read that NodeJS will not fire the event if there is no time for it.
This is a bit of an exaggeration; Node.js timers are reliable. A timer set by setTimeout will fire at some point. It may be delayed if the process is busy at the exact scheduled time, but the callback will be called eventually.
Quoted from Node.js docs for setTimeout:
The callback will likely not be invoked in precisely delay milliseconds. Node.js makes no guarantees about the exact timing of when callbacks will fire, nor of their ordering. The callback will be called as close as possible to the time specified.
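
As for the TODO in the question's code, the retry cap can ride along as an extra argument. A sketch (the attempt parameter is new; the give-up-after-3-tries policy is the question's own):

function sendData(someData, socket, attempt) {
  attempt = attempt || 0;
  if (attempt >= 3) {
    socket.disconnect(); // give up after 3 failed tries
    return;
  }
  socket.emit("someEvent", someData);
  var timeout = setTimeout(function() {
    sendData(someData, socket, attempt + 1);
  }, 2000);
  socket.once("ack", function() { // once() avoids piling up listeners
    clearTimeout(timeout);
  });
}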

Asynchronous calls using postgres as an example in NodeJS

When implementing this code (example taken directly from https://github.com/brianc/node-postgres):
var pg = require('pg');
var conString = "tcp://postgres:1234@localhost/postgres";

pg.connect(conString, function(err, client) {
  client.query("SELECT NOW() as when", function(err, result) {
    console.log("Row count: %d", result.rows.length); // 1
    console.log("Current year: %d", result.rows[0].when.getFullYear());
    // Code halts here
  });
});
After the last console.log, Node hangs. I think this is because of the asynchronous nature, and I suspect that at this point one should call a callback function.
I have two questions:
Is my thinking correct?
If so, how do the mechanics work? I know Node.js uses an event loop, but what is making the event loop halt at this point?
It appears to hang because the connection to Postgres is still open. Until it's closed, or "ended"...

client.end(); // add this at the point marked "Code halts here"

...Node will continue to idle, waiting for another event to be added to the queue.
Not quite. This is a detail of node-postgres and its dependencies, not of Node or of its "asynchronous nature" in general.
The idling is due to and documented for the generic-pool module that node-postgres uses:
If you are shutting down a long-lived process, you may notice that node fails to exit for 30 seconds or so. This is a side effect of the idleTimeoutMillis behavior -- the pool has a setTimeout() call registered that is in the event loop queue, so node won't terminate until all resources have timed out, and the pool stops trying to manage them.
And, as it explains under Draining:
If you know you would like to terminate all the resources in your pool before their timeouts have been reached, you can use destroyAllNow() in conjunction with drain():
pool.drain(function() {
  pool.destroyAllNow();
});
One side-effect of calling drain() is that subsequent calls to acquire() will throw an Error.
Which is what pg.end() does, and it can certainly be done if your intention is to exit at the end of a serial application, such as unit testing or your given snippet.
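
Applied to the question's snippet, that means ending the pool once the work is done. A sketch using the same legacy callback API:

var pg = require('pg');
var conString = "tcp://postgres:1234@localhost/postgres";

pg.connect(conString, function(err, client) {
  client.query("SELECT NOW() as when", function(err, result) {
    console.log("Current year: %d", result.rows[0].when.getFullYear());
    pg.end(); // drain the pool so the process can exit
  });
});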

Every time a new socket connects, my process.nextTick() is disrupted

For some reason, every time a new socket connects to the server, my emitDraw function stalls midway through a draw (the process.nextTick() call cycle gets broken unexpectedly). Is there any way to keep my emitDraw function running while also accepting new connections?
io.sockets.on('connection', function(socket) {
  socket.on('drawing', function() {
    // some code
    emitDraw(socket);
  });
});

function emitDraw(socket) {
  // some code
  process.nextTick(function() { emitDraw(socket); });
}
Thanks
process.nextTick() merely pushes code off the current executing stack to the top of the next one. Callbacks for events fired between these two stack executions are therefore allowed to run on the next stack.
In the current case, a normal executing stack only has emitDraw() executing, and the call to process.nextTick() allows events (notably the socket connection) to fire and execute on the next stack. Once the socket connection fires, the stack has two sets of code to execute: the callback associated with socket.on('connection') and emitDraw().
If emitDraw() turns unresponsive, it only means that it cannot co-execute on the current stack without performance degradation. Note also that nextTick callbacks run ahead of pending I/O callbacks, so a tight recursive process.nextTick() chain can starve I/O; setImmediate() is the variant that yields to the event loop between iterations.
Since the main Node.js process runs on a single thread, you cannot expect any better performance by keeping everything in that thread, short of increasing processing power. Alternatively, you could fork emitDraw() to a child process so that it can execute independently of the main process.
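
A minimal variant of the question's loop that yields to I/O between iterations (a sketch; "some code" stands for the question's drawing work) swaps process.nextTick for setImmediate:

function emitDraw(socket) {
  // some code (one unit of drawing work)
  setImmediate(function() { emitDraw(socket); }); // pending I/O runs first
}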
