Asynchronous calls using postgres as an example in NodeJS

When implementing this code (example taken directly from https://github.com/brianc/node-postgres):
var pg = require('pg');
var conString = "tcp://postgres:1234@localhost/postgres";
pg.connect(conString, function(err, client) {
  client.query("SELECT NOW() as when", function(err, result) {
    console.log("Row count: %d", result.rows.length); // 1
    console.log("Current year: %d", result.rows[0].when.getFullYear());
    // Code halts here
  });
});
After the last console.log, node hangs. I think this is because of its asynchronous nature, and I suspect that at this point one should call a callback function.
I have two questions:
Is my thinking correct?
If my thinking is correct, how do the mechanics work? I know NodeJS uses an event loop, but what is making this event loop halt at this point?

It appears to hang because the connection to Postgres is still open. Until it's closed, or "ended"...
client.end(); // add this at the point marked "//Code halts here" in the snippet
Until then, Node will continue to sit idle, waiting for another event to be added to the queue.
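Applied to the question's snippet, a minimal sketch of where that call goes (inside the query callback, once the result has been used):
pg.connect(conString, function(err, client) {
  client.query("SELECT NOW() as when", function(err, result) {
    console.log("Current year: %d", result.rows[0].when.getFullYear());
    client.end(); // nothing is left for the event loop to wait on, so node can exit
  });
});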
Not quite. This is a detail of node-postgres and its dependencies, not of Node or of its "asynchronous nature" in general.
The idling is due to and documented for the generic-pool module that node-postgres uses:
If you are shutting down a long-lived process, you may notice that node fails to exit for 30 seconds or so. This is a side effect of the idleTimeoutMillis behavior -- the pool has a setTimeout() call registered that is in the event loop queue, so node won't terminate until all resources have timed out, and the pool stops trying to manage them.
And, as it explains under Draining:
If you know you would like to terminate all the resources in your pool before their timeouts have been reached, you can use destroyAllNow() in conjunction with drain():
pool.drain(function() {
  pool.destroyAllNow();
});
One side-effect of calling drain() is that subsequent calls to acquire() will throw an Error.
Which is what pg.end() does, and it can certainly be used if your intention is to exit at the end of a serial application, such as a unit-test run or your given snippet.
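Applied to the same snippet, the pool-level equivalent would look something like this (a sketch, assuming the pooled pg.connect() API from the question):
client.query("SELECT NOW() as when", function(err, result) {
  console.log("Current year: %d", result.rows[0].when.getFullYear());
  pg.end(); // drains node-postgres's internal pool so its idle timers no longer keep node alive
});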

Related

How should a NodeJs "graceful shutdown" handle setInterval?

I have a node backend taking HTTP requests using express. I am shutting down gracefully like this:
process.on('SIGINT', function() {
  console.log("SIGINT signal received.");
  server.close(function(err) {
    if (err) {
      console.error(err)
      process.exit(1)
    }
    //Stop reoccurring tasks
    //Close database connection
    process.exit(0);
  });
  process.exit(0);
});
What I have is working fine, but I am concerned about my "Stop reoccurring tasks" step. Elsewhere in my code, I call a function that looks like this:
export async function launchSectionFinalizer() {
  finalizeSections();
  // 1 hr * 60 min/hr * 60 s/min * 1,000 ms/s = 3,600,000 ms
  return setInterval(finalizeSections, 3_600_000);
}
Where finalizeSections is an async function that performs a series of database operations (postgres database).
My question is about the nature and behavior of setInterval. How can I make sure that finalizeSections isn't in the middle of its execution when I receive SIGINT? I'm worried that if my program receives SIGINT and closes the server at the wrong time, it could catch finalizeSections in the middle of its operations. If that happens, I could end up with those database operations only partially complete (i.e. if I execute a series of SQL commands one after another, insert1, insert2, and insert3, I do not want to execute 1 and 2 without also executing 3).
I have done some googling and read something about how node will wait for all of its processes and events to complete before closing. Would that include waiting for my call to finalizeSections to complete?
Also, I am aware of clearInterval, but I am not sure if that function only stops the timer or if it will also cause node to wait for finalizeSections to complete.
Calling clearInterval will only cancel the timer and not wait for finalizeSections to finish.
Because your graceful shutdown calls process.exit(0), it will not wait for pending asynchronous tasks to finish and will exit immediately:
Calling process.exit() will force the process to exit as quickly as possible even if there are still asynchronous operations pending that have not yet completed fully, including I/O operations to process.stdout and process.stderr
One way to solve this without using any packages is to save a reference to the promise returned by finalizeSections() and the intervalId returned by setInterval():
let finalizeSectionsPromise;

const intervalId = setInterval(() => {
  finalizeSectionsPromise = finalizeSections();
}, 3_600_000);
Then, in the shutdown code:
clearInterval(intervalId);
if (finalizeSectionsPromise) {
  await finalizeSectionsPromise;
}
...
process.exit(0);
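Put together with the question's SIGINT handler, a minimal sketch (assuming intervalId and finalizeSectionsPromise are module-level variables as above, that the close callback is made async so await is allowed, and with the stray process.exit(0) after server.close() removed so the callback actually gets a chance to run):
process.on('SIGINT', function() {
  console.log("SIGINT signal received.");
  server.close(async function(err) {
    if (err) {
      console.error(err);
      process.exit(1);
    }
    // Stop reoccurring tasks: cancel the timer, then wait for any run in flight
    clearInterval(intervalId);
    if (finalizeSectionsPromise) {
      await finalizeSectionsPromise;
    }
    // Close database connection here
    process.exit(0);
  });
});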
If you are able to use other packages, I would use a job-scheduling library like Agenda or Bull, or even cron jobs:
https://github.com/OptimalBits/bull
https://github.com/agenda/agenda
Also take a look at stoppable or terminus to gracefully shut down servers without killing requests that are in flight:
https://www.npmjs.com/package/stoppable
https://github.com/godaddy/terminus

Are database connections asynchronous in Node?

Node is single-threaded, but there are a lot of functions (in modules like http and fs) that allow us to run a task in the background while the event loop takes care of executing the callbacks.
However, is this true for a database connection?
Let's say I have the following code.
const mysql = require('mysql');

function callDatabase(id) {
  var result;
  var connection = mysql.createConnection({
    host     : '192.168.1.14',
    user     : 'root',
    password : '',
    database : 'test'
  });
  connection.connect();

  var queryString = 'SELECT name FROM test WHERE id = 1';
  connection.query(queryString, function(err, rows, fields) {
    if (err) throw err;
    for (var i in rows) {
      result = rows[i].name;
    }
    connection.end();
    return result;
  });
}
Do mysql.createConnection, connection.connect, connection.query, and connection.end spin up a new thread to execute in the background, leaving Node to run the remaining synchronous code?
If yes, in what queue will the callback be enqueued, and how should this sort of code be written so that a background task is initiated?
Anything that may block (file system operations, network connections, etc.) is generally asynchronous in Node, in order to avoid blocking the main thread. That these functions take a parameter for a callback function is a sure hint that you have asynchronous operations (or "background tasks") in progress.
You don't show it in your sample code, but connect() and end() do take callback functions so you know when a connection is actually made or ends. It looks like the mysql library, however, also maintains an internal queue to make sure you can't attempt a query until a connection has been made and that only one operation at a time can be executed.
Note that createConnection() does not have a callback function. All it does is create a new data structure (connection) that gets used. It doesn't do any I/O itself, so doesn't need to run asynchronously.
Also note that you don't generally "spin up" your own threads. Node takes care of this thread management for you (largely by running your Javascript on the single main thread) and, for most developers, hides how threads themselves work. You typically hear that Node is "single threaded", and you should treat it this way.
Modern Node code makes extensive use of async/await and Promises to do this sort of thing. Slightly older code uses callback functions. Even older code uses Node events. In reality - if you dig far enough down, they're all using events and possibly presenting the simplified (more modern) interfaces.
The mysql module appears to date from the "callback" era and hasn't yet been updated for Promises/async/await. Under the covers, as noted, it uses Node events to track network (or unix domain socket) connections and transfers.
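To directly answer the "how to write this sort of code" part: in the callback style, callDatabase() cannot return the result; it has to hand it to a callback of its own. A minimal sketch of that pattern (the id parameter is also passed to the query here, which the original hard-coded; treat this as an illustration rather than a drop-in fix):
const mysql = require('mysql');

function callDatabase(id, callback) {
  const connection = mysql.createConnection({
    host     : '192.168.1.14',
    user     : 'root',
    password : '',
    database : 'test'
  });
  connection.connect();

  // the ? placeholder lets the mysql driver escape the value
  connection.query('SELECT name FROM test WHERE id = ?', [id], function(err, rows) {
    connection.end();
    if (err) return callback(err);
    callback(null, rows.length ? rows[0].name : undefined);
  });
}

// usage: the result only exists inside the callback
callDatabase(1, function(err, name) {
  if (err) throw err;
  console.log(name);
});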

AWS Lambda times out after running successfully

I created a Node.js Lambda function for AWS using the Serverless framework that increments different counters in a Postgres database based on event parameters. The function runs without any errors when invoked with serverless invoke local and works as expected; however, when invoked from Java, it simply times out instead of finishing and returning.
I've tried several things, including waiting for the Postgres pool to close, increasing the timeout, returning with the callback function (which I think is good practice anyway, as it makes it clearer that the function ends there), and using promise chains instead of async-await, with no luck. The real question is whether this is just how it works and I always have to set context.callbackWaitsForEmptyEventLoop = false, or whether there is a more elegant solution. I even tried the why-is-node-running package, and it says that 4 handles are keeping the process running: a TCPWRAP, a Timeout, and two TickObjects. I'm almost sure that node-postgres is causing this, as I have created multiple Lambda functions suffering from the same issue.
// These are the last lines of the handler function
const insertQueries = [
  // Multiple queries using a node-postgres pool, e.g.
  // pool.query(...);
];

try {
  await Promise.all(insertQueries);
} catch (err) {
  return callback('Couldn\'t insert API stats: ' + err);
}

return callback(null, 'API stats inserted successfully!');
The AWS Java SDK only prints a debug message telling me that the task timed out after 10.01 seconds (serverless.yml has the timeout set to 10 seconds).
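For reference, the setting mentioned in the question is a property on the Lambda context object rather than a function. A minimal sketch of setting it at the top of the handler (whether this or explicitly ending the pg pool before returning is the "more elegant" option is exactly what the question is asking):
module.exports.handler = (event, context, callback) => {
  // tell the runtime to return as soon as the callback fires, even if
  // node-postgres still has sockets or timers sitting in the event loop
  context.callbackWaitsForEmptyEventLoop = false;

  // ... the queries and callback calls shown above ...
};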

Close a Redis connection based on Node.js Event loop

Issue / How I ran into it
I'm writing a batch-processing Lambda function in Node.js that makes calls to Redis in two different places. Some batch items may never reach the second Redis call. This is all happening asynchronously, so if I close the connection as soon as the batch queue is empty, any future Redis calls would fail. How do I close the connection?
What I've tried
process.on('beforeExit', callback) -- doesn't get called, as the event loop still contains the Redis connection
client.unref() -- closes the connection if no commands are pending, but doesn't handle future calls
client.on('idle', callback) -- works, but is deprecated and may still miss future calls
What I'm currently doing
Once the batch queue is empty, I call:
intervalId = setInterval(closeRedis, 1000);
I close the Redis connection and clear the interval in the callback after a timeout:
function closeRedis() {
  redis.client('list', (err, result) => {
    var idle = parseClientList(result, 'idle');
    if (idle > timeout) {
      redis.quit();
      clearInterval(intervalId);
    }
  });
}
This approach mostly works, but by just checking for a timeout there is still a chance that other work is going on and a Redis call may be made later. I'd like to close the connection only when an idle connection is all that remains in the event loop. Is there a way to do this?
I ended up using process._getActiveHandles(). Once the batch queue is empty, I set an interval to check every half second whether only the minimum number of handles remain. If so, I unref the redisClient.
redisIntervalId = setInterval(closeRedis, 500);

// close redis client connection if it's the last required process
function closeRedis() {
  // 2 core processes plus Redis and interval
  var minimumProcesses = 4;
  if (process._getActiveHandles().length > minimumProcesses)
    return;
  clearInterval(redisIntervalId);
  redisClient.unref();
}
The advantage of this approach is that I can be sure the Redis client will not close the connection while other important processes are running. I can also be sure that the client won't keep the event loop alive after all the important processes have completed.
The downside is that _getActiveHandles() is an undocumented node function, so it may change or be removed later. Also, unref() is experimental and doesn't take some Redis commands into account when closing the connection.

Single thread synchronous and asynchronous confusion

Assume makeBurger() takes 10 seconds.
In a synchronous program:
function serveBurger() {
  makeBurger();
  makeBurger();
  console.log("READY"); // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for NodeJS, let's say we make an async version, makeBurgerAsync(), which also takes 10 seconds.
function serveBurger() {
  makeBurgerAsync(function(count) {
  });
  makeBurgerAsync(function(count) {
  });
  console.log("READY"); // Assume takes 5 seconds to log.
}
Since it is a single thread, I have trouble imagining what is really going on behind the scenes.
So for sure, when the function runs, both async functions will enter the event loop and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really being done for either async function, right? Since the single thread is hogging console.log for 5 seconds.
After console.log is done, the CPU will have time to switch between both async functions so that it can run a bit of each function each time.
So according to this, async doesn't necessarily result in faster execution; it is probably slower due to the switching in the event loop? I imagine that, at the end of the day, everything is spread over a single thread, which would be the same thing as the synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are things like querying a DB, etc. Basically nodejs will just say "Hey DB, handle this for me while I do something else". However, the case I am not understanding is a self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
  var count = 0;
  count++; // 1 time
  ...
  count++; // 999999 times
  return count;
}

function makeBurgerAsync(callback) {
  var count = 0;
  count++; // 1 time
  ...
  count++; // 999999 times
  callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue, and when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
const fs = require('fs');
const http = require('http');

function doSomething() {
  fs.readFile(fname, function(err, data) {
    console.log("file read");
  });
  setTimeout(function() {
    console.log("timer fired");
  }, 100);
  http.get(someUrl, function(response) {
    console.log("http get finished");
  });
  console.log("READY");
}

doSomething();
console.log("AFTER");
Here's what happens step-by-step:
fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in that pool and runs there, separately from the main Javascript thread.
Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
http.get() is called. This will send the desired http request and then immediately return to further execution.
console.log("READY") will run.
The three asynchronous operations will complete in an indeterminate order (whichever one completes its operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using its own thread, it will insert an event in the node.js event queue.
Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readFile() event is in the queue, so it grabs that and calls the callback associated with it. That callback executes and finishes.
Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
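Sticking with the callback style above, one minimal way to track that is a shared counter (a sketch, reusing the same three operations):
var remaining = 3;
function oneDone() {
  remaining--;
  if (remaining === 0) {
    console.log("all three async operations have finished");
  }
}

fs.readFile(fname, function(err, data) {
  console.log("file read");
  oneDone();
});
setTimeout(function() {
  console.log("timer fired");
  oneDone();
}, 100);
http.get(someUrl, function(response) {
  console.log("http get finished");
  oneDone();
});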
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).
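For completeness, a minimal sketch of that last "pass it off" option using Node's worker_threads module (available in modern Node versions and assumed here purely as an illustration; the counting mirrors the makeBurger() pseudo-code):
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // main thread: hand the CPU-bound work to a worker and stay free for other events
  const worker = new Worker(__filename); // re-runs this same file in a worker thread
  worker.on('message', function(count) {
    // delivered via the event queue, just like the other async callbacks above
    console.log("burger made:", count);
  });
  console.log("READY"); // prints immediately; the counting happens on the worker thread
} else {
  // worker thread: the synchronous makeBurger()-style loop runs here
  let count = 0;
  for (let i = 0; i < 999999; i++) {
    count++;
  }
  parentPort.postMessage(count);
}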
