I have a node backend taking HTTP requests using express. I am shutting down gracefully like this:
process.on('SIGINT', function() {
    console.log("SIGINT signal received.");
    server.close(function(err) {
        if (err) {
            console.error(err);
            process.exit(1);
        }
        // Stop reoccurring tasks
        // Close database connection
        process.exit(0);
    });
});
What I have is working fine, but I am concerned about my "Stop reoccurring tasks" step. Elsewhere in my code, I call a function that looks like this:
export async function launchSectionFinalizer() {
    finalizeSections();
    // 1 hr * 60 min/hr * 60 s/min * 1,000 ms/s = 3,600,000 ms
    return setInterval(finalizeSections, 3_600_000);
}
Where finalizeSections is an async function that performs a series of database operations (postgres database).
My question is about the nature and behavior of setInterval. How can I make sure that finalizeSections isn't in the middle of its execution when I receive SIGINT? I'm worried that if my program receives SIGINT and closes the server at the wrong time, it could catch finalizeSections in the middle of its operations. If that happens, I could end up with those database operations partially complete (i.e., if I execute a series of SQL commands one after another, insert1, insert2, and insert3, I do not want to execute 1 and 2 without also executing 3).
I have done some googling and read something about how node will wait for all of its processes and events to complete before closing. Would that include waiting for my call to finalizeSections to complete?
Also, I am aware of clearInterval, but I am not sure if that function only stops the timer or if it will also cause node to wait for finalizeSections to complete.
Calling clearInterval will only cancel the timer and not wait for finalizeSections to finish.
Because your graceful shutdown calls process.exit(0), it will not wait for pending asynchronous tasks to finish; it will exit immediately:
Calling process.exit() will force the process to exit as quickly as possible even if there are still asynchronous operations pending that have not yet completed fully, including I/O operations to process.stdout and process.stderr
One way to solve this without using any packages is to save a reference to the promise returned by finalizeSections() and the intervalId returned by setInterval():
let finalizeSectionsPromise;

const intervalId = setInterval(() => {
    finalizeSectionsPromise = finalizeSections();
}, 3_600_000);
Then, in the shutdown code:
clearInterval(intervalId);
if (finalizeSectionsPromise) {
    await finalizeSectionsPromise;
}
...
process.exit(0);
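Put together, a minimal sketch of the whole pattern (assuming the server, finalizeSections, and cleanup steps from your question; note the server.close callback is made async so it can await):

let intervalId;
let finalizeSectionsPromise;

export function launchSectionFinalizer() {
    finalizeSectionsPromise = finalizeSections();
    intervalId = setInterval(() => {
        finalizeSectionsPromise = finalizeSections();
    }, 3_600_000);
}

process.on('SIGINT', () => {
    console.log('SIGINT signal received.');
    // Stop scheduling new runs first, then wait for any run already in flight.
    clearInterval(intervalId);
    server.close(async (err) => {
        if (err) {
            console.error(err);
            process.exit(1);
        }
        if (finalizeSectionsPromise) {
            await finalizeSectionsPromise;
        }
        // Close database connection here.
        process.exit(0);
    });
});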
If you are able to use other packages, I would use a job-scheduling library like Agenda or Bull, or even cron jobs:
https://github.com/OptimalBits/bull
https://github.com/agenda/agenda
Also take a look at stoppable or terminus to gracefully shut down servers without killing in-flight requests:
https://www.npmjs.com/package/stoppable
https://github.com/godaddy/terminus
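For example, a rough sketch of wiring up stoppable, based on its README (the grace period and the express app variable are assumptions; verify the exact API against the current docs):

const http = require('http');
const stoppable = require('stoppable');

// Decorates the server with a .stop() that waits for in-flight requests,
// force-closing connections only after the grace period (10 s here).
const server = stoppable(http.createServer(app), 10000);
server.listen(3000);

process.on('SIGINT', () => {
    server.stop(() => process.exit(0));
});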
Related
I have the following code...
async function finish() {
    console.log("Finishing");
    console.time("fin");
    let test = await new Promise(function(res) {
        setTimeout(() => res("done"), 2000);
    });
    console.timeEnd("fin");
    console.log(test);
}
process.on('exit', finish);
I would expect this to wait two seconds on exit and print a duration close to 2 s. However, when I run it, the process exits almost immediately and nothing after Finishing is printed.
How do I wait for a timeout on exit?
From the Node docs: you cannot run asynchronous code in an 'exit' event listener.
Listener functions must only perform synchronous operations. The Node.js process will exit immediately after calling the 'exit' event listeners causing any additional work still queued in the event loop to be abandoned.
If you want to schedule additional work before exiting (e.g. your asynchronous function), you need to use beforeExit.
process.on('beforeExit', finish);
Having said that, you'll also need to recognize that 'beforeExit' is only emitted when the process runs out of work to do, so a) it will not be emitted if something explicitly terminates the process (e.g. process.exit()), and b) it will be emitted again every time the listener schedules more asynchronous work, so you need to guard against re-triggering it forever.
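A minimal sketch of that pattern, reusing finish() from the question, with a guard so the listener's async work is only scheduled once:

let finishing = false;

process.on('beforeExit', () => {
    if (finishing) return; // 'beforeExit' fires again once finish() completes
    finishing = true;
    finish(); // the 2 s timer inside keeps the event loop alive until done
});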
Issue / How I ran into it
I'm writing a batch-processing Lambda function in Node.js that makes calls to Redis in two different places. Some batch items may never reach the second Redis call. This is all happening asynchronously, so if I close the connection as soon as the batch queue is empty, any future Redis calls would fail. How do I close the connection?
What I've tried
process.on('beforeExit', callback) -- Doesn't get called, as the event loop still contains the Redis connection
client.unref() -- Closes the connection if no commands are pending, but doesn't handle future calls
client.on('idle', callback) -- Works, but is deprecated and may still miss future calls
What I'm currently doing
Once the batch queue is empty, I call:
intervalId = setInterval(closeRedis, 1000);
I close the Redis connection and clear the interval in the callback after a timeout:
function closeRedis() {
    redis.client('list', (err, result) => {
        var idle = parseClientList(result, 'idle');
        if (idle > timeout) {
            redis.quit();
            clearInterval(intervalId);
        }
    });
}
This approach mostly works, but when checking only for a timeout, there is still a chance that other processes are running and a Redis call may be made in the future. I'd like to close the connection when there's only an idle connection remaining in the event loop. Is there a way to do this?
I ended up using process._getActiveHandles(). Once the batch queue is empty, I set an interval to check every half second whether only the minimum processes remain. If so, I unref the redisClient.
redisIntervalId = setInterval(closeRedis, 500);

// Close the Redis client connection if it's the last required process.
function closeRedis() {
    // 2 core processes, plus Redis and this interval
    var minimumProcesses = 4;
    if (process._getActiveHandles().length > minimumProcesses)
        return;
    clearInterval(redisIntervalId);
    redisClient.unref();
}
The advantage of this approach is that I can be sure the Redis client will not close the connection while other important processes are running. I can also be sure that the client won't keep the event loop alive after all the important processes have completed.
The downside is that _getActiveHandles() is an undocumented Node function, so it may be changed or removed later. Also, unref() is experimental and doesn't account for some Redis commands when closing the connection.
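If you'd rather avoid the undocumented API, a sketch of an alternative (not what I did) is to count in-flight operations yourself and quit the client when the count drains to zero; processBatchItem here is a placeholder for whatever promise-returning work touches Redis:

let pending = 0;

function track(promise) {
    pending++;
    return promise.finally(() => {
        pending--;
        if (pending === 0) {
            redisClient.quit(); // sends QUIT after pending replies are written
        }
    });
}

// Wrap every Redis-touching operation:
track(processBatchItem(item)); // processBatchItem is hypothetical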
In the following code:
function sleep(ms) {
    return new Promise((resolve) => {
        setTimeout(resolve, ms);
    });
}
sleep(10000);
Why doesn't Node.js exit immediately? What causes the Promise to be waited on?
I thought I needed await sleep(10000), but this gives an error.
Node.js waits for timers that are still active before exiting. You can tell it NOT to wait for a timer by calling .unref() on the timer object itself.
So, it's not actually the promise it is waiting for, but rather the timer.
Internally, node.js keeps a reference count of the number of open timers that have not been .unref()'d and will not exit until that count (among other things) gets to zero.
Here's a couple excerpts from the node.js doc for timers:
Class: Timeout
By default, when a timer is scheduled using either setTimeout() or setInterval(), the Node.js event loop will continue running as long as the timer is active. Each of the Timeout objects returned by these functions export both timeout.ref() and timeout.unref() functions that can be used to control this default behavior.
timeout.unref()
When called, the active Timeout object will not require the Node.js event loop to remain active. If there is no other activity keeping the event loop running, the process may exit before the Timeout object's callback is invoked. Calling timeout.unref() multiple times will have no effect.
Take a look at the unref() function for timers in node - https://nodejs.org/api/timers.html#timers_timeout_unref
When called, the active Timeout object will not require the Node.js event loop to remain active. If there is no other activity keeping the event loop running, the process may exit before the Timeout object's callback is invoked.
You can create a timeout and call the unref() function on it - this will prevent node from staying alive if the only thing it is waiting for is the timeout.
function sleep(ms) {
    return new Promise((resolve) => {
        setTimeout(resolve, ms).unref();
    });
}
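To see the difference, run a quick check against the sleep() above, with and without the .unref() call:

sleep(10000).then(() => console.log('woke up'));
// With .unref(): the process exits immediately and "woke up" never prints,
// because nothing else is keeping the event loop alive.
// Without it: the process stays alive for ~10 s, prints "woke up", then exits.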
As a side note, the same unref function can also be used on timers created with setInterval.
As correctly noted by jfriend00, it is not the promise that is keeping node alive, it is the timeout.
Assume makeBurger() takes 10 seconds.
In a synchronous program:
function serveBurger() {
    makeBurger();
    makeBurger();
    console.log("READY"); // Assume this takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for Node.js, let's say we make an async version, makeBurgerAsync(), which also takes 10 seconds:
function serveBurger() {
    makeBurgerAsync(function(count) {
    });
    makeBurgerAsync(function(count) {
    });
    console.log("READY"); // Assume this takes 5 seconds to log.
}
Since it is a single thread, I have trouble imagining what is really going on behind the scenes.
So, for sure, when the function runs, both async functions will enter the event loop and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really being done for either async function, right? Since the single thread is hogged by console.log for 5 seconds.
After console.log is done, the CPU will have time to switch between the two async functions and run a bit of each at a time.
So, according to this, async doesn't necessarily result in faster execution; it is probably slower due to the switching between event-loop tasks? I imagine that, at the end of the day, everything is spread over a single thread, which would be the same as the synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are things like DB queries: basically Node.js just says, "Hey DB, handle this for me while I do something else." However, the case I am not understanding is a self-defined callback function within Node.js itself.
EDIT2
function makeBurger() {
    var count = 0;
    count++; // 1 time
    ...
    count++; // 999999 times
    return count;
}

function makeBurgerAsync(callback) {
    var count = 0;
    count++; // 1 time
    ...
    count++; // 999999 times
    callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue, and when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
function doSomething() {
    fs.readFile(fname, function(err, data) {
        console.log("file read");
    });
    setTimeout(function() {
        console.log("timer fired");
    }, 100);
    http.get(someUrl, function(response) {
        console.log("http get finished");
    });
    console.log("READY");
}

doSomething();
console.log("AFTER");
Here's what happens step-by-step:
fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
http.get() is called. This will send the desired http request and then immediately return to further execution.
console.log("READY") will run.
The three asynchronous operations will complete in an indeterminate order (whichever one completes its operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using its own thread, it will insert an event in the node.js event queue.
Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readFile() event is in the queue, so it grabs that and calls the callback associated with it. That callback executes and finishes.
Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
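For example, a sketch of that tracking using promises and Promise.all (fs.promises is Node's built-in promise API; fetchUrl is a hypothetical promise-returning wrapper around your HTTP client):

const fs = require('fs');

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

Promise.all([
    fs.promises.readFile(fname),
    delay(100),
    fetchUrl(someUrl), // hypothetical wrapper around http.get
]).then(([data]) => {
    // Runs only once all three operations have completed, in any order.
    console.log('all three are done');
});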
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).
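That said, newer Node.js versions (after this answer was written) ship a built-in worker_threads module for exactly that "pass it off" case. A minimal sketch, with the file names as assumptions:

// main.js
const { Worker } = require('worker_threads');

function makeBurgerAsync(callback) {
    // The CPU-bound loop runs in a separate thread, not the main JS thread.
    const worker = new Worker('./burger-worker.js');
    worker.once('message', (count) => callback(count));
}

// burger-worker.js
const { parentPort } = require('worker_threads');

let count = 0;
for (let i = 0; i < 999999; i++) count++; // the CPU-bound work
parentPort.postMessage(count);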
When implementing this code (example taken directly from https://github.com/brianc/node-postgres):
var pg = require('pg');
var conString = "tcp://postgres:1234@localhost/postgres";

pg.connect(conString, function(err, client) {
    client.query("SELECT NOW() as when", function(err, result) {
        console.log("Row count: %d", result.rows.length); // 1
        console.log("Current year: %d", result.rows[0].when.getFullYear());
        // Code halts here
    });
});
After the last console.log, node hangs. I think this is because of its asynchronous nature, and I suspect that at this point one should call a callback function.
I have two questions:
Is my thinking correct?
If my thinking is correct, how do the mechanics work? I know NodeJS uses an event loop, but what is making the event loop hang at this point?
It appears to hang because the connection to Postgres is still open. Until it's closed, or "ended"...
client.end(); // call this where the snippet says "Code halts here"
...Node will continue to wait idly for another event to be added to the queue.
Not quite. This is a detail of node-postgres and its dependencies, not of Node or of its "asynchronous nature" in general.
The idling is due to and documented for the generic-pool module that node-postgres uses:
If you are shutting down a long-lived process, you may notice that node fails to exit for 30 seconds or so. This is a side effect of the idleTimeoutMillis behavior -- the pool has a setTimeout() call registered that is in the event loop queue, so node won't terminate until all resources have timed out, and the pool stops trying to manage them.
And, as it explains under Draining:
If you would like to terminate all the resources in your pool before their timeouts have been reached, you can use destroyAllNow() in conjunction with drain():
pool.drain(function() {
    pool.destroyAllNow();
});
One side-effect of calling drain() is that subsequent calls to acquire() will throw an Error.
Which is what pg.end() does and can certainly be done if your intention is to exit at the end of a serial application, such as unit testing or your given snippet.
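So, for the snippet above, a minimal fix is to end the pool once the query callback is done (same legacy pg.connect API as the question; newer node-postgres versions use an explicit Pool object instead):

pg.connect(conString, function(err, client) {
    client.query("SELECT NOW() as when", function(err, result) {
        console.log("Row count: %d", result.rows.length);
        console.log("Current year: %d", result.rows[0].when.getFullYear());
        pg.end(); // drain the pool so node can exit cleanly
    });
});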