Stop nodejs from garbage collection / automatic closing of File Descriptors - node.js

Consider a database engine, which operates on an externally opened file - like SQLite, except the file handle is passed to its constructor. I'm using a setup like this for my app, but can't seem to figure out why NodeJS insists on closing the file descriptor after 2 seconds of operation. I need that thing to stay open!
const db = await DB.open(await fs.promises.open('/path/to/db/file', 'r+'));
...
(node:100840) Warning: Closing file descriptor 19 on garbage collection
(Use `node --trace-warnings ...` to show where the warning was created)
(node:100840) [DEP0137] DeprecationWarning: Closing a FileHandle object on garbage collection is deprecated. Please close FileHandle objects explicitly using FileHandle.prototype.close(). In the future, an error will be thrown if a file descriptor is closed during garbage collection.
The class DB uses the provided file descriptors extensively, over an extended period of time, so having them closed out from under it is rather annoying. In that class, I'm using methods such as readFile, createReadStream() and the readline module to step through the lines of the file. I'm passing { autoClose: false, emitClose: false } to any read/write streams I'm using, but to no avail.
Why is this happening?
How can I stop it?
Thanks

I suspect you're running into an evil problem with using await inside this:
for await (const line of readline.createInterface({input: file.createReadStream({start: 0, autoClose: false})}))
If you use await anywhere else in the for loop body (which you do), the underlying stream fires all its data events and finishes while you're parked at that other await, and in some cases your process even exits before you get to process any of the data or line events from the stream. This is a truly flawed design and has bitten many others.
The safest way around this is to not use the asyncIterator at all, and just wrap a promise yourself around the regular events from the readline object.
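For example, a minimal sketch of that approach (assuming `file` is the FileHandle from the question and processLine() is a hypothetical, synchronous per-line handler):

const readline = require('readline');

// Resolve once readline has emitted every 'line' event and closed.
// Handle each line synchronously (or buffer the lines) - don't await inside the handler.
function readAllLines(file) {
  return new Promise((resolve, reject) => {
    const stream = file.createReadStream({ start: 0, autoClose: false });
    const rl = readline.createInterface({ input: stream });
    rl.on('line', line => processLine(line));
    rl.on('close', resolve);
    stream.on('error', reject);
  });
}

Because nothing awaits between 'line' events, the stream can't race ahead of the consumer the way the async iterator does.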

Close the file handle after waiting for any pending operation.
import { open } from 'fs/promises';
let filehandle;
try {
  filehandle = await open('thefile.txt', 'r');
  // ... read from or write to filehandle here, before the finally block closes it ...
} finally {
  await filehandle?.close();
}

Related

Nodejs "closing directory handle on garbage collection"

Hello, the following is an excerpt from my code:
let dirUtility = async (...args) => {
  let dir = await require('fs').promises.opendir('/path/to/some/dir...');
  let entries = dir.entries();
  for await (let childDir of dir) doStuffWithChildDir(childDir);
  return entries;
};
This function is called a fair bit in my code. I have the following in my logs:
(node:7920) Warning: Closing directory handle on garbage collection
(Use `node --trace-warnings ...` to show where the warning was created)
(node:7920) Warning: Closing directory handle on garbage collection
(node:7920) Warning: Closing directory handle on garbage collection
(node:7920) Warning: Closing directory handle on garbage collection
(node:7920) Warning: Closing directory handle on garbage collection
What exactly is the significance of these errors?
Do they indicate a large issue? (Should I simply seek to silence these errors?)
What is the best way to avoid this issue?
Thanks!
Raina77ow’s answer tells you why the warning is displayed.
Basically what's happening is that the NodeJS runtime is implicitly calling the close() method on the dir object, but the best practice is to explicitly call the close() method on the handle, or even better to wrap it in a try..finally block.
Like this:
let dirUtility = async (...args) => {
  let dir = await require('fs').promises.opendir('/path/to/some/dir...');
  try {
    let entries = dir.entries();
    for await (let childDir of dir) doStuffWithChildDir(childDir);
    return entries;
  }
  finally {
    dir.close();
    // return some dummy value, or return undefined.
  }
};
Quoting the comments (source):
// If the close was successful, we still want to emit a process
// warning to notify that the file descriptor was gc'd. We want to be
// noisy about this because not explicitly closing the DirHandle is a
// bug.
While your code seems to be really similar to the code in this question, there's a difference:
let entries = dir.entries();
...
return entries;
That, in a nutshell, seems to create an additional iterator over the directory, which is passed outside as the function's return value. How exactly this iterator is employed is not clear (as you don't show what happens next with dirUtility), but either it's not exhausted before GC takes its toll, or it's consumed in a way that confuses NodeJS.
Overall, the whole approach doesn't seem right to me: the function seems both to do something with a directory AND, essentially, give that directory back as its result, without actually caring how this object will be used. That, at least, looks like a leaky abstraction.
So it seems you need to decide: if you actually don't use the return value of dirUtility, just drop the corresponding lines of code. But if you actually do need to preserve the open directory (for example, for performance reasons), consider creating a stateful wrapper around it, encapsulating the value. That should prevent GC of this handle for as long as the corresponding object lives in your code.
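If you do decide to keep the directory open, a rough sketch of such a wrapper could look like this (a hypothetical shape, not from the answer; it only uses the documented dir.read() and dir.close() methods):

const fsp = require('fs').promises;

// Owns the fs.Dir handle explicitly, so nothing is left for the garbage collector to clean up.
class DirWrapper {
  static async open(path) {
    const wrapper = new DirWrapper();
    wrapper.dir = await fsp.opendir(path);
    return wrapper;
  }
  // Returns a promise for the next fs.Dirent, or null once the directory is exhausted.
  readNext() {
    return this.dir.read();
  }
  // Call exactly once, when the caller is done with the handle.
  close() {
    return this.dir.close();
  }
}

The caller then drives iteration with readNext() and calls close() itself, so the handle's lifetime is never left up to garbage collection.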

NodeJS and RethinkDB - How to handle connection interruption (connection retry) when listening for table changes (realtime)?

ReqlRuntimeError: Connection is closed in:
r.table("users").changes()
^^^^^^^^^^^^^^^^^^^^^^^^^^
at ReqlRuntimeError.ReqlError [as constructor] (/home/user/DEV/express-socketio/node_modules/rethinkdb/errors.js:23:13)
at new ReqlRuntimeError (/home/user/DEV/express-socketio/node_modules/rethinkdb/errors.js:90:51)
at mkErr (/home/user/DEV/express-socketio/node_modules/rethinkdb/util.js:177:10)
at Feed.IterableResult._promptNext (/home/user/DEV/express-socketio/node_modules/rethinkdb/cursor.js:169:16)
at Feed.IterableResult._addResponse (/home/user/DEV/express-socketio/node_modules/rethinkdb/cursor.js:84:12)
at TcpConnection.<anonymous> (/home/user/DEV/express-socketio/node_modules/rethinkdb/net.js:360:22)
at TcpConnection.cancel (/home/user/DEV/express-socketio/node_modules/rethinkdb/util.js:26:16)
at TcpConnection.cancel (/home/user/DEV/express-socketio/node_modules/rethinkdb/net.js:789:43)
at wrappedCb (/home/user/DEV/express-socketio/node_modules/rethinkdb/net.js:270:17)
at /home/user/DEV/express-socketio/node_modules/rethinkdb/net.js:280:18
at tryCatcher (/home/user/DEV/express-socketio/node_modules/bluebird/js/main/util.js:26:23)
at Promise._resolveFromResolver (/home/user/DEV/express-socketio/node_modules/bluebird/js/main/promise.js:483:31)
at new Promise (/home/user/DEV/express-socketio/node_modules/bluebird/js/main/promise.js:71:37)
at TcpConnection.<anonymous> (/home/user/DEV/express-socketio/node_modules/rethinkdb/net.js:264:33)
at TcpConnection.close (/home/user/DEV/express-socketio/node_modules/rethinkdb/util.js:43:16)
at /home/user/DEV/express-socketio/node_modules/rethinkdb/net.js:782:46
[ERROR] 22:55:08 ReqlRuntimeError: Connection is closed in:
r.table("users").changes()
^^^^^^^^^^^^^^^^^^^^^^^^^^
I had this error when executing a test as follows:
My nodejs is listening for changes in a table (realtime),
then I simulate a connection interruption by turning off the rethinkdb docker container
and the error breaks the whole application.
I'm looking for a way to handle this kind of error, so the application knows the connection was lost
but, after a period of X minutes/seconds/etc, tries to reconnect, or even restarts the application, I don't know...
I found this https://github.com/rethinkdb/rethinkdb/issues/4886,
but nothing about avoiding app crash or trying to reconnect after a connection loss.
How can I proceed with this? Any thoughts?
Thanks in advance.
The example isn't that complete. Consider this code that might run when a websocket is opened on a server:
// write every change to file
function onChange(err, cursor) {
  // the first callback argument is an error; a lost connection shows up here
  if (err) {
    cancel()
    return
  }
  // log every change to console
  // Change format: https://rethinkdb.com/docs/changefeeds/javascript/
  cursor.each(console.log);
  // if file not open, open file
  openFile()
  // write changes to file
  cursor.each(writeChangesToFile)
  // if you decide you no longer want these changes
  if (stopChangesNow) {
    cursor.close() // https://rethinkdb.com/api/javascript/#close-cursor
    cancel()
  }
}
// stop writing changes
function cancel(stream) {
  // close file opened above
  closeFile()
}
try {
  r.table('users').changes().run(conn, onChange)
} catch(e) {
  cancel()
}
You need the run() so your application can process the changes.
You might not need the cancel() function unless you need to cleanup a socket, output stream, file handles, etc. You need the try/catch to keep from crashing.
When a REST server crashes, it only terminates requests in progress.
When a websocket server crashes, it terminates all its sessions, causing problems for a lot more users.
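As a rough sketch of the reconnect part of the question (not from the answer above; handleChange() is a hypothetical callback, and host/port are placeholders): wrap the changefeed in a loop that reconnects after a delay whenever the connection drops.

const r = require('rethinkdb');

async function watchUsers() {
  for (;;) {
    let conn;
    try {
      conn = await r.connect({ host: 'localhost', port: 28015 });
      const cursor = await r.table('users').changes().run(conn);
      // cursor.next() rejects once the connection is closed, which drops us
      // into the catch block below instead of crashing the process.
      for (;;) {
        handleChange(await cursor.next());
      }
    } catch (err) {
      console.error('Changefeed interrupted, retrying in 5 seconds:', err.message);
      await new Promise(resolve => setTimeout(resolve, 5000));
    } finally {
      if (conn) { try { await conn.close(); } catch (_) { /* already closed */ } }
    }
  }
}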

fspromises.writeFile() Writes Empty File on process.exit()

I've been looking all over, but I can't seem to find the answer why I'm getting nothing in the file when exiting.
For context, I'm writing a discord bot. The bot stores its data once an hour. Sometime between stores I want to store the data in case I decide I want to update the bot. When I manually store the data with a command, then kill the process, things work fine. Now, I want to be able to just kill the process without having to manually send the command. So, I have a handler for SIGINT that stores the data the same way I was doing manually and after the promise is fulfilled, I exit. For some reason, the file contains nothing after the process ends. Here's the code (trimmed).
app.ts
function exit() {
  client.users.fetch(OWNER)
    .then(owner => owner.send('Rewards stored. Bot shutting down'))
    .then(() => process.exit());
}

process.once('SIGINT', () => {
  currencyService.storeRewards().then(exit);
});

process.once('exit', () => {
  currencyService.storeRewards().then(exit);
});
currency.service.ts
private guildCurrencies: Map<string, Map<string, number>> = new Map<string, Map<string, number>>();

storeRewards(): Promise<void[]> {
  const promises = new Array<Promise<void>>();
  this.guildCurrencies.forEach((memberCurrencies, guildId) => {
    promises.push(this.storageService.store(guildId, memberCurrencies));
  })
  return Promise.all(promises)
}
storage.service.ts
store(guild: string, currencies: Map<string, number>): Promise<void> {
  return writeFile(`${this.storageLocation}/${guild}.json`, JSON.stringify([...currencies]))
    .catch(err => {
      console.error('could not store currencies', err);
    })
}
So, as you can see, when SIGINT is received, I get the currency service to store its data, which maps guilds to guild member currencies (a map of guild members to their rewards). It stores the data in separate files (each guild gets its own file) using the storage service. The storage service returns the promise from writeFile (which should resolve to undefined when the file has finished writing). The currency service accumulates all of these promises and returns a promise that resolves when all of the store promises resolve. Then, after all of the promises have resolved, a message is sent to the bot owner (me), which also returns a promise. After that promise resolves, we exit the process. It should be a clean exit with all the data written and the bot letting me know that it's shutting down, but when I read the file later, it's empty.
I've tried logging in all sorts of different places to make sure the steps are being done in the right order and I'm not getting weird async stuff, and everything seems to be proceeding as expected, but I'm still getting an empty file. I'm not sure what's going on, and I'd really appreciate some guidance.
EDIT: I remembered something else. As another debugging step, I tried reading the files after the currency service storeRewards() promise resolved, and the contents of the files were valid* (they contained valid data, but it was probably old data as the data doesn't change often). So, one of my thoughts is that the promise for writeFile resolves before the file is fully written, but that isn't indicated in the documentation.
EDIT 2: The answer was that I was writing twice. None of the code shown in the post or the first edit would have made it clear that I was having a double write issue, so I am adding the code causing the issue so that future readers can get the same conclusion.
Thanks to #leitning for their help finding the answer in the comments on my question. After writing a random UUID in the file name, I found the file was being written twice. I had assumed when asking the question, that I had shared all the relevant info, but I had missed something. process.once('exit', ...) was being called after calling process.exit() (more details here). The callback function for the exit event does not handle asynchronous calls. When the callback function returns, the process exits. Since I had duplicated the logic in the SIGINT callback function in the exit callback function, the file was being written a second time and the process was exiting before the file could be written, resulting in an empty file. Removing the process.once('exit', ...) logic fixed the issue.
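For future readers, the fix boils down to dropping the duplicated handler, roughly like this (a sketch based on the edit above, reusing the names from the code in the question):

// Keep only the SIGINT handler; it can wait for the async store to finish.
process.once('SIGINT', () => {
  currencyService.storeRewards().then(exit);
});

// The process.once('exit', ...) handler is removed entirely: 'exit' callbacks
// must be synchronous, so the duplicated async write raced process shutdown
// and left an empty file.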

Node.js: closing a file after writing

I'm currently getting a writable stream to a file using writer = fs.createWriteStream(url), doing a number of writes using writer.write(), and at the end I do writer.end(). I notice that when I do writer.end(), the file size is still zero, and remains at zero until the program terminates, at which point it reaches its correct size and the contents are visible and correct.
So it seems that writer.end() isn't closing the file.
The spec says "If autoClose is set to true (default behavior) on 'error' or 'finish' the file descriptor will be closed automatically."
I'm not sure what 'error' or 'finish' refer to here. Events presumably: but how do they relate to my call on writer.end()? Is there something I have to do to cause these events?
I would try getting a file descriptor directly and calling fd.close() explicitly to close it after writing, but I don't see a method to get a writeable stream given a file descriptor.
Any advice please?
When you call .write, Node does not write immediately to the file; it buffers the chunks until highWaterMark bytes are reached, at which point it tries to flush the contents to disk.
That's why it's important to check the return value of .write: if it returns false, you need to wait until the 'drain' event is emitted; if you don't, you can exhaust the memory of the application, see:
why does attempting to write a large file cause js heap to run out of memory
The same happens for .end: it won't close the file immediately; first it flushes the buffer, and only after everything has been written into the file does it close the fd.
So once you call .end, you'll have to wait until the 'finish' event has been emitted.
The 'finish' event is emitted after the stream.end() method has been
called, and all data has been flushed to the underlying system.
const { once } = require('events');
const fs = require('fs');

const writer = fs.createWriteStream('/tmp/some-file');

// using top-level await, wrap in an async IIFE if you're running an older version
for (let i = 0; i < 10; i++) {
  if (!writer.write('a'))
    await once(writer, 'drain');
}

writer.end();
await once(writer, 'finish');
console.log('File is closed and all data has been flushed');

Is there any advantage in nodejs calling an async function followed by an await versus a synchronous function?

I recently saw this kind of code in an asynchronous function:
async function foo() {
  const readFile = util.promisify(fs.readFile);
  const data = await readFile(file, 'utf8');
  // .... other code
}
Is there any advantage of doing this over:
async function foo() {
  const data = readFileSync(file, 'utf8');
  // ... other code
}
?
In general I am interested in whether there is any advantage to calling an asynchronous function followed by an immediate await on the result, as opposed to calling the corresponding synchronous function without the await.
And suppose this async function had already been wrapped, say, in a promise?
Note: some of the comments and answers refer to an earlier version of this code which was buggy and less clear about the intent.
The answer to your very valid question is that the synchronous methods will block the thread, whereas the async methods will allow other JavaScript to run while the operation takes place, giving your application more room to scale.
For example, if you wrote an http server and only have one request coming in per minute and never simultaneously, it wouldn't matter much what you do. The thread is being blocked for mere milliseconds and that work is unavoidable with respect to the request that is relying on it. However, if you have 100 requests per minute coming through, and maybe some other work being done on some cron schedule in the same JS app, then using the asynchronous methods would allow those requests to continue to come through your server code (JS) while the filesystem work is being done behind the scenes (in C++, I believe, but you'd have to google that part). When the thread is blocked by a synchronous function, then all of those requests and events pile up in the event queue waiting for the JS to handle them, even though the JS event loop probably isn't busy with anything directly. It's only waiting. That's not good, so take advantage of the asynchronous nature of single-threaded JS as much as possible.
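As a small illustration of that point (a hedged sketch, not from the answer; the file path is a placeholder): a timer keeps ticking while the promise-based read is in flight, but stalls for the entire duration of the synchronous read.

const fs = require('fs');
const fsp = require('fs/promises');

// Tick every 100 ms so we can see when the event loop is blocked.
const timer = setInterval(() => console.log('tick', Date.now()), 100);

(async () => {
  await fsp.readFile('/path/to/a/large/file');  // ticks keep appearing during this read
  fs.readFileSync('/path/to/a/large/file');     // ticks pause until this read finishes
  clearInterval(timer);
})();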
With respect to error handling, the async versions of those functions have an error object (or null when successful) as the callback's first parameter, which is transferred to the .catch method when you promisify it. This allows for natural and built-in error handling. You will observe with the *Sync functions that there is no callback, and, if it errors, you will not receive that error in a helpful way (usually it crashes the app). Here are two examples to illustrate:
Unhelpful.js
const fs = require('fs')
const util = require('util')
const test = fs.readFileSync('./test.js') // spoiler alert: ./test.js doesn't exist
console.log('test:', test) // never runs, because the error thrown is not caught and the process exits :(
/*
Output:
fs.js:113
throw err;
^
Error: ENOENT: no such file or directory, open './test.js'
at Object.openSync (fs.js:434:3)
at Object.readFileSync (fs.js:339:35)
...etc...
*/
Better.js
const fs = require('fs')
const util = require('util')
const readFile = util.promisify(fs.readFile)
;(async function () {
  // The catch handler actually returns the error object, so test will hold
  // the value/result of the readFile operation whether it succeeds or fails
  const test = await readFile('./test.js').catch(err => err instanceof Error ? err : new Error(err))
  console.log('test:', test)
  // helpful log, as a result:
  // test: { [Error: ENOENT: no such file or directory, open './test.js'] errno: -2, code: 'ENOENT', syscall: 'open', path: './test.js' }
})()
So now that I think I understand the situation, let me try to explain my confusion.
I had thought V8 or nodejs maintained a list of runnable threads even if only one was ever running at a given time. I also assumed that async functions created a new thread. Thus if that thread was blocked, no problem, because some other runnable thread would be swapped in.
From what people have posted, I am now given to believe that there is only one thread (at least, only one visible to the programmer), so blocking anywhere (whether or not there are coroutines/async functions) blocks everywhere.
