Nodejs "closing directory handle on garbage collection" - node.js

Hello, the following is an excerpt from my code:
let dirUtility = async (...args) => {
  let dir = await require('fs').promises.opendir('/path/to/some/dir...');
  let entries = dir.entries();
  for await (let childDir of dir) doStuffWithChildDir(childDir);
  return entries;
};
This function is called a fair bit in my code. I have the following in my logs:
(node:7920) Warning: Closing directory handle on garbage collection
(Use `node --trace-warnings ...` to show where the warning was created)
(node:7920) Warning: Closing directory handle on garbage collection
(node:7920) Warning: Closing directory handle on garbage collection
(node:7920) Warning: Closing directory handle on garbage collection
(node:7920) Warning: Closing directory handle on garbage collection
What exactly is the significance of these errors?
Do they indicate a large issue? (Should I simply seek to silence these errors?)
What is the best way to avoid this issue?
Thanks!

Raina77ow’s answer tells you why the warning is displayed.
Basically, what's happening is that the Node.js runtime is implicitly calling the close() method on the dir object, but the best practice is to explicitly call the close() method on the handle yourself, or better yet, wrap it in a try..finally block.
Like this:
let dirUtility = async (...args) => {
  let dir = await require('fs').promises.opendir('/path/to/some/dir...');
  try {
    let entries = dir.entries();
    for await (let childDir of dir) doStuffWithChildDir(childDir);
    return entries;
  }
  finally {
    dir.close();
    // return some dummy value, or return undefined.
  }
};

Quoting the comments (source):
// If the close was successful, we still want to emit a process
// warning to notify that the file descriptor was gc'd. We want to be
// noisy about this because not explicitly closing the DirHandle is a
// bug.
While your code seems to be really similar to the code in this question, there's a difference:
let entries = dir.entries();
...
return entries;
That, in a nutshell, seems to create an additional iterator over the directory, which is passed outside as the function's return value. How exactly this iterator is employed is not clear (as you don't show what happens next with dirUtility), but either it's not exhausted before GC takes its toll, or it's used in a way that confuses Node.js.
Overall, the whole approach doesn't seem right to me: the function seems both to do something with a directory AND, essentially, give that directory back as its result, without actually caring how this object will be used. That, at least, looks like a leaky abstraction.
So it seems you need to decide: if you actually don't use the return value of dirUtility, just drop the corresponding lines of code. But if you actually do need to preserve the open directory (for example, for performance reasons), consider creating a stateful wrapper around it, encapsulating the value. That should prevent GC from closing this handle as long as the corresponding object lives in your code.
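For the second option, a minimal sketch of such a stateful wrapper might look like this (DirSession and forEachEntry are placeholder names, not part of the original code; it uses dir.read() rather than the async iterator so the caller controls when the handle closes):

const fs = require('fs').promises;

// Hypothetical wrapper that owns the directory handle and closes it explicitly.
class DirSession {
  static async open(path) {
    const session = new DirSession();
    session.dir = await fs.opendir(path);
    return session;
  }
  async forEachEntry(callback) {
    // dir.read() resolves to the next Dirent, or null when the directory is exhausted.
    let entry;
    while ((entry = await this.dir.read()) !== null) callback(entry);
  }
  async close() {
    await this.dir.close();
  }
}

// Usage: the caller decides when the handle is released.
// const session = await DirSession.open('/path/to/some/dir');
// await session.forEachEntry(doStuffWithChildDir);
// await session.close();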

Related

Stop nodejs from garbage collection / automatic closing of File Descriptors

Consider a database engine, which operates on an externally opened file - like SQLite, except the file handle is passed to its constructor. I'm using a setup like this for my app, but can't seem to figure out why NodeJS insists on closing the file descriptor after 2 seconds of operation. I need that thing to stay open!
const db = await DB.open(await fs.promises.open('/path/to/db/file', 'r+'));
...
(node:100840) Warning: Closing file descriptor 19 on garbage collection
(Use `node --trace-warnings ...` to show where the warning was created)
(node:100840) [DEP0137] DeprecationWarning: Closing a FileHandle object on garbage collection is deprecated. Please close FileHandle objects explicitly using FileHandle.prototype.close(). In the future, an error will be thrown if a file descriptor is closed during garbage collection.
The class DB uses the provided file descriptor extensively over an extended period of time, so having it close is rather annoying. In that class, I'm using methods such as readFile and createReadStream(), and the readline module to step through the lines of the file. I'm passing { autoClose: false, emitClose: false } to any read/write streams I'm using, but to no avail.
Why is this happening?
How can I stop it?
Thanks
I suspect you're running into an evil problem with using await in this:
for await (const line of readline.createInterface({input: file.createReadStream({start: 0, autoClose: false})}))
If you use await anywhere else in the for loop body (which you are), the underlying stream fires all its data events and finishes while you are parked at that other await; in some cases, your process even exits before you get to process any of the data or line events from the stream. This is a truly flawed design and has bitten many others.
The safest way around this is to not use the async iterator at all, and just wrap a promise yourself around the regular events from the readline object.
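For illustration, a minimal sketch of that event-based approach might look like this (readLines and processLine are placeholder names, not from the original code):

const readline = require('readline');

// Wrap the readline events in a single promise instead of using for await...of.
function readLines(fileHandle, processLine) {
  return new Promise((resolve, reject) => {
    const stream = fileHandle.createReadStream({ start: 0, autoClose: false });
    const rl = readline.createInterface({ input: stream });
    rl.on('line', processLine); // called for each line as it is read
    rl.on('close', resolve);    // all lines have been emitted
    stream.on('error', reject); // propagate read errors
  });
}

// Usage: await readLines(file, line => { /* handle one line */ });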
Close the file handle after waiting for any pending operation.
import { open } from 'fs/promises';
let filehandle;
try {
  filehandle = await open('thefile.txt', 'r');
} finally {
  await filehandle?.close();
}

fs.promises.writeFile() Writes Empty File on process.exit()

I've been looking all over, but I can't seem to find the answer why I'm getting nothing in the file when exiting.
For context, I'm writing a discord bot. The bot stores its data once an hour. Sometime between stores I want to store the data in case I decide I want to update the bot. When I manually store the data with a command, then kill the process, things work fine. Now, I want to be able to just kill the process without having to manually send the command. So, I have a handler for SIGINT that stores the data the same way I was doing manually and after the promise is fulfilled, I exit. For some reason, the file contains nothing after the process ends. Here's the code (trimmed).
app.ts
function exit() {
  client.users.fetch(OWNER)
    .then(owner => owner.send('Rewards stored. Bot shutting down'))
    .then(() => process.exit());
}

process.once('SIGINT', () => {
  currencyService.storeRewards().then(exit);
});

process.once('exit', () => {
  currencyService.storeRewards().then(exit);
});
currency.service.ts
private guildCurrencies: Map<string, Map<string, number>> = new Map<string, Map<string, number>>();

storeRewards(): Promise<void[]> {
  const promises = new Array<Promise<void>>();
  this.guildCurrencies.forEach((memberCurrencies, guildId) => {
    promises.push(this.storageService.store(guildId, memberCurrencies));
  });
  return Promise.all(promises);
}
storage.service.ts
store(guild: string, currencies: Map<string, number>): Promise<void> {
  return writeFile(`${this.storageLocation}/${guild}.json`, JSON.stringify([...currencies]))
    .catch(err => {
      console.error('could not store currencies', err);
    });
}
So, as you can see, when SIGINT is received, I have the currency service store its data, which maps guild IDs to member currencies (a map of guild members to their rewards). It stores the data in separate files (each guild gets its own file) using the storage service. The storage service returns the promise from writeFile (which should resolve to undefined when the file is finished writing). The currency service accumulates all the promises and returns a promise that resolves when all of the store promises resolve. Then, after all of the promises are resolved, a message is sent to the bot owner (me), which returns a promise. After that promise resolves, we exit the process. It should be a clean exit with all the data written and the bot letting me know that it's shutting down, but when I read the file later, it's empty.
I've tried logging in all sorts of different places to make sure the steps are being done in the right order and I'm not getting weird async stuff, and everything seems to be proceeding as expected, but I'm still getting an empty file. I'm not sure what's going on, and I'd really appreciate some guidance.
EDIT: I remembered something else. As another debugging step, I tried reading the files after the currency service's storeRewards() promise resolved, and the contents of the files were valid (they contained valid data, but it was probably old data, as the data doesn't change often). So, one of my thoughts is that the promise for writeFile resolves before the file is fully written, but that isn't indicated in the documentation.
EDIT 2: The answer was that I was writing twice. None of the code shown in the post or the first edit would have made it clear that I was having a double write issue, so I am adding the code causing the issue so that future readers can get the same conclusion.
Thanks to @leitning for their help finding the answer in the comments on my question. After writing a random UUID into the file name, I found the file was being written twice. I had assumed when asking the question that I had shared all the relevant info, but I had missed something: process.once('exit', ...) was being called after calling process.exit() (more details here). The callback function for the exit event does not handle asynchronous calls; when the callback function returns, the process exits. Since I had duplicated the logic of the SIGINT callback in the exit callback, the file was being written a second time and the process was exiting before the file could be written, resulting in an empty file. Removing the process.once('exit', ...) logic fixed the issue.
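A trimmed sketch of the corrected shutdown path, under the same assumptions as the code above (client, OWNER, and currencyService as already defined):

// Do all asynchronous shutdown work in the SIGINT handler only;
// the 'exit' event cannot wait for promises, so no handler is registered for it.
process.once('SIGINT', () => {
  currencyService.storeRewards()                        // write every guild's file
    .then(() => client.users.fetch(OWNER))
    .then(owner => owner.send('Rewards stored. Bot shutting down'))
    .then(() => process.exit());
});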

What is the most efficient way to keep writing a frequently changing JavaScript object to a file in NodeJS?

I have a JavaScript object with many different properties, and it might look something like this:
var myObj = {
  prop1: "val1",
  prop2: [...],
  ...
}
The values in this object keep updating very frequently (several times every second) and there could be thousands of them. New values could be added, existing ones could be changed or removed.
I want to have a file that always has the updated version of this object. The simple approach for doing this would be just writing the entire object to the file all over again after each time that it changes like so:
fs.writeFileSync("file.json", JSON.stringify(myObj));
This doesn't seem very efficient for big objects that need to be written very frequently. Is there a better way of doing this?
You should use a database. Something simple like sqlite3 would be a good option. Have a table with just two columns, 'Key' and 'Value', and use it as a key-value store. You will gain advantages like transactions and better performance than a file, as well as simplifying your access.
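For illustration only (the answer doesn't name a specific driver), a two-column key-value table using the better-sqlite3 package might look roughly like this:

// Sketch of a simple key-value store on top of SQLite (better-sqlite3 assumed).
const Database = require('better-sqlite3');
const db = new Database('state.db');

db.prepare('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)').run();

const upsert = db.prepare(
  'INSERT INTO kv (key, value) VALUES (?, ?) ' +
  'ON CONFLICT(key) DO UPDATE SET value = excluded.value'
);

function setProp(key, value) {
  upsert.run(key, JSON.stringify(value)); // only the changed property is rewritten
}

function getProp(key) {
  const row = db.prepare('SELECT value FROM kv WHERE key = ?').get(key);
  return row ? JSON.parse(row.value) : undefined;
}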
Maintaining a file (on the filesystem) containing the current state of a rapidly changing object is surprisingly difficult. Specifically, setting things up so some other program can read the file at any time is the hard part. Why? At any time the file may be in the process of being written, so the reader can get inconsistent results.
Here's an outline of a good way to do this.
1) Write the file less often than each time the state changes. Whenever the state changes, call updateFile(myObj). It sets a timer for, let's say, 500 ms, then writes the very latest state to the file when the timer expires. Something like this (not debugged):
let latestObj
let updateFileTimer = 0

function updateFile (myObj) {
  latestObj = myObj
  if (updateFileTimer === 0) {
    updateFileTimer = setTimeout (
      function () {
        /* write latestObj to the file */
        updateFileTimer = 0
      }, 500)
  }
}
This writes the latest state of your object to the file, but no more than every 500ms.
2) Inside that timeout function, write out a temporary file. When it's written, delete the existing file and rename the temp file to have the existing file's name. Do all this asynchronously so the rest of your program won't have to wait for the file system to work. Your timeout function will look like this:
updateFileTimer = setTimeout (
  function () {
    /* write latestObj to the file */
    fs.writeFile("file.json.tmp",
      JSON.stringify(latestObj),
      function (err) {
        if (err) throw err;
        fs.unlink("file.json",
          function (err) {
            if (!err)
              fs.renameSync("file.json.tmp", "file.json")
          })
      })
    updateFileTimer = 0
  }, 500)
There's one more thing to worry about. There's a brief period of time between the unlink and the renameSync operation where the "file.json" file does not exist in the file system. So, any program you write that reads "file.json" needs to try again if the file isn't found.
If you use Linux, macOS, FreeBSD, or another UNIX-derived operating system, this code will work well. Those operating systems' file systems allow one program to unlink a file while another program is reading it. If you're running it on a DOS-derived operating system like Windows, the unlink operation will fail while another program is reading the file.
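For illustration, a rough sketch of that reader-side retry (assuming the reader is also a Node.js program; the names here are placeholders):

const fs = require('fs/promises');

// Retry briefly if the reader hits the window between unlink and rename.
async function readState(path, retries = 5) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return JSON.parse(await fs.readFile(path, 'utf8'));
    } catch (err) {
      if (err.code !== 'ENOENT' || attempt === retries) throw err;
      await new Promise(resolve => setTimeout(resolve, 20)); // file is momentarily absent
    }
  }
}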

range.address throws context related errors

We've been developing with the Excel JavaScript API for quite a few months now. We have been coming across context-related issues which got resolved for unknown reasons. We weren't able to replicate these issues and wondered how they got resolved. Recently, similar issues have started popping up again.
The error we consistently get:
property 'address' is not available. Before reading the property's value, call the load method on the containing object and call "context.sync()" on the associated request context.
We thought that, since we have multiple functions defined to modularise the code in the project, maybe the context differs somewhere among these functions and this has gone unnoticed. So we came up with a single-context solution implemented via the JavaScript module pattern.
var ContextManager = (function () {
  var xlContext; // single context for entire project/application.

  function loadContext() {
    xlContext = new Excel.RequestContext();
  }

  function sync(object) {
    return (object === undefined) ? xlContext.sync() : xlContext.sync(object);
  }

  function getWorksheetByName(name) {
    return xlContext.workbook.worksheets.getItem(name.toString());
  }

  // public
  return {
    loadContext: loadContext,
    sync: sync,
    getWorksheetByName: getWorksheetByName
  };
})();
NOTE: the above code is shortened. There are other methods added to ensure that the single context gets used throughout the application.
While implementing the single context this time round, though, we have been able to replicate the issue.
Office.initialize = function (reason) {
  $(document).ready(function () {
    ContextManager.loadContext();

    function loadRangeAddress(rng, index) {
      rng.load("address");
      ContextManager.sync().then(function () {
        console.log("Address: " + rng.address);
      }).catch(function (e) {
        console.log("Failed address for index: " + index);
      });
    }

    for (var i = 1; i <= 1000; i++) {
      var sheet = ContextManager.getWorksheetByName("Sheet1");
      loadRangeAddress(sheet.getRange("A" + i), i); // I expect to see A1 to A1000 addresses in console. Order doesn't matter.
    }
  });
};
In the above case, only "A1" gets printed as a range address to the console. I can't see any of the other addresses (A2 to A1000) being printed; only the catch block executes. Can anyone explain why this happens?
Although I've written a for loop above, that isn't my real use case. In the real use case, situations occur where one range object in function a needs to load a range address, while another function b also wants to load a range address. Both function a and function b work asynchronously on separate tasks, such as one creating a table object (the table needs the address) and the other pasting data to a sheet (there's a debug statement to see where data was pasted).
This is something our team hasn't been able to figure out or find a solution for.
There is a lot packed into this code, but the issue you have is that you're calling sync a whole bunch of times without awaiting the previous sync.
There are several problems with this:
If you were using different contexts, you would actually see that there is a limit of ~50 simultaneous requests, after which you'll get errors.
In your case, you're running into a different (and almost opposite) problem. Given the async nature of the APIs, and the fact that you're not awaiting on the sync-s, your first sync request (which you'd think is for just A1) will actually contain all the load requests from the execution of the entire for loop. Now, once this first sync is dispatched, the action queue will be cleared, which means that your second, third, etc. syncs will see that there is no pending work and will no-op, executing before the first sync ever comes back with the values!
[This might be considered a bug, and I'll discuss with the team about fixing it. But it's still a very dangerous thing to not await the completion of a sync before moving on to the next batch of instructions that use the same context.]
The fix is to await the sync. This is far and away the simplest to do in TypeScript 2.1 with its async/await feature; otherwise you need to do the async version of the for loop, which you can look up, but it's rather unintuitive (it requires creating an uber-promise that keeps chaining a bunch of .then-s).
So, your modified TypeScript-ified code would be
ContextManager.loadContext();

async function loadRangeAddress(rng, index) {
  rng.load("address");
  await ContextManager.sync().then(function () {
    console.log("Address: " + rng.address);
  }).catch(function (e) {
    OfficeHelpers.Utilities.log(e);
  });
}

// Note: the loop itself must also run inside an async function for the await to compile.
for (var i = 1; i <= 1000; i++) {
  var sheet = ContextManager.getWorksheetByName("Sheet1");
  await loadRangeAddress(sheet.getRange("A" + i), i); // I expect to see A1 to A1000 addresses in console. Order doesn't matter.
}
Note the async in front of the loadRangeAddress function, and the two await-s in front of ContextManager.sync() and loadRangeAddress.
Note that this code will also run quite slowly, as you're making an async round trip for each cell, which means you're not using batching, which is at the very core of the object model for the new APIs.
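For contrast, a rough sketch of a batched version (reusing the ContextManager from the question; this is not part of the original answer) queues all the loads and then issues a single sync:

// Queue all 1000 load requests, then make one round trip for all of them.
async function loadAllAddresses() {
  var sheet = ContextManager.getWorksheetByName("Sheet1");
  var ranges = [];
  for (var i = 1; i <= 1000; i++) {
    var rng = sheet.getRange("A" + i);
    rng.load("address");
    ranges.push(rng);
  }
  await ContextManager.sync(); // single batched request
  ranges.forEach(function (rng) {
    console.log("Address: " + rng.address);
  });
}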
For completeness' sake, I should also note that creating a "raw" RequestContext instead of using Excel.run has some disadvantages. Excel.run does a number of useful things, the most important of which is automatic object tracking and un-tracking (not relevant here, since you're only reading back data; but it would be relevant if you were loading and then wanting to write back into the object).
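For reference, a minimal sketch of the Excel.run pattern mentioned above (not tied to the question's ContextManager):

Excel.run(function (context) {
  var rng = context.workbook.worksheets.getItem("Sheet1").getRange("A1");
  rng.load("address");
  return context.sync().then(function () {
    console.log("Address: " + rng.address);
    // Tracked objects are cleaned up automatically when the batch ends.
  });
}).catch(function (e) {
  console.log(e);
});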
Finally, if I may recommend (full disclosure: I am the author of the book), you will probably find a good bit of useful info about Office.js in the e-book "Building Office Add-ins using Office.js", available at https://leanpub.com/buildingofficeaddins. In particular, it has a very detailed (10-page) section on the internal workings of the object model ("Section 5.5: Implementation details, for those who want to know how it really works"). It also offers advice on using TypeScript, has a general Promise/async-await primer, describes what .run does, and has a bunch more info about the OM. Also, though not available yet, it will soon offer information on how to resume using the same context (using a newer technique than what was originally described in How can a range be used across different Word.run contexts?). The book is a lean-published "evergreen" book, so once I write that topic in the coming weeks, an update will be available to all existing readers.
Hope this helps!

How to remove module after "require" in node.js?

Let's say, after I require a module, I do something as below:
var b = require('./b.js');
--- do something with b ---
Then I want to take away module b (i.e. clean up the cache). How can I do it?
The reason is that I want to dynamically load, remove, or update the module without restarting the Node server. Any ideas?
------- more --------
Based on the suggestion to delete require.cache, it still doesn't work...
What I did was a few things:
1) delete require.cache[require.resolve('./b.js')];
2) loop for every require.cache's children and remove any child who is b.js
3) delete b
However, when I call b, it is still there! It is still accessible, unless I do this:
b = {};
Not sure if that is a good way to handle it.
Because if, later, I require('./b.js') again after b.js has been modified, will it require the old cached b.js (which I tried to delete), or the new one?
----------- More finding --------------
OK, I did more testing and playing around with the code. Here is what I found:
1) Deleting require.cache[] is essential. Only if it is deleted will loading a new b.js take effect the next time.
2) Looping through require.cache[] and deleting any entry in the children with the full filename of b.js doesn't have any effect, i.e. you can delete it or leave it. However, I'm unsure if there is any side effect. I think it is a good idea to keep it clean and delete it if there is no performance impact.
3) Of course, assigning b = {} isn't really necessary, but I think it is useful to also keep it clean.
You can use this to delete its entry in the cache:
delete require.cache[require.resolve('./b.js')]
require.resolve() will figure out the full path of ./b.js, which is used as a cache key.
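As a small illustration (requireFresh is a hypothetical helper, not part of Node's API), deleting the cache entry and requiring again picks up the modified file:

// Hypothetical helper: re-require a module, bypassing the module cache.
function requireFresh(modulePath) {
  delete require.cache[require.resolve(modulePath)];
  return require(modulePath); // re-reads and re-executes the file
}

let b = requireFresh('./b.js'); // reflects the current contents of b.js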
I spent some time trying to clear the cache in Jest tests for a Vuex store with no luck. It seems Jest has its own mechanism that doesn't need a manual call to delete require.cache.
beforeEach(() => {
  jest.resetModules();
});
And tests:
let store;

it("1", () => {
  process.env.something = true;
  store = require("#/src/store.index");
});

it("2", () => {
  process.env.something = false;
  store = require("#/src/store.index");
});
Both stores will be different modules.
One of the easiest ways (although not the best in terms of performance, as even unrelated modules' caches get cleared) would be to simply purge every module from the cache.
Note that clearing the cache for *.node files (native modules) might cause undefined behaviour and therefore is not supported (https://github.com/nodejs/node/commit/5c14d695d2c1f924cf06af6ae896027569993a5c), so there needs to be an if statement to ensure those don't get removed from the cache, too.
for (const path in require.cache) {
  if (path.endsWith('.js')) { // only clear *.js, not *.node
    delete require.cache[path]
  }
}
I found this useful for client-side applications. I wanted to import code as I needed it and then garbage collect it when I was done. This seems to work. I'm not sure about the cache, but the module should get garbage collected once there is no more reference to it and CONTAINER.sayHello has been deleted.
/* my-module.js */
function sayHello() { console.log("hello"); }
export { sayHello };

/* somewhere-else.js */
const CONTAINER = {};

import("./my-module.js").then(module => {
  CONTAINER.sayHello = module.sayHello;
  CONTAINER.sayHello(); // hello
  delete CONTAINER.sayHello;
  console.log(CONTAINER.sayHello); // undefined
});
I have found the easiest way to handle invalidating the cache is actually to reset the exposed cache object. When deleting individual entries from the cache, the child dependencies become a bit troublesome to iterate through.
require.cache = {};
