Node.js: closing a file after writing

I'm currently getting a writable stream to a file using writer = fs.createWriteStream(url), doing a number of writes using writer.write(), and at the end I do writer.end(). I notice that when I do writer.end(), the file size is still zero, and remains at zero until the program terminates, at which point it reaches its correct size and the contents are visible and correct.
So it seems that writer.end() isn't closing the file.
The spec says "If autoClose is set to true (default behavior) on 'error' or 'finish' the file descriptor will be closed automatically."
I'm not sure what 'error' or 'finish' refer to here. Events presumably: but how do they relate to my call on writer.end()? Is there something I have to do to cause these events?
I would try getting a file descriptor directly and calling fd.close() explicitly to close it after writing, but I don't see a method to get a writable stream given a file descriptor.
Any advice please?

When you call .write, Node does not write to the file immediately; it buffers chunks until highWaterMark bytes are reached, and at that point it tries to flush the contents to disk.
That's why it's important to check the return value of .write: if it returns false, you need to wait until the 'drain' event is emitted. If you don't, you can exhaust the application's memory; see:
why does attempting to write a large file cause js heap to run out of memory
The same happens with .end: it won't close the file immediately. First it flushes the buffer, and only after everything has been written to the file does it close the fd.
So once you call .end, you have to wait until the 'finish' event has been emitted.
The 'finish' event is emitted after the stream.end() method has been
called, and all data has been flushed to the underlying system.
const { once } = require('events');
const fs = require('fs');

// Wrapped in an async IIFE so we can use await
// (with ESM you could use top-level await instead).
(async () => {
  const writer = fs.createWriteStream('/tmp/some-file');

  for (let i = 0; i < 10; i++) {
    // write() returns false when the internal buffer is full,
    // so wait for 'drain' before writing more.
    if (!writer.write('a'))
      await once(writer, 'drain');
  }

  writer.end();

  // 'finish' fires once all data has been flushed to the underlying system.
  await once(writer, 'finish');
  console.log('File is closed and all data has been flushed');
})();

Related

Stop nodejs from garbage collection / automatic closing of File Descriptors

Consider a database engine, which operates on an externally opened file - like SQLite, except the file handle is passed to its constructor. I'm using a setup like this for my app, but can't seem to figure out why NodeJS insists on closing the file descriptor after 2 seconds of operation. I need that thing to stay open!
const db = await DB.open(await fs.promises.open('/path/to/db/file', 'r+'));
...
(node:100840) Warning: Closing file descriptor 19 on garbage collection
(Use `node --trace-warnings ...` to show where the warning was created)
(node:100840) [DEP0137] DeprecationWarning: Closing a FileHandle object on garbage collection is deprecated. Please close FileHandle objects explicitly using FileHandle.prototype.close(). In the future, an error will be thrown if a file descriptor is closed during garbage collection.
The class DB uses the provided file descriptors extensively, over an extended period of time, so having them close is rather annoying. In that class, I'm using methods such as readFile, createReadStream() and the readline module to step through the lines of the file. I'm passing { autoClose: false, emitClose: false } to any read/write streams I'm using, but to no avail.
Why is this happening?
How can I stop it?
Thanks
I suspect you're running into an evil problem with using await in this:
for await (const line of readline.createInterface({input: file.createReadStream({start: 0, autoClose: false})}))
If you use await anywhere else in the for loop body (which you are), the underlying stream fires all its data events and finishes while you are parked at that other await, and in some cases your process even exits before you get to process any of the data or line events from the stream. This is a truly flawed design and has bitten many others.
The safest way around this is to not use the async iterator at all, and just wrap a promise yourself around the regular events from the readline object, as in the sketch below.
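A minimal sketch of that workaround, assuming file is the FileHandle from the question and that you simply want the lines collected into an array:

const { createInterface } = require('readline');

// Collect the lines via readline's regular events instead of the async iterator.
function readLines(file) {
  return new Promise((resolve, reject) => {
    const input = file.createReadStream({ start: 0, autoClose: false });
    const lines = [];
    const rl = createInterface({ input });
    rl.on('line', (line) => lines.push(line));
    rl.on('close', () => resolve(lines));
    input.on('error', reject);
  });
}

The caller can then await readLines(file) and use ordinary awaits between lines without racing the stream.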
Close the file handle after waiting for any pending operation.
import { open } from 'fs/promises';

let filehandle;
try {
  filehandle = await open('thefile.txt', 'r');
} finally {
  await filehandle?.close();
}

When does PassThrough stream.write throw ERR_STREAM_DESTROYED, and when is it just a no-op after destroy?

My code:
const PassThrough = require('stream').PassThrough
const strm = new PassThrough()
strm.destroy()
strm.write('abcd') // no-op
The last strm.write does nothing and returns false in Node 14.7 (latest), but throws ERR_STREAM_DESTROYED in Node 12.18.3 (LTS).
The documentation for LTS Node says:
Destroy the stream. Optionally emit an 'error' event, and emit a
'close' event (unless emitClose is set to false). After this call, the
writable stream has ended and subsequent calls to write() or end()
will result in an ERR_STREAM_DESTROYED error. This is a destructive
and immediate way to destroy a stream. Previous calls to write() may
not have drained, and may trigger an ERR_STREAM_DESTROYED error. Use
end() instead of destroy if data should flush before close, or wait
for the 'drain' event before destroying the stream.
So it clearly says that it will throw ERR_STREAM_DESTROYED.
However, in the documentation for the latest Node, the following sentence has been added:
Once destroy() has been called any further calls will be a noop and no
further errors except from _destroy may be emitted as 'error'.
Meanwhile, the earlier statement about ERR_STREAM_DESTROYED has not been removed.
What is the exact condition under which the write method of a destroyed PassThrough stream throws ERR_STREAM_DESTROYED rather than being a no-op in Node 14? Please provide proof-of-concept code throwing ERR_STREAM_DESTROYED.
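One way to observe the behaviour on a given Node version (a probing sketch, not an authoritative answer) is to attach an 'error' listener and pass a callback to write(), then see through which path, if any, ERR_STREAM_DESTROYED surfaces:

const { PassThrough } = require('stream');

const strm = new PassThrough();

// If the error is emitted asynchronously, it shows up here.
strm.on('error', (err) => console.log('error event:', err.code));

strm.destroy();

try {
  // If the error is reported per write, it shows up in the callback;
  // a synchronous throw (older behaviour) is caught by the try/catch.
  const ret = strm.write('abcd', (err) => {
    console.log('write callback:', err ? err.code : 'no error');
  });
  console.log('write returned', ret);
} catch (err) {
  console.log('write threw:', err.code);
}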

How to ensure we listen to a child process' events before they occur?

Here is some Node.js code that spawns a Linux ls command and prints its result:
const spawn = require('child_process').spawn;
const ls = spawn('ls', ['-l']);

let content = "";

ls.stdout.on('data', function (chunk) {
  content += chunk.toString();
});

ls.stdout.on('end', function () {
  console.log(content);
});
This works well. However, the ls command is launched asynchronously, completely separate from the main Node.js thread. My concern is that the data and end events on the process's stdout might fire before I attach the event listeners.
Is there a way to attach event listeners before starting that subprocess?
Note: I don't think I can wrap a Promise around the spawn call to make this work, as it would still rely on the events being properly caught to trigger success/failure (leading back to the same problem).
There is no problem here.
Readable streams (since node v0.10) have a (limited) internal buffer that stores data until you read from the stream. If the internal buffer fills up, the backpressure mechanism will kick in, causing the stream to stop reading data from its source.
Once you call .read() or add a data event handler, the stream will start to drain its internal buffer and will then resume reading from its source.
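A quick way to convince yourself (a sketch reusing the code from the question): delay attaching the listeners, and the buffered output still arrives.

const { spawn } = require('child_process');
const ls = spawn('ls', ['-l']);

// Attach the listeners a full second after spawning; the output produced in the
// meantime sits in the stdout stream's internal buffer until we start consuming it.
setTimeout(() => {
  let content = '';
  ls.stdout.on('data', (chunk) => { content += chunk.toString(); });
  ls.stdout.on('end', () => console.log(content));
}, 1000);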

Must I repeatedly call readable.read() within a readable event handler?

Suppose I have created a transform stream called Parser which can be written to like a normal stream but is read from as an object stream. I am using the readable event for the code that uses this transform stream:
var parser = new Parser();

parser.on('readable', function () {
  var data = parser.read();
  console.log(data);
});
In this event handler, must I repeatedly call parser.read()? Or, will readable fire on its own for every single object being pushed from my transform stream?
According to the node docs, "Once the internal buffer is drained, a readable event will fire again when more data is available," so if you call read() just once and there's still more data to be read, you'll have to remember to read() some more later on.
You could call read() in a while loop (inside your 'readable' event handler) until it returns null, then wait for the next 'readable' event.
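A minimal sketch of that loop, reusing the parser from the question:

parser.on('readable', function () {
  var data;
  // Drain everything currently buffered, then wait for the next 'readable'.
  while ((data = parser.read()) !== null) {
    console.log(data);
  }
});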
If you don't specify a size, you only need to call it once per event; 'readable' will fire on its own each time there is more data.
You also have readable.on('end', ...), which lets you know when no more data is available.

NodeJS writable streams: how to wait for data to be flushed?

I have a simple situation in which an https.get pipes its response stream into a file stream created with fs.createWriteStream, something like this:
var file = fs.createWriteStream('some-file');

var downloadComplete = function () {
  // check file size with fs.stat
};

https.get(options, function (response) {
  file.on('finish', downloadComplete);
  response.pipe(file);
});
Almost all the time this works fine and the file size determined in downloadComplete is what is expected. Every so often, however, it's a bit too small, almost as if the underlying file stream hasn't written to the disk even though it has raised the finish event.
Does anyone know what's happening here, or have any particular way to make this safer against delays between finish being emitted and the underlying data being written to disk?
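One commonly suggested hardening step (a sketch reusing options and the filename from the question, under the assumption that the short files come from unreported response errors or from checking the size before the descriptor is closed; not a verified fix for this exact symptom): surface errors on both streams and only stat the file after the write stream has emitted 'close', which fires once the descriptor has actually been closed.

var fs = require('fs');
var https = require('https');

https.get(options, function (response) {
  var file = fs.createWriteStream('some-file');
  response.pipe(file);

  // Surface truncated or aborted responses instead of silently ending up with a short file.
  response.on('error', function (err) { console.error('response error:', err); });
  file.on('error', function (err) { console.error('file error:', err); });

  // 'close' fires after the file descriptor has been closed, so fs.stat
  // should then report everything that was written.
  file.on('close', function () {
    fs.stat('some-file', function (err, stats) {
      if (err) return console.error(err);
      console.log('downloaded size:', stats.size);
    });
  });
});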
