Why destroy stream in error? - node.js

I see some modules that pipe readable streams into writable streams, and if any error occurs, they use the destroy method:
const readable = fs.createReadStream("file");
const writable = fs.createWriteStream("file2");

readable.pipe(writable);

readable.on("error", (error) => {
  readable.destroy();
  writable.destroy();
  writable.removeListener("close");
  callback(error);
});

writable.on("error", (error) => {
  readable.destroy();
  writable.destroy();
  writable.removeListener("close");
  callback(error);
});
Why is it necessary to destroy the streams and remove the close listener on the writable stream? If I don't do that, what could happen?
Thanks.

I believe this is necessary to avoid memory leaks. As per the Node.js documentation on the readable.pipe() method,
One important caveat is that if the Readable stream emits an error during processing, the Writable destination is not closed automatically. If an error occurs, it will be necessary to manually close each stream in order to prevent memory leaks.
In the script below, comment out the line w.destroy(err) and notice that none of the Writable's events are emitted. I'm not sure why the Node.js designers chose not to destroy the Writable automatically; maybe they didn't want Stream.pipe() to be too opinionated.
const { Readable, Writable } = require('stream')

const r = new Readable({
  objectMode: true,
  read() {
    try {
      // Deliberately malformed JSON so JSON.parse throws and the Readable errors
      this.push(JSON.parse('{"prop": "I am the data"'))
      this.push(null) // make sure we let the Writable know there's no more to read
    } catch (e) {
      console.error(`Problem encountered while reading data`, e)
      this.destroy(e)
    }
  }
}).on('error', (err) => {
  console.log(`Reader error: ${err}`)
  w.destroy(err)
  done() // `done` is assumed to be a test-framework callback; not defined in this snippet
})

const w = new Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    callback()
  }
}).on('error', (err) => {
  console.error(`Writer error: ${err}`)
}).on('close', () => {
  console.error(`Writer close`)
}).on('finish', () => {
  console.error(`Writer finish`)
})

r.pipe(w)
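As an aside, on Node.js 10+ you can avoid the manual destroy/cleanup entirely by using stream.pipeline(), which destroys every stream involved and reports the first error through a single callback. A minimal sketch (the file names are just placeholders):

const fs = require('fs')
const { pipeline } = require('stream')

// pipeline() pipes the readable into the writable, destroys both streams
// if either one errors, and calls the callback once with the first error.
pipeline(
  fs.createReadStream('file'),
  fs.createWriteStream('file2'),
  (err) => {
    if (err) {
      console.error('Pipeline failed:', err)
    } else {
      console.log('Pipeline succeeded')
    }
  }
)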

Related

stream - 'done event is being called even when pausing the connection'

I am trying to read from a CSV, and I am pausing the stream because I need to do some async work using await. However, the done event is called before all the rows have been processed. As I understand it, pausing doesn't stop the done event from being called. Is there any workaround for this?
let res = csv({
  delimiter: '|',
  // noheader: true,
  output: "csv",
  nullObject: true
})
  .fromStream(fs.createReadStream(`./dbscvs/new_${table.name}.csv`))
  .on('data', async (data) => {
    res.pause()
    await new Promise(resolve => {
      setTimeout(resolve, 5000)
    })
    res.resume()
  })
  .on('done', async e => {
    console.log('done')
  })
  .on('error', (err) => {
    console.log(err)
  })
Try the end event, which is emitted after all data has been output.
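For example, a sketch against the asker's snippet (res being the converter chain above):

res.on('end', () => {
  // unlike 'done', 'end' fires only after all rows have been output downstream
  console.log('all rows processed')
})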

NodeJS: Reliably act upon a closed stream

I'm looking for a way to reliably know for sure whether all the data in the stream has been processed. An asynchronous data listener might be called after the end event, in which case I cannot use the end event to, for instance, close a database connection when the data event is still executing database queries.
Example:
const fs = require('fs')
const stream = fs.createReadStream('./big.file', { encoding: 'utf8' });
stream
  .on('data', () => {
    stream.pause();
    setTimeout(() => {
      console.log('data');
      stream.resume();
    }, 10);
  })
  .on('close', function() {
    console.log('end');
  });
This will log "data" a lot of times, then "end", and then "data" one more time.
So in a real-world example, if "data" is doing queries, and "end" would close the connection, the last query would throw an error because the database connection was closed prematurely.
I've noticed a closed property on the stream, and of course there is the isPaused() method, and I can use those to sort of fix my problem:
stream
  .on('data', () => {
    stream.pause();
    databaseQuery().then(result => {
      stream.resume();
      if (stream.closed) {
        closeConnection();
      }
    });
  })
  .on('close', function() {
    if (!stream.isPaused()) {
      closeConnection();
    }
  });
I'm unsure however if this is the best way to go about this.
Can I be sure the connection will be closed at all?
Edit: I'm seeing similar results for the "end" event; whether I use "end" or "close", the test logs are identical.

Question about end of request for node/JS request package

I'm trying to understand what .on('end', ...) does in the node package request.
My code:
const fs = require('fs');
const request = require('request');
function downloadAsset(relativeAssetURL, fileName) {
  return new Promise((resolve, reject) => {
    try {
      let writeStream = fs.createWriteStream(fileName);
      var remoteImage = request(`https:${relativeAssetURL}`);
      remoteImage.on('data', function(chunk) {
        writeStream.write(chunk);
      });
      remoteImage.on('end', function() {
        let stats = fs.statSync(fileName);
        resolve({ fileName: fileName, stats: stats });
      });
    } catch (err) {
      reject(err);
    }
  });
}
What I'm trying to do is download a remote image, get some file statistics, and then resolve the promise so my code can do other things.
What I'm finding is that the promise doesn't always resolve after the file has been downloaded; it may resolve a little before then. I thought that's what .on('end', ... ) was for.
What can I do to have this promise resolve after the image has been downloaded in full?
As the docs say:
The writable.write() method writes some data to the stream, and calls the supplied callback once the data has been fully handled.
So, writable.write() is asynchronous. Just because your last writeStream.write has been called does not necessarily mean that all write operations have been completed. You probably want to call the .end method, which means:
Calling the writable.end() method signals that no more data will be written to the Writable. The optional chunk and encoding arguments allow one final additional chunk of data to be written immediately before closing the stream. If provided, the optional callback function is attached as a listener for the 'finish' event.
So, try calling writeStream.end when the remoteImage request ends, and pass a callback to writeStream.end that resolves the Promise once the writing is finished:
function downloadAsset(relativeAssetURL, fileName) {
  return new Promise((resolve, reject) => {
    try {
      const writeStream = fs.createWriteStream(fileName);
      const remoteImage = request(`https:${relativeAssetURL}`);
      remoteImage.on('data', function(chunk) {
        writeStream.write(chunk);
      });
      remoteImage.on('end', function() {
        writeStream.end(() => {
          const stats = fs.statSync(fileName);
          resolve({ fileName: fileName, stats: stats });
        });
      });
    } catch (err) {
      reject(err);
    }
  });
}
(Also, try not to mix var and let/const; in an ES6+ environment, prefer const, which is generally easier to read and avoids problems such as hoisting.)

How can I create a Node.js surrogate readable stream that will wrap another stream that's not available at the time the surrogate stream was created?

I can make a simple HTTP request and get a stream back.
But what if I have to make an HTTP request, then poll to find out if the data is ready, then make another request to get the data?
I'd like to do that all in a single method that returns a stream so I can do:
multiStepMethod(options).pipe(wherever);
Instead of:
multiStepMethod(options, (err, stream) => {
  stream.pipe(wherever);
})
I need multiStepMethod to return a surrogate readable stream that will wait for some event, then wrap the (now available) stream and start sending its data down the pipe.
#!/usr/bin/env node
'use strict';
const stream = require('stream');

// This is an example of a 'readable' stream that has to go through a multi-
// step process to finally get the actual readable stream. So we are
// asynchronously wrapping another readable stream.
// The key to simplicity here was to use a transform stream instead of a
// readable stream because it allows us to pipe the stream to ourselves.
class ReadableWrappingTransform extends stream.Transform {
  constructor() {
    super({
      objectMode: true,
      // Our _transform method doesn't really do anything and we don't want to
      // hog up any more additional memory than necessary.
      highWaterMark: 1
    });
    process.nextTick(() => {
      if (new Date().getTime() % 5 === 1) {
        // Here we simulate an error that happened somewhere in the multi-step
        // process to get the final stream. So we just emit 'error' and we're
        // done.
        this.emit('error', new Error('Could not get the stream.'));
        // Assuming based on the node docs that we should not emit
        // 'close' or 'end' on error. If we do emit 'end', it will trigger the
        // writable's 'finish' event, which is probably not desired. You either
        // want an 'error' OR a 'finish'.
        // NODE END EVENT DOCS
        // The 'end' event is emitted when there is no more data to be consumed
        // from the stream.
        // Note: The 'end' event will not be emitted unless the data is
        // completely consumed. This can be accomplished by switching the stream
        // into flowing mode, or by calling stream.read() repeatedly until all
        // data has been consumed.
        // this.emit('end');
        // NODE CLOSE EVENT DOCS
        // The 'close' event is emitted when the stream and any of its
        // underlying resources (a file descriptor, for example) have been
        // closed. The event indicates that no more events will be emitted, and
        // no further computation will occur.
        // Not all Readable streams will emit the 'close' event.
        // this.emit('close');
      } else {
        // We successfully got the stream we wanted after a long, hard, multi-
        // step process, so first we need to copy all our listeners over to it
        // -- NOT.
        // ['close', 'data', 'end', 'error'].forEach((eventName) => {
        //   this.listeners(eventName).forEach((l) => {
        //     readable.on(eventName, l);
        //   });
        // });
        // Turns out that .pipe propagates ALL listeners EXCEPT the 'error'
        // listener. What's up with that !?! If we copy any of the others we
        // get double the events -- including double the data. So here we just
        // copy over the 'error' listener to make sure we get 'error' events.
        ['error'].forEach((eventName) => {
          this.listeners(eventName).forEach((l) => {
            readable.on(eventName, l);
          });
        });
        // Then just pipe the final readable to ourselves, and we are good.
        readable
          .pipe(this);
      }
    });
  }

  _transform(data, encoding, callback) {
    // Nothing special to do here, just pass along the data.
    this.push(data);
    callback();
  }
}
// This is just a very unreliable test readable stream.
const readable = new stream.Readable({
  objectMode: true,
  read() {
    for (let i = 0; i < 10; i++) {
      if (new Date().getTime() % 13 === 1) {
        this.__err = new Error('Sorry, error reading data.');
        this.emit('error', this.__err);
        return;
      }
      this.push({
        Name: `Mikey ${i}`
      });
    }
    this.push(null);
  }
});

// Any old writable that we can pipe to.
const writable = new stream.Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    console.log(chunk, encoding);
    callback();
  }
});
new ReadableWrappingTransform()
  // if your stream emits close you get close.
  .on('close', () => {
    console.error('CLOSE');
  })
  // if you push null you get end from read.
  .on('end', () => {
    console.error('END');
  })
  // error needs to be both places !?! seriously node?
  .on('error', (error) => {
    console.error('ERROR', error);
  })
  // Finish does no good here. It's a writable event.
  // .on('finish', () => {
  //   console.error('FINISH');
  // })
  .pipe(writable)
  // Close and End do no good here, they are readable events.
  // They are not propagated to the writable.
  //
  // // if your stream emits close you get close.
  // .on('close', () => {
  //   console.error('CLOSE');
  // })
  // // if you push null you get end from read.
  // .on('end', () => {
  //   console.error('END');
  // })
  // error needs to be both places !?! seriously node?
  .on('error', (error) => {
    console.error('ERROR', error);
  })
  // you should always get either finish or error or something was done
  // incorrectly.
  .on('finish', () => {
    console.error('FINISH');
  });

No end event when piping inside "open"

I am piping a download into a file, but want to make sure the file doesn't already exist. I've put the code up here for easier exploration: https://tonicdev.com/tolmasky/streaming-piping-on-open-tester <-- this will show you the outputs (code also below inline).
So the thing is, it seems to work fine except for the done (end) event. The file ends up on the hard drive fine, each step is followed correctly (the structure is to ensure no "parallel" steps happen that aren't necessary -- if I do got.stream(url).pipe(fs.createWriteStream({ flags: ... })), then the download will actually get kicked off even if the createWriteStream returns an error because the file is already there -- undesirable for the network).
The code is the following:
var fs = require("fs");
var got = require("got");

await download("https://www.apple.com", "./index.html");

function download(aURL, aDestinationFilePath)
{
    return new Promise(function(resolve, reject)
    {
        fs.createWriteStream(aDestinationFilePath, { flags: "wx" })
            .on("open", function()
            {
                const writeStream = this;
                console.log("SUCCESSFULLY OPENED!");

                got.stream(aURL)
                    .on("response", function(aResponse)
                    {
                        const contentLength = +aResponse.headers["content-length"] || 0;
                        console.log(aResponse.headers);
                        console.log("STARTING DOWNLOAD! " + contentLength);

                        this.on("data", () => console.log("certainly getting data"))
                        this.pipe(writeStream)
                            .on("error", reject)
                            .on("end", () => console.log("DONE!"))
                            .on("end", resolve);
                    })
            })
            .on("error", function(anError)
            {
                if (anError.code === "EEXIST") {
                    console.log("oh");
                    resolve();
                }
                else
                    reject(anError);
            });
    });
}
According to the stream docs, readable.pipe() returns the destination Writable stream, and the event a Writable emits when it is done is 'finish', not 'end'.
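So in the asker's snippet, a sketch of the fix would be to attach the completion handlers to 'finish' on the write stream returned by pipe (only the event names change):

this.pipe(writeStream)
    .on("error", reject)
    .on("finish", () => console.log("DONE!"))
    .on("finish", resolve);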
