NodeJS: Reliably act upon a closed stream

I'm looking for a way to reliably know for sure whether all the data in the stream has been processed. An asynchronous data listener might be called after the end event, in which case I cannot use the end event to, for instance, close a database connection when the data event is still executing database queries.
Example:
const fs = require('fs')
const stream = fs.createReadStream('./big.file', { encoding: 'utf8' });

stream
  .on('data', () => {
    stream.pause();
    setTimeout(() => {
      console.log('data');
      stream.resume();
    }, 10);
  })
  .on('close', function() {
    console.log('end');
  });
This will log "data" a lot of times, then "end", and then "data" one more time.
So in a real-world example, if "data" is doing queries, and "end" would close the connection, the last query would throw an error because the database connection was closed prematurely.
I've noticed a closed property on the stream, and of course there is the isPaused() method, and I can use those to more or less fix my problem:
stream
  .on('data', () => {
    stream.pause();
    databaseQuery().then(result => {
      stream.resume();
      if (stream.closed) {
        closeConnection();
      }
    });
  })
  .on('close', function() {
    if (!stream.isPaused()) {
      closeConnection();
    }
  });
However, I'm unsure whether this is the best way to go about this.
Can I be sure the connection will be closed at all?
Edit: I'm seeing similar results with the "end" event; it doesn't matter whether I use "end" or "close", the test logs are identical.
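For comparison, here is a flag-based variant of the same idea, as a rough sketch only, reusing the databaseQuery() and closeConnection() placeholders from the snippets above: one flag records that a chunk is still being processed, another that the stream has finished, and the connection is closed by whichever side finishes last.

const fs = require('fs');

const stream = fs.createReadStream('./big.file', { encoding: 'utf8' });
let processing = false; // a 'data' chunk is still being handled
let ended = false;      // 'end'/'close' has already fired

stream
  .on('data', chunk => {
    processing = true;
    stream.pause();
    databaseQuery(chunk).then(() => {
      processing = false;
      stream.resume();
      // If the stream finished while the query was running,
      // this was the last chunk and it is now safe to clean up.
      if (ended) closeConnection();
    });
  })
  .on('end', () => {
    ended = true;
    // Only close here if no query is still in flight.
    if (!processing) closeConnection();
  });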

Related

Nodejs `fs.createReadStream` as promise

I'm trying to get fs.createReadStream working as a promise, so that after the entire file has been read, the promise will be resolved.
In the case below, I'm pausing the stream, executing the awaitable method, and resuming.
1. How can I make `.on('end', ...)` be executed at the end?
2. If 1. is not possible, why won't `.on('close', ...)` be fired? Maybe I can use it to resolve the promise.
function parseFile<T>(filePath: string, row: (x: T) => void, err: (x) => void, end: (x) => void) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(filePath);
    stream
      .on('data', async data => {
        try {
          stream.pause();
          await row(data);
        } finally {
          stream.resume();
        }
      })
      .on('end', (rowCount: number) => {
        resolve(); // NOT REALLY THE END: row(data) is still being called after this
      })
      .on('close', () => {
        resolve(); // NEVER BEING CALLED
      })
      .on('error', (rowCount: number) => {
        reject(); // NEVER GETS HERE, AS EXPECTED
      });
  });
}
UPDATE
Here you can actually test it: https://stackblitz.com/edit/node-czktjh?file=index.js
run node index.js
The output should be 1000 and not 1
Thanks
Something to be aware of: you've removed the line processing from the current version of the question, so the stream is being read in large chunks. It appears to be reading the entire file in just two chunks, thus just two data events, so the expected count here is 2, not 1000.
I think the problem with this code occurs because stream.pause() does not pause the generation of the end event - it only pauses future data events. If the last data event has fired and you then await inside the processing of that data event (which causes your data event handler to return a promise immediately), the stream will think it's done, and the end event will still fire before you're done awaiting the work inside that last data event. Remember, the data event handler is NOT promise-aware. And it appears that stream.pause() only affects data events, not the end event.
I can imagine a work-around with a flag that keeps track of whether you're still processing a data event and postpones processing the end event until you're done with that last data event. I'll add code below that illustrates how to use the flag.
FYI, the missing close event is another stream weirdness. Your nodejs program actually terminates before the close event gets to fire. If you put this at the start of your program:
setTimeout(() => { console.log('done with timer');}, 5000);
Then, you will see the close event because the timer will prevent your nodejs program from exiting before the close event gets to fire. I'm not suggesting this as a solution to any problem, just to illustrate that the close event is still there and wants to fire if your program doesn't exit before it gets a chance.
Here's code that demonstrates the use of flags to work around the pause issue. When you run this code, you will only see 2 data events, not 1000, because this code is not reading lines; it's reading much larger chunks than that. So the expected result of this is not 1000.
// run `node index.js` in the terminal
const fs = require('fs');

const parseFile = row => {
  let paused = true;
  let ended = false;
  let dataCntr = 0;
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream('./generated.data.csv');
    stream
      .on('data', async data => {
        ++dataCntr;
        try {
          stream.pause();
          paused = true;
          await row(data);
        } finally {
          paused = false;
          stream.resume();
          if (ended) {
            console.log(`received ${dataCntr} data events`);
            resolve();
          }
        }
      })
      .on('end', rowCount => {
        ended = true;
        if (!paused) {
          console.log(`received ${dataCntr} data events`);
          resolve();
        }
      })
      .on('close', () => {
        //resolve();
      })
      .on('error', rowCount => {
        reject();
      });
  });
};

(async () => {
  let count = 0;
  await parseFile(async row => {
    await new Promise(resolve => setTimeout(resolve, 50)); // sleep
    count++;
  });
  console.log(`lines executed: ${count}, the expected is more than 1`);
})();
FYI, I still think your original version of the question had the problem I mentioned in my first comment - that you weren't pausing the right stream. What is documented here is yet another problem (where you can get end before your await in the last data event is done).
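Side note: on Node versions where readable streams are async iterable, a for await...of loop avoids the flag bookkeeping altogether, because the next chunk is not pulled until the loop body (including its awaits) has finished. A minimal sketch of the same parseFile idea, processing chunks rather than lines:

const fs = require('fs');

async function parseFile(filePath, row) {
  const stream = fs.createReadStream(filePath);
  let dataCntr = 0;
  for await (const chunk of stream) {
    // The next chunk is not requested until row() has settled.
    await row(chunk);
    dataCntr++;
  }
  // Reaching this point means every chunk has been fully processed.
  return dataCntr;
}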

How to asynchronously createReadStream in node.js with async/await

I am having difficulty with using fs.createReadStream to process my csv file asynchronously:
async function processData(row) {
  // perform some asynchronous function
  await someAsynchronousFunction();
}

fs.createReadStream('/file')
  .pipe(parse({
    delimiter: ',',
    columns: true
  }))
  .on('data', async (row) => {
    await processData(row);
  })
  .on('end', () => {
    console.log('done processing!')
  })
I want to perform some asynchronous function after reading each record one by one before the createReadStream reaches on('end').
However, the on('end') gets hit before all of my data finishes processing. Does anyone know what I might be doing wrong?
Thanks in advance!
.on('data', ...) does not wait for your await. Remember, an async function returns a promise immediately, and .on() is not paying any attention to that promise, so it just keeps merrily going on.
The await only waits inside the function; it does not stop your function from returning immediately, and thus the stream thinks you've processed the data and keeps sending more data and generating more data events.
There are several possible approaches here, but the simplest might be to pause the stream until processData() is done and then restart the stream.
Also, does processData() return a promise that is linked to the completion of the async operation? That is also required for await to be able to do its job.
The readable stream doc contains an example of pausing the stream during a data event and then resuming it after some asynchronous operation finishes. Here's their example:
const readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
  console.log(`Received ${chunk.length} bytes of data.`);
  readable.pause();
  console.log('There will be no additional data for 1 second.');
  setTimeout(() => {
    console.log('Now data will start flowing again.');
    readable.resume();
  }, 1000);
});
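Adapting that pattern to the csv pipeline from the question might look roughly like this (a sketch only, reusing processData() from the question; note that the stream emitting 'data' is the parser, so that is the stream to pause, and the caveat from the previous question's answer about 'end' firing before the last await completes still applies):

const fs = require('fs');
const parse = require('csv-parse'); // or `const { parse } = require('csv-parse')` in newer versions

const parser = fs.createReadStream('/file')
  .pipe(parse({ delimiter: ',', columns: true }));

parser
  .on('data', async (row) => {
    // Pause the parser while the async work runs, then resume it.
    parser.pause();
    try {
      await processData(row);
    } finally {
      parser.resume();
    }
  })
  .on('end', () => {
    console.log('done processing!');
  });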
I ran into the same problem recently. I fixed it by using an array of promises, and waiting for all of them to resolve when .on("end") was triggered.
import fs from "fs";
import parse from "csv-parse";

// processData(row) is assumed to be defined elsewhere, as in the question.
export const parseCsv = () =>
  new Promise((resolve, reject) => {
    const promises = [];
    fs.createReadStream('/file')
      .pipe(parse({ delimiter: ',', columns: true }))
      .on("data", row => promises.push(processData(row)))
      .on("error", reject)
      .on("end", async () => {
        await Promise.all(promises);
        resolve();
      });
  });

Listen to reconnect events in MongoDB driver

I would like to add event listeners to a MongoDB connection to run something when the connection drops, each reconnection attempt and at a successful reconnection attempt.
I read all the official docs and the API, but I can't find a solution.
Currently, I have this, but only the timeout event works.
// If we didn't already initialize a 'MongoClient', initialize one and save it.
if (!this.client) this.client = new MongoClient();
this.connection = await this.client.connect(connectionString, this.settings);

this.client.server.on('connect', event => {
  console.log(event);
});
this.client.server.on('error', event => {
  console.log(event);
});
this.client.server.on('reconnect', event => {
  console.log(event);
});
this.client.server.on('connections', event => {
  console.log(event);
});
this.client.server.on('timeout', event => {
  console.log(event);
});
this.client.server.on('all', event => {
  console.log(event);
});
I tried the events listed here, and they work, but there is no "reconnect" event:
http://mongodb.github.io/node-mongodb-native/2.2/reference/management/sdam-monitoring/
Sure you can. You just need to tap into the EventEmitter at a lower level than the MongoClient itself.
You can clearly see that such things exist since they are visible in "logging", which can be turned on in the driver via the setting:
{ "loggerLevel": "info" }
From there it's really just a matter of tapping into the actual source emitter. I've done this in the following listing, and also included a little trick for enumerating the events from a given emitter, which I admittedly used myself in tracking this down:
const MongoClient = require('mongodb').MongoClient;

function patchEmitter(emitter) {
  var oldEmit = emitter.emit;
  emitter.emit = function() {
    var emitArgs = arguments;
    console.log(emitArgs);
    oldEmit.apply(emitter, arguments);
  };
}

(async function() {
  let db;
  try {
    const client = new MongoClient();
    client.on('serverOpening', () => console.log('connected'));

    db = await client.connect('mongodb://localhost/test', {
      //loggerLevel: 'info'
    });

    //patchEmitter(db.s.topology);
    db.s.topology.on('close', () => console.log('Connection closed'));
    db.s.topology.on('reconnect', () => console.log('Reconnected'));
  } catch (e) {
    console.error(e);
  }
})()
So those two listeners defined:
db.s.topology.on('close', () => console.log('Connection closed') );
db.s.topology.on('reconnect', () => console.log('Reconnected') );
are going to fire when the connection drops and when a reconnect is achieved. There are also other things in the event emitter, like reconnect attempts, just like you would see with the loggerLevel setting turned on.

Why destroy stream in error?

I see some modules that pipe readable streams into writable streams, and if any error occurs, they use the destroy method:
const readable = fs.createReadStream("file");
const writable = fs.createWriteStream("file2");

readable.pipe(writable);

readable.on("error", (error) => {
  readable.destroy();
  writable.destroy();
  writable.removeListener("close");
  callback(error);
});

writable.on("error", (error) => {
  readable.destroy();
  writable.destroy();
  writable.removeListener("close");
  callback(error);
});
What is the necessity of destroying the streams and removing the close event on the writable stream? If I don't do that, what could happen?
Thanks.
I believe this is necessary to avoid memory leaks. As per the Node.js documentation on the readable.pipe() method,
One important caveat is that if the Readable stream emits an error during processing, the Writable destination is not closed automatically. If an error occurs, it will be necessary to manually close each stream in order to prevent memory leaks.
In the script below, comment out the line w.destroy(err) and notice that none of the Writable's events emit. I'm not sure why the Node.js designers chose not to automatically destroy the Writable; maybe they didn't want Stream.pipe() to be too opinionated.
// Readable and Writable come from the core 'stream' module.
const { Readable, Writable } = require('stream');

const r = new Readable({
  objectMode: true,
  read() {
    try {
      // Note: the JSON below is intentionally malformed (missing the closing
      // brace), so JSON.parse throws and the error path is exercised.
      this.push(JSON.parse('{"prop": "I am the data"'))
      this.push(null) // make sure we let Writable's know there's no more to read
    } catch (e) {
      console.error(`Problem encountered while reading data`, e)
      this.destroy(e)
    }
  }
}).on('error', (err) => {
  console.log(`Reader error: ${err}`)
  w.destroy(err)
  done() // done() is assumed to come from the surrounding test harness
})

const w = new Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    callback()
  }
}).on('error', (err) => {
  console.error(`Writer error: ${err}`)
}).on('close', () => {
  console.error(`Writer close`)
}).on('finish', () => {
  console.error(`Writer finish`)
})

r.pipe(w)
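For what it's worth, newer Node versions also ship stream.pipeline(), which performs this cleanup for you: if any stream in the chain errors, it destroys all of them and reports the error through a single callback. A minimal sketch of the same copy operation:

const fs = require('fs');
const { pipeline } = require('stream');

const readable = fs.createReadStream('file');
const writable = fs.createWriteStream('file2');

pipeline(readable, writable, (err) => {
  // On failure, pipeline() has already destroyed both streams,
  // so no manual destroy()/removeListener() bookkeeping is needed here.
  if (err) {
    console.error('Pipeline failed:', err);
  } else {
    console.log('Pipeline succeeded.');
  }
});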

How can I create a Node.js surrogate readable stream that will wrap another stream that's not available at the time the surrogate stream was created?

I can make a simple HTTP request and get a stream back.
But what if I have to make an HTTP request, then poll to find out if the data is ready, then make another request to get the data?
I'd like to do that all in a single method that returns a stream so I can do:
multiStepMethod(options).pipe(wherever);
Instead of:
multiStepMethod(options, (err, stream) => {
  stream.pipe(wherever);
})
I need multiStepMethod to return a surrogate readable stream that will wait for some event, then wrap the (now available) stream and start sending its data down the pipe.
#!/usr/bin/env node
'use strict';

const stream = require('stream');

// This is an example of a 'readable' stream that has to go through a multi-
// step process to finally get the actual readable stream. So we are
// asynchronously wrapping another readable stream.
// The key to simplicity here was to use a transform stream instead of a
// readable stream because it allows us to pipe the stream to ourselves.
class ReadableWrappingTransform extends stream.Transform {
  constructor() {
    super({
      objectMode: true,
      // Our _transform method doesn't really do anything and we don't want to
      // hog up any more additional memory than necessary.
      highWaterMark: 1
    });

    process.nextTick(() => {
      if (new Date().getTime() % 5 === 1) {
        // Here we simulate an error that happened somewhere in the multi-step
        // process to get the final stream. So we just emit 'error' and we're
        // done.
        this.emit('error', new Error('Could not get the stream.'));
        // Assuming based on the node docs that we should not emit
        // 'close' or 'end' on error. If we do emit 'end', it will trigger the
        // writable's 'finish' event, which is probably not desired. You either
        // want an 'error' OR a 'finish'.

        // NODE END EVENT DOCS
        // The 'end' event is emitted when there is no more data to be consumed
        // from the stream.
        // Note: The 'end' event will not be emitted unless the data is
        // completely consumed. This can be accomplished by switching the stream
        // into flowing mode, or by calling stream.read() repeatedly until all
        // data has been consumed.
        // this.emit('end');

        // NODE CLOSE EVENT DOCS
        // The 'close' event is emitted when the stream and any of its
        // underlying resources (a file descriptor, for example) have been
        // closed. The event indicates that no more events will be emitted, and
        // no further computation will occur.
        // Not all Readable streams will emit the 'close' event.
        // this.emit('close');
      } else {
        // We successfully got the stream we wanted after a long, hard, multi-
        // step process, so first we need to copy all our listeners over to it
        // -- NOT.
        // ['close', 'data', 'end', 'error'].forEach((eventName) => {
        //   this.listeners(eventName).forEach((l) => {
        //     readable.on(eventName, l);
        //   });
        // });
        // Turns out that .pipe propagates ALL listeners EXCEPT the 'error'
        // listener. What's up with that !?! If we copy any of the others we
        // get double the events -- including double the data. So here we just
        // copy over the 'error' listener to make sure we get 'error' events.
        ['error'].forEach((eventName) => {
          this.listeners(eventName).forEach((l) => {
            readable.on(eventName, l);
          });
        });
        // Then just pipe the final readable to ourselves, and we are good.
        readable.pipe(this);
      }
    });
  }

  _transform(data, encoding, callback) {
    // Nothing special to do here, just pass along the data.
    this.push(data);
    callback();
  }
}
// This is just a very unreliable test readable stream.
const readable = new stream.Readable({
  objectMode: true,
  read() {
    for (let i = 0; i < 10; i++) {
      if (new Date().getTime() % 13 === 1) {
        this.__err = new Error('Sorry, error reading data.');
        this.emit('error', this.__err);
        return;
      }
      this.push({
        Name: `Mikey ${i}`
      });
    }
    this.push(null);
  }
});
// Any old writable that we can pipe to.
const writable = new stream.Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    console.log(chunk, encoding);
    callback();
  }
});
new ReadableWrappingTransform()
  // if your stream emits close you get close.
  .on('close', () => {
    console.error('CLOSE');
  })
  // if you push null you get end from read.
  .on('end', () => {
    console.error('END');
  })
  // error needs to be both places !?! seriously node?
  .on('error', (error) => {
    console.error('ERROR', error);
  })
  // Finish does no good here. It's a writable event.
  // .on('finish', () => {
  //   console.error('FINISH');
  // })
  .pipe(writable)
  // Close and End do no good here, they are readable events.
  // They are not propagated to the writable.
  //
  // // if your stream emits close you get close.
  // .on('close', () => {
  //   console.error('CLOSE');
  // })
  // // if you push null you get end from read.
  // .on('end', () => {
  //   console.error('END');
  // })
  // error needs to be both places !?! seriously node?
  .on('error', (error) => {
    console.error('ERROR', error);
  })
  // you should always get either finish or error or something was done
  // incorrectly.
  .on('finish', () => {
    console.error('FINISH');
  });
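To tie this back to the multiStepMethod(options).pipe(wherever) goal from the question: the same idea can be sketched with a plain PassThrough that is returned immediately and then fed (or destroyed) once the multi-step process settles. Here fetchFinalStream() is a hypothetical placeholder for the request/poll/request sequence:

const { PassThrough } = require('stream');

function multiStepMethod(options) {
  const surrogate = new PassThrough({ objectMode: true });

  // fetchFinalStream() is hypothetical: it performs the initial request,
  // polls until the data is ready, and resolves with the real readable stream.
  fetchFinalStream(options)
    .then(readable => {
      // .pipe() does not forward 'error', so propagate it explicitly.
      readable.on('error', err => surrogate.destroy(err));
      readable.pipe(surrogate);
    })
    .catch(err => surrogate.destroy(err));

  return surrogate;
}

// Callers can pipe right away:
// multiStepMethod(options).pipe(wherever);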
