stream - 'done event is being called even when pausing the connection' - node.js

I am trying to read from a CSV and I am pausing the stream because I need to do some async work using await. However, the done event is called before all the rows have been processed. As I understand it, pausing the stream doesn't stop the done event from being fired. Is there any workaround for this?
let res = csv({
    delimiter: '|',
    // noheader: true,
    output: "csv",
    nullObject: true
})
    .fromStream(fs.createReadStream(`./dbscvs/new_${table.name}.csv`))
    .on('data', async (data) => {
        res.pause()
        await new Promise(resolve => {
            setTimeout(resolve, 5000)
        })
        res.resume()
    })
    .on('done', async e => {
        console.log('done')
    })
    .on('error', (err) => {
        console.log(err)
    })

Try the end event, which is emitted after all data has been output.
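For example, the same chain with 'end' in place of 'done' might look roughly like this (a sketch only, reusing the setup from the question; doAsyncWork is a hypothetical placeholder for the real async task):
let res = csv({
    delimiter: '|',
    output: "csv",
    nullObject: true
})
    .fromStream(fs.createReadStream(`./dbscvs/new_${table.name}.csv`))
    .on('data', async (data) => {
        res.pause()
        await doAsyncWork(data) // hypothetical placeholder for the real async work
        res.resume()
    })
    .on('end', () => {
        // 'end' is emitted after all data has been output
        console.log('all rows emitted')
    })
    .on('error', (err) => {
        console.log(err)
    })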

Related

Nodejs `fs.createReadStream` as promise

I'm trying to get fs.createReadStream working as a promise, so that after the entire file has been read, the promise is resolved.
In the case below, I'm pausing the stream, executing the awaitable method and resuming.
1. How do I make .on('end', ...) execute at the actual end?
2. If 1. is not possible, why won't .on('close') be fired? Maybe I can use it to resolve the promise.
function parseFile<T>(filePath: string, row: (x: T) => void, err: (x) => void, end: (x) => void) {
    return new Promise((resolve, reject) => {
        const stream = fs.createReadStream(filePath);
        stream.on('data', async data => {
            try {
                stream.pause();
                await row(data);
            } finally {
                stream.resume();
            }
        })
        .on('end', (rowCount: number) => {
            resolve(); // NOT REALLY THE END, row(data) is still being called after this
        })
        .on('close', () => {
            resolve(); // NEVER BEING CALLED
        })
        .on('error', (rowCount: number) => {
            reject(); // NEVER GETS HERE, AS EXPECTED
        })
    })
}
UPDATE
Here you can actually test it: https://stackblitz.com/edit/node-czktjh?file=index.js
run node index.js
The output should be 1000 and not 1
Thanks
Something to be aware of: you've removed the line processing from the current version of the question, so the stream is being read in large chunks. It appears to be reading the entire file in just two chunks, thus just two data events, so the expected count here is 2, not 1000.
I think the problem with this code occurs because stream.pause() does not pause the generation of the end event - it only pauses future data events. If the last data event has already been fired and you then await inside the processing of that data event (which causes your data event handler to return a promise immediately), the stream thinks it's done and the end event still fires before you're finished awaiting inside that last data event handler. Remember, the data event handler is NOT promise-aware, and it appears that stream.pause() only affects data events, not the end event.
I can imagine a work-around with a flag that keeps track of whether you're still processing a data event and postpones handling the end event until you're done with that last data event. The code below illustrates how to use the flag.
FYI, the missing close event is another stream weirdness. Your nodejs program actually terminates before the close event gets to fire. If you put this at the start of your program:
setTimeout(() => { console.log('done with timer');}, 5000);
Then, you will see the close event because the timer will prevent your nodejs program from exiting before the close event gets to fire. I'm not suggesting this as a solution to any problem, just to illustrate that the close event is still there and wants to fire if your program doesn't exit before it gets a chance.
Here's code that demonstrates the use of flags to work around the pause issue. When you run this code, you will only see 2 data events, not 1000, because this code is not reading lines; it's reading much larger chunks than that. So the expected result of this is not 1000.
// run `node index.js` in the terminal
const fs = require('fs');

const parseFile = row => {
    let paused = false; // true while a data event is still being processed
    let ended = false;
    let dataCntr = 0;
    return new Promise((resolve, reject) => {
        const stream = fs.createReadStream('./generated.data.csv');
        stream
            .on('data', async data => {
                ++dataCntr;
                try {
                    stream.pause();
                    paused = true;
                    await row(data);
                } finally {
                    paused = false;
                    stream.resume();
                    if (ended) {
                        console.log(`received ${dataCntr} data events`);
                        resolve();
                    }
                }
            })
            .on('end', rowCount => {
                ended = true;
                if (!paused) {
                    console.log(`received ${dataCntr} data events`);
                    resolve();
                }
            })
            .on('close', () => {
                //resolve();
            })
            .on('error', rowCount => {
                reject();
            });
    });
};

(async () => {
    let count = 0;
    await parseFile(async row => {
        await new Promise(resolve => setTimeout(resolve, 50)); // sleep
        count++;
    });
    console.log(`lines executed: ${count}, the expected is more than 1`);
})();
FYI, I still think your original version of the question had the problem I mentioned in my first comment - that you weren't pausing the right stream. What is documented here is yet another problem (where you can get end before your await in the last data event is done).

How to asynchronously createReadStream in node.js with async/await

I am having difficulty using fs.createReadStream to process my csv file asynchronously:
// Assuming the usual requires, e.g.:
// const fs = require('fs');
// const { parse } = require('csv-parse');

async function processData(row) {
    // perform some asynchronous function
    await someAsynchronousFunction();
}

fs.createReadStream('/file')
    .pipe(parse({
        delimiter: ',',
        columns: true
    }))
    .on('data', async (row) => {
        await processData(row);
    })
    .on('end', () => {
        console.log('done processing!')
    })
I want to perform some asynchronous function after reading each record one by one before the createReadStream reaches on('end').
However, the on('end') gets hit before all of my data finishes processing. Does anyone know what I might be doing wrong?
Thanks in advance!
.on('data', ...) does not wait for your await. Remember, an async function returns a promise immediately, and .on() is not paying any attention to that promise, so it just keeps merrily going on.
The await only waits inside the function; it does not stop your function from returning immediately, and thus the stream thinks you've processed the data and keeps sending more data and generating more data events.
There are several possible approaches here, but the simplest might be to pause the stream until processData() is done and then restart the stream.
Also, does processData() return a promise that is linked to the completion of the async operation? That is also required for await to be able to do its job.
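For instance, if the underlying operation is callback-based, it needs to be wrapped in a promise so await has something to wait on (a hypothetical sketch; someCallbackApi is not from the question):
function someAsynchronousFunction() {
    // Returning a promise tied to the completion of the work lets `await` actually wait for it.
    return new Promise((resolve, reject) => {
        someCallbackApi((err, result) => { // hypothetical callback-style API
            if (err) reject(err);
            else resolve(result);
        });
    });
}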
The readable stream doc contains an example of pausing the stream during a data event and then resuming it after some asynchronous operation finishes. Here's their example:
const readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
    console.log(`Received ${chunk.length} bytes of data.`);
    readable.pause();
    console.log('There will be no additional data for 1 second.');
    setTimeout(() => {
        console.log('Now data will start flowing again.');
        readable.resume();
    }, 1000);
});
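Adapting that pattern to the code in the question could look something like this (a sketch only, reusing fs, parse, and processData from the question; note that, as discussed in the previous answer, the end event can still fire before the await in the last data handler finishes):
// Sketch: pause the parser stream while each row is processed, then resume.
const parser = fs.createReadStream('/file')
    .pipe(parse({ delimiter: ',', columns: true }));

parser
    .on('data', async (row) => {
        parser.pause();               // stop further 'data' events
        try {
            await processData(row);   // wait for the async work on this row
        } finally {
            parser.resume();          // let the next row through
        }
    })
    .on('end', () => {
        console.log('done processing!');
    });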
I ran into the same problem recently. I fixed it by using an array of promises, and waiting for all of them to resolve when .on("end") was triggered.
import fs from "fs";
import parse from "csv-parse";

// processData(row) is assumed to return a promise, as in the question above.
export const parseCsv = () =>
    new Promise((resolve, reject) => {
        const promises = [];
        fs.createReadStream('/file')
            .pipe(parse({ delimiter: ',', columns: true }))
            .on("data", row => promises.push(processData(row)))
            .on("error", reject)
            .on("end", async () => {
                await Promise.all(promises);
                resolve();
            });
    });

Make sure promise resolved inside transformFunction

I am studying through2 and sequelize.
My code:
return Doc.createReadStream({
    where: { /*...*/ },
    include: [
        {
            /*...*/
        },
    ],
})
    .pipe(through({ objectMode: true }, (doc, enc, cb) => {
        Comment.findOne(null, { where: { onId: doc.id } }).then((com) => { /* sequelize: findOne */
            com.destroy(); /* sequelize instance destroy: http://docs.sequelizejs.com/manual/tutorial/instances.html#destroying-deleting-persistent-instances */
            cb();
        });
    }))
    .on('finish', () => {
        console.log('FINISHED');
    })
    .on('error', err => console.log('ERR', err));
I am trying to express my question clearly. Doc and Comment are sequelize Models. I want to use a stream to read Doc instances from the database one by one and delete the comments on each Doc instance. Comment.findOne and com.destroy() both return promises. I want the promises to resolve for each doc and only then call cb(). But my code above doesn't work: it finishes running before the comments are destroyed.
How can I fix it? Thanks
I wrapped the above piece of code in a mocha test, like
it('should be found by readstream', function _testStream(){
    /* wrap the first piece of code here */
});
But the test exits before the stream finishes reading.
You can wait for another promise by returning the promise and using another .then.
You may need to check for the com result being null as well, before running .destroy().
.pipe(through({ objectMode: true }, (doc, enc, cb) => {
    Comment.findOne(null, { where: { onId: doc.id } })
        .then(com => com.destroy())
        .then(() => cb())
        .catch(cb)
}))
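If a Doc might have no matching Comment, a guarded variant along these lines (a sketch, same assumptions as above) avoids calling .destroy() on null:
.pipe(through({ objectMode: true }, (doc, enc, cb) => {
    Comment.findOne(null, { where: { onId: doc.id } })
        .then(com => com ? com.destroy() : null) // skip destroy when nothing was found
        .then(() => cb())
        .catch(cb)
}))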
Then when running the test in mocha, you need to wait for the asynchronous stream by adding done to the test function signature and calling done() on completion or error.
it('should be found by readstream', function _testStream(done){
    ...
    .on('finish', () => done())
    .on('error', done)
})

Why destroy stream in error?

I see some modules that pipe readable streams into writable streams, and if any error occurs, they use the destroy method:
const readable = fs.createReadStream("file");
const writable = fs.createWriteStream("file2");

readable.pipe(writable);

readable.on("error", (error) => {
    readable.destroy();
    writable.destroy();
    writable.removeListener("close");
    callback(error);
});

writable.on("error", (error) => {
    readable.destroy();
    writable.destroy();
    writable.removeListener("close");
    callback(error);
});
What is the necessity of destroying the streams and removing the close event listener on the writable stream? If I don't do that, what could happen?
Thanks.
I believe this is necessary to avoid memory leaks. As per the Node.js documentation on the readable.pipe() method,
One important caveat is that if the Readable stream emits an error during processing, the Writable destination is not closed automatically. If an error occurs, it will be necessary to manually close each stream in order to prevent memory leaks.
In the script below, comment out the line w.destroy(err) and notice that none of the Writable's events are emitted. I'm not sure why the Node.js designers chose not to automatically destroy the Writable; maybe they didn't want Stream.pipe() to be too opinionated.
// Assumes Readable and Writable come from Node's stream module, e.g.:
// const { Readable, Writable } = require('stream');
const r = new Readable({
    objectMode: true,
    read() {
        try {
            // The JSON here is intentionally malformed (missing closing brace)
            // so that JSON.parse throws and the error path is exercised.
            this.push(JSON.parse('{"prop": "I am the data"'))
            this.push(null) // make sure we let Writable's know there's no more to read
        } catch (e) {
            console.error(`Problem encountered while reading data`, e)
            this.destroy(e)
        }
    }
}).on('error', (err) => {
    console.log(`Reader error: ${err}`)
    w.destroy(err)
    done() // done() is presumably a test-harness callback in the original context
})

const w = new Writable({
    objectMode: true,
    write(chunk, encoding, callback) {
        callback()
    }
}).on('error', (err) => {
    console.error(`Writer error: ${err}`)
})
.on('close', () => {
    console.error(`Writer close`)
})
.on('finish', () => {
    console.error(`Writer finish`)
})

r.pipe(w)

No end event when piping inside "open"

I am piping a download into a file, but want to make sure the file doesn't already exist. I've put the code up here for easier exploration: https://tonicdev.com/tolmasky/streaming-piping-on-open-tester <-- this will show you the outputs (code also below inline).
So the thing is, it seems to work fine except for the done (end) event. The file ends up on the hard drive fine, and each step is followed correctly (the structure is there to ensure no unnecessary "parallel" steps happen -- if I just do got.stream(url).pipe(fs.createWriteStream({ flags: ... })), then the download actually gets kicked off even if createWriteStream returns an error because the file is already there -- undesirable for the network).
The code is the following:
var fs = require("fs");
var got = require("got");

await download("https://www.apple.com", "./index.html");

function download(aURL, aDestinationFilePath)
{
    return new Promise(function(resolve, reject)
    {
        fs.createWriteStream(aDestinationFilePath, { flags: "wx" })
            .on("open", function()
            {
                const writeStream = this;
                console.log("SUCCESSFULLY OPENED!");
                got.stream(aURL)
                    .on("response", function(aResponse)
                    {
                        const contentLength = +aResponse.headers["content-length"] || 0;
                        console.log(aResponse.headers);
                        console.log("STARTING DOWNLOAD! " + contentLength);
                        this.on("data", () => console.log("certainly getting data"))
                        this.pipe(writeStream)
                            .on("error", reject)
                            .on("end", () => console.log("DONE!"))
                            .on("end", resolve);
                    })
            })
            .on("error", function(anError)
            {
                if (anError.code === "EEXIST") {
                    console.log("oh");
                    resolve();
                }
                else
                    reject(anError);
            });
    });
}
According to the stream docs, readable.pipe returns the destination Writable stream, and the correct event emitted when a Writable is done would be Event: 'finish'.
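Applied to the snippet above, that means listening for 'finish' on the result of this.pipe(writeStream) instead of 'end' (a sketch, using the same names as the question):
// Inside the "response" handler from the question, replace the 'end' listeners:
this.pipe(writeStream)
    .on("error", reject)
    .on("finish", () => console.log("DONE!")) // 'finish' fires once the Writable has flushed all data
    .on("finish", resolve);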
