How to asynchronously createReadStream in node.js with async/await - node.js

I am having difficulty using fs.createReadStream to process my CSV file asynchronously:
async function processData(row) {
  // perform some asynchronous function
  await someAsynchronousFunction();
}

fs.createReadStream('/file')
  .pipe(parse({
    delimiter: ',',
    columns: true
  }))
  .on('data', async (row) => {
    await processData(row);
  })
  .on('end', () => {
    console.log('done processing!');
  });
I want to run an asynchronous function on each record, one by one, before the createReadStream reaches on('end').
However, on('end') gets hit before all of my data finishes processing. Does anyone know what I might be doing wrong?
Thanks in advance!

.on('data', ...) does not wait for your await. Remember, an async function returns a promise immediately, and .on() pays no attention to that promise, so it just keeps merrily going on.
The await only waits inside the function; it does not stop your function from returning immediately, so the stream thinks you've processed the data and keeps sending more data and generating more data events.
There are several possible approaches here, but the simplest might be to pause the stream until processData() is done and then restart the stream.
Also, does processData() return a promise that is linked to the completion of the async operation? That is also required for await to be able to do its job.
The readable stream doc contains an example of pausing the stream during a data event and then resuming it after some asynchronous operation finishes. Here's their example:
const readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
  console.log(`Received ${chunk.length} bytes of data.`);
  readable.pause();
  console.log('There will be no additional data for 1 second.');
  setTimeout(() => {
    console.log('Now data will start flowing again.');
    readable.resume();
  }, 1000);
});

I ran into the same problem recently. I fixed it by using an array of promises, and waiting for all of them to resolve when .on("end") was triggered.
import fs from "fs";
import parse from "csv-parse";

export const parseCsv = () =>
  new Promise((resolve, reject) => {
    const promises = [];
    fs.createReadStream('/file')
      .pipe(parse({ delimiter: ',', columns: true }))
      .on("data", row => promises.push(processData(row)))
      .on("error", reject)
      .on("end", async () => {
        await Promise.all(promises);
        resolve();
      });
  });


Why is `new Promise` needed when reading streams?

OK, I saw this example of reading a stream and returning a promise using new Promise:
function readStream(stream, encoding = "utf8") {
  stream.setEncoding(encoding);
  return new Promise((resolve, reject) => {
    let data = "";
    stream.on("data", chunk => data += chunk);
    stream.on("end", () => resolve(data));
    stream.on("error", error => reject(error));
  });
}
const text = await readStream(process.stdin);
My question is: why "new Promise"? Can I do it like this 2nd version instead?
function readStream(stream, encoding = "utf8") {
  stream.setEncoding(encoding);
  let data = "";
  stream.on("data", chunk => data += chunk);
  stream.on("end", () => Promise.resolve(data));
  stream.on("error", error => Promise.reject(error));
}
const text = await readStream(process.stdin);
Haven't tried it yet, but basically I want to avoid the new keyword.
Some updates on the 2nd version, since async functions always return a Promise.
A function/method will return a Promise under the following circumstances:
You explicitly created and returned a Promise from its body.
You returned a Promise that exists outside the method.
You marked it as async.
const readStream = async (stream, encoding = "utf8") => {
  stream.setEncoding(encoding);
  let data = "";
  stream.on("data", chunk => data += chunk);
  stream.on("end", () => Promise.resolve(data));
  stream.on("error", error => Promise.reject(error));
}
const text = await readStream(process.stdin);
How's this 3rd version?
If you want readStream to return a promise, you'll have to ... return a promise from readStream (returning a promise in some callback is not doing that).
What the first code is doing, is promisifying the stream API. And that's exactly how it should be done.
The second version of the code is based on a misunderstanding: it seems to hope that returning a promise in the callback passed to the stream.on method will somehow make readStream return that promise. But when the on callback is called, readStream has already returned. Since readStream has no return statement, it already returned undefined and not a promise.
As a side note, when the stream API calls the callback you passed to the on method, it does not even look at the returned value -- that is ignored.
The third version is an async function, so it now is guaranteed the function will return a promise. But as the function still does not execute a return statement, that promise is immediately resolved with value undefined. Again, the returned values in the callbacks are unrelated to the promise that the async function has already returned.
new keyword
If you want to avoid the new keyword, then realise that anything that can be done with promises can also be done without them. In the end promises are "only" a convenience.
For instance, you could do:
function readStream(stream, success, failure, encoding = "utf8") {
  let data = "";
  stream.setEncoding(encoding);
  stream.on("data", chunk => data += chunk);
  stream.on("end", () => success(data));
  stream.on("error", failure);
}

function processText(text) {
  // ... do something with text
}

function errorHandler(error) {
  // ... do something with the error
}

readStream(process.stdin, processText, errorHandler);
In typical Node style you would pass one callback, serving both purposes, as the last argument:
function readStream(stream, encoding = "utf8", callback) {
  let data = "";
  stream.setEncoding(encoding);
  stream.on("data", chunk => data += chunk);
  stream.on("end", () => callback?.(null, data));
  stream.on("error", err => callback?.(err, null));
}

function processText(err, text) {
  if (err) {
    // do something with err
    return;
  }
  // ... do something with text
}
readStream(process.stdin, "utf8", processText);
And then you could use the util package to turn that into a promise-returning function:
const util = require('util');

const readStream = util.promisify(function (stream, encoding = "utf8", callback) {
  let data = "";
  stream.setEncoding(encoding);
  stream.on("data", chunk => data += chunk);
  stream.on("end", () => callback?.(null, data));
  stream.on("error", err => callback?.(err, null));
});

(async () => {
  try {
    const text = await readStream(process.stdin, "utf8");
    // do something with text
  } catch (err) {
    // do something with err
  }
})();
Of course, under the hood the promisify function performs new Promise and we're back to where we started.
You need to construct and return a Promise so that the consumer of the function has something to hook into the asynchronous action being performed. (Another option would be to define the function to also take a callback as an argument.)
If you try to do it the way you're doing with the second snippet, readStream will not return anything, so await readStream(process.stdin); will resolve immediately, and it'll resolve to undefined.
Doing
stream.on("end", () => Promise.resolve(data));
and
stream.on("error", error => Promise.reject(error));
constructs new Promises at that point in the code, but you need the consumer of the function to have access to the Promise that resolves (or rejects) - and so you must have return new Promise at the top level of the function.

Nodejs `fs.createReadStream` as promise

I'm trying to get fs.createReadStream working as a promise, so that after the entire file has been read, it will be resolved.
In the case below, I'm pausing the stream, executing the awaitable method, and resuming.
1. How do I make .on('end', ...) execute at the actual end?
2. If 1. is not possible, why does .on('close', ...) never fire? Maybe I can use it to resolve the promise.
function parseFile<T>(filePath: string, row: (x: T) => void, err: (x) => void, end: (x) => void) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(filePath);
    stream.on('data', async data => {
      try {
        stream.pause();
        await row(data);
      } finally {
        stream.resume();
      }
    })
    .on('end', (rowCount: number) => {
      resolve(); // NOT REALLY THE END, row(data) is still being called after this
    })
    .on('close', () => {
      resolve(); // NEVER BEING CALLED
    })
    .on('error', (rowCount: number) => {
      reject(); // NEVER GETS HERE, AS EXPECTED
    });
  });
}
UPDATE
Here you can actually test it: https://stackblitz.com/edit/node-czktjh?file=index.js
run node index.js
The output should be 1000 and not 1
Thanks
Something to be aware of: you've removed the line processing from the current version of the question, so the stream is being read in large chunks. It appears to be reading the entire file in just two chunks, thus just two data events, so the expected count here is 2, not 1000.
I think the problem with this code occurs because stream.pause() does not pause the generation of the end event; it only pauses future data events. If the last data event has already fired and you then await inside the processing of that data event (which causes your data event handler to immediately return a promise), the stream will think it's done and the end event will fire before you're done awaiting the function inside the processing of that last data event. Remember, the data event handler is NOT promise-aware. And, it appears that stream.pause() only affects data events, not the end event.
One work-around is a flag that keeps track of whether you're still processing a data event and postpones processing the end event until you're done with that last data event. The code below illustrates how to use the flag.
FYI, the missing close event is another stream weirdness. Your nodejs program actually terminates before the close event gets to fire. If you put this at the start of your program:
setTimeout(() => { console.log('done with timer');}, 5000);
Then, you will see the close event because the timer will prevent your nodejs program from exiting before the close event gets to fire. I'm not suggesting this as a solution to any problem, just to illustrate that the close event is still there and wants to fire if your program doesn't exit before it gets a chance.
Here's code that demonstrates the use of flags to work around the pause issue. When you run this code, you will only see 2 data events, not 1000, because this code is not reading lines; it's reading much larger chunks than that. So, the expected result of this is not 1000.
// run `node index.js` in the terminal
const fs = require('fs');

const parseFile = row => {
  let paused = false; // not paused yet; only set while a data event is in flight
  let ended = false;
  let dataCntr = 0;
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream('./generated.data.csv');
    stream
      .on('data', async data => {
        ++dataCntr;
        try {
          stream.pause();
          paused = true;
          await row(data);
        } finally {
          paused = false;
          stream.resume();
          if (ended) {
            console.log(`received ${dataCntr} data events`);
            resolve();
          }
        }
      })
      .on('end', rowCount => {
        ended = true;
        if (!paused) {
          console.log(`received ${dataCntr} data events`);
          resolve();
        }
      })
      .on('close', () => {
        //resolve();
      })
      .on('error', rowCount => {
        reject();
      });
  });
};

(async () => {
  let count = 0;
  await parseFile(async row => {
    await new Promise(resolve => setTimeout(resolve, 50)); // sleep
    count++;
  });
  console.log(`lines executed: ${count}, the expected is more than 1`);
})();
FYI, I still think your original version of the question had the problem I mentioned in my first comment - that you weren't pausing the right stream. What is documented here is yet another problem (where you can get end before your await in the last data event is done).

Using async/await with util.promisify(fs.readFile)?

I'm trying to learn async/await and your feedback would help a lot.
I'm simply using fs.readFile() as a specific example of functions that has not been modernized with Promises and async/await.
(I'm aware of fs.readFileSync() but I want to learn the concepts.)
Is the pattern below an ok pattern? Are there any issues with it?
const fs = require('fs');
const util = require('util');

// promisify converts fs.readFile to a Promised version
const readFilePr = util.promisify(fs.readFile); // returns a Promise which can then be used in async/await

async function getFileAsync(filename) {
  try {
    const contents = await readFilePr(filename, 'utf-8'); // put the resolved results of readFilePr into contents
    console.log('✔️ ', filename, 'is successfully read: ', contents);
  }
  catch (err) { // if readFilePr returns errors, we catch it here
    console.error('⛔ We could not read', filename);
    console.error('⛔ This is the error: ', err);
  }
}
getFileAsync('abc.txt');
Import from fs/promises instead, like this:
const { readFile } = require('fs/promises')
This version returns the promise you are wanting to use and then you don't need to wrap readFile in a promise manually.
Here are some more ways of using async/await.
EDITED: as #jfriend00 pointed out in the comments, you should of course use the standard NodeJS features for built-in methods like fs.readFile. So I changed the fs method in the code below to something custom, where you can define your own promise.
// Create your async function manually
const asyncFn = data => {
  // Instead of a result, return a promise
  return new Promise((resolve, reject) => {
    // Here we have two methods: resolve and reject.
    // To end the promise with success, use resolve,
    // or reject in the opposite case.
    //
    // Here we do some task that can take time.
    // For example purposes we will emulate it with
    // a setTimeout delay of 3 sec.
    setTimeout(() => {
      // After some processing time we are done
      // and can resolve the promise
      resolve(`Task completed! Result is ${data * data}`);
    }, 3000);
  });
}

// Create a function from which we will
// call our asyncFn in a chained way
const myFunct = () => {
  console.log(`myFunct: started...`);
  // We will call asyncFn with chained methods
  asyncFn(2)
    // chain error handler
    .catch(error => console.log(error))
    // chain result handler
    .then(data => console.log(`myFunct: log from chain call: ${data}`));
  // The chained call will continue execution
  // here without pausing
  console.log(`myFunct: Continue process while chain task still working.`);
}

// Create an ASYNC function to use it
// with await
const myFunct2 = async () => {
  console.log(`myFunct2: started...`);
  // Run asyncFn and wait for the result
  const data = await asyncFn(3);
  // Use your result inline after the promise resolved
  console.log(`myFunct2: log from async call: ${data}`);
  console.log(`myFunct2: continue process after async task completed.`);
}

// Run both
myFunct();
myFunct2();

Actionhero actions returning immediately

I'm trying to understand a core concept of ActionHero async/await and hitting lots of walls. Essentially, in an action, why does this return immediately, rather than 500ms later?
async run (data) {
  setTimeout(() => data.response.outcome = 'success', 500)
}
Clarifying edit: this question was more about async execution flow and promise fulfillment than about the literal use of setTimeout(). It's not really specific to ActionHero, but that's the pattern AH uses and it was my first exposure to the concepts. The answer provided clarifies that some functions have to be wrapped in a promise so they can be await-ed, and that there are multiple ways to do that.
Because you didn't wait for it to finish.
Basically, you need to await the setTimeout:
async run (data) {
  await setTimeout(() => data.response.outcome = 'success', 500)
}
but that doesn't work because setTimeout is not a promise
You can use a simple sleep function that resolves after a time.
async function sleep (time) {
  return new Promise(resolve => setTimeout(resolve, time));
}

async function run (data) {
  await sleep(500);
  data.response.outcome = 'success';
}
Just as setTimeout, a callback API, can be made into a promise, streams can be made into promises. Note that in both the sleep and readFile examples I'm only using the async keyword to make things clear.
async function readFile (file) {
  return new Promise((resolve, reject) => {
    let data = '';
    fs.createReadStream(file)
      .on('data', d => data += d.toString())
      .on('error', reject)
      .on('end', () => resolve(data));
  });
}
For most functions you can skip the manual promisification and use util.promisify
const { readFile } = require('fs');
const { promisify } = require('util');
const readFileAsync = promisify(readFile);
The key part is that the promises should resolve after the work is done, and that you should wait for them using either await or .then.
So, for instance, to make things clearer, the first example
async function run (data) {
  return sleep(500).then(() => data.response.outcome = 'success');
}
and even
function run (data) {
  return sleep(500).then(() => data.response.outcome = 'success');
}
are all the same at runtime.
So, to finish:
async function transform (inputFile, targetWidth, targetHeight) {
  return new Promise((resolve, reject) => {
    let transformer = sharp()
      .resize(targetWidth, targetHeight)
      .crop(sharp.strategy.entropy)
      .on('info', ({ width, height }) => console.log(`Image created with dimensions ${height}x${width}`))
      .on('error', reject)
      .on('end', resolve);
    inputFile.pipe(transformer);
  });
}

FileStream Promise resolving early

I am experiencing a rather weird problem in nodeJS, and I cannot quite figure out why.
Consider the following code:
(async () => {
  console.log("1");
  await new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log("2");
      resolve();
    }, 1000);
  });
  console.log("3");
  process.exit();
})();
This code does exactly what it is supposed to do: it prints 1, 2, 3, in that order. After printing 1, it waits approximately one second. Perfect. Now let's look at the following example:
const fs = require("fs");

(async () => {
  const stream = fs.createWriteStream("file.txt");
  stream.write("Test");
  console.log("1");
  await new Promise((resolve, reject) => {
    stream.on("finish", () => {
      console.log("2");
      resolve();
    });
  });
  console.log("3");
  process.exit();
})();
From my understanding, this code should either complete, or - in case the finish event never gets fired - run infinitely. What happens is the exact opposite: It prints 1, then quits. Shouldn't it at least print another 3 before quitting, since this is the end of the script?
Important: I know that the promise will not resolve, because .end() is not called on the stream. I want to know why the script finishes anyway.
Can anyone explain this behaviour to me?
The best explanation is probably to write this without the async/await keywords and for you to understand that these don't do anything "magical"; they are simply "sugar" for a different way to resolve a Promise as opposed to .then().
const fs = require("mz/fs");

const stream = fs.createWriteStream("file.txt");
stream.write("Test");
console.log("1");
new Promise((resolve, reject) => {
  stream.on("finish", () => {
    console.log("2");
    resolve();
  });
}).then(() => {
  console.log("3");
  process.exit();
});
That's the exact same thing, right? So where's the catch?
The thing you are really missing is that there is "nothing" that says when you open a file handle it "must" be explicitly closed before the program can exit. As such, there is "nothing to wait for" and the program completes, but does not "branch" into the part that is still awaiting the Promise to resolve().
The reason why it only logs "1" is because the remaining branch "is" waiting for the Promise to resolve, but it's just never going to get there before the program finishes.
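A stripped-down illustration of that point, with no streams at all, just a promise with nothing scheduled behind it:

```javascript
const seen = [];

(async () => {
  seen.push('before');
  // Nothing ever resolves this promise, and nothing (timer, socket,
  // open handle) is pending, so the event loop drains and the
  // process simply exits.
  await new Promise(() => {});
  seen.push('after'); // never reached
})();
```

A pending promise by itself does not keep the Node process alive; only active handles (timers, sockets, etc.) do.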
Of course that all changes when you actually call stream.end() immediately after the write or ideally by "awaiting" any write requests that may be pending:
const fs = require("mz/fs");

(async () => {
  const stream = fs.createWriteStream("file.txt");
  await stream.write("Test"); // await here before continuing
  stream.end();
  console.log("1");
  await new Promise((resolve, reject) => {
    stream.on("finish", () => {
      console.log("2");
      //resolve();
    });
  });
  console.log("3");
  //process.exit();
})();
That of course will log each output in the listing, as you should well know.
So if you were expecting to see the "3" in the log, the reason it does not appear is the await on a promise we never resolve because we don't ever close the stream. Again, this is probably best demonstrated by getting rid of the await:
const fs = require("mz/fs");

(async () => {
  const stream = fs.createWriteStream("file.txt");
  await stream.write("Test");
  stream.end();
  console.log("1");
  new Promise((resolve, reject) => { // no await - execution continues past this promise
    stream.on("finish", () => {
      console.log("2");
      //resolve();
    });
  });
  console.log("3");
  //process.exit();
})();
Then you "should" see:
1
3
2
At least on most systems, unless you have "extreme" lag. But generally the "finish" event should fire before the next line is reached after "awaiting" the write.
NOTE: Just using the mz library here to demonstrate an await on the write() method without wrapping a callback. Generally speaking the callback execution should resolve just the same.
