Nodejs `fs.createReadStream` as promise

I'm trying to get fs.createReadStream working as a promise, so that after the entire file has been read, the promise resolves.
In the case below, I'm pausing the stream, executing the awaitable method, and resuming.
1. How can I make .on('end', ...) execute at the actual end?
2. If 1. is not possible, why is .on('close', ...) never fired? Maybe I can use it to resolve the promise.
function parseFile<T>(filePath: string, row: (x: T) => void, err: (x) => void, end: (x) => void) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(filePath);
    stream
      .on('data', async data => {
        try {
          stream.pause();
          await row(data);
        } finally {
          stream.resume();
        }
      })
      .on('end', (rowCount: number) => {
        resolve(); // NOT REALLY THE END: row(data) is still being called after this
      })
      .on('close', () => {
        resolve(); // NEVER BEING CALLED
      })
      .on('error', (rowCount: number) => {
        reject(); // NEVER GETS HERE, AS EXPECTED
      });
  });
}
UPDATE
Here you can actually test it: https://stackblitz.com/edit/node-czktjh?file=index.js
run node index.js
The output should be 1000 and not 1
Thanks

Something to be aware of: you've removed the line processing from the current version of the question, so the stream is being read in large chunks. It appears to be reading the entire file in just two chunks, thus just two data events, so the expected count here is 2, not 1000.
I think the problem with this code occurs because stream.pause() does not pause the generation of the end event - it only pauses future data events. If the last data event has fired and you then await inside the processing of that data event (which causes your data event handler to immediately return a promise), the stream will think it's done and the end event will fire before you're done awaiting the function inside the processing of that last data event. Remember, the data event handler is NOT promise-aware, and it appears that stream.pause() only affects data events, not the end event.
A work-around is a flag that keeps track of whether you're still processing a data event and postpones processing the end event until you're done with that last data event. The code below illustrates how to use the flag.
FYI, the missing close event is another stream weirdness. Your nodejs program actually terminates before the close event gets to fire. If you put this at the start of your program:
setTimeout(() => { console.log('done with timer');}, 5000);
Then, you will see the close event because the timer will prevent your nodejs program from exiting before the close event gets to fire. I'm not suggesting this as a solution to any problem, just to illustrate that the close event is still there and wants to fire if your program doesn't exit before it gets a chance.
Here's code that demonstrates the use of flags to work around the pause issue. When you run it, you will only see 2 data events, not 1000, because this code is not reading lines; it's reading much larger chunks than that. So the expected result here is not 1000.
// run `node index.js` in the terminal
const fs = require('fs');

const parseFile = row => {
  let paused = false; // initially false: nothing is being processed yet
  let ended = false;
  let dataCntr = 0;
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream('./generated.data.csv');
    stream
      .on('data', async data => {
        ++dataCntr;
        try {
          stream.pause();
          paused = true;
          await row(data);
        } finally {
          paused = false;
          stream.resume();
          if (ended) {
            console.log(`received ${dataCntr} data events`);
            resolve();
          }
        }
      })
      .on('end', rowCount => {
        ended = true;
        if (!paused) {
          console.log(`received ${dataCntr} data events`);
          resolve();
        }
      })
      .on('close', () => {
        //resolve();
      })
      .on('error', rowCount => {
        reject();
      });
  });
};

(async () => {
  let count = 0;
  await parseFile(async row => {
    await new Promise(resolve => setTimeout(resolve, 50)); // sleep
    count++;
  });
  console.log(`lines executed: ${count}, the expected is more than 1`);
})();
FYI, I still think your original version of the question had the problem I mentioned in my first comment - that you weren't pausing the right stream. What is documented here is yet another problem (where you can get end before your await in the last data event is done).
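For what it's worth, on Node 10+ readable streams are async iterable, and a for await...of loop sidesteps this whole problem: the next chunk isn't read until the loop body (including any await) has finished, so there is no end event to race against. A minimal sketch:
const fs = require('fs');

async function parseFile(filePath, row) {
  const stream = fs.createReadStream(filePath);
  // The loop body is fully awaited before the next chunk is read,
  // so processing can never overlap the end of the stream.
  for await (const chunk of stream) {
    await row(chunk);
  }
}

parseFile('./generated.data.csv', async chunk => {
  await new Promise(resolve => setTimeout(resolve, 50)); // simulate async work
}).then(() => console.log('done'));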

Related

Nodejs TCP server process packets in order

I am using a TCP server to receive and process packets in Node.js. It should receive 2 packets:
"create" for creating an object in a database. It first checks if the object already exists and then creates it. (-> takes some time to process)
"update" for updating the newly created object in the database
For the sake of simplicity, we'll just assume the first step always takes longer than the second. (which is always true in my original code)
This is an MWE (minimal working example):
const net = require("net");

const server = net.createServer((conn) => {
  conn.on('data', async (data) => {
    console.log(`Instruction ${data} received`);
    await sleep(1000);
    console.log(`Instruction ${data} done`);
  });
});
server.listen(1234);

const client = net.createConnection(1234, 'localhost', async () => {
  client.write("create");
  await sleep(10); // just a cheap workaround to "force" sending 2 packets instead of one
  client.write("update");
});

// Just to make it easier to read
function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
If I run this code I get:
Instruction create received
Instruction update received
Instruction create done
Instruction update done
But I want the "create" instruction to block conn.on('data', ...) until its callback has finished asynchronously. The current code tries to update an entry before it is created in the database, which is not ideal.
Is there an (elegant) way to achieve this? I suspect I need some kind of buffer which stores the data and a worker loop of some kind which processes it. But how do I avoid an infinite loop that blocks the event loop? (Event loop is the correct term, isn't it?)
Note: I have a lot more logic to handle fragmentation, etc., but this explains the issue I'm having.
I managed to get it to work with the package async-fifo-queue.
It's not the cleanest solution, but it does what I want and is as efficient as possible (using async/await instead of just looping infinitely).
Code:
const net = require("net");
const afq = require("async-fifo-queue");

const q = new afq.Queue();

const server = net.createServer((conn) => {
  conn.on('data', q.put.bind(q));
});
server.listen(1234);

const client = net.createConnection(1234, 'localhost', async () => {
  client.write("create");
  await sleep(10);
  client.write("update");
});

(async () => {
  while (server.listening) {
    const data = await q.get();
    console.log(`Instruction ${data} received`);
    await sleep(1000);
    console.log(`Instruction ${data} done`);
  }
})();

function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
You can pause the socket when you get the "create" event. After it finishes, you can resume the socket. Example:
const server = net.createServer((conn) => {
  conn.on('data', async (data) => {
    // note: data is a Buffer, so compare its string form
    if (data.toString() === 'create') {
      conn.pause();
    }
    console.log(`Instruction ${data} received`);
    await sleep(1000);
    console.log(`Instruction ${data} done`);
    if (data.toString() === 'create') {
      conn.resume();
    }
  });
});
server.listen(1234);

const client = net.createConnection(1234, 'localhost', async () => {
  client.write("create");
  await sleep(10); // just a cheap workaround to "force" sending 2 packets instead of one
  client.write("update");
});

// Just to make it easier to read
function sleep(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
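A third option (my own sketch, not from the answers above): serialize the handlers by chaining each data event onto a shared promise, so each instruction runs only after the previous one has settled:
const net = require("net");

const server = net.createServer((conn) => {
  // Each incoming packet is appended to the chain, so handlers
  // run strictly one after another, in arrival order.
  let chain = Promise.resolve();
  conn.on('data', (data) => {
    chain = chain.then(async () => {
      console.log(`Instruction ${data} received`);
      await sleep(1000);
      console.log(`Instruction ${data} done`);
    });
  });
});
server.listen(1234);

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}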

NodeJS: Reliably act upon a closed stream

I'm looking for a way to reliably know for sure whether all the data in the stream has been processed. An asynchronous data listener might be called after the end event, in which case I cannot use the end event to, for instance, close a database connection when the data event is still executing database queries.
Example:
const fs = require('fs');

const stream = fs.createReadStream('./big.file', { encoding: 'utf8' });

stream
  .on('data', () => {
    stream.pause();
    setTimeout(() => {
      console.log('data');
      stream.resume();
    }, 10);
  })
  .on('close', function() {
    console.log('end');
  });
This will log "data" a lot of times, then "end", and then "data" one more time.
So in a real-world example, if "data" is doing queries, and "end" would close the connection, the last query would throw an error because the database connection was closed prematurely.
I've noticed a closed property on the stream, and of course there is the isPaused() method, and I can use those to kind of fix my problem:
stream
  .on('data', () => {
    stream.pause();
    databaseQuery().then(result => {
      stream.resume();
      if (stream.closed) {
        closeConnection();
      }
    });
  })
  .on('close', function() {
    if (!stream.isPaused()) {
      closeConnection();
    }
  });
I'm unsure however if this is the best way to go about this.
Can I be sure the connection will be closed at all?
Edit: I'm seeing similar results for the "end" event, it doesn't matter whether I use "end" or "close", the test logs are identical.
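One alternative that seems to sidestep the ordering problem entirely (a sketch, assuming Node 10+ where readable streams are async iterable, and the same hypothetical databaseQuery/closeConnection helpers as above): consume the stream with for await...of, so the code after the loop runs only once every chunk has been fully processed:
const fs = require('fs');

async function run() {
  const stream = fs.createReadStream('./big.file', { encoding: 'utf8' });
  try {
    for await (const chunk of stream) {
      await databaseQuery(chunk); // fully awaited before the next chunk arrives
    }
  } finally {
    closeConnection(); // reached only after all data has been processed
  }
}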

async await with setInterval

function first(){
  console.log('first')
}
function second(){
  console.log('second')
}
let interval = async ()=>{
  await setInterval(first,2000)
  await setInterval(second,2000)
}
interval();
interval();
Imagine that I have this code above.
When I run it, first() and second() are called at the same time. How do I call second() only after first() returns some data, i.e. if first() is done, only then call second()?
Because first() in my code works with a large amount of data, and if these 2 functions run at the same time, it will be hard on the server.
How do I call second() each time first() returns some data?
As mentioned above, setInterval does not play well with promises if you do not stop it. If you clear the interval, you can use it like:
async function waitUntil(condition) {
  return await new Promise(resolve => {
    const interval = setInterval(() => {
      if (condition()) { // condition should be a function so it is re-checked on every tick
        resolve('foo');
        clearInterval(interval);
      }
    }, 1000);
  });
}
Later you can use it like
const bar = await waitUntil(() => someConditionHere);
You have a few problems:
Promises may only ever resolve once; setInterval() is meant to call its callback multiple times, and Promises do not support this case well.
Neither setInterval() nor the more appropriate setTimeout() returns a Promise; therefore, awaiting them is pointless in this context.
You're looking for a function that returns a Promise which resolves after some time (using setTimeout(), probably, not setInterval()).
Luckily, creating such a function is rather trivial:
async function delay(ms) {
  // return await for better async stack trace support in case of errors.
  return await new Promise(resolve => setTimeout(resolve, ms));
}
With this new delay function, you can implement your desired flow:
function first(){
  console.log('first')
}
function second(){
  console.log('second')
}
let run = async ()=>{
  await delay(2000);
  first();
  await delay(2000);
  second();
}
run();
setInterval doesn't play well with promises because it triggers a callback multiple times, while a promise resolves once.
It seems that it's setTimeout that fits the case. It should be promisified in order to be used with async..await:
async () => {
  await new Promise(resolve => setTimeout(() => resolve(first()), 2000));
  await new Promise(resolve => setTimeout(() => resolve(second()), 2000));
}
An await expression causes the async function to pause until the Promise is settled, so you can get the promise's result directly instead of chaining .then().
For me, the goal was to initiate an HTTP request every 1s:
let intervalid;

async function testFunction() {
  intervalid = setInterval(() => {
    // I use axios like: axios.get('/user?ID=12345').then
    new Promise(function(resolve, reject) {
      resolve('something');
    }).then(res => {
      if (condition) {
        // do something
      } else {
        clearInterval(intervalid);
      }
    });
  }, 1000);
}

// you can use this function like
testFunction();

// or stop the setInterval in any place by
clearInterval(intervalid);
You could use an IIFE. This way you can escape the issue of myInterval not accepting Promise as a return type.
There are cases where you need setInterval, because you want to call some function an unknown number of times, with some interval in between.
When I faced this problem, this turned out to be the most straightforward solution for me. I hope it helps someone :)
For me, the use case was that I wanted to send logs to CloudWatch but try not to hit the throttling exception for sending more than 5 logs per second. So I needed to hold my logs and send them as a batch in an interval of 1 second. The solution I'm posting here is what I ended up using.
async function myAsyncFunc(): Promise<string> {
  return new Promise<string>((resolve) => {
    resolve("hello world");
  });
}

function myInterval(): void {
  setInterval(() => {
    void (async () => {
      await myAsyncFunc();
    })();
  }, 5_000);
}

// then call like so
myInterval();
I looked through all the answers but still didn't find one that works exactly the way the OP asked. This is what I used for the same purpose:
async function waitInterval(callback, ms) {
  return new Promise(resolve => {
    let iteration = 0;
    const interval = setInterval(async () => {
      if (await callback(iteration, interval)) {
        resolve();
        clearInterval(interval);
      }
      iteration++;
    }, ms);
  });
}

function first(i) {
  console.log(`first: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function second(i) {
  console.log(`second: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

(async () => {
  console.log('start');
  await waitInterval(first, 1000);
  await waitInterval(second, 1000);
  console.log('finish');
})()
In my example, I also pass the interval iteration count and the timer itself to the callback, in case the caller needs to do something with them. However, it's not necessary.
In my case, I needed to iterate through a list of images, pausing in between each, and then a longer pause at the end before re-looping through.
I accomplished this by combining several techniques from above, calling my function recursively and awaiting a timeout.
If at any point another trigger changes my animationPaused:boolean, my recursive function will exit.
// assumes a small promise-based timeout helper like this one:
const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));

const loopThroughImages = async () => {
  for (let i = 0; i < numberOfImages; i++) {
    if (animationPaused) {
      return;
    }
    this.updateImage(i);
    await timeout(700);
  }
  await timeout(1000);
  loopThroughImages();
}
loopThroughImages();
Async/await does not make promises synchronous.
To my knowledge, it's just a different syntax for return Promise and .then().
Here I rewrote the async function and left both versions, so you can see what it really does and compare.
It's in fact a cascade of Promises.
// by the way, no need for async here: the callback does not return a promise, so no need for await.
function waitInterval(callback, ms) {
  return new Promise(resolve => {
    let iteration = 0;
    const interval = setInterval(() => {
      if (callback(iteration, interval)) {
        resolve();
        clearInterval(interval);
      }
      iteration++;
    }, ms);
  });
}
function first(i) {
  console.log(`first: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function second(i) {
  console.log(`second: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}
// async function with async/await, this code ...
(async () => {
  console.log('start');
  await waitInterval(first, 1000);
  await waitInterval(second, 1000);
  console.log('finish');
})() //... returns a pending Promise and ...

console.log('i do not wait');

// ... is kinda identical to this code.
// still asynchronous but return Promise statements with then cascade.
(() => {
  console.log('start again');
  return waitInterval(first, 1000).then(() => {
    return waitInterval(second, 1000).then(() => {
      console.log('finish again');
    });
  });
})(); // returns a pending Promise...

console.log('i do not wait either');
You can see that the two async functions both execute at the same time.
So using promises around intervals here is not very useful; it's still just intervals, and the promises change nothing and make things confusing...
As the code is calling callbacks repeatedly inside an interval, this is, I think, a cleaner way:
function first(i) {
  console.log(`first: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function second(i) {
  console.log(`second: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function executeThroughTime(...callbacks) {
  console.log('start');
  let callbackIndex = 0; // to track current callback.
  let timerIndex = 0; // index given to callbacks
  let interval = setInterval(() => {
    if (callbacks[callbackIndex](timerIndex++)) { // callback returns true when it finishes.
      timerIndex = 0; // resets for next callback
      if (++callbackIndex >= callbacks.length) { // if no next callback, finish.
        clearInterval(interval);
        console.log('finish');
      }
    }
  }, 1000);
}

executeThroughTime(first, second);
console.log('and i still do not wait ;)');
Also, this solution executes a callback every second. If the callbacks are async requests that take more than one second to resolve, and I can't afford for them to overlap, then, instead of making iterative calls on a fixed interval, I would let each request's resolution trigger the next request (through a timer, if I don't want to harass the server).
Here the "recursive" task is called lTask and does pretty much the same as before, except that, since I no longer have an interval, I need a new timer each iteration.
// slow internet request simulation. with a Promise, could be a callback.
function simulateAsync1(i) {
  console.log(`first pending: ${i}`);
  return new Promise((resolve) => {
    setTimeout(() => resolve('got that first big data'), Math.floor(Math.random() * 1000) + 1000); // simulate request that lasts between 1 and 2 sec.
  }).then((result) => {
    console.log(`first solved: ${i} ->`, result);
    return i == 2;
  });
}

// slow internet request simulation. with a Promise, could be a callback.
function simulateAsync2(i) {
  console.log(`second pending: ${i}`);
  return new Promise((resolve) => {
    setTimeout(() => resolve('got that second big data'), Math.floor(Math.random() * 1000) + 1000); // simulate request that lasts between 1 and 2 sec.
  }).then((result) => { // promise is resolved
    console.log(`second solved: ${i} ->`, result);
    return i == 4; // becomes the resolved value of the returned promise
  });
}

function executeThroughTime(...asyncCallbacks) {
  console.log('start');
  let callbackIndex = 0;
  let timerIndex = 0;
  let lPreviousTime = Date.now();
  let lTask = () => { // timeout callback.
    asyncCallbacks[callbackIndex](timerIndex++).then((result) => { // the setTimeout for the next task is set when the promise is solved.
      console.log('result', result);
      if (result) { // current callback is done.
        timerIndex = 0;
        if (++callbackIndex >= asyncCallbacks.length) { // are all callbacks done?
          console.log('finish');
          return; // it's over
        }
      }
      console.log('time elapsed since previous call', Date.now() - lPreviousTime);
      lPreviousTime = Date.now();
      //console.log('"wait" 1 sec (but not realy)');
      setTimeout(lTask, 1000); // redo task after 1 sec.
      //console.log('i do not wait');
    });
  };
  lTask(); // no need to set a timer for first call.
}

executeThroughTime(simulateAsync1, simulateAsync2);
console.log('i do not wait');
The next step would be to drain a FIFO with the interval and fill it with web request promises...
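A rough sketch of that idea (my own illustration; the queue, the one-second drain rate, and fetchData are all assumptions, not part of the code above):
const queue = []; // FIFO of pending request URLs

// hypothetical producer: push work into the queue as it arrives
function enqueue(url) {
  queue.push(url);
}

let busy = false;
const drain = setInterval(async () => {
  if (busy || queue.length === 0) return; // skip ticks while a request is in flight
  busy = true;
  try {
    const result = await fetchData(queue.shift()); // fetchData stands in for a real request
    console.log('done:', result);
  } catch (err) {
    console.error('failed:', err);
  } finally {
    busy = false;
  }
}, 1000);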

How to asynchronously createReadStream in node.js with async/await

I am having difficulty with using fs.createReadStream to process my CSV file asynchronously:
async function processData(row) {
  // perform some asynchronous function
  await someAsynchronousFunction();
}

fs.createReadStream('/file')
  .pipe(parse({
    delimiter: ',',
    columns: true
  }))
  .on('data', async (row) => {
    await processData(row);
  })
  .on('end', () => {
    console.log('done processing!');
  });
I want to perform some asynchronous function after reading each record one by one before the createReadStream reaches on('end').
However, the on('end') gets hit before all of my data finishes processing. Does anyone know what I might be doing wrong?
Thanks in advance!
.on('data', ...) does not wait for your await. Remember, an async function returns a promise immediately, and .on() is not paying any attention to that promise, so it just keeps merrily going on.
The await only waits inside the function; it does not stop your function from returning immediately, and thus the stream thinks you've processed the data and keeps sending more data and generating more data events.
There are several possible approaches here, but the simplest might be to pause the stream until processData() is done and then restart the stream.
Also, does processData() return a promise that is linked to the completion of the async operation? That is also required for await to be able to do its job.
The readable stream doc contains an example of pausing the stream during a data event and then resuming it after some asynchronous operation finishes. Here's their example:
const readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
  console.log(`Received ${chunk.length} bytes of data.`);
  readable.pause();
  console.log('There will be no additional data for 1 second.');
  setTimeout(() => {
    console.log('Now data will start flowing again.');
    readable.resume();
  }, 1000);
});
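Applied to the code in the question, that pattern might look like this (a sketch along the same lines; error handling omitted). Note the caveat discussed in the first question above: 'end' can still fire while the last row is being awaited, so the flag technique shown there may be needed on top of this:
const stream = fs.createReadStream('/file')
  .pipe(parse({ delimiter: ',', columns: true }));

stream
  .on('data', async (row) => {
    stream.pause(); // stop further data events while we process this row
    try {
      await processData(row);
    } finally {
      stream.resume(); // let the next row flow
    }
  })
  .on('end', () => {
    console.log('done processing!');
  });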
I ran into the same problem recently. I fixed it by using an array of promises, and waiting for all of them to resolve when .on("end") was triggered.
import fs from "fs";
import parse from "csv-parse";

export const parseCsv = () =>
  new Promise((resolve, reject) => {
    const promises = [];
    fs.createReadStream('/file')
      .pipe(parse({ delimiter: ',', columns: true }))
      .on("data", row => promises.push(processData(row)))
      .on("error", reject)
      .on("end", async () => {
        await Promise.all(promises);
        resolve();
      });
  });

FileStream Promise resolving early

I am experiencing a rather weird problem in nodeJS, and I cannot quite figure out why.
Consider the following code:
(async () => {
  console.log("1");
  await new Promise((resolve, reject) => {
    setTimeout(() => {
      console.log("2");
      resolve();
    }, 1000);
  });
  console.log("3");
  process.exit();
})();
This code does exactly what it is supposed to do. It prints 1, 2, 3, in that order. After printing 1, it waits approximately one second. Perfect. Now let's see the following example:
const fs = require("fs");

(async () => {
  const stream = fs.createWriteStream("file.txt");
  stream.write("Test");
  console.log("1");
  await new Promise((resolve, reject) => {
    stream.on("finish", () => {
      console.log("2");
      resolve();
    });
  });
  console.log("3");
  process.exit();
})();
From my understanding, this code should either complete, or - in case the finish event never gets fired - run infinitely. What happens is the exact opposite: It prints 1, then quits. Shouldn't it at least print another 3 before quitting, since this is the end of the script?
Important: I know that the promise will not resolve, because .end() is not called on the stream. I want to know why the script finishes anyway.
Can anyone explain this behaviour to me?
The best explanation is probably to rewrite this without the async/await keywords, so you can understand that these don't do anything "magical" and are simply "sugar" for a different way of resolving a Promise as opposed to .then().
const fs = require("mz/fs");

const stream = fs.createWriteStream("file.txt");
stream.write("Test");
console.log("1");

new Promise((resolve, reject) => {
  stream.on("finish", () => {
    console.log("2");
    resolve();
  });
}).then(() => {
  console.log("3");
  process.exit();
});
That's the exact same thing, right? So where's the catch?
The thing you are really missing is that there is "nothing" that says a file handle "must" be explicitly closed before the program can exit. As such, there is "nothing to wait for" and the program completes without ever "branching" into the part that is still awaiting the Promise to resolve().
The reason it only logs "1" is that the remaining branch "is" waiting for the Promise to resolve, but it's just never going to get there before the program finishes.
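You can see the same effect with no streams at all (a minimal sketch): a pending promise by itself does not keep the Node.js event loop alive, so if nothing else is scheduled, the process simply exits.
(async () => {
  console.log("before");
  // This promise never resolves and schedules no timers or I/O,
  // so nothing is keeping the event loop alive.
  await new Promise(() => {});
  console.log("after"); // never printed; the process exits after "before"
})();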
Of course that all changes when you actually call stream.end() immediately after the write or ideally by "awaiting" any write requests that may be pending:
const fs = require("mz/fs");

(async () => {
  const stream = fs.createWriteStream("file.txt");
  await stream.write("Test"); // await here before continuing
  stream.end();
  console.log("1");
  await new Promise((resolve, reject) => {
    stream.on("finish", () => {
      console.log("2");
      resolve();
    });
  });
  console.log("3");
  //process.exit();
})();
That of course will log each output in the listing, as you should well know.
So if you were expecting to see the "3" in the log, the reason it does not appear is the await on a Promise that never resolves, because we never close the stream. Again, probably best demonstrated by getting rid of the await:
const fs = require("mz/fs");

(async () => {
  const stream = fs.createWriteStream("file.txt");
  await stream.write("Test");
  stream.end();
  console.log("1");
  new Promise((resolve, reject) => { // remove await - execution continues immediately
    stream.on("finish", () => {
      console.log("2");
      //resolve();
    });
  });
  console.log("3");
  //process.exit();
})();
Then you "should" see:
1
3
2
At least on most systems, unless you have an "extreme" lag. But generally the "finish" should fire before the next line is reached after "awaiting" the write.
NOTE: Just using the mz library here for demonstration of an await on the write() method without wrapping a callback. Generally speaking, the callback execution should resolve just the same.
