Node pipe to stdout -- how do I tell if drained? - node.js

The standard advice for determining whether you need to wait for the drain event on process.stdout is to check whether process.stdout.write() returns false when you write to it.
How should I make that check when I've piped another stream into it? It would seem that the upstream stream can emit finish before all the output has actually been written. Can I do something like this?
upstreamOfStdout.on('finish', function(){
  if(!process.stdout.write('')) {
    process.stdout.on('drain', function() { done("I'm done"); });
  }
  else {
    done("I'm done");
  }
});
upstreamOfStdout.pipe(process.stdout);
I prefer an answer that doesn't depend on the internals of any streams. Just given that the streams conform to the node stream interface, what is the canonical way to do this?
EDIT:
The larger context is a wrapper:
new Promise(function(resolve, reject){
  stream.on(<some-event>, resolve);
  ... (perhaps something else here?)
});
where stream can be process.stdout or something else, which has another through stream piped into it.
My program exits whenever resolve is called -- I presume the Promise code keeps the program alive until all promises have been resolved.
I have encountered this situation several times and have always used hacks to solve the problem (e.g. there are several private members of process.stdout that are useful). But I would really like to solve this once and for all (or learn that it is a bug, so I can track the issue and fix my hacks when it's resolved, at least): how do I tell when a stream downstream of another is finished processing its input?

Instead of writing directly to process.stdout, create a custom writable (shown below) which writes to stdout as a side effect.
const { Writable } = require('stream');

function writeStdoutAndFinish(){
  return new Writable({
    write(chunk, encoding, callback) {
      process.stdout.write(chunk, callback);
    },
  });
}
The writable returned by writeStdoutAndFinish() will emit a finish event, and only after the callback of each underlying process.stdout.write() call has been invoked.
async function main(){
  ...
  await new Promise((resolve)=>{
    someReadableStream.pipe(writeStdoutAndFinish()).on('finish',()=>{
      console.log('finish received');
      resolve();
    });
  });
  ...
}
In practice, I don't think that the above approach differs in behavior from:
async function main(){
  ...
  await new Promise((resolve)=>{
    someReadableStream.on('end',()=>{
      console.log('end received');
      resolve();
    }).pipe(process.stdout);
  });
  ...
}

First of all, as far as I can see from the documentation, that stream never emits the finish event, so it's unlikely you can rely on that.
Moreover, from the same documentation, the drain event is used to notify you that the stream is ready to accept more data after a .write call has returned false. From that you can deduce that all the previously written data has been handled. The documentation for the write method also makes clear that the false return value ("please stop pushing data") is advisory: you can freely ignore it, but subsequent data will probably be buffered in memory, letting memory usage grow.
Because of that, going by the documentation alone, I believe you can rely on the drain event to know when all the data has been handled or is about to be flushed out.
That said, there does not appear to be a clear way to know definitively when all the data has actually been written to the console.
Finally, you can listen for the end event of the piped stream to know when it has been fully consumed, regardless of whether it has been written to the console or is still buffered within the console stream.
Of course, you can also simply ignore the problem: a fully consumed stream should be handled and discarded by node.js, so you no longer have to deal with it once you have piped it into the second stream.
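For instance, a minimal sketch of the "end plus drain" check described above, assuming a readable stream named source piped into process.stdout (whether the empty write reliably reflects the buffered state is exactly the open question here):

function waitForStdout(source) {
  return new Promise(function (resolve) {
    source.pipe(process.stdout);
    source.on('end', function () {
      // The source is fully consumed; if stdout reports a full buffer,
      // wait for 'drain', otherwise resolve immediately.
      if (process.stdout.write('')) {
        resolve();
      } else {
        process.stdout.once('drain', resolve);
      }
    });
  });
}

Usage would then look something like waitForStdout(upstreamOfStdout).then(() => done("I'm done"));.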

Related

Difference between response.write vs stream.pipe(response) in NodeJS

As I understand it, response.write gives more control over the chunks of data I am writing, while pipe doesn't give any control over the chunks.
I am trying to stream files and I don't need any control over the chunks of data, so is it recommended to go with stream.pipe(response)? Is there any advantage, such as performance, over response.write?
downloadStream = readBucket.openDownloadStream(trackID);
downloadStream.on('data', chunk => {
  console.log('chunk');
  res.write(chunk);
});
downloadStream.on('error', error => {
  console.log('error occurred', error);
  res.sendStatus(500);
});
downloadStream.on('end', () => {
  res.end();
});
For my scenario, both codes do the same. I prefer pipe because of less code. Is there any performance benefits, memory/io efficiency advantages with pipe() over response.write?
downloadStream = readBucket.openDownloadStream(trackID);
downloadStream.pipe(res);
.pipe() is just a ready-made way to send a readstream to a writestream. You can certainly code it manually if you want, but .pipe() handles a number of things for you.
I'd suggest it's kind of like fs.readFile(). If what you want to do is read a whole file into memory, fs.readFile() does the work of opening the file for reading, reading all the data into a buffer, closing the target file and giving you all the data at the end. If there are any errors, it makes sure the file you were reading gets closed.
The same is true of .pipe(). It hooks up to the data, finish and error events for you and handles all of those while streaming the data out to your write stream. Depending on the type of writestream, it also takes care of "finishing" or "closing" both the readstream and the writestream, even if there are errors.
And, .pipe() has backpressure handling, something your code does not. When you call res.write() it returns a boolean. If that boolean is false, the write buffer is full and you should not call res.write() again until the drain event occurs. Note that your code does not do that. So, .pipe() is more complete than what many people would typically write themselves.
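For comparison, here is a rough sketch of the backpressure bookkeeping that .pipe() would otherwise be doing for you, reusing the downloadStream and res names from the question (not a drop-in replacement for .pipe(), just an illustration):

downloadStream.on('data', chunk => {
  // write() returns false when the response buffer is full
  if (!res.write(chunk)) {
    downloadStream.pause();
    res.once('drain', () => downloadStream.resume());
  }
});
downloadStream.on('end', () => res.end());
downloadStream.on('error', err => {
  console.log('error occurred', err);
  res.sendStatus(500);
});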
The only situations I've seen where you're doing a pipe-like operation but can't use .pipe() are when you need very custom behavior under error conditions and want to do something significantly different from the default error handling. For just streaming the data and finishing both input and output streams, terminating both on error, it does exactly what you want, so there's really no reason to code it yourself when the desired behavior is already built in.
For my scenario, both codes do the same. I prefer pipe because of less code.
Same here.
Is there any performance benefits, memory/io efficiency advantages with pipe() over response.write?
Yes, sort of. It probably has fewer bugs than the code you write yourself (like forgetting the backpressure handling in your example, which might only show up under some circumstances: large data, a slow connection).
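If you ever do need the error handling yourself, note that newer Node versions also provide stream.pipeline(), which wires up the pipe and destroys both streams when either one errors; a minimal sketch:

const { pipeline } = require('stream');

// pipeline() pipes downloadStream into res and cleans both up on error.
pipeline(downloadStream, res, err => {
  if (err) {
    console.log('stream failed', err);
  }
});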

Does write() (without callback) preserve order in node.js write streams?

I have a node.js program in which I use a stream to write information to a SFTP server. Something like this (simplified version):
var conn = new SSHClient();
process.nextTick(function (){
  conn.on('ready', function () {
    conn.sftp(function (error, sftp) {
      var writeStream = sftp.createWriteStream(filename);
      ...
      writeStream.write(line1);
      writeStream.write(line2);
      writeStream.write(line3);
      ...
    });
  }).connect(...);
});
Note I'm not using the (optional) callback argument (described in the write() API specification) and I'm not sure whether this may cause undesired behaviour (i.e. lines not written in the expected order: line1, line2, line3). In other words, I don't know whether this alternative (more complex code, and perhaps less efficient) should be used instead:
writeStream.write(line1, ..., function() {
  writeStream.write(line2, ..., function() {
    writeStream.write(line3);
  });
});
(or equivalent alternative using async series())
Empirically, in my tests I have always gotten the file written in the desired order (I mean, first line1, then line2 and finally line3). However, I don't know whether this has happened just by chance or whether the above is the right way to use write().
I understand that writing to a stream is in general asynchronous (as all I/O work should be), but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Examples of usage of write() in real programs are very welcomed. Thanks!
Does write() (without callback) preserve order in node.js write streams?
Yes it does. It preserves order of your writes to that specific stream. All data you're writing goes through the stream buffer which serializes it.
but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Yes, all data does go through a stream buffer. The .write() operation does not return until the data has been successfully copied into the buffer unless an error occurs.
Note, that if you are writing any significant amount of data, you may have to pay attention to flow control (often called back pressure) on the stream. It can back up and may tell you that you need to wait before writing more, but it does buffer your writes in the order you send them.
If the .write() operation returns false, then the stream is telling you that you need to wait for the drain event before writing any more. You can read about this issue in the node.js docs for .write() and in this article about backpressure.
Your code also needs to listen for the error event to detect any errors upon writing the stream. Because the writes are asynchronous, they may occur at some later time and are not necessarily reflected in either the return value from .write() or in the err parameter to the .write() callback. You have to listen for the error event to make sure you see errors on the stream.
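Putting those pieces together, a sketch of order-preserving writes that still respect backpressure and surface errors, reusing the writeStream from the question and assuming the lines are passed in as an array:

writeStream.on('error', function (err) {
  console.error('write stream error:', err);
});

async function writeLines(writeStream, lines) {
  for (const line of lines) {
    // write() copies each chunk into the stream's buffer in call order,
    // so ordering is preserved; the return value only signals backpressure.
    if (!writeStream.write(line)) {
      await new Promise(resolve => writeStream.once('drain', resolve));
    }
  }
}

Usage would be something like await writeLines(writeStream, [line1, line2, line3]);.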

Close readable stream to FIFO in NodeJS

I am creating a readable stream to a linux fifo in nodejs like this:
var stream = FS.createReadStream('fifo');
This all works well and I can receive the data from the fifo just fine.
My problem is that I want to have a method to shut my software down gently and therefore I need to close this stream somehow.
Calling
process.exit();
has no effect, as the stream is blocking.
I also tried destroying the stream manually by calling the undocumented methods stream.close() and stream.destroy(), as described in the answers to this question.
I know that I could kill my own process using process.kill(process.pid, 'SIGKILL') but this feels like a really bad hack and could have bad impacts on the filesystem or database.
Isn't there a better way to achieve this?
You can try this minimal example to reproduce my problem:
var FS = require('fs');

console.log("Creating readable stream on fifo ...");
var stream = FS.createReadStream('fifo');
stream.once('close', function() {
  console.log("The close event was emitted.");
});
stream.close();
stream.destroy();
process.exit();
Run it after creating a fifo called 'fifo' using mkfifo fifo.
How could I modify the above code to shutdown the software correctly?
Explicitly writing to the named pipe will unblock the read operation, for example:
require('child_process').execSync('echo "" > fifo');
process.exit();
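One way this might be wrapped into a gentle shutdown routine (a sketch: same 'fifo' path as in the question, with SIGINT chosen here purely as an example trigger):

var FS = require('fs');

var stream = FS.createReadStream('fifo');
stream.on('data', function (chunk) {
  // ... handle incoming data ...
});

function shutdown() {
  // An explicit write unblocks the pending read on the fifo, after which
  // the stream can be torn down and the process can exit cleanly.
  require('child_process').execSync('echo "" > fifo');
  stream.destroy();
  process.exit();
}

process.on('SIGINT', shutdown);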

Using callbacks with Socket IO

I'm using node and socket io to stream a twitter feed to the browser, but the stream is too fast. In order to slow it down, I'm attempting to use setInterval, but it either only delays the start of the stream (without spacing out the tweets evenly) or complains that I can't use callbacks when broadcasting. Server side code below:
function start(){
  stream.on('tweet', function(tweet){
    if(tweet.coordinates && tweet.coordinates != null){
      io.sockets.emit('stream', tweet);
    }
  });
}

io.sockets.on("connection", function(socket){
  console.log('connected');
  setInterval(start, 4000);
});
I think you're misunderstanding how .on() works for streams. It's an event handler: once it is installed, it stays there and the stream can call it at any time. Your interval is actually making things worse because it keeps installing additional .on() handlers.
It's unclear what you mean by "data coming too fast". Too fast for what? If it's just faster than you want to display it, then you can just store the tweets in an array and then use timers to decide when to display things from the array.
If data from a stream is coming too quickly to even store and this is a flowing nodejs stream, then you can pause the stream with the .pause() method and then, when you're able to go again, you can call .resume(). See http://nodejs.org/api/stream.html#stream_readable_pause for more info.
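A minimal sketch of the "store the tweets and drain them on a timer" idea, keeping the stream and io names from the question:

var queue = [];

stream.on('tweet', function (tweet) {
  if (tweet.coordinates) {
    queue.push(tweet);            // just buffer it; don't emit yet
  }
});

io.sockets.on('connection', function (socket) {
  console.log('connected');
});

// Emit at most one tweet every 4 seconds, no matter how fast they arrive.
setInterval(function () {
  var next = queue.shift();
  if (next) {
    io.sockets.emit('stream', next);
  }
}, 4000);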

Why don't Node.js ReadableStreams support reading from asynchronous sources?

I'm implementing a Readable Stream. In my _read() implementation, the source of the stream is a web service which requires asynchronous calls. Why doesn't _read() provide a done callback function that can be called when my asynchronous call returns?
The Transform stream and the Writable stream both support this. Why doesn't Readable? Am I just using Readable streams improperly?
MyReadStream.prototype._read = function() {
  var self = this;
  doSomethingAsync('foo', function(err, result) {
    if (result) {
      self.push(result);
    } else {
      self.push(null);
    }
    // why is there no done() available to call, like in _write()?
    // done();
  });
};
In my actual implementation, I don't want to call doSomethingAsync again until a previous call has returned. Without a done callback for me to use, I have to implement my own throttle mechanism.
_read() is a notification that the amount of buffered data is below the highWaterMark, so more data can be pulled from upstream.
_write() has a callback because it has to know when you're done processing the chunk. If you don't execute the callback for a long time, the highWaterMark may be reached and data should stop flowing in. When you execute the callback, the internal buffer can start to drain again, allowing more writes to continue.
So _read() doesn't need a callback because it's advisory and you're free to ignore it: it's just telling you the stream is able to buffer more data internally, whereas the callback in _write() is critical because it controls backpressure and buffering. If you need to throttle your web API calls, you might look at what the async module has to offer, especially async.memoize and/or async.queue.
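As an alternative to pulling in the async module, a sketch of a simple flag-based guard inside _read() that prevents overlapping calls, using the doSomethingAsync() helper from the question:

MyReadStream.prototype._read = function () {
  if (this._fetching) {
    return;                       // a previous fetch is still in flight
  }
  this._fetching = true;

  var self = this;
  doSomethingAsync('foo', function (err, result) {
    self._fetching = false;
    if (err) {
      self.emit('error', err);
    } else {
      // push(null) ends the stream; otherwise _read() is called again
      // automatically whenever the internal buffer has room.
      self.push(result || null);
    }
  });
};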
