Does write() (without callback) preserve order in node.js write streams? - node.js

I have a node.js program in which I use a stream to write information to a SFTP server. Something like this (simplified version):
var conn = new SSHClient();
process.nextTick(function () {
  conn.on('ready', function () {
    conn.sftp(function (error, sftp) {
      var writeStream = sftp.createWriteStream(filename);
      ...
      writeStream.write(line1);
      writeStream.write(line2);
      writeStream.write(line3);
      ...
    });
  }).connect(...);
});
Note that I'm not using the (optional) callback argument (described in the write() API specification) and I'm not sure if this may cause undesired behaviour (i.e. lines not written in the following order: line1, line2, line3). In other words, I don't know if this alternative (more complex and possibly less efficient) should be used instead:
writeStream.write(line1, ..., function() {
  writeStream.write(line2, ..., function() {
    writeStream.write(line3);
  });
});
(or an equivalent alternative using async.series())
Empirically, in my tests I have always gotten the file written in the desired order (that is, first line1, then line2 and finally line3). However, I don't know if this has happened just by chance or if the above is the right way of using write().
I understand that writing to a stream is in general asynchronous (as all I/O work should be), but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Examples of usage of write() in real programs are very welcome. Thanks!

Does write() (without callback) preserve order in node.js write streams?
Yes, it does. It preserves the order of your writes to that specific stream. All data you're writing goes through the stream buffer, which serializes it.
but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Yes, all data does go through a stream buffer. The .write() operation does not return until the data has been successfully copied into the buffer, unless an error occurs.
Note that if you are writing any significant amount of data, you may have to pay attention to flow control (often called backpressure) on the stream. It can back up and may tell you that you need to wait before writing more, but it does buffer your writes in the order you send them.
If the .write() operation returns false, then the stream is telling you that you need to wait for the drain event before writing any more. You can read about this issue in the node.js docs for .write() and in this article about backpressure.
Your code also needs to listen for the error event to detect any errors upon writing the stream. Because the writes are asynchronous, they may occur at some later time and are not necessarily reflected in either the return value from .write() or in the err parameter to the .write() callback. You have to listen for the error event to make sure you see errors on the stream.
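As a rough sketch (assuming writeStream is the SFTP write stream from the question and lines is a hypothetical array of strings to send), honoring the return value of write() and listening for errors might look like this:
writeStream.on('error', function (err) {
  // Errors are reported asynchronously on the stream itself.
  console.error('stream error:', err);
});

function writeAll(stream, lines, done) {
  var i = 0;
  (function writeNext() {
    while (i < lines.length) {
      var ok = stream.write(lines[i++]);
      if (!ok) {
        // Internal buffer is full: wait for 'drain' before writing more.
        stream.once('drain', writeNext);
        return;
      }
    }
    done();
  })();
}

writeAll(writeStream, [line1, line2, line3], function () {
  writeStream.end();
});
The writes still land in order either way; the drain handling only matters for memory usage when writing a lot of data.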

Related

Difference between response.write vs stream.pipe(response) in NodeJS

As I understand it, response.write gives more control over the chunks of data I am writing, while pipe doesn't give any control over the chunks.
I am trying to stream files and I don't need any control over the chunks of data, so is it recommended to go with stream.pipe(response)? Is there any advantage, such as performance, over response.write?
downloadStream = readBucket.openDownloadStream(trackID);
downloadStream.on('data', chunk => {
  console.log('chunk');
  res.write(chunk);
});
downloadStream.on('error', error => {
  console.log('error occurred', error);
  res.sendStatus(500);
});
downloadStream.on('end', () => {
  res.end();
});
For my scenario, both pieces of code do the same thing. I prefer pipe because it's less code. Are there any performance benefits or memory/IO efficiency advantages with pipe() over response.write?
downloadStream = readBucket.openDownloadStream(trackID);
downloadStream.pipe(res);
.pipe() is just a ready-made way to send a readstream to a writestream. You can certainly code it manually if you want, but .pipe() handles a number of things for you.
I'd suggest it's kind of like fs.readFile(). If what you want to do is read a whole file into memory, fs.readFile() does the work of opening the file for reading, reading all the data into a buffer, closing the target file and giving you all the data at the end. If there are any errors, it makes sure the file you were reading gets closed.
The same is true of .pipe(). It hooks up to the data, finish and error events for you and just handles all those, while streaming the data out to your write stream. Depending on the type of writestream, it also takes care of "finishing" or "closing" both the readstream and the writestream, even if there are errors.
And, .pipe() has backflow handling, something your code does not. When you call res.write() it returns a boolean. If that boolean is false, then the write buffer is full and you should not be calling res.write() again until the drain event occurs. Note, your code does not do that. So, .pipe() is more complete than what many people will typically write themselves.
The only situations I've seen where you're generally doing a pipe-like operation but can't use .pipe() are when you have very custom behavior during error conditions and you want to do something significantly different from the default error handling. For just streaming the data and finishing both input and output streams, terminating both on error, it does exactly what you want, so there's really no reason to code it yourself when the desired behavior is already built in.
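For illustration, here is a rough sketch of approximately what you would have to hand-code instead of downloadStream.pipe(res) (names taken from the question's code); this is a simplified approximation of the idea, not the actual .pipe() implementation:
downloadStream.on('data', chunk => {
  const ok = res.write(chunk);
  if (!ok) {
    // Backflow: stop reading until the response buffer has drained.
    downloadStream.pause();
    res.once('drain', () => downloadStream.resume());
  }
});
downloadStream.on('end', () => res.end());
downloadStream.on('error', err => {
  console.log('error occurred', err);
  res.sendStatus(500);
});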
For my scenario, both pieces of code do the same thing. I prefer pipe because it's less code.
Same here.
Are there any performance benefits or memory/IO efficiency advantages with pipe() over response.write?
Yes, sort of. It probably has fewer bugs than the code you write yourself (like the missing backflow detection in your example, which might only show up in some circumstances: large data, a slow connection).

What's better readSync or createReadStream (with Symbol.asyncIterator)?

createReadStream (with Symbol.asyncIterator)
async function* readChunkIter(chunksAsync) {
  for await (const chunk of chunksAsync) {
    // magic
    yield chunk;
  }
}
const fileStream = fs.createReadStream(filePath, { highWaterMark: 1024 * 64 });
const readChunk = readChunkIter(fileStream);
readSync
function* readChunkIter(fd) {
  // loop
  // magic
  fs.readSync(fd, buffer, 0, chunkSize, bytesRead);
  yield buffer;
}
const fd = fs.openSync(filePath, 'r');
const readChunk = readChunkIter(fd);
What's better to use with a generator function and why?
upd: I'm not looking for a better way, I want to know the difference between using these features
To start with, you're comparing a synchronous file operation, fs.readSync(), with an asynchronous one in the stream (which uses fs.read() internally). So, that's a bit like comparing apples and oranges for server use.
If this is on a server, then NEVER use synchronous file I/O except at server startup time because when processing requests or any other server events, synchronous file I/O blocks the entire event loop during the file read operation which drastically reduces your server scalability. Only use asynchronous file I/O, which between your two cases would be the stream.
Otherwise, if this is not on a server or any process that cares about blocking the node.js event loop during a synchronous file operation, then it's entirely up to you on which interface you prefer.
Other comments:
It's also unclear why you wrap for await() in a generator. The caller can just use for await() themselves and avoid the wrapping in a generator.
Streams for reading files are usually used in an event driven manner by adding an event listener to the data event and responding to data as it arrives. If you're just going to asynchronously read chunks of data from the file, there's really no benefit to a stream. You may as well just use fs.read() or fs.promises.read().
We can't really comment on the best/better way to solve a problem without seeing the overall problem you're trying to code for. You've just shown one little snippet of reading data. The best way to structure that depends upon how the higher level code can most conveniently use/consume the data (which you don't show).
I really didn't ask the right question. I'm not looking for a better way, I want to know the difference between using these features.
Well, the main difference is that fs.readSync() is blocking and synchronous and thus blocks the event loop, ruining the scalability of a server and should never be used (except during startup code) in a server environment. Streams in node.js are asynchronous and do not block the event loop.
Other than that difference, streams are a higher level construct than just reading the file directly and should be used when you're actually using features of the streams and should probably not be used when you're just reading chunks from the file directly and aren't using any features of streams.
In particular, error handling is not always so clear with streams, particularly when trying to use await and promises with them. This is probably because readstreams were originally designed as event-driven objects, which means errors are communicated indirectly via an error event, and that complicates the error handling of straight read operations. If you're not using the event-driven nature of readstreams, or some transform feature, or some other major feature of streams, I wouldn't use them - I'd use the more traditional fs.promises.readFile() to just read the data.
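As a small sketch of the two non-blocking approaches mentioned above (the file name is hypothetical):
const fs = require('fs');

// Asynchronous whole-file read (fine when the file fits in memory).
async function readWhole() {
  return fs.promises.readFile('data.txt');
}

// Asynchronous chunked read using the stream's built-in async iterator.
async function readChunks() {
  const stream = fs.createReadStream('data.txt', { highWaterMark: 64 * 1024 });
  for await (const chunk of stream) {
    // process each chunk here
  }
}
Neither of these blocks the event loop, unlike fs.readSync().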

What is the advantage of using pipe function over res.write

The framework is Express.
When I'm sending a request from within an endpoint and start receiving data, I can either read the data in chunks and write them immediately:
responseHandler.on('data', (chunk) => {
  res.write(chunk);
});
Or I can create a writable stream and pipe the response to that.
responseHandler.pipe(res)
It is obvious that the pipe function takes care of the former process and adds more dimensions to it. What are they?
The most important difference between managing event handlers and using readable.pipe(writable) is that using pipe:
The flow of data will be automatically managed so that the destination Writable stream is not overwhelmed by a faster Readable stream. Pipe
It means that the readable stream may be faster than the writable one, and pipe handles that logic for you. If you are writing code like:
responseHandler.on('data', (chunk) => {
  res.write(chunk);
});
res.write() function
Returns: (boolean) false if the stream wishes for the calling code to wait for the 'drain' event to be emitted before continuing to write additional data; otherwise true. Link
It means that the writable stream might not be ready to handle more data, so you have to manage this manually, as mentioned in the writable.write() example in the docs.
In some cases you do not have a readable stream, and you can write to the writable stream using writable.write().
Example
const data = []; // array of some data.
data.forEach((d) => writable.write(d));
But again, you must check what writable.write() returns. If it is false, you must adjust the stream flow manually.
Another way is to wrap your data in a readable stream and just pipe it.
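For instance, a minimal sketch of that wrapping approach (assuming a Node.js version that provides Readable.from(), and reusing the writable and data names from the example above):
const { Readable } = require('stream');

// Wrap the in-memory data in a readable stream and let pipe()
// take care of the flow control for you.
Readable.from(data).pipe(writable);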
By the way, there is one more great advantage of using pipes. You can chain them by your needs, for instance:
readableStream
  .pipe(modify) // transform stream
  .pipe(zip) // transform stream
  .pipe(writableStream);
To sum everything up: piggyback on the stream-handling functionality node.js gives you whenever possible. In most cases it will help you avoid extra complexity, and it will not be slower than managing it manually.

Node pipe to stdout -- how do I tell if drained?

The standard advice for determining whether you need to wait for the drain event on process.stdout is to check whether it returns false when you write to it.
How should I check if I've piped another stream to it? It would seem that that stream can emit finish before all the output is actually written. Can I do something like this?
upstreamOfStdout.on('finish', function () {
  if (!process.stdout.write('')) {
    process.stdout.on('drain', function () { done("I'm done"); });
  } else {
    done("I'm done");
  }
});
upstreamOfStdout.pipe(process.stdout);
I prefer an answer that doesn't depend on the internals of any streams. Just given that the streams conform to the node stream interface, what is the canonical way to do this?
EDIT:
The larger context is a wrapper:
new Promise(function (resolve, reject) {
  stream.on(<some-event>, resolve);
  ... (perhaps something else here?)
});
where stream can be process.stdout or something else, which has another through stream piped into it.
My program exits whenever resolve is called -- I presume the Promise code keeps the program alive until all promises have been resolved.
I have encountered this situation several times, and have always used hacks to solve the problem (e.g. there are several private members of process.stdout that are useful). But I really would like to solve this once and for all (or learn that it is a bug, so I can track the issue and at least fix my hacks when it's resolved): how do I tell when a stream downstream of another is finished processing its input?
Instead of writing directly to process.stdout, create a custom writable (shown below) which writes to stdout as a side effect.
const { Writable } = require('stream');

function writeStdoutAndFinish() {
  return new Writable({
    write(chunk, encoding, callback) {
      process.stdout.write(chunk, callback);
    },
  });
}
The result of writeStdoutAndFinish() will emit a finish event.
async function main() {
  ...
  await new Promise((resolve) => {
    someReadableStream.pipe(writeStdoutAndFinish()).on('finish', () => {
      console.log('finish received');
      resolve();
    });
  });
  ...
}
In practice, I don't think that the above approach differs in behavior from
async function main() {
  ...
  await new Promise((resolve) => {
    someReadableStream.on('end', () => {
      console.log('end received');
      resolve();
    }).pipe(process.stdout);
  });
  ...
}
First of all, as far as I can see from the documentation, that stream never emits the finish event, so it's unlikely you can rely on that.
Moreover, from the documentation mentioned above, the drain event seems to be used to notify the user when the stream is ready to accept more data after the .write method has returned false. In any case, you can deduce that this means all the other data has been written. Indeed, from the documentation for the write method, we can deduce that honoring the false value (i.e. "please stop pushing data") is not mandatory and you can freely ignore it, but subsequent data will probably be buffered in memory, letting its usage grow.
Because of that, basing my assumption on the documentation alone, I guess you can rely on the drain event to know when all the data has been nicely handled or is likely to have been flushed out.
That said, it also looks to me that there is no clear way to know for certain when all the data has actually been sent to the console.
Finally, you can listen for the end event of the piped stream to know when it has been fully consumed, regardless of whether it has been written to the console or the data is still buffered within the console stream.
Of course, you can also freely ignore the problem, since a fully consumed stream should be handled nicely by node.js and then discarded, and you don't have to deal with it anymore once you have piped it into the second stream.

Is createreadstream asynchronous?

https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options .
I also have a general question.
Can I assume that, unless otherwise stated in the documentation, any function mentioned is asynchronous?
Is createreadstream asynchronous?
Yes and no. This question is really more a matter of semantics than anything because it hides an asynchronous operation under a synchronous looking interface. fs.createReadStream() appears to have a synchronous interface. It does not return a promise or accept a callback to communicate back when it's done or to send back some results. So, it appears synchronous from the interface. And, we know from using it that there's nothing you have to wait for in order to start using the stream. So, you use it as if it was a synchronous interface.
Here's the signature for fs.createReadStream():
fs.createReadStream(path[, options])
And, in the options object, there is no callback option and no mention of a returned promise. This is not a typical asynchronous interface.
On the other hand if you look at the signature of fs.rename():
fs.rename(oldPath, newPath, callback)
You see that it takes a callback that is referred to in the doc as the "completion callback". This function is clearly asynchronous.
But, fs.createReadStream() does open a file and it opens that file asynchronously without blocking.
If you're wondering how fs.createReadStream() can be synchronous when it has to open a file asynchronously, that's because fs.createReadStream() has not yet opened the file when it returns.
In normal use of the stream, you can just start reading from the stream immediately. But, internally to the stream, if the file is not yet opened, it will wait until the file is done being opened before actually attempting to read from it. So, the process of opening the file is being hidden from the user of the stream (which is generally a good thing).
If you wanted to know when the file was actually done being opened, there is an open event on the stream. And, if there's an error opening the file, there will be an error event on the stream. So, if you want to get technical, you can say that fs.createReadStream() actually is an asynchronous operation and completion of the async operation is communicated via the open or error events.
let rstream = fs.createReadStream("temp.txt");
rstream.on('open', (fd) => {
  // file opened now
});
rstream.on('error', (err) => {
  // error on the stream
});
But, in normal usage of fs.createReadStream(), the programmer does not have to monitor the file open event because it is hidden from the user and handled automatically when the stream is next read from. When you create a read stream and immediately ask to read from it (which is an asynchronous interface), internally the stream object waits for the file to finish opening, then reads some bytes from it, waits for the file read to finish and then notifies completion of the read operation. So, they just combine the file open completion with the first read, saving the programmer an extra step of waiting for the file open to finish before issuing their first read operation.
So, technically fs.createReadStream() is an asynchronous operation that has completion events. But, because of the way it's been combined with reading from the file, you don't generally have to use it like it's asynchronous, because its asynchronous behavior is combined with the async reading from the file.
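If you do want to treat the open step explicitly as an asynchronous operation, one rough sketch (assuming a Node.js version where events.once() and async iteration over streams are available) is to await the open event before reading:
const fs = require('fs');
const { once } = require('events');

async function readTemp() {
  const rstream = fs.createReadStream('temp.txt');
  // Wait for the asynchronous file open to complete; once() rejects
  // if the stream emits 'error' before 'open'.
  await once(rstream, 'open');
  for await (const chunk of rstream) {
    // use chunk here
  }
}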
According to the nodejs source code:
fs.createReadStream creates a ReadStream instance.
ReadStream's _read method (we know every custom readable stream must provide its _read method) calls fs.read.
We know that fs.read is async (fs.readSync is sync).
