I know that one can synchronously read a file in NodeJS like this:
var fs = require('fs');
var content = fs.readFileSync('myfilename');
console.log(content);
I am instead interested in being able to read the contents from a stream into a string synchronously. Thoughts?
Streams in node.js are not synchronous - they are event driven. They just aren't synchronous. So, you can't get their results into a string synchronously.
If you have no choice but to use a stream, then you have no choice but to deal with the contents of the stream asynchronously.
If you can change the source of the data to a synchronous source such as the fs.readFileSync() that you show, then you can do that (though generally not recommended for a multi-user server process).
Related
createReadStream (with Symbol.asyncIterator)
async function* readChunkIter(chunksAsync) {
for await (const chunk of chunksAsync) {
// magic
yield chunk;
}
}
const fileStream = fs.createReadStream(filePath, { highWaterMark: 1024 * 64 });
const readChunk = readChunkIter(fileStream);
readSync
function* readChunkIter(fd) {
// loop
// magic
fs.readSync(fd, buffer, 0, chunkSize, bytesRead);
yield buffer;
}
const fd = fs.openSync(filePath, 'r');
const readChunk = readChunkIter(fd);
What's better to use with a generator function and why?
upd: I'm not looking for a better way, I want to know the difference between using these features
To start with, you're comparing a synchronous file operation fs.readSync() with an asynchronous one in the stream (which uses fs.read() internally). so, that's a bit like apples and oranges for server use.
If this is on a server, then NEVER use synchronous file I/O except at server startup time because when processing requests or any other server events, synchronous file I/O blocks the entire event loop during the file read operation which drastically reduces your server scalability. Only use asynchronous file I/O, which between your two cases would be the stream.
Otherwise, if this is not on a server or any process that cares about blocking the node.js event loop during a synchronous file operation, then it's entirely up to you on which interface you prefer.
Other comments:
It's also unclear why you wrap for await() in a generator. The caller can just use for await() themselves and avoid the wrapping in a generator.
Streams for reading files are usually used in an event driven manner by adding an event listener to the data event and responding to data as it arrives. If you're just going to asynchronously read chunks of data from the file, there's really no benefit to a stream. You may as well just use fs.read() or fs.promises.read().
We can't really comment on the best/better way to solve a problem without seeing the overall problem you're trying to code for. You've just shown one little snippet of reading data. The best way to structure that depends upon how the higher level code can most conveniently use/consume the data (which you don't show).
I really didn't ask the right question. I'm not looking for a better way, I want to know the difference between using these features.
Well, the main difference is that fs.readSync() is blocking and synchronous and thus blocks the event loop, ruining the scalability of a server and should never be used (except during startup code) in a server environment. Streams in node.js are asynchronous and do not block the event loop.
Other than that difference, streams are a higher level construct than just reading the file directly and should be used when you're actually using features of the streams and should probably not be used when you're just reading chunks from the file directly and aren't using any features of streams.
In particular, error handling is not always so clear with streams, particularly when trying to use await and promises with streams. This is probably because readstreams were originally designed to be an event driven object and that means communicating errors indirectly on an error event which complicates the error handling on straight read operations. If you're not using the event driven nature of readstreams or some transform feature or some other major feature of streams, I wouldn't use them - I'd use the more traditional fs.promises.readFile() to just read data.
The framework is Express.
When I'm sending a request from within an end point and start receiving data, either I can read data in chunks and write them instantly:
responseHandler.on('data', (chunk) => {
res.write(chunk);
});
Or I can create a writable stream and pipe the response to that.
responseHandler.pipe(res)
It is obvious that the pipe function takes care of the former process with more dimensions to it. What are they?
The most important difference between managing event handlers and using readable.pipe(writable) is that using pipe:
The flow of data will be automatically managed so that the destination Writable stream is not overwhelmed by a faster Readable stream. Pipe
It means that readable stream may be faster than writable and pipe handles that logic. If you are writing code like:
responseHandler.on('data', (chunk) => {
res.write(chunk);
});
res.write() function
Returns: (boolean) false if the stream wishes for the calling code to wait for the 'drain' event to be emitted before continuing to write additional data; otherwise true. Link
It means that writable stream could be not ready to handle more data. So you can manage this manually as mentioned in writable.write() example.
In some cases you do not have readable stream and you could write to writable stream using writable.write().
Example
const data = []; // array of some data.
data.forEach((d) => writable.write(d));
But again, you must see what writable.write returns. If it is false you must act in a manual fashion to adjust stream flow.
Another way is to wrap your data into readable stream and just pipe it.
By the way, there is one more great advantage of using pipes. You can chain them by your needs, for instance:
readableStream
.pipe(modify) // transform stream
.pipe(zip) // transform stream
.pipe(writableStream);
By summing everything up piggyback on node.js given functionality to manage streams if possible. In most cases it will help you avoid extra complexity and it will not be slower compared to managing it manually.
I have a node.js program in which I use a stream to write information to a SFTP server. Something like this (simplified version):
var conn = new SSHClient();
process.nextTick(function (){
conn.on('ready', function () {
conn.sftp(function (error, sftp) {
var writeStream = sftp.createWriteStream(filename);
...
writeStream.write(line1);
writeStream.write(line2);
writeStream.write(line3);
...
});
}).connect(...);
});
Note I'm not using the (optional) callback argument (described in the write() API specification) and I'm not sure if this may cause undesired behaviour (i.e. lines not writen in the following order: line1, line2, line3). In other words, I don't know if this alternative (more complex code and not sure if less efficient) should be used:
writeStream.write(line1, ..., function() {
writeStream.write(line2, ..., function() {
writeStream.write(line3);
});
});
(or equivalent alternative using async series())
Empirically in my tests I have always get the file writen in the desired order (I mean, iirst line1, then line2 and finally line3). However, I don't now if this has happened just by chance or the above is the right way of using write().
I understand that writing in stream is in general asynchronous (as all I/O work should be) but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Examples of usage of write() in real programs are very welcomed. Thanks!
Does write() (without callback) preserve order in node.js write streams?
Yes it does. It preserves order of your writes to that specific stream. All data you're writing goes through the stream buffer which serializes it.
but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Yes, all data does go through a stream buffer. The .write() operation does not return until the data has been successfully copied into the buffer unless an error occurs.
Note, that if you are writing any significant amount of data, you may have to pay attention to flow control (often called back pressure) on the stream. It can back up and may tell you that you need to wait before writing more, but it does buffer your writes in the order you send them.
If the .write() operation returns false, then the stream is telling you that you need to wait for the drain event before writing any more. You can read about this issue in the node.js docs for .write() and in this article about backpressure.
Your code also needs to listen for the error event to detect any errors upon writing the stream. Because the writes are asynchronous, they may occur at some later time and are not necessarily reflected in either the return value from .write() or in the err parameter to the .write() callback. You have to listen for the error event to make sure you see errors on the stream.
The documentation for node suggests that for the new best way to read streams is as follows:
var readable = getReadableStreamSomehow();
readable.on('readable', function() {
var chunk;
while (null !== (chunk = readable.read())) {
console.log('got %d bytes of data', chunk.length);
}
});
To me this seems to cause a blocking while loop. This would mean that if node is responding to an http request by reading and sending a file, the process would have to block while the chunk is read before it could be sent.
Isn't this blocking IO which node.js tries to avoid?
The important thing to note here is that it's not blocking in the sense that it's waiting for more input to arrive on the stream. It's simply retrieving the current contents of the stream's internal buffer. This kind of loop will finish pretty quickly since there is no waiting on I/O at all.
A stream can be both synchronous and asynchronous. If readable stream synchronously pushes data in the internal buffer then you'll get a synchronous stream. And yes, in that case if it pushes lots of data synchronously node's event loop won't be able to run until all the data is pushed.
Interestingly, if you even remove the while loop in readble callback, the stream module internally calls a while loop once and keeps running until all the pushed data is read.
But for asynchronous IO operations(e.g. http or fs module), they push data asynchronously in the buffer. So the while loop only runs when data is pushed in buffer and stops as soon as you've read the entire buffer.
https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options .
I also have a general question .
Can I assume that unless otherwise stated in the documentation , any function mentioned is asynchronous?
Is createreadstream asynchronous?
Yes and no. This question is really more a matter of semantics than anything because it hides an asynchronous operation under a synchronous looking interface. fs.createReadStream() appears to have a synchronous interface. It does not return a promise or accept a callback to communicate back when it's done or to send back some results. So, it appears synchronous from the interface. And, we know from using it that there's nothing you have to wait for in order to start using the stream. So, you use it as if it was a synchronous interface.
Here's the signature for fs.createReadStream():
fs.createReadStream(path[, options])
And, in the options object, there is no callback option and no mention of a returned promise. This is not a typical asynchronous interface.
On the other hand if you look at the signature of fs.rename():
fs.rename(oldPath, newPath, callback)
You see that it takes a callback that is referred to in the doc as the "completion callback". This function is clearly asynchronous.
But, fs.createReadStream() does open a file and it opens that file asynchronously without blocking.
If you're wondering how fs.createReadStream() can be synchronous when it has to open a file asynchronously, that's because fs.createReadStream() has not yet opened the file when it returns.
In normal use of the stream, you can just start reading from the stream immediately. But, internally to the stream, if the file is not yet opened, it will wait until the file is done being opened before actually attempting to read from it. So, the process of opening the file is being hidden from the user of the stream (which is generally a good thing).
If you wanted to know when the file as actually done being opened, there is an open event on the stream. And, if there's an error opening the file, there will be an error event on the stream. So, if you want to get technical, you can say that fs.readStream() actually is an asynchronous operation and completion of the async operation is communicated via the open or error events.
let rstream = fs.createReadStream("temp.txt");
rstream.on('open', (fd) => {
// file opened now
});
rstream.on('error', (err) => {
// error on the stream
});
But, in normal usage of fs.createReadStream(), the programmer does not have to monitor the file open event because it is hidden from the user and handled automatically when the stream is read from next. When you create a read stream and immediately ask to read from it (which is an asynchronous interface), internally the stream object waits for the file to finish opening, then reads some bytes form it, waits for the file read to finish and then notifies completion of the read operation. So, they just combine the file open completion with the first read, saving the programmer an extra step of waiting for the file open to finish before issuing their first read operation.
So, technically fs.createReadStream() is an asynchronous operation that has completion events. But, because of the way it's been combined with reading from the file, you don't generally have to use it like it's asynchronous because it's asynchronous behavior is combined with the async reading from the file.
according to nodejs source code:
fs.createReadStream create a ReadStream instance.
ReadStream's _read method (we know every custom readablestream must provide its _read method) calls fs.read
we know that fs.read is async (fs.readSync is sync)