https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
I also have a general question.
Can I assume that, unless otherwise stated in the documentation, any function mentioned is asynchronous?
Is createReadStream asynchronous?
Yes and no. This question is really more a matter of semantics than anything, because fs.createReadStream() hides an asynchronous operation under a synchronous-looking interface. It does not return a promise or accept a callback to communicate back when it's done or to send back some results, so it appears synchronous from the interface. And we know from using it that there's nothing you have to wait for in order to start using the stream. So, you use it as if it were a synchronous interface.
Here's the signature for fs.createReadStream():
fs.createReadStream(path[, options])
And, in the options object, there is no callback option and no mention of a returned promise. This is not a typical asynchronous interface.
On the other hand if you look at the signature of fs.rename():
fs.rename(oldPath, newPath, callback)
You see that it takes a callback that is referred to in the doc as the "completion callback". This function is clearly asynchronous.
But, fs.createReadStream() does open a file and it opens that file asynchronously without blocking.
If you're wondering how fs.createReadStream() can be synchronous when it has to open a file asynchronously, that's because fs.createReadStream() has not yet opened the file when it returns.
In normal use of the stream, you can just start reading from the stream immediately. But, internally to the stream, if the file is not yet opened, it will wait until the file is done being opened before actually attempting to read from it. So, the process of opening the file is being hidden from the user of the stream (which is generally a good thing).
If you want to know when the file is actually done being opened, there is an open event on the stream. And, if there's an error opening the file, there will be an error event on the stream. So, if you want to get technical, you can say that fs.createReadStream() actually is an asynchronous operation and completion of the async operation is communicated via the open or error events.
const fs = require('fs');

const rstream = fs.createReadStream("temp.txt");
rstream.on('open', (fd) => {
    // the file is now open; fd is its file descriptor
});
rstream.on('error', (err) => {
    // an error occurred while opening or reading the file
});
But, in normal usage of fs.createReadStream(), the programmer does not have to monitor the file open event because it is hidden from the user and handled automatically when the stream is next read from. When you create a read stream and immediately ask to read from it (which is an asynchronous interface), internally the stream object waits for the file to finish opening, then reads some bytes from it, waits for the file read to finish and then notifies completion of the read operation. So, the stream combines the file open completion with the first read, saving the programmer an extra step of waiting for the file open to finish before issuing the first read operation.
So, technically fs.createReadStream() is an asynchronous operation that has completion events. But, because of the way it's been combined with reading from the file, you don't generally have to use it like it's asynchronous, because its asynchronous behavior is combined with the async reading from the file.
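The ordering described above can be observed directly. The following is a minimal sketch, not taken from the original answer: the scratch file name and the observeOrder helper are assumptions made only so the example is self-contained. It records when the call returns, when the open event fires, and when the first data arrives.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created only so the sketch is self-contained.
const demoFile = path.join(os.tmpdir(), `crs-demo-${process.pid}.txt`);
fs.writeFileSync(demoFile, 'hello stream');

// Hypothetical helper: records the order in which things happen.
function observeOrder(file) {
  return new Promise((resolve, reject) => {
    const order = [];
    const rstream = fs.createReadStream(file, { encoding: 'utf8' });
    // The call above has already returned; the file is not open yet.
    order.push('returned');

    rstream.on('open', () => order.push('open'));
    // No waiting for 'open' is needed before asking for data.
    rstream.on('data', () => order.push('data'));
    rstream.on('error', reject);
    rstream.on('close', () => {
      fs.unlinkSync(file); // clean up the scratch file
      resolve(order);
    });
  });
}

const orderSeen = observeOrder(demoFile);
orderSeen.then((order) => console.log(order.join(' -> ')));
```

The data listener is attached before the file has been opened, and the stream still delivers the data once the open completes.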
According to the Node.js source code:
fs.createReadStream() creates a ReadStream instance.
ReadStream's _read method (every custom Readable stream must provide its own _read method) calls fs.read().
fs.read() is asynchronous (fs.readSync() is its synchronous counterpart).
As I understand it, "response.write" gives more control over the chunks of data I am writing, while pipe doesn't give any control over the chunks.
I am trying to stream files and I don't need any control over the chunks of data, so is it recommended to go with stream.pipe(response)? Is there any advantage, such as performance, over response.write?
const downloadStream = readBucket.openDownloadStream(trackID);
downloadStream.on('data', chunk => {
    console.log('chunk');
    res.write(chunk);
});
downloadStream.on('error', error => {
    console.log('error occurred', error);
    res.sendStatus(500);
});
downloadStream.on('end', () => {
    res.end();
});
For my scenario, both versions do the same thing. I prefer pipe because it is less code. Are there any performance benefits or memory/IO efficiency advantages with pipe() over response.write?
const downloadStream = readBucket.openDownloadStream(trackID);
downloadStream.pipe(res);
.pipe() is just a ready-made way to send a readstream to a writestream. You can certainly code it manually if you want, but .pipe() handles a number of things for you.
I'd suggest it's kind of like fs.readFile(). If what you want to do is read a whole file into memory, fs.readFile() does the work of opening the file for reading, reading all the data into a buffer, closing the target file and giving you all the data at the end. If there are any errors, it makes sure the file you were reading gets closed.
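The same is true of .pipe(). It hooks up to the data and end events on the readstream and the drain, finish and error events on the writestream, and just handles all those for you while streaming the data out to your write stream. By default it also calls .end() on the writestream when the readstream ends. One caveat from the Node.js docs: if the readstream emits an error, the writestream is not closed automatically, so you still need your own error handlers if you have to clean up in that case.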
And .pipe() has backpressure handling, something your code does not. When you call res.write(), it returns a boolean. If that boolean is false, the write buffer is full and you should not call res.write() again until the drain event occurs. Note that your code does not do that. So, .pipe() is more complete than what many people will typically write themselves.
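For comparison, here is a rough sketch of what a hand-written copy loop needs in order to respect the return value of write(). The copyWithBackpressure helper and the two stand-in streams (an in-memory source and a deliberately slow sink with a tiny buffer) are assumptions for illustration; they are not the GridFS download stream or the HTTP response from the question.

```javascript
const { Readable, Writable } = require('stream');

// Hypothetical helper: copy `source` into `dest` by hand,
// honoring the boolean returned by write().
function copyWithBackpressure(source, dest) {
  return new Promise((resolve, reject) => {
    source.on('data', (chunk) => {
      // write() returns false once the destination's buffer is full.
      if (!dest.write(chunk)) {
        source.pause();                            // stop reading for now
        dest.once('drain', () => source.resume()); // continue once flushed
      }
    });
    source.on('end', () => dest.end());
    source.on('error', reject);
    dest.on('error', reject);
    dest.on('finish', resolve);
  });
}

// Stand-in streams (assumptions for the demo).
const received = [];
const slowSink = new Writable({
  highWaterMark: 4, // tiny buffer so backpressure kicks in immediately
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    setTimeout(callback, 5); // simulate a slow consumer
  },
});
const source = Readable.from(['aaaa', 'bbbb', 'cccc', 'dddd']);

const copied = copyWithBackpressure(source, slowSink);
copied.then(() => console.log(received.join(',')));
```

This is roughly the bookkeeping that .pipe() does internally, which is why the one-line version is usually the safer choice.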
The only situations I've seen where you're generally doing a pipe-like operation but can't use .pipe() are when you need very custom behavior during error conditions and want to do something significantly different from the default error handling. For just streaming the data and ending the output stream when the input is done, it does exactly what you want, so there's really no reason to code it yourself when the desired behavior is already built in. If you also need both streams torn down when either one fails, stream.pipeline() does that.
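When the requirement is "stream the data and tear everything down if anything fails", Node's stream.pipeline() is one built-in option. It is named here as an alternative to .pipe(), not as something the original answer used. The sketch below uses two stand-in streams (assumptions for the demo): a source that fails on its first read and a sink that discards data.

```javascript
const { pipeline, Readable, Writable } = require('stream');

// Stand-in source that fails on its first read (an assumption for the demo).
const failingSource = new Readable({
  read() {
    this.destroy(new Error('simulated read failure'));
  },
});

// Stand-in sink that accepts and discards everything.
const sink = new Writable({
  write(chunk, encoding, callback) {
    callback();
  },
});

const outcome = new Promise((resolve) => {
  // pipeline() reports the first error to the callback and destroys
  // the streams in the chain, so nothing is left half-open.
  pipeline(failingSource, sink, (err) => {
    resolve({ message: err && err.message, sinkDestroyed: sink.destroyed });
  });
});

outcome.then((result) => console.log(result));
```

In an HTTP handler the same call shape would take the download stream and the response object, with the callback deciding how to report the failure.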
For my scenario, both versions do the same thing. I prefer pipe because it is less code.
Same here.
Are there any performance benefits or memory/IO efficiency advantages with pipe() over response.write?
Yes, sort of. It probably has fewer bugs than the code you write yourself (like the missing backpressure handling in your example, which might only show up in some circumstances, such as large data or a slow connection).
createReadStream (with Symbol.asyncIterator)
async function* readChunkIter(chunksAsync) {
    for await (const chunk of chunksAsync) {
        // magic
        yield chunk;
    }
}
const fileStream = fs.createReadStream(filePath, { highWaterMark: 1024 * 64 });
const readChunk = readChunkIter(fileStream);
readSync
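function* readChunkIter(fd, chunkSize = 1024 * 64) {
    const buffer = Buffer.alloc(chunkSize);
    let position = 0;
    let bytesRead;
    // loop until fs.readSync() reports 0 bytes (end of file)
    while ((bytesRead = fs.readSync(fd, buffer, 0, chunkSize, position)) > 0) {
        position += bytesRead;
        // magic
        yield buffer.subarray(0, bytesRead);
    }
}
const fd = fs.openSync(filePath, 'r');
const readChunk = readChunkIter(fd);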
What's better to use with a generator function and why?
Update: I'm not looking for a better way; I want to know the difference between using these features.
To start with, you're comparing a synchronous file operation, fs.readSync(), with an asynchronous one in the stream (which uses fs.read() internally). So, that's a bit like apples and oranges for server use.
If this is on a server, then NEVER use synchronous file I/O except at server startup time because when processing requests or any other server events, synchronous file I/O blocks the entire event loop during the file read operation which drastically reduces your server scalability. Only use asynchronous file I/O, which between your two cases would be the stream.
Otherwise, if this is not on a server or any process that cares about blocking the node.js event loop during a synchronous file operation, then it's entirely up to you on which interface you prefer.
Other comments:
It's also unclear why you wrap for await() in a generator. The caller can just use for await() themselves and avoid the wrapping in a generator.
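As a small illustration of that point, the caller can iterate the stream directly. This is a sketch under assumptions: countBytes and the scratch file are hypothetical names introduced only to make the example runnable.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Consume the stream directly; no wrapping generator is needed.
async function countBytes(filePath) {
  const fileStream = fs.createReadStream(filePath, { highWaterMark: 1024 * 64 });
  let total = 0;
  for await (const chunk of fileStream) {
    total += chunk.length; // the "magic" from the question would go here
  }
  return total;
}

// Hypothetical scratch file so the sketch is self-contained.
const demoFile = path.join(os.tmpdir(), `for-await-demo-${process.pid}.bin`);
fs.writeFileSync(demoFile, Buffer.alloc(200 * 1024));

const totalBytes = countBytes(demoFile).finally(() => fs.unlinkSync(demoFile));
totalBytes.then((n) => console.log(`read ${n} bytes`));
```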
Streams for reading files are usually used in an event driven manner by adding an event listener to the data event and responding to data as it arrives. If you're just going to asynchronously read chunks of data from the file, there's really no benefit to a stream. You may as well just use fs.read() or fs.promises.read().
We can't really comment on the best/better way to solve a problem without seeing the overall problem you're trying to code for. You've just shown one little snippet of reading data. The best way to structure that depends upon how the higher level code can most conveniently use/consume the data (which you don't show).
I really didn't ask the right question. I'm not looking for a better way, I want to know the difference between using these features.
Well, the main difference is that fs.readSync() is blocking and synchronous and thus blocks the event loop, ruining the scalability of a server and should never be used (except during startup code) in a server environment. Streams in node.js are asynchronous and do not block the event loop.
Other than that difference, streams are a higher level construct than just reading the file directly and should be used when you're actually using features of the streams and should probably not be used when you're just reading chunks from the file directly and aren't using any features of streams.
In particular, error handling is not always so clear with streams, particularly when trying to use await and promises with streams. This is probably because readstreams were originally designed to be an event driven object and that means communicating errors indirectly on an error event which complicates the error handling on straight read operations. If you're not using the event driven nature of readstreams or some transform feature or some other major feature of streams, I wouldn't use them - I'd use the more traditional fs.promises.readFile() to just read data.
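If all you need is "give me the next chunk", a stream-free version is short. The following is a minimal sketch using fs.promises.open() and the file handle's read() method; the readChunks generator, the chunk size and the scratch file are assumptions for illustration. Errors surface as ordinary rejected promises, so plain try/catch/finally works.

```javascript
const fsp = require('fs').promises;
const os = require('os');
const path = require('path');

// Hypothetical async generator: fixed-size chunks without a stream.
async function* readChunks(filePath, chunkSize = 64 * 1024) {
  const handle = await fsp.open(filePath, 'r');
  try {
    let position = 0;
    while (true) {
      const buffer = Buffer.alloc(chunkSize);
      const { bytesRead } = await handle.read(buffer, 0, chunkSize, position);
      if (bytesRead === 0) break; // end of file
      position += bytesRead;
      yield buffer.subarray(0, bytesRead);
    }
  } finally {
    await handle.close(); // runs on success, on error, and on early exit
  }
}

async function demo() {
  // Hypothetical scratch file so the sketch is self-contained.
  const demoFile = path.join(os.tmpdir(), `chunks-demo-${process.pid}.bin`);
  await fsp.writeFile(demoFile, Buffer.alloc(150 * 1024, 1));
  const sizes = [];
  try {
    for await (const chunk of readChunks(demoFile)) sizes.push(chunk.length);
  } finally {
    await fsp.unlink(demoFile);
  }
  return sizes;
}

const chunkSizes = demo();
chunkSizes.then((sizes) => console.log(sizes));
```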
I have a node.js program in which I use a stream to write information to an SFTP server. Something like this (simplified version):
var conn = new SSHClient();
process.nextTick(function () {
    conn.on('ready', function () {
        conn.sftp(function (error, sftp) {
            var writeStream = sftp.createWriteStream(filename);
            ...
            writeStream.write(line1);
            writeStream.write(line2);
            writeStream.write(line3);
            ...
        });
    }).connect(...);
});
Note that I'm not using the (optional) callback argument (described in the write() API specification), and I'm not sure whether this may cause undesired behaviour (i.e. lines not written in the order line1, line2, line3). In other words, I don't know if this alternative (more complex code, and I'm not sure whether it is less efficient) should be used instead:
writeStream.write(line1, ..., function() {
    writeStream.write(line2, ..., function() {
        writeStream.write(line3);
    });
});
(or an equivalent alternative using async series())
Empirically, in my tests I have always gotten the file written in the desired order (I mean, first line1, then line2 and finally line3). However, I don't know whether this has happened just by chance or whether the above is the right way of using write().
I understand that writing in stream is in general asynchronous (as all I/O work should be) but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Examples of usage of write() in real programs are very welcome. Thanks!
Does write() (without callback) preserve order in node.js write streams?
Yes it does. It preserves order of your writes to that specific stream. All data you're writing goes through the stream buffer which serializes it.
but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Yes, all data does go through a stream buffer. The .write() operation does not return until the data has been successfully copied into the buffer unless an error occurs.
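The ordering guarantee is easy to check against a local file. The sketch below uses fs.createWriteStream() as a stand-in for the SFTP write stream from the question (an assumption; the ssh2 stream is not exercised here), and writeLinesInOrder is a hypothetical helper name.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { once } = require('events');

// Hypothetical helper: write lines without callbacks, then read them back.
async function writeLinesInOrder(file, lines) {
  const writeStream = fs.createWriteStream(file);
  // No callbacks: each write() queues its data behind the previous one.
  for (const line of lines) writeStream.write(line);
  writeStream.end();
  // once() rejects if the stream emits 'error' before 'close'.
  await once(writeStream, 'close');
  return fs.promises.readFile(file, 'utf8');
}

// Hypothetical scratch file so the sketch is self-contained.
const demoFile = path.join(os.tmpdir(), `order-demo-${process.pid}.txt`);
const written = writeLinesInOrder(demoFile, ['line1\n', 'line2\n', 'line3\n'])
  .finally(() => fs.unlinkSync(demoFile));
written.then((text) => console.log(JSON.stringify(text)));
```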
Note that if you are writing any significant amount of data, you may have to pay attention to flow control (often called backpressure) on the stream. It can back up and may tell you that you need to wait before writing more, but it does buffer your writes in the order you send them.
If the .write() operation returns false, then the stream is telling you that you need to wait for the drain event before writing any more. You can read about this issue in the node.js docs for .write() and in this article about backpressure.
Your code also needs to listen for the error event to detect any errors upon writing the stream. Because the writes are asynchronous, they may occur at some later time and are not necessarily reflected in either the return value from .write() or in the err parameter to the .write() callback. You have to listen for the error event to make sure you see errors on the stream.
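Putting the last two points together, here is one possible shape for a writer that waits for drain and watches for errors. It is a sketch under assumptions: writeAll is a hypothetical helper, and fakeRemote is an in-memory Writable standing in for the SFTP write stream.

```javascript
const { once } = require('events');
const { Writable } = require('stream');

// Hypothetical helper: write every line, pausing whenever
// write() reports that the buffer is full.
async function writeAll(stream, lines) {
  for (const line of lines) {
    if (!stream.write(line)) {
      // once() also rejects if the stream emits 'error' while waiting.
      await once(stream, 'drain');
    }
  }
  stream.end();
  await once(stream, 'finish');
}

// Stand-in for the SFTP write stream (an assumption for the demo).
const stored = [];
const fakeRemote = new Writable({
  highWaterMark: 8, // small buffer so the drain path is exercised
  write(chunk, encoding, callback) {
    stored.push(chunk.toString());
    setImmediate(callback);
  },
});
// A permanent 'error' listener so a late error is never left unhandled.
fakeRemote.on('error', (err) => console.error('stream error:', err.message));

const allWritten = writeAll(fakeRemote, ['line1\n', 'line2\n', 'line3\n']);
allWritten.then(() => console.log(stored.join('')));
```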
I'm writing to a file (a writable stream) and I need to close the file once I'm done. I'm not sure of the difference between these two functions or whether I need to call them both. Here's what the documentation says:
stream.end()
Terminates the stream with EOF or FIN. This call will allow queued write data to be sent before closing the stream.
stream.destroySoon()
After the write queue is drained, close the file descriptor. destroySoon() can still destroy straight away, as long as there is no data left in the queue for writes.
There is no difference.
From fs.js in the node source:
// There is no shutdown() for files.
WriteStream.prototype.destroySoon = WriteStream.prototype.end;
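Given that alias, calling end() alone is enough for a file write stream. A minimal sketch, assuming a scratch file and a hypothetical writeAndClose helper, that ends the stream and waits for the close event before inspecting the file:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { once } = require('events');

// Hypothetical helper: write, end, and wait until the file is closed.
async function writeAndClose(file, text) {
  const out = fs.createWriteStream(file);
  out.write(text);
  out.end();                 // no more writes; queued data is still flushed
  await once(out, 'close');  // the file descriptor has been released
  return fs.statSync(file).size;
}

// Hypothetical scratch file so the sketch is self-contained.
const demoFile = path.join(os.tmpdir(), `end-demo-${process.pid}.txt`);
const size = writeAndClose(demoFile, 'all done\n')
  .finally(() => fs.unlinkSync(demoFile));
size.then((n) => console.log(`${n} bytes on disk`));
```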
I'm using epoll_create to wait on a socket.
What is the life-cycle of the returned resource tied to? Is there something like an epoll_destroy, or is it tied to the socket's close or destroy call?
Can I re-use the result of epoll_create if I close my socket and open a new one? Or should I just call epoll_create again and forget about the previous result?
epoll_create(2) returns a file descriptor, so you just use close(2) on it when done.
Then, the idea of I/O multiplexing (sometimes loosely called asynchronous I/O) is to wait for multiple events and handle them one at a time. That means you generally need only one polling file descriptor, so you can keep the same epoll descriptor and register the new socket on it with epoll_ctl(2).
The epoll(7) manual page contains a basic example of the suggested API usage.