I'm implementing a Readable Stream. In my _read() implementation, the source of the stream is a web service which requires asynchronous calls. Why doesn't _read() provide a done callback function that can be called when my asynchronous call returns?
The Transform stream and the Writable stream both support this. Why doesn't Readable? Am I just using Readable streams improperly?
MyReadStream.prototype._read = function() {
  var self = this;
  doSomethingAsync('foo', function(err, result) {
    if (result) {
      self.push(result);
    } else {
      self.push(null);
    }
    // why no done() available to call like in _write()?
    // done();
  });
};
In my actual implementation, I don't want to call doSomethingAsync again until a previous call has returned. Without a done callback for me to use, I have to implement my own throttle mechanism.
_read() is a notification that the amount of buffered data is below the highWaterMark, so more data can be pulled from upstream.
_write() has a callback because it has to know when you're done processing the chunk. If you don't execute the callback for a long time, the highWaterMark may be reached and data should stop flowing in. When you execute the callback, the internal buffer can start to drain again, allowing more writes to continue.
So _read() doesn't need a callback: it's advisory and you're free to ignore it, since it only tells you that the stream is able to buffer more data internally. The callback in _write(), by contrast, is critical because it controls backpressure and buffering. If you need to throttle your web API calls, look into what the async module has to offer, especially async.memoize and/or async.queue.
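If you only need to avoid overlapping calls, a minimal sketch of that kind of throttling inside _read() itself (assuming doSomethingAsync() is the asynchronous web-service call from the question) could look like this: ignore _read() while a call is in flight, and let the next _read() kick off the next call.
const { Readable } = require('stream');

class MyReadStream extends Readable {
  constructor(options) {
    super(options);
    this._inFlight = false; // true while a doSomethingAsync() call is pending
  }

  _read() {
    // _read() is only advisory, so it can simply be ignored
    // while a previous call is still outstanding
    if (this._inFlight) return;
    this._inFlight = true;

    doSomethingAsync('foo', (err, result) => {
      this._inFlight = false;
      if (err) return this.destroy(err);
      // push(null) ends the stream; otherwise node will call _read()
      // again when it is ready for more data
      this.push(result ? result : null);
    });
  }
}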
createReadStream (with Symbol.asyncIterator)
async function* readChunkIter(chunksAsync) {
  for await (const chunk of chunksAsync) {
    // magic
    yield chunk;
  }
}
const fileStream = fs.createReadStream(filePath, { highWaterMark: 1024 * 64 });
const readChunk = readChunkIter(fileStream);
readSync
function* readChunkIter(fd) {
  const chunkSize = 1024 * 64;
  const buffer = Buffer.alloc(chunkSize);
  let bytesRead;
  while ((bytesRead = fs.readSync(fd, buffer, 0, chunkSize, null)) > 0) {
    // magic
    yield buffer.subarray(0, bytesRead);
  }
}
const fd = fs.openSync(filePath, 'r');
const readChunk = readChunkIter(fd);
What's better to use with a generator function and why?
upd: I'm not looking for a better way, I want to know the difference between using these features
To start with, you're comparing a synchronous file operation, fs.readSync(), with an asynchronous one in the stream (which uses fs.read() internally), so that's a bit like apples and oranges for server use.
If this is on a server, then NEVER use synchronous file I/O except at server startup time because when processing requests or any other server events, synchronous file I/O blocks the entire event loop during the file read operation which drastically reduces your server scalability. Only use asynchronous file I/O, which between your two cases would be the stream.
Otherwise, if this is not on a server or any process that cares about blocking the node.js event loop during a synchronous file operation, then it's entirely up to you on which interface you prefer.
Other comments:
It's also unclear why you wrap for await() in a generator; the caller can just use for await() on the stream themselves (see the sketch below) and avoid the extra wrapper.
Streams for reading files are usually used in an event-driven manner by adding a listener for the data event and responding to data as it arrives. If you're just going to asynchronously read chunks of data from the file, there's really no benefit to a stream. You may as well just use fs.read() or the promise-based filehandle.read().
We can't really comment on the best/better way to solve a problem without seeing the overall problem you're trying to code for. You've just shown one little snippet of reading data. The best way to structure that depends upon how the higher level code can most conveniently use/consume the data (which you don't show).
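For illustration, a sketch of consuming the stream directly with for await, reusing filePath and the highWaterMark from the question (the per-chunk work is just a placeholder):
const fs = require('fs');

async function processFile(filePath) {
  const fileStream = fs.createReadStream(filePath, { highWaterMark: 1024 * 64 });
  for await (const chunk of fileStream) {
    // do the per-chunk "magic" here instead of yielding from a wrapper generator
    console.log('read %d bytes', chunk.length);
  }
}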
I really didn't ask the right question. I'm not looking for a better way, I want to know the difference between using these features.
Well, the main difference is that fs.readSync() is blocking and synchronous and thus blocks the event loop, ruining the scalability of a server and should never be used (except during startup code) in a server environment. Streams in node.js are asynchronous and do not block the event loop.
Other than that difference, streams are a higher-level construct than reading the file directly. They make sense when you're actually using stream features; if you're just reading chunks from the file directly and aren't using any of those features, there's probably no reason to use them.
In particular, error handling is not always so clear with streams, particularly when trying to use await and promises with streams. This is probably because readstreams were originally designed to be an event driven object and that means communicating errors indirectly on an error event which complicates the error handling on straight read operations. If you're not using the event driven nature of readstreams or some transform feature or some other major feature of streams, I wouldn't use them - I'd use the more traditional fs.promises.readFile() to just read data.
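If you do want chunked reads without a stream, a minimal sketch using the promise-based file handle API might look like this (the chunk size mirrors the question's highWaterMark, and the processing step is left as a placeholder):
const fs = require('fs');

async function readChunks(filePath, chunkSize = 1024 * 64) {
  const fileHandle = await fs.promises.open(filePath, 'r');
  const buffer = Buffer.alloc(chunkSize);
  try {
    while (true) {
      // filehandle.read() resolves with { bytesRead, buffer }
      const { bytesRead } = await fileHandle.read(buffer, 0, chunkSize, null);
      if (bytesRead === 0) break; // end of file
      // process buffer.subarray(0, bytesRead) here
    }
  } finally {
    await fileHandle.close();
  }
}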
I have a node.js program in which I use a stream to write information to a SFTP server. Something like this (simplified version):
var conn = new SSHClient();
process.nextTick(function () {
  conn.on('ready', function () {
    conn.sftp(function (error, sftp) {
      var writeStream = sftp.createWriteStream(filename);
      ...
      writeStream.write(line1);
      writeStream.write(line2);
      writeStream.write(line3);
      ...
    });
  }).connect(...);
});
Note I'm not using the (optional) callback argument (described in the write() API specification), and I'm not sure if this may cause undesired behaviour (i.e. lines not written in the following order: line1, line2, line3). In other words, I don't know if this alternative (more complex code, and possibly less efficient) should be used:
writeStream.write(line1, ..., function() {
  writeStream.write(line2, ..., function() {
    writeStream.write(line3);
  });
});
(or equivalent alternative using async series())
Empirically, in my tests I have always gotten the file written in the desired order (I mean, first line1, then line2 and finally line3). However, I don't know if this has happened just by chance or if the above is the right way to use write().
I understand that writing to a stream is in general asynchronous (as all I/O work should be), but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Examples of usage of write() in real programs are very welcome. Thanks!
Does write() (without callback) preserve order in node.js write streams?
Yes it does. It preserves order of your writes to that specific stream. All data you're writing goes through the stream buffer which serializes it.
but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Yes, all data does go through a stream buffer. The .write() operation does not return until the data has been successfully copied into the buffer unless an error occurs.
Note, that if you are writing any significant amount of data, you may have to pay attention to flow control (often called back pressure) on the stream. It can back up and may tell you that you need to wait before writing more, but it does buffer your writes in the order you send them.
If the .write() operation returns false, then the stream is telling you that you need to wait for the drain event before writing any more. You can read about this issue in the node.js docs for .write() and in this article about backpressure.
Your code also needs to listen for the error event to detect any errors upon writing the stream. Because the writes are asynchronous, they may occur at some later time and are not necessarily reflected in either the return value from .write() or in the err parameter to the .write() callback. You have to listen for the error event to make sure you see errors on the stream.
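As a hedged sketch of what that flow control and error handling might look like for the writes in the question (writeMany() is just an illustrative helper, and the end() call assumes no more data follows):
writeStream.on('error', function (err) {
  console.error('write stream error:', err);
});

function writeMany(stream, lines, index) {
  index = index || 0;
  while (index < lines.length) {
    // write() returns false once the internal buffer passes the highWaterMark
    if (!stream.write(lines[index++])) {
      // wait for 'drain' before writing the rest
      stream.once('drain', function () {
        writeMany(stream, lines, index);
      });
      return;
    }
  }
  stream.end(); // no more data to write
}

writeMany(writeStream, [line1, line2, line3]);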
What I Have:
I have a nodejs express server GET endpoint that in turn calls other APIs that are time consuming (say about 2 seconds). I call this function with a callback so that res.send is triggered as part of the callback. res.send sends an object that is built from the results of these time-consuming API calls, so it can only be sent when I have the entire information from the API calls.
Some representative code.
someFunctionCall(params, callback)
{
  // do some asynchronous requests.
  Promise.all([requestAsync1(params), requestAsync2(params)]).then
  {
    // do some operations
    callback(response) // callback given with some data from the promise
  }
}
app.get('/', function(req, res){
  someFunctionCall(params, function(err, data){
    res.send(JSON.stringify(data))
  })
})
What I want
I want my server to be able to handle other parallel incoming GET requests without being blocked by the REST API calls in the other function. But the problem is that the callback will only be issued when the promises are fulfilled; each of those operations is async, but my thread will wait until all of them have executed. And Node does not accept the next GET request without executing the res.send or res.end of the previous request. This becomes an issue when I have multiple requests coming in: each one is executed one after another.
Note: I do not want to go with the cluster method, I just want to know if it is possible to this without it.
You are apparently misunderstanding how node.js, asynchronous operations and promises work. Assuming your long running asynchronous operations are all properly written with asynchronous I/O, then neither your requestAsync1(params) or requestAsync2(params) calls are blocking. That means that while you are waiting for Promise.all() to call its .then() handler to signify that both of those asynchronous operations are complete, node.js is perfectly free to run any other events or incoming requests. Promises themselves do not block, so the node.js event system is free to process other events. So, you either don't have a blocking problem at all or if you actually do, it is not caused by what you asked about here.
To see if your code is actually blocking or not, you can temporarily add a simple timer that outputs to the console like this:
let startTime;
setInterval(function() {
  if (!startTime) {
    startTime = Date.now();
  }
  console.log((Date.now() - startTime) / 1000);
}, 100);
This will output a simple relative timestamp every 100ms when the event loop is not blocked. You would obviously not leave this in your code for production code, but it can be useful to show you when/if your event loop is blocked.
I do see an odd syntax issue in the code you included in your question. This code:
someFunctionCall(params, callback)
{
  // do some asynchronous requests.
  Promise.all([requestAsync1(params), requestAsync2(params)]).then
  {
    // do some operations
    callback(response) // callback given with some data from the promise
  }
}
should be expressed like this:
someFunctionCall(params, callback)
{
  // do some asynchronous requests.
  Promise.all([requestAsync1(params), requestAsync2(params)]).then(function(response) {
    // do some operations
    callback(response) // callback given with some data from the promise
  });
}
But an even better design would be to just return the promise rather than switching back to a plain callback. Besides giving the caller the more flexible promise interface, it avoids "eating" errors that may occur in either of your async operations. I'd suggest this:
function someFunctionCall(params) {
  // do some asynchronous requests.
  return Promise.all([requestAsync1(params), requestAsync2(params)]).then(function(results) {
    // further processing of results
    // return final resolved value of the promise
    return someValue;
  });
}
Then, the caller would use this like:
someFunctionCall(params).then(function(result) {
  // process final result here
}).catch(function(err) {
  // handle error here
});
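Applied to the route from the question, that might look something like this (returning a 500 on failure is just an assumption about how you want to report errors):
app.get('/', function(req, res) {
  someFunctionCall(params).then(function(result) {
    res.send(JSON.stringify(result));
  }).catch(function(err) {
    // errors from either async operation are no longer silently eaten
    res.status(500).send({ error: err.message });
  });
});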
The standard advice on determining whether you need to wait for the drain event on process.stdout is to check whether it returns false when you write to it.
How should I check this if I've piped another stream to it? It would seem that the piped stream can emit finish before all the output is actually written. Can I do something like the following?
upstreamOfStdout.on('finish', function(){
  if (!process.stdout.write('')) {
    process.stdout.on('drain', function() { done("I'm done"); });
  }
  else {
    done("I'm done");
  }
});

upstreamOfStdout.pipe(process.stdout);
I prefer an answer that doesn't depend on the internals of any streams. Just given that the streams conform to the node stream interface, what is the canonical way to do this?
EDIT:
The larger context is a wrapper:
new Promise(function(resolve, reject){
  stream.on(<some-event>, resolve);
  ... (perhaps something else here?)
});
where stream can be process.stdout or something else, which has another through stream piped into it.
My program exits whenever resolve is called -- I presume the Promise code keeps the program alive until all promises have been resolved.
I have encountered this situation several times, and have always used hacks to solve the problem (e.g. there are several private members in process.stdout that are useful). But I really would like to solve this once and for all (or learn that it is a bug, so I can track the issue and fix my hacks when it's resolved, at least): how do I tell when a stream downstream of another is finished processing its input?
Instead of writing directly to process.stdout, create a custom writable (shown below) which writes to stdout as a side effect.
const { Writable } = require('stream');

function writeStdoutAndFinish(){
  return new Writable({
    write(chunk, encoding, callback) {
      process.stdout.write(chunk, callback);
    },
  });
}
The result of writeStdoutAndFinish() will emit a finish event.
async function main(){
  ...
  await new Promise((resolve) => {
    someReadableStream.pipe(writeStdoutAndFinish()).on('finish', () => {
      console.log('finish received');
      resolve();
    });
  });
  ...
}
In practice, I don't think that the above approach differs in behavior from
async function main(){
  ...
  await new Promise((resolve) => {
    someReadableStream.on('end', () => {
      console.log('end received');
      resolve();
    }).pipe(process.stdout);
  });
  ...
}
First of all, as far as I can see from the documentation, that stream never emits the finish event, so it's unlikely you can rely on that.
Moreover, from the documentation mentioned above, the drain event seems to be used to notify the user when the stream is ready to accept more data after the .write method has returned false. In any case, when it fires you can deduce that all the previously buffered data has been written. From the documentation for the write method we can also see that honoring the false return value (i.e. "please stop pushing data") is not mandatory and you can freely ignore it, but subsequent data will probably be buffered in memory, letting its usage grow.
Because of that, basing my assumption on the documentation alone, I guess you can rely on the drain event to know when all the data has been handled or is about to be flushed out.
That said, it also looks to me that there is no clear way to know definitively when all the data has actually been sent to the console.
Finally, you can listen for the end event of the piped stream to know when it has been fully consumed, no matter whether it has been written to the console or the data is still buffered within the console stream.
Of course, you can also freely ignore the problem, since a fully consumed stream should be handled nicely by node.js and thus discarded, and you don't have to deal with it anymore once you have piped it into the second stream.
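For example, a sketch of the promise wrapper from the question that resolves on the end event of the piped stream (and rejects on upstream errors) rather than waiting for anything from process.stdout:
// inside an async function
await new Promise(function(resolve, reject){
  upstreamOfStdout.on('end', resolve);   // the piped stream has been fully consumed
  upstreamOfStdout.on('error', reject);  // surface upstream errors as a rejection
  upstreamOfStdout.pipe(process.stdout);
});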
The documentation for node suggests that the new best way to read streams is as follows:
var readable = getReadableStreamSomehow();
readable.on('readable', function() {
  var chunk;
  while (null !== (chunk = readable.read())) {
    console.log('got %d bytes of data', chunk.length);
  }
});
To me this seems to cause a blocking while loop. This would mean that if node is responding to an http request by reading and sending a file, the process would have to block while the chunk is read before it could be sent.
Isn't this blocking IO which node.js tries to avoid?
The important thing to note here is that it's not blocking in the sense that it's waiting for more input to arrive on the stream. It's simply retrieving the current contents of the stream's internal buffer. This kind of loop will finish pretty quickly since there is no waiting on I/O at all.
A stream can be either synchronous or asynchronous. If a readable stream synchronously pushes data into the internal buffer, then you'll get a synchronous stream. And yes, in that case, if it pushes lots of data synchronously, node's event loop won't be able to run until all the data is pushed.
Interestingly, even if you remove the while loop in the readable callback, the stream module internally runs a while loop once and keeps running until all the pushed data is read.
But asynchronous I/O operations (e.g. the http or fs module) push data into the buffer asynchronously. So the while loop only runs when data has been pushed into the buffer and stops as soon as you've read the entire buffer.
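As a small illustration of the synchronous case, here is a minimal sketch of a stream that pushes everything synchronously; the 'readable' handler is the same loop from the docs, and it never waits on I/O because the data is already in the internal buffer:
const { Readable } = require('stream');

const readable = new Readable({
  read() {
    // pushes everything synchronously; no I/O involved
    for (let i = 0; i < 3; i++) {
      this.push(`chunk ${i}\n`);
    }
    this.push(null); // signal end of stream
  }
});

readable.on('readable', function() {
  let chunk;
  // this loop just drains the internal buffer; it returns as soon as
  // read() yields null and does not block waiting for more data
  while (null !== (chunk = readable.read())) {
    console.log('got %d bytes of data', chunk.length);
  }
});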