Does the new way to read streams in Node cause blocking?

The Node documentation suggests that the new, preferred way to read streams is as follows:
var readable = getReadableStreamSomehow();
readable.on('readable', function() {
  var chunk;
  while (null !== (chunk = readable.read())) {
    console.log('got %d bytes of data', chunk.length);
  }
});
To me this seems to cause a blocking while loop. This would mean that if node is responding to an http request by reading and sending a file, the process would have to block while the chunk is read before it could be sent.
Isn't this blocking IO which node.js tries to avoid?

The important thing to note here is that it's not blocking in the sense that it's waiting for more input to arrive on the stream. It's simply retrieving the current contents of the stream's internal buffer. This kind of loop will finish pretty quickly since there is no waiting on I/O at all.

A stream can be either synchronous or asynchronous. If a readable stream pushes its data into the internal buffer synchronously, you get a synchronous stream, and yes, in that case, if it pushes lots of data synchronously, Node's event loop won't be able to run until all the data has been pushed.
Interestingly, even if you remove the while loop from the 'readable' callback, the stream module internally runs a loop once and keeps going until all the pushed data has been read.
But asynchronous I/O sources (e.g. the http or fs modules) push data into the buffer asynchronously. So the while loop only runs once data has been pushed into the buffer, and it stops as soon as you've read the entire buffer.
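To make the synchronous case concrete, here is a minimal sketch (not from the original answer) of a Readable whose read() implementation pushes everything synchronously; the 'readable' handler still only drains what is already buffered:
const { Readable } = require('stream');

// A readable stream whose read() pushes all of its data synchronously.
// While these pushes run, the event loop cannot proceed; an asynchronous
// source (fs, http) would instead push chunks as they arrive from the OS.
const readable = new Readable({
  read() {
    for (let i = 0; i < 5; i++) {
      this.push(`chunk ${i}\n`);
    }
    this.push(null); // signal end of stream
  }
});

readable.on('readable', function() {
  let chunk;
  // Drains only the internal buffer; it never waits on I/O.
  while (null !== (chunk = readable.read())) {
    console.log('got %d bytes of data', chunk.length);
  }
});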


Can the counter value end up out of order?

I am trying to understand node.js streams. Since the write method on the writer object is asynchronous, can the counter value end up out of order?
const fs = require("fs");
const path = require("path");

const writer = fs.createWriteStream(path.resolve("modules", "streams", "hello"));
writer.on("finish", () => console.log("Finished Writing"));
for (let i = 0; i < 1000; i++) writer.write(`Hello:${i}`); // this is async, according to me
writer.end();
console.log("Hello");
Outputs:
Hello
Finished Writing
No, the counter won't be out of order. Streams insert items into the outgoing buffer as you call .write() and all your writes are called in order.
But a loop like this needs flow control on writer.write(), because if it returns false, you can't write any more until the 'drain' event on the stream indicates that there is room for more writing. See the writable.write() documentation for more info.
The writer.write() function is asynchronous, but because of buffering and the stream's events you don't necessarily have to register a completion callback for every write. writer.write() copies the data into the outbound buffer and returns immediately: true if the buffer still has room, or false if the buffer is full (the chunk is still accepted, but you should wait for the 'drain' event before writing more).
The actual write to the file stream is indeed asynchronous and occurs largely out of your view. Errors from the asynchronous part are communicated via the error event on the stream and, as you're already doing, completion of the stream is also communicated via an event.
The reason you get Finished Writing after Hello is because of the asynchronous writing behind the scenes. Your for loop sets off the writes, but they are not yet complete when the for loop is done. They finish some time later (when you see the finish event).
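As a rough sketch (not part of the original answer, but following the drain pattern from the stream documentation) of what that flow control could look like for this loop:
const fs = require("fs");
const path = require("path");

const writer = fs.createWriteStream(path.resolve("modules", "streams", "hello"));
writer.on("finish", () => console.log("Finished Writing"));

let i = 0;
function writeNext() {
  let ok = true;
  while (i < 1000 && ok) {
    // write() returns false once the internal buffer passes its
    // highWaterMark; stop and wait for 'drain' before continuing.
    ok = writer.write(`Hello:${i}`);
    i++;
  }
  if (i < 1000) {
    writer.once("drain", writeNext);
  } else {
    writer.end();
  }
}
writeNext();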

What is the advantage of using pipe function over res.write

The framework is Express.
When I send a request from within an endpoint and start receiving data, I can either read the data in chunks and write them immediately:
responseHandler.on('data', (chunk) => {
  res.write(chunk);
});
Or I can create a writable stream and pipe the response to that.
responseHandler.pipe(res)
It is obvious that pipe takes care of the former process and adds more dimensions to it. What are they?
The most important difference between managing event handlers and using readable.pipe(writable) is that using pipe:
The flow of data will be automatically managed so that the destination Writable stream is not overwhelmed by a faster Readable stream. (from the readable.pipe() documentation)
This means the readable stream may be faster than the writable one, and pipe handles that backpressure logic for you. If you are writing code like:
responseHandler.on('data', (chunk) => {
  res.write(chunk);
});
then the return value of res.write() matters. From the docs:
Returns: (boolean) false if the stream wishes for the calling code to wait for the 'drain' event to be emitted before continuing to write additional data; otherwise true. (from the writable.write() documentation)
This means the writable stream may not be ready to handle more data, so you have to manage it manually, as shown in the writable.write() example in the documentation.
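As a rough illustration (assuming responseHandler and res from the question), doing it manually looks roughly like what pipe does for you internally:
responseHandler.on('data', (chunk) => {
  // If the writable's buffer is full, stop reading until it drains.
  if (!res.write(chunk)) {
    responseHandler.pause();
    res.once('drain', () => responseHandler.resume());
  }
});
responseHandler.on('end', () => res.end());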
In some cases you do not have a readable stream at all and simply write to a writable stream using writable.write().
Example
const data = []; // array of some data.
data.forEach((d) => writable.write(d));
But again, you must check what writable.write() returns; if it is false, you have to adjust the stream flow manually.
Another way is to wrap your data in a readable stream and just pipe it.
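For example, a minimal sketch using Readable.from() (available in newer Node versions); the file name here is just a placeholder:
const fs = require('fs');
const { Readable } = require('stream');

const data = ['some', 'chunks', 'of', 'data'];           // array of some data
const writableStream = fs.createWriteStream('out.txt');  // hypothetical destination

// Readable.from() wraps the array in a readable stream; pipe() then
// handles backpressure and end-of-stream for you.
Readable.from(data).pipe(writableStream);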
By the way, there is one more great advantage of using pipes: you can chain them as needed, for instance:
readableStream
  .pipe(modify) // transform stream
  .pipe(zip)    // transform stream
  .pipe(writableStream);
To sum up: piggyback on Node.js's built-in stream handling whenever possible. In most cases it will help you avoid extra complexity, and it will not be slower than managing everything manually.

Does write() (without callback) preserve order in node.js write streams?

I have a node.js program in which I use a stream to write information to an SFTP server. Something like this (simplified version):
var SSHClient = require('ssh2').Client; // assuming the ssh2 module
var conn = new SSHClient();
process.nextTick(function () {
  conn.on('ready', function () {
    conn.sftp(function (error, sftp) {
      var writeStream = sftp.createWriteStream(filename);
      ...
      writeStream.write(line1);
      writeStream.write(line2);
      writeStream.write(line3);
      ...
    });
  }).connect(...);
});
Note I'm not using the (optional) callback argument (described in the write() API specification) and I'm not sure if this may cause undesired behaviour (i.e. lines not written in the order line1, line2, line3). In other words, I don't know if this alternative (more complex code, and possibly less efficient) should be used:
writeStream.write(line1, ..., function() {
  writeStream.write(line2, ..., function() {
    writeStream.write(line3);
  });
});
(or an equivalent alternative using async.series())
Empirically, in my tests I have always gotten the file written in the desired order (I mean, first line1, then line2 and finally line3). However, I don't know whether this has happened just by chance or whether the above is the right way of using write().
I understand that writing to a stream is in general asynchronous (as all I/O work should be), but I wonder if streams in node.js keep an internal buffer or similar that keeps the data ordered, so that each write() call doesn't return until the data has been put in this buffer.
Examples of usage of write() in real programs are very welcome. Thanks!
Does write() (without callback) preserve order in node.js write streams?
Yes it does. It preserves order of your writes to that specific stream. All data you're writing goes through the stream buffer which serializes it.
but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Yes, all data does go through a stream buffer. The .write() operation does not return until the data has been successfully copied into the buffer unless an error occurs.
Note, that if you are writing any significant amount of data, you may have to pay attention to flow control (often called back pressure) on the stream. It can back up and may tell you that you need to wait before writing more, but it does buffer your writes in the order you send them.
If the .write() operation returns false, then the stream is telling you that you need to wait for the drain event before writing any more. You can read about this issue in the node.js docs for .write() and in this article about backpressure.
Your code also needs to listen for the error event to detect any errors upon writing the stream. Because the writes are asynchronous, they may occur at some later time and are not necessarily reflected in either the return value from .write() or in the err parameter to the .write() callback. You have to listen for the error event to make sure you see errors on the stream.
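For illustration, a hedged sketch building on the question's writeStream and line variables:
writeStream.on('error', function (err) {
  // Errors from the asynchronous writes surface here.
  console.error('stream error:', err);
});

writeStream.write(line1);
writeStream.write(line2);
writeStream.write(line3); // buffered, and later written, in this exact order
writeStream.end();        // 'finish' fires once everything has been flushed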

Using callbacks with Socket IO

I'm using Node and Socket.IO to stream a Twitter feed to the browser, but the stream is too fast. In order to slow it down, I'm attempting to use setInterval, but it either only delays the start of the stream (without spacing the tweets evenly) or complains that I can't use callbacks when broadcasting. Server-side code below:
function start() {
  stream.on('tweet', function(tweet) {
    if (tweet.coordinates && tweet.coordinates != null) {
      io.sockets.emit('stream', tweet);
    }
  });
}

io.sockets.on("connection", function(socket) {
  console.log('connected');
  setInterval(start, 4000);
});
I think you're misunderstanding how .on() works for streams. It's an event handler. Once it is installed, it's there and the stream can call you at any time. Your interval is actually just making things worse because it's installing multiple .on() handlers.
It's unclear what you mean by "data coming too fast". Too fast for what? If it's just faster than you want to display it, then you can just store the tweets in an array and then use timers to decide when to display things from the array.
If data from a stream is coming too quickly to even store and this is a flowing nodejs stream, then you can pause the stream with the .pause() method and then, when you're able to go again, you can call .resume(). See http://nodejs.org/api/stream.html#stream_readable_pause for more info.
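A rough sketch of the "store and drip out" approach (assuming the stream and io objects from the question's code):
const queue = [];

stream.on('tweet', (tweet) => {
  if (tweet.coordinates) {
    queue.push(tweet); // just buffer it; don't emit immediately
  }
});

// Emit at most one tweet every 4 seconds, at an even pace.
setInterval(() => {
  if (queue.length > 0) {
    io.sockets.emit('stream', queue.shift());
  }
}, 4000);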

Trouble writing log data with Node.JS I/O

I am interfacing Node.JS with a library that provides iterator-style access to data:
next = log.get_next()
I effectively want to write the following:
while (next = log.get_next()) {
  console.log(next);
}
and redirect stdout to a file (e.g. node log.js > log.txt). This works well for small logs, but for large logs the output file is empty and my memory usage goes through the roof.
It appears I don't fully understand I/O in node, as a simple infinite loop that writes a string to the console also exhibits the same behavior.
Some advice on how to accomplish this task would be great. Thanks.
The WriteStream class buffers I/O, and if you never yield the thread, the queued writes never get serviced. The best approach is to write a reasonable chunk of data, then wait for the buffer to clear before writing again. The WriteStream class emits a 'drain' event that tells you when the buffer has been fully flushed. Here's an example:
var os = require('os');

process.stdout.on('drain', function() {
  dump();
});

function dump() {
  for (var i = 0; i < 10000; i++)
    console.log('xxxx');
  console.error(os.freemem());
}

dump();
If you run it like:
node testbuffer > output
you'll see that the file grows periodically and the memory reaches a steady state.
The library you're interfacing with ought to accept a callback. Node.js is designed to be non-blocking. I think that perhaps console.log keeps returning control to the loop (and log.get_next()) before it sends the output.
If the module were rewritten so that get_next supports a callback, the improved code might look like this:
var log_next = function(next) { // get_next() now delivers each value to the callback
  console.log(next);
  log.get_next(log_next);
};

log.get_next(log_next);
(There are libraries and patterns that could make this code prettier.)
If the code is only synchronous and has to stay as it is, calling setTimeout with 0 or another small number could keep it from blocking the entire process.
var log_next = function() {
  var next = log.get_next();
  if (next) { // stop once the log is exhausted
    console.log(next);
    setTimeout(log_next, 0);
  }
};

log_next();
