Node.js readable stream events: what are the differences?

I see many examples online using 'data', 'open', or 'readable'. All seem to accomplish the same goal of streaming or chunking the input data. Why the variation, what exactly are the differences between these events, and when should each be used for reading data?
Simple code examples:
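// Example 1: pipe the file to the response once the file has been opened
readStream.on('open', function () {
  // This just pipes the read stream to the response object (which goes to the client)
  readStream.pipe(res);
});
// Example 2: accumulate chunks as they arrive
let data = '';
readerStream.on('data', function (chunk) {
  data += chunk;
});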

From node.js documentation:
Event data:
The data event is emitted whenever the stream is relinquishing ownership of a chunk of data to a consumer. This may occur whenever the stream is switched in flowing mode by calling readable.pipe(), readable.resume(), or by attaching a listener callback to the data event. The 'data' event will also be emitted whenever the readable.read() method is called and a chunk of data is available to be returned.
Attaching a data event listener to a stream that has not been explicitly paused will switch the stream into flowing mode. Data will then be passed as soon as it is available.
Find more about: https://nodejs.org/api/stream.html#class-streamreadable
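To make the difference concrete, here is a minimal sketch of the two ways of consuming the same data. The helper names collectWithData and collectWithReadable are made up for illustration, and any Readable can be passed in. 'data' puts the stream in flowing mode and pushes chunks to you, while 'readable' only signals that something is buffered and leaves it to you to pull it out with read(). 'open' is a different kind of event: it is specific to fs.ReadStream, only signals that the file descriptor has been opened, and carries no data, which is why the first example in the question just calls pipe() inside it.

```javascript
const { Readable } = require('node:stream');

// Flowing mode: attaching a 'data' listener makes the stream push chunks to us.
function collectWithData(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('end', () => resolve(chunks.join('')));
    stream.on('error', reject);
  });
}

// Paused mode: 'readable' says "something is buffered"; we pull it with read().
function collectWithReadable(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('readable', () => {
      let chunk;
      // read() returns null once the internal buffer is empty.
      while ((chunk = stream.read()) !== null) chunks.push(chunk);
    });
    stream.on('end', () => resolve(chunks.join('')));
    stream.on('error', reject);
  });
}

// Usage with an in-memory source standing in for a file or socket:
collectWithData(Readable.from(['a', 'b', 'c'])).then((text) => console.log(text));
collectWithReadable(Readable.from(['a', 'b', 'c'])).then((text) => console.log(text));
```

Both helpers end up with the same text; the choice is mostly about whether you want the stream to drive you ('data', or simply pipe()) or want to decide yourself when to pull ('readable').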

Related

What is the advantage of using pipe function over res.write

The framework is Express.
When I'm sending a request from within an end point and start receiving data, either I can read data in chunks and write them instantly:
responseHandler.on('data', (chunk) => {
  res.write(chunk);
});
Or I can create a writable stream and pipe the response to that.
responseHandler.pipe(res)
Clearly, pipe takes care of the first approach and does more besides. What else does it handle?
The most important difference between managing event handlers yourself and using readable.pipe(writable) is, in the words of the documentation for pipe:
The flow of data will be automatically managed so that the destination Writable stream is not overwhelmed by a faster Readable stream.
This means the readable stream may produce data faster than the writable stream can consume it, and pipe handles that case for you. Compare that with code like:
responseHandler.on('data', (chunk) => {
  res.write(chunk);
});
According to the documentation, the res.write() function:
Returns: (boolean) false if the stream wishes for the calling code to wait for the 'drain' event to be emitted before continuing to write additional data; otherwise true.
This means the writable stream may not be ready to accept more data, so you have to manage this manually, as shown in the writable.write() example in the documentation.
In some cases you do not have a readable stream at all and write to the writable stream directly using writable.write().
Example
const data = []; // array of some data.
data.forEach((d) => writable.write(d));
But again, you must check what writable.write() returns. If it returns false, you must adjust the stream flow manually by waiting for the 'drain' event before writing more.
Another way is to wrap your data in a readable stream and just pipe it.
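As a rough sketch of what "acting manually" can look like (writeAll is a hypothetical helper name, not part of Node), the loop below stops whenever write() returns false and resumes on 'drain':

```javascript
const { Writable } = require('node:stream');
const { once } = require('node:events');

// Write every item, pausing whenever write() returns false until 'drain' fires.
async function writeAll(writable, items) {
  for (const item of items) {
    if (!writable.write(item)) {
      // The internal buffer is full; wait before queueing more.
      await once(writable, 'drain');
    }
  }
  writable.end();
  // Resolves once everything has been flushed; rejects if 'error' is emitted.
  await once(writable, 'finish');
}

// Alternative mentioned above: Readable.from(items).pipe(writable)
// lets pipe() do the same waiting for you.
```

Note that write() still accepts the chunk when it returns false; the return value is only advice to stop adding more until the buffer has drained.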
By the way, pipes have one more great advantage: you can chain them as needed, for instance:
readableStream
  .pipe(modify) // transform stream
  .pipe(zip) // transform stream
  .pipe(writableStream);
To sum up: rely on the stream handling that Node.js already provides whenever possible. In most cases it helps you avoid extra complexity, and it will not be slower than managing the flow manually.
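Here is one possible concrete version of such a chain, assuming an upper-casing transform as the "modify" step and gzip as the "zip" step (both chosen only for illustration). Instead of chained .pipe() calls it uses pipeline() from node:stream/promises (Node 15 or later), which connects the streams the same way but also forwards errors from any stage and cleans up the others.

```javascript
const { Readable, Writable, Transform } = require('node:stream');
const { pipeline } = require('node:stream/promises');
const zlib = require('node:zlib');

// Hypothetical "modify" step: upper-cases each chunk.
function createUpperCase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase());
    },
  });
}

// Runs source -> modify -> gzip -> sink and resolves with the compressed bytes.
async function compressUpperCased(lines) {
  const compressed = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      compressed.push(chunk);
      callback();
    },
  });
  await pipeline(Readable.from(lines), createUpperCase(), zlib.createGzip(), sink);
  return Buffer.concat(compressed);
}
```

Backpressure is handled between every pair of stages, so a slow sink slows down the source without any extra code.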

Does write() (without callback) preserve order in node.js write streams?

I have a node.js program in which I use a stream to write information to an SFTP server. Something like this (simplified version):
var conn = new SSHClient();
process.nextTick(function () {
  conn.on('ready', function () {
    conn.sftp(function (error, sftp) {
      var writeStream = sftp.createWriteStream(filename);
      ...
      writeStream.write(line1);
      writeStream.write(line2);
      writeStream.write(line3);
      ...
    });
  }).connect(...);
});
Note that I'm not using the (optional) callback argument (described in the write() API specification), and I'm not sure whether this may cause undesired behaviour (i.e. lines not written in the order line1, line2, line3). In other words, I don't know whether this alternative (more complex code, and possibly less efficient) should be used instead:
writeStream.write(line1, ..., function () {
  writeStream.write(line2, ..., function () {
    writeStream.write(line3);
  });
});
(or an equivalent alternative using async series())
Empirically, in my tests the file has always been written in the desired order (that is, first line1, then line2 and finally line3). However, I don't know whether this has happened just by chance or whether the above is the right way of using write().
I understand that writing to a stream is in general asynchronous (as all I/O work should be), but I wonder whether streams in node.js keep an internal buffer or similar that keeps data ordered, so that each write() call doesn't return until the data has been put in this buffer.
Examples of usage of write() in real programs are very welcome. Thanks!
Does write() (without callback) preserve order in node.js write streams?
Yes it does. It preserves order of your writes to that specific stream. All data you're writing goes through the stream buffer which serializes it.
but I wonder if streams in node.js keep an internal buffer or similar that keeps data ordered, so each write() call doesn't return until the data has been put in this buffer.
Yes, all data does go through a stream buffer. The .write() operation does not return until the data has been successfully copied into the buffer unless an error occurs.
Note that if you are writing any significant amount of data, you may have to pay attention to flow control (often called back pressure) on the stream. The stream can back up and may tell you to wait before writing more, but it does buffer your writes in the order you send them.
If the .write() operation returns false, then the stream is telling you that you need to wait for the drain event before writing any more. You can read about this issue in the node.js docs for .write() and in this article about backpressure.
Your code also needs to listen for the error event to detect any errors upon writing the stream. Because the writes are asynchronous, they may occur at some later time and are not necessarily reflected in either the return value from .write() or in the err parameter to the .write() callback. You have to listen for the error event to make sure you see errors on the stream.
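A small sketch of this pattern, with a hypothetical helper writeLines: it issues the writes back to back without callbacks and listens for 'error' and 'finish'. It assumes the stream follows the standard Writable contract, which I'd expect of the SFTP write stream in the question but have not verified.

```javascript
const { Writable } = require('node:stream');

// Writes lines back to back without callbacks; resolves when all are flushed.
function writeLines(writeStream, lines) {
  return new Promise((resolve, reject) => {
    writeStream.on('error', reject);   // asynchronous write failures surface here
    writeStream.on('finish', resolve); // emitted after end() once everything is flushed
    for (const line of lines) {
      writeStream.write(line); // queued in call order, even if the sink is slow
    }
    writeStream.end();
  });
}

// Demo sink that takes 5 ms per chunk, to show that order is still preserved.
const seen = [];
const slowSink = new Writable({
  write(chunk, encoding, callback) {
    setTimeout(() => {
      seen.push(chunk.toString());
      callback();
    }, 5);
  },
});

writeLines(slowSink, ['line1\n', 'line2\n', 'line3\n']).then(() => console.log(seen));
```

This version ignores the return value of write(), which is fine for a handful of lines; for large volumes, combine it with waiting for 'drain'.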

Does the new way to read streams in Node cause blocking?

The Node documentation suggests that the new best way to read streams is as follows:
var readable = getReadableStreamSomehow();
readable.on('readable', function () {
  var chunk;
  while (null !== (chunk = readable.read())) {
    console.log('got %d bytes of data', chunk.length);
  }
});
To me this seems to cause a blocking while loop. This would mean that if node is responding to an http request by reading and sending a file, the process would have to block while the chunk is read before it could be sent.
Isn't this blocking IO which node.js tries to avoid?
The important thing to note here is that it's not blocking in the sense that it's waiting for more input to arrive on the stream. It's simply retrieving the current contents of the stream's internal buffer. This kind of loop will finish pretty quickly since there is no waiting on I/O at all.
A stream can be either synchronous or asynchronous. If a readable stream pushes data into its internal buffer synchronously, you get a synchronous stream. And yes, in that case, if it pushes a lot of data synchronously, Node's event loop won't be able to run until all the data has been pushed.
Interestingly, even if you remove the while loop from the 'readable' callback, the stream module internally runs a while loop once and keeps running until all the pushed data is read.
But asynchronous I/O sources (e.g. the http or fs modules) push data into the buffer asynchronously. So the while loop only runs when data has been pushed into the buffer, and it stops as soon as you've read the entire buffer.
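A tiny sketch of why the loop cannot stall waiting for I/O: read() only hands back what is already buffered and returns null immediately when nothing is there. The source below is a hand-made Readable with an empty read() implementation, used purely for illustration.

```javascript
const { Readable } = require('node:stream');

// A source whose data we push by hand; read() {} means "nothing to fetch on demand".
const source = new Readable({ read() {} });

// Nothing buffered yet: read() returns null right away instead of waiting.
const before = source.read();

// Simulate data arriving from the underlying resource.
source.push('hello');

// Now the chunk is in the internal buffer, so read() returns it synchronously.
const after = source.read();

console.log(before, after.toString());
```

So the while loop's running time depends only on how much is already sitting in the buffer, not on how long the next chunk takes to arrive.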

Using callbacks with Socket IO

I'm using node and socket.io to stream a Twitter feed to the browser, but the stream is too fast. To slow it down, I'm attempting to use setInterval, but it either only delays the start of the stream (without spacing the tweets evenly) or says that I can't use callbacks when broadcasting. Server-side code below:
function start() {
  stream.on('tweet', function (tweet) {
    if (tweet.coordinates && tweet.coordinates != null) {
      io.sockets.emit('stream', tweet);
    }
  });
}
io.sockets.on("connection", function (socket) {
  console.log('connected');
  setInterval(start, 4000);
});
I think you're misunderstanding how .on() works for streams. It's an event handler. Once it is installed, it's there and the stream can call you at any time. Your interval is actually just making things worse because it's installing multiple .on() handlers.
It's unclear what you mean by "data coming too fast". Too fast for what? If it's just faster than you want to display it, then you can just store the tweets in an array and then use timers to decide when to display things from the array.
If data from a stream is coming too quickly to even store and this is a flowing nodejs stream, then you can pause the stream with the .pause() method and then, when you're able to go again, you can call .resume(). See http://nodejs.org/api/stream.html#stream_readable_pause for more info.
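One way to sketch the "store them in an array and use a timer" idea: install a single handler that only queues items, and let one timer release them at a steady rate. createThrottledRelay is a made-up helper name, and the commented usage lines assume the stream and io objects from the question.

```javascript
// Buffers incoming items and releases at most one per timer tick.
function createThrottledRelay(emit, intervalMs) {
  const queue = [];
  const timer = setInterval(() => {
    if (queue.length > 0) emit(queue.shift());
  }, intervalMs);
  return {
    push: (item) => queue.push(item), // call this from the single event handler
    pending: () => queue.length,
    stop: () => clearInterval(timer),
  };
}

// Usage sketch (install the 'tweet' handler once, not once per connection):
// const relay = createThrottledRelay((tweet) => io.sockets.emit('stream', tweet), 4000);
// stream.on('tweet', (tweet) => { if (tweet.coordinates) relay.push(tweet); });
```

The queue here is unbounded, so if tweets keep arriving faster than they are released you would also want to cap its length or drop old entries.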

net module in node.js

I'm trying to make a server based on the net module. What I don't understand is in which event handler I'm supposed to put the response code:
on('data', function () {}) could still be in the middle of receiving more data from the stream (so it might be too early to reply),
and on('end', function () {}) is after the connection is closed.
Thank you for your help.
The socket's 'data' event calls the callback function every time an incoming buffer of data is ready for reading, and the event passes that buffer to the callback.
So use this:
socket.on('data', function (data) {
  // Here is the place to detect the real data in the stream
});
This can help for node v0.6.5: http://nodejs.org/docs/v0.6.5/api/net.html#event_data_
And this gives a clearer understanding of Readable streams:
http://nodejs.org/docs/v0.6.5/api/streams.html#readable_Stream
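Since a 'data' chunk can contain part of a message or several messages, "detecting the real data" usually means choosing a framing rule and buffering until a full message has arrived. The sketch below assumes newline-terminated messages, which is my assumption and not something stated in the question; createLineSplitter and the echo reply are illustrative only.

```javascript
const net = require('node:net');

// Returns a 'data' handler that buffers partial input and calls onLine per complete line.
function createLineSplitter(onLine) {
  let pending = '';
  return (chunk) => {
    pending += chunk;
    let index;
    while ((index = pending.indexOf('\n')) !== -1) {
      onLine(pending.slice(0, index));
      pending = pending.slice(index + 1);
    }
  };
}

// Echo server sketch: reply once per complete line, not once per 'data' event.
const server = net.createServer((socket) => {
  socket.setEncoding('utf8'); // deliver strings so multi-byte characters are not split
  socket.on('data', createLineSplitter((line) => socket.write(`echo: ${line}\n`)));
});
// server.listen(4000); // port chosen arbitrarily for illustration
```

With this approach the reply goes out from inside the 'data' handler as soon as a full line is available, without waiting for 'end'.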
