Why use 'data' event over 'readable' event? - node.js

The stream documentation says this:
Note: In general, the readable.pipe() and 'data' event mechanisms are
preferred over the use of the 'readable' event.
Why is 'data' event preferred over 'readable' event? 'readable' event seems to be a better approach since it provides flow control through back-pressure and also gives more control to the application on when to handle the available data.

The basic difference lies in how the two events work:
The 'readable' event signals that data has been buffered and is available; you then call read() yourself to pull it out when you choose.
The 'data' event callback is invoked as soon as a chunk of data is available and forces you to handle it right then.
The 'readable' event is a good fit when you are handling large amounts of data and want control over when to consume it; for smaller chunks the 'data' event is simpler and efficient to use.
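For illustration, here is a minimal sketch of the two consumption styles; the file name example.txt is just an assumption:
const fs = require('fs');

// Flowing mode: chunks are pushed to the callback as they arrive.
fs.createReadStream('example.txt')
  .on('data', (chunk) => {
    console.log('data chunk of', chunk.length, 'bytes');
  })
  .on('end', () => console.log('done'));

// Paused mode: the 'readable' event signals readiness and the consumer pulls.
const readable = fs.createReadStream('example.txt');
readable.on('readable', () => {
  let chunk;
  // read() returns null once the internal buffer is drained.
  while ((chunk = readable.read()) !== null) {
    console.log('read', chunk.length, 'bytes on demand');
  }
});
readable.on('end', () => console.log('done reading'));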

Related

Node.js Streams: When will _writev Be Invoked?

The Node.js documentation makes the following comments about a Writable stream's _writev method.
The writable._writev() method may be implemented in addition or alternatively to writable._write() in stream implementations that are capable of processing multiple chunks of data at once. If implemented and if there is buffered data from previous writes, _writev() will be called instead of _write().
Emphasis mine. In what scenarios can a Node.js writable stream have buffered data from previous writes?
Is the _writev method only called after uncorking a corked stream that's had data written to it? Or are there other scenarios where a stream can have buffered data from previous writes? Bonus points if you can point to the place in the Node.js source code where it makes the decision to call _write or _writev.
_writev() will be called whenever there is more than one piece of data buffered in the stream and the function has been defined. Using cork() can cause more data to be buffered, but so can slow processing.
The code that guards _writev is in lib/internal/streams/writable.js: there is the decision about whether to buffer a write, and then the guard around the write itself.
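As a rough sketch (not taken from the Node.js source), a Writable that defines both write and writev shows the behaviour: corking and writing several chunks causes them to be delivered together through _writev().
const { Writable } = require('stream');

const sink = new Writable({
  write(chunk, encoding, callback) {
    console.log('_write:', chunk.toString());
    callback();
  },
  // Used instead of write() when more than one chunk is sitting in the buffer.
  writev(chunks, callback) {
    console.log('_writev with', chunks.length, 'chunks');
    callback();
  }
});

// cork() makes the writes below accumulate in the internal buffer...
sink.cork();
sink.write('one');
sink.write('two');
sink.write('three');
// ...so uncorking delivers them together through _writev().
process.nextTick(() => sink.uncork());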

Node JS Streams: Understanding data concatenation

One of the first things you learn when you look at node's http module is this pattern for concatenating all of the data events coming from the request read stream:
let body = [];
request.on('data', chunk => {
  body.push(chunk);
}).on('end', () => {
  body = Buffer.concat(body).toString();
});
However, if you look at a lot of streaming library implementations they seem to gloss over this entirely. Also, when I inspect the request.on('data',...) event it almost always emits only once for a typical JSON payload with a few to a dozen properties.
You can do things with the request stream like pipe it through some transforms in object mode and through to some other read streams. It looks like this concatenating pattern is never needed.
Is this because the request stream, when handling POST and PUT bodies, pretty much only ever emits one data event because the payload is way below the chunk partition size limit? In practice, how large would a JSON encoded object need to be to be streamed in more than one data chunk?
It seems to me that objectMode streams don't need to worry about concatenating because if you're dealing with an object it is almost always no larger than one emitted data chunk, which atomically transforms into one object? I could see there being an issue if a client were uploading something like a massive collection (which is when a stream would be very useful as long as it could parse the individual objects in the collection and emit them one by one or in batches).
I find this to be probably the most confusing aspect of really understanding the Node.js specifics of streams: there is a weird disconnect between streaming raw data and dealing with atomic chunks like objects. Do objectMode stream transforms have internal logic for automatically concatenating up to object boundaries? If someone could clarify this it would be very appreciated.
The job of the code you show is to collect all the data from the stream into one buffer so when the end event occurs, you then have all the data.
request.on('data',...) may emit only once or it may emit hundreds of times. It depends upon the size of the data, the configuration of the stream object and the type of stream behind it. You cannot ever reliably assume it will only emit once.
You can do things with the request stream like pipe it through some transforms in object mode and through to some other read streams. It looks like this concatenating pattern is never needed.
You only use this concatenating pattern when you are trying to get the entire data from this stream into a single variable. The whole point of piping to another stream is that you don't need to fetch the entire data from one stream before sending it to the next stream. .pipe() will just send data as it arrives to the next stream for you. Same for transforms.
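As a hedged sketch of that idea, an incoming request can be forwarded chunk by chunk without ever building up a body variable; the gzip transform and the upload.gz destination here are purely illustrative:
const http = require('http');
const zlib = require('zlib');
const fs = require('fs');

// Each incoming chunk is forwarded as soon as it arrives; nothing is
// accumulated in memory, so no concatenating pattern is needed.
http.createServer((request, response) => {
  request
    .pipe(zlib.createGzip())                  // transform chunks on the fly
    .pipe(fs.createWriteStream('upload.gz'))  // illustrative destination
    .on('finish', () => response.end('stored\n'));
}).listen(3000);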
Is this because the request stream, when handling POST and PUT bodies, pretty much only ever emits one data event because the payload is way below the chunk partition size limit?
It is likely because the payload is below some internal buffer size and the transport is sending all the data at once and you aren't running on a slow link and .... The point here is you cannot make assumptions about how many data events there will be. You must assume there can be more than one and that the first data event does not necessarily contain all the data or data separated on a nice boundary. Lots of things can cause the incoming data to get broken up differently.
Keep in mind that a readStream reads data until there's momentarily no more data to read (up to the size of the internal buffer) and then it emits a data event. It doesn't wait until the buffer fills before emitting a data event. So, since all data at the lower levels of the TCP stack is sent in packets, all it takes is a momentary delivery delay with some packet and the stream will find no more data available to read and will emit a data event. This can happen because of the way the data is sent, because of things that happen in the transport over which the data flows or even because of local TCP flow control if lots of stuff is going on with the TCP stack at the OS level.
In practice, how large would a JSON encoded object need to be to be streamed in more than one data chunk?
You really should not know or care because you HAVE to assume that any size object could be delivered in more than one data event. You can probably safely assume that a JSON object larger than the internal stream buffer size (which you could find out by studying the stream code or examining internals in the debugger) WILL be delivered in multiple data events, but you cannot assume the reverse because there are other variables such as transport-related things that can cause it to get split up into multiple events.
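One way to see this for yourself is to shrink the stream's internal buffer and count the events; this is just a sketch, with payload.json and the 64-byte highWaterMark chosen purely for illustration:
const fs = require('fs');

// Shrinking the internal buffer forces even a small file to arrive in
// several 'data' events, which is why the event count can't be relied on.
const stream = fs.createReadStream('payload.json', { highWaterMark: 64 });

let events = 0;
stream.on('data', () => { events += 1; });
stream.on('end', () => console.log('received', events, 'data events'));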
It seems to me that objectMode streams don't need to worry about concatenating because if you're dealing with an object it is almost always no larger than one emitted data chunk, which atomically transforms into one object? I could see there being an issue if a client were uploading something like a massive collection (which is when a stream would be very useful as long as it could parse the individual objects in the collection and emit them one by one or in batches).
Object mode streams must do their own internal buffering to find the boundaries of whatever objects they are parsing so that they can emit only whole objects. At some low level, they are concatenating data buffers and then examining them to see if they yet have a whole object.
Yes, you are correct that if you were using an object mode stream and the objects themselves were very large, they could consume a lot of memory. Likely this wouldn't be the most optimal way of dealing with that type of data.
Do objectMode stream transforms have internal logic for automatically concatenating up to object boundaries?
Yes, they do.
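As a rough illustration of that internal logic (not the actual implementation of any particular library), a newline-delimited JSON parser in object mode has to buffer raw data until it sees a boundary:
const { Transform } = require('stream');

// Buffers raw chunks and only pushes whole objects once a newline boundary
// has been seen, which is the same idea object mode parsers apply internally.
class NDJSONParse extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.buffered = '';
  }
  _transform(chunk, encoding, callback) {
    this.buffered += chunk.toString();
    const lines = this.buffered.split('\n');
    this.buffered = lines.pop(); // keep the trailing partial line for later
    for (const line of lines) {
      if (line.trim()) this.push(JSON.parse(line));
    }
    callback();
  }
  _flush(callback) {
    if (this.buffered.trim()) this.push(JSON.parse(this.buffered));
    callback();
  }
}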
FYI, the first thing I do when making http requests is to reach for the request-promise library so I don't have to do my own concatenating. It handles all this for you. It also provides a promise-based interface and about 100 other features that I find helpful.

Check if NodeJS ClearTextStream stream is "ended"?

I have a ClearTextStream for a TLS connection and I want to check if "end" was already called. The actual problem is that I'm trying to write something into the stream and I get a "write after end" error.
Now to avoid that, I just want to check if "end" was already called. I do have a "close" event, but it isn't fired in all cases.
I can't find it in the documentation and I couldn't find anything like that by googling.
I could check the error event (which is throwing "write after end" for me) and handle the situation there - but is there really no way to check this in the beginning?
Thanks!
If you get a write after end error, that means that you are trying to write data to a Writable stream that has been closed (i.e. one that can't accept any more input data). When a writable stream closes, the finish event is emitted (see the documentation). On the other hand, the close event is emitted by a Readable stream when the underlying resource is closed (for instance when the file descriptor you are reading from is closed).
As a ClearTextStream is a Duplex stream, it can emit both close and finish events, but they don't mean the same thing. In your particular case, you should listen to the finish event and react appropriately.
Another solution would be to check the this.ended and this.finished booleans (see the source code), but I wouldn't recommend that as they are private variables and only reflect the implementation details, not the public API.
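A minimal sketch of that advice, assuming a TLS socket named socket and a hypothetical safeWrite helper; note that newer Node versions also expose a public writableEnded property that can stand in for those private flags:
socket.on('finish', () => {
  console.log('end() has been called; no more writes are possible');
});

function safeWrite(data) {
  if (socket.writableEnded) {
    // end() was already called; writing now would raise "write after end".
    return false;
  }
  return socket.write(data);
}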

What is Streams3 in Node.js and how does it differ from Streams2?

I've often heard of Streams2 and old-streams, but what is Streams3? It gets mentioned in this talk by Thorsten Lorenz.
Where can I read about it, and what is the difference between Streams2 and Streams3?
Doing a search on Google, I also see it mentioned in the Changelog of Node 0.11.5,
stream: Simplify flowing, passive data listening (streams3) (isaacs)
I'm going to give this a shot, but I've probably got it wrong. Having never written Streams1 (old-streams) or Streams2, I'm probably not the right guy to self-answer this one, but here it goes. It seems as if there is a Streams1 API that still persists to some degree. In Streams2, there are two modes of streams: flowing (legacy) and non-flowing. In short, the shim that supported flowing mode is going away. This was the message that led to the patch now called Streams3,
Same API as streams2, but remove the confusing modality of flowing/old
mode switch.
Every time read() is called, and returns some data, a data event fires.
resume() will make it call read() repeatedly. Otherwise, no change.
pause() will make it stop calling read() repeatedly.
pipe(dest) and on('data', fn) will automatically call resume().
No switches into old-mode. There's only flowing, and paused. Streams start out paused.
Unfortunately, to understand any of that description, which defines Streams3 pretty well, you need to first understand Streams1 and the legacy streams behaviour.
Backstory
First, let's take a look at what the Node v0.10.25 docs say about the two modes,
Readable streams have two "modes": a flowing mode and a non-flowing mode. When in flowing mode, data is read from the underlying system and provided to your program as fast as possible. In non-flowing mode, you must explicitly call stream.read() to get chunks of data out. — Node v0.10.25 Docs
Isaac Z. Schlueter said in November slides I dug up:
streams2
"suck streams"
Instead of 'data' events spewing, call read() to pull data from source
Solves all problems (that we know of)
So it seems as if in Streams1, you'd create an object and call .on('data', cb) on that object. This would set the event to be triggered, and then you were at the mercy of the stream. In Streams2, streams internally have buffers and you request data from those streams explicitly (using .read()). Isaac goes on to specify how backwards compatibility works in Streams2 to keep Streams1 (old-stream) modules functioning:
old-mode streams1 shim
New streams can switch into old-mode, where they spew 'data'
If you add a 'data' event handler, or call pause() or resume(), then switch
Making minimal changes to existing tests to keep us honest
So in Streams2, adding a 'data' handler or a call to .pause() or .resume() triggers the shim. And it should, right? In Streams2 you have control over when to .read(), and you're not catching stuff being thrown at you. This triggered a legacy mode that acted independently of Streams2.
Let's take an example from Isaac's slide,
createServer(function(q,s) {
  // ADVISORY only!
  q.pause()
  session(q, function(ses) {
    q.on('data', handler)
    q.resume()
  })
})
In Streams1, q starts up right away reading and emitting (likely losing data), until the call to q.pause() advises q to stop pulling in data, but not to stop emitting events for what it has already read.
In Streams2, q starts off paused, and the call to .pause() (or attaching a 'data' handler) signals it to switch into emulating the old mode.
In Streams3, q starts off paused, having never read from the file handle, making the q.pause() a no-op; the call to q.on('data', cb) implicitly calls q.resume(), which keeps reading until there is no more data in the buffer, and then does the same again as more data becomes available.
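A small sketch of that Streams3 behaviour (the file name is just an example): the stream sits paused until a 'data' handler is attached, and pause()/resume() simply stop and restart the flow, with no legacy mode involved.
const fs = require('fs');

const stream = fs.createReadStream('some-file.txt');

// Nothing is read until a 'data' handler is attached; attaching one
// implicitly calls resume() and puts the stream into flowing mode.
stream.on('data', (chunk) => {
  console.log('got', chunk.length, 'bytes');
  stream.pause();                          // back to paused, no old-mode switch
  setTimeout(() => stream.resume(), 100);  // pick the flow up again later
});

stream.on('end', () => console.log('no more data'));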
Seems like Streams3 was introduced in io.js, then in Node 0.11+
Streams 1 supported data being pushed to a stream. There was no consumer control; data was thrown at the consumer whether it was ready or not.
Streams 2 allows data to be pushed to a stream as per Streams 1, or for a consumer to pull data from a stream as needed. The consumer can control the flow of data in pull mode (using stream.read() when notified of available data). The stream cannot support both push and pull at the same time.
Streams 3 allows pull and push data on the same stream.
Great overview here:
https://strongloop.com/strongblog/whats-new-io-js-beta-streams3/
A cached version (accessed 8/2020) is here: https://hackerfall.com/story/whats-new-in-iojs-10-beta-streams-3
I suggest you read the documentation, more specifically the section "API for Stream Consumers"; it's actually very understandable. Besides, I think the other answer is wrong: http://nodejs.org/api/stream.html#stream_readable_read_size

When does Node emit a data event?

I'm looking at implementing a node server which will be receiving uploads of potentially large files and forwarding the data on through another stream. I've found this article:
http://www.componentix.com/blog/13/file-uploads-using-nodejs-once-again
Which has some useful code examples around handling the various events as well as the pump problem with different speeds of the streams on both sides. What's still not clear to me (and what I can't seem to find documentation for) is when exactly the 'data' event is emitted for the incoming stream by node.
The node docs state:
Event: 'data'
Emitted when data is received. The argument data will be a Buffer or
String. Encoding of data is set by socket.setEncoding(). (See the
Readable Stream section for more information.)
What is meant by "when data is received"? Is this fired when the incoming data chunk reaches a certain size? When the incoming connection is closed? After a certain time?
The stream has an internal buffer that it uses to store the data until it's ready to fire the data event. What triggers the event can be a few different things depending on the type of stream: the internal buffer filling up, all data having been read, the connection closing, etc.
The network stream is probably firing the data event with whatever data was received from the socket's read method. If I can find it in the node source, I'll reference it.
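As a sketch (port 8124 is arbitrary), you can watch how a TCP socket's incoming data gets chunked; the sizes depend on buffering and the transport, not on message boundaries:
const net = require('net');

const server = net.createServer((socket) => {
  socket.on('data', (chunk) => {
    // How big each chunk is depends on buffering and the transport,
    // not on any message boundary the sender had in mind.
    console.log('data event with', chunk.length, 'bytes');
  });
  socket.on('end', () => console.log('connection closed by peer'));
});

server.listen(8124);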
