What are the roles of _read and read in Node JS streams? - node.js

I'm really just looking for clarification on how these work. IMO the documentation on streams is somewhat lacking, and there actually aren't a lot of resources out there that comprehensively explain how they're meant to work and be extended.
My question can be broken down into two parts.
One, What is the role of the _read function within the stream module? When I run this code it endlessly prints out "hello world" until null is pushed onto the stream buffer. This seems to indicate that _read is called in some kind of loop that waits for a null in the buffer, but I can't find documentation anywhere that states this in explicit terms.
var Readable = require('stream').Readable
var rs = Readable()
rs._read = function () {
  rs.push("hello world")
  rs.push(null)
};
rs.on("data", function (data) {
  console.log("some data", data)
})
Two, what does read actually do? My understanding is that read consumes data from the read stream buffer, and fires the data event. Is that all that's going on here?

read() is something that a consumer of the readable stream calls when it specifically wants to pull some bytes from the stream (when the stream is not flowing).
_read() is an internal method that is part of the read stream's implementation. The stream internals call this method (it is NOT to be called from the outside) when the stream is flowing and wants to get more data from the source. When called, _read() pushes data with .push(data), or, if there is no more data, signals the end of the stream with .push(null).
You can see an explanation and example here in this article.
_read(size) {
  if (this.data.length) {
    const chunk = this.data.slice(0, size);
    this.data = this.data.slice(size, this.data.length);
    this.push(chunk);
  } else {
    this.push(null); // 'end', no more data
  }
}
If you were implementing a read stream to some custom source of data, then you would implement the _read() method to fetch up to the size amount of data from your source and .push() that data into the stream.
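For a fuller picture, here is a minimal, self-contained sketch of that idea: a custom Readable over an in-memory string (the StringSource class name and the in-memory data source are made up for illustration):
const { Readable } = require('stream');

// Hypothetical source: an in-memory string handed out in chunks on demand.
class StringSource extends Readable {
  constructor(data, options) {
    super(options);
    this.data = data;
  }

  // Called by the stream internals whenever they want more data.
  _read(size) {
    if (this.data.length) {
      const chunk = this.data.slice(0, size);
      this.data = this.data.slice(size);
      this.push(chunk);  // hand a chunk to the internal buffer
    } else {
      this.push(null);   // no more data: the consumer will get 'end'
    }
  }
}

new StringSource('hello world').on('data', (chunk) => {
  console.log('chunk:', chunk.toString());
});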

Related

Why is a while loop needed for reading a non-flowing mode stream in Node.js?

In the node.js documentation, I came across the following code
const readable = getReadableStreamSomehow();
// 'readable' may be triggered multiple times as data is buffered in
readable.on('readable', () => {
  let chunk;
  console.log('Stream is readable (new data received in buffer)');
  // Use a loop to make sure we read all currently available data
  while (null !== (chunk = readable.read())) {
    console.log(`Read ${chunk.length} bytes of data...`);
  }
});
// 'end' will be triggered once when there is no more data available
readable.on('end', () => {
  console.log('Reached end of stream.');
});
Here is the comment from the node.js documentation concerning the usage of the while loop, saying it's needed to make sure all data is read
// Use a loop to make sure we read all currently available data
while (null !== (chunk = readable.read())) {
I couldn't understand why it is needed and tried replacing the while with just an if statement, but then the process terminated after the very first read. Why?
From the node.js documentation
The readable.read() method should only be called on Readable streams operating in paused mode. In flowing mode, readable.read() is called automatically until the internal buffer is fully drained.
Be careful: this method is only meant for a stream that has been paused.
And going further, if you understand what a stream is, you'll understand that the data arrives in chunks that have to be processed one at a time.
Each call to readable.read() returns a single chunk of data, or null once the internal buffer is empty. The chunks are not concatenated for you, so a while loop is necessary to consume all the data currently in the buffer.
So I hope you can see that if you don't loop over your readable stream and only execute a single read, you won't get your full data.
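To make that concrete, here is a minimal sketch that drains everything currently buffered on each 'readable' event (getReadableStreamSomehow() is the same placeholder used in the documentation snippet above):
const readable = getReadableStreamSomehow();
const chunks = [];

readable.on('readable', () => {
  let chunk;
  // read() returns one buffered chunk at a time, or null when the internal
  // buffer is empty, so a single if would leave buffered data behind.
  while ((chunk = readable.read()) !== null) {
    chunks.push(chunk);
  }
});

readable.on('end', () => {
  console.log('total bytes:', Buffer.concat(chunks).length);
});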
Ref: https://nodejs.org/api/stream.html

Why does my Node.js stream return something when invoking its read function?

I'm having trouble understanding why the following works, that is, why the invocations of the read() function actually return the objects stored in the readable stream.
const { Readable } = require('stream')
var r = new Readable({objectMode: true, read: () => {}}) // dummy read
var a = [1, 2, 3, 4, 5, 6, 7]
a.forEach(x => r.push(x)) // push() accepts one chunk at a time
Now, when I invoke r.read() I get the numbers I pushed into my readable stream r
r.read() // -> 1
r.read() // -> 2
// etc
But I provided a "dummy" read function (read: () => {}) above when creating my readable stream. So, why do I get values back, when calling read?
Help will be much appreciated.
The answer is simple. You're calling the push method which should be called by your read implementation.
The purpose of push is to say: here's what I've read from the source, but it doesn't have to be called from within the internal methods.
In other words, the usual process is:
1. wait for _read to be called
2. _read something from the source
3. push the chunks that were read into the stream
4. return the chunks from read
You simply skipped the first two steps and pushed the data in from the outside.
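For comparison, a small sketch of the usual flow, where the push happens inside the read implementation (the source array here is just an illustration):
const { Readable } = require('stream');

const source = [1, 2, 3, 4, 5, 6, 7]; // stand-in for a real data source

const r = new Readable({
  objectMode: true,
  read() {
    // steps 2 and 3: "read" from the source, then push the chunk
    // (or null once the source is exhausted)
    this.push(source.length ? source.shift() : null);
  }
});

console.log(r.read()); // 1
console.log(r.read()); // 2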

What's the node.js paradigm for socket stream conversation?

I'm trying to implement a socket protocol and it is unclear to me how to proceed. I have the socket as a Stream object, and I am able to write() data to it to send on the socket, and I know that the "readable" or "data" events can be used to receive data. But this does not work well when the protocol involves a conversation in which one host is supposed to send a piece of data, wait for a response, and then send data again after the response.
In a block paradigm it would look like this:
send some data
wait for specific data reply
massage data and send it back
send additional data
As far as I can tell, node's Stream object does not have a read function that will asynchronously return with the number of bytes requested. Otherwise, each wait could just put the remaining functionality in its own callback.
What is the node.js paradigm for this type of communication?
Technically there is a Readable.read(), but it's not really recommended for this (you can't be sure how much data each call will give you). You can keep track of state and, on each data event, append to a Buffer that you keep processing incrementally. You can use readUInt32LE etc. on the Buffer to read specific pieces of binary data if you need to (or convert it to a string if it's textual data). https://github.com/runvnc/metastream/blob/master/index.js
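As a rough sketch of that incremental approach, assuming a made-up framing where each message is prefixed with a 4-byte little-endian length (socket and handleFrame are placeholders):
let pending = Buffer.alloc(0);

socket.on('data', (chunk) => {
  pending = Buffer.concat([pending, chunk]);

  // Keep peeling off frames while a complete one is buffered.
  while (pending.length >= 4) {
    const frameLength = pending.readUInt32LE(0);
    if (pending.length < 4 + frameLength) break; // wait for more data

    const frame = pending.slice(4, 4 + frameLength);
    pending = pending.slice(4 + frameLength);
    handleFrame(frame); // your protocol logic for one complete message
  }
});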
If you want to write it in your 'block paradigm', you could basically turn the waiting into a promise and use an async function, like this:
let specialReplyRes = null;
const waitForSpecialReply = () => new Promise(res => { specialReplyRes = res; });

stream.on('data', (buff) => {
  if (buff.toString().indexOf('special') >= 0) specialReplyRes(buff.toString());
});

// ...

async function proto() {
  stream.write(data);
  const reply = await waitForSpecialReply();
  const message = massage(reply);
  stream.write(message);
}
Here the resolver for the waitForSpecialReply promise is stored and called once a matching message is received by your parsing code.

Basic streams issue: Difficulty sending a string to stdout

I'm just starting learning about streams in node. I have a string in memory and I want to put it in a stream that applies a transformation and pipe it through to process.stdout. Here is my attempt to do it:
var through = require('through');

var stream = through(function write(data) {
  this.push(data.toUpperCase());
});

stream.push('asdf');
stream.pipe(process.stdout);
stream.end();
It does not work. When I run the script on the cli via node, nothing is sent to stdout and no errors are thrown. A few questions I have:
If you have a value in memory that you want to put into a stream, what is the best way to do it?
What is the difference between push and queue?
Does it matter if I call end() before or after calling pipe()?
Is end() equivalent to push(null)?
Thanks!
Just use the vanilla stream API
var Transform = require("stream").Transform;

// create a new Transform stream
var stream = new Transform({
  decodeStrings: false,
  encoding: "ascii"
});

// implement the _transform method
stream._transform = function _transform(str, enc, done) {
  this.push(str.toUpperCase() + "\n");
  done();
};

// connect to stdout
stream.pipe(process.stdout);

// write some stuff to the stream
stream.write("hello!");
stream.write("world!");

// output
// HELLO!
// WORLD!
Or you can build your own stream constructor. This is really the way the stream API is intended to be used:
var Transform = require("stream").Transform;

function MyStream() {
  // call Transform constructor with `this` context
  // {decodeStrings: false} keeps data as `string` type instead of `Buffer`
  // {encoding: "ascii"} sets the encoding for our strings
  Transform.call(this, {decodeStrings: false, encoding: "ascii"});

  // our function to do "work"
  function _transform(str, encoding, done) {
    this.push(str.toUpperCase() + "\n");
    done();
  }

  // export our function
  this._transform = _transform;
}

// extend the Transform.prototype to your constructor
MyStream.prototype = Object.create(Transform.prototype, {
  constructor: {
    value: MyStream
  }
});
Now use it like this
// instantiate
var a = new MyStream();
// pipe to a destination
a.pipe(process.stdout);
// write data
a.write("hello!");
a.write("world!");
Output
HELLO!
WORLD!
Some other notes about .push vs .write.
.write(str) adds data to the writable buffer. It is meant to be called externally. If you think of a stream like a duplex file handle, it's just like fwrite, only buffered.
.push(str) adds data to the readable buffer. It is only intended to be called from within our stream.
.push(str) can be called many times. Watch what happens if we change our function to
function _transform(str, encoding, done) {
  this.push(str.toUpperCase());
  this.push(str.toUpperCase());
  this.push(str.toUpperCase() + "\n");
  done();
}
Output
HELLO!HELLO!HELLO!
WORLD!WORLD!WORLD!
First, you want to use write(), not push(). write() puts data into the stream, push() pushes data out of the stream; you only use push() when implementing your own Readable, Duplex, or Transform streams.
Second, you'll only want to write() data to the stream after you've set up the pipe() (or added some event listeners). If you write to a stream with nothing wired to the other end, the data you've written will be lost. As #naomik pointed out, this isn't true in general, since a Writable stream will buffer write()s. In your example you do need to write() after pipe(), though; otherwise the process will end before anything is written to STDOUT. This is possibly due to how the through module is implemented, but I don't know that for sure.
So, with that in mind, you can make a couple simple changes to your example to get it to work:
var through = require('through');

var stream = through(function write(data) {
  this.push(data.toUpperCase());
});

stream.pipe(process.stdout);
stream.write('asdf');
stream.end();
Now, for your questions:
The easiest way to get data from memory into a writable stream is simply to write() it, just like we're doing with stream.write('asdf') in your example.
As far as I know, the stream doesn't have a queue() function; did you mean write()? Like I said above, write() is used to put data into a stream, and push() is used to push data out of the stream. Only call push() in your own stream implementations.
Only call end() after all your data has been written to your stream. end() basically says: "Ok, I'm done now. Please finish what you're doing and close the stream."
push(null) is pretty much equivalent to end(). That being said, don't call push(null) unless you're doing it inside your own stream implementation (as stated above). It's almost always more appropriate to call end().
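To illustrate the distinction, a small sketch: push(null) belongs inside your own Readable implementation, while end() is what a consumer calls on a stream it writes to:
const { Readable, Writable } = require('stream');

// Inside your own Readable implementation, push(null) marks end-of-data:
const r = new Readable({
  read() {
    this.push('last chunk');
    this.push(null); // no more data; the consumer will get an 'end' event
  }
});
r.on('data', (chunk) => console.log('read', chunk.toString()));

// As a consumer of a Writable, call end() when you are done writing:
const w = new Writable({
  write(chunk, enc, done) {
    console.log('wrote', chunk.toString());
    done();
  }
});
w.write('asdf');
w.end(); // flushes remaining writes and emits 'finish'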
Based on the examples for stream (http://nodejs.org/api/stream.html#stream_readable_pipe_destination_options)
and through (https://www.npmjs.org/package/through)
it doesn't look like you are using your stream correctly... What happens if you use write(...) instead of push(...)?

How to implement setEncoding in my own Node.JS readable Stream?

In my program, I have a readable Stream source which emits data events from a shared stream of byte data. I then interpret the data to extract Buffer objects, and then pass them to the appropriate function. The most elegant way of passing the Buffers to the rest of my program seems to be using a readable Stream object. That way my functions can read the stream the same way as if it were a standalone stream.
Here is my current code:
var Stream = require('stream')

var stream = new Stream.Stream()
stream.readable = true

stream.pause = function () {
  source.pause()
}

stream.resume = function () {
  source.resume()
}

// in a loop, read the data from the source stream, dissect the parts we want,
// and emit Buffer objects to the stream's data event.
How can I implement setEncoding?
Edit: I think the answer is related to this line in the stream.js source code
Stream.Readable = require('_stream_readable');
Furthermore, the correct answer may be to:
1. replace new stream.Stream with new stream.Readable, and
2. replace stream.emit('data', buffer) with either stream.onread(buffer) or stream.push(buffer). The thing is, I don't know which one.
Edit: stream.Readable is not available in version 0.8. It is available in the 0.9 beta, and will be available in v0.10 stable when that comes out. The ideal answer to this question will describe both the 0.8- solution and the 0.9+ solution.
Use streams2. For 0.8, there's https://github.com/isaacs/readable-stream
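As a rough streams2-style sketch of the question's setup (source is the shared byte stream from the question, and extractInterestingBytes() is a made-up stand-in for the dissection logic):
var Readable = require('stream').Readable; // or require('readable-stream').Readable on 0.8

var out = new Readable();
out._read = function () {
  // The consumer wants more data: let the underlying source flow again.
  source.resume();
};

source.on('data', function (raw) {
  var buffer = extractInterestingBytes(raw);
  // push() returns false when the internal buffer is full; pause the source.
  if (!out.push(buffer)) source.pause();
});
source.on('end', function () {
  out.push(null);
});

// Because the Buffers go through push(), setEncoding() now works for free:
out.setEncoding('utf8');
out.on('data', function (str) { /* str is a decoded string */ });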
