How to implement setEncoding in my own Node.JS readable Stream? - node.js

In my program, I have a readable Stream source which emits data events from a shared stream of byte data. I then interpret the data to extract Buffer objects and pass them to the appropriate function. The most elegant way of passing the Buffers to the rest of my program seems to be a readable Stream object. That way my functions can read the stream the same way as if it were a standalone stream.
Here is my current code:
var Stream = require('stream')

var stream = new Stream.Stream()
stream.readable = true

stream.pause = function () {
  source.pause()
}

stream.resume = function () {
  source.resume()
}

// in a loop, read the data from the source stream, dissect the parts we want,
// and emit Buffer objects via the stream's 'data' event.
How can I implement setEncoding?
Edit: I think the answer is related to this line in the stream.js source code
Stream.Readable = require('_stream_readable');
Furthermore, the correct answer may be to
replace new stream.Stream with new stream.Readable and to
replace stream.emit('data', buffer) with
either stream.onread(buffer) or stream.push(buffer). The thing is, I don't know which one.
Edit: stream.Readable is not available in version 0.8. It is available in 0.9 beta, and will be available in v0.10 stable when that comes out. The ideal answer to this question will describe both the 0.8- solution and the 0.9+ solution.

Use streams2. For 0.8, there's the readable-stream backport: https://github.com/isaacs/readable-stream
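For what it's worth, a rough sketch of what that looks like (my own example, not from the original answer): wrap the shared source in a Readable subclass and push the extracted Buffers into it. The names source and extractBuffer are placeholders for the question's shared stream and parsing logic. With this, setEncoding(), pause(), resume() and pipe() all come for free, and on 0.8 you can swap require('stream') for require('readable-stream').

var Readable = require('stream').Readable; // on 0.8: require('readable-stream').Readable
var util = require('util');

function PartStream(source) {
  Readable.call(this);
  var self = this;
  this._source = source;
  source.on('data', function (data) {
    // extractBuffer() stands in for the "dissect the parts we want" logic;
    // push() returns false when the internal buffer is full, so pause the source
    if (!self.push(extractBuffer(data))) source.pause();
  });
  source.on('end', function () {
    self.push(null); // signal end-of-stream
  });
}
util.inherits(PartStream, Readable);

// called by the stream machinery whenever the consumer wants more data
PartStream.prototype._read = function () {
  this._source.resume();
};

// var parts = new PartStream(source);
// parts.setEncoding('utf8'); // now works, courtesy of stream.Readable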

Related

What are the roles of _read and read in Node JS streams?

I'm really just looking for clarification on how these work. IMO the documentation on streams is somewhat lacking, and there actually aren't a lot of resources out there that comprehensively explain how they are meant to work and be extended.
My question can be broken down into two parts
One, what is the role of the _read function within the stream module? When I run this code, it endlessly prints out "hello world" until null is pushed onto the stream buffer. This seems to indicate that _read is called in some kind of loop that waits for a null in the buffer, but I can't find documentation anywhere that states this in explicit terms.
var Readable = require('stream').Readable

var rs = Readable()
rs._read = function () {
  rs.push("hello world")
  rs.push(null)
};

rs.on("data", function (data) {
  console.log("some data", data)
})
Two, what does read actually do? My understanding is that read consumes data from the read stream buffer, and fires the data event. Is that all that's going on here?
read() is something that a consumer of the readStream calls if they want to specifically read some bytes from the stream (when the stream is not flowing).
_read() is an internal method that is part of the implementation of the read stream. The internals of the stream call this method (it is NOT to be called from the outside) when the stream is flowing and wants to get more data from the source. When called, _read() pushes data with .push(data), or calls .push(null) when it has no more data.
You can see an explanation and example here in this article.
_read(size) {
  if (this.data.length) {
    const chunk = this.data.slice(0, size);
    this.data = this.data.slice(size, this.data.length);
    this.push(chunk);
  } else {
    this.push(null); // 'end', no more data
  }
}
If you were implementing a read stream to some custom source of data, then you would implement the _read() method to fetch up to the size amount of data from your source and .push() that data into the stream.
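A minimal, self-contained sketch of that division of labour (my own example, not from the linked article): _read() pulls from the underlying source when the machinery asks for more, while the consumer calls read() explicitly when the stream is paused.

const { Readable } = require('stream');

class StringSource extends Readable {
  constructor(text) {
    super();
    this.data = Buffer.from(text);
  }
  // called internally by the stream when it wants more data from the source
  _read(size) {
    if (this.data.length) {
      const chunk = this.data.slice(0, size);
      this.data = this.data.slice(size);
      this.push(chunk);
    } else {
      this.push(null); // no more data
    }
  }
}

const src = new StringSource('hello world');
src.on('readable', () => {
  let chunk;
  // the consumer pulls explicitly while the stream is not flowing
  while ((chunk = src.read(5)) !== null) {
    console.log('got', chunk.toString());
  }
});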

Buffering a Float32Array to a client

This should be obvious, but for some reason I am not getting any result. I have already spent way too much time just trying different ways to get this working without results.
TLDR: A shorter way to explain this question could be: I know how to stream a sound from a file. How to stream a buffer containing sound that was synthesized on the server instead?
This works:
client:
var stream = ss.createStream();
ss(socket).emit('get-file', stream, data.bufferSource);

var parts = [];
stream.on('data', function (chunk) {
  parts.push(chunk);
});
stream.on('end', function () {
  var blob = new Blob(parts, { type: "audio" });
  if (cb) {
    cb(blob);
  }
});
server (in the 'socket-connected' callback of socket.io)
var ss = require('socket.io-stream');
// ....
ss(socket).on('get-file', (stream: any, filename: any) => {
  console.log("get-file", filename);
  fs.createReadStream(filename).pipe(stream);
});
Now, the problem:
I want to alter this audio buffer and send the modified audio instead of just the file. I converted the ReadStream into a Float32Array and did some processing sample by sample. Now I want to send that modified Float32Array to the client.
In my view, I just need to replace fs.createReadStream(filename) with (new Readable()).push(modifiedSoundBuffer). However, I get a TypeError: Invalid non-string/buffer chunk. Interestingly, if I convert this modifiedSoundBuffer into a Uint8Array, it doesn't yell at me, and the client gets a large array, which looks good; only that all the array values are 0. I guess that it's flooring all the values?
ss(socket).on('get-buffer', (stream: any, filename: any) => {
  let readable = (new Readable()).push(modifiedFloat32Array);
  readable.pipe(stream);
});
I am trying to use streams for two reasons: sound buffers are large, and to allow concurrent processing in the future
What if you convert the Float32Array to a Buffer before sending, like this: (new Readable()).push(Buffer.from(modifiedSoundBuffer))?
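To expand on that comment with a sketch of my own (float32ToStream is a hypothetical helper, not part of any library): note that Buffer.from(f32) copies the element values coerced to integers, which is likely why everything arrives as 0, whereas Buffer.from(f32.buffer, ...) views the raw IEEE-754 bytes of the samples.

const { Readable } = require('stream');

function float32ToStream(f32) {
  // view the underlying bytes instead of copying the (coerced) values
  const buf = Buffer.from(f32.buffer, f32.byteOffset, f32.byteLength);
  const readable = new Readable({
    read() {} // nothing to do: everything is pushed up front
  });
  readable.push(buf);
  readable.push(null); // end of stream
  return readable;
}

// server side, roughly:
// ss(socket).on('get-buffer', (stream, filename) => {
//   float32ToStream(modifiedFloat32Array).pipe(stream);
// });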

Proper way to unpipe a streams2 pipeline and empty it (not just flush)

Premise
I'm trying to find the correct way to prematurely terminate a series of piped streams (pipeline) in Node.js: sometimes I want to gracefully abort the stream before it has finished. Specifically I'm dealing with mostly objectMode: true and non-native parallel streams, but this shouldn't really matter.
Problem
The problem is that when I unpipe the pipeline, data remains in each stream's buffer and is drained. This might be okay for most of the intermediate streams (e.g. Readable/Transform), but the last Writable still drains to its write target (e.g. a file, a database, a socket, or whatever). This could be problematic if the buffer contains hundreds or thousands of chunks, which take a significant amount of time to drain. I want it to stop immediately, i.e. not drain; why waste cycles and memory on data that doesn't matter?
Depending on the route I go, I receive either a "write after end" error, or an exception when the stream cannot find existing pipes.
Question
What is the proper way to gracefully kill off a pipeline of streams in the form a.pipe(b).pipe(c).pipe(z)?
Solution?
The solution I have come up with is 3-step:
1. unpipe each stream in the pipeline in reverse order
2. empty the buffer of each stream that implements Writable
3. end each stream that implements Writable
Some pseudo code illustrating the entire process:
var pipeline = [ // define the pipeline
  readStream,
  transformStream0,
  transformStream1,
  writeStream
];

// build and start the pipeline
var tmpBuildStream;
pipeline.forEach(function (stream) {
  if (!tmpBuildStream) {
    tmpBuildStream = stream;
    return;
  }
  tmpBuildStream = tmpBuildStream.pipe(stream);
});

// sleep, timeout, event, etc...

// tear down the pipeline
var tmpTearStream;
pipeline.slice(0).reverse().forEach(function (stream) {
  if (!tmpTearStream) {
    tmpTearStream = stream;
    return;
  }
  tmpTearStream = stream.unpipe(tmpTearStream);
});

// empty and end the pipeline
pipeline.forEach(function (stream) {
  if (typeof stream._writableState === 'object') { // empty the internal write buffer
    stream._writableState.length -= stream._writableState.buffer.length;
    stream._writableState.buffer = [];
  }
  if (typeof stream.end === 'function') { // kill
    stream.end();
  }
});
I'm really worried about the usage of stream._writableState and modifying the internal buffer and length properties (the _ signifies a private property). This seems like a hack. Also note that since I'm piping, things like pause and resume are out of the question (based on a suggestion I received on IRC).
I also put together a runnable version (pretty sloppy) you can grab from github: https://github.com/zamnuts/multipipe-proto (git clone, npm install, view readme, npm start)
In this particular case I think you should get rid of the structure where you have four separate, not fully customised streams. Piping them together creates a chain of dependencies that will be hard to control unless you implement your own mechanism.
I would like to focus on your actual goal here:
INPUT >----[read] → [transform0] → [transform1] → [write]-----> OUTPUT
              |            |              |           |
KILL_ALL------o------------o--------------o-----------o--------[nothing to drain]
I believe that the above structure can be achieved by combining a custom:
duplex stream - for your own _write(chunk, encoding, cb) and _read(bytes) implementation - with a
transform stream - for your own _transform(chunk, encoding, cb) implementation.
Since you are using the writable-stream-parallel package you may also want to go over their libs, as their duplex implementation can be found here: https://github.com/Clever/writable-stream-parallel/blob/master/lib/duplex.js .
And their transform stream implementation is here: https://github.com/Clever/writable-stream-parallel/blob/master/lib/transform.js. Here they handle the highWaterMark.
Possible solution
Their write stream: https://github.com/Clever/writable-stream-parallel/blob/master/lib/writable.js#L189 has an interesting function, writeOrBuffer; I think you might be able to tweak it a bit to interrupt writing the data from the buffer.
Note: These 3 flags are controlling the buffer clearing:
( !finished && !state.bufferProcessing && state.buffer.length )
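If patching the library feels too invasive, a lighter variant of the same idea is a transform that simply discards chunks once it has been told to stop. This is my own sketch (abort() is a made-up method, not part of the stream API), and it only prevents further data from reaching the final Writable; anything that Writable has already buffered will still drain.

var Transform = require('stream').Transform;
var util = require('util');

function KillableTransform(options) {
  Transform.call(this, options);
  this._killed = false;
}
util.inherits(KillableTransform, Transform);

KillableTransform.prototype._transform = function (chunk, encoding, callback) {
  if (this._killed) return callback(); // drop the chunk: nothing left to drain
  // ... real transform work goes here ...
  this.push(chunk);
  callback();
};

// made-up helper, not part of the stream API
KillableTransform.prototype.abort = function () {
  this._killed = true;
};

// usage: readStream.pipe(t0).pipe(t1).pipe(writeStream);
//        [t0, t1].forEach(function (s) { s.abort(); });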
References:
Node.js Transform Stream Doc
Node.js Duplex Stream Doc
Writing Transform Stream in Node.js
Writing Duplex Stream in Node.js

Basic streams issue: Difficulty sending a string to stdout

I'm just starting learning about streams in node. I have a string in memory and I want to put it in a stream that applies a transformation and pipe it through to process.stdout. Here is my attempt to do it:
var through = require('through');

var stream = through(function write(data) {
  this.push(data.toUpperCase());
});

stream.push('asdf');
stream.pipe(process.stdout);
stream.end();
It does not work. When I run the script on the cli via node, nothing is sent to stdout and no errors are thrown. A few questions I have:
If you have a value in memory that you want to put into a stream, what is the best way to do it?
What is the difference between push and queue?
Does it matter if I call end() before or after calling pipe()?
Is end() equivalent to push(null)?
Thanks!
Just use the vanilla stream API
var Transform = require("stream").Transform;

// create a new Transform stream
var stream = new Transform({
  decodeStrings: false,
  encoding: "ascii"
});

// implement the _transform method
stream._transform = function _transform(str, enc, done) {
  this.push(str.toUpperCase() + "\n");
  done();
};

// connect to stdout
stream.pipe(process.stdout);

// write some stuff to the stream
stream.write("hello!");
stream.write("world!");

// output
// HELLO!
// WORLD!
Or you can build your own stream constructor. This is really the way the stream API is intended to be used.
var Transform = require("stream").Transform;

function MyStream() {
  // call Transform constructor with `this` context
  // {decodeStrings: false} keeps data as `string` type instead of `Buffer`
  // {encoding: "ascii"} sets the encoding for our strings
  Transform.call(this, {decodeStrings: false, encoding: "ascii"});

  // our function to do "work"
  function _transform(str, encoding, done) {
    this.push(str.toUpperCase() + "\n");
    done();
  }

  // export our function
  this._transform = _transform;
}

// extend the Transform.prototype to your constructor
MyStream.prototype = Object.create(Transform.prototype, {
  constructor: {
    value: MyStream
  }
});
Now use it like this
// instantiate
var a = new MyStream();
// pipe to a destination
a.pipe(process.stdout);
// write data
a.write("hello!");
a.write("world!");
Output
HELLO!
WORLD!
Some other notes about .push vs .write.
.write(str) adds data to the writable buffer. It is meant to be called externally. If you think of a stream like a duplex file handle, it's just like fwrite, only buffered.
.push(str) adds data to the readable buffer. It is only intended to be called from within our stream.
.push(str) can be called many times. Watch what happens if we change our function to
function _transform(str, encoding, done) {
  this.push(str.toUpperCase());
  this.push(str.toUpperCase());
  this.push(str.toUpperCase() + "\n");
  done();
}
Output
HELLO!HELLO!HELLO!
WORLD!WORLD!WORLD!
First, you want to use write(), not push(). write() puts data in to the stream, push() pushes data out of the stream; you only use push() when implementing your own Readable, Duplex, or Transform streams.
Second, you'll only want to write() data to the stream after you've set up the pipe() (or added some event listeners). If you write to a stream with nothing wired to the other end, the data you've written will be lost. As #naomik pointed out, this isn't true in general, since a Writable stream will buffer write()s. In your example you do need to write() after pipe() though. Otherwise, the process will end before anything is written to STDOUT. This is possibly due to how the through module is implemented, but I don't know that for sure.
So, with that in mind, you can make a couple simple changes to your example to get it to work:
var through = require('through');

var stream = through(function write(data) {
  this.push(data.toUpperCase());
});

stream.pipe(process.stdout);
stream.write('asdf');
stream.end();
Now, for your questions:
The easiest way to get data from memory into a writable stream is to simply write() it, just like we're doing with stream.write('asdf') in your example.
As far as I know, the stream doesn't have a queue() function; did you mean write()? Like I said above, write() is used to put data into a stream, and push() is used to push data out of the stream. Only call push() in your own stream implementations.
Only call end() after all your data has been written to your stream. end() basically says: "Ok, I'm done now. Please finish what you're doing and close the stream."
push(null) is pretty much equivalent to end(). That being said, don't call push(null) unless you're doing it inside your own stream implementation (as stated above). It's almost always more appropriate to call end().
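To make the last two points concrete, here is a small sketch of my own contrasting end() on the consumer side with push(null) inside an implementation:

var Transform = require("stream").Transform;
var Readable = require("stream").Readable;

// consumer side: write() data in, then end() when finished
var upper = new Transform();
upper._transform = function (chunk, enc, done) {
  this.push(chunk.toString().toUpperCase());
  done();
};
upper.pipe(process.stdout);
upper.write("hello\n");
upper.end("world\n");       // no more write()s after this

// implementer side: push() data out, then push(null) to signal end-of-stream
var src = new Readable();
src._read = function () {}; // no-op; we push manually below
src.pipe(process.stdout);
src.push("from a readable\n");
src.push(null);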
Based on the examples for stream (http://nodejs.org/api/stream.html#stream_readable_pipe_destination_options)
and through (https://www.npmjs.org/package/through)
it doesn't look like you are using your stream correctly... What happens if you use write(...) instead of push(...)?

Node.js: splitting a readable stream pipe to multiple sequential writable streams

Given a Readable stream (which may be process.stdin or a file stream), is it possible/practical to pipe() to a custom Writable stream that will fill a child Writable until a certain size; then close that child stream; open a new Writable stream and continue?
(The context is to upload a large piece of data from a pipeline to a CDN, dividing it up into blocks of a reasonable size as it goes, without having to write the data to disk first.)
I've tried creating a Writable that handles the opening and closing of the child stream in the _write function, but the problem comes when the incoming chunk is too big to fit in the existing child stream: it has to write some of the chunk to the old stream; create the new stream; and then wait for the open event on the new stream before completing the _write call.
The other thought I had was to create an extra Duplex or Transform stream to buffer the pipe and ensure that the chunk coming into the Writable is definitely equal to or less than the amount the existing child stream can accept, to give the Writable time to change the child stream over.
Alternatively, is this overcomplicating everything and there's a much easier way to do the original task?
I bumped into this question when looking for an answer to a related problem: how to parse a file and split its lines into separate files depending on some category value in the line.
I did my best to change my code to make it more relevant to your problem. However, it was adapted quickly and isn't tested. Treat it as pseudo-code.
var fs = require('fs'),
    through = require('through');

var destCount = 0, dest, size = 0, MAX_SIZE = 1000;

readableStream
  .on('data', function (data) {
    var out = data.toString() + "\n";
    size += out.length;
    if (size > MAX_SIZE) {
      dest.emit("end");
      dest = null;
      size = 0;
    }
    if (!dest) {
      destCount++;
      // option 1. manipulate data before saving it.
      dest = through();
      dest.pipe(fs.createWriteStream("log" + destCount));
      // option 2. write directly to file
      // dest = fs.createWriteStream("log" + destCount);
    }
    dest.emit("data", out);
  })
  .on('end', function () {
    dest.emit('end');
  });
I would introduce a Transform in between the Readable and Writable stream, and in its _transform I would do all the logic I need.
Maybe I would have only a Readable and a Transform. The _transform method would create all the Writable streams I need.
Personally, I use a Writable stream only when I'm dumping data somewhere and am done processing that chunk.
I avoid implementing _read and _write as much as I can and abuse Transform streams instead.
But the point I don't understand in your question is what you write about size. What do you mean by it?
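To make the Transform-based suggestion concrete, here is a rough sketch of my own: a transform that opens a new Writable per block and closes it once enough bytes have gone through. MAX_SIZE, the file names, and the (ignored) backpressure handling are assumptions for illustration; a real CDN uploader would open an upload stream instead of a file, and a block may exceed MAX_SIZE by up to one chunk unless you split the chunk yourself.

var fs = require('fs');
var Transform = require('stream').Transform;

var MAX_SIZE = 1000; // assumed block size, adjust to the CDN's limits

function makeSplitter() {
  var splitter = new Transform();
  var current = null;
  var written = 0;
  var index = 0;

  splitter._transform = function (chunk, encoding, callback) {
    if (!current) {
      current = fs.createWriteStream('block-' + index++);
      written = 0;
    }
    current.write(chunk);      // backpressure ignored for brevity
    written += chunk.length;
    if (written >= MAX_SIZE) { // close this block; the next chunk opens a new one
      current.end();
      current = null;
    }
    callback();
  };

  splitter._flush = function (callback) {
    if (current) current.end();
    callback();
  };

  return splitter;
}

// usage: process.stdin.pipe(makeSplitter());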
