I have two streams:
a source stream, which downloads an audio file from the Internet
a consumer stream, which streams the file to a streaming server
Before streaming to the server there should be a handshake which returns a handle. Then I have a few seconds to really start streaming or the server closes the connection.
Which means that I should
FIRST wait until the source data is ready to be streamed
and only THEN start streaming.
The problem is that there doesn't seem to be a way to get notified when data is ready in the source stream.
The first event that comes to mind is the 'data' event. But it also consumes the data, which is not acceptable and doesn't allow using pipes at all.
So how to do something like this:
await pEvent(sourceStream, 'dataIsReady');
// Negotiate with the server about the transmission
sourceStream.pipe(consumerStream);
Thanks in advance.
Answering my own question.
Here is a solution which works for me.
It requires an auxiliary passthrough stream with a custom event:
import { Transform, TransformCallback, TransformOptions } from 'stream';

class DataWaitPassThroughStream extends Transform {
  dataIsReady: boolean = false;

  constructor(opts?: TransformOptions) {
    super(opts);
  }

  _transform(chunk: any, encoding: BufferEncoding, callback: TransformCallback) {
    if (!this.dataIsReady) {
      this.dataIsReady = true;
      this.emit('dataIsReady');
    }
    callback(null, chunk);
  }
}
Usage
import pEvent from 'p-event';
const dataReadyStream = sourceStream.pipe(new DataWaitPassThroughStream());
await pEvent(dataReadyStream, 'dataIsReady');
// Negotiate with the server about the transmission...
dataReadyStream.pipe(consumerStream);
Related
I'm really just looking for clarification on how these work. IMO the documentation on streams is somewhat lacking, and there actually aren't a lot of resources out there that comprehensively explain how they're meant to work and be extended.
My question can be broken down into two parts
One: what is the role of the _read function within the stream module? When I run this code it endlessly prints out "hello world" until null is pushed onto the stream buffer. This seems to indicate that _read is called in some kind of loop that waits for a null in the buffer, but I can't find documentation anywhere that states this in explicit terms.
var Readable = require('stream').Readable
var rs = Readable()
rs._read = function () {
  rs.push("hello world")
  rs.push(null)
};
rs.on("data", function (data) {
  console.log("some data", data)
})
Two: what does read actually do? My understanding is that read consumes data from the read stream's buffer and fires the data event. Is that all that's going on here?
read() is something that a consumer of the readStream calls if they want to specifically read some bytes from the stream (when the stream is not flowing).
_read() is an internal method that is part of the internal implementation of the read stream. The internals of the stream call this method (it is NOT to be called from the outside) when the stream is flowing and wants to get more data from the source. When called, the _read() method pushes data with .push(data), or if it has no more data, it does a .push(null).
You can see an explanation and example here in this article.
_read(size) {
  if (this.data.length) {
    const chunk = this.data.slice(0, size);
    this.data = this.data.slice(size, this.data.length);
    this.push(chunk);
  } else {
    this.push(null); // 'end', no more data
  }
}
If you were implementing a read stream to some custom source of data, then you would implement the _read() method to fetch up to the size amount of data from your source and .push() that data into the stream.
I want to transmit an fs.ReadStream over a net.Socket (TCP) stream. For this I use .pipe.
When the fs.ReadStream is finished, I don't want to end the net.Socket stream. That's why I use
readStream.pipe(socket, {
end: false
})
Unfortunately I don't get 'close', 'finish' or 'end' on the other side. This prevents me from closing my fs.WriteStream on the opposite side. However, the net.Socket connection remains open, which I also need because I would like to receive an ID as a response.
Since I don't get a 'close' or 'finish' on the opposite side, I unfortunately can't end the fs.WriteStream and therefore can't send a response with a corresponding ID.
Is there a way to manually send a 'close' or 'finish' event via the net.Socket without closing it?
When I call socket.emit, only my own local listeners react.
Can anyone tell me what I am doing wrong?
var socket : net.Socket; // TCP connection
var readStream = fs.createReadStream('test.txt');

socket.on('connect', () => {
  readStream.pipe(socket, {
    end: false
  })
  readStream.on('close', () => {
    socket.emit('close');
    socket.emit('finish');
  })
  // waiting for answer
  socket.on('data', (c) => {
    console.log('got my answer: ' + c.toString());
  })
})
Well, there's not really much you can do with a single stream except provide some way for the other side to know that the stream has ended programmatically.
When the socket sends an end event it actually flushes the buffer and then closes the TCP connection, which on the other side is translated into finish after the last byte is delivered. In order to re-use the connection you can consider these two options:
One: Use HTTP keep-alive
As you can imagine you're not the first person having faced this problem. It actually is a common thing and some protocols like HTTP have you already covered. This will introduce a minor overhead, but only on starting and ending the streams - which in your case may be more acceptable than the other options.
Instead of using basic TCP streams you can just as simply use HTTP connections and send your data over HTTP requests; an HTTP POST request would be just fine, and your code wouldn't look any different except for ditching that {end: false}. The socket would need to have its headers sent, so it'd be constructed like this:
const socket : http.ClientRequest = http.request({
  method: 'POST',
  host: 'wherever.org',
  port: 9087,
  path: '/somewhere/there',
  headers: {
    'connection': 'keep-alive',
    'transfer-encoding': 'chunked'
  }
}, (res) => {
  // here you can handle the response and start pushing more streams
});

readStream.pipe(socket); // so our socket (vel connection) will end, but the underlying channel will stay open.
You actually don't need to wait for the socket to connect and can pipe the stream directly as in the example above, but do check how this behaves if your connection fails. Waiting for the connect event will also work, since the HTTP request class implements all TCP connection events and methods (although there may be some slight differences in signatures).
More reading:
Wikipedia article on keep-alive - a good explanation of how this works
Node.js http.Agent options - you can control how many connections you have and, more importantly, set the default keep-alive behavior.
Oh and a bit of warning - TCP keep-alive is a different thing, so don't get confused there.
Two: Use a "magic" end packet
In this case what you'd do is send a simple end packet, for instance \x00 (a NUL character), at the end of the socket. This has a major drawback, because you will need to do something with the stream to make sure that a NUL character doesn't otherwise appear in the data - this introduces an overhead on the data processing (so more CPU usage).
In order to do it like this, you need to push the data through a transform stream before you send it to the socket - the example below works on strings only, so adapt it to your needs.
const { Transform } = require('stream');

const zeroEncoder = new Transform({
  transform(chunk, enc, cb) {
    // escape any literal NUL bytes so that \x00 can serve as the end marker
    cb(null, chunk.toString().replace(/\x00/g, '\\x00'));
  },
  flush(cb) {
    cb(null, '\x00'); // append the end marker
  }
});

// ... wherever you do the writing:
readStream
.pipe(zeroEncoder)
.on('unpipe', () => console.log('this will be your end marker to send in another stream'))
.pipe(socket, {end: false})
Then on the other side:
tcpStream.on('data', (chunk) => {
if (chunk.toString().endsWith('\x00')) {
output.end(decodeZeros(chunk));
// and rotate output
} else {
output.write(decodeZeros(chunk));
}
});
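The decodeZeros helper used above isn't defined in the answer; a hypothetical implementation that reverses the zeroEncoder escaping could look like this:

```javascript
// Hypothetical decodeZeros matching zeroEncoder: strip the trailing
// NUL end-marker (if present) and unescape literal NULs.
function decodeZeros(chunk) {
  let s = chunk.toString();
  if (s.endsWith('\x00')) s = s.slice(0, -1);
  return s.replace(/\\x00/g, '\x00');
}

console.log(JSON.stringify(decodeZeros(Buffer.from('a\\x00b\x00')))); // "a\u0000b"
```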
As you can see this is way more complicated, and this is also just an example - you could simplify it a bit by using JSON, 7-bit transfer encoding or some other means, but it will in all cases need some trickery and, most importantly, reading through the whole stream and way more memory for it - so I don't really recommend this approach. If you do though:
Make sure you encode/decode the data correctly
Consider if you can find a byte that won't appear in your data
The above may work with strings, but will behave badly with raw Buffers
Finally there's no error control or flow control - so at least pause/resume logic is needed.
I hope this is helpful.
I'm trying to implement a socket protocol and it is unclear to me how to proceed. I have the socket as a Stream object, and I am able to write() data to it to send on the socket, and I know that the "readable" or "data" events can be used to receive data. But this does not work well when the protocol involves a conversation in which one host is supposed to send a piece of data, wait for a response, and then send data again after the response.
In a block paradigm it would look like this:
send some data
wait for specific data reply
massage data and send it back
send additional data
As far as I can tell, node's Stream object does not have a read function that will asynchronously return with the number of bytes requested. Otherwise, each wait could just put the remaining functionality in its own callback.
What is the node.js paradigm for this type of communication?
Technically there is a Readable.read(), but it's not recommended (perhaps because you can't be sure of the size you'll get back). You can keep track of state and, on each 'data' event, append to a Buffer that you keep processing incrementally. You can use readUInt32LE() etc. on the Buffer to read specific pieces of binary data if you need to (or convert to a string if it's textual data). https://github.com/runvnc/metastream/blob/master/index.js
If you want to write it in your 'block paradigm', you could basically make some things a promise or async function and then
let specialReplyRes = null;
const waitForSpecialReply = () => new Promise(res => { specialReplyRes = res; });

stream.on('data', (buff) => {
  if (buff.toString().indexOf('special') >= 0) specialReplyRes(buff.toString());
});

// ...

async function proto() {
  stream.write(data); // the initial request payload
  const reply = await waitForSpecialReply();
  const message = massage(reply);
  stream.write(message);
}
Where your waitForSpecialReply promise is stored and resolved after a certain message is received through your parsing.
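The incremental-Buffer approach mentioned earlier can be sketched as a small framing parser. The framing here (a 4-byte little-endian length prefix per message) is an assumed example protocol, not anything the question mandates:

```javascript
// Accumulate incoming chunks in a Buffer and peel off complete messages.
function makeMessageParser(onMessage) {
  let acc = Buffer.alloc(0);
  return (chunk) => {
    acc = Buffer.concat([acc, chunk]);
    while (acc.length >= 4) {
      const len = acc.readUInt32LE(0);
      if (acc.length < 4 + len) break; // message not complete yet
      onMessage(acc.slice(4, 4 + len));
      acc = acc.slice(4 + len);
    }
  };
}

// feed it chunks split at arbitrary boundaries, e.g. via stream.on('data', parse)
const messages = [];
const parse = makeMessageParser((m) => messages.push(m.toString()));

const payload = Buffer.from('hi');
const frame = Buffer.concat([Buffer.alloc(4), payload]);
frame.writeUInt32LE(payload.length, 0);
parse(frame.slice(0, 3)); // first half of the frame
parse(frame.slice(3));    // rest of the frame
console.log(messages);    // [ 'hi' ]
```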
Premise
I'm trying to find the correct way to prematurely terminate a series of piped streams (pipeline) in Node.js: sometimes I want to gracefully abort the stream before it has finished. Specifically I'm dealing with mostly objectMode: true and non-native parallel streams, but this shouldn't really matter.
Problem
The problem is when I unpipe the pipeline, data remains in each stream's buffer and is drained. This might be okay for most of the intermediate streams (e.g. Readable/Transform), but the last Writable still drains to its write target (e.g. a file or a database or socket or w/e). This could be problematic if the buffer contains hundreds or thousands of chunks, which take a significant amount of time to drain. I want it to stop immediately, i.e. not drain; why waste cycles and memory on data that doesn't matter?
Depending on the route I go, I receive either a "write after end" error, or an exception when the stream cannot find existing pipes.
Question
What is the proper way to gracefully kill off a pipeline of streams of the form a.pipe(b).pipe(c).pipe(z)?
Solution?
The solution I have come up with is 3-step:
Unpipe each stream in the pipeline in reverse order
Empty each stream's buffer that implements Writable
End each stream that implements Writable
Some pseudo code illustrating the entire process:
var pipeline = [ // define the pipeline
  readStream,
  transformStream0,
  transformStream1,
  writeStream
];

// build and start the pipeline
var tmpBuildStream;
pipeline.forEach(function(stream) {
  if ( !tmpBuildStream ) {
    tmpBuildStream = stream;
    return; // 'continue' isn't valid inside a forEach callback
  }
  tmpBuildStream = tmpBuildStream.pipe(stream);
});
// sleep, timeout, event, etc...
// tear down the pipeline
var tmpTearStream;
pipeline.slice(0).reverse().forEach(function(stream) {
  if ( !tmpTearStream ) {
    tmpTearStream = stream;
    return;
  }
  tmpTearStream = stream.unpipe(tmpTearStream); // unpipe() returns the source stream
});
// empty and end the pipeline
pipeline.forEach(function(stream) {
  if ( typeof stream._writableState === 'object' ) { // empty
    stream._writableState.length -= stream._writableState.buffer.length;
    stream._writableState.buffer = [];
  }
  if ( typeof stream.end === 'function' ) { // kill
    stream.end();
  }
});
I'm really worried about the usage of stream._writableState and modifying the internal buffer and length properties (the _ signifies a private property). This seems like a hack. Also note that since I'm piping, things like pause and resume are out of the question (based on a suggestion I received on IRC).
I also put together a runnable version (pretty sloppy) you can grab from github: https://github.com/zamnuts/multipipe-proto (git clone, npm install, view readme, npm start)
In this particular case, I think we should get rid of the structure where you have 4 different, not fully customised streams. Piping them together creates a chain of dependencies that will be hard to control unless we implement our own mechanism.
I would like to focus on your actual goal here:
INPUT >----[read] → [transform0] → [transform1] → [write]-----> OUTPUT
| | | |
KILL_ALL------o----------o--------------o------------o--------[nothing to drain]
I believe that the above structure can be achieved via combining custom:
duplex stream - for your own _write(chunk, encoding, cb) and _read(size) implementations, with
transform stream - for your own _transform(chunk, encoding, cb) implementation.
Since you are using the writable-stream-parallel package you may also want to go over their libs, as their duplex implementation can be found here: https://github.com/Clever/writable-stream-parallel/blob/master/lib/duplex.js .
And their transform stream implementation is here: https://github.com/Clever/writable-stream-parallel/blob/master/lib/transform.js. Here they handle the highWaterMark.
Possible solution
Their write stream: https://github.com/Clever/writable-stream-parallel/blob/master/lib/writable.js#L189 has an interesting function, writeOrBuffer; I think you might be able to tweak it a bit to interrupt writing the data from the buffer.
Note: These 3 flags are controlling the buffer clearing:
( !finished && !state.bufferProcessing && state.buffer.length )
References:
Node.js Transform Stream Doc
Node.js Duplex Stream Doc
Writing Transform Stream in Node.js
Writing Duplex Stream in Node.js
i want to stream sizeable files in NodeJS 0.10.x using express#4.8.5 and pipes. currently i'm
doing it like this (in CoffeeScript):
app.get '/', ( request, response ) ->
  input = P.create_readstream route
  input
    .pipe P.$split()
    .pipe P.$trim()
    .pipe P.$skip_empty()
    .pipe P.$skip_comments()
    .pipe P.$parse_csv headers: no, delimiter: '\t'
    .pipe response
(P is pipedreams.)
what i would like to have is something like
.pipe count_bytes # ???
.pipe response
.pipe report_progress response
so when i look at the server running in the terminal, i get some indication of how many bytes have been
accepted by the client. right now, it is very annoying to see the client loading for ages without having
any indication whether the transmission will be done in a minute or tomorrow.
is there any middleware to do that? i couldn't find any.
oh, and do i have to call anything on response completion? it does look like it's working automagically right now.
For your second question: you don't have to close anything. The pipe function handles everything for you, even throttling of the streams (if the source stream has more data than the client can handle due to poor download speed, it will pause the source stream until the client can consume it again, instead of using a bunch of memory server-side by completely reading the source).
For your first question, to have some stats server side on your streams, what you could use is a Transform stream like:
var Transform = require('stream').Transform;
var inherits = require('util').inherits;

function StatsStream(ip, options) {
  Transform.call(this, options);
  this.ip = ip;
}
inherits(StatsStream, Transform);
StatsStream.prototype._transform = function(chunk, encoding, callback) {
  // here some bytes have been read from the source and are
  // ready to go to the destination, do your logging here
  console.log('flowing ', chunk.length, 'bytes to', this.ip);
  // then tell the transform stream that the bytes it should
  // send to the destination are the same chunk you received...
  // (and that no error occurred)
  callback(null, chunk);
};
Then in your request handlers you can pipe like (sorry, javascript):
input.pipe(new StatsStream(req.ip)).pipe(response)
I did this on top of my head so beware :)