Node.js PassThrough Stream

I want to transmit an fs.ReadStream over a net.Socket (TCP) stream. For this I use .pipe().
When the fs.ReadStream is finished, I don't want to end the net.Socket stream. That's why I use
readStream.pipe(socket, {
  end: false
})
Unfortunately I don't get a 'close', 'finish' or 'end' event on the other side. This prevents me from closing my fs.WriteStream on the receiving side. The net.Socket connection does remain open, though, which I also need because I would like to receive an ID as a response.
Since I don't get a 'close' or 'finish' on the other end, I can't end the fs.WriteStream and therefore can't send back a response with the corresponding ID.
Is there a way to manually send a 'close' or 'finish' event over the net.Socket without closing it?
When I emit those events myself, only my own local listeners react.
Can anyone tell me what I am doing wrong?
var socket: net.Socket; // TCP connection, established elsewhere
var readStream = fs.createReadStream('test.txt'); // note: createReadStream, since this is the stream being piped out

socket.on('connect', () => {
  readStream.pipe(socket, {
    end: false
  });

  readStream.on('close', () => {
    // these only trigger my own local listeners, nothing is sent over the wire
    socket.emit('close');
    socket.emit('finish');
  });

  // waiting for answer
  socket.on('data', (c) => {
    console.log('got my answer: ' + c.toString());
  });
});

Well, there's not really much you can do with a single stream except provide some way for the other side to know programmatically that the stream has ended.
When the socket emits an end event it actually flushes the buffer and then closes the TCP connection, which on the other side is translated into a finish event after the last byte is delivered. In order to re-use the connection, you can consider these two options:
One: Use HTTP keep-alive
As you can imagine, you're not the first person to face this problem. It's actually a common requirement, and some protocols like HTTP have you covered already. This introduces a minor overhead, but only when starting and ending the streams, which in your case may be more acceptable than the other option.
Instead of basic TCP streams you can just as easily use HTTP connections and send your data over HTTP requests; an HTTP POST request would be just fine, and your code wouldn't look any different apart from ditching that {end: false}. The request needs to have its headers sent first, so it would be constructed like this:
const http = require('http');
const agent = new http.Agent({ keepAlive: true }); // needed so the underlying connection is actually kept open and reused

const socket = http.request({ // an http.ClientRequest, which is itself a writable stream
  method: 'POST', host: 'wherever.org', port: 9087, path: '/somewhere/there', agent,
  headers: { 'connection': 'keep-alive', 'transfer-encoding': 'chunked' }
}, (res) => {
  // here you can call the code to push more streams, since the response means this request has completed
});
readStream.pipe(socket); // this request will end, but the underlying keep-alive connection will stay open.
You actually don't need to wait for the socket to connect; you can pipe the stream directly as in the example above, but do check how this behaves if your connection fails. Waiting for the connect event will also work, since the HTTP request class implements all the TCP connection events and methods (although there may be some slight differences in signatures).
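For completeness, the receiving side could look roughly like the sketch below. This is not part of the original answer: it assumes the server writes each POST body to its own file and replies with an ID, and the file naming and ID scheme are purely illustrative.
const http = require('http');
const fs = require('fs');

let nextId = 0; // illustrative ID scheme

const server = http.createServer((req, res) => {
  const id = ++nextId;
  const out = fs.createWriteStream('upload-' + id + '.txt'); // one file per request body
  req.pipe(out); // each POST body is one complete stream, so 'finish' fires once per upload
  out.on('finish', () => {
    res.end(String(id)); // reply with the ID; the keep-alive connection stays open for the next POST
  });
});

server.listen(9087);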
More reading:
The Wikipedia article on HTTP keep-alive - a good explanation of how this works.
Node.js http.Agent options - you can control how many connections you have and, more importantly, set the default keep-alive behavior.
Oh, and a word of warning: TCP keep-alive is a different thing, so don't get confused there.
Two: Use a "magic" end packet
In this case what you'd do is send a simple end packet, for instance \x00 (a NUL character), at the end of each stream. This has a major drawback, because you need to do something with the stream to make sure a NUL character doesn't otherwise appear in the data - which introduces an overhead on the data processing (so more CPU usage).
In order to do it like this, you need to push the data through a transform stream before you send it to the socket. Below is an example, but it works on strings only, so adapt it to your needs.
const { Transform } = require('stream');

const zeroEncoder = new Transform({
  encoding: 'utf-8',
  // escape any literal NUL characters so they can't be mistaken for the end marker
  transform(chunk, enc, cb) { cb(null, chunk.toString().replace(/\x00/g, '\\x00')); },
  // push the NUL end marker as the very last chunk
  flush(cb) { cb(null, '\x00'); }
});
// ... wherever you do the writing:
readStream
  .pipe(zeroEncoder)
  .on('unpipe', () => console.log('source finished - the \\x00 end marker will be flushed to the socket'))
  .pipe(socket, { end: false });
Then on the other side:
tcpStream.on('data', (chunk) => {
  if (chunk.toString().endsWith('\x00')) {
    output.end(decodeZeros(chunk));
    // and rotate output
  } else {
    output.write(decodeZeros(chunk));
  }
});
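The decodeZeros helper isn't defined in the original answer; a minimal sketch, assuming it simply reverses the escaping done by zeroEncoder and strips the trailing NUL marker, could look like this:
// hypothetical inverse of zeroEncoder: drop a trailing NUL end marker and
// un-escape any "\x00" sequences back into literal NUL characters
function decodeZeros(chunk) {
  let text = chunk.toString();
  if (text.endsWith('\x00')) text = text.slice(0, -1); // remove the end marker
  return text.replace(/\\x00/g, '\x00');               // undo the escaping
}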
As you can see, this is way more complicated, and it is also just an example - you could simplify it a bit by using JSON, a 7-bit transfer encoding or some other scheme, but in all cases it needs some trickery and, most importantly, scanning through the whole stream, plus considerably more memory - so I don't really recommend this approach. If you do go this way though:
Make sure you encode/decode the data correctly.
Consider whether you can find a byte that won't appear in your data.
The above may work with strings, but it will behave poorly with Buffers at best.
Finally, there's no error control or flow control - so at the very least some pause/resume logic is needed.
I hope this is helpful.

Related

How to get notified when data is actually ready for streaming?

I have two streams:
a source stream, which downloads an audio file from the Internet
a consumer stream, which streams the file to a streaming server
Before streaming to the server there should be a handshake which returns a handle. Then I have a few seconds to really start streaming or the server closes the connection.
Which means that I should
FIRST wait until the source data is ready to be streamed
and only THEN start streaming.
The problem is that there doesn't seem to be a way to get notified when data is ready in the source stream.
The first event that comes to mind is the 'data' event. But it also consumes the data, which is not acceptable and doesn't allow using pipes at all.
So how to do something like this:
await pEvent(sourceStream, 'dataIsReady');
// Negotiate with the server about the transmission
sourceStream.pipe(consumerStream);
Thanks in advance.
Answering my own question.
Here is a solution that works for me.
It requires an auxiliary pass-through stream with a custom event:
import { Transform, TransformOptions, TransformCallback } from 'stream';

class DataWaitPassThroughStream extends Transform {
  dataIsReady: boolean = false;

  constructor(opts?: TransformOptions) {
    super(opts);
  }

  _transform(chunk: any, encoding: BufferEncoding, callback: TransformCallback) {
    if (!this.dataIsReady) {
      this.dataIsReady = true;
      this.emit('dataIsReady'); // fired exactly once, on the first chunk
    }
    callback(null, chunk); // pass the chunk through unchanged
  }
}
Usage
import pEvent from 'p-event';
const dataReadyStream = sourceStream.pipe(new DataWaitPassThroughStream());
await pEvent(dataReadyStream, 'dataIsReady');
// Negotiate with the server about the transmission...
dataReadyStream.pipe(consumerStream);
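As an aside (not part of the original answer), the built-in 'readable' event can serve a similar purpose without a custom stream, since it signals that data has been buffered without consuming it. A rough sketch, assuming current Node stream semantics where a later pipe() resumes the flow from the buffered data:
import { once } from 'events';

await once(sourceStream, 'readable'); // data is buffered internally, nothing consumed yet
// Negotiate with the server about the transmission...
sourceStream.pipe(consumerStream);    // pipe() starts the flow, beginning with the buffered data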

socket.io how to send multiple messages sequentially?

I'm using socket.io like this
Client:
socket.on('response', function(i){
  console.log(i);
});
socket.emit('request', whateverdata);
Server:
socket.on('request', function(whateverdata){
  for (let i = 0; i < 10000; i++){
    console.log(i);
    socket.emit('response', i);
  }
  console.log("done!");
});
I need output like this when putting the two terminals side by side:
Server Client
0 0
1 1
. (etc) .
. .
9998 9998
9999 9999
done!
But instead I am getting this:
Server Client
0
1
. (etc)
.
9998
9999
done!
0
1
.
. (etc)
9998
9999
Why?
Shouldn't Socket.IO / Node emit the message immediately, not wait for the loop to complete before emitting any of them?
Notes:
The for loop is very long and computationally slow.
This question is referring to the socket.io library, not websockets in general.
Due to latency, waiting for confirmation from the client before sending each response is not possible
The order that the messages are received is not important, only that they are received as quickly as possible
The server emits them all in a loop and it takes a small bit of time for them to get to the client and get processed by the client in another process. This should not be surprising.
It is also possible that the single-threaded nature of JavaScript in node.js prevents the emits from actually getting sent until your JavaScript loop finishes. That would take a detailed examination of the socket.io code to know for sure. As I said before, if you want 1,1 then 2,2 then 3,3 instead of 1,2,3 sent and then 1,2,3 received, you have to write code to force that.
If you want the client to receive the first before the server sends the 2nd, then you have to make the client send a response to the first and have the server not send the 2nd until it receives the response from the first. This is all async networking. You don't control the order of events in different processes unless you write specific code to force a particular sequence.
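A sketch of that request/response pacing using socket.io acknowledgement callbacks (the client receives an ack function as the last argument and calls it, and the server only emits the next item once the ack arrives); this is illustrative, not taken from the question's code:
// server: emit one item, wait for the client's ack, then emit the next
function sendSequentially(socket, i) {
  if (i >= 10000) return console.log('done!');
  console.log(i);
  socket.emit('response', i, function () { // passing a callback requests an acknowledgement
    sendSequentially(socket, i + 1);
  });
}

// client: log the item, then call the ack so the server continues
socket.on('response', function (i, ack) {
  console.log(i);
  ack();
});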
Also, how do you have client and server in the same console anyway? Unless you are writing out precise timestamps, you wouldn't be able to tell exactly what event came before the other in two separate processes.
One thing you could try is to send 10, then do a setTimeout(fn, 1) to send the next 10 and so on. That would give JS a chance to breathe and perhaps process some other events that are waiting for your loop to finish, allowing the packets to actually get sent.
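A rough sketch of that batching idea, using the same 'request'/'response' events as in the question (the batch size and delay are arbitrary):
socket.on('request', function (whateverdata) {
  let i = 0;
  (function sendBatch() {
    const end = Math.min(i + 10, 10000);
    for (; i < end; i++) {
      console.log(i);
      socket.emit('response', i);
    }
    if (i < 10000) {
      setTimeout(sendBatch, 1); // yield so buffered packets can actually be flushed
    } else {
      console.log('done!');
    }
  })();
});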
There's another networking issue too. By default, TCP tries to batch up your sends (at the lowest TCP level). Each time you send, it sets a short timer and doesn't actually send until that timer fires. If more data arrives before the timer fires, it just adds that data to the "pending" packet and sets the timer again. This is referred to as Nagle's algorithm. You can disable this "feature" on a per-socket basis with socket.setNoDelay(). You have to call that on the actual TCP socket.
I am seeing some discussion that Nagle's algorithm may already be turned off for socket.io (by default). Not sure yet.
In stepping through the process of socket.io's .emit(), there are some cases where the socket is marked as not yet writable. In those cases, the packets are added to a buffer and will be processed "later" on some future tick of the event loop. I cannot see exactly what puts the socket temporarily in this state, but I've definitely seen it happen in the debugger. When it's that way, a tight loop of .emit() will just buffer and won't send until you let other events in the event loop process. This is why doing setTimeout(fn, 0) every so often to keep sending will then let the prior packets process. There's some other event that needs to get processed before socket.io makes the socket writable again.
The issue occurs in the flush() method in engine.io (the transport layer for socket.io). Here's the code for .flush():
Socket.prototype.flush = function () {
  if ('closed' !== this.readyState &&
      this.transport.writable &&
      this.writeBuffer.length) {
    debug('flushing buffer to transport');
    this.emit('flush', this.writeBuffer);
    this.server.emit('flush', this, this.writeBuffer);
    var wbuf = this.writeBuffer;
    this.writeBuffer = [];
    if (!this.transport.supportsFraming) {
      this.sentCallbackFn.push(this.packetsFn);
    } else {
      this.sentCallbackFn.push.apply(this.sentCallbackFn, this.packetsFn);
    }
    this.packetsFn = [];
    this.transport.send(wbuf);
    this.emit('drain');
    this.server.emit('drain', this);
  }
};
What happens sometimes is that this.transport.writable is false. And, when that happens, it does not send the data yet. It will be sent on some future tick of the event loop.
From what I can tell, it looks like the issue may be here in the WebSocket code:
WebSocket.prototype.send = function (packets) {
  var self = this;

  for (var i = 0; i < packets.length; i++) {
    var packet = packets[i];
    parser.encodePacket(packet, self.supportsBinary, send);
  }

  function send (data) {
    debug('writing "%s"', data);

    // always creates a new object since ws modifies it
    var opts = {};
    if (packet.options) {
      opts.compress = packet.options.compress;
    }

    if (self.perMessageDeflate) {
      var len = 'string' === typeof data ? Buffer.byteLength(data) : data.length;
      if (len < self.perMessageDeflate.threshold) {
        opts.compress = false;
      }
    }

    self.writable = false;
    self.socket.send(data, opts, onEnd);
  }

  function onEnd (err) {
    if (err) return self.onError('write error', err.stack);
    self.writable = true;
    self.emit('drain');
  }
};
Here you can see that the .writable property is set to false when data is sent, until confirmation comes back that the data has been written. So, when rapidly sending data in a loop, the event that signals the data was successfully sent may not get a chance to come through. When you do a setTimeout() to let things in the event loop get processed, that confirmation event comes through, the .writable property gets set to true again, and data can again be sent immediately.
To be honest, socket.io is built out of so many abstraction layers across dozens of modules that it's very difficult code to debug or analyze on GitHub, so it's hard to be sure of the exact explanation. I did definitely see the .writable flag as false in the debugger, which did cause a delay, so this seems like a plausible explanation to me. I hope this helps.

What's the node.js paradigm for socket stream conversation?

I'm trying to implement a socket protocol and it is unclear to me how to proceed. I have the socket as a Stream object, and I am able to write() data to it to send on the socket, and I know that the "readable" or "data" events can be used to receive data. But this does not work well when the protocol involves a conversation in which one host is supposed to send a piece of data, wait for a response, and then send data again after the response.
In a block paradigm it would look like this:
send some data
wait for specific data reply
massage data and send it back
send additional data
As far as I can tell, node's Stream object does not have a read function that will asynchronously return with the number of bytes requested. Otherwise, each wait could just put the remaining functionality in its own callback.
What is the node.js paradigm for this type of communication?
Technically there is Readable.read(), but it's not recommended here (you can't be sure how much data you'll get back from it, among other issues). You can keep track of state and, on each data event, append to a Buffer that you keep processing incrementally. You can use readUInt32LE etc. on the Buffer to read specific pieces of binary data if you need to do that (or you can convert to string if it's textual data). https://github.com/runvnc/metastream/blob/master/index.js
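A minimal sketch of that incremental-buffer approach, assuming (purely for illustration) that each message on the wire is framed as a 4-byte little-endian length followed by that many bytes of payload:
let pending = Buffer.alloc(0);

socket.on('data', (chunk) => {
  pending = Buffer.concat([pending, chunk]);
  // keep extracting complete frames; anything left over waits for the next 'data' event
  while (pending.length >= 4) {
    const len = pending.readUInt32LE(0);
    if (pending.length < 4 + len) break;        // frame not complete yet
    const message = pending.slice(4, 4 + len);  // one complete message
    pending = pending.slice(4 + len);
    handleMessage(message);                     // hypothetical handler for your protocol
  }
});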
If you want to write it in your 'block paradigm', you could basically make some things a promise or async function and then
let specialReplyRes = null;
const waitForSpecialReply = () => new Promise(res => { specialReplyRes = res; });

stream.on('data', (buff) => {
  // resolve the pending promise once the "special" reply shows up
  if (buff.toString().indexOf('special') >= 0) specialReplyRes(buff.toString());
});

// ...

async function proto() {
  stream.write(data);
  let reply = await waitForSpecialReply();
  const message = massage(reply);
  stream.write(message);
}
Here the resolver for the waitForSpecialReply promise is stored and called once a certain message is detected by your parsing code.

watching streaming HTTP response progress in NodeJS, express

I want to stream sizeable files in NodeJS 0.10.x using express#4.8.5 and pipes. Currently I'm
doing it like this (in CoffeeScript):
app.get '/', ( request, response ) ->
  input = P.create_readstream route
  input
    .pipe P.$split()
    .pipe P.$trim()
    .pipe P.$skip_empty()
    .pipe P.$skip_comments()
    .pipe P.$parse_csv headers: no, delimiter: '\t'
    .pipe response
(P is pipedreams.)
What I would like to have is something like
.pipe count_bytes # ???
.pipe response
.pipe report_progress response
so that when I look at the server running in the terminal, I get some indication of how many bytes have been
accepted by the client. Right now it is very annoying to see the client loading for ages without having
any indication of whether the transmission will be done in a minute or tomorrow.
Is there any middleware to do that? I couldn't find any.
Oh, and do I have to call anything on response completion? It does look like it's working automagically right now.
For your second question, you don't have to close anything. The pipe function handles everything for you, even throttling of the streams (if the source stream has more data than the client can handle due to a poor download speed, it will pause the source stream until the client can consume it again, instead of using a bunch of memory server-side by reading the whole source up front).
For your first question, to get some server-side stats on your streams, you could use a Transform stream like this:
var Transform = require('stream').Transform;
var inherits = require('util').inherits;

function StatsStream(ip, options) {
  Transform.call(this, options);
  this.ip = ip;
}
inherits(StatsStream, Transform);

StatsStream.prototype._transform = function(chunk, encoding, callback) {
  // here some bytes have been read from the source and are
  // ready to go to the destination, do your logging here
  console.log('flowing ', chunk.length, 'bytes to', this.ip);

  // then tell the transform stream that the bytes it should
  // send to the destination are the same chunk you received...
  // (and that no error occurred)
  callback(null, chunk);
};
Then in your request handlers you can pipe like this (sorry, JavaScript):
input.pipe(new StatsStream(req.ip)).pipe(response)
I wrote this off the top of my head, so beware :)
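If you want a running total rather than per-chunk sizes, a variant sketch (again illustrative, not from the original answer) could accumulate the byte count in the same _transform hook:
StatsStream.prototype._transform = function(chunk, encoding, callback) {
  this.total = (this.total || 0) + chunk.length; // cumulative bytes that have flowed to this client
  console.log(this.total, 'bytes sent so far to', this.ip);
  callback(null, chunk);
};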

NodeJS sockets initialized as unpaused?

A net.Socket object in NodeJS is a Readable Stream; however, one note in the docs got me concerned:
For the Net.Socket 'data' event, the docs say
Note that the data will be lost if there is no listener when a Socket emits a 'data' event.
That seems to imply a Socket is returned to the calling script in "flowing-mode" and already un-paused? However, for a generic Readable Stream, the documentation for the 'data' event says
If you attach a data event listener, then it will switch the stream into flowing mode, and data will be passed to your handler as soon as it is available.
That "If" seems to imply if you wait a bit to bind to the 'data' event, the stream will wait for you, and if you intentionally want to miss the 'data' events, the example in the resume() method seems to indicate you must call the resume() method to start the flow of data.
My concern is that when working with a net.Server, when you receive a net.Socket as part of a 'connection' event, is it imperative that you start handling the 'data' events right away since it's already opened? Meaning if I do:
var s = new net.Server();
s.on('connection', function(socket) {
  // Do some lengthy setup process here, blocking execution for a few seconds...
  socket.on('data', function(d) { console.log(d); });
});
s.listen(8080);
Meaning, if I don't bind to the 'data' event right away, could I lose data? And is this a more robust way to handle incoming connections if you have a lengthy setup required for each one?
var s = new net.Server();
s.on('connection', function(socket) {
  socket.pause(); // Not ready for you yet!
  // Do some lengthy setup process here, blocking execution for a few seconds...
  socket.on('data', function(d) { console.log(d); });
  socket.resume(); // Okay, go!
});
s.listen(8080);
Anyone have experience working with listening on raw socket streams to know if this data loss is an issue?
I'm hoping this is an instance where the net.Socket documentation wasn't updated after v0.10, since the stream documentation has a section mentioning that 'data' events started emitting right away in versions prior to 0.10. Were TCP sockets properly updated so that they no longer start emitting 'data' right away, with the documentation simply not updated to match?
Yes, this is a flaw in the docs. Here is an example:
var net = require('net')

var server = net.createServer(onConnection)

function onConnection (socket) {
  console.log('onConnection')

  setTimeout(startReading, 1000)

  function startReading () {
    socket.on('data', read)
    socket.on('end', stopReading)
  }

  function stopReading () {
    socket.removeListener('data', read)
    socket.removeListener('end', stopReading)
  }
}

function read (data) {
  console.log('Received: ' + data.toString('utf8'))
}

server.listen(1234, onListening)

function onListening () {
  console.log('onListening')
  net.connect(1234, onConnect)
}

function onConnect () {
  console.log('onConnect')
  this.write('1')
  this.write('2')
  this.write('3')
  this.write('4')
  this.write('5')
  this.write('6')
}
All the data is received. If you explicitly resume() the socket without a 'data' listener attached, though, you will lose it.
Also, if you do your "lengthy" setup in a blocking manner (which you shouldn't), you can't lose any I/O, because it has no chance to be processed, so no events will be emitted.
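If the per-connection setup is genuinely asynchronous rather than blocking, a sketch of the explicit pause/resume pattern follows (doSetup is a hypothetical async setup function; the explicit pause() is arguably redundant on current Node, where sockets start out paused, but it makes the intent clear):
var net = require('net');

var server = net.createServer(function (socket) {
  socket.pause();                    // incoming data is buffered while we set up
  doSetup(socket, function (err) {   // hypothetical async setup
    if (err) return socket.destroy();
    socket.on('data', function (d) { console.log(d); });
    socket.resume();                 // now let the buffered data flow
  });
});

server.listen(8080);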
