Pausing a non-flowing stream (delaying readable event) - node.js

On one end I have a http request (long polling). On the other end is a "game server" dispatching events. These ends are tied together with a duplex non-flowing stream in object mode and that part works fine.
The long poll end listens on readable and drains the stream by calling stream.read repeatedly. Then closes the client's connection.
The game server uses stream.write to push events to the clients.
Some events in the game actually spand several events and here's the problem:
When the game server adds several events at once (calling stream.write repeatedly) the first write triggers readable and the long poll is filled with the event and closed. That's very inconvinient.
The essential of the problem is that I can't silent readable and then trigger it when I'm done writing.
So my question is; can I somehow "pause" the stream and resume afterwards?
Is there a known another solution to this problem?
My best bet is to write an array of events, but I think that's somehow an antipattern.
Here's some code to illustrate my problem:
var stream = require('stream');
var connection = stream.PassThrough({ objectMode: true });
var exhaust = function() {
console.log('exhausting');
var chunk;
while ((chunk = connection.read()) !== null)
console.log(chunk);
console.log('exhausting end');
}
connection.on('readable', function(){
console.log('Ready to read');
exhaust();
});
for (var i = 0;i < 10;i++)
connection.write({ test: true });

I ended up writing an array of events to write, but I'm looking forward to this upcoming feature, which could be the missin solution:
http://strongloop.com/strongblog/performance-node-js-v-0-12-whats-new/
http://nodejs.org/docs/v0.11.10/api/stream.html#stream_writable_cork

Related

socket.io how to send multiple messages sequentially?

I'm using socket.io like this
Client:
socket.on('response', function(i){
console.log(i);
});
socket.emit('request', whateverdata);
Server:
socket.on('request', function(whateverdata){
for (i=0; i<10000; i++){
console.log(i);
socket.emit('response', i);
}
console.log("done!");
});
I need output like this when putting the two terminals side by side:
Server Client
0 0
1 1
. (etc) .
. .
9998 9998
9999 9999
done!
But instead I am getting this:
Server Client
0
1
. (etc)
.
9998
9999
done!
0
1
.
. (etc)
9998
9999
Why?
Shouldn't Socket.IO / Node emit the message immediately, not wait for the loop to complete before emitting any of them?
Notes:
The for loop is very long and computationally slow.
This question is referring to the socket.io library, not websockets in general.
Due to latency, waiting for confirmation from the client before sending each response is not possible
The order that the messages are received is not important, only that they are received as quickly as possible
The server emits them all in a loop and it takes a small bit of time for them to get to the client and get processed by the client in another process. This should not be surprising.
It is also possible that the single-threaded nature of Javascript in node.js prevents the emits from actually getting sent until your Javascript loop finishes. That would take detailed examination of socket.io code to know for sure if that is an issue. As I said before if you want to 1,1 then 2,2 then 3,3 instead of 1,2,3 sent, then 1,2,3 received you have to write code to force that.
If you want the client to receive the first before the server sends the 2nd, then you have to make the client send a response to the first and have the server not send the 2nd until it receives the response from the first. This is all async networking. You don't control the order of events in different processes unless you write specific code to force a particular sequence.
Also, how do you have client and server in the same console anyway? Unless you are writing out precise timestamps, you wouldn't be able to tell exactly what event came before the other in two separate processes.
One thing you could try is to send 10, then do a setTimeout(fn, 1) to send the next 10 and so on. That would give JS a chance to breathe and perhaps process some other events that are waiting for you to finish to allow the packets to get sent.
There's another networking issue too. By default TCP tries to batch up your sends (at the lowest TCP level). Each time you send, it sets a short timer and doesn't actually send until that timer fires. If more data arrives before the timer fires, it just adds that data to the "pending" packet and sets the timer again. This is referred to as the Nagle's algorithm. You can disable this "feature" on a per-socket basis with socket.setNoDelay(). You have to call that on the actual TCP socket.
I am seeing some discussion that Nagle's algorithm may already be turned off for socket.io (by default). Not sure yet.
In stepping through the process of socket.io's .emit(), there are some cases where the socket is marked as not yet writable. In those cases, the packets are added to a buffer and will be processed "later" on some future tick of the event loop. I cannot see exactly what puts the socket temporarily in this state, but I've definitely seen it happen in the debugger. When it's that way, a tight loop of .emit() will just buffer and won't send until you let other events in the event loop process. This is why doing setTimeout(fn, 0) every so often to keep sending will then let the prior packets process. There's some other event that needs to get processed before socket.io makes the socket writable again.
The issue occurs in the flush() method in engine.io (the transport layer for socket.io). Here's the code for .flush():
Socket.prototype.flush = function () {
if ('closed' !== this.readyState &&
this.transport.writable &&
this.writeBuffer.length) {
debug('flushing buffer to transport');
this.emit('flush', this.writeBuffer);
this.server.emit('flush', this, this.writeBuffer);
var wbuf = this.writeBuffer;
this.writeBuffer = [];
if (!this.transport.supportsFraming) {
this.sentCallbackFn.push(this.packetsFn);
} else {
this.sentCallbackFn.push.apply(this.sentCallbackFn, this.packetsFn);
}
this.packetsFn = [];
this.transport.send(wbuf);
this.emit('drain');
this.server.emit('drain', this);
}
};
What happens sometimes is that this.transport.writable is false. And, when that happens, it does not send the data yet. It will be sent on some future tick of the event loop.
From what I can tell, it looks like the issue may be here in the WebSocket code:
WebSocket.prototype.send = function (packets) {
var self = this;
for (var i = 0; i < packets.length; i++) {
var packet = packets[i];
parser.encodePacket(packet, self.supportsBinary, send);
}
function send (data) {
debug('writing "%s"', data);
// always creates a new object since ws modifies it
var opts = {};
if (packet.options) {
opts.compress = packet.options.compress;
}
if (self.perMessageDeflate) {
var len = 'string' === typeof data ? Buffer.byteLength(data) : data.length;
if (len < self.perMessageDeflate.threshold) {
opts.compress = false;
}
}
self.writable = false;
self.socket.send(data, opts, onEnd);
}
function onEnd (err) {
if (err) return self.onError('write error', err.stack);
self.writable = true;
self.emit('drain');
}
};
Where you can see that the .writable property is set to false when some data is sent until it gets confirmation that the data has been written. So, when rapidly sending data in a loop, it may not be letting the event come through that signals that the data has been successfully sent. When you do a setTimeout() to let some things in the event loop get processed that confirmation event comes through and the .writable property gets set to true again so data can again be sent immediately.
To be honest, socket.io is built of so many abstract layers across dozens of modules that it's very difficult code to debug or analyze on GitHub so it's hard to be sure of the exact explanation. I did definitely see the .writable flag as false in the debugger which did cause a delay so this seems like a plausible explanation to me. I hope this helps.

NodeJS streams and premature end

Assuming a Readable Stream in NodeJS and a Data (on('data', ...)) event handler tied to it that is relatively slow, is it possible for the End event to fire before the last Data handler(s) has finished, and if so, will it prematurely terminate that handler? Or, will all Data events get dispatched and run?
In my case, I am working with large files and want to commit to a DB every data chunk. I am worried that I may lose the last record or two (or more) if End is fired before the last DB calls in the handler actually complete.
Event 'end' fire after last 'data' event. But it may happend before the last Data handler has finished. It is possible that before one 'data' handler has finished, next is started. It depends of what you have in your code, but it is possible that later call of event 'data' finish before earlier. It may cause errors and problems in your code.
Example how to cause problems (to your own tests):
var fs = require('fs');
var rr = fs.createReadStream('somebigfile.jpg');
var i=0;
rr.on('data', function(chunk) {
i++;
var s = i;
console.log('readable:' + s);
setTimeout(function(){
console.log('timeout:'+s);
}, 50-i*10);
});
rr.on('end', function() {
console.log('end');
});
It will print in your console when start each 'data' event handler. And after some miliseconds when it finish. Finish may be in different order.
Solution:
Readable Streams have two modes 'flowing mode' and a 'paused mode'. When you add 'data' event handler, you auto set Readable Streams to flowing mode.
From documentation :
When in flowing mode, data is read from the underlying system and
provided to your program as fast as possible
In this mode events will not wait for your slow actions to finish. For your need is 'paused mode'.
From documentation:
In paused mode, you must explicitly call stream.read() to get chunks
of data out. Streams start out in paused mode.
In other words: you demand chunk of data, you get it, you work with it, and when you ready you ask for new chunk of data. In this mode you controll when you want to get your data.
How to change to 'paused mode':
It is default mode for this stream. But when you register 'data' event handler it switch to 'flowing mode'. Therefore not use readstream.on('data',...)
Instead use readstream.on('readable', function(){...}) when it fire, then it means that stream is ready to give chunk of data. To get chunk of data use var chunk = readstream.read();
Example from docs:
var fs = require('fs');
var rr = fs.createReadStream('foo.txt');
rr.on('readable', function() {
console.log('readable:', rr.read());
});
rr.on('end', function() {
console.log('end');
});
Please read documentation for more details, because there are more posibilities when stream is auto switched to 'flowing mode'.
Work with slow handlers and flowing mode:
If you want/need work in 'flowing mode', there is also solution. You can pause and resume stream. When you get chunk form readstream('data'), pause stream and when you finish work then resume it.
Example from documentation:
var readable = getReadableStreamSomehow();
readable.on('data', function(chunk) {
console.log('got %d bytes of data', chunk.length);
readable.pause();
console.log('there will be no more data for 1 second');
setTimeout(function() {
console.log('now data will start flowing again');
readable.resume();
}, 1000);
});

How do I test if a stream has ended?

I have a stream that may or may not have already ended. If it has ended, I don't want to sit there forever waiting for the end event. How do I check?
As of node 12.9.0, readable streams have a readableEnded property you can check – see readable.readableEnded in the Stream documentation.
(Writable streams have a corresponding property)
If you absolutely have to, you can check stream._readableState.ended, at least in node 0.10 and 0.12. This isn't public API though so be careful. Also apparently all streams aren't guarantee to have this property too:
Worth reading on this:
https://github.com/hapijs/hapi/issues/2368
https://github.com/nodejs/node/issues/445
You probably shouldn't check, and instead attach an end event listener. When the stream ends you will get an end event.
If a [readable]stream is implemented correctly, you will get an end event when there is nothing left to read. For open-ended streams that don't end, you could implement an activity timer to see how long it's been since the stream was last readable, and decide after a certain amount of time to remove your listeners.
(function() {
var timeout = null;
var WAIT_TIME = 10 * 1000; // 10 Sec
function stopListening( ) {
stream.off('data', listener);
}
function listener( data ) {
clearTimeout(timeout);
timeout = setTimeout(stopListening, WAIT_TIME);
// DO STUFF
}
timeout = setTimeout(stopListening, WAIT_TIME);
stream.on('data', listener);
})();

NodeJs: Never emits "end" when reading a TCP Socket

I am pretty new to Node.Js and I'm using tcp sockets to communicate with a client. Since the received data is fragmented I noticed that it prints "ondata" to the console more than once. I need to be able to read all the data and concatenate it in order to implement the other functions. I read the following http://blog.nodejs.org/2012/12/20/streams2/ and thought I can use socket.on('end',...) for this purpose. But it never prints "end" to the console.
Here is my code:
Client.prototype.send = function send(req, cb) {
var self = this;
var buffer = protocol.encodeRequest(req);
var header = new Buffer(16);
var packet = Buffer.concat([ header, buffer ], 16 + buffer.length);
function cleanup() {
self.socket.removeListener('data', ondata);
self.socket.removeListener('error', onerror);
}
var body = '';
function ondata() {
var chunk = this.read() || '';
body += chunk;
console.log('ondata');
}
self.socket.on('readable', ondata);
self.socket.on('end', function() {
console.log('end');
});
function onerror(err) {
cleanup();
cb(err);
}
self.socket.on('error', onerror);
self.socket.write(packet);
};
The end event will handle the FIN package of the TCP protocol (in other words: will handle the close package)
Event: 'end'#
Emitted when the other end of the socket sends a FIN packet.
By default (allowHalfOpen == false) the socket will destroy its file descriptor once it has written out its pending write queue. However, by setting allowHalfOpen == true the socket will not automatically end() its side allowing the user to write arbitrary amounts of data, with the caveat that the user is required to end() their side now.
About FIN package: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination
The solution
I understand your problem, the network communication have some data transfer gaps and it split your message in some packages. You just want read your fully content.
For solve this problem i will recommend you create a protocol. Just send a number with the size of your message before and while the size of your concatenated message was less than total of your message size, keep concatenating :)
I have created a lib yesterday to simplify that issue: https://www.npmjs.com/package/node-easysocket
I hope it helps :)

NodeJS sockets initialized as unpaused?

A net.Socket object in NodeJS is a Readable Stream, however one note in the docs got me concerned:
For the Net.Socket 'data' event, the docs say
Note that the data will be lost if there is no listener when a Socket emits a 'data' event.
That seems to imply a Socket is returned to the calling script in "flowing-mode" and already un-paused? However, for a generic Readable Stream, the documentation for the 'data' event says
If you attach a data event listener, then it will switch the stream into flowing mode, and data will be passed to your handler as soon as it is available.
That "If" seems to imply if you wait a bit to bind to the 'data' event, the stream will wait for you, and if you intentionally want to miss the 'data' events, the example in the resume() method seems to indicate you must call the resume() method to start the flow of data.
My concern is that when working with a net.Server, when you receive a net.Socket as part of a 'connection' event, is it imperative that you start handling the 'data' events right away since it's already opened? Meaning if I do:
var s = new net.Server();
s.on('connection', function(socket) {
// Do some lengthy setup process here, blocking execution for a few seconds...
socket.on('data', function(d) { console.log(d); });
});
s.listen(8080);
Meaning not bind to the 'data' event right away, I could lose data? So is this a more robust way to handle incoming connections if you have a lengthy setup required for each one?
var s = new net.Server();
s.on('connection', function(socket) {
socket.pause(); // Not ready for you yet!
// Do some lengthy setup process here, blocking execution for a few seconds...
socket.on('data', function(d) { console.log(d); });
socket.resume(); // Okay, go!
});
s.listen(8080);
Anyone have experience working with listening on raw socket streams to know if this data loss is an issue?
I'm hoping this is an instance where the Net.Socket documentation wasn't updated since v0.10, since the stream documentation has a section that mentions 'data' events started emitting right away in versions prior to 0.10. Were TCP sockets properly updated to not start emitting 'data' packets right away, and the documentation not updated appropriately?
Yes, this is the docs flaw. Here is an example:
var net = require('net')
var server = net.createServer(onConnection)
function onConnection (socket) {
console.log('onConnection')
setTimeout(startReading, 1000)
function startReading () {
socket.on('data', read)
socket.on('end', stopReading)
}
function stopReading () {
socket.removeListener('data', read)
socket.removeListener('end', stopReading)
}
}
function read (data) {
console.log('Received: ' + data.toString('utf8'))
}
server.listen(1234, onListening)
function onListening () {
console.log('onListening')
net.connect(1234, onConnect)
}
function onConnect () {
console.log('onConnect')
this.write('1')
this.write('2')
this.write('3')
this.write('4')
this.write('5')
this.write('6')
}
All the data is received. If you explicitly resume() socket, you will lose it.
Also, if you do your "lengthy" setup in a blocking manner (which you shouldn't) you can't lose any IO as it has no chance to be processed, so no events will be emitted.

Resources