NodeJS streams and premature end - node.js

Assuming a Readable Stream in NodeJS and a Data (on('data', ...)) event handler tied to it that is relatively slow, is it possible for the End event to fire before the last Data handler(s) have finished, and if so, will it prematurely terminate that handler? Or will all Data events get dispatched and run?
In my case, I am working with large files and want to commit to a DB every data chunk. I am worried that I may lose the last record or two (or more) if End is fired before the last DB calls in the handler actually complete.

The 'end' event fires after the last 'data' event, but it may fire before the last 'data' handler has finished its (asynchronous) work. It is also possible that the next 'data' handler starts before the previous one has finished, and depending on what your handler does, a later 'data' callback may finish before an earlier one. This can cause errors and subtle bugs in your code.
An example of how to cause this problem (try it yourself):
var fs = require('fs');
var rr = fs.createReadStream('somebigfile.jpg');
var i = 0;
rr.on('data', function(chunk) {
  i++;
  var s = i;
  console.log('data: ' + s);
  // Later chunks get shorter timeouts, so they "finish" earlier
  setTimeout(function() {
    console.log('timeout: ' + s);
  }, 50 - i * 10);
});
rr.on('end', function() {
  console.log('end');
});
This prints a line when each 'data' handler starts, and another line some milliseconds later when its timeout finishes. The finish lines can appear in a different order than the start lines, and 'end' can appear before the last of them.
Solution:
Readable streams have two modes, 'flowing mode' and 'paused mode'. When you add a 'data' event handler, you automatically switch the stream to flowing mode.
From the documentation:
When in flowing mode, data is read from the underlying system and
provided to your program as fast as possible
In this mode, events will not wait for your slow actions to finish. What you need is 'paused mode'.
From the documentation:
In paused mode, you must explicitly call stream.read() to get chunks
of data out. Streams start out in paused mode.
In other words: you demand a chunk of data, you get it, you work with it, and when you are ready you ask for the next chunk. In this mode you control when you receive your data.
How to change to 'paused mode':
It is the default mode for a stream, but registering a 'data' event handler switches it to 'flowing mode'. So do not use readstream.on('data', ...).
Instead, use readstream.on('readable', function() {...}); when it fires, the stream is ready to give you a chunk of data, which you get by calling var chunk = readstream.read();
Example from docs:
var fs = require('fs');
var rr = fs.createReadStream('foo.txt');
rr.on('readable', function() {
  console.log('readable:', rr.read());
});
rr.on('end', function() {
  console.log('end');
});
Please read the documentation for more details, because there are more situations in which a stream is automatically switched to 'flowing mode'.
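Applied to the original question, here is a minimal sketch of paused-mode processing, where saveChunkToDb(chunk, callback) is a hypothetical asynchronous DB call standing in for your own:
var fs = require('fs');
var rs = fs.createReadStream('somebigfile.dat');
var busy = false;

function processNext() {
  if (busy) return;           // a DB write is still in flight
  var chunk = rs.read();
  if (chunk === null) return; // buffer empty; wait for the next 'readable'
  busy = true;
  saveChunkToDb(chunk, function(err) { // hypothetical async DB write
    if (err) throw err;
    busy = false;
    processNext(); // pull the next chunk only after the write finished
  });
}

rs.on('readable', processNext);
rs.on('end', function() {
  console.log('all data has been read');
});
No chunk is lost here, because a new chunk is requested only after the previous one has been committed.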
Working with slow handlers in flowing mode:
If you want or need to work in 'flowing mode', there is also a solution: you can pause and resume the stream. When you get a chunk from the 'data' event, pause the stream, and when you finish your work, resume it.
Example from documentation:
var readable = getReadableStreamSomehow();
readable.on('data', function(chunk) {
  console.log('got %d bytes of data', chunk.length);
  readable.pause();
  console.log('there will be no more data for 1 second');
  setTimeout(function() {
    console.log('now data will start flowing again');
    readable.resume();
  }, 1000);
});

Related

Why while loop is needed for reading a non-flowing mode stream in Node.js?

In the node.js documentation, I came across the following code
const readable = getReadableStreamSomehow();
// 'readable' may be triggered multiple times as data is buffered in
readable.on('readable', () => {
  let chunk;
  console.log('Stream is readable (new data received in buffer)');
  // Use a loop to make sure we read all currently available data
  while (null !== (chunk = readable.read())) {
    console.log(`Read ${chunk.length} bytes of data...`);
  }
});
// 'end' will be triggered once when there is no more data available
readable.on('end', () => {
  console.log('Reached end of stream.');
});
Here is the comment from the node.js documentation concerning the usage of the while loop, saying it's needed to make sure all of the data is read:
// Use a loop to make sure we read all currently available data
while (null !== (chunk = readable.read())) {
I couldn't understand why it is needed, so I tried to replace while with just an if statement, and the process terminated after the very first read. Why?
From the node.js documentation:
The readable.read() method should only be called on Readable streams operating in paused mode. In flowing mode, readable.read() is called automatically until the internal buffer is fully drained.
Be careful: this method is only meant for streams operating in paused mode.
And going further, if you understand what a stream is, you'll understand that the data arrives in chunks that have to be processed piece by piece.
Each call to readable.read() returns a chunk of data, or null. The chunks are not concatenated. A while loop is necessary to consume all data currently in the buffer.
So I hope you can see that if you do not loop over your readable stream and execute only a single read, you won't get all of your data.
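For example, a small sketch that counts the bytes in a file: with a single read() per 'readable' event some buffered chunks would be missed, while the loop drains everything that is currently available:
const fs = require('fs');
const rr = fs.createReadStream('foo.txt');
let total = 0;

rr.on('readable', () => {
  let chunk;
  // one 'readable' event can cover several buffered chunks
  while (null !== (chunk = rr.read())) {
    total += chunk.length;
  }
});

rr.on('end', () => {
  console.log(`read ${total} bytes in total`);
});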
Ref: https://nodejs.org/api/stream.html

Access of global variables in setImmediate in node.js

Below is a piece of code:
var buffer = new Buffer(0, 'hex'); // Global buffer
socket.on('data', function(data) {
  // Concatenate the received data to buffer
  buffer = Buffer.concat([buffer, new Buffer(data, 'hex')]);
  setImmediate(function() { // Executed asynchronously
    /* Process messages received in buffer */
    var messageLength = getMessageLength(buffer);
    while (buffer.length >= messageLength) {
      /* Process message and send response */
    }
    // Remove message from buffer after processing is done
    // (Buffers have no splice; slice off the consumed bytes instead)
    buffer = buffer.slice(messageLength);
  }); // End of setImmediate
}); // End of socket.on
I am using a global variable, 'buffer', inside the setImmediate block (which is executed asynchronously). Is there a guarantee that the global buffer variable does not change (either through addition or deletion of data) during the execution of the code in the setImmediate block? If not, how can I make sure the buffer is accessed safely?
The oft-repeated saying "NodeJS is single-threaded" means there is no question of "safety" here. Simultaneous accesses to a variable are not possible because simultaneous operations do not occur. Even though the setImmediate code is executed asynchronously, that does not mean it is executed at the SAME TIME. It just means it is executed "soon". The parent function can return before this happens - but the parent function is not running when the anonymous setImmediate callback is triggered. At that time, the callback is the only thing running.
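A small illustration of this run-to-completion behaviour (not from the original code, just a demo):
var x = 0;

setImmediate(function() {
  var before = x;
  // nothing else can run until this callback returns, so x cannot
  // be changed by other code in the middle of this function
  for (var i = 0; i < 1e7; i++) {} // simulate synchronous work
  console.log(before === x); // always prints true
});

setImmediate(function() {
  x = 42; // runs only after the first callback has fully returned
});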
These operations are thus safe, but for what it's worth, it's not very efficient. NodeJS buffers are fixed-length, which is why you need to keep re-allocating a new one to append data. They're suitable for one-time loads but not really ideal for constant append operations. Consider using a readable stream. This allows you to pull out and process any length of data you want at a time, and it can return a buffer, but internally it does not constantly re-allocate its storage block for the data read.
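As a sketch of that stream-based approach, assuming fixed-size messages and a hypothetical handleMessage function (a socket is a duplex stream, so socket.read(n) is available):
var MESSAGE_LENGTH = 64; // assumed fixed message size

socket.on('readable', function() {
  var message;
  // read(n) returns null until at least n bytes are buffered, so
  // complete messages come out without any manual Buffer.concat
  while ((message = socket.read(MESSAGE_LENGTH)) !== null) {
    handleMessage(message); // hypothetical message handler
  }
});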

How do I prevent node.js from waiting for keyboard input?

I was trying to write a node.js script that only takes input from stdin if it's piped (as opposed to waiting for keyboard input). Therefore I need to determine whether the stdin being piped in is null.
First I tried using the readable event:
var s = process.stdin;
s.on('readable', function () {
  console.log('Event "readable" is fired!');
  var chunk = s.read();
  console.log(chunk);
  if (chunk === null) s.pause();
});
And the result is as expected:
$ node test.js
Event "readable" is fired!
null
$
Then I tried to do the same thing using the data event, because I like to use flowing mode:
var s = process.stdin;
s.on('data', function (chunk) {
  console.log('Event "data" is fired!');
  console.log(chunk);
  if (chunk === null) s.pause();
});
but this time it waited for keyboard input before the null check, and got stuck there. I was wondering why it does that? Does that mean that in order to do a null check, I need to pause the stream first, wait for readable to be fired, do the null check, and then resume the stream, just to prevent node.js from waiting for keyboard input? That seems awkward to me. Is there a way to avoid using the readable event?
Use tty.isatty(0) from the node core library (file descriptor 0 is stdin). That function will return false if stdin is a pipe rather than an interactive terminal.
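A minimal sketch of that check (process.stdin.isTTY exposes the same information):
var tty = require('tty');

if (tty.isatty(0)) {
  // stdin is an interactive terminal; don't wait for keyboard input
  console.log('no piped input');
} else {
  // stdin is a pipe or a redirected file, so it is safe to consume
  process.stdin.on('data', function (chunk) {
    process.stdout.write(chunk);
  });
}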

Pausing a non-flowing stream (delaying readable event)

On one end I have a http request (long polling). On the other end is a "game server" dispatching events. These ends are tied together with a duplex non-flowing stream in object mode and that part works fine.
The long poll end listens on readable and drains the stream by calling stream.read repeatedly. Then closes the client's connection.
The game server uses stream.write to push events to the clients.
Some events in the game actually span several events, and here's the problem:
When the game server adds several events at once (calling stream.write repeatedly), the first write triggers readable, and the long poll is filled with just that one event and closed. That's very inconvenient.
The essence of the problem is that I can't silence readable and then trigger it once I'm done writing.
So my question is: can I somehow "pause" the stream and resume it afterwards?
Is there another known solution to this problem?
My best bet is to write an array of events, but I think that's somehow an antipattern.
Here's some code to illustrate my problem:
var stream = require('stream');
var connection = stream.PassThrough({ objectMode: true });

var exhaust = function() {
  console.log('exhausting');
  var chunk;
  while ((chunk = connection.read()) !== null) {
    console.log(chunk);
  }
  console.log('exhausting end');
};

connection.on('readable', function() {
  console.log('Ready to read');
  exhaust();
});

for (var i = 0; i < 10; i++) {
  connection.write({ test: true });
}
I ended up writing the array of events in a single write, but I'm looking forward to this upcoming feature, which could be the missing solution:
http://strongloop.com/strongblog/performance-node-js-v-0-12-whats-new/
http://nodejs.org/docs/v0.11.10/api/stream.html#stream_writable_cork
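A sketch of how cork/uncork might batch the writes in the example above, assuming a Node version where writable.cork() is available (the exact buffering behaviour can vary between versions):
connection.cork(); // buffer subsequent writes instead of flushing each one
for (var i = 0; i < 10; i++) {
  connection.write({ test: true });
}
connection.uncork(); // flush all ten events at once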

Better way to make node not exit?

In a node program I'm reading from a file stream with fs.createReadStream. But when I pause the stream the program exits. I thought the program would keep running since the file is still open, just not being read.
Currently to get it to not exit I'm setting an interval that does nothing.
setInterval(function() {}, 10000000);
When I'm ready to let the program exit, I clear it. But is there a better way?
Example Code where node will exit:
var fs = require('fs');
var rs = fs.createReadStream('file.js');
rs.pause();
Node will exit when there is no more queued work. Calling pause on a ReadableStream simply pauses the data event. At that point, there are no more events being emitted and no outstanding work requests, so Node will exit. The setInterval works since it counts as queued work.
Generally this is not a problem since you will probably be doing something after you pause that stream. Once you resume the stream, there will be a bunch of queued I/O and your code will execute before Node exits.
Let me give you an example. Here is a script that exits without printing anything:
var fs = require('fs');
var rs = fs.createReadStream('file.js');
rs.pause();

rs.on('data', function (data) {
  console.log(data); // never gets executed
});
The stream is paused, there is no outstanding work, and my callback never runs.
However, this script does actually print output:
var fs = require('fs');
var rs = fs.createReadStream('file.js');
rs.pause();

rs.on('data', function (data) {
  console.log(data); // prints stuff
});

rs.resume(); // queues I/O
In conclusion, as long as you are eventually calling resume later, you should be fine.
Short way based on the answers:
require('fs').createReadStream('file.js').pause();
