Can an event-based read function ever run out of order?

Given a situation where I use the Node.js readline library to iterate over each line of the STDIN stream, do some processing on it, and write it back out to STDOUT, as in the following example:
var readline = require('readline');

var rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
  terminal: false
});

function my_function(line) {
  var output = ...(line); // placeholder for the actual per-line processing
  process.stdout.write(output);
}

rl.on('line', my_function);
I'm concerned that the processing I'm doing will take very different amounts of time depending on the line content, so some lines will return very quickly while others take some time to sort out. Is it possible that my_function() will ever run out of order and hence cause the output stream to be scrambled? Should I be looking into using a synchronous loop of some kind instead of this asynchronous event handler?

The JavaScript execution itself is single-threaded, so as long as you're only performing synchronous operations inside the event handler, there is no problem.
If you are performing asynchronous operations inside the event handler, then it is possible that another 'line' event could be emitted before your asynchronous operation(s) are complete. In that case, you would need to rl.pause() first and then rl.resume() once you are finished with your asynchronous operations. However, this isn't foolproof since 'line' events could still be emitted after a rl.pause() if the current chunk of data read from the input stream had multiple line breaks.
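For example, a minimal sketch of that pause/resume pattern applied to the question's interface (doAsyncWork is a hypothetical stand-in for your asynchronous per-line processing):

rl.on('line', function(line) {
  rl.pause(); // best effort: lines already buffered may still be emitted
  doAsyncWork(line, function(err, output) { // hypothetical async processor
    if (err) throw err;
    process.stdout.write(output);
    rl.resume(); // ready for the next line
  });
});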
So if you are performing asynchronous operations inside the event handler, you are probably better off just reading from the stream yourself so that you have more control over the parsing behavior. This is actually pretty easy to do, for example:
function parseStream(stream, callback) {
  // Assuming all stream data is text and not binary ...
  var buffer = '';
  var RE_EOL = /\r?\n/;

  stream.on('data', function(data) {
    buffer += data;
    processBuffer();
  });
  stream.on('end', callback);
  stream.on('error', callback);

  function processBuffer() {
    var m = RE_EOL.exec(buffer);
    if (m) {
      // Found a line ending: extract the line and drop it from the buffer
      var line = buffer.slice(0, m.index);
      buffer = buffer.slice(m.index + m[0].length);
      // Stop reading until the consumer signals it is done with this line
      stream.pause();
      callback(null, line, processBuffer);
    } else {
      stream.resume();
    }
  }
}
// ...
parseStream(process.stdin, function(err, line, done) {
  if (err) throw err;
  if (line === undefined) {
    // No more data will be available (stream ended)
    console.log('(Stream ended!)');
    return;
  }
  // Do something with `line`
  console.log(line);
  // Call `done()` whenever your async operation(s) are all finished
  done();
});

Related

Promise resolving to child stream stdout and rejecting child stream stderr

I'd like to build a promise that spawns a child process using require('child_process').spawn. The process streams its output to stdout and its errors to stderr.
I would like the promise to:
reject(child.stderr stream (or its data)) if child.stderr emits any data.
resolve(child.stdout stream) only if no error is emitted.
I'm doing this because I want to chain the promise to:
a then that processes the child.stdout stream (upload the stream to an S3 bucket).
a catch that can process the child.stderr stream, allowing me to properly handle errors.
Is it feasible to combine promises and process streams like this?
I was thinking of working around stderr, but I'm unsure what happens to stdout in the meantime if a lot of data comes in and I don't process it fast enough.
As I see it, the issue is that you don't know whether you ever got data on stderr until the entire process is done as it could put data there at any time.
So, you have to wait for the entire process to be done before calling resolve() or reject(). And, if you then want the entire data to be sent to either one of those, you'd have to buffer them. You could call reject() as soon as you got data on stderr, but you aren't guaranteed to have all the data yet because it's a stream.
So, if you don't want to buffer, you're better off just letting the caller see the streams directly.
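For example, a minimal sketch of that non-buffering alternative, which resolves with the child itself so the caller can attach to the streams directly (error handling beyond a failed spawn is left to the caller):

const spawn = require('child_process').spawn;

function runItStreaming(cmd, args) {
  return new Promise(function(resolve, reject) {
    const child = spawn(cmd, args);
    child.on('error', reject); // e.g. the command could not be spawned
    // resolve immediately; the caller consumes child.stdout / child.stderr
    resolve(child);
  });
}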
If you are OK with buffering the data, then based on the spawn example in the Node.js docs, you could add promise support like this:
const spawn = require('child_process').spawn;

function runIt(cmd, args) {
  return new Promise(function(resolve, reject) {
    const ls = spawn(cmd, args);
    // Edit thomas.g: My child process generates binary data so I use buffers instead, see my comments inside the code
    // Edit thomas.g: let stdoutData = new Buffer(0)
    let stdoutData = "";
    let stderrData = "";
    ls.stdout.on('data', (data) => {
      // Edit thomas.g: stdoutData = Buffer.concat([stdoutData, data]);
      stdoutData += data;
    });
    ls.stderr.on('data', (data) => {
      stderrData += data;
    });
    ls.on('close', (code) => {
      // any stderr output counts as failure; otherwise resolve with stdout
      if (stderrData) {
        reject(stderrData);
      } else {
        resolve(stdoutData);
      }
    });
    ls.on('error', (err) => {
      reject(err);
    });
  });
}
// usage
runIt('ls', ['-lh', '/usr']).then(function(stdoutData) {
  // process stdout data here
}, function(err) {
  // process stderr data here, or an error object (if some other type of error)
});
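If the output really is binary (as in thomas.g's edits above), one common variation is to collect the chunks in an array and concatenate once at the end, which avoids reallocating a growing buffer on every chunk; a sketch under that assumption:

const spawn = require('child_process').spawn;

function runItBinary(cmd, args) {
  return new Promise(function(resolve, reject) {
    const child = spawn(cmd, args);
    const out = [];
    const errOut = [];
    child.stdout.on('data', (chunk) => out.push(chunk));
    child.stderr.on('data', (chunk) => errOut.push(chunk));
    child.on('close', (code) => {
      // same policy as above: any stderr output counts as failure
      if (errOut.length) {
        reject(Buffer.concat(errOut));
      } else {
        resolve(Buffer.concat(out));
      }
    });
    child.on('error', reject);
  });
}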

In this code, why using a closure?

I don't get why a closure is being used in the code below:
function writeData(socket, data) {
  var success = !socket.write(data);
  if (!success) {
    (function(socket, data) {
      socket.once('drain', function() {
        writeData(socket, data);
      });
    })(socket, data);
  }
}
And why use var success = !socket.write(data); instead of testing the result directly? Maybe socket.write() doesn't return a boolean?
The IIFE is unnecessary, you can rewrite the code to this:
function writeData(socket, data) {
  var success = !socket.write(data);
  if (!success) {
    socket.once('drain', function() {
      writeData(socket, data);
    });
  }
}
Or even this:
function writeData(socket, data) {
  var success = !socket.write(data);
  if (!success) {
    socket.once('drain', writeData.bind(this, socket, data));
  }
}
According to the documentation for socket.write(), the method
Sends data on the socket. The second parameter specifies the encoding
in the case of a string--it defaults to UTF8 encoding.
Returns true if the entire data was flushed successfully to the kernel
buffer. Returns false if all or part of the data was queued in user
memory. 'drain' will be emitted when the buffer is again free.
The optional callback parameter will be executed when the data is
finally written out - this may not be immediately.
In the code, if the first socket.write() is not able to flush all the data in one go, the closure waits for the socket's 'drain' event and then calls writeData() again. This is an ingenious way of creating an asynchronous recursive function, which keeps calling itself until socket.write() reports that the data was flushed successfully.
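A hypothetical usage, e.g. a TCP echo server that relies on writeData() to retry under backpressure (the port number is arbitrary):

var net = require('net');

net.createServer(function(socket) {
  socket.on('data', function(chunk) {
    writeData(socket, chunk); // echo each chunk back
  });
}).listen(8124);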

How to correctly calculate the number of bytes of a node.js stream that have been processed?

I have a stream I'm sending over the wire that takes a bit of time to fully send, so I want to display how far along it is on the fly. I know you can listen on the 'data' event for streams, but in newer versions of Node, doing so also puts the stream into "flowing mode". I want to make sure I'm doing this correctly.
Currently I have the following stuff:
deploymentPackageStream.pause() // to prevent it from entering "flowing mode"
var bytesSent = 0
deploymentPackageStream.on('data', function(data) {
  bytesSent += data.length
  process.stdout.write('\r ')
  process.stdout.write('\r' + (bytesSent / 1000) + 'kb sent')
})
deploymentPackageStream.resume()
// copy over the deployment package
execute(conn, 'cat > deploymentPackage.sh', deploymentPackageStream).wait()
This gives me the right bytesSent output, but the resulting package seems to be missing some data off the front. If I put the 'resume' line after executing the copy line (the last line), it doesn't copy anything. If I don't resume, it also doesn't copy anything. What's going on and how do I do this properly without disrupting the stream and without entering flowing mode (I want back pressure)?
I should mention, I'm still using Node v0.10.x.
Alright, I made something that essentially is a passthrough, but calls a callback with data as it comes in:
var util = require('util')
var Readable = require('stream').Readable

// creates a stream that can view all the data in a stream and passes the data through
// parameters:
//   stream - the stream to peek at
//   callback - called when there's data sent from the passed stream
var StreamPeeker = exports.StreamPeeker = function(stream, callback) {
  Readable.call(this)
  this.stream = stream
  stream.on('readable', function() {
    var data = stream.read()
    if (data !== null) {
      // if the downstream buffer is full, stop the source until _read()
      if (!this.push(data)) stream.pause()
      callback(data)
    }
  }.bind(this))
  stream.on('end', function() {
    this.push(null)
  }.bind(this))
}
util.inherits(StreamPeeker, Readable)

StreamPeeker.prototype._read = function() {
  this.stream.resume()
}
If I understand streams properly, this should appropriately handle backpressure.
Using this, I can just count up data.length in the callback like this:
var peeker = new StreamPeeker(stream, function(data) {
  // use data.length
})
peeker.pipe(destination)
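For instance, the original progress display could be rewritten on top of StreamPeeker like this (execute, conn, and deploymentPackageStream are the question's own names):

var bytesSent = 0
var peeker = new StreamPeeker(deploymentPackageStream, function(data) {
  bytesSent += data.length
  process.stdout.write('\r' + (bytesSent / 1000) + 'kb sent')
})
execute(conn, 'cat > deploymentPackage.sh', peeker).wait()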

EventEmitter in the middle of a chain of Promises

I am doing something that involves running a sequence of child_process.spawn() in order (to do some setup, then run the actual meaty command that the caller is interested in, then do some cleanup).
Something like:
doAllTheThings()
  .then(function(exitStatus) {
    // all the things were done
    // and we've returned the exitStatus of
    // a command in the middle of a chain
  });
Where doAllTheThings() is something like:
function doAllTheThings() {
  return runSetupCommand()
    .then(function() {
      return runInterestingCommand();
    })
    .then(function(exitStatus) {
      return runTearDownCommand(exitStatus); // pass exitStatus along to return to caller
    });
}
Internally I'm using child_process.spawn(), which returns an EventEmitter and I'm effectively returning the result of the close event from runInterestingCommand() back to the caller.
Now I need to also send data events from stdout and stderr to the caller, which are also from EventEmitters. Is there a way to make this work with (Bluebird) Promises, or are they just getting in the way of EventEmitters that emit more than one event?
Ideally I'd like to be able to write:
doAllTheThings()
  .on('stdout', function(data) {
    // process a chunk of received stdout data
  })
  .on('stderr', function(data) {
    // process a chunk of received stderr data
  })
  .then(function(exitStatus) {
    // all the things were done
    // and we've returned the exitStatus of
    // a command in the middle of a chain
  });
The only way I can think to make my program work is to rewrite it to remove the promise chain and just use a raw EventEmitter inside something that wraps the setup/teardown, something like:
withTemporaryState(function(done) {
  var cmd = runInterestingCommand();
  cmd.on('stdout', function(data) {
    // process a chunk of received stdout data
  });
  cmd.on('stderr', function(data) {
    // process a chunk of received stderr data
  });
  cmd.on('close', function(exitStatus) {
    // process the exitStatus
    done();
  });
});
But then since EventEmitters are so common throughout Node.js, I can't help but think I should be able to make them work in Promise chains. Any clues?
Actually, one of the reasons I want to keep using Bluebird, is because I want to use the Cancellation features to allow the running command to be cancelled from the outside.
There are two approaches: one provides the syntax you originally asked for, the other takes delegates.
function doAllTheThings() {
  var com = runInterestingCommand();
  var p = new Promise(function(resolve, reject) {
    com.on("close", resolve);
    com.on("error", reject);
  });
  p.on = function() { com.on.apply(com, arguments); return p; };
  return p;
}
Which would let you use your desired syntax:
doAllTheThings()
  .on('stdout', function(data) {
    // process a chunk of received stdout data
  })
  .on('stderr', function(data) {
    // process a chunk of received stderr data
  })
  .then(function(exitStatus) {
    // all the things were done
    // and we've returned the exitStatus of
    // a command in the middle of a chain
  });
However, IMO this is somewhat misleading and it might be desirable to pass the delegates in:
function doAllTheThings(onData, onErr) {
  var com = runInterestingCommand();
  var p = new Promise(function(resolve, reject) {
    com.on("close", resolve);
    com.on("error", reject);
  });
  com.on("stdout", onData).on("stderr", onErr);
  return p;
}
Which would let you do:
doAllTheThings(function(data) {
  // process a chunk of received stdout data
}, function(data) {
  // process a chunk of received stderr data
})
.then(function(exitStatus) {
  // all the things were done
  // and we've returned the exitStatus of
  // a command in the middle of a chain
});
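Note that a raw ChildProcess does not emit 'stdout' or 'stderr' events itself; the data arrives on child.stdout and child.stderr. A minimal sketch of a runInterestingCommand() that forwards them under the event names both snippets assume (the spawned command is just a placeholder):

var EventEmitter = require('events').EventEmitter;
var spawn = require('child_process').spawn;

function runInterestingCommand() {
  var com = new EventEmitter();
  var child = spawn('ls', ['-lh']); // placeholder command
  // re-emit the stream data under the names the promise wrapper expects
  child.stdout.on('data', function(data) { com.emit('stdout', data); });
  child.stderr.on('data', function(data) { com.emit('stderr', data); });
  child.on('close', function(code) { com.emit('close', code); });
  child.on('error', function(err) { com.emit('error', err); });
  return com;
}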

Pausing readline in Node.js

Consider the code below ... I am trying to pause the stream after reading the first 5 lines:
var fs = require('fs');
var readline = require('readline');
var stream = require('stream');

var numlines = 0;
var instream = fs.createReadStream("myfile.json");
var outstream = new stream;
var readStream = readline.createInterface(instream, outstream);

readStream.on('line', function(line) {
  numlines++;
  console.log("Read " + numlines + " lines");
  if (numlines >= 5) {
    console.log("Pausing stream");
    readStream.pause();
  }
});
The output (copied below) suggests that it keeps reading lines after the pause. Perhaps readline has queued up a few more lines in its buffer and is feeding them to me anyway; this would make sense if it continues to read asynchronously in the background, but based on the documentation I don't know what the proper behavior should be. Any recommendations on how to achieve the desired effect?
Read 1 lines
Read 2 lines
Read 3 lines
Read 4 lines
Read 5 lines
Pausing stream
Read 6 lines
Pausing stream
Read 7 lines
Somewhat unintuitively, the pause method does not stop queued-up line events:
Calling rl.pause() does not immediately pause other events (including 'line') from being emitted by the readline.Interface instance.
There is, however, a third-party module named line-by-line where pause() does hold back the line events until the reader is resumed.
var LineByLineReader = require('line-by-line'),
    lr = new LineByLineReader('big_file.txt');

lr.on('error', function (err) {
  // 'err' contains error object
});

lr.on('line', function (line) {
  // pause emitting of lines...
  lr.pause();
  // ...do your asynchronous line processing..
  setTimeout(function () {
    // ...and continue emitting lines.
    lr.resume();
  }, 100);
});

lr.on('end', function () {
  // All lines are read, file is closed now.
});
(I have no affiliation with the module, just found it useful for dealing with this issue.)
So, it turns out that the readline stream tends to "drip" (i.e., leak a few extra lines) even after a pause(). The documentation does not make this clear, but it's true.
If you want the pause() toggle to appear immediate, you'll have to create your own line buffer and accumulate the leftover lines yourself.
To add some points: if you attach a 'pause' handler,
.on('pause', function() {
  console.log(numlines)
})
you will see that numlines is 5 when the pause fires. The Node.js documentation mentions:
The input stream is not paused and receives the SIGCONT event. (See events SIGTSTP and SIGCONT)
So I created a temporary buffer in the 'line' handler, with a flag to track whether the reader is paused:
.on('line', function(line) {
  if (paused) {
    putLineInBulkTmp(line);
  } else {
    putLineInBulk(line);
  }
})
Then, in the 'pause' and 'resume' handlers:
.on('pause', function() {
  paused = true;
  doSomething(bulk, function(resp) {
    // clean up bulk for the next batch...
    bulk = [];
    // ...and take over the lines collected in the tmp buffer.
    bulk = clone(bulktmp);
    bulktmp = [];
    lr.resume();
  });
})
.on('resume', () => {
  paused = false;
})
That is how I handle this kind of situation.
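Putting those fragments together, a minimal self-contained sketch (putLineInBulk, doSomething, and clone above are the author's placeholders; processBatch below is an equally hypothetical async batch handler):

var fs = require('fs');
var readline = require('readline');

var rl = readline.createInterface({ input: fs.createReadStream('myfile.json') });
var bulk = [];     // lines collected while running
var bulktmp = [];  // lines that "drip" in after pause()
var paused = false;

rl.on('line', function(line) {
  (paused ? bulktmp : bulk).push(line);
  if (!paused && bulk.length >= 5) rl.pause();
});
rl.on('pause', function() {
  paused = true;
  processBatch(bulk, function() { // hypothetical async batch handler
    bulk = bulktmp;  // carry over the lines that leaked through
    bulktmp = [];
    rl.resume();
  });
});
rl.on('resume', function() {
  paused = false;
});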
