Parse output of spawned node.js child process line by line - node.js

I have a PhantomJS/CasperJS script which I'm running from within a node.js script using child_process.spawn(). Since CasperJS doesn't support require()ing modules, I'm trying to print commands from CasperJS to stdout and then read them in from my node.js script using spawn.stdout.on('data', function(data) {}); in order to do things like add objects to redis/mongoose (convoluted, yes, but it seems more straightforward than setting up a web service for this). The CasperJS script executes a series of commands and creates, say, 20 screenshots which need to be added to my database.
However, I can't figure out how to break the data variable (a Buffer?) into lines... I've tried converting it to a string and then doing a replace, I've tried doing spawn.stdout.setEncoding('utf8'); but nothing seems to work...
Here is what I have right now
var spawn = require('child_process').spawn;
var bin = "casperjs"
//googlelinks.js is the example given at http://casperjs.org/#quickstart
var args = ['scripts/googlelinks.js'];
var cspr = spawn(bin, args);
//cspr.stdout.setEncoding('utf8');
cspr.stdout.on('data', function (data) {
var buff = new Buffer(data);
console.log("foo: " + buff.toString('utf8'));
});
cspr.stderr.on('data', function (data) {
data += '';
console.log(data.replace("\n", "\nstderr: "));
});
cspr.on('exit', function (code) {
console.log('child process exited with code ' + code);
process.exit(code);
});
https://gist.github.com/2131204

Try this:
cspr.stdout.setEncoding('utf8');
cspr.stdout.on('data', function(data) {
var str = data.toString(), lines = str.split(/\r?\n/); // no capturing group, so the separators are not returned as array entries
for (var i=0; i<lines.length; i++) {
// Process the line, noting it might be incomplete.
}
});
Note that the "data" event might not necessarily break evenly between lines of output, so a single line might span multiple data events.

I've actually written a Node library for exactly this purpose. It's called stream-splitter and you can find it on Github: samcday/stream-splitter.
The library provides a special Stream you can pipe your casper stdout into, along with a delimiter (in your case, \n), and it will emit neat token events, one for each line it has split out from the input Stream. The internal implementation for this is very simple, and delegates most of the magic to substack/node-buffers, which means there are no unnecessary Buffer allocations/copies.

I found a nicer way to do this with just pure node, which seems to work well:
const childProcess = require('child_process');
const readline = require('readline');
const cspr = childProcess.spawn(bin, args);
const rl = readline.createInterface({ input: cspr.stdout });
rl.on('line', (line) => { /* handle line here */ });

Adding to maerics' answer, which does not deal properly with cases where only part of a line is fed in a data dump (theirs will give you the first part and the second part of the line individually, as two separate lines.)
var _breakOffFirstLine = /\r?\n/
function filterStdoutDataDumpsToTextLines(callback){ //returns a function that takes chunks of stdout data, aggregates them, and passes complete lines one by one through to callback, all as soon as it gets them.
  var acc = ''
  return function(data){
    var splitted = (acc + data.toString()).split(_breakOffFirstLine) //if there was a partial, unended line in the previous dump, it is completed by the first section of this one.
    acc = splitted.pop() //if there is a partial, unended line in this dump, store it to be completed by the next (we assume there will be a terminating newline at some point. This is, generally, a safe assumption.)
    for(var i=0; i<splitted.length; ++i){
      callback(splitted[i])
    }
  }
}
usage:
cspr.stdout.on('data', filterStdoutDataDumpsToTextLines(function(line){
  //each time this inner function is called, you will be getting a single, complete line of the stdout ^^
}) )

You can give this a try. It will ignore any empty lines or empty new line breaks.
cspr.stdout.on('data', (data) => {
  const lines = data.toString().split(/\r?\n/);
  lines.forEach((line) => {
    // Skip the empty strings produced by blank lines and trailing newlines
    if (line !== '') {
      console.log(line);
    }
  });
});

Old stuff but still useful...
I have made a custom stream Transform subclass for this purpose.
See https://stackoverflow.com/a/59400367/4861714
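For readers who don't want to follow the link, here is a rough sketch of what a line-splitting Transform could look like using only built-in modules. This is not the code from the linked answer; the names createLineBuffer and LineSplitter are made up for illustration, and it assumes UTF-8 output.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: keeps the trailing partial line between calls.
function createLineBuffer() {
  let pending = '';
  return {
    // Returns every complete line contained in pending + chunk.
    feed(chunk) {
      const parts = (pending + chunk).split(/\r?\n/);
      pending = parts.pop(); // the last piece has no newline yet
      return parts;
    },
    // Returns the unterminated tail (if any) once the input has ended.
    flush() {
      const rest = pending;
      pending = '';
      return rest === '' ? [] : [rest];
    },
  };
}

// Hypothetical Transform subclass: bytes in, one string per line out.
class LineSplitter extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    // StringDecoder keeps multi-byte characters intact across chunk borders.
    this.decoder = new StringDecoder('utf8');
    this.lineBuffer = createLineBuffer();
  }

  _transform(chunk, encoding, callback) {
    const text = this.decoder.write(chunk);
    for (const line of this.lineBuffer.feed(text)) this.push(line);
    callback();
  }

  _flush(callback) {
    const lines = this.lineBuffer.feed(this.decoder.end());
    for (const line of lines.concat(this.lineBuffer.flush())) this.push(line);
    callback();
  }
}
```

With the question's child process it would be used along the lines of cspr.stdout.pipe(new LineSplitter()).on('data', handleLine), where handleLine is whatever function processes one line.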

@nyctef's answer uses an official nodejs package.
Here is a link to the documentation: https://nodejs.org/api/readline.html
The node:readline module provides an interface for reading data from a Readable stream (such as process.stdin) one line at a time.
My personal use-case is parsing JSON output from the "docker watch" command created in a spawned child_process.
const dockerWatchProcess = spawn(...)
...
const rl = readline.createInterface({
input: dockerWatchProcess.stdout,
output: null,
});
rl.on('line', (log: string) => {
console.log('dockerWatchProcess event::', log);
// code to process a change to a docker event
...
});

Related

Custom Node JS REPL input/output stream

I need to have a custom REPL input/output stream. For example, I need to pass a piece of script to the REPL when some event happens, get its output, and do something with it.
To describe it more clearly: I'm working on a vscode plugin (github: source code) which provides a REPL. In my case I have a vscode WebView, and from there I get user input; I then want to pass that input to the node REPL, get its output, and show it to the user.
So, how would I achieve that? If you need more information please tell me. thanks in advance.
EDIT 1:
const replServer = repl.start({
input: /* what should be here? */,
output: /* what should be here? */
});
Edit 2:
Can anyone explain what the input/output parameters in the above example are used for?
Here is a solution that worked for me.
const {
PassThrough
} = require('stream')
const repl = require('repl')
const input = new PassThrough()
const output = new PassThrough()
output.setEncoding('utf-8')
const _repl = repl.start({
prompt: 'awesomeRepl> ',
input,
output
})
_repl.on('exit', function() {
// Do something when REPL exit
console.log('Exited REPL...')
})
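let evaluatedCode = ''
// Collect everything the REPL writes (prompts as well as results).
// The listener is registered once, not on every evaluate() call.
output.on('data', (chunk) => {
evaluatedCode += chunk.toString()
console.log(evaluatedCode)
})
function evaluate(code) {
// The result arrives asynchronously through the 'data' listener above
input.write(`${code}\n`)
}
evaluate('2 + 2') // the logged output should include 4
Notice that I created the REPL instance outside the evaluate function, so we don't create a new instance for every call of evaluate.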
To create a repl server you just need to do
const repl = require('repl')
repl.start({prompt: "> ", input: input_stream, output: output_stream});
prompt is the string shown as the prompt. input_stream needs to be a readable stream and output_stream needs to be a writable one. You can read more about streams in the Node.js stream documentation. Once the streams are working, and if output_stream is also readable (a PassThrough, for example), you can do
output_stream.on('data', (chunk) => {
//whatever you do with the data
});

Using stream-combiner and Writable Streams (stream-adventure)

I'm working on nodeschool.io's stream-adventure. The challenge:
Write a module that returns a readable/writable stream using the
stream-combiner module. You can use this code to start with:
var combine = require('stream-combiner')
module.exports = function () {
return combine(
// read newline-separated json,
// group books into genres,
// then gzip the output
)
}
Your stream will be written a newline-separated JSON list of science fiction
genres and books. All the books after a "type":"genre" row belong in that
genre until the next "type":"genre" comes along in the output.
{"type":"genre","name":"cyberpunk"}
{"type":"book","name":"Neuromancer"}
{"type":"book","name":"Snow Crash"}
{"type":"genre","name":"space opera"}
{"type":"book","name":"A Deepness in the Sky"}
{"type":"book","name":"Void"}
Your program should generate a newline-separated list of JSON lines of genres,
each with a "books" array containing all the books in that genre. The input
above would yield the output:
{"name":"cyberpunk","books":["Neuromancer","Snow Crash"]}
{"name":"space opera","books":["A Deepness in the Sky","Void"]}
Your stream should take this list of JSON lines and gzip it with
zlib.createGzip().
HINTS
The stream-combiner module creates a pipeline from a list of streams,
returning a single stream that exposes the first stream as the writable side and
the last stream as the readable side like the duplexer module, but with an
arbitrary number of streams in between. Unlike the duplexer module, each
stream is piped to the next. For example:
var combine = require('stream-combiner');
var stream = combine(a, b, c, d);
will internally do a.pipe(b).pipe(c).pipe(d) but the stream returned by
combine() has its writable side hooked into a and its readable side hooked
into d.
As in the previous LINES adventure, the split module is very handy here. You
can put a split stream directly into the stream-combiner pipeline.
Note that split can send empty lines too.
If you end up using split and stream-combiner, make sure to install them
into the directory where your solution file resides by doing:
`npm install stream-combiner split`
Note: when you test the program, the source stream is automatically inserted into the program, so it's perfectly fine to have split() as the first parameter in combine(split(), etc., etc.)
I'm trying to solve this challenge without using the 'through' package.
My code:
var combiner = require('stream-combiner');
var stream = require('stream')
var split = require('split');
var zlib = require('zlib');
module.exports = function() {
var ws = new stream.Writable({decodeStrings: false});
function ResultObj() {
name: '';
books: [];
}
ws._write = function(chunk, enc, next) {
if(chunk.length === 0) {
next();
}
chunk = JSON.parse(chunk);
if(chunk.type === 'genre') {
if(currentResult) {
this.push(JSON.stringify(currentResult) + '\n');
}
var currentResult = new ResultObj();
currentResult.name = chunk.name;
} else {
currentResult.books.push(chunk.name);
}
next();
var wsObj = this;
ws.end = function(d) {
wsObj.push(JSON.stringify(currentResult) + '\n');
}
}
return combiner(split(), ws, zlib.createGzip());
}
My code does not work and returns 'Cannot pipe. Not readable'. Can someone point out to me where I'm going wrong?
Any other comments on how to improve are welcome too...
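For what it's worth, the error message is consistent with ws being a plain stream.Writable: a Writable has no readable side (and no push method), so it cannot sit in the middle of a pipeline. Note also that var currentResult is declared inside _write, so it starts out undefined on every call. Below is a hedged sketch of the grouping step as a stream.Transform instead, without the through package. It is not an official solution; createGrouper and createGenreStream are illustrative names, and in the exercise the stream would presumably be used as combine(split(), createGenreStream(), zlib.createGzip()).

```javascript
const { Transform } = require('stream');

// Hypothetical helper holding the genre that is currently being filled.
function createGrouper() {
  let current = null;
  return {
    // Takes one parsed row; returns the previous genre when a new one starts.
    add(row) {
      if (row.type !== 'genre') {
        if (current) current.books.push(row.name);
        return null;
      }
      const finished = current;
      current = { name: row.name, books: [] };
      return finished;
    },
    // Returns the last, still open genre (or null) at end of input.
    end() {
      const finished = current;
      current = null;
      return finished;
    },
  };
}

// A Transform has both a writable and a readable side, so it can be piped
// into the next stage (a gzip stream in the exercise).
function createGenreStream() {
  const grouper = createGrouper();
  const emit = (stream, genre) => {
    if (genre) stream.push(JSON.stringify(genre) + '\n');
  };
  return new Transform({
    transform(line, encoding, callback) {
      const text = line.toString();
      if (text.length === 0) return callback(); // split can send empty lines
      let row;
      try {
        row = JSON.parse(text);
      } catch (err) {
        return callback(err);
      }
      emit(this, grouper.add(row));
      callback();
    },
    flush(callback) {
      emit(this, grouper.end());
      callback();
    },
  });
}
```

Keeping the state in a closure outside transform() is what lets a genre survive across several lines; the flush() hook takes the place of overriding end().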

Block for stdin in Node.js

Short explanation:
I'm attempting to write a simple game in Node.js that needs to wait for user input every turn. How do I avoid callback hell (i.e. messy code) inside a turn loop where each iteration needs to block and wait for input from stdin?
Long explanation:
All the explanations I have read on StackOverflow when someone asks about blocking for stdin input seem to be "that's not what Node.js is about!"
I understand that Node.js is designed to be non-blocking and I also understand why. However I feel that it has me stuck between a rock and a hard place on how to solve this. I feel like I have three options:
(1) Find a way to block for stdin and retain my while loop.
(2) Ditch the while loop and instead recursively call a method (like nextTurn) whenever the previous turn ends.
(3) Ditch the while loop and instead use setTimeout(..., 0) or something similar to call a method (like nextTurn) whenever a turn ends.
With option (1) I am going against Node.js principles of non-blocking IO.
With option (2) I will eventually reach a stack overflow as each call adds another turn to the call stack.
With option (3) my code ends up being a mess to follow.
Internal to Node.js there are default functions that are marked **Sync (e.g. see the fs library or the sleep function) and I'm wondering why there is no Sync method for getting user input? And if I were to write something similar to fs.readSync how would I go about doing it and still follow best practices?
Just found this:
https://www.npmjs.com/package/readline-sync
Example code (after doing an npm install readline-sync)
var readlineSync = require('readline-sync');
while(true) {
var yn = readlineSync.question("Do you like having tools that let you code how you want, rather than how their authors wanted?");
if(yn === 'y') {
console.log("Hooray!");
} else {
console.log("Back to callback world, I guess...");
process.exit();
}
}
Only problem so far is the wailing of the "That's not how node is meant to be used!" chorus, but I have earplugs :)
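On the "how would I write something similar to fs.readSync myself" part of the question, a minimal sketch built on fs.readSync could look like the following. readLineSync and decode are hypothetical names, and this is not how readline-sync is implemented. It reads one byte at a time from a file descriptor, which is slow but simple. One assumption to flag loudly: it is shown against a generic fd, because synchronous reads from stdin (fd 0) can fail with errors such as EAGAIN depending on the platform and on how stdin is attached.

```javascript
const fs = require('fs');

// Reads bytes from `fd` until a newline or EOF. Returns the line without its
// terminator, or null when EOF is reached before any byte was read.
function readLineSync(fd) {
  const byte = Buffer.alloc(1);
  const bytes = [];
  for (;;) {
    // position = null means "read from the current file position"
    const bytesRead = fs.readSync(fd, byte, 0, 1, null);
    if (bytesRead === 0) break; // EOF
    if (byte[0] === 0x0a) return decode(bytes); // '\n' ends the line
    bytes.push(byte[0]);
  }
  return bytes.length > 0 ? decode(bytes) : null;
}

// Decodes once at the end so multi-byte UTF-8 characters stay intact,
// and drops a trailing '\r' from Windows-style line endings.
function decode(bytes) {
  return Buffer.from(bytes).toString('utf8').replace(/\r$/, '');
}

// Possible use in a turn loop (fd 0 is stdin; see the caveat above):
//   var answer = readLineSync(0);
```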
I agree with the comment about moving towards an event based system and would ditch the loops. I've thrown together a quick example of text based processing which can be used for simple text games.
var es = require('event-stream');
process.stdin
.pipe(es.split())
.on('data', parseCommand);
var actionHandlers = {};
function parseCommand(command) {
var words = command.trim().split(' '),
action = words.shift(); // the first word is the action, even for one-word commands
if(actionHandlers[action]) {
actionHandlers[action](words);
} else {
invalidAction(action);
}
}
function invalidAction(action) {
console.log('Unknown Action:', action);
}
actionHandlers['move'] = function(words) {
console.log('You move', words);
}
actionHandlers['attack'] = function(words) {
console.log('You attack', words);
}
You can now break up your actions into discrete functions which you can register with a central actionHandlers variable. This makes adding new commands almost trivial. If you can add some details on why the above approach wouldn't work well for you, let me know and I'll revise the answer.
ArtHare's solution, at least for my use case, blocks background execution, including those started by a promise. While this code isn't elegant, it did block execution of the current function, until the read from stdin completed.
While this code must run from inside an async function, keep in mind that running an async function from a top-level context (directly from a script, not contained within any other function) will block that function until it completes.
Below is a full .js script demonstrating usage, tested with node v8.12.0:
const readline = require('readline');
const sleep = (waitTimeInMs) => new Promise(resolve => setTimeout(resolve, waitTimeInMs));
async function blockReadLine() {
var rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
terminal: false
});
let result = undefined;
rl.on('line', function(line){
result = line;
})
while(result === undefined) await sleep(100); // poll; an empty line ('') still counts as input
rl.close();
return result;
}
async function run() {
new Promise(async () => {
while(true) {
console.log("Won't be silenced! Won't be censored!");
await sleep(1000);
}
});
let result = await blockReadLine();
console.log("The result was:" + result);
process.exit(0);
}
run();

How to skip first lines of the file with node-csv parser?

Currently I'm using node-csv (http://www.adaltas.com/projects/node-csv/) for csv file parsing.
Is there a way to skip first few lines of the file before starting to parse the data? As some csv reports for example have report details in the first few lines before the actual headers and data start.
LOG REPORT <- data about the report
DATE: 1.1.1900
DATE,EVENT,MESSAGE <- data headers
1.1.1900,LOG,Hello World! <- actual data starts here
All you need to do is pass the option {from_line: 2} to the parse() function,
like in the snippet below:
const fs = require('fs');
const parse = require('csv-parse');
fs.createReadStream('path/to/file')
.pipe(parse({ delimiter: ',', from_line: 2 }))
.on('data', (row) => {
// it will start from 2nd row
console.log(row)
})
Assuming you're using v0.4 or greater with the new refactor (i.e. csv-generate, csv-parse, stream-transform, and csv-stringify), you can use the built-in transform to skip the first line, with a bit of extra work.
var fs = require('fs'),
csv = require('csv');
var skipHeader = true; // config option
var read = fs.createReadStream('in.csv'),
write = fs.createWriteStream('out.jsonish'),
parse = csv.parse(),
rowCount = 0, // to keep track of where we are
transform = csv.transform(function(row,cb) {
var result;
if ( skipHeader && rowCount === 0 ) { // if the option is turned on and this is the first line
result = null; // pass null to cb to skip
} else {
result = JSON.stringify(row)+'\n'; // otherwise apply the transform however you want
}
rowCount++; // next time we're not at the first line anymore
cb(null,result); // let node-csv know we're done transforming
});
read
.pipe(parse)
.pipe(transform)
.pipe(write).once('finish',function() {
// done
});
Essentially we track the number of rows that have been transformed, and if we're on the very first one (and we in fact wish to skip the header via the skipHeader bool), we pass null to the callback as the second param (the first one is always the error); otherwise we pass the transformed result.
This will also work with synchronous parsing, but requires a change since there is no callback in synchronous mode. Also, the same logic could be applied to the older v0.2 library since it also has row transforming built-in.
See http://csv.adaltas.com/transform/#skipping-and-creating-records
This is pretty easy to apply, and IMO has a pretty low footprint. Usually you want to keep track of rows processed for status purposes, and I almost always transform the result set before sending it to Writable, so it is very simple to just add in the extra logic to check for skipping the header. The added benefit here is that we're using the same module to apply skipping logic as we are to parse/transform - no extra dependencies are needed.
You have two options here:
You can process the file line-by-line. I posted a code snippet in an earlier answer, which you can use:
var rl = readline.createInterface({
input: instream,
output: outstream,
terminal: false
});
rl.on('line', function(line) {
console.log(line);
//Do your stuff ...
//Then write to outstream
outstream.write(line + '\n'); // rl.write() would feed the text back in as input rather than writing to outstream
});
You can give an offset to your filestream which will skip those bytes. You can see it in the documentation
fs.createReadStream('sample.txt', {start: 90, end: 99});
This is much easier if you know the offset is fixed.
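If the offset is not fixed but the number of lines to skip is, the start offset can be computed first. The following is only a sketch: offsetAfterLines and createReadStreamSkippingLines are made-up names, and it assumes that the lines to skip fit inside the first headBytes bytes of the file and end with \n.

```javascript
const fs = require('fs');

// Returns the byte offset of the first byte after `count` lines, or
// buffer.length if the buffer contains fewer than `count` newlines.
function offsetAfterLines(buffer, count) {
  let offset = 0;
  for (let i = 0; i < count; i++) {
    const newline = buffer.indexOf(0x0a, offset); // 0x0a is '\n'
    if (newline === -1) return buffer.length;
    offset = newline + 1;
  }
  return offset;
}

// Reads the head of the file synchronously, works out where the real data
// begins, and opens the read stream at that byte offset.
function createReadStreamSkippingLines(file, count, headBytes = 64 * 1024) {
  const head = Buffer.alloc(headBytes);
  const fd = fs.openSync(file, 'r');
  let bytesRead;
  try {
    bytesRead = fs.readSync(fd, head, 0, headBytes, 0);
  } finally {
    fs.closeSync(fd);
  }
  const start = offsetAfterLines(head.subarray(0, bytesRead), count);
  return fs.createReadStream(file, { start });
}
```

The returned stream can then be piped into the CSV parser as usual, so the parser never sees the report preamble.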

Catching console.log in node.js?

Is there a way that I can catch any console output caused by console.log(...) within node.js, to prevent clogging the terminal whilst unit testing a module?
Thanks
A better way could be to hook directly into the output stream you need to capture, because with Linus's method, if some module writes directly to stdout (with process.stdout.write('foo'), for example), it won't be caught.
var logs = [],
hook_stream = function(_stream, fn) {
// Reference default write method
var old_write = _stream.write;
// _stream now write with our shiny function
_stream.write = fn;
return function() {
// reset to the default write method
_stream.write = old_write;
};
},
// hook up standard output
unhook_stdout = hook_stream(process.stdout, function(string, encoding, fd) {
logs.push(string);
});
// goes to our custom write method
console.log('foo');
console.log('bar');
unhook_stdout();
console.log('Not hooked anymore.');
// Now do what you want with logs stored by the hook
logs.forEach(function(_log) {
console.log('logged: ' + _log);
});
EDIT
console.log() ends its output with a newline. You may want to strip it, in which case you'd better write:
_stream.write = function(string, encoding, fd) {
var new_str = string.replace(/\n$/, '');
fn(new_str, encoding, fd);
};
EDIT
There is an improved, generic way to do this on any method of any object, with async support. See the gist.
module.js:
module.exports = function() {
console.log("foo");
}
program:
console.log = function() {};
mod = require("./module");
mod();
// Look ma no output!
Edit: Obviously you can collect the log messages for later if you wish:
var log = [];
console.log = function() {
log.push([].slice.call(arguments));
};
capture-console solves this problem nicely.
var capcon = require('capture-console');
var stderr = capcon.captureStderr(function scope() {
// whatever is done in here has stderr captured,
// the return value is a string containing stderr
});
var stdout = capcon.captureStdout(function scope() {
// whatever is done in here has stdout captured,
// the return value is a string containing stdout
});
and later
Intercepting
You should be aware that all capture functions will still pass the values through to the main stdio write() functions, so logging will still go to your standard IO devices.
If this is not desirable, you can use the intercept functions. These functions are literally s/capture/intercept when compared to those shown above, and the only difference is that calls aren't forwarded through to the base implementation.
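To make the capture/intercept distinction concrete without the dependency, here is a rough sketch of the same idea. It is not capture-console's implementation; collectStdout and its passThrough flag are names invented for this example.

```javascript
// Runs `fn` while process.stdout.write is replaced and returns what was written.
// passThrough = true behaves like "capture" (output still reaches the terminal);
// passThrough = false behaves like "intercept" (output is swallowed).
function collectStdout(fn, passThrough = false) {
  const originalWrite = process.stdout.write;
  let collected = '';
  process.stdout.write = function (chunk, encoding, callback) {
    collected += typeof chunk === 'string' ? chunk : Buffer.from(chunk).toString('utf8');
    if (passThrough) return originalWrite.apply(process.stdout, arguments);
    // Still honour the optional callback so callers waiting on it are not stuck.
    const done = typeof encoding === 'function' ? encoding : callback;
    if (typeof done === 'function') process.nextTick(done, null);
    return true;
  };
  try {
    fn();
  } finally {
    process.stdout.write = originalWrite; // always restore, even if fn throws
  }
  return collected;
}
```

Something like collectStdout(() => console.log('foo')) then returns the text that would have been printed, including the newline that console.log appends.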
Simply adding the following snippet to your code will let you catch the logs and still print them to the console:
var log = [];
console.log = function(d) {
log.push(d);
process.stdout.write(d + '\n');
};
