Node.js: set readline module speed

I'm reading a text file in Node.js using the readline module.
var lineReader = require('readline').createInterface({
  input: require('fs').createReadStream('log.txt')
});
lineReader.on('line', function (line) {
  console.log(line);
});
lineReader.on('close', function () {
  console.log('Finished!');
});
Is there any way to control the timing of the reading? For example, I want to read one line every 5 ms.

You can pause the reader stream as soon as you read a line, then resume it 5 ms later, and repeat this until the end of the file. Make sure to set the highWaterMark option to a lower value so that the file read stream doesn't read many lines at once.
var lineReader = require('readline').createInterface({
  input: require('fs').createReadStream('./log.txt', {
    highWaterMark: 10
  })
});
lineReader.on('line', line => {
  lineReader.pause(); // pause reader
  // Resume 5 ms later
  setTimeout(() => {
    lineReader.resume();
  }, 5);
  console.log(line);
});

You can use observables to do this. Here's an example of the kind of buffering I think you want, with click events instead of file line events. I'm not sure if there's a cleaner way to do it that avoids the setInterval, though.
let i = 0;
const source = Rx.Observable
  .fromEvent(document.querySelector('#container'), 'click')
  .controlled();
var subscription =
  source.subscribe(() => console.log('was clicked ' + i++));
setInterval(() => source.request(1), 500);
Here's a fiddle and also a link to docs for rx:
https://jsfiddle.net/w6ewg175/
https://github.com/Reactive-Extensions/RxJS/blob/master/doc/api/core/operators/controlled.md
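If you're on a newer Node version, the same pacing can also be done without Rx by treating the readline interface as an async iterator and awaiting a small delay between lines. A minimal sketch, assuming a Node release where readline interfaces are async-iterable (roughly v11.14+); the delay helper and the file name are just placeholders:
const fs = require('fs');
const readline = require('readline');

// Resolves after the given number of milliseconds.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function readPaced(path, ms) {
  const rl = readline.createInterface({ input: fs.createReadStream(path) });
  for await (const line of rl) {
    console.log(line);
    await delay(ms); // wait before handling the next line
  }
  console.log('Finished!');
}

readPaced('log.txt', 5).catch(console.error);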

Related

Asynchronous file read reading different number of lines each time, not halting

I built a simple asynchronous wrapper around the event-based readline module built into Node.js. The code is below:
const readline = require('readline');
module.exports = {
  createInterface: args => {
    let self = {
      interface: readline.createInterface(args),
      readLine: () => new Promise((succ, fail) => {
        if (self.interface === null) {
          succ(null);
        } else {
          self.interface.once('line', succ);
        }
      }),
      hasLine: () => self.interface !== null
    };
    self.interface.on('close', () => {
      self.interface = null;
    });
    return self;
  }
};
Ideally, I would use it in code like this:
const fs = require("fs");
const readline = require("./async-readline");
let filename = "bar.txt";
let linereader = readline.createInterface({
  input: fs.createReadStream(filename)
});
// (assume this runs inside an async function so that await is allowed)
let lines = 0;
while (linereader.hasLine()) {
  let line = await linereader.readLine();
  lines++;
  console.log(lines);
}
console.log("Finished");
However, I've observed some erratic and unexpected behavior with this async wrapper. For one, it fails to recognize when the file ends and simply hangs once it reaches the last line, never printing "Finished". On top of that, when the input file is large, say a couple of thousand lines, it's always off by a few lines and doesn't successfully read the full file before halting; in a 2000+ line file it can be off by as many as 20-40 lines. If I put a print statement in the .on('close') listener, I see that it does trigger; however, the program still doesn't recognize that it should no longer have lines to read.
It seems that in Node.js v11.7, the readline interface was given async iterator functionality, so it can simply be looped through with a for await ... of loop:
const rl = readline.createInterface({
  input: fs.createReadStream(filename)
});
for await (const line of rl) {
  console.log(line);
}
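For completeness, here is a rough, self-contained sketch of that approach applied to the question's goal (counting lines and printing "Finished"); the file name is a placeholder, and the loop is wrapped in an async function so that for await is allowed:
const fs = require('fs');
const readline = require('readline');

async function countLines(filename) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filename)
  });
  let lines = 0;
  for await (const line of rl) {
    lines++;
    console.log(lines);
  }
  console.log('Finished');
}

countLines('bar.txt').catch(console.error);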
How to get synchronous readline, or "simulate" it using async, in nodejs?

Can an event-based read function ever run out of order?

Given a situation where I use the Node.js readline library to iterate over each line in the STDIN stream, do some processing on it, and write it back out to STDOUT, as in the following example:
var rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
  terminal: false
});
function my_function(line) {
  var output = ...(line);
  process.stdout.write(output);
}
rl.on('line', my_function);
I'm concerned that the processing I'm doing will take very different amounts of time depending on the line content, so some lines will return very quickly while others take some time to sort out. Is it possible that my_function() will ever run out of order and hence cause the output stream to be scrambled? Should I be looking into using a synchronous loop of some kind instead of this asynchronous event handler?
The JavaScript execution itself is single-threaded, so as long as you're only performing synchronous operations inside the event handler, there is no problem.
If you are performing asynchronous operations inside the event handler, then it is possible that another 'line' event could be emitted before your asynchronous operation(s) are complete. In that case, you would need to rl.pause() first and then rl.resume() once you are finished with your asynchronous operations. However, this isn't foolproof since 'line' events could still be emitted after a rl.pause() if the current chunk of data read from the input stream had multiple line breaks.
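As a rough sketch of that pause/resume pattern, doAsyncWork below is a hypothetical stand-in for whatever asynchronous processing you do per line:
var readline = require('readline');

var rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
  terminal: false
});

// Hypothetical asynchronous processing step, for illustration only.
function doAsyncWork(line) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve(line.toUpperCase() + '\n'); }, 10);
  });
}

rl.on('line', function (line) {
  rl.pause(); // best effort: a few already-buffered lines may still be emitted
  doAsyncWork(line).then(function (output) {
    process.stdout.write(output);
    rl.resume();
  });
});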
So if you are performing asynchronous operations inside the event handler, you are probably better off just reading from the stream yourself so that you have more control over the parsing behavior. This is actually pretty easy to do, for example:
function parseStream(stream, callback) {
  // Assuming all stream data is text and not binary ...
  var buffer = '';
  var RE_EOL = /\r?\n/;
  stream.on('data', function(data) {
    buffer += data;
    processBuffer();
  });
  stream.on('end', callback);
  stream.on('error', callback);
  function processBuffer() {
    var match = RE_EOL.exec(buffer);
    if (match) {
      // Found a line ending
      var line = buffer.slice(0, match.index);
      buffer = buffer.slice(match.index + match[0].length);
      stream.pause();
      callback(null, line, processBuffer);
    } else {
      stream.resume();
    }
  }
}
// ...
parseStream(process.stdin, function(err, line, done) {
  if (err) throw err;
  if (line === undefined) {
    // No more data will be available (stream ended)
    console.log('(Stream ended!)');
    return;
  }
  // Do something with `line`
  console.log(line);
  // Call `done()` whenever your async operation(s) are all finished
  done();
});

Pausing readline in Node.js

Consider the code below ... I am trying to pause the stream after reading the first 5 lines:
var fs = require('fs');
var readline = require('readline');
var stream = require('stream');
var numlines = 0;
var instream = fs.createReadStream("myfile.json");
var outstream = new stream;
var readStream = readline.createInterface(instream, outstream);
readStream.on('line', function(line) {
  numlines++;
  console.log("Read " + numlines + " lines");
  if (numlines >= 5) {
    console.log("Pausing stream");
    readStream.pause();
  }
});
The output (shown below) suggests that it keeps reading lines after the pause. Perhaps readline has queued up a few more lines in the buffer and is feeding them to me anyway; this would make sense if it continues to read asynchronously in the background, but based on the documentation, I don't know what the proper behavior should be. Any recommendations on how to achieve the desired effect?
Read 1 lines
Read 2 lines
Read 3 lines
Read 4 lines
Read 5 lines
Pausing stream
Read 6 lines
Pausing stream
Read 7 lines
Somewhat unintuitively, the pause method does not stop queued-up line events:
Calling rl.pause() does not immediately pause other events (including 'line') from being emitted by the readline.Interface instance.
There is, however, a third-party module named line-by-line where pause does pause the line events until it is resumed.
var LineByLineReader = require('line-by-line'),
  lr = new LineByLineReader('big_file.txt');
lr.on('error', function (err) {
  // 'err' contains error object
});
lr.on('line', function (line) {
  // pause emitting of lines...
  lr.pause();
  // ...do your asynchronous line processing..
  setTimeout(function () {
    // ...and continue emitting lines.
    lr.resume();
  }, 100);
});
lr.on('end', function () {
  // All lines are read, file is closed now.
});
(I have no affiliation with the module, just found it useful for dealing with this issue.)
So, it turns out that the readline stream tends to "drip" (i.e., leak a few extra lines) even after a pause(). The documentation does not make this clear, but it's true.
If you want the pause() toggle to appear immediate, you'll have to create your own line buffer and accumulate the leftover lines yourself.
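A minimal sketch of that buffering approach, mirroring the question's setup; handleLine is a hypothetical stand-in for your own per-line processing, and you would wire your own pause()/resume() triggers (such as the 5-line condition above) on top of it:
var fs = require('fs');
var readline = require('readline');

var readStream = readline.createInterface({ input: fs.createReadStream('myfile.json') });
var buffered = [];
var paused = false;

// Hypothetical per-line processing step.
function handleLine(line) {
  console.log(line);
}

readStream.on('line', function (line) {
  if (paused) {
    buffered.push(line); // lines that "drip" out after pause() are stashed here
  } else {
    handleLine(line);
  }
});

readStream.on('pause', function () {
  paused = true;
});

readStream.on('resume', function () {
  paused = false;
  // Drain anything that arrived while we were paused.
  while (buffered.length) handleLine(buffered.shift());
});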
To add a few points: if you attach a 'pause' listener like this:
.on('pause', function() {
  console.log(numlines);
})
you will see 5 printed. As mentioned in the Node.js documentation:
The input stream is not paused and receives the SIGCONT event. (See events SIGTSTP and SIGCONT)
So I created a temporary buffer in the 'line' event handler and use a flag to track whether the reader is currently paused.
.on('line', function(line) {
  if (paused) {
    putLineInBulkTmp(line);
  } else {
    putLineInBulk(line);
  }
})
Then, in the 'pause' and 'resume' handlers:
.on('pause', function() {
  paused = true;
  doSomething(bulk, function(resp) {
    // clean up bulk for the next.
    bulk = [];
    // clone tmp buffer.
    bulk = clone(bulktmp);
    bulktmp = [];
    lr.resume();
  });
})
.on('resume', () => {
  paused = false;
})
This is how I handle this kind of situation.

Get Data from CSV File in nodejs

I have a CSV file with about 10k records. I need to retrieve them one by one in my Node.js app.
The scenario is: when the user clicks button "X" for the first time, an async request is sent to the Node.js app, which returns data from the first row of the CSV file. When the user clicks again, it shows data from the second row, and so on.
I tried using fast-csv and lazy, but all of them read the complete file. Is there a way I can achieve this?
Node comes with a readline module in its core, allowing you to process a readable stream line by line.
var fs = require("fs"),
readline = require("readline");
var file = "something.csv";
var rl = readline.createInterface({
input: fs.createReadStream(file),
output: null,
terminal: false
})
rl.on("line", function(line) {
console.log("Got line: " + line);
});
rl.on("close", function() {
console.log("All data processed.");
});
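Since the readline interface will otherwise read the whole file as fast as it can, handing out one row per button click still means queueing rows and pausing between requests. A rough sketch of one way to do that; createRowSource and getNextRow are hypothetical helpers written for this answer, not part of readline:
var fs = require('fs');
var readline = require('readline');

function createRowSource(file) {
  var rl = readline.createInterface({ input: fs.createReadStream(file), terminal: false });
  var queue = [];   // rows read ahead but not yet requested
  var waiting = []; // requests made before a row was available
  var closed = false;

  rl.on('line', function (line) {
    if (waiting.length) {
      waiting.shift()(line);
    } else {
      queue.push(line);
      rl.pause(); // best effort: a few extra lines may still arrive and get queued
    }
  });

  rl.on('close', function () {
    closed = true;
    while (waiting.length) waiting.shift()(null); // signal "no more rows"
  });

  // Returns a promise for the next row, or null once the file is exhausted.
  return function getNextRow() {
    if (queue.length) return Promise.resolve(queue.shift());
    if (closed) return Promise.resolve(null);
    return new Promise(function (resolve) {
      waiting.push(resolve);
      rl.resume();
    });
  };
}

// Each incoming click/request awaits the next row:
var nextRow = createRowSource('something.csv');
nextRow().then(function (row) { console.log('First row:', row); });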
I think the module 'split' by Dominic Tarr will suffice.
It breaks up the stream line by line.
https://npmjs.org/package/split
var fs = require('fs');
var split = require('split');

fs.createReadStream(file)
  .pipe(split())
  .on('data', function (line) {
    // each chunk now is a separate line!
  });

Parse output of spawned node.js child process line by line

I have a PhantomJS/CasperJS script which I'm running from within a Node.js script using child_process.spawn(). Since CasperJS doesn't support require()ing modules, I'm trying to print commands from CasperJS to stdout and then read them in from my Node.js script using spawn.stdout.on('data', function(data) {}); in order to do things like add objects to Redis/Mongoose (convoluted, yes, but it seems more straightforward than setting up a web service for this...). The CasperJS script executes a series of commands and creates, say, 20 screenshots which need to be added to my database.
However, I can't figure out how to break the data variable (a Buffer?) into lines. I've tried converting it to a string and then doing a replace, and I've tried spawn.stdout.setEncoding('utf8'), but nothing seems to work.
Here is what I have right now:
var spawn = require('child_process').spawn;
var bin = "casperjs";
//googlelinks.js is the example given at http://casperjs.org/#quickstart
var args = ['scripts/googlelinks.js'];
var cspr = spawn(bin, args);
//cspr.stdout.setEncoding('utf8');
cspr.stdout.on('data', function (data) {
  var buff = new Buffer(data);
  console.log("foo: " + buff.toString('utf8'));
});
cspr.stderr.on('data', function (data) {
  data += '';
  console.log(data.replace("\n", "\nstderr: "));
});
cspr.on('exit', function (code) {
  console.log('child process exited with code ' + code);
  process.exit(code);
});
https://gist.github.com/2131204
Try this:
cspr.stdout.setEncoding('utf8');
cspr.stdout.on('data', function(data) {
  var str = data.toString(), lines = str.split(/(\r?\n)/g);
  for (var i = 0; i < lines.length; i++) {
    // Process the line, noting it might be incomplete.
  }
});
Note that the "data" event might not necessarily break evenly between lines of output, so a single line might span multiple data events.
I've actually written a Node library for exactly this purpose, it's called stream-splitter and you can find it on Github: samcday/stream-splitter.
The library provides a special Stream you can pipe your casper stdout into, along with a delimiter (in your case, \n), and it will emit neat token events, one for each line it has split out from the input Stream. The internal implementation for this is very simple, and delegates most of the magic to substack/node-buffers which means there's no unnecessary Buffer allocations/copies.
I found a nicer way to do this with just pure node, which seems to work well:
const childProcess = require('child_process');
const readline = require('readline');

// bin and args are the same as in the question above
const cspr = childProcess.spawn(bin, args);
const rl = readline.createInterface({ input: cspr.stdout });
rl.on('line', line => { /* handle line here */ });
Adding to maerics' answer, which does not deal properly with cases where only part of a line is fed in a data dump (theirs will give you the first part and the second part of the line individually, as two separate lines).
var _breakOffFirstLine = /\r?\n/;
function filterStdoutDataDumpsToTextLines(callback) { // returns a function that takes chunks of stdout data, aggregates them, and passes complete lines one by one through to callback, as soon as it gets them.
  var acc = '';
  return function (data) {
    var splitted = data.toString().split(_breakOffFirstLine);
    var inTactLines = splitted.slice(0, splitted.length - 1);
    if (inTactLines.length) {
      // if there was a partial, unended line in the previous dump, it is completed by the first section.
      inTactLines[0] = acc + inTactLines[0];
      // if there is a partial, unended line in this dump, store it to be completed by the next
      // (we assume there will be a terminating newline at some point; this is, generally, a safe assumption).
      acc = splitted[splitted.length - 1];
    } else {
      // no newline in this dump at all; keep accumulating.
      acc += splitted[0];
    }
    for (var i = 0; i < inTactLines.length; ++i) {
      callback(inTactLines[i]);
    }
  };
}
usage:
cspr.stdout.on('data', filterStdoutDataDumpsToTextLines(function(line) {
  // each time this inner function is called, you will be getting a single, complete line of the child's stdout
}));
You can give this a try. It will ignore any empty strings or bare newline tokens.
cspr.stdout.on('data', (data) => {
  data = data.toString().split(/(\r?\n)/g);
  data.forEach((item, index) => {
    if (data[index] !== '\n' && data[index] !== '') {
      console.log(data[index]);
    }
  });
});
Old stuff but still useful...
I have made a custom stream Transform subclass for this purpose.
See https://stackoverflow.com/a/59400367/4861714
#nyctef's answer uses an official nodejs package.
Here is a link to the documentation: https://nodejs.org/api/readline.html
The node:readline module provides an interface for reading data from a Readable stream (such as process.stdin) one line at a time.
My personal use case is parsing JSON output from the "docker watch" command run in a spawned child_process.
const dockerWatchProcess = spawn(...)
...
const rl = readline.createInterface({
  input: dockerWatchProcess.stdout,
  output: null,
});
rl.on('line', (log: string) => {
  console.log('dockerWatchProcess event::', log);
  // code to process a change to a docker event
  ...
});
