Pausing readline in Node.js - node.js

Consider the code below ... I am trying to pause the stream after reading the first 5 lines:
var fs = require('fs');
var readline = require('readline');
var stream = require('stream');
var numlines = 0;
var instream = fs.createReadStream("myfile.json");
var outstream = new stream;
var readStream = readline.createInterface(instream, outstream);
readStream.on('line', function(line){
numlines++;
console.log("Read " + numlines + " lines");
if (numlines >= 5) {
console.log("Pausing stream");
readStream.pause();
}
});
The output (copied next) suggests that it keeps reading lines after the pause. Perhaps readline has queued up a few more lines in the buffer, and is feeding them to me anyway ... this would make sense if it continues to read asynchronously in the background, but based on the documentation, I don't know what the proper behavior should be. Any recommendations on how to achieve the desired effect?
Read 1 lines
Read 2 lines
Read 3 lines
Read 4 lines
Read 5 lines
Pausing stream
Read 6 lines
Pausing stream
Read 7 lines

Somewhat unintuitively, the pause method does not stop line events that are already queued up:
Calling rl.pause() does not immediately pause other events (including 'line') from being emitted by the readline.Interface instance.
There is however a 3rd-party module named line-by-line where pause does pause the line events until it is resumed.
var LineByLineReader = require('line-by-line'),
lr = new LineByLineReader('big_file.txt');
lr.on('error', function (err) {
// 'err' contains error object
});
lr.on('line', function (line) {
// pause emitting of lines...
lr.pause();
// ...do your asynchronous line processing..
setTimeout(function () {
// ...and continue emitting lines.
lr.resume();
}, 100);
});
lr.on('end', function () {
// All lines are read, file is closed now.
});
(I have no affiliation with the module, just found it useful for dealing with this issue.)

So, it turns out that the readline stream tends to "drip" (i.e., leak a few extra lines) even after a pause(). The documentation does not make this clear, but it's true.
If you want the pause() toggle to appear immediate, you'll have to create your own line buffer and accumulate the leftover lines yourself.
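A minimal sketch of such a line buffer follows. The wrapper name createGatedLineReader and its callback parameters are hypothetical (they are not part of readline). It queues every 'line' event and only hands lines to the caller while it is not paused, so lines that readline emits after pause() are held back instead of delivered.

```javascript
const readline = require('readline');

// Hypothetical helper (not part of readline): buffers 'line' events so that
// pause() appears immediate from the caller's point of view.
function createGatedLineReader(input, onLine, onClose) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const queue = [];     // lines emitted by readline but not yet delivered
  let paused = false;
  let closed = false;
  let draining = false;

  function drain() {
    if (draining) return;            // guard against re-entrant calls
    draining = true;
    while (!paused && queue.length > 0) {
      onLine(queue.shift());
    }
    draining = false;
    // Report completion only once everything buffered has been delivered.
    if (closed && !paused && queue.length === 0 && onClose) {
      const done = onClose;
      onClose = null;
      done();
    }
  }

  rl.on('line', (line) => { queue.push(line); drain(); });
  rl.on('close', () => { closed = true; drain(); });

  return {
    pause() {
      paused = true;
      if (!closed) rl.pause();       // still ask the source to slow down
    },
    resume() {
      paused = false;
      if (!closed) rl.resume();
      drain();                       // deliver anything held while paused
    },
  };
}
```

With this wrapper, calling pause() inside the line callback after the fifth line means the sixth line is not delivered until resume() is called, even if readline has already parsed it.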

To add a few points: if you register a 'pause' listener,
.on('pause', function() {
console.log(numlines)
})
you will get 5. The Node.js documentation mentions that the 'pause' event is also emitted in this case:
The input stream is not paused and receives the SIGCONT event. (See events SIGTSTP and SIGCONT)
So I created a temporary buffer in the 'line' handler and used a flag to determine whether the reader is currently paused:
.on('line', function(line) {
if (paused) {
putLineInBulkTmp(line);
} else {
putLineInBulk(line);
}
})
Then, in the 'pause' and 'resume' handlers:
.on('pause', function() {
paused = true;
doSomething(bulk, function(resp) {
// clean up bulk for the next.
bulk = [];
// clone tmp buffer.
bulk = clone(bulktmp);
bulktmp = [];
lr.resume();
});
})
.on('resume', () => {
paused = false;
})
This approach handles this kind of situation.

Related

How to properly close a writable stream in Node js?

I'm quite new to JavaScript. I'm using a Node.js writable stream to write a .txt file. It works well, but I cannot understand how to properly close the file, as its content is blank as long as the program is running. More specifically, I need to read from that .txt file after it has been written, but doing it this way returns an empty buffer.
let myWriteStream = fs.createWriteStream("./filepath.txt");
myWriteStream.write(stringBuffer + "\n");
myWriteStream.on('close', () => {
console.log('close event emitted');
});
myWriteStream.end();
// do things..
let data = fs.readFileSync("./filepath.txt").toString().split("\n");
Seems like the event emitted by the .end() method is triggered after the file reading, causing it to be read as empty. If I put a while() to wait for the event to be triggered, so that I know for sure the stream is closed before the reading, the program waits forever.
Do you have any clue of what I'm doing wrong?
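You're missing two things: first, check whether write() succeeded; second, wait for the stream's 'finish' event before reading the file.
const { readFileSync, createWriteStream } = require('fs')
const stringBuffer = Buffer.from(readFileSync('index.js'))
const filePath = "./filepath.txt"
const myWriteStream = createWriteStream(filePath)
// write() returns false when the internal buffer is full. The chunk is still
// queued, so do not retry it in a loop; wait for 'drain' before writing more.
const backPressureTest = myWriteStream.write(stringBuffer + "\n");
if (!backPressureTest) {
myWriteStream.once('drain', () => console.log('drain event emitted'));
}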
myWriteStream.on('close', () => {
console.log('close event emitted');
});
myWriteStream.on('finish', () => {
console.log('finish event emitted');
let data = readFileSync(filePath).toString().split("\n");
console.log(data);
});
myWriteStream.end();
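The same idea can be written with promises. The sketch below assumes a hypothetical helper named writeThenRead (its name and parameters are illustrative only). It uses events.once to wait for 'drain' when write() reports a full buffer and for 'finish' after end(), and only then reads the file back.

```javascript
const fs = require('fs');
const { once } = require('events');

// Hypothetical helper: write `lines` to `filePath`, wait until the stream has
// finished, then read the file back and return its lines.
async function writeThenRead(filePath, lines) {
  const ws = fs.createWriteStream(filePath);
  for (const line of lines) {
    // write() returns false when the internal buffer is full;
    // wait for 'drain' before writing the next chunk.
    if (!ws.write(line + '\n')) {
      await once(ws, 'drain');
    }
  }
  ws.end();
  // 'finish' fires once all data has been flushed to the underlying file.
  // events.once() rejects if the stream emits 'error' first.
  await once(ws, 'finish');
  return fs.readFileSync(filePath, 'utf8').split('\n');
}
```

Because every line is written with a trailing newline, the returned array ends with an empty string.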

Replay a log file with NodeJS as if it were happening in real-time

I have a log file with about 14,000 aircraft position datapoints captured from a system called Flarm. It looks like this:
{"addr":"A","time":1531919658.578100,"dist":902.98,"alt":385,"vs":-8}
{"addr":"A","time":1531919658.987861,"dist":914.47,"alt":384,"vs":-7}
{"addr":"A","time":1531919660.217471,"dist":925.26,"alt":383,"vs":-7}
{"addr":"A","time":1531919660.623466,"dist":925.26,"alt":383,"vs":-7}
What I need to do is find a way to 'play' this file back in real time (as if it were occurring right now, even though it's pre-recorded), and emit an event whenever a log entry 'occurs'. The file is not being added to; it's pre-recorded, and the playback would occur at a later stage.
The reason for doing this is that I don't have access to the receiving equipment when I'm developing.
The only way I can think to do it is to set a timeout for every log entry, but that doesn't seem like the right way to do it. Also, this process would have to scale to longer recordings (this one was only an hour long).
Are there other ways of doing this?
If you want to "play them back" with the actual time difference, a setTimeout is pretty much what you have to do.
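const processEntry = (entry, index) => {
// Emit the current entry first so that the last entry is not skipped.
emitEntryEvent(entry);
const nextEntry = getEntry(index + 1);
if (nextEntry == null) return;
// `time` is in seconds, but setTimeout expects milliseconds.
const timeDiff = (nextEntry.time - entry.time) * 1000;
setTimeout(processEntry, timeDiff, nextEntry, index + 1);
};
processEntry(getEntry(0), 0);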
This emits the current entry and then sets a timeout based on the difference until the next entry.
getEntry could either fetch lines from a prefilled array or fetch lines individually based on the index. In the latter case, only two lines of data would be in memory at the same time.
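As a sketch of the prefilled-array variant: parseLog, replay, and the speed parameter below are hypothetical names introduced only for illustration. Only one timer is pending at any moment, so the approach should not depend on the length of the recording.

```javascript
// Parse newline-delimited JSON log text into an array of entries.
function parseLog(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Replay entries with their original spacing. `speed` is a hypothetical knob
// (2 means twice as fast). Returns a function that cancels the replay.
function replay(entries, onEntry, speed = 1) {
  let timer = null;
  let index = 0;

  function step() {
    const entry = entries[index];
    onEntry(entry);
    index++;
    if (index >= entries.length) return;   // last entry has been emitted
    // `time` is in seconds; setTimeout expects milliseconds.
    const delayMs = ((entries[index].time - entry.time) * 1000) / speed;
    timer = setTimeout(step, Math.max(0, delayMs));
  }

  if (entries.length > 0) step();
  return () => clearTimeout(timer);
}
```

Each delay is computed relative to the previous entry, so timer inaccuracy can accumulate over a long recording; scheduling against the wall-clock time of the first entry would be one way to limit that drift.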
Got it working in the end! setTimeout turned out to be the answer, and combined with the input of Lucas S. this is what I ended up with:
const EventEmitter = require('events');
const fs = require('fs');
const readable = fs.createReadStream("./data/2018-07-18_1509log.json", {
encoding: 'utf8',
fd: null
});
function read_next_line() {
var chunk;
var line = '';
// While this is a thing we can do, assign chunk
while ((chunk = readable.read(1)) !== null) {
// If chunk is a newline character, return the line
if (chunk === '\n'){
return JSON.parse(line);
} else {
line += chunk;
}
}
return false;
}
var lines = [];
var nextline;
const processEntry = () => {
// If lines is empty, read a line
if (lines.length === 0) lines.push(read_next_line());
// Quit here if we've reached the last line
if ((nextline = read_next_line()) == false) return true;
// Else push the just read line into our array
lines.push(nextline);
// Get the time difference in milliseconds
var delay = Number(lines[1].time - lines[0].time) * 1000;
// Remove the first line
lines.shift();
module.exports.emit('data', lines[0]);
// Repeat after the calculated delay
setTimeout(processEntry, delay);
}
var ready_to_start = false;
// When the stream becomes readable, allow starting
readable.on('readable', function() {
ready_to_start = true;
});
module.exports = new EventEmitter;
module.exports.start = function() {
// Refuse to start until the stream has signalled that it is readable.
if (!ready_to_start) return false;
processEntry();
return true;
}
Assuming you want to visualize the flight logs, you can use fs.watch() as below to watch the log file for changes:
fs.watch('somefile', function (event, filename) {
console.log('event is: ' + event);
if (filename) {
console.log('filename provided: ' + filename);
} else {
console.log('filename not provided');
}
});
Code excerpt is from here. For more information on fs.watch() check out here
Then, for seamless updates on the frontend, you can set up a WebSocket to your server, where you watch the log file and send each newly added row to the frontend via that socket.
After you get the data in the frontend, you can visualize it there. While I haven't done a flight visualization project before, I've used D3.js to visualize other things (sound, numerical data, metric analysis, etc.) a couple of times, and it did the job every time.

Can an event based read function ever run out of order?

Given a situation where I use the nodejs readline library to iterate over each line in the STDIN stream, do some processing on it and write it back out to STDOUT as in the following example:
var readline = require('readline');
var rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
terminal: false
});
function my_function(line) {
var output = ...(line);
process.stdout.write(output);
}
rl.on('line', my_function);
I'm concerned that the processing I'm doing will take very different amounts of time depending on the line content so some lines will return very quickly while others takes some time to sort out. Is it possible that my_function() will ever run out of order and hence cause the output stream to be scrambled? Should I be looking into using a synchronous loop of some kind instead of this asynchronous event handler?
The JavaScript execution itself is single-threaded, so as long as you're only performing synchronous operations inside the event handler, there is no problem.
If you are performing asynchronous operations inside the event handler, then it is possible that another 'line' event could be emitted before your asynchronous operation(s) are complete. In that case, you would need to rl.pause() first and then rl.resume() once you are finished with your asynchronous operations. However, this isn't foolproof since 'line' events could still be emitted after a rl.pause() if the current chunk of data read from the input stream had multiple line breaks.
So if you are performing asynchronous operations inside the event handler, you are probably better off just reading from the stream yourself so that you have more control over the parsing behavior. This is actually pretty easy to do, for example:
function parseStream(stream, callback) {
// Assuming all stream data is text and not binary ...
var buffer = '';
// No `g` flag: a global regex would carry `lastIndex` between exec() calls.
var RE_EOL = /\r?\n/;
stream.on('data', function(data) {
buffer += data;
processBuffer();
});
stream.on('end', callback);
stream.on('error', callback);
function processBuffer() {
// exec() returns a match array (with an `index` property) or null.
var match = RE_EOL.exec(buffer);
if (match) {
// Found a line ending
var line = buffer.slice(0, match.index);
buffer = buffer.slice(match.index + match[0].length);
stream.pause();
callback(null, line, processBuffer);
} else {
stream.resume();
}
}
}
// ...
parseStream(process.stdin, function(err, line, done) {
if (err) throw err;
if (line === undefined) {
// No more data will be available (stream ended)
console.log('(Stream ended!)');
return;
}
// Do something with `line`
console.log(line);
// Call `done()` whenever your async operation(s) are all finished
done();
});
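In current Node.js versions, a readline.Interface can also be consumed with for await...of, which gives the same in-order behaviour without manual pause/resume calls. The sketch below is one way to do it; processInOrder and handleLine are hypothetical names, and handleLine is assumed to be an async function supplied by the caller.

```javascript
const readline = require('readline');

// Process lines strictly in order: the next line is not taken from the
// iterator until `handleLine` (a caller-supplied async function) has settled.
async function processInOrder(input, handleLine) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const results = [];
  for await (const line of rl) {
    results.push(await handleLine(line));
  }
  return results;
}
```

For the original question this could be called as processInOrder(process.stdin, someAsyncFunction); lines that take longer to process simply delay the ones after them instead of being overtaken.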

NodeJs set readline module speed

I'm reading a text file in Node.js using the readline module.
var lineReader = require('readline').createInterface({
input: require('fs').createReadStream('log.txt')
});
lineReader.on('line', function (line) {
console.log(line);
});
lineReader.on('close', function() {
console.log('Finished!');
});
Is there any way to set the rate of the reading?
For example, I want to read one line every 5 ms.
You can pause the reader stream as soon as you read a line, then resume it 5 ms later, and repeat this until the end of the file. Make sure to set the highWaterMark option to a lower value so that the file read stream doesn't read multiple lines at once.
var lineReader = require('readline').createInterface({
input: require('fs').createReadStream('./log.txt',{
highWaterMark : 10
})
});
lineReader.on('line', line => {
lineReader.pause(); // pause reader
// Resume 5ms later
setTimeout(()=>{
lineReader.resume();
}, 5)
console.log(line);
});
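An alternative that does not depend on tuning highWaterMark is to iterate over the lines asynchronously and sleep between them. This is only a sketch: readPaced and its parameters are hypothetical names, and it assumes a Node.js version that provides timers/promises.

```javascript
const readline = require('readline');
const { setTimeout: sleep } = require('timers/promises');

// Call `onLine` for each line, waiting `intervalMs` after every line before
// taking the next one, regardless of how many lines arrived in one chunk.
async function readPaced(input, intervalMs, onLine) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of rl) {
    onLine(line);
    await sleep(intervalMs);
  }
}
```

For the log file above, this could be called as readPaced(require('fs').createReadStream('log.txt'), 5, console.log). Timer resolution means the spacing is approximately, not exactly, 5 ms.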
You can use observables to do this. Here's an example of the kind of buffering I think you want with click events instead of file line events. Not sure if there's a cleaner way to do it that avoids the setInterval though....
let i = 0;
const source = Rx.Observable
.fromEvent(document.querySelector('#container'), 'click')
.controlled();
var subscription =
source.subscribe(() => console.log('was clicked ' + i++));
setInterval(() => source.request(1), 500);
Here's a fiddle and also a link to docs for rx:
https://jsfiddle.net/w6ewg175/
https://github.com/Reactive-Extensions/RxJS/blob/master/doc/api/core/operators/controlled.md

Balancing slow I/O in a fast stream read stream

In Node.js I have a read stream that I wish to reformat and write to a database. Because the read stream is fast and the write is slow, the Node.js queue could be overwhelmed as the queue of writes builds up (assume the stream is gigabytes of data). How do I force the read to wait for the write part of the code, without blocking, so that this does not happen?
var request = http.get({
host: 'api.geonames.org',
port: 80,
path: '/children?' + qs.stringify({
geonameId: geonameId,
username: "demo"
})
}).on('response', function(response) {
response.setEncoding('utf8');
var xml = new XmlStream(response, 'utf8');
xml.on('endElement: geoname ', function(input) {
console.log('geoname');
var output = new Object();
output.Name = input.name;
output.lat = input.lat;
output.lng = input.lng;
output._key = input.geonameId;
data.db.document.create(output, data.doc, function(callback){
//this is really slow.
});
// i do not want to return from here and receive more data until the 'create' above has completed
});
});
I just ran into this problem last night, and in my hackathon induced sleep deprived state, here is how I solved it:
I would increment a counter whenever I sent a job out to be processed, and decremented the counter when the operation completed. To keep the outbound traffic from overwhelming the other service, I would pause the stream when there was a certain number of pending outbound requests. The code is very similar to the following.
var instream = fs.createReadStream('./combined.csv');
var outstream = new stream;
var inProcess = 0;
var paused = false;
var rl = readline.createInterface(instream, outstream);
rl.on('line', function(line) {
inProcess++;
if(inProcess > 100) {
console.log('pausing input to clear queue');
rl.pause();
paused = true;
}
someService.doSomethingSlow(line, function(err) {
if (err) throw err;
inProcess--;
if(paused && inProcess < 10) {
console.log('resuming stream');
paused = false;
rl.resume();
}
});
});
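// readline.Interface emits 'close' (not 'end') when the input ends, and it closes itself.
rl.on('close', function() {
console.log('input fully read');
});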
Not the most elegant solution, but it worked and allowed me to process the million+ lines without running out of memory or throttling the other service.
My solution simply extends an empty stream.Writable and is fundamentally identical to @Timothy's, but it uses events and doesn't rely on Streams1 .pause() and .resume() (which didn't seem to have any effect on my data pipeline anyway).
var stream = require("stream");
var liveRequests = 0;
var maxLiveRequests = 100;
var streamPaused = false;
var requestClient = new stream.Writable();
function requestCompleted(){
liveRequests--;
if(streamPaused && liveRequests < maxLiveRequests){
streamPaused = false;
requestClient.emit("resumeStream");
}
}
requestClient._write = function (data, enc, next){
makeRequest(data, requestCompleted);
liveRequests++;
if(liveRequests >= maxLiveRequests){
streamPaused = true;
requestClient.once("resumeStream", function resume(){
next();
});
}
else {
next();
}
};
A counter, liveRequests, keeps track of the number of concurrent requests; it is incremented whenever makeRequest() is called and decremented when the request completes (i.e., when requestCompleted() is called). If a request has just been made and liveRequests has reached maxLiveRequests, we pause the stream by setting streamPaused. If a request completes while the stream is paused and liveRequests is now less than maxLiveRequests, we can resume the stream. Since subsequent data items are read by _write() only when its next() callback is called, we can simply defer the latter with an event listener on our custom "resumeStream" event, which mimics pausing and resuming.
Now, simply readStream.pipe(requestClient).
Edit: I abstracted this solution, along with automatic batching of input data, in a package.
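For comparison, the same back-pressure idea can be sketched with the Writable constructor options instead of assigning _write directly. The factory name createLimitedWriter and the worker parameter are hypothetical; worker is assumed to be a function that returns a promise. The sketch also implements final() so that 'finish' is only emitted after the in-flight work has completed.

```javascript
const { Writable } = require('stream');

// Hypothetical factory: a Writable that keeps at most `maxLive` calls to
// `worker(item)` (assumed to return a promise) in flight at once.
function createLimitedWriter(worker, maxLive) {
  let live = 0;
  let waitingNext = null;   // deferred next() while at the limit
  let onIdle = null;        // set by final() while requests are still live

  function completed() {
    live--;
    if (waitingNext && live < maxLive) {
      const next = waitingNext;
      waitingNext = null;
      next();               // lets the stream hand over the next item
    }
    if (onIdle && live === 0) {
      const done = onIdle;
      onIdle = null;
      done();
    }
  }

  return new Writable({
    objectMode: true,
    write(item, _encoding, next) {
      live++;
      worker(item).then(completed, (err) => this.destroy(err));
      // Withholding next() is what applies back-pressure to the source.
      if (live >= maxLive) waitingNext = next;
      else next();
    },
    final(callback) {
      // Delay 'finish' until every in-flight request has completed.
      if (live === 0) callback();
      else onIdle = callback;
    },
  });
}
```

Usage follows the same shape as above, for example readStream.pipe(createLimitedWriter(makeRequestAsync, 100)), where makeRequestAsync stands in for a promise-returning version of makeRequest().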

Resources