How to do `tail -f logfile.txt`-like processing in node.js?

tail -f logfile.txt outputs the last 10 lines of logfile.txt, and then continues to output appended data as the file grows.
What's the recommended way of doing the -f part in node.js?
The following outputs the entire file (ignoring the "show the last 10 lines" part) and then exits.
var fs = require('fs');
var rs = fs.createReadStream('logfile.txt', { flags: 'r', encoding: 'utf8' });
rs.on('data', function (data) {
  console.log(data);
});
I understand the event loop exits because there are no more events after the stream's end and close events; I'm curious about the best way of continuing to monitor the stream.

The canonical way to do this is with fs.watchFile.
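For reference, a minimal sketch of that approach could look like the following (the polling interval and the assumption that the file only ever grows are mine, not taken from any particular answer):
var fs = require('fs');

var filename = 'logfile.txt'; // hypothetical path

// Poll the file's stats; whenever the size grows, read only the newly appended bytes.
fs.watchFile(filename, { interval: 1000 }, function (curr, prev) {
  if (curr.size <= prev.size) return; // ignore truncation or no change
  var stream = fs.createReadStream(filename, {
    start: prev.size,
    end: curr.size - 1, // end is inclusive
    encoding: 'utf8'
  });
  stream.on('data', function (chunk) {
    process.stdout.write(chunk);
  });
});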
Alternatively, you could just use the node-tail module, which uses fs.watchFile internally and has already done the work for you. Here is an example of using it straight from the documentation:
var Tail = require('tail').Tail;

var tail = new Tail("fileToTail");

tail.on("line", function (data) {
  console.log(data);
});

The node.js API documentation on fs.watchFile states:
Stability: 2 - Unstable. Use fs.watch instead, if available.
Funnily enough, it says almost the exact same thing for fs.watch:
Stability: 2 - Unstable. Not available on all platforms.
In any case, I went ahead and wrote yet another small web app, TailGate, that will tail your files using the fs.watch variant.
Feel free to check it out here: TailGate on GitHub.
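For a rough idea of what the fs.watch variant looks like, here is a sketch that assumes the file only ever grows (TailGate's actual implementation may differ):
var fs = require('fs');

var filename = 'logfile.txt'; // hypothetical path
var position = fs.statSync(filename).size; // start tailing from the current end of the file

fs.watch(filename, function (eventType) {
  if (eventType !== 'change') return;
  fs.stat(filename, function (err, stat) {
    if (err || stat.size <= position) return; // ignore errors, truncation and no-growth events
    var stream = fs.createReadStream(filename, {
      start: position,
      end: stat.size - 1,
      encoding: 'utf8'
    });
    position = stat.size;
    stream.on('data', function (chunk) {
      process.stdout.write(chunk);
    });
  });
});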

You can try using fs.read instead of a ReadStream.
var fs = require('fs');

var buf = Buffer.alloc(16); // Buffer.alloc gives a zero-filled buffer (new Buffer() is deprecated)

function read(fd) {
  fs.read(fd, buf, 0, buf.length, null, function (err, bytesRead) {
    if (err) throw err;
    if (bytesRead > 0) {
      // Only print the bytes that were actually read, not stale buffer contents.
      console.log(buf.toString('utf8', 0, bytesRead));
      read(fd);
    } else {
      // End of file reached: poll again after a second.
      setTimeout(function () {
        read(fd);
      }, 1000);
    }
  });
}

fs.open('logfile', 'r', function (err, fd) {
  if (err) throw err;
  read(fd);
});
Note that fs.read invokes the callback even when there is no data left and it has simply reached the end of the file. Without the timeout you'd spin at 100% CPU here. You could also use fs.watchFile to be notified of new data immediately instead of polling.

Substack has a file slice module that behaves exactly like tail -f: slice-file can stream updates after the initial slice of 10 lines.
var sf = require('slice-file');
var xs = sf('/var/log/mylogfile.txt');
xs.follow(-10).pipe(process.stdout);
Source: https://github.com/substack/slice-file#follow

https://github.com/jandre/always-tail seems like a great option if you have to worry about log rotation. Example from the readme:
var Tail = require('always-tail');
var fs = require('fs');

var filename = "/tmp/testlog";
if (!fs.existsSync(filename)) fs.writeFileSync(filename, "");

var tail = new Tail(filename, '\n');

tail.on('line', function (data) {
  console.log("got line:", data);
});

tail.on('error', function (data) {
  console.log("error:", data);
});

tail.watch();

Related

NodeJS - read and write file causes corruption

I'm kinda new to NodeJS and I'm working on a simple file encoder.
I planned to change the very first 20kb of a file and just copy the rest of it.
So I used the following code, but it changed some bytes in the rest of the file.
Here is my code:
var fs = require('fs');
var config = require('./config');

fs.open(config.encodeOutput, 'w', function (err, fw) {
  if (err) {
    console.log(err);
  } else {
    fs.readFile(config.source, function (err, data) {
      var start = 0;
      var buff = readChunk(data, start);
      while (buff.length) {
        if (start < config.encodeSize) {
          var buffer = makeSomeChanges(buff);
          writeChunk(fw, buffer);
        } else {
          writeChunk(fw, buff);
        }
        start += config.ENCODE_BUFFER_SIZE;
        buff = readChunk(data, start);
      }
    });
  }
});

function readChunk(buffer, start) {
  return buffer.slice(start, start + config.ENCODE_BUFFER_SIZE);
}

function writeChunk(fd, chunk) {
  fs.writeFile(fd, chunk, {encoding: 'binary', flag: 'a'});
}
I opened encoded file and compared it with the original file.
I even commented these parts:
//if(start < config.encodeSize) {
// var buffer = makeSomeChanges(buff);
// writeChunk(fw, buffer);
//} else {
writeChunk(fw, buff);
//}
So my program just copies the file, but it still changes some bytes.
What is wrong?
So I checked the pattern and realized some bytes were not in the right place, and I guessed it was because I'm using an async write function.
I changed fs.writeFile() to fs.writeFileSync() and everything is working fine now.
Since you were using asynchronous IO, you should have been queueing your write operations: multiple writes happening at the same time are likely to end up corrupting your file. This explains why your issue went away with synchronous IO, since a further write cannot start before the previous one has completed.
However, using synchronous APIs when asynchronous ones are available is a poor choice, because your program is blocked while it writes to the file. You should stick with the async API and build a queue of pending writes.
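For illustration only, here is a minimal sketch of such a queue using a promise chain; it assumes the fs.promises API and a hypothetical createAppendQueue helper, and is not the original code:
const fs = require('fs');

// Returns a function that appends chunks to a file strictly one after another.
function createAppendQueue(path) {
  let pending = Promise.resolve();
  return function enqueue(chunk) {
    // Each append starts only after the previous one has finished,
    // so chunks land in the file in the order they were queued.
    pending = pending.then(() => fs.promises.appendFile(path, chunk));
    return pending;
  };
}

// Hypothetical usage in place of the writeChunk() calls above:
// const append = createAppendQueue(config.encodeOutput);
// append(changedBuffer); append(nextBuffer);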

How to reset nodejs stream?

How to reset nodejs stream?
How to read stream again in nodejs?
Thanks in advance!
var fs = require('fs');
var lineReader = require('line-reader');

// proxy.txt = only 3 lines
var readStream = fs.createReadStream('proxy.txt');

lineReader.open(readStream, function (err, reader) {
  for (var i = 0; i < 6; i++) {
    reader.nextLine(function (err, line) {
      if (err) {
        readStream.reset(); // ???
      } else {
        console.log(line);
      }
    });
  }
});
There are two ways of solving your problem. As someone commented before, you could simply wrap all that in a function and, instead of resetting, read the file again.
Of course this won't work well with HTTP requests, for example, so the other way, provided you accept a much bigger memory footprint, is to simply accumulate your data.
What you'd need is to implement some sort of "rewindable stream": essentially a Transform stream that keeps a list of all the buffers passing through and writes them to a piped stream in a rewind method.
Take a look at the node API for streams here; the methods should look somewhat like this.
const { Transform, PassThrough } = require('stream');

class Rewindable extends Transform {
  constructor() {
    super();
    this.accumulator = [];
  }

  _transform(buf, enc, cb) {
    // Keep a copy of every chunk that passes through.
    this.accumulator.push(buf);
    cb();
  }

  rewind() {
    // Replay all accumulated chunks into a fresh stream.
    const stream = new PassThrough();
    this.accumulator.forEach((chunk) => stream.write(chunk));
    stream.end();
    return stream;
  }
}
And you would use it like this:
var readStream = fs.createReadStream('proxy.txt');
var rewindableStream = readStream.pipe(new Rewindable());

(...).on("whenever-you-want-to-reset", () => {
  var rewound = rewindableStream.rewind();
  // ...and do whatever you like with your stream.
});
Actually I think I'll add this to my scramjet. :)
Edit
I released this logic in the rereadable-stream npm package. The advantage over the stream depicted here is that you can now control the buffer length and drop data that has already been read.
At the same time you can keep a window of count items and tail a number of chunks backwards.

What is better async style in these two Node.js examples?

I am working through nodeschool.io learnyounode and on the fourth challenge, which is:
Write a program that uses a single asynchronous filesystem operation to read a file and print the number of newlines it contains to the console (stdout), similar to running cat file | wc -l.
I wrote one solution, which is different than the solution provided, but both seem to work, and I am curious to know which would be better style, and how they might function differently.
Here is my solution:
var fs = require('fs');
var fileAsArray = [];

function asyncRead(print) {
  fs.readFile(process.argv[2], 'utf-8', function callback(error, contents) {
    fileAsArray = contents.split('\n');
    print();
  });
}

function printNumberOfLines() {
  console.log(fileAsArray.length - 1);
}

asyncRead(printNumberOfLines);
And here is the solution provided by learnyounode:
var fs = require('fs')
var file = process.argv[2]

fs.readFile(file, function (err, contents) {
  // fs.readFile(file, 'utf8', callback) can also be used
  var lines = contents.toString().split('\n').length - 1
  console.log(lines)
})
I also noticed that the learnyounode code lacks semicolons. I thought they were strongly recommended/required?

non-blocking way to write to filesystem with node.js

I've written a non-blocking TCP server with node.js. This server listens on a port and reroutes the request to another server via an http.request().
To keep a backlog of the rerouted messages, I want to append every message (a single line of information) to a file named after the current date.
The server is going to be hit by several devices at varying intervals with small text strings (~800 bytes). Writing to the filesystem implicitly calls for a blocking operation. Is there a way to prevent this behavior?
If appendFile doesn't work out, I have myself tested a solution for this using file streams that works with multiple clusters and won't clobber the output.
Just use the asynchronous methods of the fs module like appendFile.
http://nodejs.org/api/fs.html#fs_fs_appendfile_filename_data_encoding_utf8_callback
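As a rough sketch of that suggestion (the date-based filename format is an assumption on my part, not something from the question):
var fs = require('fs');

// Append a single log line to a file named after today's date, without blocking.
function logMessage(line) {
  var filename = new Date().toISOString().slice(0, 10) + '.log'; // e.g. "2012-07-01.log"
  fs.appendFile(filename, line + '\n', function (err) {
    if (err) console.error('append failed:', err);
  });
}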
Something like this might help.
var fs = require('fs');

var writer = {
  files: {},
  appendFile: function (path, data) {
    if (this.files[path] === undefined) {
      this.files[path] = { open: false, queue: [] };
    }
    this.files[path].queue.push(data);
    if (!this.files[path].open) {
      this.files[path].open = true;
      this.nextWrite(path);
    }
  },
  nextWrite: function (path) {
    var data = this.files[path].queue.shift(),
        self = this;
    if (data === undefined)
      return this.files[path].open = false;
    fs.appendFile(path, data, function (err) {
      if (err) throw err;
      self.nextWrite(path);
    });
  }
};
It requires Node 0.8.0 for fs.appendFile, but it keeps a queue per file and appends the chunks in the order they were added. It works, but I didn't spend very much time on it, so use it for educational purposes only.
writer.appendFile('test.txt','hello');

Parse output of spawned node.js child process line by line

I have a PhantomJS/CasperJS script which I'm running from within a node.js script using child_process.spawn(). Since CasperJS doesn't support require()ing modules, I'm trying to print commands from CasperJS to stdout and then read them from my node.js script using spawn.stdout.on('data', function(data) {}); in order to do things like add objects to redis/mongoose (convoluted, yes, but it seems more straightforward than setting up a web service for this...). The CasperJS script executes a series of commands and creates, say, 20 screenshots which need to be added to my database.
However, I can't figure out how to break the data variable (a Buffer?) into lines. I've tried converting it to a string and then doing a replace, and I've tried spawn.stdout.setEncoding('utf8');, but nothing seems to work.
Here is what I have right now
var spawn = require('child_process').spawn;
var bin = "casperjs";
//googlelinks.js is the example given at http://casperjs.org/#quickstart
var args = ['scripts/googlelinks.js'];

var cspr = spawn(bin, args);
//cspr.stdout.setEncoding('utf8');

cspr.stdout.on('data', function (data) {
  var buff = new Buffer(data);
  console.log("foo: " + buff.toString('utf8'));
});

cspr.stderr.on('data', function (data) {
  data += '';
  console.log(data.replace("\n", "\nstderr: "));
});

cspr.on('exit', function (code) {
  console.log('child process exited with code ' + code);
  process.exit(code);
});
https://gist.github.com/2131204
Try this:
cspr.stdout.setEncoding('utf8');
cspr.stdout.on('data', function (data) {
  var str = data.toString(), lines = str.split(/(\r?\n)/g);
  for (var i = 0; i < lines.length; i++) {
    // Process lines[i] here, noting it might be an incomplete line.
  }
});
Note that the "data" event might not necessarily break evenly between lines of output, so a single line might span multiple data events.
I've actually written a Node library for exactly this purpose; it's called stream-splitter and you can find it on GitHub: samcday/stream-splitter.
The library provides a special Stream you can pipe your casper stdout into, along with a delimiter (in your case, \n), and it will emit neat "token" events, one for each line it has split out from the input Stream. The internal implementation for this is very simple, and delegates most of the magic to substack/node-buffers, which means there are no unnecessary Buffer allocations/copies.
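Based purely on that description, usage would look roughly like the following; the constructor call and event name here are assumptions drawn from the paragraph above, so double-check them against the project's README:
var StreamSplitter = require('stream-splitter');

// Pipe the child's stdout through the splitter and handle one token (line) at a time.
var splitter = cspr.stdout.pipe(StreamSplitter('\n'));
splitter.on('token', function (token) {
  console.log('line: ' + token.toString());
});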
I found a nicer way to do this with just pure node, which seems to work well:
const childProcess = require('child_process');
const readline = require('readline');

const cspr = childProcess.spawn(bin, args); // bin and args as in the question

const rl = readline.createInterface({ input: cspr.stdout });
rl.on('line', (line) => {
  // handle line here
});
Adding to maerics' answer, which does not deal properly with cases where only part of a line is fed in a data dump (theirs will give you the first part and the second part of the line individually, as two separate lines.)
var _breakOffFirstLine = /\r?\n/;

// Returns a function that takes chunks of stdout data, aggregates them, and passes
// complete lines one by one to the callback as soon as it gets them.
function filterStdoutDataDumpsToTextLines(callback) {
  var acc = '';
  return function (data) {
    var splitted = data.toString().split(_breakOffFirstLine);
    var inTactLines = splitted.slice(0, splitted.length - 1);
    if (inTactLines.length) {
      // A partial, unended line left over from the previous dump is completed by the first section.
      inTactLines[0] = acc + inTactLines[0];
      // If there is a partial, unended line in this dump, store it to be completed by the next
      // (we assume a terminating newline will arrive at some point, which is generally safe).
      acc = splitted[splitted.length - 1];
    } else {
      // No newline in this dump at all: just keep accumulating.
      acc += splitted[0];
    }
    for (var i = 0; i < inTactLines.length; ++i) {
      callback(inTactLines[i]);
    }
  };
}
usage:
cspr.stdout.on('data', filterStdoutDataDumpsToTextLines(function (line) {
  // Each time this inner function is called, you get a single, complete line of the stdout.
}));
You can give this a try. It will ignore any empty lines or bare newline separators.
cspr.stdout.on('data', (data) => {
  data = data.toString().split(/(\r?\n)/g);
  data.forEach((item, index) => {
    if (data[index] !== '\n' && data[index] !== '') {
      console.log(data[index]);
    }
  });
});
Old stuff but still useful...
I have made a custom stream Transform subclass for this purpose.
See https://stackoverflow.com/a/59400367/4861714
#nyctef's answer uses an official nodejs package.
Here is a link to the documentation: https://nodejs.org/api/readline.html
The node:readline module provides an interface for reading data from a Readable stream (such as process.stdin) one line at a time.
My personal use case is parsing JSON output from the "docker watch" command run in a spawned child_process.
const dockerWatchProcess = spawn(...)
...
const rl = readline.createInterface({
  input: dockerWatchProcess.stdout,
  output: null,
});

rl.on('line', (log: string) => {
  console.log('dockerWatchProcess event::', log);
  // code to process a change to a docker event
  ...
});
