Node.js: read file line by line and accumulate results in a global object

Classic embarrassing newbie question: why doesn't my store variable contain any results? I know it is accumulating results along the way, and I know enough about Node.js to know it has to do with promises, single-threadedness, etc.
var readline = require('readline');
var fs = require("fs");

var pathToFile = '/scratch/myData.csv';

var rd = readline.createInterface({
    input: fs.createReadStream(pathToFile),
    // output: process.stdout,
    console: false
});

var store = {};

rd.on('line', function(line) {
    store[line] = 1;
    // console.log (`store is now: ${JSON.stringify (store)}`);
});
console.log (`store is now: ${JSON.stringify (store)}`);

This has nothing to do with promises (although you can promisify it if you like).
As you said, it is accumulating the results line by line, but this is happening inside the scope of the callback function.
If you want to make use of the data, you will have to call another function from inside the callback once the last line has been read (or, better, listen for a different event).
Something like the following:
var store = {};

rd.on('line', function(line) {
    store[line] = 1;
    // console.log (`store is now: ${JSON.stringify (store)}`);
})
.on('close', function() {
    myFunc(store);
});

function myFunc(store){
    console.log (`store is now: ${JSON.stringify (store)}`);
}
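If you want to go the promise route mentioned above, a minimal sketch might look like this; readAllLines is a hypothetical helper name, not something from the question:
var readline = require('readline');
var fs = require('fs');

// Hypothetical helper: resolves with the accumulated store once the stream ends.
function readAllLines(pathToFile) {
    return new Promise(function(resolve, reject) {
        var store = {};
        var input = fs.createReadStream(pathToFile);
        input.on('error', reject);
        readline.createInterface({ input: input })
            .on('line', function(line) {
                store[line] = 1;
            })
            .on('close', function() {
                resolve(store);
            });
    });
}

readAllLines('/scratch/myData.csv').then(function(store) {
    console.log(`store is now: ${JSON.stringify(store)}`);
});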

Related

NodeJS calling readline within a function

I'm learning Node.js and I've run into a basic issue. I'm trying to read a file line by line and, for each line read, send an HTTP request to / + <the line>, e.g.:
wlist.txt contents
line
line2
failed attempt:
const request = require('request') // for http request later
const readline = require('readline')
const fs = require('fs')

function fileLoader() {
    const readInterface = readline.createInterface({
        input: fs.createReadStream('C:\\etc\\code\\req\\wlist.txt'),
    });
    readInterface.on('line', function(line) {
        return "test";
    });
}
var aba = fileLoader();
console.log(aba); // undefined
My reason for wrapping this in a fileLoader function rather than using it "as is" is that later on I have a switch statement that uses the loaded file for different purposes, such as an XML request or a JSON request. Let's say:
switch (myArgs[0]) {
    case 'json':
        let myJSON = {username: 'val'};
        request({
            url: "http://192.168.1.2:3000",
            method: "POST",
            json: true,
            body: myJSON
        }, function (error, response, body){
            console.log(response.headers)
            console.log(response.body)
        });
        break;
    case 'xml': .....
I'm fully aware there's something I'm missing, probably regarding async/promises, but to really learn, could someone please go easy on me and show me the way? I've tried everything and just can't get a grasp of what the problem is.
I believe you would like to do something like this:
https://gist.github.com/EB-BartVanVliet/533d55eb17c97f2a12ed25f479786f4a
Essentially what I do is:
Parse the file, look for empty lines and remove them
Declare an async start function so that await can be used inside the for loop
Log the output
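Since the gist itself isn't reproduced here, the following is only a sketch of that approach under stated assumptions: it requires Node 10+ for fs.promises, reuses the path from the question, and awaits a hypothetical sendRequest(line) placeholder rather than the gist's actual code:
const fs = require('fs')

// Hypothetical placeholder for whatever HTTP call you want per line.
async function sendRequest(line) {
    console.log(`would request / + ${line}`)
}

async function start() {
    const contents = await fs.promises.readFile('C:\\etc\\code\\req\\wlist.txt', 'utf8')
    const lines = contents.split(/\r?\n/).filter(line => line.length > 0) // remove empty lines
    for (const line of lines) {
        await sendRequest(line) // await works here because start() is async
    }
    console.log('done')
}

start()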
You can do it simply, like this:
var sendRequest = function (input) {
    // Do whatever you want here
}

var lineReader = require('readline').createInterface({
    input: require('fs').createReadStream('path_to_your_file')
});

lineReader.on('line', function (line) {
    console.log('Line from file:', line);
    sendRequest(line);
});
readline is asynchronous so chances are that console.log is being called before fileLoader has finished. Try using readline-sync if you are happy to block whilst the file is read.
Otherwise you should re-write so that the on('line',...) handler performs the action you want to take with each line as it is read. (I think this is what you want: "read a file line by line, and for each line I read to send an HTTP request".) E.g.
on('line', (input) => { /* perform send http stuff/call function to do it */ } );
Or, if you only want to act when the whole file is read, you'll have to re-structure so that the file-read is wrapped in a promise (or use async/await).
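A minimal sketch of that restructuring, assuming one HTTP request per line and that you only care about the final result; doRequest is a hypothetical placeholder for the real request call:
const readline = require('readline')
const fs = require('fs')

// Hypothetical placeholder for the real HTTP call.
function doRequest(line) {
    return Promise.resolve(`requested / + ${line}`)
}

function fileLoader(path) {
    return new Promise((resolve, reject) => {
        const pending = []
        const input = fs.createReadStream(path)
        input.on('error', reject)
        readline.createInterface({ input })
            .on('line', line => pending.push(doRequest(line)))
            .on('close', () => resolve(Promise.all(pending)))
    })
}

fileLoader('C:\\etc\\code\\req\\wlist.txt').then(results => console.log(results))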

Node.js: read multiple CSV files counting lines, and produce an overall tally at the end

I have the following Node.js code. My intention is to produce a single count of all lines in all the files. However, when I run this code, I just receive a count of the smallest file.
I think I understand why. All six file reads get launched in quick succession and, naturally, the shortest file finishes first and doesn't wait for the other tallies to complete.
My question is: What's the best Nodejs approach to this problem? In real life, I want to do a more complex operation than incrementing the counter each time, but this gets across the idea.
Should I use promises somehow to do this, or perhaps key off of some other kind of event?
var fs = require("fs");
var readline = require('readline');

var TOTAL_LINES = 0;

allCSVFiles = ["a", "b", "c", "d", "e", "f"];

allCSVFiles.forEach(function(file, idx, array){
    var pathToFile = `/scratch/testDir/${file}.csv`;
    var rd = readline.createInterface({
        input: fs.createReadStream(pathToFile),
        // output: process.stdout,
        console: false
    });
    rd.on('line', function(line) {
        TOTAL_LINES++;
    })
    .on('close', function() {
        console.log (`closing: ${pathToFile}`);
        if (idx === array.length - 1){
            console.log (`Grand total: ${TOTAL_LINES}`);
        }
    });
});
Yes, you can use promises to do the async reading of files. Due to the async nature of Node.js, simply using fs.readFile would result in all files being processed asynchronously.
Ref: http://www.yaoyuyang.com/2017/01/20/nodejs-batch-file-processing.html
That example shows how to create an empty summary file and then keep appending to it as each promise completes. In your case, before appending to the target summary file, read its existing content to capture the previous line count, then add the new count and update the file with the aggregated total.
Recommendation: if you have a long-running computation, you should start another process (using child_process) for parallel processing and then just have your Node.js process asynchronously wait for the results.
Ref: Parallelizing tasks in Node.js
Best way to execute parallel processing in Node.js
So please explain your use case.
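As an illustration of the child_process recommendation, here is a minimal sketch that counts each file's lines in a separate process; it assumes a Unix-like system where wc is available, so treat it as an example of the pattern rather than a portable solution:
var execFile = require('child_process').execFile;

// Hypothetical helper: counts lines of one file in a separate process via `wc -l`.
function countLinesInChild(pathToFile) {
    return new Promise(function(resolve, reject) {
        execFile('wc', ['-l', pathToFile], function(err, stdout) {
            if (err) return reject(err);
            resolve(parseInt(stdout.trim().split(' ')[0], 10));
        });
    });
}

var files = ["a", "b", "c", "d", "e", "f"].map(f => `/scratch/testDir/${f}.csv`);
Promise.all(files.map(countLinesInChild)).then(function(counts) {
    var total = counts.reduce(function(sum, n) { return sum + n; }, 0);
    console.log(`Grand total: ${total}`);
});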
Ok, I think I have an answer to my own question. Please feel free to critique it.
var fs = require("fs");
var readline = require('readline');

var TOTAL_LINES = 0;
var allMyPromises = [];

allCSVFiles = ["a", "b", "c", "d", "e", "f"];

allCSVFiles.forEach(function(file, idx, array){
    var myPromise = readOneFile (file, idx, array);
    allMyPromises.push (myPromise);
});

Promise.all(allMyPromises).then(function(values) {
    console.log (`Grand total: ${TOTAL_LINES}`);
});

function readOneFile(file, idx, array){
    return new Promise(function(resolve, reject) {
        var pathToFile = `/scratch/testDir/${file}.csv`;
        var rd = readline.createInterface({
            input: fs.createReadStream(pathToFile),
            // output: process.stdout,
            console: false
        });
        rd.on('line', function(line) {
            TOTAL_LINES++;
        })
        .on('close', function() {
            console.log (`closing: ${pathToFile}`);
            resolve (TOTAL_LINES);
        });
    });
}
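One critique, offered as a sketch rather than a required fix: because TOTAL_LINES is a shared global, each promise resolves with whatever the counter happens to hold at that moment. Letting every promise resolve with its own file's count and summing in Promise.all keeps the pieces independent:
function readOneFile(file){
    return new Promise(function(resolve, reject) {
        var pathToFile = `/scratch/testDir/${file}.csv`;
        var lines = 0;
        var input = fs.createReadStream(pathToFile);
        input.on('error', reject);
        readline.createInterface({ input: input })
            .on('line', function() { lines++; })       // count this file only
            .on('close', function() { resolve(lines); }); // resolve with the per-file count
    });
}

Promise.all(allCSVFiles.map(readOneFile)).then(function(counts) {
    var grandTotal = counts.reduce(function(sum, n) { return sum + n; }, 0);
    console.log(`Grand total: ${grandTotal}`);
});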

Is it possible to register multiple listeners to a child process's stdout data event? [duplicate]

I need to run two commands in series that need to read data from the same stream.
After piping a stream into another, the buffer is emptied, so I can't read data from that stream again; this doesn't work:
var spawn = require('child_process').spawn;
var fs = require('fs');
var request = require('request');

var inputStream = request('http://placehold.it/640x360');
var identify = spawn('identify',['-']);

inputStream.pipe(identify.stdin);

var chunks = [];

identify.stdout.on('data',function(chunk) {
    chunks.push(chunk);
});

identify.stdout.on('end',function() {
    var size = getSize(Buffer.concat(chunks)); //width
    var convert = spawn('convert',['-','-scale',size * 0.5,'png:-']);
    inputStream.pipe(convert.stdin);
    convert.stdout.pipe(fs.createWriteStream('half.png'));
});

function getSize(buffer){
    return parseInt(buffer.toString().split(' ')[2].split('x')[0]);
}
Request complains about this
Error: You cannot pipe after data has been emitted from the response.
and changing the inputStream to fs.createWriteStream yields the same issue of course.
I don't want to write to a file; I want to somehow reuse the stream that request produces (or any other stream, for that matter).
Is there a way to reuse a readable stream once it finishes piping?
What would be the best way to accomplish something like the above example?
You have to duplicate the stream by piping it into two streams. You can create such a stream with a PassThrough stream, which simply passes the input through to the output.
const spawn = require('child_process').spawn;
const PassThrough = require('stream').PassThrough;

const a = spawn('echo', ['hi user']);
const b = new PassThrough();
const c = new PassThrough();

a.stdout.pipe(b);
a.stdout.pipe(c);

let count = 0;
b.on('data', function (chunk) {
    count += chunk.length;
});
b.on('end', function () {
    console.log(count);
    c.pipe(process.stdout);
});
Output:
8
hi user
The first answer only works if streams take roughly the same amount of time to process data. If one takes significantly longer, the faster one will request new data, consequently overwriting the data still being used by the slower one (I had this problem after trying to solve it using a duplicate stream).
The following pattern worked very well for me. It uses Streamz, a library based on Streams2, and promises to synchronize async streams via a callback. Using the familiar example from the first answer:
var spawn = require('child_process').spawn;
var pass = require('stream').PassThrough;
var streamz = require('streamz').PassThrough;
var Promise = require('bluebird');

var a = spawn('echo', ['hi user']);
var b = new pass();
var c = new pass();

a.stdout.pipe(streamz(combineStreamOperations));

function combineStreamOperations(data, next){
    Promise.join(b, c, function(b, c){ // perform n operations on the same data
        next(); // request more
    });
}

var count = 0;
b.on('data', function(chunk) { count += chunk.length; });
b.on('end', function() { console.log(count); c.pipe(process.stdout); });
You can use this small npm package I created:
readable-stream-clone
With this you can reuse readable streams as many times as you need
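Usage is roughly as follows; this is a sketch from memory, so the ReadableStreamClone constructor name and exact API are assumptions worth checking against the package README:
const fs = require('fs');
const ReadableStreamClone = require('readable-stream-clone'); // assumed export shape

const source = fs.createReadStream('image.png'); // placeholder file name

// Each clone can be consumed independently of the other.
const copyOne = new ReadableStreamClone(source);
const copyTwo = new ReadableStreamClone(source);

copyOne.pipe(fs.createWriteStream('image-copy-1.png'));
copyTwo.pipe(fs.createWriteStream('image-copy-2.png'));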
For the general problem, the following code works fine:
var PassThrough = require('stream').PassThrough;

var a = new PassThrough();
var b1 = new PassThrough();
var b2 = new PassThrough();

a.pipe(b1);
a.pipe(b2);

b1.on('data', function(data) {
    console.log('b1:', data.toString());
});
b2.on('data', function(data) {
    console.log('b2:', data.toString());
});

a.write('text');
I have a different solution for writing to two streams simultaneously. Naturally, the time to write will be the sum of the two times, but I use it to respond to a download request where I want to keep a copy of the downloaded file on my server (actually, I use an S3 backup, so I cache the most-used files locally to avoid multiple file transfers).
/**
 * A utility class made to write to a file while answering a file download request
 */
class TwoOutputStreams {
    constructor(streamOne, streamTwo) {
        this.streamOne = streamOne
        this.streamTwo = streamTwo
    }

    setHeader(header, value) {
        if (this.streamOne.setHeader)
            this.streamOne.setHeader(header, value)
        if (this.streamTwo.setHeader)
            this.streamTwo.setHeader(header, value)
    }

    write(chunk) {
        this.streamOne.write(chunk)
        this.streamTwo.write(chunk)
    }

    end() {
        this.streamOne.end()
        this.streamTwo.end()
    }
}
You can then use this as a regular output stream:
const twoStreamsOut = new TwoOutputStreams(fileOut, responseStream)
and pass it to your method as if it were a response or a fileOutputStream.
If you have async operations on the PassThrough streams, the answers posted here won't work.
A solution that works for async operations includes buffering the stream content and then creating streams from the buffered result.
To buffer the result you can use concat-stream
const Promise = require('bluebird');
const concat = require('concat-stream');

const getBuffer = function(stream){
    return new Promise(function(resolve, reject){
        var gotBuffer = function(buffer){
            resolve(buffer);
        }
        var concatStream = concat(gotBuffer);
        stream.on('error', reject);
        stream.pipe(concatStream);
    });
}
To create streams from the buffer you can use:
const { Readable } = require('stream');

const getBufferStream = function(buffer){
    const stream = new Readable();
    stream.push(buffer);
    stream.push(null);
    return Promise.resolve(stream);
}
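Putting the two helpers together, a usage sketch might look like this (the file names are just placeholders):
const fs = require('fs');

// Buffer the source once, then hand an independent stream to each consumer.
getBuffer(fs.createReadStream('image.png')).then(function(buffer){
    return Promise.all([
        getBufferStream(buffer).then(function(streamA){ streamA.pipe(fs.createWriteStream('copy-1.png')); }),
        getBufferStream(buffer).then(function(streamB){ streamB.pipe(fs.createWriteStream('copy-2.png')); })
    ]);
});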
What about piping into two or more streams, but not at the same time?
For example:
var fs = require('fs');
var PassThrough = require('stream').PassThrough;

var mybinaryStream = stream.start(); // never-ending audio stream
var file1 = fs.createWriteStream('file1.wav', {encoding: 'binary'});
var file2 = fs.createWriteStream('file2.wav', {encoding: 'binary'});
var mypass = new PassThrough();

mybinaryStream.pipe(mypass);
mypass.pipe(file1);

setTimeout(function(){
    mypass.pipe(file2);
}, 2000);

The above code does not produce any errors, but file2 is empty.

How to get callback value in node.js in a variable

This is my code:
var fs = require('fs')

var test = readafile('file.txt', function(returnValue) {
    console.log(returnValue);
    test = returnValue;
});

console.log(test);

function readafile(filepath, callback){
    var attachment_path = filepath;
    fs.readFile(attachment_path, function(err, data){
        var attachment_encoded = new Buffer(data, 'binary').toString('base64');
        callback(attachment_encoded);
    });
}
If I need the return value of that function in the variable test, how do I achieve that?
The console.log(test) says undefined, since it is a callback function.
How do I get it properly?
You can't really expect synchronous behavior (like getting a return value) from asynchronous code. You can use fs.readFileSync to avoid the asynchronous aspect, or just use your value inside your callback.
Otherwise the async module could help you out.
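A minimal sketch of both options, reusing the readafile function from the question for the asynchronous case:
var fs = require('fs');

// Option 1: synchronous read, so the value is available immediately (blocks the event loop).
var encodedSync = fs.readFileSync('file.txt').toString('base64');
console.log(encodedSync);

// Option 2: stay asynchronous and use the value inside the callback.
readafile('file.txt', function(returnValue) {
    // Everything that needs the value happens here (or in functions called from here).
    console.log(returnValue);
});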

Catching console.log in node.js?

Is there a way that I can catch console output caused by console.log(...) within Node.js, to prevent it cluttering the terminal whilst unit testing a module?
Thanks
A better way could be to hook the output stream you need to catch data from directly, because with Linus's method, if some module writes directly to stdout with process.stdout.write('foo'), for example, it won't be caught.
var logs = [],
    hook_stream = function(_stream, fn) {
        // Reference default write method
        var old_write = _stream.write;
        // _stream now writes with our shiny function
        _stream.write = fn;
        return function() {
            // reset to the default write method
            _stream.write = old_write;
        };
    },
    // hook up standard output
    unhook_stdout = hook_stream(process.stdout, function(string, encoding, fd) {
        logs.push(string);
    });

// goes to our custom write method
console.log('foo');
console.log('bar');

unhook_stdout();

console.log('Not hooked anymore.');

// Now do what you want with logs stored by the hook
logs.forEach(function(_log) {
    console.log('logged: ' + _log);
});
EDIT
console.log() ends its output with a newline; you may want to strip it, so you'd better write:
_stream.write = function(string, encoding, fd) {
    var new_str = string.replace(/\n$/, '');
    fn(new_str, encoding, fd);
};
EDIT
Improved, generic way to do this on any method of any object, with async support. See the gist.
module.js:
module.exports = function() {
    console.log("foo");
}
program:
console.log = function() {};
mod = require("./module");
mod();
// Look ma no output!
Edit: Obviously you can collect the log messages for later if you wish:
var log = [];
console.log = function() {
    log.push([].slice.call(arguments));
};
capture-console solves this problem nicely.
var capcon = require('capture-console');

var stderr = capcon.captureStderr(function scope() {
    // whatever is done in here has stderr captured,
    // the return value is a string containing stderr
});

var stdout = capcon.captureStdout(function scope() {
    // whatever is done in here has stdout captured,
    // the return value is a string containing stdout
});
and later
Intercepting
You should be aware that all capture functions will still pass the values through to the main stdio write() functions, so logging will still go to your standard IO devices.
If this is not desirable, you can use the intercept functions. These functions are literally s/capture/intercept when compared to those shown above, and the only difference is that calls aren't forwarded through to the base implementation.
Simply adding the following snippet to your code will let you catch the logs and still print them in the console:
var log = [];
console.log = function(d) {
    log.push(d);
    process.stdout.write(d + '\n');
};
