Objective
Forcing fs (and the libraries using it) to write everything to disk before terminating the application.
Background
I am writing an object to a CSV file using the npm package csv-write-stream.
Once the library is done writing the CSV file, I want to terminate my application using process.exit().
Code
To achieve the aforementioned objective, I have written the following:
const fs = require('fs');
const csvWriter = require('csv-write-stream');
let writer = csvWriter({
headers: ['country', 'postalCode']
});
writer.pipe(fs.createWriteStream('myOutputFile.csv'));
//Very big array with a lot of postal code info
let currCountryCodes = [{country: 'Portugal', postalCode: '2950-286'}, {country: 'Barcelona', postalCode: '08013'}];
for (let j = 0; j < currCountryCodes.length; j++) {
writer.write(currCountryCodes[j]);
}
writer.end(function() {
console.log('=== CSV written successfully, stopping application ===');
process.exit();
});
Problem
The problem here is that if I execute process.exit(), the library won't have time to write to the file, and the file will be empty.
Since the library uses fs, my idea is to force an fs.dump() or something similar in Node.js, but after searching I found nothing like it.
Questions
How can I force fs to dump (push) all the content to the file before exiting the application?
If the first option is not possible, is there a way to wait for the application to finish writing and then close it?
I think your guess is right.
When you call process.exit(), the piped write stream hasn't finished writing yet.
If you really want to terminate your server explicitly, this will do.
let r = fs.createWriteStream('myOutputFile.csv');
writer.pipe(r);
...
writer.end(function() {
r.end(function() {
console.log('=== CSV written successfully, stopping application ===');
process.exit();
});
});
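As a hedged alternative, you can wait for the file stream's 'finish' event instead of ending it by hand. The sketch below is self-contained, so it builds the CSV lines itself rather than using csv-write-stream; writeCsv is a hypothetical helper name, and Readable.from assumes a reasonably recent Node version.

```javascript
const fs = require('fs');
const { Readable } = require('stream');

// Hypothetical helper: writes rows as CSV and resolves once the file stream
// has emitted 'finish', i.e. all data was handed to the underlying system.
function writeCsv(path, headers, rows) {
  return new Promise((resolve, reject) => {
    const lines = [headers.join(',')].concat(
      rows.map(row => headers.map(h => row[h]).join(','))
    );
    // Stand-in for the csv-write-stream writer: any readable source works here.
    const source = Readable.from(lines.map(line => line + '\n'));
    const file = fs.createWriteStream(path);

    source.on('error', reject);
    file.on('error', reject);
    // pipe() calls file.end() when the source ends; 'finish' follows
    // after the buffered data has been flushed.
    file.on('finish', resolve);

    source.pipe(file);
  });
}
```

You would then exit only after the promise settles, for example writeCsv('myOutputFile.csv', ['country', 'postalCode'], currCountryCodes).then(() => process.exit()). With the real library the same idea applies: keep a reference to the fs stream and call process.exit() from its 'finish' listener.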
Related
I wrote a simple utility to convert a somewhat weird json file (multiple objects not in an array) to csv for some system testing purposes. The read and transformation themselves are fine, and the resulting string is logged to the console correctly, but sometimes the resulting csv file is missing the first data line (it shows header, 1 blank line, then rest of data). I'm using read and write streams, without any provisions for backpressure. I don't think the problem is backpressure, since only the 1st line gets skipped, but I could be wrong. Any ideas?
const fs = require('fs');
const readline = require('readline');
const JSONbig = require('json-bigint');
// Create read interface to stream each line
const readInterface = readline.createInterface({
input: fs.createReadStream('./confirm.json'),
// output: process.stdout,
console: false
});
const writeHeader = fs.createWriteStream('./confirm.csv');
const header = "ACTION_TYPE,PROCESS_PICK,TYPE_FLAG,APP_ID,FACILITY_ID,CONTAINER_ID,USER_ID,CONFIRM_DATE_TS,PICK_QTY,REMAINING_QTY,PICK_STATUS,ASSIGNMENT_ID,LOCATION_ID,ITEM_ID,CLUSTER_ID,TOTAL_QTY,TOTAL_ITEMS,WAVE_NBR,QA_FLAG,WORK_DIRECTIVE_ID\n";
writeHeader.write(header);
// Create write interface to save each csv line
const writeDetail = fs.createWriteStream('./confirm.csv', {
flags: 'a'
});
readInterface.on('line', function(line) {
let task = JSONbig.parse(line);
task.businessData.MESSAGE.RECORD[0].DETAIL.REG_DETAIL.forEach(element => {
let csv = "I,PTB,0,VCO,PR9999999011,,cpicker1,2020121000000," + element.QUANTITYTOPICK.toString() + ",0,COMPLETED," +
task.businessData.MESSAGE.RECORD[0].ASSIGNMENTNUMBER.toString() + "," + element.LOCATIONNUMBER.toString() + "," +
element.ITEMNUMBER.toString() + ",,,," +
task.businessData.MESSAGE.RECORD[0].WAVE.toString() + ",N," + element.CARTONNUMBER.toString() + "\n";
console.log(csv);
try {
writeDetail.write(csv);
} catch (err) {
console.error(err);
}
});
});
Edit: Based on the feedback below, I consolidated the write streams into one (the missing line was still happening, but it's better coding anyway). I also added a try block around the JSON parse. Ran the code several times over different files, and no missing line. Maybe the write was happening before the parse was done? In any case, it seems my problem is resolved for the moment. I'll have to research how to properly handle backpressure later. Thanks for the help.
The code you show here is opening two separate writestreams on the same file and then writing to both of them without any timing coordination between them. That will clearly conflict.
You open one here:
const writeHeader = fs.createWriteStream('./confirm.csv');
const header = "ACTION_TYPE,PROCESS_PICK,TYPE_FLAG,APP_ID,FACILITY_ID,CONTAINER_ID,USER_ID,CONFIRM_DATE_TS,PICK_QTY,REMAINING_QTY,PICK_STATUS,ASSIGNMENT_ID,LOCATION_ID,ITEM_ID,CLUSTER_ID,TOTAL_QTY,TOTAL_ITEMS,WAVE_NBR,QA_FLAG,WORK_DIRECTIVE_ID\n";
writeHeader.write(header);
And, you open one here:
// Create write interface to save each csv line
const writeDetail = fs.createWriteStream('./confirm.csv', {
flags: 'a'
});
And, then you write to the second one in your loop. Those clearly conflict. The write from the first is probably not complete when you open the second and it also may not be flushed to disk yet either. The second one opens for append, but doesn't accurately read the file position for appending because the first one hasn't yet succeeded.
This code doesn't show any reason for using separate write streams at all so the cleanest way to address this would be to just use one writestream that will accurately serialize the writes. Otherwise, you have to wait for the first writestream to finish and close before opening the second one.
And, your .forEach() loop needs to have backpressure support since you're repeatedly calling .write() and, at some data size, you can get backpressure. I agree that backpressure is not likely the cause of the issue you are asking about, but it is something else you need to fix when rapidly writing in a loop.
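A minimal sketch of both suggestions, assuming a Node version that has events.once: one write stream for the header and the detail lines, and a pause whenever write() returns false. writeLines is a hypothetical helper name, not something from the question.

```javascript
const fs = require('fs');
const { once } = require('events');

// Hypothetical helper: sends the header and every line through ONE stream,
// so the writes are serialized in the order they are issued.
async function writeLines(path, header, lines) {
  const out = fs.createWriteStream(path);
  out.write(header);
  for (const line of lines) {
    // write() returns false once the internal buffer passes highWaterMark;
    // wait for 'drain' before pushing more data.
    if (!out.write(line)) {
      await once(out, 'drain');
    }
  }
  out.end();
  // Resolves when everything has been flushed; rejects if the stream errors.
  await once(out, 'finish');
}
```

In the original program the lines arrive from readline events rather than an array, so the equivalent there would be to pause the read interface while waiting for 'drain' and resume it afterwards.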
I have a Node v10.14.1 program that reads a CSV file line by line using the readline Interface.
My .on('line') handler is an async callback that performs some operations which read/write from a db, so I use async/await to deal with the promises.
A short version of the program's code block of interest would look something like:
const readline = require('readline');
const filesystem = require('fs');
const reader = readline.createInterface({
input: filesystem.createReadStream(pathToSomeCSV)
});
reader.on('line', async (line) => {
await doSomeDBStuff();
})
If I leave the above the way it is, the process does not exit. However, if I add
reader.on('close', () => {process.exit()});
then the process exits prior to all of the on('line') callbacks finishing and their promises resolving.
My question is: is there a way to say "Upon all lines being read AND all on('line') callbacks being completed with their promises resolved, then exit the process (I assume with process.exit())"?
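One possible approach, sketched here under assumptions rather than taken from any answer: remember the promise of every 'line' callback and only treat the work as done once 'close' has fired and all of those promises have fulfilled. processLines and handleLine are hypothetical names; handleLine stands in for the db work.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: resolves after the whole file was read AND every
// handler promise fulfilled; rejects on the first handler failure.
function processLines(path, handleLine) {
  return new Promise((resolve, reject) => {
    const pending = [];
    const input = fs.createReadStream(path);
    input.on('error', reject); // e.g. the file does not exist
    const reader = readline.createInterface({ input, crlfDelay: Infinity });

    reader.on('line', line => {
      // Wrapping in a promise chain also captures synchronous throws.
      const work = Promise.resolve().then(() => handleLine(line));
      work.catch(() => {}); // avoid an unhandled rejection before 'close'
      pending.push(work);
    });

    reader.on('close', () => {
      Promise.all(pending).then(() => resolve(), reject);
    });
  });
}
```

The exit then moves to the end of the chain, e.g. processLines(pathToSomeCSV, doSomeDBStuff).then(() => process.exit()). Note that this does not throttle anything: all callbacks still run concurrently and the pending array grows with the number of lines.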
Investigation
I get the feeling the docs are leaving some non-obvious details out. I was unable to get this official example working correctly (which is what your question appears to be based on). That implementation would kill my application prematurely. Or, if I removed the 'close' listener, the terminal would just hang forever on exit. I tried overriding process.on('exit') to no avail. I also tried the prompt-sync package, but it consistently corrupted my terminal.
Solution
I found a lovely answer here which offers a good solution.
Create the function:
const prompt = msg => {
fs.writeSync(1, String(msg));
let s = '', buf = Buffer.alloc(1);
while(buf[0] - 10 && buf[0] - 13)
s += buf, fs.readSync(0, buf, 0, 1, 0);
return s.slice(1);
};
Use it:
const result = prompt('Input something: ');
console.log('Your input was: ' + result);
No terminal corruption, the application does not die prematurely, and it does not hang on exit, either.
This solution is not perfect however - it intentionally blocks the main thread while waiting for user input, meaning you cannot run other functions in the background while waiting for user input. In my mind user input should be thread-blocking in most cases anyway, so this solution works very well for me personally.
Edit: see an improved version for Linux here.
I'm writing an app in Node and have been running into a rare but detrimental occurrence.
So I have a schedule.txt and I write to it when the user makes a change but then also read it every second and then parse it for use throughout the program.
Rarely what happens is as a user is writing to the file (asynchronously) the app (based on the timer) reads the same file and attempts to parse it and fails.
I know from a design standpoint maybe this is just bound to happen... but I'm wondering if there is a quick fix I can do now. Would using writeFileSync help my situation (make it more 'atomic')? I just want to make sure that the app doesn't read the file while another part of it is still writing to the file.
TIA!
Niko
Seems like you'd want to serialize your read/writes. If it were me, I might try having a "manager" object which encapsulates the serialization, which you'd use like:
var fileManager = require('./file-manager');
// somewhere in the program
fileManager.scheduleWrite(data, function(err){
// now the write is done
});
// somewhere else in the program
fileManager.scheduleRead(function(err, data){
// `data` contains the data
});
Then implement it using Q or a similar promises lib, like:
// in file-manager.js
var Q = require('q');
var wait = Q();
module.exports = {
scheduleWrite: function(data, cb){
wait = wait.then(function(){
// write data and call cb()
});
},
scheduleRead: function(cb){
wait = wait.then(function(){
// read data and call cb(data)
});
}
};
The wait var will "stack up" into a serialized chain of tasks where the next one won't start until the previous one completes.
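For illustration, the same chaining idea can be sketched with native promises instead of Q. This is an assumption-laden sketch, not the answer's actual module: createFileManager is a hypothetical name, it returns promises rather than taking callbacks, and it relies on fs.promises being available.

```javascript
const fs = require('fs');

// Hypothetical serializer: each task starts only after the previous one settled.
function createFileManager(path) {
  let chain = Promise.resolve();

  function enqueue(task) {
    const result = chain.then(task);
    // Swallow failures on the internal chain so later tasks still run;
    // the caller still sees the rejection through `result`.
    chain = result.catch(() => {});
    return result;
  }

  return {
    scheduleWrite: data => enqueue(() => fs.promises.writeFile(path, data)),
    scheduleRead: () => enqueue(() => fs.promises.readFile(path, 'utf8'))
  };
}
```

A read scheduled after a write never sees a half-written file, because the read does not start until the write's promise has settled. This only serializes access inside one Node process; it does nothing for other processes touching the same file.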
I'm using node-tail to read a file in linux and send it down to a socket.
node.js sending data read from a text file
var io = require('socket.io');
Tail = require('tail').Tail;
tail = new Tail("/tmp/test.txt");
io.sockets.on('connection', function (socket) {
tail.on("line", function(data) {
socket.emit('Message', { test: data });
});
});
Receiving side
var socket = io.connect();
socket.on('Message', function (data) {
console.log(data.test);
});
This works but when I try to modify this part
tail = new Tail("/tmp/test.txt");
to this
tail = new Tail("/tmp/FIFOFILE");
I can't get any data from it.
Is there any way to read a named pipe in Linux, or a package that can read a named pipe?
I can get it to work in a silly way:
// app.js
process.stdin.resume();
process.stdin.on('data', function(chunk) {
console.log('D', chunk);
});
And start like this:
node app.js < /tmp/FIFOFILE
If I create a readable stream for the named pipe, it ends after having read the first piece of data written to the named pipe. Not sure why stdin is special.
The OS will send an EOF when the last process finishes writing to the FIFO. If only one process is writing to the FIFO then you get an EOF when that process finishes writing its stuff. This EOF triggers Node to close the stream.
The trick to avoiding this is given by @JoshuaWalsh in this answer, namely: you open the pipe yourself FOR READING AND WRITING, even though you have no intention of ever writing to it. This means that the OS sees that there is always at least one process writing to the file and so you never get the EOF.
So... just add in something like:
// fs.open() is asynchronous and returns nothing, so use openSync to get the descriptor
let fifoHandle = fs.openSync(fifoPath, fs.constants.O_RDWR);
console.log('FIFO open');
You don't ever have to do anything with fifoHandle - just make sure it sticks around and doesn't get garbage collected.
In fact... in my case I was using createReadStream, and I found that simply adding the fs.constants.O_RDWR flag to this was enough (even though I have no intention of ever writing to the FIFO).
let fifo = fs.createReadStream(fifoPath,{flags: fs.constants.O_RDWR});
fifo.on('data',function(data){
console.log('Got data:'+data.toString());
});
I have a file with a lot of entries (10+ million), each representing a partial document that is being saved to a mongo database (based on some criteria, non-trivial).
To avoid overloading the database (which is doing other operations at the same time), I wish to read in chunks of X lines, wait for them to finish, read the next X lines, etc.
Is there any way to use any of the fs callback mechanisms to also "halt" progress at a certain point, without blocking the entire program? From what I can tell they will all run from start to finish with no way of stopping them, unless you stop reading the file entirely.
The issue is that, because of the file size, memory also becomes a problem: because of the time the updates take, a LOT of the data will be held in memory, exceeding the 1 GB limit and causing the program to crash. Secondarily, as I said, I don't want to queue 1 million updates and completely stress the mongo database.
Any and all suggestions welcome.
UPDATE: Final solution using line-reader (available via npm) below, in pseudo-code.
var lineReader = require('line-reader');
var filename = <wherever you get it from>;
lineReader.eachLine(filename, function(line, last, cb) {
//
// Do work here, line contains the line data
// last is true if it's the last line in the file
//
function checkProcessed(callback) {
if (doneProcessing()) { // Implement doneProcessing to check whether whatever you are doing is done
callback();
}
else {
setTimeout(function() { checkProcessed(callback) }, 100); // Adjust timeout according to expected time to process one line
}
}
checkProcessed(cb);
});
This is implemented to make sure doneProcessing() returns true before attempting to work on more lines - this means you can effectively throttle whatever you are doing.
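For comparison, here is a hedged sketch of the "X lines at a time" idea using only built-in modules. It assumes a Node version where readline interfaces are async iterable, and processInBatches / processBatch are hypothetical names; processBatch stands in for the mongo updates and must return a promise.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: hands lines to processBatch in groups of batchSize
// and waits for each group to finish before collecting the next one.
async function processInBatches(path, batchSize, processBatch) {
  const reader = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity
  });
  let batch = [];
  for await (const line of reader) {
    batch.push(line);
    if (batch.length === batchSize) {
      // No further lines are handed to the loop while this await is pending.
      await processBatch(batch);
      batch = [];
    }
  }
  if (batch.length > 0) {
    await processBatch(batch); // trailing partial batch
  }
}
```

Only one batch is held by this function at a time, although readline may still buffer some lines ahead internally, so it is worth measuring memory on the real file.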
I don't use MongoDB and I'm not an expert in using Lazy, but I think something like below might work or give you some ideas. (note that I have not tested this code)
var fs = require('fs'),
lazy = require('lazy');
var readStream = fs.createReadStream('yourfile.txt');
var file = lazy(readStream)
.lines // ask to read stream line by line
.take(100) // and read 100 lines at a time.
.join(function(onehundredLines){
readStream.pause(); // pause reading the stream
writeToMongoDB(onehundredLines, function(err){
// error checking goes here
// resume the stream 1 second after MongoDB finishes saving.
// (wrap the call so resume() keeps its `this` binding)
setTimeout(function() { readStream.resume(); }, 1000);
});
});