How to see broken file writes with node.js

My English is not so good, so I hope to be clear.
Just for the sake of curiosity, I want to test multiple concurrent(*) file writes on the same file and see that they produce errors.
The manual is clear about this:
Note that it is unsafe to use fs.writeFile multiple times on the same
file without waiting for the callback. For this scenario,
fs.createWriteStream is strongly recommended.
So if I write a relatively large amount of data to a file, and while it is still being written I call another write on the same file without waiting for the callback, I expect some sort of error.
I tried to write a small example to test this situation, but I can't manage to see any errors.
"use strict";
const fs = require('fs');
const writeToFile = (filename, data) => new Promise((resolve, reject) => {
fs.writeFile(filename, data, { flag: 'a' }, err => {
if (err) {
return reject(err);
}
return resolve();
});
});
const getChars = (num, char) => {
let result = '';
for (let i = 0; i < num; ++i) {
result += char;
}
return result + '\n';
};
const k = 10000000;
const data0 = getChars(k, 0);
const data1 = getChars(k, 1);
writeToFile('test1', data0)
.then(() => console.log('0 written'))
.catch(e => console.log('errors in write 0'));
writeToFile('test1', data1)
.then(() => console.log('1 written'))
.catch(e => console.log('errors in write 1'));
To check the result, instead of opening the file with an editor (which is a bit slow on my box), I use a Linux command to look at the end of the first buffer and the beginning of the second buffer (and to check that they do not overlap):
tail -c 10000010 test1 | grep 0
But I'm not sure it is the right way to test it.
Just to be clear, I'm on Node v6.2.2 and macOS 10.11.6.
Can anyone point me to a small sketch that uses fs.writeFile and produces a wrong output?
(*) concurrent = don't wait for the end of one file write to begin the next one
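For what it's worth, here is a minimal sketch (not part of the original question) of how the result could be checked from Node itself instead of with tail and grep. It assumes the test1 file produced by the code above: it reads the file back and counts the contiguous runs of '0' and '1'; with two clean appends there should be exactly two runs, and anything more would mean the writes interleaved.
"use strict";
const fs = require('fs');

// Read the whole test file back (about 20 MB with the example above).
const content = fs.readFileSync('test1', 'utf8').replace(/\n/g, '');

// Collapse the content into its contiguous runs, e.g. '000111000' -> ['000', '111', '000'].
const runs = content.match(/0+|1+/g) || [];

console.log('number of runs:', runs.length);
console.log(runs.length === 2 ? 'the writes did not interleave' : 'the writes interleaved!');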

Related

Cannot append data to a file using createWriteStream() in Node

I cannot append data from multiple read streams using createWriteStream() in Node.
It just creates a new file after each iteration.
chunkNo = 4;
for (i = 0; i < chunkNo; i++) {
    const response = await this.downloadChunk(downloadUrl, i); // response.data is a ReadStream
    await new Promise<void>((resolve, reject) => {
        const stream = response.data.pipe(fs.createWriteStream('C:\\big_file.mkv'), { flags: 'a'});
        stream.on('finish', () => {
            resolve();
        });
    });
}
What am I missing? The append flag is there.
Thank you in advance.
To better consolidate your intuition about what the a (append) flag is supposed to do, first have a look at this answer: Difference between modes a, a+, w, w+, and r+ in built-in open function?
It details the different modes in which a file can be opened. Like many other languages, JavaScript reuses the same principles as C.
From this man page, the relevant section in our case is:
``a'' Open for writing. The file is created if it does not exist. The
stream is positioned at the end of the file. Subsequent writes
to the file will always end up at the then current end of file,
irrespective of any intervening fseek(3) or similar.
So in your code example, a new file would indeed be created at each iteration of the loop. Note that the { flags: 'a'} object is passed to pipe() rather than to fs.createWriteStream(), so the stream is opened with the default 'w' flag, which truncates the file every time.
Therefore, what you can do is move fs.createWriteStream('C:\\big_file.mkv', { flags: 'a'}) outside of the for loop, assign a name to this writable stream, and use it in the pipe(), something along the lines of:
chunkNo = 4;
const dest = fs.createWriteStream('C:\\big_file.mkv', { flags: 'a'});
for (i = 0; i < chunkNo; i++) {
    const response = await this.downloadChunk(downloadUrl, i); // response.data is a ReadStream
    await new Promise<void>((resolve, reject) => {
        const stream = response.data.pipe(dest);
        stream.on('finish', () => {
            resolve();
        });
    });
}
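One caveat with this sketch (an assumption on my part, not something covered above): pipe() calls end() on the destination by default when a source finishes, so after the first chunk the shared stream would already be closed. Passing { end: false } and resolving on the source's 'end' event keeps the destination open across iterations; something roughly like:
chunkNo = 4;
const dest = fs.createWriteStream('C:\\big_file.mkv', { flags: 'a' });
for (i = 0; i < chunkNo; i++) {
    const response = await this.downloadChunk(downloadUrl, i); // response.data is a ReadStream
    await new Promise((resolve, reject) => {
        // end: false keeps dest open for the next chunk
        response.data.pipe(dest, { end: false });
        response.data.on('end', resolve);
        response.data.on('error', reject);
    });
}
dest.end(); // close the destination once all chunks have been written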

How to close the file descriptor opened using fs.readFile/writeFile

I have legacy code on Node version 0.12.7 which is working perfectly fine.
However, it frequently gives an EMFILE "too many open files" error.
How can I release the file descriptor opened using:
require("fs").readFile(resobj.name, 'utf8', function (err, data)
{
});
You will most likely need to read the files in batches, like this:
const fs = require('fs/promises');

const files = [...<array of millions of file paths>];
const MAX_FILES_TO_PROCESS = 1000;
let promises;
let contents;

// Process 1000 files at a time
(async () => {
    for (let a = 0; a < files.length; a += MAX_FILES_TO_PROCESS) {
        promises = files.slice(a, a + MAX_FILES_TO_PROCESS).map(path => fs.readFile(path));
        contents = await Promise.all(promises);
        // Process the contents, then continue on the next loop
    }
})();
Two observations that may help you:
Use createReadStream instead of readFile because the latter reads the entire file into memory. If you're handling thousands or millions of objects it's not scalable.
readFile doesn't return a descriptor because it closes the file automatically.
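As a sketch of the first observation (the chunk handling here is illustrative, not taken from the original code), the same file can be streamed instead of loaded in one go; the descriptor is opened when reading starts and released automatically when the stream ends or errors:
const fs = require('fs');

const stream = fs.createReadStream(resobj.name, { encoding: 'utf8' });

stream.on('data', (chunk) => {
    // process each chunk as it arrives instead of holding the whole file in memory
});
stream.on('error', (err) => console.error(err));
stream.on('end', () => {
    // all data has been read and the underlying descriptor has been closed
});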

Asynchronous file read reading different number of lines each time, not halting

I built a simple asynchronous implementation of the readline module built into Node.js, which is simply a wrapper around the event-based module itself. The code is below:
const readline = require('readline');

module.exports = {
    createInterface: args => {
        let self = {
            interface: readline.createInterface(args),
            readLine: () => new Promise((succ, fail) => {
                if (self.interface === null) {
                    succ(null);
                } else {
                    self.interface.once('line', succ);
                }
            }),
            hasLine: () => self.interface !== null
        };
        self.interface.on('close', () => {
            self.interface = null;
        });
        return self;
    }
}
Ideally, I would use it like so, in code like this:
const readline = require("./async-readline");
let filename = "bar.txt";
let linereader = readline.createInterface({
input: fs.createReadStream(filename)
});
let lines = 0;
while (linereader.hasLine()) {
let line = await linereader.readLine();
lines++;
console.log(lines);
}
console.log("Finished");
However, I've observed some erratic and unexpected behavior with this async wrapper. For one, it fails to recognize when the file ends and simply hangs once it reaches the last line, never printing "Finished". On top of that, when the input file is large, say a couple of thousand lines, it's always off by a few lines and doesn't successfully read the full file before halting; in a 2000+ line file it can be off by as many as 20-40 lines. If I throw a print statement into the .on('close') listener, I see that it does trigger; however, the program still doesn't recognize that it should no longer have lines to read.
The wrapper most likely loses lines because readline can emit several 'line' events synchronously from a single data chunk; the once('line') listener catches only the first of them, and the rest are emitted before the next readLine() call attaches a new listener. It seems that in Node.js v11.7 the readline interface was given async iterator functionality, so it can simply be looped over with a for await ... of loop:
// inside an async function, or an ES module where top-level await is allowed
const rl = readline.createInterface({
    input: fs.createReadStream(filename)
});

for await (const line of rl) {
    console.log(line);
}
How to get synchronous readline, or "simulate" it using async, in nodejs?
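For completeness, a hedged sketch of the question's line-counting loop rewritten on top of that async iterator (file name and output reused from the question's example):
const fs = require('fs');
const readline = require('readline');

(async () => {
    const rl = readline.createInterface({
        input: fs.createReadStream('bar.txt')
    });

    let lines = 0;
    for await (const line of rl) {
        lines++;
    }
    console.log(lines);
    console.log("Finished");
})();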

Creating an empty file of a certain size?

How can we create an empty file of a certain size? I have a requirement where I need to create an empty file (i.e. a file filled with zero bytes). In order to do so, this is the approach I am currently taking:
First I create an empty file of zero bytes.
Then I keep appending a buffer of zero bytes (max 2 GB at a time) to that file until I reach the desired size.
Here's the code I am using currently:
const fs = require('fs');
const buffer = require('buffer'); // for buffer.kMaxLength

const createEmptyFileOfSize = (fileName, size) => {
    return new Promise((resolve, reject) => {
        try {
            // First create an empty file.
            fs.writeFile(fileName, Buffer.alloc(0), (error) => {
                if (error) {
                    reject(error);
                } else {
                    let sizeRemaining = size;
                    do {
                        const chunkSize = Math.min(sizeRemaining, buffer.kMaxLength);
                        const dataBuffer = Buffer.alloc(chunkSize);
                        try {
                            fs.appendFileSync(fileName, dataBuffer);
                            sizeRemaining -= chunkSize;
                        } catch (error) {
                            reject(error);
                        }
                    } while (sizeRemaining > 0);
                    resolve(true);
                }
            });
        } catch (error) {
            reject(error);
        }
    });
};
While this code works and I am able to create very large files, it takes significant time (roughly 5 seconds for a 10 GB file), and I am wondering if there's a better way of accomplishing this.
There is no need to write out the whole file. All you have to do is open the file for writing ('w' means "create if it doesn't exist, truncate if it exists") and write at least one byte at the offset you need. If the offset is larger than the current size of the file (when it exists), the file is extended to accommodate the new offset.
Your code should be as simple as this:
const fs = require('fs');

const createEmptyFileOfSize = (fileName, size) => {
    return new Promise((resolve, reject) => {
        const fh = fs.openSync(fileName, 'w');
        fs.writeSync(fh, 'ok', Math.max(0, size - 2));
        fs.closeSync(fh);
        resolve(true);
    });
};

// Create a file of 1 GiB
createEmptyFileOfSize('./1.txt', 1024*1024*1024);
Please note that the code above doesn't handle errors. It was written to show a use case. Your real production code should handle errors (and reject the promise, of course).
Read more about fs.openSync(), fs.writeSync() and fs.closeSync().
Update
A Promise should do its processing asynchronously; the executor function passed to the constructor should end as soon as possible, leaving the Promise in the pending state. Later, when the processing completes, the Promise will use the resolve or reject callbacks passed to the executor to change its state.
The complete code, with error handling and the correct creation of a Promise could be:
const fs = require('fs');

const createEmptyFileOfSize = (fileName, size) => {
    return new Promise((resolve, reject) => {
        // Check size
        if (size < 0) {
            reject("Error: a negative size doesn't make any sense");
            return;
        }
        // Will do the processing asynchronously
        setTimeout(() => {
            try {
                // Open the file for writing; 'w' creates the file
                // (if it doesn't exist) or truncates it (if it exists)
                const fd = fs.openSync(fileName, 'w');
                if (size > 0) {
                    // Write one byte (with code 0) at the desired offset
                    // This forces the expanding of the file and fills the gap
                    // with characters with code 0
                    fs.writeSync(fd, Buffer.alloc(1), 0, 1, size - 1);
                }
                // Close the file to commit the changes to the file system
                fs.closeSync(fd);
                // Promise fulfilled
                resolve(true);
            } catch (error) {
                // Promise rejected
                reject(error);
            }
        // Create the file after the processing of the current JavaScript event loop
        }, 0);
    });
};
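For what it's worth, an alternative not mentioned in this answer is fs.ftruncateSync(), which extends a file to the requested length and fills the gap with null bytes; a minimal sketch:
const fs = require('fs');

const createEmptyFileOfSize = (fileName, size) => {
    const fd = fs.openSync(fileName, 'w'); // create the file, or truncate it if it exists
    try {
        fs.ftruncateSync(fd, size);        // extend to `size` bytes, filled with '\0'
    } finally {
        fs.closeSync(fd);
    }
};

// Create a file of 1 GiB
createEmptyFileOfSize('./1.txt', 1024 * 1024 * 1024);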

Nodejs Read very large file(~10GB), Process line by line then write to other file

I have a 10 GB log file in a particular format. I want to process this file line by line and then write the output to another file after applying some transformations. I am using Node for this operation.
Though this method works, it takes a lot of time. I was able to do this within 30-45 minutes in Java, but in Node it takes more than 160 minutes to do the same job. Following is the code:
Following is the initiation code which reads each line from the input.
var fs = require('fs');
var lazy = require('lazy');

var path = '../10GB_input_file.txt';
var output_file = '../output.txt';

function fileopsmain(){
    fs.exists(output_file, function(exists){
        if(exists) {
            fs.unlink(output_file, function (err) {
                if (err) throw err;
                console.log('successfully deleted ' + output_file);
            });
        }
    });
    new lazy(fs.createReadStream(path, {bufferSize: 128 * 4096}))
        .lines
        .forEach(function(line){
            var line_arr = line.toString().split(';');
            perform_line_ops(line_arr, line_arr[6], line_arr[7], line_arr[10]);
        });
}
This is the method that performs some operations on that line and passes the result to the write method, which writes it into the output file.
function perform_line_ops(line_arr, range_start, range_end, daynums){
    var _new_lines = '';
    for(var i = 0; i < days; i++){
        //perform some operation to modify line pass it to print
    }
    write_line_ops(_new_lines);
}
The following method is used to write data into the output file.
function write_line_ops(line) {
    if(line != null && line != ''){
        fs.appendFileSync(output_file, line);
    }
}
I want to bring this time down to 15-20 minutes. Is it possible to do so?
Also, for the record, I'm trying this on an Intel i7 processor with 8 GB of RAM.
You can do this easily without a module. For example:
var fs = require('fs');
var inspect = require('util').inspect;

var buffer = '';
var rs = fs.createReadStream('foo.log');
rs.on('data', function(chunk) {
    var lines = (buffer + chunk).split(/\r?\n/g);
    buffer = lines.pop();
    for (var i = 0; i < lines.length; ++i) {
        // do something with `lines[i]`
        console.log('found line: ' + inspect(lines[i]));
    }
});
rs.on('end', function() {
    // optionally process `buffer` here if you want to treat leftover data without
    // a newline as a "line"
    console.log('ended on non-empty buffer: ' + inspect(buffer));
});
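Since the question also writes an output file, here is a hedged sketch of how a write side could be paired with this (the transformLine function and file names are placeholders, not from the original code); it reuses a single write stream and respects backpressure by pausing the read stream when write() returns false and resuming on 'drain':
var fs = require('fs');

var rs = fs.createReadStream('foo.log');
var ws = fs.createWriteStream('out.log');
var buffer = '';

function transformLine(line) {
    return line; // placeholder for the real per-line transformation
}

rs.on('data', function(chunk) {
    var lines = (buffer + chunk).split(/\r?\n/g);
    buffer = lines.pop();
    if (lines.length === 0) return;
    var out = lines.map(transformLine).join('\n') + '\n';
    if (!ws.write(out)) {
        // the write stream's internal buffer is full: pause reading until it drains
        rs.pause();
        ws.once('drain', function() { rs.resume(); });
    }
});
rs.on('end', function() {
    if (buffer) ws.write(transformLine(buffer) + '\n');
    ws.end();
});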
I can't guess where the possible bottleneck is in your code.
Can you add the library or the source code of the lazy function?
How many operations does your perform_line_ops do? (if/else, switch/case, function calls)
I've created an example based on your code. I know this does not answer your question directly, but maybe it helps you understand how Node handles such a case.
const fs = require('fs')
const path = require('path')

const inputFile = path.resolve(__dirname, '../input_file.txt')
const outputFile = path.resolve(__dirname, '../output_file.txt')

function bootstrap() {
    // fs.exists is deprecated
    // check if output file exists
    // https://nodejs.org/api/fs.html#fs_fs_exists_path_callback
    fs.exists(outputFile, (exists) => {
        if (exists) {
            // output file exists, delete it
            // https://nodejs.org/api/fs.html#fs_fs_unlink_path_callback
            fs.unlink(outputFile, (err) => {
                if (err) {
                    throw err
                }
                console.info(`successfully deleted: ${outputFile}`)
                checkInputFile()
            })
        } else {
            // output file doesn't exist, move on
            checkInputFile()
        }
    })
}

function checkInputFile() {
    // check if input file can be read
    // https://nodejs.org/api/fs.html#fs_fs_access_path_mode_callback
    fs.access(inputFile, fs.constants.R_OK, (err) => {
        if (err) {
            // file can't be read, throw error
            throw err
        }
        // file can be read, move on
        loadInputFile()
    })
}

function saveToOutput() {
    // create write stream
    // https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
    const stream = fs.createWriteStream(outputFile, {
        flags: 'w'
    })
    // return wrapper function which simply writes data into the stream
    return (data) => {
        // check if the stream is writable
        if (stream.writable) {
            if (data === null) {
                stream.end()
            } else if (data instanceof Array) {
                stream.write(data.join('\n'))
            } else {
                stream.write(data)
            }
        }
    }
}

function parseLine(line, respond) {
    respond([line])
}

function loadInputFile() {
    // create write stream
    const saveOutput = saveToOutput()
    // create read stream
    // https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
    const stream = fs.createReadStream(inputFile, {
        autoClose: true,
        encoding: 'utf8',
        flags: 'r'
    })
    let buffer = null
    stream.on('data', (chunk) => {
        // append the buffer to the current chunk
        const lines = (buffer !== null)
            ? (buffer + chunk).split('\n')
            : chunk.split('\n')
        const lineLength = lines.length
        let lineIndex = -1
        // save last line for later (last line can be incomplete)
        buffer = lines[lineLength - 1]
        // loop through all lines
        // but don't include the last line
        while (++lineIndex < lineLength - 1) {
            parseLine(lines[lineIndex], saveOutput)
        }
    })
    stream.on('end', () => {
        if (buffer !== null && buffer.length > 0) {
            // parse the last line
            parseLine(buffer, saveOutput)
        }
        // Passing null signals the end of the stream (EOF)
        saveOutput(null)
    })
}

// kick off the parsing process
bootstrap()
I know this is old, but...
At a guess, appendFileSync() writes to the file system and waits for the response. Lots of small writes are generally expensive; presuming you use a BufferedWriter in Java, you might get faster results by skipping some writes.
Use one of the async writes and see if Node buffers sensibly, or write the lines into a large Node Buffer until it is full and always write a full (or nearly full) Buffer. By tuning the buffer size you could check whether the number of writes affects performance. I suspect it does.
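A hedged sketch of that idea, reusing the question's write_line_ops and output_file (the batch size and the flush_line_ops helper are illustrative additions, not from the original code): collect lines in memory and flush them with a single appendFileSync call once the batch passes a threshold, instead of one call per line.
var fs = require('fs');

var BATCH_SIZE = 4 * 1024 * 1024; // flush roughly every 4 MB of output; tune and measure
var pending = '';

function write_line_ops(line) {
    if (line != null && line != '') {
        pending += line;
        if (pending.length >= BATCH_SIZE) {
            fs.appendFileSync(output_file, pending); // one large write instead of many small ones
            pending = '';
        }
    }
}

// call once after the last line has been processed
function flush_line_ops() {
    if (pending != '') {
        fs.appendFileSync(output_file, pending);
        pending = '';
    }
}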
The execution is slow, because you're not using node's asynchronous operations. In essence, you're executing the code like this:
> read some lines
> transform
> write some lines
> repeat
You could be doing everything at once, or at least reading and writing in parallel. Some of the examples in the other answers do that, but the syntax is somewhat complicated. Using scramjet you can do it in a couple of simple lines:
const {StringStream} = require('scramjet');

fs.createReadStream(path, {bufferSize: 128 * 4096})
    .pipe(new StringStream({maxParallel: 128}))   // I assume this is a utf-8 file
    .split("\n")                                  // split per line
    .parse((line) => line.split(';'))             // parse line
    .map(([line_arr, range_start, range_end, daynums]) => {
        return simplyReturnYourResultForTheOtherFileHere(
            line_arr, range_start, range_end, daynums
        ); // run your code, return a promise if you're doing some async work
    })
    .stringify((result) => result.toString())
    .pipe(fs.createWriteStream(output_file))
    .on("finish", () => console.log("done"))
    .on("error", (e) => console.log("error"));
This will probably run much faster.
