I cannot append data from multiple read streams to one file using fs.createWriteStream() in Node.
It just creates a new file on each iteration.
chunkNo = 4;
for (i = 0; i < chunkNo; i++) {
const response = await this.downloadChunk(downloadUrl, i); // response.data is a ReadStream
await new Promise<void>((resolve, reject) => {
const stream = response.data.pipe(fs.createWriteStream('C:\\big_file.mkv'), { flags: 'a'});
stream.on('finish', () => {
resolve();
});
});
}
What am I missing? The append flag is there.
Thank you in advance
To build intuition for what the a (append) flag is supposed to do, first have a look at this answer: Difference between modes a, a+, w, w+, and r+ in built-in open function?
It details the different modes in which a file can be opened. Like many other languages, JavaScript (in Node.js) reuses the same principle as C.
From the fopen(3) man page, the relevant section in our case is:
``a'' Open for writing. The file is created if it does not exist. The
stream is positioned at the end of the file. Subsequent writes
to the file will always end up at the then current end of file,
irrespective of any intervening fseek(3) or similar.
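The difference between the two modes is easy to check in isolation. A minimal sketch, assuming a scratch file in the OS temp directory (writeTwice is a hypothetical helper). Note that fs.writeFileSync() spells the option flag, while fs.createWriteStream() spells it flags:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: opens the file twice with the given flag, writing "x" each time.
function writeTwice(file, flag) {
  fs.writeFileSync(file, 'x', { flag }); // each call opens, writes and closes the file
  fs.writeFileSync(file, 'x', { flag });
  return fs.readFileSync(file, 'utf8');
}

const tmp = path.join(os.tmpdir(), `flags-demo-${process.pid}.txt`);

fs.rmSync(tmp, { force: true });
console.log(writeTwice(tmp, 'w')); // x   ('w' truncates the file on every open)

fs.rmSync(tmp, { force: true });
console.log(writeTwice(tmp, 'a')); // xx  ('a' appends to what is already there)

fs.rmSync(tmp, { force: true });
```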
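So the a flag only creates the file if it does not exist yet, and every write goes to its end. In your code example, however, the { flags: 'a' } object is passed as the second argument of pipe() rather than to fs.createWriteStream(), so the stream is opened with the default 'w' flag, which truncates the file on every iteration of the loop.
Therefore, what you can do is create the stream with fs.createWriteStream('C:\\big_file.mkv', { flags: 'a' }) once, outside of the for loop, give this writable stream a name, and pipe each chunk into it. Pass { end: false } to pipe(), otherwise the first chunk ends the stream and the next chunk fails with a write-after-end error; then end the stream yourself after the loop. Something along these lines: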
const chunkNo = 4;
const dest = fs.createWriteStream('C:\\big_file.mkv', { flags: 'a' });
for (let i = 0; i < chunkNo; i++) {
  const response = await this.downloadChunk(downloadUrl, i); // response.data is a ReadStream
  await new Promise<void>((resolve, reject) => {
    // { end: false } keeps dest open for the next chunk
    response.data.pipe(dest, { end: false });
    response.data.on('end', () => resolve());
    response.data.on('error', reject);
  });
}
dest.end(); // close the file once every chunk has been piped
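Alternatively, the loop from the question also works if the options object simply moves into fs.createWriteStream(): each iteration then opens the file in append mode and adds to its end. A minimal sketch using pipeline() from stream/promises (Node 15+), which waits for each chunk to be flushed and propagates errors from either side; appendEach is a hypothetical name, and the sources stand in for the response.data streams:

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Appends each readable stream to destPath, strictly one after another.
async function appendEach(sources, destPath) {
  for (const source of sources) {
    // The options belong to createWriteStream(), not to pipe():
    // 'a' opens the file for appending instead of truncating it.
    await pipeline(source, fs.createWriteStream(destPath, { flags: 'a' }));
  }
}
```

Inside the question's loop, that amounts to replacing the hand-written promise with await pipeline(response.data, fs.createWriteStream('C:\\big_file.mkv', { flags: 'a' }));.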
Related
My aim is to do something like this:
function writer1(data, file) {
  const w = fs.createWriteStream(file, { flags: 'w' });
  for (let i = 0; i < data.length; i++) {
    w.write(data[i]);
  }
  w.end();
}
function writer2(data, file, someStuff) {
  const w = fs.createWriteStream(file, { flags: 'w' });
  for (let i = 0; i < data.length; i++) {
    if (data[i] !== someStuff) {
      w.write(data[i]);
    }
  }
  w.end();
}
writer1(data, "file.txt");
writer2(data, "file.txt", "some string");
IMPORTANT TO NOTE: in the real code I'm writing, writer1 has a condition to run; it runs only if the file it needs to write does not exist.
But here is my problem: if the corresponding file does not exist, i.e. if the 'STATE' of the project is the init state, then writer1 is launched but somehow shadows the execution of writer2.
The result is a txt file filled with the full content of DATA.
On the second pass, writer1 is not launched, does not shadow the execution of writer2, and the result is a txt file filled with the content of DATA minus the excluded value (some-stuff).
Essentially, my question is:
Why is the first stream shadowing the second, and how can I prevent that?
I do understand that there's something asynchronous to be dealt with, or a request to be made to the stream object to allow other streams to access the same file. What is missing?
Writing to a stream is an asynchronous process. If writer2 opens the file again before writer1 has closed it, the writes of writer2 may be lost: both streams write from the beginning of the file, so writer1's remaining writes can overwrite writer2's.
The following variant of writer1 returns a promise that resolves only after the file has been closed. You can await it before calling writer2.
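function writer1(data, file) {
  return new Promise(function(resolve, reject) {
    const w = fs.createWriteStream(file, {flags: 'w'})
      .on("close", resolve)
      .on("error", reject);
    for (let i = 0; i < data.length; i++)
      w.write(data[i]);
    w.end();
  });
}
function writer2(data, file, someStuff) {
  // similar, but skips the items equal to someStuff
  return new Promise(function(resolve, reject) {
    const w = fs.createWriteStream(file, {flags: 'w'})
      .on("close", resolve)
      .on("error", reject);
    for (let i = 0; i < data.length; i++)
      if (data[i] !== someStuff) w.write(data[i]);
    w.end();
  });
}
// await needs an async function (or an ES module) in Node
(async () => {
  await writer1(data, "file.txt");
  await writer2(data, "file.txt", "some string");
})();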
But I second jfriend00's question about what problem you are trying to solve.
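If the items fit in memory anyway, another option is to build each file's contents first and write them with a single awaited fs.promises.writeFile() call, so there is no stream whose close event has to be tracked. A sketch assuming data is an array of strings (run is a hypothetical name):

```javascript
const fs = require('fs');

// Assumes data is an array of strings; each call resolves once the file is written.
async function writer1(data, file) {
  await fs.promises.writeFile(file, data.join(''));
}

async function writer2(data, file, someStuff) {
  const kept = data.filter((item) => item !== someStuff);
  await fs.promises.writeFile(file, kept.join(''));
}

// The second write cannot start before the first one has finished.
async function run(data, file) {
  await writer1(data, file);
  await writer2(data, file, 'some string');
}
```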
I have two functions.
The first function reads all the files in a folder and writes their data to a new file.
The second function takes that new file (output of function 1) as input and creates another file. Therefore it has to wait until the write stream of function 1 has finished.
const fs = require('fs');
const path = require('path');
async function f1(inputDir, outputFile) {
let stream = fs.createWriteStream(outputFile, {flags:'a'}); // new data should be appended to outputFile piece by piece (hence flag a)
let files = await fs.promises.readdir(inputDir);
for(let file of files) {
let pathOfCurrentFile = path.join(inputDir, file);
let stat = fs.statSync(pathOfCurrentFile);
if(stat.isFile()) {
let data = fs.readFileSync(pathOfCurrentFile, 'utf8');
// now the data is being modified for output
let result = data + 'other stuff';
stream.write(result);
}
}
stream.end();
}
function f2(inputFile, outputFile) {
let newData = doStuffWithMy(inputFile);
let stream = fs.createWriteStream(outputFile);
stream.write(newData);
stream.end();
}
f1('myFiles', 'myNewFile.txt');
f2('myNewFile.txt', 'myNewestFile.txt');
Here's what happens:
'myNewFile.txt' (output of f1) is created correctly
'myNewestFile.txt' is created but is either empty or only contains one or two words (it should contain a long text)
When I use a timeout before executing f2, it works fine, but I can't rely on a timeout because there can be thousands of input files in the inputDir, so I need f2 to wait for exactly as long as f1 takes.
I've experimented with async/await, callbacks, promises etc., but that stuff seems to be a little too advanced for me; I couldn't get it to work.
Is there anything else I can try?
Since you asked about a synchronous version, here's what that could look like. This should only be used in a single user script or in startup code, not in a running server. A server should only use asynchronous file I/O.
// synchronous version
function f1(inputDir, outputFile) {
let outputHandle = fs.openSync(outputFile, "a");
try {
let files = fs.readdirSync(inputDir, {withFileTypes: true});
for (let f of files) {
if (f.isFile()) {
let pathOfCurrentFile = path.join(inputDir, f.name);
let data = fs.readFileSync(pathOfCurrentFile, 'utf8');
fs.writeSync(outputHandle, data);
}
}
} finally {
fs.closeSync(outputHandle);
}
}
function f2(inputFile, outputFile) {
let newData = doStuffWithMy(inputFile);
fs.writeFileSync(outputFile, newData);
}
f1('myFiles', 'myNewFile.txt');
f2('myNewFile.txt', 'myNewestFile.txt');
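For code running inside a server, an asynchronous counterpart could look like the sketch below, built on fs.promises. The key point is that f1 returns a promise that settles only after its output file is closed, and f2 is awaited after it. doStuffWithMy() from the question is replaced here by a hypothetical upper-casing step:

```javascript
const fs = require('fs');
const path = require('path');

// Async counterpart of f1: appends every regular file in inputDir to outputFile.
async function f1(inputDir, outputFile) {
  const out = await fs.promises.open(outputFile, 'a');
  try {
    const entries = await fs.promises.readdir(inputDir, { withFileTypes: true });
    for (const entry of entries) {
      if (entry.isFile()) {
        const data = await fs.promises.readFile(path.join(inputDir, entry.name), 'utf8');
        await out.appendFile(data);
      }
    }
  } finally {
    await out.close(); // the promise returned by f1 settles only after this
  }
}

// Hypothetical stand-in for doStuffWithMy(): upper-cases the whole text.
async function f2(inputFile, outputFile) {
  const text = await fs.promises.readFile(inputFile, 'utf8');
  await fs.promises.writeFile(outputFile, text.toUpperCase());
}

async function main() {
  await f1('myFiles', 'myNewFile.txt'); // f2 starts only after f1 has closed its file
  await f2('myNewFile.txt', 'myNewestFile.txt');
}
```

Calling main().catch(console.error) then replaces the two bare f1()/f2() calls.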
I built a simple asynchronous implementation of the readline module built into Node.js, which is simply a wrapper around the event-based module itself. The code is below:
const readline = require('readline');
module.exports = {
createInterface: args => {
let self = {
interface: readline.createInterface(args),
readLine: () => new Promise((succ, fail) => {
if (self.interface === null) {
succ(null);
} else {
self.interface.once('line', succ);
}
}),
hasLine: () => self.interface !== null
};
self.interface.on('close', () => {
self.interface = null;
});
return self;
}
}
Ideally, I would use it in code like this:
const readline = require("./async-readline");
let filename = "bar.txt";
let linereader = readline.createInterface({
input: fs.createReadStream(filename)
});
let lines = 0;
while (linereader.hasLine()) {
let line = await linereader.readLine();
lines++;
console.log(lines);
}
console.log("Finished");
However, I've observed some erratic and unexpected behavior with this async wrapper. For one, it fails to recognize when the file ends and simply hangs once it reaches the last line, never printing "Finished". On top of that, when the input file is large, say a couple thousand lines, it's always off by a few lines and doesn't read the full file before halting: in a 2000+ line file it can be off by as many as 20-40 lines. If I put a print statement in the .on('close') listener, I see that it does trigger; however, the program still doesn't recognize that there are no more lines to read.
Since Node.js v11.4.0 (and v10.16.0), the readline interface supports async iteration, so it can simply be looped through with a for await ... of loop:
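const fs = require('fs');
const readline = require('readline');
async function printLines(filename) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filename),
    crlfDelay: Infinity // treat \r\n as a single line break
  });
  for await (const line of rl) {
    console.log(line);
  }
}
printLines('bar.txt');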
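The question's wrapper most likely loses lines because readline emits a 'line' event for every line in a chunk in one go, and once('line') only catches the first of them; the rest are emitted with no listener attached. It also hangs because the last readLine() waits for a 'line' event that never comes once 'close' has fired. If you'd rather keep the readLine() style, one option is to build it on the same async iterator, which queues lines between calls and reports done after 'close'. A sketch; createLineReader and countLines are hypothetical names:

```javascript
const fs = require('fs');
const readline = require('readline');

// readLine()-style wrapper on top of the interface's async iterator,
// which queues lines internally instead of dropping them.
function createLineReader(filename) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filename),
    crlfDelay: Infinity,
  });
  const it = rl[Symbol.asyncIterator]();
  return {
    // Resolves to the next line, or to null once the input is exhausted.
    async readLine() {
      const { value, done } = await it.next();
      return done ? null : value;
    },
  };
}

async function countLines(filename) {
  const reader = createLineReader(filename);
  let lines = 0;
  while ((await reader.readLine()) !== null) {
    lines++;
  }
  return lines;
}
```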
How to get synchronous readline, or "simulate" it using async, in nodejs?
My English is not very good, so I hope this is clear.
Just out of curiosity, I want to test multiple concurrent(*) writes to the same file and see them produce errors.
The manual is clear on that:
Note that it is unsafe to use fs.writeFile multiple times on the same
file without waiting for the callback. For this scenario,
fs.createWriteStream is strongly recommended.
So if I write a relatively large amount of data to a file and, while it is still writing, call another write on the same file without waiting for the callback, I expect some sort of error.
I tried to write a small example to test this situation, but I can't get any errors.
"use strict";
const fs = require('fs');
const writeToFile = (filename, data) => new Promise((resolve, reject) => {
fs.writeFile(filename, data, { flag: 'a' }, err => {
if (err) {
return reject(err);
}
return resolve();
});
});
const getChars = (num, char) => {
let result = '';
for (let i = 0; i < num; ++i) {
result += char;
}
return result + '\n';
};
const k = 10000000;
const data0 = getChars(k, 0);
const data1 = getChars(k, 1);
writeToFile('test1', data0)
.then(() => console.log('0 written'))
.catch(e => console.log('errors in write 0'));
writeToFile('test1', data1)
.then(() => console.log('1 written'))
.catch(e => console.log('errors in write 1'));
To check it, instead of opening the file in an editor (which is a bit slow on my machine), I use a Linux command to look at the end of the first buffer and the beginning of the second buffer (and check that they do not overlap):
tail -c 10000010 test1 | grep 0
But I'm not sure it is the right way to test it.
For the record, I'm using Node v6.2.2 on Mac OS X 10.11.6.
Can anyone point me to a small sketch that uses fs.writeFile and produces wrong output?
(*) concurrent = don't wait for the end of one file write to begin the next one
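For comparison, the alternative the quoted manual recommends (a single fs.createWriteStream() for all the writes) can be sketched as below. The stream queues the writes internally, so the buffers land in the order they were written, whether or not the earlier write has completed. writeAllInOrder is a hypothetical name:

```javascript
const fs = require('fs');

// Writes all chunks through ONE stream; writes are queued in call order,
// and the promise resolves once everything has been flushed to the file.
function writeAllInOrder(filename, chunks) {
  return new Promise((resolve, reject) => {
    const stream = fs.createWriteStream(filename, { flags: 'a' });
    stream.on('error', reject);
    stream.on('finish', resolve);
    for (const chunk of chunks) {
      stream.write(chunk); // buffered in memory if the file can't keep up
    }
    stream.end();
  });
}
```

With the question's data, writeAllInOrder('test1', [data0, data1]) would take the place of the two writeToFile() calls.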
I have a 10 GB log file in a particular format. I want to process this file line by line and then write the output to another file after applying some transformations. I am using Node for this operation.
This approach works, but it takes a very long time. I was able to do this within 30-45 minutes in Java, but in Node it takes more than 160 minutes to do the same job. Here is the code.
The following initiation code reads each line from the input:
var path = '../10GB_input_file.txt';
var output_file = '../output.txt';
function fileopsmain(){
fs.exists(output_file, function(exists){
if(exists) {
fs.unlink(output_file, function (err) {
if (err) throw err;
console.log('successfully deleted ' + output_file);
});
}
});
new lazy(fs.createReadStream(path, {bufferSize: 128 * 4096}))
.lines
.forEach(function(line){
var line_arr = line.toString().split(';');
perform_line_ops(line_arr, line_arr[6], line_arr[7], line_arr[10]);
}
);
}
This is the method that performs some operations on that line and passes the result to the write method, which writes it to the output file.
function perform_line_ops(line_arr, range_start, range_end, daynums){
var _new_lines = '';
for(var i=0; i<days; i++){
//perform some operation to modify line pass it to print
}
write_line_ops(_new_lines);
}
The following method is used to write data to the new file.
function write_line_ops(line) {
if(line != null && line != ''){
fs.appendFileSync(output_file, line);
}
}
I want to bring this time down to 15-20 minutes. Is that possible?
For the record, I'm running this on an Intel i7 processor with 8 GB of RAM.
You can do this easily without a module. For example:
var fs = require('fs');
var inspect = require('util').inspect;
var buffer = '';
var rs = fs.createReadStream('foo.log');
rs.on('data', function(chunk) {
var lines = (buffer + chunk).split(/\r?\n/g);
buffer = lines.pop();
for (var i = 0; i < lines.length; ++i) {
// do something with `lines[i]`
console.log('found line: ' + inspect(lines[i]));
}
});
rs.on('end', function() {
// optionally process `buffer` here if you want to treat leftover data without
// a newline as a "line"
console.log('ended on non-empty buffer: ' + inspect(buffer));
});
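The same carry-over-buffer idea can be packaged as a reusable Transform stream, so the lines can be piped onward instead of handled in a 'data' callback. A sketch; LineSplitter is a hypothetical name, and it adds a StringDecoder so a multi-byte UTF-8 character split across two chunks is not garbled:

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Bytes in, one line (string) out per chunk on the readable side.
class LineSplitter extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.decoder = new StringDecoder('utf8'); // keeps multi-byte chars intact across chunks
    this.buffer = '';
  }

  _transform(chunk, encoding, callback) {
    const lines = (this.buffer + this.decoder.write(chunk)).split(/\r?\n/);
    this.buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) this.push(line);
    callback();
  }

  _flush(callback) {
    this.buffer += this.decoder.end();
    if (this.buffer.length > 0) this.push(this.buffer); // final line without a newline
    callback();
  }
}
```

For example: fs.createReadStream('foo.log').pipe(new LineSplitter()).on('data', (line) => { /* one line at a time */ }).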
I can't tell where the bottleneck in your code might be.
Can you add the library or the source code of the lazy function?
How many operations does your perform_line_ops do? (if/else, switch/case, function calls)
I've created an example based on your code. I know it does not answer your question, but it may help you understand how Node handles such a case.
const fs = require('fs')
const path = require('path')
const inputFile = path.resolve(__dirname, '../input_file.txt')
const outputFile = path.resolve(__dirname, '../output_file.txt')
function bootstrap() {
// fs.exists is deprecated
// check if output file exists
// https://nodejs.org/api/fs.html#fs_fs_exists_path_callback
fs.exists(outputFile, (exists) => {
if (exists) {
// output file exists, delete it
// https://nodejs.org/api/fs.html#fs_fs_unlink_path_callback
fs.unlink(outputFile, (err) => {
if (err) {
throw err
}
console.info(`successfully deleted: ${outputFile}`)
checkInputFile()
})
} else {
// output file doesn't exist, move on
checkInputFile()
}
})
}
function checkInputFile() {
// check if input file can be read
// https://nodejs.org/api/fs.html#fs_fs_access_path_mode_callback
fs.access(inputFile, fs.constants.R_OK, (err) => {
if (err) {
// file can't be read, throw error
throw err
}
// file can be read, move on
loadInputFile()
})
}
function saveToOutput() {
// create write stream
// https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
const stream = fs.createWriteStream(outputFile, {
flags: 'w'
})
// return wrapper function which simply writes data into the stream
return (data) => {
// check if the stream is writable
if (stream.writable) {
if (data === null) {
stream.end()
} else if (data instanceof Array) {
// end each line with '\n'; join() alone would glue consecutive lines together
stream.write(data.join('\n') + '\n')
} else {
stream.write(data)
}
}
}
}
function parseLine(line, respond) {
respond([line])
}
function loadInputFile() {
// create write stream
const saveOutput = saveToOutput()
// create read stream
// https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
const stream = fs.createReadStream(inputFile, {
autoClose: true,
encoding: 'utf8',
flags: 'r'
})
let buffer = null
stream.on('data', (chunk) => {
// append the buffer to the current chunk
const lines = (buffer !== null)
? (buffer + chunk).split('\n')
: chunk.split('\n')
const lineLength = lines.length
let lineIndex = -1
// save last line for later (last line can be incomplete)
buffer = lines[lineLength - 1]
// loop through all lines
// but don't include the last line
while (++lineIndex < lineLength - 1) {
parseLine(lines[lineIndex], saveOutput)
}
})
stream.on('end', () => {
if (buffer !== null && buffer.length > 0) {
// parse the last line
parseLine(buffer, saveOutput)
}
// Passing null signals the end of the stream (EOF)
saveOutput(null)
})
}
// kick off the parsing process
bootstrap()
I know this is old but...
My guess: when given a file name, appendFileSync() opens the file, writes to it and closes it again on every call, blocking until each step completes, and lots of small writes are generally expensive. Presuming you use a BufferedWriter in Java, you get far fewer write() calls there, which may account for much of the difference.
Use one of the asynchronous write APIs and see whether Node buffers sensibly, or collect the lines in a large Node Buffer until it is full and always write a full (or nearly full) Buffer. By tuning the buffer size you could check whether the number of writes affects performance. I suspect it does.
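The batching idea can be sketched as a small wrapper that collects lines and issues one write() per batch instead of one appendFileSync() per line. createBatchedWriter and the 1 MiB default are assumptions, and for brevity the sketch ignores the return value of stream.write() (backpressure):

```javascript
// Collects strings and hands them to stream.write() in batches of
// roughly batchSize characters (string length, not bytes).
function createBatchedWriter(stream, batchSize = 1024 * 1024) {
  let pending = [];
  let size = 0;

  function flush() {
    if (pending.length > 0) {
      stream.write(pending.join('')); // one write call per batch
      pending = [];
      size = 0;
    }
  }

  function write(text) {
    pending.push(text);
    size += text.length;
    if (size >= batchSize) flush();
  }

  return { write, flush };
}
```

In the question's code this would mean creating const out = createBatchedWriter(fs.createWriteStream(output_file)) once, calling out.write(_new_lines) where write_line_ops() is called now, and calling out.flush() before ending the stream when the input is done.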
The execution is slow because you're not using Node's asynchronous operations. In essence, you're executing the code like this:
> read some lines
> transform
> write some lines
> repeat
Instead, you could be doing everything at once, or at least reading and writing concurrently. Some examples in the answers here do that, but the syntax is somewhat complicated. Using scramjet, you can do it in a few simple lines:
const fs = require('fs');
const {StringStream} = require('scramjet');
// path and output_file as defined in the question
fs.createReadStream(path, {highWaterMark: 128 * 4096})
    .pipe(new StringStream({maxParallel: 128}))   // I assume this is a UTF-8 file
    .split("\n")                                  // split per line
    .parse((line) => line.split(';'))             // parse line
    .map((line_arr) => {
        // run your code, return a promise if you're doing some async work
        return simplyReturnYourResultForTheOtherFileHere(
            line_arr, line_arr[6], line_arr[7], line_arr[10]
        );
    })
    .stringify((result) => result.toString())
    .pipe(fs.createWriteStream(output_file))
    .on("finish", () => console.log("done"))
    .on("error", (e) => console.log("error", e));
This will probably run much faster.
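For comparison, a similar read, transform and write pipeline can be built from core modules only, using the readline async iterator and pipeline() from stream/promises (Node 15+). This is a different technique from scramjet: it transforms one line at a time rather than in parallel, but pipeline() applies backpressure between the generator and the write stream. transformLine and processFile are hypothetical names, and transformLine only stands in for perform_line_ops():

```javascript
const fs = require('fs');
const readline = require('readline');
const { pipeline } = require('stream/promises');

// Hypothetical stand-in for perform_line_ops(): re-joins the ';' fields with ','.
function transformLine(line) {
  return line.split(';').join(',') + '\n';
}

// Lines flow from the read stream through the generator into the write stream;
// the promise resolves once the output file has been fully written.
async function processFile(inputPath, outputPath) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity,
  });
  await pipeline(
    async function* () {
      for await (const line of rl) {
        yield transformLine(line);
      }
    },
    fs.createWriteStream(outputPath),
  );
}
```

Usage would be processFile(path, output_file).then(() => console.log('done'), console.error).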