reading and writing a large txt file in node.js causing exception

reading and writing a large txt file in node.js causing exception - node.js

I have a large txt file of size approx ~ 1GB. I am trying to read the contents from the this file and write to another file.My code is -->
var fs = require("fs");
var fb = fs.openSync('./copy.txt','r+');
fs.open('./largefile.txt','r',function(error,fd){
fs.fstat(fd,function(error,stats){
var totalFileSize = stats.size,
chunk = 512,
buffer = new Buffer(512),
bytesRead = 0;
while(bytesRead < totalFileSize){
if((totalFileSize - bytesRead) < chunk){
chunk = totalFileSize - bytesRead ;
}
fs.read(fd,buffer,0,chunk,bytesRead,function(err, bytesRead, buffer){
fs.write(fb,buffer,0,chunk,bytesRead,function(err,written,buffer){});
});
bytesRead = bytesRead + chunk;
}
});
});
I got this error console ->
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Q1) What could I am possibly doing wrong?
Q2) Are there any benefits of doing that in child_process?If yes, should I use fork() or spawn() and how?(I am new to node.js and find child_process pretty confusing.)

All of the fs functions you're using are asynchronous, so you're effectively trying to open copy.txt thousands of times simultaneously.
It looks like you're also never updating bytesRead so your while loop will run forever.

Related

Nodejs for loop - stream runs out of memory

I'm generating a CSV file that I'd like to save.
It's a bit large, but the code is very simple.
I use streams as to prevent out of memory errors, but it's happening regardless.
Any tips?
const fs = require('fs');
var noOfRows = 2000000000;
var stream = fs.createWriteStream('myFile.csv', {flags: 'a'});
for (var i=0;i<=noOfRows;i++){
var col = '';
col += i;
stream.write(col)
}

add a drain eventlistener.
const fs = require("fs");
var noOfRows = 2000000000;
var stream = fs.createWriteStream("myFile.csv", { flags: "a" });
var i = 0;
function write() {
var ok = true;
do {
var data = i + "";
if (i === noOfRows) {
// last time!
stream.write(data);
} else {
// see if we should continue, or wait
// don't pass the callback, because we're not done yet.
ok = stream.write(data);
}
i++;
} while (i<=noOfRows && ok);
if (i < noOfRows) {
// had to stop early!
// write some more once it drains
stream.once("drain", write);
}
}
write();
And noOfRows is so big, it may cause your .csv file size out of disk size

Your .csv file has too much data to be kept in stream. Streams basically uses your computer's physical memory so it can store only upto the free physical memory. e.g. if your computer has 8GB of RAM of which lets say 6 GB is free then the stream can't store more than 6GB. You can break it up into chunks and then merge it back at the destination later.

There is no hard size limit on .csv files. The limit in any scenario would be the file system / hdd size.
The maximum file size of any file on a filesystem is determined by the
filesystem itself - not by the file type or filename suffix.
To prevent out memory errors check you file size limit as per your filesystem partition.

How to write incrementally to a text file and flush output

My Node.js program - which is an ordinary command line program that by and large doesn't do anything operationally unusual, nothing system-specific or asynchronous or anything like that - needs to write messages to a file from time to time, and then it will be interrupted with ^C and it needs the contents of the file to still be there.
I've tried using fs.createWriteStream but that just ends up with a 0-byte file. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
I've tried using winston but that ends up not creating the file at all. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
And fs.writeFile works perfectly when you have all the text you want to write up front, but doesn't seem to support appending a line at a time.
What is the recommended way to do this?
Edit: specific code I've tried:
var fs = require('fs')
var log = fs.createWriteStream('test.log')
for (var i = 0; i < 1000000; i++) {
console.log(i)
log.write(i + '\n')
}
Run for a few seconds, hit ^C, leaves a 0-byte file.

Turns out Node provides a lower level file I/O API that seems to work fine!
var fs = require('fs')
var log = fs.openSync('test.log', 'w')
for (var i = 0; i < 100000; i++) {
console.log(i)
fs.writeSync(log, i + '\n')
}

NodeJS doesn't work in the traditional way. It uses a single thread, so by running a large loop and doing I/O inside, you aren't giving it a chance (i.e. releasing the CPU) to do other async operations for eg: flushing memory buffer to actual file.
The logic must be - do one write, then pass your function (which invokes the write) as a callback to process.nextTick or as callback to the write stream's drain event (if buffer was full during last write).
Here's a quick and dirty version which does what you need. Notice that there are no long-running loops or other CPU blockage, instead I schedule my subsequent writes for future and return quickly, momentarily freeing up the CPU for other things.
var fs = require('fs')
var log = fs.createWriteStream('test.log');
var i = 0;
function my_write() {
if (i++ < 1000000)
{
var res = log.write("" + i + "\r\n");
if (!res) {
log.on('drain',my_write);
} else {
process.nextTick(my_write);
}
console.log("Done" + i + " " + res + "\r\n");
}
}
my_write();

This function might also be helpful.
/**
* Write `data` to a `stream`. if the buffer is full will block
* until it's flushed and ready to be written again.
* [see](https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback)
*/
export function write(data, stream) {
return new Promise((resolve, reject) => {
if (stream.write(data)) {
process.nextTick(resolve);
} else {
stream.once("drain", () => {
stream.off("error", reject);
resolve();
});
stream.once("error", reject);
}
});
}

You are writing into file using for loop which is bad but that's other case. First of all createWriteStream doesn't close the file automatically you should call close.
If you call close immediately after for loop it will close without writing because it's async.
For more info read here: https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
Problem is async function inside for loop.

Fast file copy with progress information in Node.js?

Is there any chance to copy large files with Node.js with progress infos and fast?
Solution 1 : fs.createReadStream().pipe(...) = useless, up to 5 slower than native cp
See: Fastest way to copy file in node.js, progress information is possible (with npm package 'progress-stream' ):
fs = require('fs');
fs.createReadStream('test.log').pipe(fs.createWriteStream('newLog.log'));
The only problem with that way is that it takes easily 5 times longer compared "cp source dest". See also the appendix below for the full test code.
Solution 2 : rsync ---info=progress2 = same slow as solution 1 = useless
Solution 3 : My last resort, write a native module for node.js, using "CoreUtils" (linux sources for cp and others) or other functions as shown in Fast file copy with progress
Does anyone knows better than solution 3? I'd like to avoid native code but it seems the best fit.
thanks! any package recommendations or hints (tried all fs**) are welcome!
Appendix:
test code, using pipe and progress:
var path = require('path');
var progress = require('progress-stream');
var fs = require('fs');
var _source = path.resolve('../inc/big.avi');// 1.5GB
var _target= '/tmp/a.avi';
var stat = fs.statSync(_source);
var str = progress({
length: stat.size,
time: 100
});
str.on('progress', function(progress) {
console.log(progress.percentage);
});
function copyFile(source, target, cb) {
var cbCalled = false;
var rd = fs.createReadStream(source);
rd.on("error", function(err) {
done(err);
});
var wr = fs.createWriteStream(target);
wr.on("error", function(err) {
done(err);
});
wr.on("close", function(ex) {
done();
});
rd.pipe(str).pipe(wr);
function done(err) {
if (!cbCalled) {
console.log('done');
cb && cb(err);
cbCalled = true;
}
}
}
copyFile(_source,_target);
update: a fast (with detailed progress!) C version is implemented here: https://github.com/MidnightCommander/mc/blob/master/src/filemanager/file.c#L1480. Seems the best place to go from :-)

One aspect that may slow down the process is related to console.log. Take a look into this code:
const fs = require('fs');
const sourceFile = 'large.exe'
const destFile = 'large_copy.exe'
console.time('copying')
fs.stat(sourceFile, function(err, stat){
const filesize = stat.size
let bytesCopied = 0
const readStream = fs.createReadStream(sourceFile)
readStream.on('data', function(buffer){
bytesCopied+= buffer.length
let porcentage = ((bytesCopied/filesize)*100).toFixed(2)
console.log(porcentage+'%') // run once with this and later with this line commented
})
readStream.on('end', function(){
console.timeEnd('copying')
})
readStream.pipe(fs.createWriteStream(destFile));
})
Here are the execution times copying a 400mb file:
with console.log: 692.950ms
without console.log: 382.540ms

cpy and cp-file both support progress reporting

I have the same issue. I want to copy large files as fast as possible and want progress information. I created a test utility that tests the different copy methods:
https://www.npmjs.com/package/copy-speed-test
You can run it simply with:
npx copy-speed-test --source someFile.zip --destination someNonExistentFolder
It does a native copy using child_process.exec(), a copy file using fs.copyFile and it uses createReadStream with a variety of different buffer sizes (you can change buffer sizes by passing them on the command line. run npx copy-speed-test -h for more info.
Some things I learnt:
fs.copyFile is just as fast as native
you can get quite inconsistent results on all these methods, particularly when copying from and to the same disc and with SSDs
if using a large buffer then createReadStream is nearly as good as the other methods
if you use a very large buffer then the progress is not very accurate.
The last point is because the progress is based on the read stream, not the write stream. if copying a 1.5GB file and your buffer is 1GB then the progress immediately jumps to 66% then jumps to 100% and you then have to wait whilst the write stream finishes writing. I don't think that you can display the progress of the write stream.
If you have the same issue I would recommend that you run these tests with similar file sizes to what you will be dealing with and across similar media. My end use case is copying a file from an SD card plugged into a raspberry pi and copied across a network to a NAS so that's what I was the scenario that I ran the tests for.
I hope someone other than me finds it useful!

I solved a similar problem (using Node v8 or v10) by changing the buffer size. I think the default buffer size is around 16kb, which fills and empties quickly but requires a full cycle around the event loop for each operation. I changed the buffer to 1MB and writing a 2GB image fell from taking around 30 minutes to 5, which sounds similar to what you are seeing. My image was also decompressed on the fly, which possibly exacerbated the problem. Documentation on stream buffering has been in the manual since at least Node v6: https://nodejs.org/api/stream.html#stream_buffering
Here are the key code components you can use:
let gzSize = 1; // do not initialize divisors to 0
const hwm = { highWaterMark: 1024 * 1024 }
const inStream = fs.createReadStream( filepath, hwm );
// Capture the filesize for showing percentages
inStream.on( 'open', function fileOpen( fdin ) {
inStream.pause(); // wait for fstat before starting
fs.fstat( fdin, function( err, stats ) {
gzSize = stats.size;
// openTargetDevice does a complicated fopen() for the output.
// This could simply be inStream.resume()
openTargetDevice( gzSize, targetDeviceOpened );
});
});
inStream.on( 'data', function shaData( data ) {
const bytesRead = data.length;
offset += bytesRead;
console.log( `Read ${offset} of ${gzSize} bytes, ${Math.floor( offset * 100 / gzSize )}% ...` );
// Write to the output file, etc.
});
// Once the target is open, I convert the fd to a stream and resume the input.
// For the purpose of example, note only that the output has the same buffer size.
function targetDeviceOpened( error, fd, device ) {
if( error ) return exitOnError( error );
const writeOpts = Object.assign( { fd }, hwm );
outStream = fs.createWriteStream( undefined, writeOpts );
outStream.on( 'open', function fileOpen( fdin ) {
// In a simpler structure, this is in the fstat() callback.
inStream.resume(); // we have the _input_ size, resume read
});
// [...]
}
I have not made any attempt to optimize these further; the result is similar to what I get on the commandline using 'dd' which is my benchmark.
I left in converting a file descriptor to a stream and using the pause/resume logic so you can see how these might be useful in more complicated situations than the simple fs.statSync() in your original post. Otherwise, this is simply adding the highWaterMark option to Tulio's answer.

Here is what I'm trying to use now, it copies 1 file with progress:
String.prototype.toHHMMSS = function () {
var sec_num = parseInt(this, 10); // don't forget the second param
var hours = Math.floor(sec_num / 3600);
var minutes = Math.floor((sec_num - (hours * 3600)) / 60);
var seconds = sec_num - (hours * 3600) - (minutes * 60);
if (hours < 10) {hours = "0"+hours;}
if (minutes < 10) {minutes = "0"+minutes;}
if (seconds < 10) {seconds = "0"+seconds;}
return hours+':'+minutes+':'+seconds;
}
var purefile="20200811140938_0002.MP4";
var filename="/sourceDir"+purefile;
var output="/destinationDir"+purefile;
var progress = require('progress-stream');
var fs = require('fs');
const convertBytes = function(bytes) {
const sizes = ["Bytes", "KB", "MB", "GB", "TB"]
if (bytes == 0) {
return "n/a"
}
const i = parseInt(Math.floor(Math.log(bytes) / Math.log(1024)))
if (i == 0) {
return bytes + " " + sizes[i]
}
return (bytes / Math.pow(1024, i)).toFixed(1) + " " + sizes[i]
}
var copiedFileSize = fs.statSync(filename).size;
var str = progress({
length: copiedFileSize, // length(integer) - If you already know the length of the stream, then you can set it. Defaults to 0.
time: 200, // time(integer) - Sets how often progress events are emitted in ms. If omitted then the default is to do so every time a chunk is received.
speed: 1, // speed(integer) - Sets how long the speedometer needs to calculate the speed. Defaults to 5 sec.
// drain: true // drain(boolean) - In case you don't want to include a readstream after progress-stream, set to true to drain automatically. Defaults to false.
// transferred: false// transferred(integer) - If you want to set the size of previously downloaded data. Useful for a resumed download.
});
/*
{
percentage: 9.05,
transferred: 949624,
length: 10485760,
remaining: 9536136,
eta: 42,
runtime: 3,
delta: 295396,
speed: 949624
}
*/
str.on('progress', function(progress) {
console.log(progress.percentage+'%');
console.log('eltelt: '+progress.runtime.toString().toHHMMSS() + 's / hátra: ' + progress.eta.toString().toHHMMSS()+'s');
console.log(convertBytes(progress.speed)+"/s"+' '+progress.speed);
});
//const hwm = { highWaterMark: 1024 * 1024 } ;
var hrstart = process.hrtime(); // measure the copy time
var rs=fs.createReadStream(filename)
.pipe(str)
.pipe(fs.createWriteStream(output, {emitClose: true}).on("close", () => {
var hrend = process.hrtime(hrstart);
var timeInMs = (hrend[0]* 1000000000 + hrend[1]) / 1000000000;
var finalSpeed=convertBytes(copiedFileSize/timeInMs);
console.log('Done: file copy: '+ finalSpeed+"/s");
console.info('Execution time (hr): %ds %dms', hrend[0], hrend[1] / 1000000);
}) );

Refer to https://www.npmjs.com/package/fsprogress.
With that package, you can track progress while you are copying or moving files. The progress tracking is event and method call based so its very convenient to use.
You can provide options to do a lot of things. eg. total number of file for concurrent operation, chunk size to read from a file at a time.
It was tested for single file upto 17GB and directories up to i dont really remember but it was pretty large. And also :D, it is safe to use for large file(s).
So, go ahead and have a look at it whether it matches your expectations or if it is what you are looking for :D

Is there a Node.js console.log length limit?

Is there a limit the length of console.log output in Node.js? The following prints numbers up to 56462, then stops. This came up because we were returning datasets from MySQL and the output would just quit after 327k characters.
var out = "";
for (i = 0; i < 100000; i++) {
out += " " + i;
}
console.log(out);
The string itself seems fine, as this returns the last few numbers up to 99999:
console.log(out.substring(out.length - 23));
Returns:
99996 99997 99998 99999
This is using Node v0.6.14.

Have you tried writing that much on a machine with more memory?
According to Node source code console is writing into a stream: https://github.com/joyent/node/blob/cfcb1de130867197cbc9c6012b7e84e08e53d032/lib/console.js#L55
And streams may buffer the data into memory: http://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback
So if you put reeeaally a lot of data into a stream, you may hit the memory ceiling.
I'd recommend you split up your data and feed it into process.stdout.write method, here's an example: http://nodejs.org/api/stream.html#stream_event_drain

I would recommend using output to file when using node > 6.0
const output = fs.createWriteStream('./stdout.log');
const errorOutput = fs.createWriteStream('./stderr.log');
// custom simple logger
const logger = new Console(output, errorOutput);
// use it like console
var count = 5;
logger.log('count: %d', count);
// in stdout.log: count 5

Is http.ServerResponse.write() blocking?

Is it possible to write non-blocking response.write? I've written a simple test to see if other clients can connect while one downloads a file:
var connect = require('connect');
var longString = 'a';
for (var i = 0; i < 29; i++) { // 512 MiB
longString += longString;
}
console.log(longString.length)
function download(request, response) {
response.setHeader("Content-Length", longString.length);
response.setHeader("Content-Type", "application/force-download");
response.setHeader("Content-Disposition", 'attachment; filename="file"');
response.write(longString);
response.end();
}
var app = connect().use(download);
connect.createServer(app).listen(80);
And it seems like write is blocking!
Am I doing something wrong?
Update So, it doesn't block and it blocks in the same time. It doesn't block in the sense that two files can be downloaded simultaneously. And it blocks in the sense that creating a buffer is a long operation.

Any processing done strictly in JavaScript will block. response.write(), at least as of v0.8, is no exception to this:
The first time response.write() is called, it will send the buffered header information and the first body to the client. The second time response.write() is called, Node assumes you're going to be streaming data, and sends that separately. That is, the response is buffered up to the first chunk of body.
Returns true if the entire data was flushed successfully to the kernel buffer. Returns false if all or part of the data was queued in user memory. 'drain' will be emitted when the buffer is again free.
What may save some time is to convert longString to Buffer before attempting to write() it, since the conversion will occur anyways:
var longString = 'a';
for (...) { ... }
longString = new Buffer(longString);
But, it would probably be better to stream the various chunks of longString rather than all-at-once (Note: Streams are changing in v0.10):
var longString = 'a',
chunkCount = Math.pow(2, 29),
bufferSize = Buffer.byteLength(longString),
longBuffer = new Buffer(longString);
function download(request, response) {
var current = 0;
response.setHeader("Content-Length", bufferSize * chunkCount);
response.setHeader("Content-Type", "application/force-download");
response.setHeader("Content-Disposition", 'attachment; filename="file"');
function writeChunk() {
if (current < chunkCount) {
current++;
if (response.write(longBuffer)) {
process.nextTick(writeChunk);
} else {
response.once('drain', writeChunk);
}
} else {
response.end();
}
}
writeChunk();
}
And, if the eventual goal is to stream a file from disk, this can be even easier with fs.createReadStream() and stream.pipe():
function download(request, response) {
// response.setHeader(...)
// ...
fs.createReadStream('./file-on-disk').pipe(response);
}

Nope, it does not block, I tried one from IE and other from firefox. I did IE first but still could download file from firefox first.
I tried for 1 MB (i < 20) it works the same just faster.
You should know that whatever longString you create requires memory allocation. Try to do it for i < 30 (on windows 7) and it will throw FATAL ERROR: JS Allocation failed - process out of memory.
It takes time for memory allocation/copying nothing else. Since it is a huge file, the response is time taking and your download looks like blocking. Try it yourself for smaller values (i < 20 or something)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string