Node.js read big file with fs.readFileSync

I'm trying to load a big file (~6 GB) into memory with fs.readFileSync on a server with 96 GB of RAM.
The problem is that it fails with the following error message:
RangeError: Attempt to allocate Buffer larger than maximum size: 0x3fffffff bytes
Unfortunately I couldn't find a way to increase the Buffer limit; it seems to be a constant.
How can I overcome this problem and load a big file with Node.js?
Thank you!

I ran into the same problem when trying to load a 6.4 GB video file to create a file hash.
I read the whole file with fs.readFile() and it caused a RangeError [ERR_FS_FILE_TOO_LARGE]. Then I used a stream instead:
const crypto = require('crypto');
const fs = require('fs');

// Hash the file incrementally so the whole 6.4 GB never sits in memory at once.
const hash = crypto.createHash('md5');
const stream = fs.createReadStream(file_path);

// The encoding argument is ignored for Buffers, so it is omitted here.
stream.on('data', _buff => { hash.update(_buff); });
stream.on('end', () => {
  const hashCheckSum = hash.digest('hex');
  // Save the hashCheckSum into database.
});
Hope it helps.

From a Joyent FAQ:
What is the memory limit on a node process?
Currently, by default V8 has a memory limit of 512 MB on 32-bit systems and 1 GB on 64-bit systems. The limit can be raised by setting --max_old_space_size to a maximum of ~1024 (~1 GiB) on 32-bit and ~1741 (~1.7 GiB) on 64-bit, but it is recommended that you split your single process into several workers if you are hitting memory limits.
If you show more detail about what's in the file and what you're doing with it, we can probably offer some ideas on how to work with it in chunks. If it's pure data, then you probably want to be using a database and let the database handle getting things from disk as needed and manage the memory.
Here's a fairly recent discussion of the issue: https://code.google.com/p/v8/issues/detail?id=847
And here's a blog post that claims you can edit the V8 source code and rebuild Node to remove the memory limit. Try this at your own risk.
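If you do end up walking through the whole file yourself rather than handing it to a database, a read stream lets you process it chunk by chunk without ever holding more than one buffer in memory. A minimal sketch, where the file path, chunk size and per-chunk handling are placeholders rather than anything from the original question:

const fs = require('fs');

// Hypothetical path; replace with the real 6 GB file.
const stream = fs.createReadStream('/path/to/big-file.dat', {
  highWaterMark: 64 * 1024 * 1024 // read in 64 MB chunks
});

let bytesSeen = 0;

stream.on('data', chunk => {
  bytesSeen += chunk.length;
  // Process the chunk here (parse records, update a hash, write elsewhere, ...).
});

stream.on('end', () => {
  console.log(`Done, processed ${bytesSeen} bytes without loading the file at once.`);
});

stream.on('error', err => {
  console.error('Read failed:', err);
});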

Related

Rust application VSZ keeps growing

I have a Rust application that receives uploaded files from users and saves them to disk.
I am using axum and tokio. Here is the part that I think has the problem.
In my request handler I get the uploaded files and save them to disk like this:
while let Some(chunk) = field.next().await {
    // Write each multipart chunk straight to the file as it arrives.
    f.write_all(&chunk.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
}
My app allocates memory for this operation, but it looks like the memory does not go back to the OS. Here is my docker top result after 5 days of the server running:
I ran heaptrack and it said I have no leak.
I also used valgrind and it said I have no leak.

Node express, memory overflow on res.send

So I have an express server that tries to send back a lot of JSON data (625,763,181 characters when stringified) with res.send(data). This crashes the application with
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
(more info can be found in my last question on this topic)
Here are a few snapshots of the heap before and after it tries to send the data:
As you can see, the memory goes from 28 MB to 700 MB and 97% of it is allocated as strings (inspecting it, I see that it's the stringified JSON data).
So my question is: am I doing anything wrong here? Should I compress the JSON somehow or use something other than res.send? Or can I just not send that much data?
OBS: I know I can add --max-old-space-size=8192, but I'm trying to host the project on a server with limited memory that runs out with or without that flag.
EDIT: I'm leaving this up because there might be optimisations I hadn't thought of, and I'm really interested. But I found a bug in my data causing it to grow as O(n*m) instead of O(n+m), reducing the size from 625,763,181 characters to 131,700.
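If a response really did have to stay that large, one option (a sketch of an alternative, not something from the original question; the generator and route are made up) is to avoid building the whole JSON string at once and stream the serialized records to the client instead:

const express = require('express');
const app = express();

// Hypothetical data source standing in for the real payload.
function* records() {
  for (let i = 0; i < 1000000; i++) {
    yield { id: i, value: 'row ' + i };
  }
}

app.get('/data', (req, res) => {
  res.type('application/json');
  res.write('[');
  let first = true;
  for (const record of records()) {
    // Serialize one record at a time; the full array never exists as a single string.
    res.write((first ? '' : ',') + JSON.stringify(record));
    first = false;
  }
  res.end(']');
});

app.listen(3000);

A production version should also respect backpressure (check the return value of res.write() and wait for the 'drain' event). Compression (e.g. the compression middleware) reduces bytes on the wire but not the server-side memory needed to build one giant string, so it wouldn't help here on its own.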

Max writeable streams in Node JS

Prologue:
I'm experimenting with a "multi-tenant" flat-file database system. On application start, prior to starting the server, each flat-file database (which I'm calling a journal) is converted into a large JavaScript object in memory. From there the application starts its service.
The application's runtime behavior will be to serve requests from many different databases (one db per domain). All reads come from the in-memory object alone, while any CRUD operations both modify the in-memory object and stream the change to the journal.
Question:
If I have N of these database objects in memory, already loaded from flat files (let's say averaging around 1 MB each), what kind of limitations would I be dealing with by having N write streams open?
If you are using streams that have an open file handle behind them, then your limit for how many of them you can have open will likely be governed by the process limit on open file handles which will vary by OS and (in some cases) by how you have that OS configured. Each open stream also consumes some memory, both for the stream object and for read/write buffers associated with the stream.
If you are using some sort of custom stream that just reads/writes to memory, not to files, then there would be no file handle involved and you would just be limited by the memory consumed by the stream objects and their buffers. You could likely have thousands of these with no issues.
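As a rough sketch of the file-handle case (the journal names and paths below are invented, not from the question), each journal gets its own append-mode write stream, and each open stream holds one file descriptor until it is ended:

const fs = require('fs');

// Hypothetical journal names; in the real app there would be one per tenant/domain.
// Assumes the ./journals directory already exists.
const journals = ['tenant-a', 'tenant-b', 'tenant-c'];

// One append-only write stream per journal; each keeps an open file descriptor.
const streams = new Map(
  journals.map(name => [name, fs.createWriteStream(`./journals/${name}.log`, { flags: 'a' })])
);

function appendToJournal(name, entry) {
  // write() returns false when the internal buffer is full (backpressure).
  return streams.get(name).write(JSON.stringify(entry) + '\n');
}

appendToJournal('tenant-a', { op: 'update', key: 'user:1' });

// Close everything on shutdown so the file descriptors are released.
process.on('SIGINT', () => {
  for (const stream of streams.values()) stream.end();
});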
Some reference posts:
Node.js and open files limit in linux
How do I change the number of open files limit in Linux?
Check the open FD limit for a given process in Linux

node.js mongoose.js memory leak?

I'm building a Bower package search site (everything is open sourced) and I've hit a wall. I have a memory leak (or I think I do) and I honestly don't know why it's there.
You can download it and run it on your own, but a simple hint would help me greatly.
I have narrowed it down to this function call here https://github.com/kamilbiela/bowereggs-backend/blob/master/main.js#L14 ( nest.fetchAndSave() ), which is all defined here: https://github.com/kamilbiela/bowereggs-backend/blob/master/lib/nest.js
Basically it downloads a package list from the internet, JSON.parses it and inserts it into the database, plus some when.js promises.
Running this function a few times creates about 30 MB of memory per run that is not cleaned up by the garbage collector. Also note that this is my first "real" Node.js project, so I'll be really grateful for any tip.
For anyone having the same problem:
https://github.com/c4milo/node-webkit-agent
After making a few heap dumps I discovered that the objects are garbage collected and the real memory usage isn't tied to them. I think the real memory usage is bigger because of mongo and other non-Node.js stuff. Also, real memory usage stabilizes at ~300 MB, while the heap dumps are at ~35 MB.
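A quick way to see that same gap between heap and resident memory, without any external tooling, is to log process.memoryUsage() around the suspect call; on recent Node versions it also reports external (non-heap) memory. A sketch, with the labels and interval as assumptions:

// Log heap usage vs. resident set size; a growing rss with a stable heapUsed
// usually points at external/native memory (Buffers, drivers) rather than a JS heap leak.
function logMemory(label) {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const mb = n => (n / 1024 / 1024).toFixed(1) + ' MB';
  console.log(`${label}: rss=${mb(rss)} heapUsed=${mb(heapUsed)}/${mb(heapTotal)} external=${mb(external)}`);
}

logMemory('before fetchAndSave');
// nest.fetchAndSave() would run here.
setInterval(() => logMemory('periodic'), 10000);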

Node JS, Highcharts Memory usage keeps climbing

I am looking after an app built with Node.js that's producing some interesting issues. It was originally running on Node.js v0.3.0 and I've since upgraded to v0.10.12. We're using Node.js to render charts on the server and we've noticed the memory usage keeps climbing chart after chart.
Q1: I've been monitoring the RES column in top for the Node.js process; is this correct or should I be monitoring something else?
I've been setting variables to null to try and release memory back to the system (I read this somewhere as a solution) and it makes only a slight difference.
I've pushed the app all the way to 1.5 GB, at which point it ceases to function, yet the process doesn't appear to die. No error messages, which I found odd.
Q2: Is there anything else I can do?
Thanks
Steve
That is a massive jump in versions. You may want to share what code changes you made to get it working on the latest stable; the API is not the same as back in v0.3, so that may be part of the problem.
If not, then the issue you see is more likely heap fragmentation than an actual leak. In later V8 versions garbage collection is more liberal with cleanup to improve performance (see http://code.google.com/p/chromium/issues/detail?id=112386 for some discussion of this).
You may try running the application with --max_old_space_size=32, which will limit the amount of memory V8 can use to around 32 MB. Note the docs say "max size of the old generation", so it won't be exactly 32 MB, just around it, for lack of a better technical explanation.
Also, you can track external memory usage with --trace_external_memory. This will let you know whether external memory (i.e. Buffers) is being retained in your application.
Your note about the application hanging around 1.5 GB tells me you're probably on a 64-bit system. You only mentioned that it ceases to function, but didn't say whether the CPU is spinning during that time. And since I don't have example code, I'm not sure what might be causing this to happen.
I'd try running on the latest development release (v0.11.3 at the time of this writing) and see if the issue is fixed. A lot of performance/memory enhancements are being worked on that may help your issue.
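On modern Node versions (the v8 module did not exist back in the v0.10 line discussed here) you can confirm what ceiling --max_old_space_size actually set by asking V8 directly; the file name below is just an example:

// Run as: node --max_old_space_size=32 check-heap-limit.js
const v8 = require('v8');

const stats = v8.getHeapStatistics();
const mb = n => (n / 1024 / 1024).toFixed(1) + ' MB';

// heap_size_limit reflects the configured old-space ceiling (plus some overhead),
// which is why it won't be exactly 32 MB.
console.log('heap_size_limit:', mb(stats.heap_size_limit));
console.log('total_heap_size:', mb(stats.total_heap_size));
console.log('used_heap_size:', mb(stats.used_heap_size));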
I guess you have a memory leak somewhere (in the form of a closure?) that keeps the no-longer-used charts in memory.
V8 sometimes needs a bit of tweaking when it comes to more than 1 GB of memory. Try out --noincremental_marking and/or --max_old_space_size=8192 (if you have 8 GB available).
Check for more options with node --v8-options and go through the --trace* parameters to find out what is slowing down or stopping Node.
