How to force nodejs to do garbage collection? [duplicate] - node.js

At startup, it seems my node.js app uses around 200MB of memory. If I leave it alone for a while, it shrinks to around 9MB.
Is it possible from within the app to:
Check how much memory the app is using?
Request the garbage collector to run?
The reason I ask is that I load a number of files from disk, which are processed temporarily. This probably causes the memory usage to spike, but I don't want to load more files until the GC has run; otherwise there is a risk that I will run out of memory.
Any suggestions?

If you launch the node process with the --expose-gc flag, you can then call global.gc() to force node to run garbage collection. Keep in mind that all other execution within your node app is paused until GC completes, so don't use it too often or it will affect performance.
You might want to include a check when making GC calls from within your code so things don't go bad if node was run without the flag:
if (global.gc) {
  global.gc();
} else {
  // gc is only exposed when the process is started with the flag
  console.log('Garbage collection is not exposed; start with `node --expose-gc index.js`');
  process.exit();
}
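For the file-loading workflow described in the question, a minimal sketch of how this could fit together (the file list and the handleFile function are placeholders I made up, not part of the question):

const fs = require('fs');

function processFiles(paths) {
  for (const p of paths) {
    const data = fs.readFileSync(p);  // load one file
    handleFile(data);                 // placeholder for the temporary processing
    if (global.gc) {
      global.gc();                    // reclaim the file's buffers before loading the next one
    }
    console.log(p, process.memoryUsage().rss);
  }
}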

When you cannot pass the --expose-gc flag to your node process on start for any reason, you may try this:
import { setFlagsFromString } from 'v8';
import { runInNewContext } from 'vm';

setFlagsFromString('--expose_gc');
const gc = runInNewContext('gc'); // the new vm context sees the flag, so `gc` is defined there
gc();
Notes:
This worked for me in node 16.x
You may want to check process.memoryUsage() before and after running the gc (a short sketch follows these notes)
Use with care. Quoting the Node docs on v8.setFlagsFromString:
This method should be used with care. Changing settings after the VM has started may result in unpredictable behavior, including crashes and data loss; or it may simply do nothing.
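A minimal before/after check along the lines of the note above, reusing the gc reference obtained through runInNewContext:

const before = process.memoryUsage();
gc();
const after = process.memoryUsage();
// heapUsed should drop if there was collectable garbage on the heap
console.log('heapUsed before/after:', before.heapUsed, after.heapUsed);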

One thing I would suggest is that, unless you need those files right at startup, you load them only when you actually need them.
EDIT: Refer to the post above.

Related

Node.js never Garbage Collects Buffers once they fall out of scope

I have researched a lot before posting this. This is a collection of all the things I have discovered about garbage collection, and at the end I'm asking for a better solution than the one I found.
Summary
I am hosting a Node.js app on Heroku. When a particular endpoint of my server is hit, one that uses a lot of buffers for image manipulation (using sharp, but this is a buffer issue, not a sharp one), it takes only a few requests for the buffers to occupy all the external and RSS memory (I used process.memoryUsage() for diagnostics), because even though those variables have fallen out of scope, or have been set to null, Node never garbage collects them. The outcome is that external and RSS memory keep growing, and after a few requests my 512MB dyno quota is reached and my dyno crashes.
Now, I have made a minimal reproducible example, which shows that simply declaring a new buffer within a function, and calling that function 10 times, results in the buffers never being garbage collected, even after the functions have finished executing.
I'm writing to find a better way to make sure Node garbage collects the unreferenced buffers, and to understand why it doesn't do so by default. The only solution I have found so far is to call global.gc().
NOTE
In the minimal reproducible example I simply use a buffer, with no external libraries, and it is enough to recreate the issue I am having with sharp, because it is an issue with Node.js buffers themselves.
Also note that what increases is the external memory and the RSS. The arrayBuffers memory, heapUsed and heapTotal are not affected. I have not yet found a way to trigger the garbage collector when a certain threshold of external memory is used.
Finally, my Heroku server has now been running with no incoming requests for up to 8 hours, and the garbage collector still hasn't cleared out the external memory and the RSS, so it is not a matter of waiting. The same holds true for the minimal reproducible example: even with timers, the garbage collector doesn't do its job.
Minimal reproducible example - garbage collection is not triggered
This is the snippet of code that logs out the memory used after each function call, where the external memory and RSS memory keep building up without being freed:
async function getPrintFile() {
  let buffer = Buffer.alloc(1000000);
  return
}

async function test() {
  for (let i = 0; i < 10; i++) {
    console.log(process.memoryUsage())
    await getPrintFile()
  }
}

test()
console.log(process.memoryUsage())
Below I will share the long list of things I have tried to get those buffers garbage collected, all without success. But first, the only working solution I have found, which is not optimal.
Minimal reproducible example - garbage collection is triggered through code
To make this work, I have to call global.gc() in two parts of the code. For some weird reason, which I hope someone can explain to me, if I call global.gc() only at the end of the function that creates the buffer, or only just after calling that function, it won't garbage collect. However, if I call global.gc() in both places, it will.
This is the only solution that has worked for me so far, but obviously it is not ideal, as global.gc() is blocking.
async function getPrintFile() {
  let buffer = Buffer.alloc(1000000);
  global.gc()
  return
}

async function test() {
  for (let i = 0; i < 10; i++) {
    console.log(process.memoryUsage())
    await getPrintFile()
    global.gc()
  }
}

test()
console.log(process.memoryUsage())
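Note: both snippets rely on the GC hook being exposed when the process starts, for example (the file name is just a placeholder):
node --expose-gc index.js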
What I have tried
I tried setting the buffers to null; logically, if they are not referenced anymore, they should be garbage collected, but the garbage collector is apparently very lazy.
I tried "delete buffer", and looked for a way to resize or reallocate buffer memory, but apparently that doesn't exist in Node.js.
I tried buffer.fill(0), but that simply fills all the bytes with zeros; it doesn't resize the buffer.
I tried installing a memory allocator like jemalloc on my Heroku server, following this guide: Jemalloc Heroku Buildpack, but it was pointless.
I tried running my script with: node --max-old-space-size=4 index.js, but again it was pointless; it didn't work even with the space size set to 4 MB.
I thought maybe it was because the functions were asynchronous, or because I was using a loop. Nope: I wrote 5 different versions of that snippet, and each and every one of them had the same issue of the external memory growing like crazy.
Questions
By any remote chance, is there something super easy I'm missing, like a keyword or a function, that would easily sort this out? Or does anyone have anything that has worked for them so far? A library, a snippet, anything?
Why the hell do I have to call global.gc() TWICE, from within and outside the function, for the garbage collector to work, and why is once not enough?
Why is garbage collection of Buffers in Node.js so bad?
How is this not a bigger issue? Literally every buffer ever declared in a running application never gets garbage collected. You won't notice it on your laptop, because of the large amount of memory available, but as soon as you deploy the code, by the time you realise it, it's probably too late. What's up with that?
I hope someone can give me a hand, as running global.gc() twice for each of these requests is not very efficient, and I'm not sure what repercussions it might have on my code.

Running garbage collection manually in node

I am using node and am considering manually running garbage collection in node. Are there any drawbacks to this? The reason I am doing this is that it looks like node is not running garbage collection frequently enough. Does anyone know how often V8 runs its garbage collection routine in node?
Thanks!
I actually had the same problem running node on heroku with 1GB instances.
When running the node server on production traffic, the memory would grow constantly until it exceeded the memory limit, which caused it to run slowly.
This was probably caused by the app generating a lot of garbage; it mostly serves JSON API responses. But it wasn't a memory leak, just uncollected garbage.
It seems that node didn't prioritize garbage collection of the old object space highly enough for my app, so memory would constantly grow.
Running global.gc() manually (enabled with node --expose_gc) would reduce memory usage by 50MB every time and would pause the app for about 400ms.
What I ended up doing is running gc manually on a randomized schedule (so that heroku instances wouldn't do GC all at once). This decreased the memory usage and stopped the memory quota exceeded errors.
A simplified version would be something like this:
function scheduleGc() {
  if (!global.gc) {
    console.log('Garbage collection is not exposed');
    return;
  }

  // schedule next gc within a random interval (e.g. 15-45 minutes)
  // tweak this based on your app's memory usage
  var nextMinutes = Math.random() * 30 + 15;

  setTimeout(function () {
    global.gc();
    console.log('Manual gc', process.memoryUsage());
    scheduleGc();
  }, nextMinutes * 60 * 1000);
}

// call this in the startup script of your app (once per process)
scheduleGc();
You need to run your app with garbage collection exposed:
node --expose_gc app.js
I know this may be a bit of a tardy reply to help the OP, but I thought I would share my recent experiences with Node.js memory allocation and garbage collection.
We are currently working on a Node.js server running on a Raspberry Pi 3. Every so often it would crash due to running out of memory. I initially thought this was a memory leak, and after a week and a half of searching through my code and coming up with nothing, I concluded that the problem could have been exacerbated by the fact that Node.js allocates more memory than is available on the RPi3 for its processes before it does the GC.
I have been running new instances of my server with the following command (note that the V8 flags must come before the script name to take effect):
node --max-executable-size=96 --max-old-space-size=128 --max-semi-space-size=2 server.js
This effectively limits the total amount of memory that node is allowed to take up on the machine and forces garbage collection to happen more frequently. So far we are seeing constant memory usage, and it confirms to me that my code was not leaking initially; rather, node was simply allocating more memory than the device could provide.
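If you want to verify that the limits actually took effect (my own suggestion, not part of the original setup), Node's built-in v8 module can report the configured heap limit:

var v8 = require('v8');
// heap_size_limit (in bytes) should roughly reflect --max-old-space-size
console.log('Heap limit (MB):', v8.getHeapStatistics().heap_size_limit / 1024 / 1024);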
EDIT: These links outline in more specific terms the issue I was dealing with:
- nodejs decrease v8 garbage collector memory usage
- https://github.com/nodejs/node/issues/2738
V8 runs garbage collection when it thinks it is useful; there is no fixed interval for it. You can read this article to learn about garbage collection in V8: https://strongloop.com/strongblog/node-js-performance-garbage-collection/
Anyway, it's a bad idea to run the garbage collector manually in your project, because it completely blocks the node process. During garbage collection, your program won't handle any requests.
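If you do decide to trigger it manually anyway, it is worth measuring how long the pause actually is for your app. A rough sketch (assuming the process was started with --expose-gc):

if (global.gc) {
  var start = Date.now();
  global.gc();
  console.log('GC pause:', Date.now() - start, 'ms', process.memoryUsage());
}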

Reducing Node memory usage when making HTTP requests in a loop

I've set up a simple loop to poll an IronMQ messaging system, and everything works fine... except that memory usage increases more and more until it finally stabilizes at over 250MB. I've read that it's normal for Node to use more memory over time when run in a (sort of) recursive loop like this, even when running setTimeout and doing nothing, but I still don't understand the exact mechanics behind this behavior, or whether there is any way to control it. When making HTTP requests within the loop, memory usage more than doubles.
The code is running on a Heroku worker with a limit of 512MB RAM, leaving no breathing room to use cluster to take advantage of the rest of the available CPU cores. The memory usage can increase slowly or extremely quickly, depending on the jobs that run after receiving the messages.
This is the simplest code that reproduces this.
var request = require('request');

(function loop() {
  request.get('http://www.example.com', function (err, response, body) {
    if (err) console.log(err);
    setTimeout(loop, 200);
  });
})();
I've tried many, many ways of restructuring this code to prevent memory from climbing so high, but nothing has made any difference. Only the received HTTP response seems to have an effect on the upper limit of RAM used.
Is there a way to rewrite this entirely, or am I stuck with V8's behavior? All examples I've found use the same basic structure for infinite async loops, from kue to the async library.

NodeJS Memory Leak when using VM to execute untrusted code

I am using the NodeJS VM module to run untrusted code safely. I have noticed a huge memory leak that takes about 10MB of memory on each execution and does not release it. Eventually, my node process ends up using 500MB+ of memory. After some digging, I traced the problem to the constant creation of VMs. To test my theory, I commented out the code that creates the VMs. Sure enough, the memory usage dropped dramatically. I then uncommented the code again, placed global.gc() calls strategically around the problem areas, and ran node with the --expose-gc flag. This reduced my memory usage dramatically and retained the functionality.
Is there a better way of cleaning up VMs after I am done using them?
My next approach is to cache the VM containing the given unsafe code and reuse it if I see the unsafe code again (background: I am letting users write their own parsing functions for blocks of text, so the unsafe code may be executed frequently, or executed once and never seen again). A rough sketch of that idea follows the reference code below.
Some reference code.
async.each(items, function (i, cb) {
  // Initialize context...
  var context = vm.createContext(init);
  // Execute untrusted code
  var captured = vm.runInContext(parse, context);
  // This dramatically improves the usage, but isn't
  // part of the standard API
  // global.gc();
  // Return result via a callback
  cb(null, captured);
});
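For reference, this is roughly what I have in mind for the caching approach (an untested sketch; the cache layout and helper name are just my idea): compile each distinct snippet once with vm.Script and reuse the compiled script in fresh contexts.

var vm = require('vm');
var scriptCache = {};

function getCompiledScript(code) {
  // compile each distinct snippet of untrusted code only once
  if (!scriptCache[code]) {
    scriptCache[code] = new vm.Script(code);
  }
  return scriptCache[code];
}

// per item, reuse the compiled script in a fresh context:
// var context = vm.createContext(init);
// var captured = getCompiledScript(parse).runInContext(context);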
If I see this right, this was fixed in v5.9.0, see this PR. It appears that in those cases neither the node core maintainers nor application programmers could do much; we pretty much had to wait for an upstream fix in V8.
So no, you can't do anything more about it. Catching this bug was good though!

Memory leak in a node.js crawler application

For over a month I've been struggling with a very annoying memory leak issue and I have no clue how to solve it.
I'm writing a general-purpose web crawler based on: http, async, cheerio and nano. From the very beginning I've been struggling with a memory leak that is very difficult to isolate.
I know it's possible to do a heapdump and analyse it with Google Chrome, but I can't understand the output. It's usually a bunch of meaningless strings and objects leading to some anonymous functions, telling me exactly nothing (it might be a lack of experience on my side).
Eventually I came to the conclusion that the library I had been using at the time (jQuery) had issues, and I replaced it with Cheerio. I had the impression that Cheerio solved the problem, but now I'm sure it only made it less dramatic.
You can find my code at: https://github.com/lukaszkujawa/node-web-crawler. I understand it might be lots of code to analyse, but perhaps I'm doing something stupid which could be obvious straight away. I'm suspecting the main agent class which does the HTTP requests, https://github.com/lukaszkujawa/node-web-crawler/blob/master/webcrawler/agent.js, from multiple "threads" (with async.queue).
If you would like to run the code, it requires CouchDB; after npm install, do:
$ node crawler.js -c conf.example.json
I know that Node doesn't go crazy with garbage collection, but after 10 minutes of heavy crawling the used memory can easily go over 1GB.
(tested with v0.10.21 and v0.10.22)
For what it's worth, Node's memory usage will grow and grow even if your actual used memory isn't very large. This is an optimization on the part of the V8 engine. To see your real memory usage (and to determine whether there is actually a memory leak), consider dropping this code (or something like it) into your application:
setInterval(function () {
  if (typeof gc === 'function') {
    gc();
  }
  // applog is the application's logger; console.log works just as well
  applog.debug('Memory Usage', process.memoryUsage());
}, 60000);
Run node --expose-gc yourApp.js. Every minute there will be a log line indicating real memory usage immediately after a forced garbage collection. I've found that watching the output of this over time is a good way to determine if there is a leak.
If you do find a leak, the best way I've found to debug it is to eliminate large sections of your code at a time. If the leak goes away, put it back and eliminate a smaller section of it. Use this method to narrow it down to where the problem is occurring. Closures are a common source, but also check for anywhere else references may not be cleaned up. Many network applications will attach handlers for sockets that aren't immediately destroyed.
