Running garbage collection manually in node - node.js

I am using node and am considering manually running garbage collection. Are there any drawbacks to this? The reason I am considering it is that node does not seem to be running garbage collection frequently enough. Does anyone know how often V8 runs its garbage collection routine in node?
Thanks!

I actually had the same problem running node on heroku with 1GB instances.
When running the node server on production traffic, the memory would grow constantly until it exceeded the memory limit, which caused it to run slowly.
This is probably caused by the app generating a lot of garbage; it mostly serves JSON API responses. But it wasn't a memory leak, just uncollected garbage.
It seems that node doesn't prioritize garbage collection of the old object space highly enough for my app, so memory would grow constantly.
Running global.gc() manually (enabled with node --expose_gc) would reduce memory usage by 50MB every time and would pause the app for about 400ms.
What I ended up doing is running gc manually on a randomized schedule (so that heroku instances wouldn't do GC all at once). This decreased the memory usage and stopped the memory quota exceeded errors.
A simplified version would be something like this:
function scheduleGc() {
  if (!global.gc) {
    console.log('Garbage collection is not exposed');
    return;
  }

  // schedule next gc within a random interval (e.g. 15-45 minutes)
  // tweak this based on your app's memory usage
  var nextMinutes = Math.random() * 30 + 15;

  setTimeout(function () {
    global.gc();
    console.log('Manual gc', process.memoryUsage());
    scheduleGc();
  }, nextMinutes * 60 * 1000);
}
// call this in the startup script of your app (once per process)
scheduleGc();
You need to run your app with garbage collection exposed:
node --expose_gc app.js

I know this may be a bit of a tardy reply to help the OP, but I thought I would share my recent experiences with Node.js memory allocation and garbage collection.
We are currently working on a Node.js server running on a Raspberry Pi 3. Every so often it would crash due to running out of memory. I initially thought this was a memory leak, and after a week and a half of searching through my code and coming up with nothing, I began to think the problem was that Node.js allocates more memory for its processes than is available on the RPi3 before it runs the GC.
I have been running new instances of my server with the following command (note that the V8 flags must come before the script name, otherwise node passes them to the script as arguments):
node --max-executable-size=96 --max-old-space-size=128 --max-semi-space-size=2 server.js
This effectively limits the total amount of space that node is allowed to take up on the local machine and forces garbage collection to run more frequently. Thus far we are seeing constant memory usage, and it confirms to me that my code was not leaking; rather, node was allocating more memory than was available.
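If you want to confirm that the limit actually took effect, one quick check (a sketch of mine, not part of the original setup) is to log V8's heap statistics at startup:
// heap_size_limit reflects the configured old-space ceiling plus some V8 overhead.
const v8 = require('v8');
const limitMb = v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
console.log('V8 heap limit:', limitMb.toFixed(0), 'MB');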
EDIT: These links outline in more specific terms the issue I was dealing with:
- nodejs decrease v8 garbage collector memory usage
- https://github.com/nodejs/node/issues/2738

V8 runs garbage collection when it thinks it's useful; there is no fixed interval. You can read this article to learn about V8 garbage collection: https://strongloop.com/strongblog/node-js-performance-garbage-collection/
Anyway, it's a bad idea to run the garbage collector manually in your project, because a forced collection completely blocks the node process. During the garbage collection, your program won't handle any requests.
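If you want to see how long that pause is for your own app, here is a quick measurement sketch (my own, assuming the process was started with --expose-gc):
// Measure the wall-clock pause of a forced full collection.
const start = process.hrtime.bigint();
global.gc();
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`Forced GC paused the process for ${elapsedMs.toFixed(1)} ms`);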

Related

How to force nodejs to do garbage collection? [duplicate]

At startup, it seems my node.js app uses around 200MB of memory. If I leave it alone for a while, it shrinks to around 9MB.
Is it possible from within the app to:
Check how much memory the app is using?
Request the garbage collector to run?
The reason I ask is, I load a number of files from disk, which are processed temporarily. This probably causes the memory usage to spike. But I don't want to load more files until the GC runs; otherwise there is the risk that I will run out of memory.
Any suggestions?
If you launch the node process with the --expose-gc flag, you can then call global.gc() to force node to run garbage collection. Keep in mind that all other execution within your node app is paused until GC completes, so don't use it too often or it will affect performance.
You might want to include a check when making GC calls from within your code so things don't go bad if node was run without the flag:
if (global.gc) {
  global.gc();
} else {
  console.log('Garbage collection unavailable. Run with: node --expose-gc index.js');
  process.exit(1);
}
If you cannot pass the --expose-gc flag to your node process at startup for any reason, you may try this:
import { setFlagsFromString } from 'v8';
import { runInNewContext } from 'vm';

// Enable the flag at runtime, then pull the gc() function out of a fresh context.
setFlagsFromString('--expose_gc');
const gc = runInNewContext('gc'); // nocommit
gc();
Notes:
This worked for me in node 16.x
You may want to check process.memoryUsage() before and after running the gc (see the sketch after these notes)
Use with care: Quote from the node docs v8.setFlagsFromString:
This method should be used with care. Changing settings after the VM has started may result in unpredictable behavior, including crashes and data loss; or it may simply do nothing.
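For example, a minimal before/after check (a sketch, using the gc reference obtained above):
const before = process.memoryUsage();
gc();
const after = process.memoryUsage();
console.log('heapUsed freed:',
  ((before.heapUsed - after.heapUsed) / 1024 / 1024).toFixed(1), 'MB');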
One thing I would suggest: unless you need those files right at startup, try to load them only when you need them.
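For example, a minimal lazy-loading sketch (the file name and the JSON parsing step are just placeholders):
const fs = require('fs');

let cached = null;
function getData() {
  // Read and parse the file on first use instead of at startup.
  if (cached === null) {
    cached = JSON.parse(fs.readFileSync('./big-data.json', 'utf8'));
  }
  return cached;
}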
EDIT: Refer to the post above.

Node.js never Garbage Collects Buffers once they fall out of scope

I have researched a lot before posting this. This is a collection of all the things I have discovered about garbage collecting and at the end I'm asking for a better solution than the one I found.
Summary
I am hosting a Node.js app on Heroku, and when a particular endpoint of my server is hit, one which uses a lot of buffers for image manipulation (using sharp, but this is a buffer issue, not a sharp one), it takes very few requests for the buffers to occupy all the external and RSS memory (I used process.memoryUsage() for diagnostics), because even though such variables have fallen out of scope, or been set to null, they are never garbage collected. The outcome is that external and RSS memory grow quickly, and after a few requests my 512 MB dyno quota is reached and my dyno crashes.
Now, I have made a minimal reproducible example, which shows that simply declaring a new buffer within a function, and calling that function 10 times, results in the buffers never being garbage collected, even when the functions have finished executing.
I'm writing to find a better way to make sure Node garbage collects the unreferenced buffers and to understand why it doesn't do so by default. The only solution I have found now is to call global.gc().
NOTE
In the minimal reproducible example I simply use a buffer, no external libraries, and it is enough to recreate the issue I am having with sharp, because it's an issue with Node.js buffers themselves.
Also note, what increases is the external memory and RSS. The ArrayBuffer memory, heapUsed, and heapTotal are not affected. I have not found a way yet to trigger the garbage collector when a certain threshold of external memory is used (a sketch of what such a trigger could look like follows these notes).
Finally, my heroku server has been running, with no incoming requests, for up to 8 hours now, and the garbage collector hasn't yet cleared out the external memory and the RSS, so it is not a matter of waiting. The same holds true for the minimal reproducible example: even with timers, the garbage collector doesn't do its job.
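For reference, a naive version of such a threshold trigger might look like the sketch below (my own hedged attempt, not a known solution; it assumes the process runs with --expose-gc, and the 200 MB threshold and 30-second interval are arbitrary):
// Poll external memory and force a full collection once it crosses a threshold.
const THRESHOLD_BYTES = 200 * 1024 * 1024;
setInterval(() => {
  if (global.gc && process.memoryUsage().external > THRESHOLD_BYTES) {
    global.gc();
  }
}, 30 * 1000);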
Minimal reproducible example - garbage collection is not triggered
This is the snippet of code that logs out the memory used after each function call, where the external memory and RSS memory keep building up without being freed:
async function getPrintFile() {
  let buffer = Buffer.alloc(1000000);
  return;
}

async function test() {
  for (let i = 0; i < 10; i++) {
    console.log(process.memoryUsage());
    await getPrintFile();
  }
}

test();
console.log(process.memoryUsage());
Below I will share the endless list of things I have tried in order to make sure those buffers get garbage collected, all without success. First, I'll share the only working solution, which is not optimal.
Minimal reproducible example - garbage collection is triggered through code
To make this work, I have to call global.gc() in two parts of the code. For some weird reason, which I hope someone can explain to me, if I call global.gc() only at the end of the function that creates the buffer, or only just after calling that function, it won't garbage collect. However, if I call global.gc() in both places, it will.
This is the only solution that has worked for me so far, but obviously it is not ideal, as global.gc() is blocking.
async function getPrintFile() {
  let buffer = Buffer.alloc(1000000);
  global.gc();
  return;
}

async function test() {
  for (let i = 0; i < 10; i++) {
    console.log(process.memoryUsage());
    await getPrintFile();
    global.gc();
  }
}

test();
console.log(process.memoryUsage());
What I have tried
- I tried setting the buffers to null; logically, if they are not referenced anymore, they should be garbage collected, but the garbage collector is apparently very lazy.
- I tried delete buffer, and looked for a way to resize or reallocate buffer memory, but apparently no such thing exists in Node.js.
- I tried buffer.fill(0), but that simply fills all the bytes with zeros; it doesn't resize the buffer.
- I tried installing a memory allocator like jemalloc on my heroku server, following this guide: Jemalloc Heroku Buildpack, but it was pointless.
- I tried running my script with node --max-old-space-size=4 index.js, but again, pointless; it didn't work even with the space size set to 4 MB.
- I thought maybe it was because the functions were asynchronous, or because I was using a loop. Nope: I wrote 5 different versions of that snippet, and every one of them had the same issue of the external memory growing like crazy.
Questions
By any remote chance, is there something super easy I'm missing, like a keyword, a function to use, that would easily sort this out? Or does anyone have anything that has worked for them so far? A library, a snippet, anything?
Why the hell do I have to call global.gc() TWO TIMES, from within and outside the function, for the garbage collector to work, and why is once not enough??
Why is it that the garbage collector for Buffers in Node.js is such dog s**t?
How is this not an issue? Literally, every single buffer ever declared on a running application NEVER gets garbage collected. There is no way to notice this on your laptop, because of its large memory, but as soon as you deploy the snippet online, by the time you realise it, it's probably too late. What's up with that?
I hope someone can give me a hand, as running global.gc() twice for every such request is not very efficient, and I'm not sure what repercussions it might have on my code.

How to use a lot of memory in node.js?

I am working on a system designed to keep processes behaving properly, and I need to test its memory-usage kill switch. I have yet to find a way to quickly and efficiently use up a lot of system memory with node.js. I have tried using arrays, but it seems the garbage collector compresses them. Basically, I need to deliberately cause a fast memory leak.
EDIT: I need to reach around 4 gigabytes of used system memory in one process.
If the garbage collector is compressing your array, maybe you are always pushing the same value into it.
You should try something like:
const a = [];
for (let i = 0; i < 9000000000; i++) {
  a.push(i);
}
You should also run your process with a flag to increase the memory limit. Note that with a bound of 9 billion iterations, your array should grow well past 4GB and your process will exit with a non-zero status code.
node --max_old_space_size=4096 yourscript.js
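If you need memory that the garbage collector definitely can't compact away, another option is to hold zero-filled Buffers, which live outside the V8 heap and are not bounded by --max-old-space-size (a sketch of mine, not from the original answer; the 100 MB chunk size and 4 GB target are arbitrary):
// Hold ~4 GB in 100 MB Buffer chunks. Keeping references in the array
// prevents collection, and Buffer.alloc zero-fills each chunk so the
// pages are actually committed and show up in RSS.
const chunks = [];
const CHUNK_BYTES = 100 * 1024 * 1024;
const TARGET_BYTES = 4 * 1024 * 1024 * 1024;
while (chunks.length * CHUNK_BYTES < TARGET_BYTES) {
  chunks.push(Buffer.alloc(CHUNK_BYTES));
  console.log('held:', (chunks.length * CHUNK_BYTES / 1024 ** 3).toFixed(2), 'GB');
}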

nodeJS when and how is the heap memory freed?

In order to understand the memory usage pattern of NodeJS's V8 engine, I wrote a simple web program as shown below:
contents of server.js:
var http = require("http");

var server = http.createServer(function (req, res) {
  res.write("Hello World");
  res.end();
});

server.listen(3000);
When the program is launched using node server.js, I took an initial memory snapshot.
After I kept making multiple URL hits to this server, I could see a pattern of increased heap usage. To be more precise, for every 6 or 7 hits there is an increase of 4K. I kept repeating the hits continuously for about 2 minutes, then took another snapshot.
I didn't see any eventual decrease in heap usage, even when I left the server idle with no load.
My question is:
Is this normal behavior, or is there a memory leak in nodeJS?
Or am I understanding or interpreting it incorrectly?
Node uses V8 under the hood, so the answer to this question most likely applies:
How does V8 manage its heap?
The code appears to be valid, so to test it you could write a small application that repeatedly calls your API and then examine Node's memory while it runs. Use this to help detect possible leakage (if there is any over 5 consecutive runs of the garbage collector): http://www.nearform.com/nodecrunch/self-detect-memory-leak-node/
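A throwaway load generator along those lines might look like this (a sketch of mine; it assumes the server from the question is listening on port 3000):
// Fire 1000 requests at the server, then prompt a memory check.
const http = require('http');
const total = 1000;
let completed = 0;
for (let i = 0; i < total; i++) {
  http.get('http://localhost:3000/', (res) => {
    res.resume(); // drain the response so the socket can be reused/freed
    if (++completed === total) {
      console.log('done; now inspect memory in the server process');
    }
  });
}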

Memory leak in a node.js crawler application

For over a month I've been struggling with a very annoying memory leak issue, and I have no clue how to solve it.
I'm writing a general-purpose web crawler based on: http, async, cheerio and nano. From the very beginning I've been struggling with a memory leak that was very difficult to isolate.
I know it's possible to do a heapdump and analyse it with Google Chrome, but I can't understand the output. It's usually a bunch of meaningless strings and objects leading to some anonymous functions that tell me exactly nothing (it might be a lack of experience on my side).
Eventually I came to the conclusion that the library I had been using at the time (jQuery) had issues, and I replaced it with Cheerio. I had the impression that Cheerio solved the problem, but now I'm sure it only made it less dramatic.
You can find my code at: https://github.com/lukaszkujawa/node-web-crawler. I understand it might be a lot of code to analyse, but perhaps I'm doing something stupid that would be obvious straight away. I'm suspecting the main agent class, which does HTTP requests from multiple "threads" (with async.queue): https://github.com/lukaszkujawa/node-web-crawler/blob/master/webcrawler/agent.js
If you would like to run the code, it requires CouchDB; after npm install, do:
$ node crawler.js -c conf.example.json
I know that Node doesn't go crazy with garbage collection, but after 10 minutes of heavy crawling the used memory can easily go over 1GB.
(tested with v0.10.21 and v0.10.22)
For what it's worth, Node's memory usage will grow and grow even if your actual used memory isn't very large. This is an optimization on the part of the V8 engine. To see your real memory usage (and to determine whether there is actually a memory leak), consider dropping this code (or something like it) into your application:
setInterval(function () {
  if (typeof gc === 'function') {
    gc();
  }
  // applog is this app's logger; console.log works just as well
  applog.debug('Memory Usage', process.memoryUsage());
}, 60000);
Run node --expose-gc yourApp.js. Every minute there will be a log line indicating real memory usage immediately after a forced garbage collection. I've found that watching the output of this over time is a good way to determine if there is a leak.
If you do find a leak, the best way I've found to debug it is to eliminate large sections of your code at a time. If the leak goes away, put the section back and eliminate a smaller part of it. Use this method to narrow it down to where the problem is occurring. Closures are a common source, but also check anywhere else that references may not be cleaned up. Many network applications attach handlers for sockets that aren't immediately destroyed.
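To illustrate that last point, here is a hypothetical sketch of the pattern (not taken from the crawler in question): a server that keeps a reference to every socket leaks unless it releases them on close.
const net = require('net');

// A common leak: holding a reference to every socket without ever releasing it.
const sockets = new Set();

const server = net.createServer((socket) => {
  sockets.add(socket);
  // Without this line, closed sockets (and everything their handlers capture)
  // remain reachable and can never be garbage collected.
  socket.on('close', () => sockets.delete(socket));
});

server.listen(4000);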
