Can you require infinite modules in node?

I'm working on a project which needs to load a specific module based on the URL, and there are a lot of modules. I'm wondering: is it going to blow up the memory if you keep requiring modules? If so, is there any way to free up memory after each module is executed? Or is there any way better than requiring modules, like fs.readFile and then eval'ing it?

You can load as many modules as you want, but the hardware limitations will still be there. Whether your memory blows up will depend on the memory consumption of each module, the tasks your program is doing, and the amount of RAM you have. If you keep requiring modules just to see whether it breaks, I am sure it will break after a certain number of modules have been loaded into memory.
When you use memory-intensive modules, you can scope them, i.e. require() them inside a function, so that the objects they create become eligible for garbage collection once the operation is complete:
// this code does image manipulation
function resizeImage(filename, cb) {
  require('gm')(filename).size(function (err, size) {
    console.log(size);
    // do something else with the image
    cb(null);
    // the objects gm allocated are queued for garbage
    // collection after this operation
  });
}
fs.readFile and eval will not be better than require().
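To make the comparison concrete, here is a minimal sketch of the readFile-plus-eval approach; the helper name and file path are made up for illustration. Unlike require(), it re-reads and re-evaluates the file on every call, and the eval'd code runs with access to the enclosing scope:
const fs = require('fs');
const path = require('path');

// Hypothetical helper: evaluates a module file on every call.
// There is no caching, so the file is read and parsed each time,
// and the eval'd code can see (and clobber) local variables.
function loadWithEval(name) {
  const source = fs.readFileSync(path.join(__dirname, name + '.js'), 'utf8');
  return eval(source); // value of the file's last expression
}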
If you have all memory optimizations in place and your RAM is still not enough, the application will break, and you will need to scale up your machine.

Related

Node.js never Garbage Collects Buffers once they fall out of scope

I have researched a lot before posting this. This is a collection of everything I have discovered about garbage collection, and at the end I'm asking for a better solution than the one I found.
Summary
I am hosting a Node.js app on Heroku, and when a particular endpoint of my server is hit, one which uses a lot of buffers for image manipulation (using sharp, but this is a buffer issue, not a sharp one), it takes only a few requests for the buffers to occupy all the external and RSS memory (I used process.memoryUsage() for diagnostics). Even though those variables have fallen out of scope, or been set to null, they are never garbage collected. The outcome is that external and RSS memory grow very quickly, and after a few requests my 512 MB dyno quota is reached and my dyno crashes.
Now, I have made a minimal reproducible example, which shows that simply declaring a new buffer within a function, and calling that function 10 times, results in the buffers never being garbage collected, even after the functions have finished executing.
I'm writing to find a better way to make sure Node garbage collects the unreferenced buffers, and to understand why it doesn't do so by default. The only solution I have found so far is to call global.gc().
NOTE
In the minimal reproducible example I simply use a buffer, with no external libraries, and that is enough to recreate the issue I am having with sharp, because it is an issue with Node.js buffers themselves.
Also note, what increases is the external memory and the RSS. The ArrayBuffer memory, heapUsed, and heapTotal are not affected. I have not yet found a way to trigger the garbage collector when a certain threshold of external memory is used.
Finally, my Heroku server has now been running with no incoming requests for up to 8 hours, and the garbage collector still hasn't cleared out the external memory and the RSS, so it is not a matter of waiting. The same holds true for the minimal reproducible example: even with timers, the garbage collector doesn't do its job.
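For what it's worth, here is a rough sketch of one way to approximate such a threshold trigger, assuming the process is started with --expose-gc so that global.gc exists; the 200 MB limit and 5-second interval are arbitrary illustrations:
// Run with: node --expose-gc app.js
const EXTERNAL_LIMIT = 200 * 1024 * 1024; // hypothetical 200 MB threshold

setInterval(() => {
  const { external } = process.memoryUsage();
  if (typeof global.gc === 'function' && external > EXTERNAL_LIMIT) {
    global.gc(); // forces a full, blocking collection
  }
}, 5000).unref(); // unref() so the timer doesn't keep the process alive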
Minimal reproducible example - garbage collection is not triggered
This is the snippet of code that logs the memory used after each function call; the external and RSS memory keep building up without being freed:
async function getPrintFile() {
  let buffer = Buffer.alloc(1000000);
  return;
}

async function test() {
  for (let i = 0; i < 10; i++) {
    console.log(process.memoryUsage());
    await getPrintFile();
  }
}

test();
console.log(process.memoryUsage());
Below I will share the endless list of things I tried in order to get those buffers garbage collected, without success. First, though, the only working solution I found, which is not optimal.
Minimal reproducible example - garbage collection is triggered through code
To make this work, I have to call global.gc() in two parts of the code. For some weird reason, which I hope someone can explain to me, if I call global.gc() only at the end of the function that creates the buffer, or only just after calling that function, it won't garbage collect. However, if I call global.gc() in both places, it will.
This is the only solution that has worked for me so far, but obviously it is not ideal, as global.gc() is blocking.
async function getPrintFile() {
  let buffer = Buffer.alloc(1000000);
  global.gc();
  return;
}

async function test() {
  for (let i = 0; i < 10; i++) {
    console.log(process.memoryUsage());
    await getPrintFile();
    global.gc();
  }
}

test();
console.log(process.memoryUsage());
What I have tried
I tried setting the buffers to null; logically, if they are no longer referenced they should be garbage collected, but the garbage collector is apparently very lazy.
I tried delete buffer, and looked for a way to resize or reallocate buffer memory, but apparently nothing like that exists in Node.js.
I tried buffer.fill(0), but that simply fills the buffer with zeros; it doesn't resize or release it.
I tried installing a memory allocator like jemalloc on my Heroku server, following this guide: Jemalloc Heroku Buildpack, but it was pointless.
I tried running my script with node --max-old-space-size=4 index.js, but again it was pointless; it didn't work even with the old space capped at 4 MB (external memory lives outside the V8 heap, so this flag doesn't constrain it).
I thought maybe it was because the functions were asynchronous, or because I was using a loop. Nope: I wrote 5 different versions of that snippet, and every one of them had the same issue of the external memory growing like crazy.
Questions
By any remote chance, is there something super easy I'm missing, like a keyword or a function that would easily sort this out? Or does anyone have anything that has worked for them so far? A library, a snippet, anything?
Why the hell do I have to call global.gc() TWICE, from within and outside the function, for the garbage collector to work, and why is once not enough?
Why is it that the garbage collector for Buffers in Node.js is such dog s**t?
How is this not an issue? Literally every buffer ever declared in a running application NEVER gets garbage collected. There is no way to notice it on your laptop, with its plentiful memory, but as soon as you deploy the snippet somewhere, by the time you realise, it's probably too late. What's up with that?
I hope someone can give me a hand, as running global.gc() twice for each such request is not very efficient, and I'm not sure what repercussions it might have on my code.
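If forced collection turns out to be unavoidable, one hedged workaround is to defer it until after the response has been sent, so the blocking pause doesn't sit inside the request latency. A minimal sketch, assuming Express and --expose-gc; the route and the allocation are stand-ins for the real image work:
// Run with: node --expose-gc server.js (assumes express is installed)
const express = require('express');
const app = express();

app.get('/print-file', function (req, res) {
  const buffer = Buffer.alloc(1000000); // stand-in for the real image work
  res.send('allocated ' + buffer.length + ' bytes');
  // Defer the blocking collection until after the response is flushed.
  setImmediate(function () {
    if (typeof global.gc === 'function') global.gc();
  });
});

app.listen(3000);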

Best place to require modules which are only used in closures

If, as part of a NodeJS file, there are different closures:
const Library2 = require('Library2'); // should it be here?

doSomething().then(() => {
  const Library1 = require('Library1'); // or here?
  return Library1.doSomething();
}).then(() => {
  return Library2.doSomething();
}).then(...); // etc.
Would it be better to require Library1 and Library2 in the scopes in which they are used? Or at the top of the file like most do?
Does it make a difference to how much memory is consumed either way?
It is best to load all modules needed at server startup time.
When a module is loaded for the first time, it is loaded with blocking, synchronous I/O. It is bad to ever use blocking, synchronous I/O at run-time in your server, because it interferes with the server's ability to handle multiple requests at once and reduces scalability.
Modules loaded with require() are cached, so fortunately trying to require() a module in the middle of a request handler really only hurts performance the very first time the request is run.
But, it's still best to load any modules in your startup code and NOT during the run-time request-handling of your server.
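A small demonstration of that caching behaviour; ./some-local-module is a hypothetical file, and the timings merely illustrate that the synchronous I/O cost is paid only once:
// Only the first call performs synchronous file I/O; later calls
// return the cached export from require.cache.
console.time('first require');
const lib = require('./some-local-module'); // hypothetical local file
console.timeEnd('first require');

console.time('second require');
const libAgain = require('./some-local-module'); // served from the cache
console.timeEnd('second require');

console.log(lib === libAgain); // true: the same cached module object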

NodeJS Memory Leak when using VM to execute untrusted code

I am using the NodeJS VM module to run untrusted code safely. I have noticed a huge memory leak that takes about 10 MB of memory on each execution and does not release it. Eventually, my node process ends up using 500 MB+ of memory. After some digging, I traced the problem to the constant creation of VMs. To test my theory, I commented out the code that creates the VMs. Sure enough, the memory usage dropped dramatically. I then uncommented the code again, placed global.gc() calls strategically around the problem areas, and ran node with the --expose-gc flag. This reduced my memory usage dramatically while retaining the functionality.
Is there a better way of cleaning up VMs after I am done using them?
My next approach is to cache the VM containing the given unsafe code and reuse it if I see the unsafe code again (background: I am letting users write their own parsing functions for blocks of text, so a piece of unsafe code may be executed frequently, or executed once and never seen again); a sketch of this caching idea follows the reference code below.
Some reference code.
async.each(items, function (i, cb) {
  // Initialize context...
  var context = vm.createContext(init);
  // Execute untrusted code
  var captured = vm.runInContext(parse, context);
  // This dramatically improves the usage, but isn't
  // part of the standard API
  // global.gc();
  // Return result via a callback
  cb(null, captured);
});
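A minimal sketch of the caching idea mentioned above: compile each distinct piece of unsafe code once with vm.Script and rerun the cached script on each execution. runParser and scriptCache are hypothetical names and the timeout is arbitrary; note that on the affected Node versions the leak was in context creation itself, so this mainly saves compilation work rather than memory:
var vm = require('vm');
var scriptCache = {};

// Hypothetical helper: compile once per distinct source string,
// then run the cached script in a fresh context per execution.
function runParser(source, init) {
  var script = scriptCache[source];
  if (!script) {
    script = new vm.Script(source);
    scriptCache[source] = script;
  }
  var context = vm.createContext(init);
  return script.runInContext(context, { timeout: 1000 }); // cap runaway code
}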
If I see this right, this was fixed in v5.9.0; see this PR. It appears that in such cases neither the node core maintainers nor application programmers can do much; we pretty much have to wait for an upstream fix in V8.
So no, you can't do anything more about it. Catching this bug was good, though!

NodeJS in-memory cache with memory pressure awareness

I'm coming from the Java world, where there are plenty of implementations of (local) in-memory caches. Moreover, Java has SoftReference and WeakReference, which are by definition ideal for cache implementations.
I know that JavaScript does not have anything similar, so I'm wondering whether it is possible to have some sort of cache that will delete/release (all) cached objects under low memory pressure. So far I know of the lru-cache module, but its implementation holds objects up to some number/size, which is nice but not good enough: naturally, you'd expect a cache to release objects when there is not enough memory.
Is it even possible to get some event in NodeJS from the system when the process is running low on memory?
Or maybe some library that could raise an event, something like:
var cmmm = require('cool_memory_management_module');
cmmm.on('low_memory', function () {
  // signal to clear cache entries
});
So far I've found the npm modules memwatch and usage, but I am still not able to combine all those pieces together.
There is nothing like WeakReference in JS yet; the closest equivalents, WeakMap and WeakSet, are coming in ES6, but they don't let you react to garbage collection.
So for now, you could build something that just checks every few seconds whether memory is running low and cleans up your map:
setInterval(function () {
  /* check if memory is low and do something */
}, 2000).unref();
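A slightly fuller sketch of that idea, using a plain Map as the cache and an arbitrary RSS budget; the names and numbers are illustrative only:
var cache = new Map();
var RSS_LIMIT = 400 * 1024 * 1024; // hypothetical 400 MB budget

setInterval(function () {
  if (process.memoryUsage().rss > RSS_LIMIT) {
    cache.clear(); // crude "low memory" reaction: drop everything
  }
}, 2000).unref(); // unref() keeps the timer from holding the process open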

Memory leak in a node.js crawler application

For over a month I've been struggling with a very annoying memory leak issue and I have no clue how to solve it.
I'm writing a general-purpose web crawler based on: http, async, cheerio and nano. From the very beginning I've been fighting a memory leak that was very difficult to isolate.
I know it's possible to take a heapdump and analyse it with Google Chrome, but I can't understand the output. It's usually a bunch of meaningless strings and objects leading to some anonymous functions that tell me exactly nothing (it might be lack of experience on my side).
Eventually I came to the conclusion that the library I had been using at the time (jQuery) had issues, and I replaced it with Cheerio. I had the impression that Cheerio solved the problem, but now I'm sure it only made it less dramatic.
You can find my code at: https://github.com/lukaszkujawa/node-web-crawler. I understand it might be a lot of code to analyse, but perhaps I'm doing something stupid which is obvious straight away. I suspect the main agent class, which does HTTP requests from multiple "threads" (with async.queue): https://github.com/lukaszkujawa/node-web-crawler/blob/master/webcrawler/agent.js
If you would like to run the code, it requires CouchDB; after npm install, do:
$ node crawler.js -c conf.example.json
I know that Node doesn't go crazy with garbage collection, but after 10 minutes of heavy crawling the used memory can easily go over 1 GB (tested with v0.10.21 and v0.10.22).
For what it's worth, Node's memory usage will grow and grow even if your actual used memory isn't very large. This is an optimization on the part of the V8 engine. To see your real memory usage (and determine whether there actually is a memory leak), consider dropping this code (or something like it) into your application:
setInterval(function () {
  if (typeof gc === 'function') {
    gc();
  }
  applog.debug('Memory Usage', process.memoryUsage());
}, 60000);
Run node --expose-gc yourApp.js. Every minute there will be a log line indicating real memory usage, immediately after a forced garbage collection. I've found that watching this output over time is a good way to determine whether there is a leak.
If you do find a leak, the best way I've found to debug it is to eliminate large sections of your code at a time. If the leak goes away, put the section back and eliminate a smaller part of it. Use this method to narrow down where the problem is occurring. Closures are a common source, but also check anywhere else that references may not be cleaned up. Many network applications attach handlers to sockets that aren't immediately destroyed.
