NodeJS in-memory cache with memory pressure awareness - node.js

I'm coming from the Java world, where there are plenty of (local) in-memory cache implementations. Java also has SoftReference and WeakReference, which are, by definition, ideal for implementing caches.
I know that JavaScript does not have anything similar, so I'm wondering whether it is possible to build a cache that deletes/releases (all) cached objects when the process is running low on memory. So far I know of the lru-cache module, but its implementation holds objects up to some number/size. That is nice, but not good enough, because you'd naturally expect a cache to release objects when there is not enough memory.
Is it even possible to get an event from the system in NodeJS when the process is running low on memory?
Or is there maybe some library that could raise an event, something like:
var cmmm = require('cool_memory_management_module');
cmmm.on('low_memory', function(){
    //signaling to clear cache entries
});
So far I've found the memwatch and usage modules on npm, but I'm still not able to combine all those pieces.

JS had no WeakReference equivalent when this was asked. ES6 brought WeakMap and WeakSet (Version List), and a true WeakRef, together with FinalizationRegistry, only arrived in ES2021.
Independently of that, you can build something that checks every few seconds whether memory is running out and cleans up your map.
setInterval(function() {
    /* check if memory low and do something */
}, 2000).unref();
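The polling idea above could be fleshed out roughly as follows. This is only a sketch under assumptions: the class name MemoryPressureMonitor, the 'low_memory' event name and the 0.8 threshold are made up for illustration, and "low memory" is approximated as the used V8 heap divided by the heap size limit, which says nothing about memory held outside the V8 heap.

```javascript
const v8 = require('v8');
const { EventEmitter } = require('events');

// Hypothetical helper (not an existing module): emits 'low_memory' when the
// used V8 heap exceeds a fraction of the configured heap limit.
class MemoryPressureMonitor extends EventEmitter {
  constructor({ threshold = 0.8, intervalMs = 2000 } = {}) {
    super();
    this.threshold = threshold;
    this.timer = setInterval(() => this.check(), intervalMs);
    this.timer.unref(); // the timer alone must not keep the process alive
  }

  // Fraction of the V8 heap limit that is currently in use.
  usage() {
    const { heapUsed } = process.memoryUsage();
    const { heap_size_limit } = v8.getHeapStatistics();
    return heapUsed / heap_size_limit;
  }

  // Emits 'low_memory' if the usage ratio reached the threshold.
  check() {
    const ratio = this.usage();
    if (ratio >= this.threshold) {
      this.emit('low_memory', ratio);
    }
    return ratio;
  }

  stop() {
    clearInterval(this.timer);
  }
}

// Usage: drop every cached entry as soon as the heap gets tight.
const cache = new Map();
const monitor = new MemoryPressureMonitor({ threshold: 0.8 });
monitor.on('low_memory', () => cache.clear());
```

Clearing the whole map is the bluntest possible reaction. Evicting only part of the entries would work the same way, at the cost of tracking which entries are worth keeping.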

Related

Why does v8 report duplicate module strings in heap in my jest tests?

In the process of upgrading node (16.1.x => 16.5.0), I observed that I'm getting OOM issues from jest. In troubleshooting, I'm periodically taking heap snapshots. I'm regularly seeing entries in "string" for module source (same shallow/retained size). In this example screenshot, you can see that the exact same module (React) is listed 2x. Sometimes, the module string is listed even 4x for any given source module.
Upon expansion, it says "system / Map", which suggests to me (I think?) that there's some v8-wide reference to this module string? That makes sense, maybe. node has a require cache, jest has a module cache, and I'd assume v8 and node share module references? The strings and compiled code buckets do increase regularly, but I expect them to get GC'd. In fact, I can see that many do: expanding the items shows the refs belonging to GC Roots. But I suspect something is holding on to these module references, and I fear it's not at the user level, but at the tooling level. This is somewhat evidenced by the observation that only the node.js upgrade induces the OOM failure mode.
Why would my jest test have multiple instances of the same module? (I am using --runInBand, so I don't expect multiple workers.)
What tips would you offer to diagnose further?
I do show multiple VM Contexts, which I think makes sense. I suppose jest is running some test suites in some sort of isolation.
I do not have a reproduction. I am looking for discussion, best-known methods, diagnostic ideas.
I can offer some thoughts:
"system / Map" does not mean "some v8 wide reference". "Map" is the internal name for "hidden class", which you may have heard of. The details don't even matter here; TL;DR: some internal thing, totally normal, not a sign of a problem.
Having several copies of the same string on the heap is also quite normal, because strings don't get deduplicated by default. So if you run some string-producing operation twice (such as: reading an external file), you'll get two copies of the string. I have no idea what jest does under the hood, but it's totally conceivable that running tests in parallel in mostly-isolated environments has a side effect of creating duplicate strings. That may be inefficient in a sense, but as long as they get GC'ed after a while, it's not really a problem.
If the specific hypothesis implied above (there are several tests in each file, and jest creates an in-memory copy of the entire file for each executing test) holds, then a possible mitigation might be to split your test files into smaller chunks (1.8MB is quite a lot for a single file). I don't have much confidence in this, but maybe it'd be easy for you to try it and see.
More generally: in the screenshot, there are 36MB of memory used by strings. That's far from being an OOM reason.
It might be insightful to measure the memory consumption of both Node versions. If, for example, it used to consume 4GB and now crashes when it reaches 2GB, that would indicate that the limit has changed. If it used to consume 2GB and now crashes when it reaches 4GB, that would imply that something major has changed. If it used to consume 1.98GB and now crashes when it reaches 2.0GB, then chances are something tiny has changed and you just happened to get lucky with the old version.
Until contradicting evidence turns up, I would operate under the assumption that the resource consumption is normal and simply must be accommodated. You could try giving Node more memory, or reducing the number of parallel test executions.
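To compare the two Node versions as suggested, something like the following could be run under each of them (for example from a test setup file). It is a sketch: memoryReport and toMB are illustrative names, and the numbers only describe the moment at which the function is called.

```javascript
const v8 = require('v8');

function toMB(bytes) {
  return Math.round(bytes / 1024 / 1024);
}

// Illustrative helper: reports the heap limit next to the current usage,
// so a changed limit can be told apart from changed consumption.
function memoryReport() {
  const heap = v8.getHeapStatistics();
  const usage = process.memoryUsage();
  return {
    node: process.version,
    heapLimitMB: toMB(heap.heap_size_limit),
    heapUsedMB: toMB(usage.heapUsed),
    rssMB: toMB(usage.rss),
  };
}

console.log(memoryReport());
```

If the limit turns out to be the difference, starting Node with --max-old-space-size=<megabytes> is the usual way to give it more heap.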
This seems to be a known issue with Jest on Node.js v16.11.0+ and has already been reported on GitHub.

How can I automatically test for memory leaks in Node?

I have some code in a library that has in the past leaked badly, and I would like to add regression tests to avoid that in the future. I understand how to find memory leaks manually, by looking at memory usage profiles or Valgrind, but I have had trouble writing automatic tests for them.
I tried using global.gc() followed by process.memoryUsage() after running the operation I was checking for leaks, then doing this repeatedly to try to establish a linear relationship between number of operations and memory usage, but there seems to be noise in the memory usage numbers that makes this hard to measure accurately.
So, my question is this: is there an effective way to write a test in Node that consistently fails when an operation leaks memory, and passes when it does not leak memory?
One wrinkle that I should mention is that the memory leaks were occurring in a C++ addon, and some of the leaked memory was not managed by the Node VM, so I was measuring process.memoryUsage().rss.
Automating and logging information to test for memory leaks in node js.
There is a great module called memwatch-next.
npm install --save memwatch-next
Add to app.js:
const memwatch = require('memwatch-next');
// ...
memwatch.on('leak', (info) => {
    // Some logging code...
    console.error('Memory leak detected:\n', info);
});
This will allow you to automatically measure if there is a memory leak.
Now to put it to a test:
A good tool for this is Apache JMeter. More information here.
If you are using HTTP, you can use JMeter to soak test the application's endpoints.
Soak testing verifies a system's stability and performance characteristics over an extended period of time. It's good when you are looking for memory leaks, connection leaks, etc.
Continuous integration software:
If you use continuous integration software such as Jenkins, you can create a Jenkins job that does this for you prior to deployment to production. It will test the application with the parameters provided and then either deploy the application or report that there is a memory leak (depending on your Jenkins job configuration).
Hope it helps, update me on how it goes;
Good luck,
Given some arbitrary program, is it always possible to determine if it will ever terminate? The halting problem describes this. Consider the following program:
function collatz(n){
    if(n==1)
        return;
    if(n%2==0)
        return collatz(n/2);
    else
        return collatz(3*n+1);
}
Whether this function returns for every positive integer n is an open problem (the Collatz conjecture), which shows how hard termination questions get even for tiny programs. The same idea applies to data in memory: it is not always possible to identify which memory is no longer needed and can thus be garbage collected. There is also the case of a program that is designed to consume a lot of memory in some situations. The only known option is to come up with a heuristic like the one you describe, but it will most likely produce false positives and false negatives. It may be easier to determine the root cause of the leak so that it can be corrected.
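One way to write such a heuristic down is sketched below. The helper name measureGrowth, the warm-up and run counts, and any pass/fail threshold you compare the result against are assumptions to tune per project. Without --expose-gc the numbers are noisier, and for leaks in a C++ addon the metric would have to be 'rss' instead of 'heapUsed'.

```javascript
// Hypothetical helper: estimates how many bytes stay allocated per call.
function measureGrowth(operation, { warmup = 1000, runs = 10000, metric = 'heapUsed' } = {}) {
  // global.gc only exists when node was started with --expose-gc.
  const collect = typeof global.gc === 'function' ? global.gc : () => {};

  // Warm up first so one-time allocations (JIT, caches) are not counted.
  for (let i = 0; i < warmup; i++) operation();
  collect();
  const before = process.memoryUsage()[metric];

  for (let i = 0; i < runs; i++) operation();
  collect();
  const after = process.memoryUsage()[metric];

  return (after - before) / runs;
}

// Example: an operation that deliberately leaks by retaining every array.
const retained = [];
const leakyPerCall = measureGrowth(
  () => retained.push(new Array(1000).fill(0)),
  { warmup: 10, runs: 2000 }
);
console.log(`leaky operation: ~${Math.round(leakyPerCall)} bytes per call`);
```

A regression test would then assert that the per-call growth of the real operation stays below a threshold chosen with some headroom, accepting that the occasional false positive or negative is still possible.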

How to Synchronize object between multiple instance of Node Js application

Is there any way to lock an object in a Node JS application?
When multiple instances of the application are running, some functions shouldn't run concurrently. When the function in instance A has completed, it should unlock the object/key or some other identifier, and instance B should check whether it is unlocked before running the function.
Any object or key can be used to identify the lock.
How can I do that in a NodeJS application which has multiple instances?
As mentioned above Redis may be your answer, however, it really depends on the resources available to you. There are some other possibilities less complicated and certainly less powerful which may also do the trick.
node-cache may also do the trick, if you set it up correctly. It is not any where near as powerful as Redis, but on the bright side it does not require as much setup and interaction with your environment.
So there is Redis and node-cache for memory locks. I should mention there are quite a few NPM packages which do the cache. Depends on what you need, and how intricate your cache needs to be.
However, there are less elegant ways to do what you want, though less elegant is not necessarily worse.
You could use a JSON file based system and hold locks on the files for a TTL. lockfile or proper-lockfile will accomplish the task. You can read the information from the files when needed, delete when required, give them a TTL. Basically a cache system to disk.
The memory system is obviously faster. The file system requires just as much planning in your code as the memory system.
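The core of the file-based approach can be sketched with Node's built-in fs module alone. This is not the API of lockfile or proper-lockfile. It only illustrates the exclusive-create idea such a lock can rest on, with hypothetical helper names (tryLock, unlock) and without the TTL or stale-lock handling a real setup would need.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: returns true if this process obtained the lock.
function tryLock(lockPath) {
  try {
    // The 'wx' flag makes the open fail with EEXIST if the file already
    // exists, so only one process can succeed in creating it.
    const fd = fs.openSync(lockPath, 'wx');
    fs.writeSync(fd, String(process.pid)); // record the owner for debugging
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false; // someone else holds the lock
    throw err;
  }
}

// Hypothetical helper: releases the lock by deleting the file.
function unlock(lockPath) {
  fs.unlinkSync(lockPath);
}

// Demo path only; real instances would all agree on one fixed, shared path.
const lockPath = path.join(os.tmpdir(), `demo-job-${process.pid}.lock`);

if (tryLock(lockPath)) {
  try {
    console.log('lock acquired, running the critical section');
  } finally {
    unlock(lockPath); // always release, even if the work throws
  }
} else {
  console.log('another instance holds the lock, skipping');
}
```

If a process crashes between tryLock and unlock, the file stays behind, which is exactly the case the TTL mentioned above is meant to cover.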
There is yet another way. This is possibly the most dangerous one, and you would have to think long and hard on the consequences in terms of security and need.
Node.js has its own process.env. As most know, this holds the environment variables, which you read by simply writing process.env.foo, where foo has been declared as an environment variable. A package such as dotenv allows you to add to these variables by way of a .env text file. Thus, if you put sam=mongoDB in that file, process.env.sam in your code will evaluate to mongoDB. Plenty of variables can be set up this way.
So what good does that do, you may ask? These variables can be changed in mid-flight, so if you need to lock a variable and then change it, this is a simple way to do it. Beware of the gotchas, though. A change to process.env at runtime is visible only inside the process that made it (and in child processes it spawns afterwards), so separate instances do not see each other's changes. And once the system goes down, or all processes stop and are started again, your environment variables return to the defaults in the .env file.
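Before relying on environment variables as a lock, it is worth checking how far a runtime change actually travels. The sketch below uses a made-up variable name (DEMO_LOCK) and spawns a second Node process that changes it. The child works on its own copy of the environment, so the parent does not observe the change.

```javascript
const { execFileSync } = require('child_process');

// DEMO_LOCK is a hypothetical variable used only for this demonstration.
process.env.DEMO_LOCK = 'free';

// The child inherits a copy of the environment when it is spawned
// and then modifies only that copy.
const childOutput = execFileSync(
  process.execPath,
  ['-e', "process.env.DEMO_LOCK = 'taken'; console.log(process.env.DEMO_LOCK)"],
  { encoding: 'utf8' }
).trim();

console.log('child saw:', childOutput);                   // child saw: taken
console.log('parent still sees:', process.env.DEMO_LOCK); // parent still sees: free
```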
Additionally, unless you are running on a reasonably secure system such as AWS or Azure, I would not feel secure having my .env file open to the world. There is a way around this one too: encrypt the values and put the ciphertext in the file, then decrypt a value just before you actually use it.
There are probably many more ways to lock and unlock, not the least of which is to use the native Node.js building blocks: combine File System events with Crypto. But this demands a much deeper understanding of the actual Node.js library and its structures.
Hope some of this helped.
I strongly recommend Redis in your case.
There are several ways to create an object shared between applications/processes; using locks is one of them, as you mentioned.
But they are all complicated. Unless you really need to do that yourself, Redis will be good enough: it offers atomic operations across multiple processes, transactions and so on.
Old thread but I didn't want to use redis so I made my own open source solution which utilizes websocket connections:
https://github.com/OneAndonlyFinbar/sync-cache

NodeJS Memory Leak when using VM to execute untrusted code

I am using the NodeJS VM Module to run untrusted code safely. I have noticed a huge memory leak that takes about 10M of memory on each execution and does not release it. Eventually, my node process ends up using 500M+ of memory. After some digging, I traced the problem to the constant creation of VMs. To test my theory, I commented out the code that creates the VMs. Sure enough, the memory usage dropped dramatically. I then uncommented the code again, placed global.gc() calls strategically around the problem areas and ran node with the --expose-gc flag. This reduced my memory usage dramatically and retained the functionality.
Is there a better way of cleaning up VMs after I am done using them?
My next approach is to cache the VM containing the given unsafe code and reuse it if I see the unsafe code again. (Background: I am letting users write their own parsing functions for blocks of text, so the unsafe code may be executed frequently, or executed once and never seen again.)
Some reference code.
async.each(items,function(i,cb){
    // Initialize context...
    var context = vm.createContext(init);
    // Execute untrusted code
    var captured = vm.runInContext(parse, context);
    // This dramatically improves the usage, but isn't
    // part of the standard API
    // global.gc();
    // Return Result via a callback
    cb(null,captured);
});
If I see this right, this was fixed in v5.9.0, see this PR. It appears that in those cases neither Node core maintainers nor programmers can do much, and we pretty much have to wait for an upstream fix in V8.
So no, you can't do anything more about it. Catching this bug was good though!
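On versions where creating many contexts is still expensive, the caching approach the question describes could look roughly like this. It is a sketch under assumptions: sandboxCache, getSandbox and runParser are illustrative names, the user code is assumed to read a global called input, and the cache is unbounded, so a real version would need some eviction. Reusing a context also means state left behind by one run is visible to the next run of the same source.

```javascript
const vm = require('vm');

// source text -> { script, context }; both are created once per distinct source.
const sandboxCache = new Map();

function getSandbox(source) {
  let entry = sandboxCache.get(source);
  if (!entry) {
    entry = {
      script: new vm.Script(source),                  // compiled once
      context: vm.createContext({ input: undefined }), // reused on every run
    };
    sandboxCache.set(source, entry);
  }
  return entry;
}

// Runs the user-supplied source against one block of text.
function runParser(source, input) {
  const { script, context } = getSandbox(source);
  context.input = input; // expose the current text to the user code
  return script.runInContext(context, { timeout: 1000 });
}

console.log(runParser('input.toUpperCase()', 'abc')); // ABC
console.log(runParser('input.toUpperCase()', 'def')); // DEF
console.log(sandboxCache.size);                       // 1
```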

Memory leak in a node.js crawler application

For over a month I've been struggling with a very annoying memory leak issue and I have no clue how to solve it.
I'm writing a general purpose web crawler based on: http, async, cheerio and nano. From the very beginning I've been struggling with memory leak which was very difficult to isolate.
I know it's possible to do a heapdump and analyse it with Google Chrome but I can't understand the output. It's usually a bunch of meaningless strings and objects leading to some anonymous functions telling me exactly nothing (it might be lack of experience on my side).
Eventually I came to a conclusion that the library I had been using at the time (jQuery) had issues and I replaced it with Cheerio. I had an impression that Cheerio solved the problem but now I'm sure it only made it less dramatic.
You can find my code at: https://github.com/lukaszkujawa/node-web-crawler. I understand it might be a lot of code to analyse, but perhaps I'm doing something stupid which is obvious straight away. I suspect the main agent class, which does HTTP requests https://github.com/lukaszkujawa/node-web-crawler/blob/master/webcrawler/agent.js from multiple "threads" (with async.queue).
If you would like to run the code it requires CouchDB and after npm install do:
$ node crawler.js -c conf.example.json
I know that Node doesn't go crazy with garbage collection but after 10min of heavy crawling used memory can go easily over 1GB.
(tested with v0.10.21 and v0.10.22)
For what it's worth, Node's memory usage will grow and grow even if your actual used memory isn't very large. This is for optimization on behalf of the V8 engine. To see your real memory usage (to determine if there is actually a memory leak) consider dropping this code (or something like it) into your application:
setInterval(function () {
    if (typeof gc === 'function') {
        gc();
    }
    applog.debug('Memory Usage', process.memoryUsage());
}, 60000);
Run node --expose-gc yourApp.js. Every minute there will be a log line indicating real memory usage immediately after a forced garbage collection. I've found that watching the output of this over time is a good way to determine if there is a leak.
If you do find a leak, the best way I've found to debug it is to eliminate large sections of your code at a time. If the leak goes away, put it back and eliminate a smaller section of it. Use this method to narrow it down to where the problem is occurring. Closures are a common source, but also check for anywhere else references may not be cleaned up. Many network applications will attach handlers for sockets that aren't immediately destroyed.
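The handler pattern mentioned above can be illustrated with a plain EventEmitter standing in for a long-lived socket or agent. The function names are made up for the example. The point is only that a listener which is never removed keeps its closure, and everything the closure references, reachable.

```javascript
const { EventEmitter } = require('events');

// Stands in for a long-lived object such as a socket or a shared agent.
const source = new EventEmitter();

// Leaky: each call adds a listener that is never removed, so every
// payload stays reachable through its closure.
function leakyRequest(payload) {
  source.on('data', () => payload.length);
}

// Tidy: the caller gets a function that detaches the listener again.
function tidyRequest(payload) {
  const onData = () => payload.length;
  source.on('data', onData);
  return () => source.off('data', onData);
}

for (let i = 0; i < 5; i++) leakyRequest(Buffer.alloc(1024));
console.log(source.listenerCount('data')); // 5

const done = tidyRequest(Buffer.alloc(1024));
done();
console.log(source.listenerCount('data')); // 5 (the tidy listener is gone)
```

Checking listenerCount on suspect emitters while the application runs is a cheap way to spot this kind of accumulation before reaching for heap snapshots.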

Resources