V8 memory spike (RSS) when defining more than 1000 functions (does not reproduce when using --jitless) - node.js

I have a simple Node app with one function that defines 1000+ functions inside it (without running them).
When I call this wrapper function around 200 times, the RSS memory of the process spikes from 100MB to 1000MB and immediately goes back down. The spike happens only once, after roughly 200 calls; the calls before it and the calls after it do not cause a spike.
This issue is happening to us in our node server in production, and I was able to reproduce it in a simple node app here:
https://github.com/gileck/node-v8-memory-issue
When I use --jitless or --no-opt the issue does not happen (no spikes), but obviously we do not want to remove all the V8 optimizations in production.
This issue must be caused by some specific V8 optimization. I tried a few other V8 flags, but none of them fix the issue (only --jitless and --no-opt do).
Anyone knows which v8 optimization could cause this?
Update:
We found that --no-concurrent-recompilation fixes this issue (no memory spikes at all),
but we still can't explain it.
We are not sure why it happens and which code changes might fix it (without the flag).
As one of the answers suggests, moving all the 1000+ function definitions out of the main function will solve it, but then those functions will not be able to access the context of the main function which is why they are defined inside it.
Imagine that you have a server and you want to handle a request.
Obviously, the request handler is going to run many times, as the server gets a lot of requests from clients.
Would you define functions inside the request handler (so you can access the request context in those functions) or define them outside of the request handler and pass the request context as a parameter to all of them? We chose the first option... what do you think?

anyone knows which v8 optimization could cause this?
Load Elimination.
I guess it's fair to say that any optimization could cause lots of memory consumption in pathological cases (such as: a nearly 14 MB monster of a function as input, wow!), but Load Elimination is what causes it in this particular case.
You can see for yourself when you run with --turbo-stats (and optionally --turbo-filter=foo to zoom in on just that function).
You can disable Load Elimination if you feel that you must. A preferable approach would probably be to reorganize your code somewhat: defining 2,000 functions is totally fine, but the function defining all these other functions probably doesn't need to be run in a loop long enough until it gets optimized? You'll avoid not only this particular issue, but get better efficiency in general, if you define functions only once each.
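As a rough illustration of "define functions only once each", here is a minimal sketch in which the helpers live at module scope and receive the request context as a parameter. The names getUserId, buildGreeting, handleRequest and the x-user-id header are hypothetical and do not come from the linked repository.

```javascript
// Hypothetical helpers, defined once at module load rather than on every call.
// Each one receives the request context explicitly instead of closing over it.
function getUserId(ctx) {
  return ctx.request.headers['x-user-id'];
}

function buildGreeting(ctx) {
  return `Hello, ${getUserId(ctx)} (${ctx.request.url})`;
}

// The hot function now only builds a small context object per call;
// it no longer re-creates the helper closures each time it runs.
function handleRequest(request) {
  const ctx = { request };
  return buildGreeting(ctx);
}

// prints "Hello, alice (/home)"
console.log(handleRequest({ url: '/home', headers: { 'x-user-id': 'alice' } }));
```

The trade-off is the one raised in the question: every helper has to take the context as an argument, but the frequently called function stays small.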
There may or may not be room for improving Load Elimination in Turbofan to be more efficient for huge inputs; that's a longer investigation and I'm not sure it's worth it (compared to working on other things that likely show up more frequently in practice).
I do want to emphasize for any future readers of this that disabling optimization(s) is not generally a good rule of thumb for improving performance (or anything else), on the contrary; nor are any other "secret" flags needed to unlock "secret" performance: the default configuration is very carefully optimized to give you what's (usually) best. It's a very rare special case that a particular optimization pass interacts badly with a particular code pattern in an input function.

Related

Why does v8 report duplicate module strings in heap in my jest tests?

In the process of upgrading node (16.1.x => 16.5.0), I observed that I'm getting OOM issues from jest. In troubleshooting, I'm periodically taking heap snapshots. I'm regularly seeing entries in "string" for module source (same shallow/retained size). In this example screenshot, you can see that the exact same module (React) is listed 2x. Sometimes, the module string is listed even 4x for any given source module.
Upon expansion, it says "system / Map", which suggests to me (I think) that there's some V8-wide reference to this module string. That makes sense, maybe: node has a require cache, jest has a module cache, and I'd assume V8 and node share module references. The strings and compiled code buckets do increase regularly, but I expect them to get GC'd. In fact, I can see that many do: expanding the items shows the refs belonging to GC Roots. But I suspect something is holding on to these module references, and I fear it's not at the user level, but at the tooling level. This is somewhat evidenced by the observation that only the node.js upgrade induces the OOM failure mode.
Why would my jest test have multiple instances of the same module? (I am using --runInBand, so I don't expect multiple workers.)
What tips would you offer to diagnose further?
I do show multiple VM Contexts, which I think makes sense--I suppose jest is running some test suites in some sort of isolation.
I do not have a reproduction; I am looking for discussion, best-known methods, and diagnostic ideas.
I can offer some thoughts:
"system / Map" does not mean "some v8 wide reference". "Map" is the internal name for "hidden class", which you may have heard of. The details don't even matter here; TL;DR: some internal thing, totally normal, not a sign of a problem.
Having several copies of the same string on the heap is also quite normal, because strings don't get deduplicated by default. So if you run some string-producing operation twice (such as: reading an external file), you'll get two copies of the string. I have no idea what jest does under the hood, but it's totally conceivable that running tests in parallel in mostly-isolated environments has a side effect of creating duplicate strings. That may be inefficient in a sense, but as long as they get GC'ed after a while, it's not really a problem.
If the specific hypothesis implied above (there are several tests in each file, and jest creates an in-memory copy of the entire file for each executing test) holds, then a possible mitigation might be to split your test files into smaller chunks (1.8MB is quite a lot for a single file). I don't have much confidence in this, but maybe it'd be easy for you to try it and see.
More generally: in the screenshot, there are 36MB of memory used by strings. That's far from being an OOM reason.
It might be insightful to measure the memory consumption of both Node versions. If, for example, it used to consume 4GB and now crashes when it reaches 2GB, that would indicate that the limit has changed. If it used to consume 2GB and now crashes when it reaches 4GB, that would imply that something major has changed. If it used to consume 1.98GB and now crashes when it reaches 2.0GB, then chances are something tiny has changed and you just happened to get lucky with the old version.
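One way to make that comparison concrete is to print the configured heap limit next to the current usage under each Node version. The sketch below is only a starting point; memoryReport and toMB are hypothetical helper names, while process.memoryUsage() and v8.getHeapStatistics() are standard Node APIs.

```javascript
const v8 = require('v8');

// Convert bytes to megabytes for readable output.
const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

// Collects current usage and the V8 heap limit for this process.
function memoryReport() {
  const usage = process.memoryUsage();
  const heap = v8.getHeapStatistics();
  return {
    rssMB: toMB(usage.rss),
    heapUsedMB: toMB(usage.heapUsed),
    heapLimitMB: toMB(heap.heap_size_limit),
  };
}

console.log(process.version, memoryReport());
```

Running the same script (or logging the same report from a test setup file) under both versions shows whether the limit or the consumption is what differs.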
Until contradicting evidence turns up, I would operate under the assumption that the resource consumption is normal and simply must be accommodated. You could try giving Node more memory, or reducing the number of parallel test executions.
This seems to be a known issue with Jest on Node.js v16.11.0+ and has already been reported on GitHub.

Node.js optimizing module for best performance

I'm writing a crawler module which calls itself recursively to download more and more links, depending on a depth option passed to it.
Besides that, I'm doing more tasks on the returned resources I've downloaded (enriching/changing them depending on the configuration passed to the crawler). This process goes on recursively until it's done, which might take a lot of time (or not) depending on the configuration used.
I wish to optimize it to be as fast as possible and not to hinder any Node.js application that will use it.
I've set up an Express server in which one of the routes launches the crawler for a user-defined (query string) host. After launching a few crawling sessions for different hosts, I've noticed that I sometimes get really slow responses from other routes that only return simple text.
The delay can be anywhere from a few milliseconds to something like 30 seconds, and it seems to happen at random times (well, nothing is random, but I can't pinpoint the cause).
I've read a JetBrains article about CPU profiling using the V8 profiler functionality that is integrated with WebStorm, but unfortunately it only shows how to collect the information and how to view it. It doesn't give me any hints on how to find such problems, so I'm pretty much stuck here.
Could anyone help me with this? Any tips on what my crawler might be doing (a lot of recursive calls) that could hinder the Express server, or on how to find the hotspots I'm looking for and optimize them?
It's hard to say anything more specific on how to optimize code that is not shown, but I can give some advice that is relevant to the described situation.
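One thing that comes to mind is that you may be running some blocking code. Avoid deep synchronous recursion; break it up with setTimeout or setImmediate so the event loop gets a chance to run once in a while. (process.nextTick is not enough for this: its callbacks run before the event loop continues, so a long chain of them can still starve I/O.)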
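A minimal sketch of that idea follows, assuming the crawl can be expressed as a queue of nodes rather than nested calls. It uses setImmediate as the yielding primitive (setTimeout(fn, 0) would work similarly). The names processNode and crawl and the tree shape are hypothetical stand-ins, since the crawler's code is not shown.

```javascript
// Hypothetical work item: stands in for crawling one link at a given depth.
function processNode(node, visit) {
  visit(node.url);
  return node.children || [];
}

// Walks the tree with an explicit queue, yielding to the event loop
// between items so other callbacks (e.g. HTTP handlers) can run.
function crawl(root, visit) {
  return new Promise((resolve) => {
    const queue = [root];
    function step() {
      if (queue.length === 0) return resolve();
      const node = queue.shift();
      queue.push(...processNode(node, visit));
      setImmediate(step); // yield, then continue with the next item
    }
    step();
  });
}

const tree = { url: 'a', children: [{ url: 'b' }, { url: 'c', children: [{ url: 'd' }] }] };
const seen = [];
crawl(tree, (url) => seen.push(url)).then(() => console.log(seen));
```

Each item is still processed synchronously, so a single expensive item can block the server on its own; the yield only bounds how long the loop runs without interruption.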

Does node.js really not optimize calls to [].slice.call(arguments)?

In the bluebird docs, they list this as an anti-pattern that stops optimization. They call it argument leaking:
function leaksArguments2() {
    var args = [].slice.call(arguments);
}
I do this all the time in Node.js. Is this really a problem? And if so, why?
Assume only the latest version of Node.js.
Disclaimer: I am the author of the wiki page
It's a problem if the containing function is called a lot (that is, if it is hot). Functions that leak arguments are not supported by the optimizing compiler (Crankshaft).
Normally when a function is hot, it will be optimized. However if the function contains unsupported features like leaking arguments, being a hot function doesn't help and it will continue running slow generic code.
The performance difference between an optimized function and an unoptimized one is huge. For example, consider a function that adds 3 doubles together: http://jsperf.com/213213213 shows a 21x difference.
What if it added 6 doubles together? A 29x difference. Generally, the more code the function has, the more severe the penalty for running it in unoptimized mode.
For node.js, stuff like this is actually a huge problem in general, because any CPU time completely blocks the server. Just optimizing the URL parser that is included in node core (my module is 30x faster in node's own benchmarks) improves the requests per second of mysql-express from 70K rps to 100K rps in a benchmark that queries a database.
The good news is that node core is aware of this.
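For illustration, here are two ways to get an array of the arguments without passing the arguments object itself to another function. This is a sketch of the pattern only, not a benchmark claim; the function names are hypothetical, and whether the difference matters depends on the V8 version in use.

```javascript
// Copies `arguments` element by element, so the `arguments` object itself
// is never passed to another function.
function copiesArguments() {
  var args = new Array(arguments.length);
  for (var i = 0; i < args.length; ++i) {
    args[i] = arguments[i];
  }
  return args;
}

// Rest parameters (ES2015) yield a real array directly.
function usesRest(...args) {
  return args;
}

console.log(copiesArguments(1, 2, 3), usesRest(1, 2, 3));
```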
Is this really a problem?
For application code, no. For almost any module/library code, no. For a library such as bluebird that is intended to be used pervasively throughout an entire codebase, yes. If you did this in a very hot function in your application, then maybe yes.
I don't know the details but I trust the bluebird authors as credible that accessing arguments in the ways described in the docs causes v8 to refuse to optimize the function, and thus it's something that the bluebird authors consider worth using a build-time macro to get the optimized version.
Just keep in mind the latency numbers that gave rise to node in the first place. If your application does useful things like talking to a database or the filesystem, then I/O will be your bottleneck and optimizing/caching/parallelizing those will pay vastly higher dividends than v8-level in-memory micro-optimizations such as above.

Memory leak in a node.js crawler application

For over a month I've been struggling with a very annoying memory leak issue, and I have no clue how to solve it.
I'm writing a general purpose web crawler based on: http, async, cheerio and nano. From the very beginning I've been struggling with memory leak which was very difficult to isolate.
I know it's possible to do a heapdump and analyse it with Google Chrome but I can't understand the output. It's usually a bunch of meaningless strings and objects leading to some anonymous functions telling me exactly nothing (it might be lack of experience on my side).
Eventually I came to a conclusion that the library I had been using at the time (jQuery) had issues and I replaced it with Cheerio. I had an impression that Cheerio solved the problem but now I'm sure it only made it less dramatic.
You can find my code at: https://github.com/lukaszkujawa/node-web-crawler. I understand it might be a lot of code to analyse, but perhaps I'm doing something stupid that is obvious straight away. I suspect the main agent class, which does HTTP requests https://github.com/lukaszkujawa/node-web-crawler/blob/master/webcrawler/agent.js from multiple "threads" (with async.queue).
If you would like to run the code it requires CouchDB and after npm install do:
$ node crawler.js -c conf.example.json
I know that Node doesn't go crazy with garbage collection, but after 10 minutes of heavy crawling, used memory can easily go over 1GB.
(tested with v0.10.21 and v0.10.22)
For what it's worth, Node's memory usage will grow and grow even if your actual used memory isn't very large. This is for optimization on behalf of the V8 engine. To see your real memory usage (to determine if there is actually a memory leak) consider dropping this code (or something like it) into your application:
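setInterval(function () {
    // gc() is only defined when node is started with --expose-gc
    if (typeof gc === 'function') {
        gc();
    }
    // swap console.log for your application's logger if you have one
    console.log('Memory Usage', process.memoryUsage());
}, 60000);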
Run node --expose-gc yourApp.js. Every minute there will be a log line indicating real memory usage immediately after a forced garbage collection. I've found that watching the output of this over time is a good way to determine if there is a leak.
If you do find a leak, the best way I've found to debug it is to eliminate large sections of your code at a time. If the leak goes away, put it back and eliminate a smaller section of it. Use this method to narrow it down to where the problem is occurring. Closures are a common source, but also check for anywhere else references may not be cleaned up. Many network applications will attach handlers for sockets that aren't immediately destroyed.
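As an example of the "handlers that aren't cleaned up" case, here is a small sketch using a plain EventEmitter in place of a socket. The names source and startTask are hypothetical; the point is only that a listener keeps everything it closes over reachable until it is removed.

```javascript
const { EventEmitter } = require('events');

// Hypothetical long-lived object, standing in for a shared socket or agent.
const source = new EventEmitter();

// Attaches a per-task handler and returns a function that detaches it.
// Without the detach step, `source` keeps `onData` (and everything it
// closes over, here `buffer`) reachable for as long as `source` lives.
function startTask() {
  const buffer = [];
  const onData = (chunk) => buffer.push(chunk);
  source.on('data', onData);
  return function stop() {
    source.removeListener('data', onData);
    return buffer;
  };
}

const stop = startTask();
source.emit('data', 'x');
console.log(stop(), source.listenerCount('data'));
```

Watching listenerCount on long-lived emitters over time can be a cheap first check before reaching for heap snapshots.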

Limiting work in progress of parallel operations of a streamed resource

I've found myself recently using the SemaphoreSlim class to limit the work in progress of a parallelisable operation on a (large) streamed resource:
// The below code is an example of the structure of the code, there are some
// omissions around handling of tasks that do not run to completion that should be in production code
SemaphoreSlim semaphore = new SemaphoreSlim(Environment.ProcessorCount * someMagicNumber);
foreach (var result in StreamResults())
{
    semaphore.Wait();
    var task = DoWorkAsync(result).ContinueWith(t => semaphore.Release());
    ...
}
This is to avoid bringing too many results into memory and the program being unable to cope (generally evidenced via an OutOfMemoryException). Though the code works and is reasonably performant, it still feels ungainly. Notably the someMagicNumber multiplier, which although tuned via profiling, may not be as optimal as it could be and isn't resilient to changes to the implementation of DoWorkAsync.
In the same way that thread pooling can overcome the obstacle of scheduling many things for execution, I would like something that can overcome the obstacle of scheduling many things to be loaded into memory based on the resources that are available.
Since it is deterministically impossible to decide whether an OutOfMemoryException will occur, I appreciate that what I'm looking for may only be achievable via statistical means or even not at all, but I hope that I'm missing something.
Here I'd say that you're probably overthinking this problem. The consequences of overshooting are rather high (the program crashes). The consequence of being too low is that the program might be slowed down. As long as you still have some buffer beyond a minimum value, further increases to the buffer will generally have little to no effect, unless the processing time of that task in the pipe is extraordinarily volatile.
If your buffer is constantly filling up, it generally means that the task before it in the pipe executes quite a bit quicker than the task that follows it, so even with a fairly small buffer it is likely to always ensure the task following it has some work. The buffer size needed to get 90% of the benefits of a buffer is usually going to be quite small (a few dozen items maybe), whereas the size needed to get an OOM error is something like 6+ orders of magnitude higher. As long as you're somewhere in between those two numbers (and that's a pretty big range to land in) you'll be just fine.
Just run your static tests, pick a static number, maybe add a few percent extra for "just in case" and you should be good. At most, I'd move some of the magic numbers to a config file so that they can be altered without a recompile in the event that the input data or the machine specs change radically.

Resources