Does node.js really not optimize calls to [].slice.call(arguments)?

Does node.js really not optimize calls to [].slice.call(arguments)? - node.js

In the bluebird docs, they have this as an anti-pattern that stops optimization.. They call it argument leaking,
function leaksArguments2() {
var args = [].slice.call(arguments);
}
I do this all the time in Node.js. Is this really a problem. And, if so, why?
Assume only the latest version of Node.js.

Disclaimer: I am the author of the wiki page
It's a problem if the containing function is called a lot (being hot). Functions that leak arguments are not supported by the optimizing compiler (crankshaft).
Normally when a function is hot, it will be optimized. However if the function contains unsupported features like leaking arguments, being a hot function doesn't help and it will continue running slow generic code.
The performance of an optimized function compared to an unoptimized one is huge. For example consider a function that adds 3 doubles together: http://jsperf.com/213213213 21x difference.
What if it added 6 doubles together? 29x difference Generally the more code the function has, the more severe the punishment is for that function to run in unoptimized mode.
For node.js stuff like this in general is actually a huge problem due to the fact that any cpu time completely blocks the server. Just by optimizing the url parser that is included in node core (my module is 30x faster in node's own benchmarks), improves the requests per second of mysql-express from 70K rps to 100K rps in a benchmark that queries a database.
Good news is that node core is aware of this

Is this really a problem
For application code, no. For almost any module/library code, no. For a library such as bluebird that is intended to be used pervasively throughout an entire codebase, yes. If you did this in a very hot function in your application, then maybe yes.
I don't know the details but I trust the bluebird authors as credible that accessing arguments in the ways described in the docs causes v8 to refuse to optimize the function, and thus it's something that the bluebird authors consider worth using a build-time macro to get the optimized version.
Just keep in mind the latency numbers that gave rise to node in the first place. If your application does useful things like talking to a database or the filesystem, then I/O will be your bottleneck and optimizing/caching/parallelizing those will pay vastly higher dividends than v8-level in-memory micro-optimizations such as above.

Related

v8 memory spike (rss) when defining more than 1000 function (does not reproduce when using --jitless)

I have a simple node app with 1 function that defines 1000+ functions inside it (without running them).
When I call this function (the wrapper) around 200 times the RSS memory of the process spikes from 100MB to 1000MB and immediately goes down. (The memory spike only happens after around 200~ calls, before that all the calls do not cause a memory spike, and all the calls after do not cause a memory spike)
This issue is happening to us in our node server in production, and I was able to reproduce it in a simple node app here:
https://github.com/gileck/node-v8-memory-issue
When I use --jitless pr --no-opt the issue does not happen (no spikes). but obviously we do not want to remove all the v8 optimizations in production.
This issue must be some kind of a specific v8 optimization, I tried a few other v8 flags but non of them fix the issue (only --jitless and --no-opt fix it)
Anyone knows which v8 optimization could cause this?
Update:
We found that --no-concurrent-recompilation fix this issue (No memory spikes at all).
but still, we can't explain it.
We are not sure why it happens and which code changes might fix it (without the flag).
As one of the answers suggests, moving all the 1000+ function definitions out of the main function will solve it, but then those functions will not be able to access the context of the main function which is why they are defined inside it.
Imagine that you have a server and you want to handle a request.
Obviously, The request handler is going to run many times as the server gets a lot of requests from the client.
Would you define functions inside the request handler (so you can access the request context in those functions) or define them outside of the request handler and pass the request context as a parameter to all of them? We chose the first option... what do you think?

anyone knows which v8 optimization could cause this?
Load Elimination.
I guess it's fair to say that any optimization could cause lots of memory consumption in pathological cases (such as: a nearly 14 MB monster of a function as input, wow!), but Load Elimination is what causes it in this particular case.
You can see for yourself when your run with --turbo-stats (and optionally --turbo-filter=foo to zoom in on just that function).
You can disable Load Elimination if you feel that you must. A preferable approach would probably be to reorganize your code somewhat: defining 2,000 functions is totally fine, but the function defining all these other functions probably doesn't need to be run in a loop long enough until it gets optimized? You'll avoid not only this particular issue, but get better efficiency in general, if you define functions only once each.
There may or may not be room for improving Load Elimination in Turbofan to be more efficient for huge inputs; that's a longer investigation and I'm not sure it's worth it (compared to working on other things that likely show up more frequently in practice).
I do want to emphasize for any future readers of this that disabling optimization(s) is not generally a good rule of thumb for improving performance (or anything else), on the contrary; nor are any other "secret" flags needed to unlock "secret" performance: the default configuration is very carefully optimized to give you what's (usually) best. It's a very rare special case that a particular optimization pass interacts badly with a particular code pattern in an input function.

Why does v8 report duplicate module strings in heap in my jest tests?

In the process of upgrading node (16.1.x => 16.5.0), I observed that I'm getting OOM issues from jest. In troubleshooting, I'm periodically taking heap snapshots. I'm regularly seeing entries in "string" for module source (same shallow/retained size). In this example screenshot, you can see that the exact same module (React) is listed 2x. Sometimes, the module string is listed even 4x for any given source module.
Upon expansion, it says "system / Map", which suggests to me I think? that theres some v8 wide reference to this module string? That makes sense--maybe. node has a require cache, jest has a module cache, v8 and node i'd assume... share module references? The strings and compiled code buckets do increase regularly, but I expect them to get GC'd. In fact, I can see that many do--expansion of the items show the refs belonging to GC Roots. But I suspect something is holding on to these module references, and I fear it's not at the user level, but at the tooling level. This is somewhat evidenced by observation that only the node.js upgrade induces the OOM failure mode.
Why would my jest test have multiple instances of the same module (i am using --runInBand, so I don't expect multiple workers)
What tips would you offer to diagnose further?
I do show multiple VM Contexts, which I think makes sense--I suppose jest is running some test suites in some sort of isolation.
I do not have a reproduction--I am looking for discussion, best-know-methods, diagnostic ideas.

I can offer some thoughts:
"system / Map" does not mean "some v8 wide reference". "Map" is the internal name for "hidden class", which you may have heard of. The details don't even matter here; TL;DR: some internal thing, totally normal, not a sign of a problem.
Having several copies of the same string on the heap is also quite normal, because strings don't get deduplicated by default. So if you run some string-producing operation twice (such as: reading an external file), you'll get two copies of the string. I have no idea what jest does under the hood, but it's totally conceivable that running tests in parallel in mostly-isolated environments has a side effect of creating duplicate strings. That may be inefficient in a sense, but as long as they get GC'ed after a while, it's not really a problem.
If the specific hypothesis implied above (there are several tests in each file, and jest creates an in-memory copy of the entire file for each executing test) holds, then a possible mitigation might be to split your test files into smaller chunks (1.8MB is quite a lot for a single file). I don't have much confidence in this, but maybe it'd be easy for you to try it and see.
More generally: in the screenshot, there are 36MB of memory used by strings. That's far from being an OOM reason.
It might be insightful to measure the memory consumption of both Node versions. If, for example, it used to consume 4GB and now crashes when it reaches 2GB, that would indicate that the limit has changed. If it used to consume 2GB and now crashes when it reaches 4GB, that would imply that something major has changed. If it used to consume 1.98GB and now crashes when it reaches 2.0GB, then chances are something tiny has changed and you just happened to get lucky with the old version.
Until contradicting evidence turns up, I would operate under the assumption that the resource consumption is normal and simply must be accommodated. You could try giving Node more memory, or reducing the number of parallel test executions.

This seems like a known issue of Jest at Node JS v16.11.0+ and has already been reported to GitHub.

Node.js optimizing module for best performance

I'm writing a crawler module which is calling it-self recursively to download more and more links depending on a depth option parameter passed.
Besides that, I'm doing more tasks on the returned resources I've downloaded (enrich/change it depending on the configuration passed to the crawler). This process is going on recursively until it's done which might take a-lot of time (or not) depending on the configurations used.
I wish to optimize it to be as fast as possible and not to hinder on any Node.js application that will use it.I've set up an express server that one of its routes launch the crawler for a user defined (query string) host. After launching a few crawling sessions for different hosts, I've noticed that I can sometimes get real slow responses from other routes that only return simple text.The delay can be anywhere from a few milliseconds to something like 30 seconds, and it's seems to be happening at random times (well nothing is random but I can't pinpoint the cause).I've read an article of Jetbrains about CPU profiling using V8 profiler functionality that is integrated with Webstorm, but unfortunately it only shows on how to collect the information and how to view it, but it doesn't give me any hints on how to find such problems, so I'm pretty much stuck here.
Could anyone help me with this matter and guide me, any tips on what could hinder the express server that my crawler might do (A lot of recursive calls), or maybe how to find those hotspots I'm looking for and optimize them?

It's hard to say anything more specific on how to optimize code that is not shown, but I can give some advice that is relevant to the described situation.
One thing that comes to mind is that you may be running some blocking code. Never use deep recursion without using setTimeout or process.nextTick to break it up and give the event loop a chance to run once in a while.

What are the performance implications of interopping with other languages via system calls?

Suppose I'm writing a program in node.js (or perhaps another typical back-end scripting language). Suppose further I have a C function f (or a python function, or what have you) that does some pure data transformation.
If I want to use f in my node program, there are two approaches:
Bind f via something like node-gyp that makes it callable from JavaScript land.
Make f into a binary (or, in the case of a language like python, a single f.py interface) that sits on the file system, and then call it from node as if were any other system command (so that one can then take the output from the system call as a string, convert it into node.js data, and then use it).
Question: What are the performance implications of choosing (2) over (1)?
This is important because if you are using a language like C to make some aspect of your application run significantly faster, then using (2) would seem pointless if it slowed things down past some threshold.

The cost of 1 is the cost of loading the native code, transfering arguments (ffi), calling the native code, and transfering arguments back. With loading being done only once.
The cost of 2 is always going to be the cost to startup the process, running the process, converting the results back from strings.
If the cost of f is high, you may never see a difference between 1 and 2. If the cost of f is low, then 2 will take longer because the process startup overhead will dominate.
However, depending on the complexity of f (it might be a very large data-processing application in C), it's almost always faster to create a native binding like 1. Avoiding process startup overhead is important, it also reduces the total amount of memory needed to run your application.
Alternatively you could do option:
Have the C code talk over a local network socket. Accepting requests and responding with answers when the computation is done.
This has the benefit of scaling out to multiple nodes if you need it.

Benchmarking both for your use case is the only way to be sure but method 1 is
likely to be faster.
The startup cost of calling a binary and starting an interpreter for python/perl/blah would likely kill any performance gain you might get using their Foreign Function Interface (FFI). Startup cost is one of the reasons why Apache has mod_python, mod_perl and why FastCGI exists.
Another thing to consider is that you're adding another language to the mix and this might kill performance of the team ie now everyone needs to know two languages and two FFI methods etc. If your app is in Node, keep it in Node and use node to call native methods.

Nodejs require: once or on demand?

What is currently the best practice for loading models (and goes for all required files I guess)?
I'm thinking these two ways to achieve the solution (nonsense code to illustrate follows):
var Post = require('../models/post');
function findById(id) {
return new Post(id);
}
function party() {
return Post.getParty();
}
vs
function findById(id) {
return new require('../models/post')(id);
}
function party() {
return require('../models/post').getParty();
}
Is one of these snippets preferred? Are there considerable memory and time tradeoffs? Or is it just a premature optimization?

It's a premature optimization (calls to require() are cached and idempotent), but I'd personally call your first style better (loading dependencies during initialization rather than subsequent processing) since it's easier to get your head around what you're doing. Loading everything at the start will slightly slow down your startup (which is hardly ever an issue) in return for making most requests run slightly faster (which you shouldn't worry about unless you've identified a bottleneck and done some hardcore profiling).

You should definititely use the version with the single call to requireat the beginning. Although it does not make any difference regarding to how often the modules are loaded (they are only loaded once however you do it), there are performance issues in the second way of doing it.
The problem is that require is one of only a few functions in Node.js that is blocking. That means, as long as it runs, Node.js is not able to fulfill any incoming requests. On startup this is no problem: It only takes a while until your application is up and running.
But for sure you don't want to have blocking moments while your application is already running for a while.
So, if you do not have VERY special reasons for the second option, go with the first one.

I believe the second case is useful to avoid circular module dependencies, since the require() happens at run-time rather than load time.
Otherwise, I believe the first is (slightly?) faster and to me, quite a bit more readable.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string