Node.js require: once or on demand?

What is currently the best practice for loading models (and goes for all required files I guess)?
I'm thinking these two ways to achieve the solution (nonsense code to illustrate follows):
var Post = require('../models/post');

function findById(id) {
  return new Post(id);
}

function party() {
  return Post.getParty();
}
vs
function findById(id) {
  return new (require('../models/post'))(id);
}

function party() {
  return require('../models/post').getParty();
}
Is one of these snippets preferred? Are there considerable memory and time tradeoffs? Or is it just a premature optimization?

It's a premature optimization (calls to require() are cached and idempotent), but I'd personally call your first style better (loading dependencies during initialization rather than subsequent processing) since it's easier to get your head around what you're doing. Loading everything at the start will slightly slow down your startup (which is hardly ever an issue) in return for making most requests run slightly faster (which you shouldn't worry about unless you've identified a bottleneck and done some hardcore profiling).

You should definitely use the version with the single call to require at the beginning. Although it makes no difference to how often the modules are loaded (they are only loaded once either way), there are performance issues with the second approach.
The problem is that require is one of only a few functions in Node.js that is blocking. That means, as long as it runs, Node.js is not able to fulfill any incoming requests. On startup this is no problem: It only takes a while until your application is up and running.
But you certainly don't want blocking moments once your application has been up and serving requests for a while.
So, if you do not have VERY special reasons for the second option, go with the first one.

I believe the second case is useful to avoid circular module dependencies, since the require() happens at run-time rather than load time.
Otherwise, I believe the first is (slightly?) faster and to me, quite a bit more readable.
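For example, a deferred require() is one way around the partially-populated exports you can get with a circular dependency; a minimal sketch, assuming two hypothetical modules a.js and b.js that need each other:

// a.js -- defers its require() of b until call time
exports.helloFromA = function () {
  var b = require('./b'); // by the time this runs, b.js has finished loading
  return 'A says hi to ' + b.name;
};

// b.js -- requires a at the top as usual
var a = require('./a');
exports.name = 'B';
exports.greet = function () {
  return a.helloFromA(); // works: a.js was fully loaded before this is called
};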

Related

v8 memory spike (RSS) when defining more than 1000 functions (does not reproduce when using --jitless)

I have a simple node app with 1 function that defines 1000+ functions inside it (without running them).
When I call this function (the wrapper) around 200 times, the RSS memory of the process spikes from 100MB to 1000MB and immediately goes down. (The memory spike only happens after roughly 200 calls; the calls before and after it do not cause a spike.)
This issue is happening to us in our node server in production, and I was able to reproduce it in a simple node app here:
https://github.com/gileck/node-v8-memory-issue
When I use --jitless or --no-opt the issue does not happen (no spikes), but obviously we do not want to remove all the v8 optimizations in production.
This must be caused by some specific v8 optimization. I tried a few other v8 flags, but none of them fixes the issue (only --jitless and --no-opt fix it).
Does anyone know which v8 optimization could cause this?
Update:
We found that --no-concurrent-recompilation fixes this issue (no memory spikes at all), but we still can't explain it.
We are not sure why it happens or which code changes might fix it (without the flag).
As one of the answers suggests, moving all the 1000+ function definitions out of the main function will solve it, but then those functions will not be able to access the context of the main function, which is why they are defined inside it.
Imagine that you have a server and you want to handle a request.
Obviously, the request handler is going to run many times as the server gets a lot of requests from clients.
Would you define functions inside the request handler (so you can access the request context in those functions) or define them outside of the request handler and pass the request context as a parameter to all of them? We chose the first option... what do you think?
Does anyone know which v8 optimization could cause this?
Load Elimination.
I guess it's fair to say that any optimization could cause lots of memory consumption in pathological cases (such as: a nearly 14 MB monster of a function as input, wow!), but Load Elimination is what causes it in this particular case.
You can see for yourself when you run with --turbo-stats (and optionally --turbo-filter=foo to zoom in on just that function).
You can disable Load Elimination if you feel that you must. A preferable approach would probably be to reorganize your code somewhat: defining 2,000 functions is totally fine, but the function that defines all these other functions probably doesn't need to run in a loop often enough that it gets optimized? You'll avoid not only this particular issue but also get better efficiency in general if you define each function only once.
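As a hedged sketch of that reorganization (the handleRequest and lookup names below are made up for illustration): instead of re-creating the helpers on every call, define them once at module level and pass the per-request context in as a parameter:

// Before: helpers are re-defined inside the wrapper on every call.
function handleRequestBefore(req) {
  function loadUser() { return lookup(req.userId); }
  function loadPosts() { return lookup(req.postIds); }
  // ... 1000+ more closures over `req` ...
  return [loadUser(), loadPosts()];
}

// After: each helper is defined exactly once; the request context is an argument.
function loadUser(req) { return lookup(req.userId); }
function loadPosts(req) { return lookup(req.postIds); }

function handleRequestAfter(req) {
  return [loadUser(req), loadPosts(req)];
}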
There may or may not be room for improving Load Elimination in Turbofan to be more efficient for huge inputs; that's a longer investigation and I'm not sure it's worth it (compared to working on other things that likely show up more frequently in practice).
I do want to emphasize for any future readers that disabling optimizations is not generally a good rule of thumb for improving performance (or anything else); on the contrary. Nor are any other "secret" flags needed to unlock "secret" performance: the default configuration is very carefully tuned to give you what is (usually) best. It is a very rare special case for a particular optimization pass to interact badly with a particular code pattern in an input function.

Do multiple node.js "requires" impact production run time?

We are integrating Amazon's node.js SDK into our project and while I do not think it matters due to require's cache and the fact that everything is compiled, I could not find a site that definitively states that multiple requires will not affect performance in run time.
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by breaking out functionality into separate modules and then requiring them on an as-needed basis.
Well, require() is a synchronous operation so it should ONLY be used during server initialization, never during an actual request. Therefore, the performance of require() should only affect your server startup time, not your request handling time.
Second, require() does have a cache behind it. It matches the fully resolved path of the module you are attempting to load. So, if you call require(somePath) and a module at that same path has previously been loaded, then the module handle is just immediately returned from the cache. No module is loaded from disk a second time. The module code is not executed a second time.
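A small sketch of that caching behavior (the file name config.js is made up): the module's top-level code runs once, and the second require() returns the very same object from require.cache:

// config.js
console.log('loading config'); // printed only once per process
module.exports = { port: 3000 };

// app.js
const a = require('./config'); // loads from disk, runs the top-level code
const b = require('./config'); // served from the cache, nothing re-executed
console.log(a === b); // true
console.log(require.resolve('./config') in require.cache); // true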
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
If you are requiring a module for the first time, it WILL block the event loop while loading that module because require() uses blocking, synchronous I/O when the module is not yet cached. That's why you should be doing this at server initialization time, not during a request handler.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by breaking out functionality into separate modules and then requiring them on an as-needed basis.
Breaking code into logical modules is good for ease of maintenance, ease of testing and ease of reuse, so it's definitely a good thing.
I have seen people go too far, where there are so many modules, each with only a few lines of code, that it backfires and makes the project unwieldy to work on, find things in, design test suites for, etc. So there is a balance.

NodeJS: Most efficient way to use "require"

What is the best way to use Node.js's require function? By this, I'm referring to the placement of the require statement. Is it better to load all dependencies at the beginning of the script, or as you need them, or does it not make a notable difference?
This article has a lot of useful information regarding how require works, though I still can't come to a definitive conclusion as to which method would be most efficient.
Assuming you're using node.js for some sort of server environment, several things are generally true about that server environment:
You want fast response time to any given request.
The code that runs for processing requests should not use synchronous I/O operations because that seriously lessens the scalability of the server.
Server startup time is generally not something you need to optimize for (within reason) so if you're going to pay an initialization cost somewhere, it is usually better paid once at server startup time.
So, given that require() uses synchronous I/O when the module has not yet been cached, that means you really don't generally want to be doing require() operations inside a request handler. And, you want fast response times for your request handlers so you don't want require() calls inside your handler anyway.
All of this leads to a general rule of thumb: load the modules you need at startup time into module-level variables that you can reuse from one request to the next, and don't load modules inside your request handlers.
In addition to all of this, if you put all your require() statements in a block near the top of your module, it makes your module a lot more self-documenting about what other modules it depends on and how it initializes those modules. If require() statements are sprinkled all over the code, then it makes it a lot harder for a developer to see what this module is using without a lot more study of the code.
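A minimal sketch of that rule of thumb, assuming an Express-style server for illustration (the ./lib/db module and its findUser function are made up):

// Preferred: pay the synchronous require() cost once, at startup.
const express = require('express');
const db = require('./lib/db'); // hypothetical module, loaded before any request arrives

const app = express();

app.get('/users/:id', (req, res) => {
  // Avoid this inside handlers: the first uncached require() here would
  // block the event loop while a request is being served.
  // const db = require('./lib/db');
  db.findUser(req.params.id).then(user => res.json(user));
});

app.listen(3000);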
It depends what performance characteristics you're looking for.
require() is not cheap; it has to read the JS file from disk, parse it, and execute any top-level code (and do all of that recursively for all files require()d by that file).
If you put all of your require()s on top, your code may take more time to start, but it won't suddenly slow down later. (note that moving the require() further down in the synchronous top-level code will make no difference beyond order of execution).
If you only require() other modules when they are first used (e.g. inside a request handler), your first request may be noticeably slower, as Node parses all of your dependencies. This also means that any errors from dependencies won't be caught until later. (note that all require() calls are cached, so the second request won't do any more work)
The other disadvantage to scattering require() calls throughout your code is that it makes it less readable; it's very nice to easily see exactly what each file depends on up top.

Does node.js really not optimize calls to [].slice.call(arguments)?

In the bluebird docs, they list this as an anti-pattern that stops optimization. They call it argument leaking:
function leaksArguments2() {
  var args = [].slice.call(arguments);
}
I do this all the time in Node.js. Is this really a problem? And, if so, why?
Assume only the latest version of Node.js.
Disclaimer: I am the author of the wiki page
It's a problem if the containing function is called a lot (i.e. it is hot). Functions that leak arguments are not supported by the optimizing compiler (Crankshaft).
Normally when a function is hot, it will be optimized. However if the function contains unsupported features like leaking arguments, being a hot function doesn't help and it will continue running slow generic code.
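For reference, the usual workarounds were to copy the arguments with a plain loop inside the same function instead of leaking the object, or, on current Node versions, to use a rest parameter; a hedged sketch:

// Leaks `arguments` to another function: the optimizing compiler refuses to optimize this.
function leaky() {
  var args = [].slice.call(arguments);
  return args;
}

// Safe pattern: copy the arguments manually inside the same function.
function copied() {
  var args = new Array(arguments.length);
  for (var i = 0; i < arguments.length; i++) {
    args[i] = arguments[i];
  }
  return args;
}

// Modern alternative: a rest parameter gives you a real array, no `arguments` at all.
function rest(...args) {
  return args;
}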
The performance difference between an optimized function and an unoptimized one is huge. For example, consider a function that adds 3 doubles together: http://jsperf.com/213213213 shows a 21x difference.
What if it added 6 doubles together? A 29x difference. Generally, the more code a function has, the more severe the penalty for running it in unoptimized mode.
For node.js, stuff like this is actually a huge problem in general, because any CPU time completely blocks the server. Just optimizing the URL parser included in node core (my module is 30x faster in node's own benchmarks) improves the requests per second of mysql-express from 70K rps to 100K rps in a benchmark that queries a database.
The good news is that node core is aware of this.
Is this really a problem
For application code, no. For almost any module/library code, no. For a library such as bluebird that is intended to be used pervasively throughout an entire codebase, yes. If you did this in a very hot function in your application, then maybe yes.
I don't know the details but I trust the bluebird authors as credible that accessing arguments in the ways described in the docs causes v8 to refuse to optimize the function, and thus it's something that the bluebird authors consider worth using a build-time macro to get the optimized version.
Just keep in mind the latency numbers that gave rise to node in the first place. If your application does useful things like talking to a database or the filesystem, then I/O will be your bottleneck and optimizing/caching/parallelizing those will pay vastly higher dividends than v8-level in-memory micro-optimizations such as above.

Testing background processes in nodejs (using tape)

This is a general question about testing, but I will frame it in the context of Node.js. I'm not as concerned with a particular technology, but it may matter.
In my application, I have several modules that are called upon to do work when my web server receives a request. In the case of some of these modules, I close the request before I call upon them.
What is a good way to test that these modules are doing what they are supposed to do?
The advice here for RSpec is to mock out the work these modules are doing and just ensure that the appropriate methods are being called. This makes sense to me, but in Node.js, since my modules are not global, I don't think I can mock out functions without changing my program architecture so that every instance receives instances of the objects it needs [1].
[1] This is a well known programming paradigm, but I cannot remember its name right now.
The other option I see is to use setTimeout and take my best guess at when these modules are done with their work.
Neither of these seems ideal.
Am I missing something? Are background processes not tested?
Since you are speaking of integration tests of these background components, a few strategies come to mind.
Take all the asynchronicity out of their operation for test mode. I'm imagining you have some sort of queueing process (that could be a faulty assumption): you toss work into the queue, and then your modules pick up that work and do their task. You could rework your test harness so that it stands in as the queuing mechanism, giving you direct control over when the modules execute.
Refactor your modules to take some sort of next callback function. They would end up functioning a bit like Express's middleware layer or how async's each function works: into each module you'd pass a callback that it calls when its task is complete (see the sketch after this list). Once all of the modules have reported in, you can check the state of the program.
Exactly what you already suggested: wait some amount of time, and if it still isn't done, consider that a failure. Mocha sort of does that, in that if a given test runs over a definable threshold, it's a failure. I don't like this approach though, because if you add more tests, they all have to wait the same amount of time.
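A minimal sketch of the second strategy with tape (the worker module and its doWork function are made up for illustration):

// worker.js -- the background module accepts a completion callback
module.exports = function doWork(payload, done) {
  setImmediate(function () {
    // ... real work would happen here ...
    done(null, { processed: payload });
  });
};

// worker.test.js
const test = require('tape');
const doWork = require('./worker');

test('worker reports completion through its callback', function (t) {
  doWork('job-1', function (err, result) {
    t.error(err);
    t.equal(result.processed, 'job-1');
    t.end(); // no setTimeout guessing: the test ends when the module says it is done
  });
});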
