RequireJS: To Bundle or Not to Bundle

I'm using RequireJS for my web application. I'm using EmberJS for the application framework. I've come to a point where, I think, I should start bundling my application into a single js file. That is where I get a little confused:
If I finally bundle everything into one file for deployment, then my whole application loads in one shot, instead of on demand. Isn't bundling contradictory to AMD in general and RequireJS in particular?
What further confuses me, is what I found on the RequireJS website:
Once you are finished doing development and want to deploy your code for your end users, you can use the optimizer to combine the JavaScript files together and minify it. In the example above, it can combine main.js and helper/util.js into one file and minify the result.
I found this similar thread but it doesn't answer my question.

If I finally bundle everything into one file for deployment, then my whole application loads in one shot, instead of on demand. Isn't bundling contradictory to AMD in general and RequireJS in particular?
It is not contradictory. Loading modules on demand is only one benefit of RequireJS. A greater benefit in my book is that modularization helps to use a divide-and-conquer approach. We can look at it in this way: even though all the functions and classes we put in a single file do not benefit from loading on demand, we still write multiple functions and multiple classes because it helps break down the problem in a structured way.
However, the multiplicity of modules we create in development does not necessarily make sense when running the application in a browser. The greatest cost of on-demand loading is sending multiple HTTP requests over the wire. Let's say your application has 10 modules and you send 10 requests to load it because you load these modules individually. Your total cost is going to be the cost you have to pay to load the bytes from the 10 files (let's call it Pc for payload cost), plus an overhead cost for each HTTP request (let's call it Oc, for overhead cost). The overhead has to do with the data and computations that have to occur to initiate and close these requests. These costs are not insignificant. So you are paying Pc + 10*Oc. If you send everything in one chunk you pay Pc + 1*Oc. You've saved 9*Oc. In fact the savings are probably greater because (since compression is often used at both ends to reduce the size of the data transmitted) compression is going to provide greater benefits if the entire data is compressed together than if it is compressed as 10 chunks. (Note: the above analysis omits details that are not useful to cover.)
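The arithmetic above can be written out as a tiny sketch. The numbers are made up purely for illustration, and the function name `totalCost` is hypothetical; only the formula Pc + n*Oc comes from the answer.

```javascript
// Illustrative cost model: total = Pc + n * Oc.
// Units are arbitrary (think milliseconds); the values are invented.
function totalCost(payloadCost, overheadPerRequest, requestCount) {
  return payloadCost + requestCount * overheadPerRequest;
}

const Pc = 200; // hypothetical payload cost for all 10 modules
const Oc = 50;  // hypothetical overhead per HTTP request

const unbundled = totalCost(Pc, Oc, 10); // Pc + 10*Oc
const bundled = totalCost(Pc, Oc, 1);    // Pc + 1*Oc

// The saving is 9*Oc regardless of the payload size.
console.log(unbundled - bundled === 9 * Oc); // true
```

The sketch ignores the compression effect mentioned above, which would shrink Pc itself in the bundled case.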
Someone might object: "But you are comparing loading all the modules in separately versus loading all the modules in one chunk. If we load on demand then we won't load all the modules." As a matter of fact, most applications have a core of modules that will always be loaded, no matter what. These are the modules without which the application won't work at all. For some small applications this means all modules, so it makes sense to bundle all of them together. For bigger applications, this means that a core set of modules will be used every single time the application runs, but a small set will be used only on occasion. In the latter case, the optimization should create multiple bundles. I have an application like this. It is an editor with modes for various editing needs. A good 90% of the modules belong to the core. They are going to be loaded and used anyway so it makes sense to bundle them. The code for the modes themselves is not always going to be used but all the files for a given mode are going to be needed if the mode is loaded at all so each mode should be its own bundle. So in this case a model with one core bundle and a series of mode bundles makes sense to a) optimize the deployed application but b) keep some of the benefits of loading on demand. That's the beauty of RequireJS: it does not require you to do one or the other exclusively.
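A minimal sketch of what a "one core bundle plus one bundle per mode" build could look like with the r.js optimizer. The directory names and module IDs (`core/main`, `modes/markdown/main`, `modes/latex/main`) are hypothetical and are not taken from the editor described above; `baseUrl`, `dir`, `modules`, `name`, `exclude` and `optimize` are standard optimizer options.

```javascript
// Hypothetical build configuration: one core bundle and one bundle per mode.
// In a build profile file for r.js, the same object would be written as ({ ... }).
const buildConfig = {
  baseUrl: "src/js",   // assumed source location
  dir: "build",        // assumed output directory
  optimize: "uglify",  // minify the combined files
  modules: [
    // Everything the application always needs.
    { name: "core/main" },
    // Each mode becomes its own bundle; excluding the core keeps
    // shared modules from being duplicated inside the mode bundles.
    { name: "modes/markdown/main", exclude: ["core/main"] },
    { name: "modes/latex/main", exclude: ["core/main"] }
  ]
};

console.log(buildConfig.modules.map((m) => m.name));
```

With a layout like this, the core bundle is loaded up front and a mode bundle is requested only when that mode is first activated.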

While developing you want to have single-focused, small files. This causes their number to increase. When running in production, many HTTP requests really harm performance. Then again you do not want to load the entire application upfront - this is also not optimal.
To address this, I have created a small project on GitHub, require-lazy; you could call it a plugin for the builder, r.js. It can lazy load parts of your application with a simple syntax and then create separately downloadable bundles during the build process; so if your application consists of 2 views that need to be independently loaded, require-lazy will (ideally) build 3 js files: (1) the bootstrap code and common libraries, (2) view 1 with all its private scripts and (3) view 2 with all its private scripts.
Lazy loading is simply defined as:
define(["lazy!view1"], function(view1) { .... });
And view1 must be accessed with a promise:
view1.get().done(function(realView1) {
...
});
The project is available through npm, the build process through grunt and there is a bower component.
Comments are more than welcome.

Related

Why does v8 report duplicate module strings in heap in my jest tests?

In the process of upgrading node (16.1.x => 16.5.0), I observed that I'm getting OOM issues from jest. In troubleshooting, I'm periodically taking heap snapshots. I'm regularly seeing entries in "string" for module source (same shallow/retained size). In this example screenshot, you can see that the exact same module (React) is listed 2x. Sometimes, the module string is listed even 4x for any given source module.
Upon expansion, it says "system / Map", which suggests to me (I think?) that there's some V8-wide reference to this module string. That makes sense--maybe. Node has a require cache, jest has a module cache, and V8 and Node, I'd assume, share module references? The strings and compiled code buckets do increase regularly, but I expect them to get GC'd. In fact, I can see that many do--expansion of the items shows the refs belonging to GC Roots. But I suspect something is holding on to these module references, and I fear it's not at the user level, but at the tooling level. This is somewhat evidenced by the observation that only the Node.js upgrade induces the OOM failure mode.
Why would my jest test have multiple instances of the same module? (I am using --runInBand, so I don't expect multiple workers.)
What tips would you offer to diagnose further?
I do show multiple VM Contexts, which I think makes sense--I suppose jest is running some test suites in some sort of isolation.
I do not have a reproduction--I am looking for discussion, best-known methods, diagnostic ideas.

I can offer some thoughts:
"system / Map" does not mean "some v8 wide reference". "Map" is the internal name for "hidden class", which you may have heard of. The details don't even matter here; TL;DR: some internal thing, totally normal, not a sign of a problem.
Having several copies of the same string on the heap is also quite normal, because strings don't get deduplicated by default. So if you run some string-producing operation twice (such as: reading an external file), you'll get two copies of the string. I have no idea what jest does under the hood, but it's totally conceivable that running tests in parallel in mostly-isolated environments has a side effect of creating duplicate strings. That may be inefficient in a sense, but as long as they get GC'ed after a while, it's not really a problem.
If the specific hypothesis implied above (there are several tests in each file, and jest creates an in-memory copy of the entire file for each executing test) holds, then a possible mitigation might be to split your test files into smaller chunks (1.8MB is quite a lot for a single file). I don't have much confidence in this, but maybe it'd be easy for you to try it and see.
More generally: in the screenshot, there are 36MB of memory used by strings. That's far from being an OOM reason.
It might be insightful to measure the memory consumption of both Node versions. If, for example, it used to consume 4GB and now crashes when it reaches 2GB, that would indicate that the limit has changed. If it used to consume 2GB and now crashes when it reaches 4GB, that would imply that something major has changed. If it used to consume 1.98GB and now crashes when it reaches 2.0GB, then chances are something tiny has changed and you just happened to get lucky with the old version.
Until contradicting evidence turns up, I would operate under the assumption that the resource consumption is normal and simply must be accommodated. You could try giving Node more memory, or reducing the number of parallel test executions.
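To compare the two Node versions as suggested, one option is to log the heap limit and current usage from inside the process. This is a minimal sketch using Node's built-in `v8.getHeapStatistics()` and `process.memoryUsage()`; the helper name `heapReport` is hypothetical. Giving Node more memory is usually done with V8's `--max-old-space-size=<megabytes>` flag.

```javascript
const v8 = require("v8");

// Hypothetical helper: reports the heap limit and current usage in MB.
function heapReport() {
  const toMB = (bytes) => Math.round(bytes / (1024 * 1024));
  const stats = v8.getHeapStatistics();
  const usage = process.memoryUsage();
  return {
    heapLimitMB: toMB(stats.heap_size_limit), // ceiling V8 will grow the heap to
    heapUsedMB: toMB(usage.heapUsed),         // JS heap currently in use
    rssMB: toMB(usage.rss)                    // total resident memory of the process
  };
}

console.log(heapReport());
```

Running the same report under both Node versions shows whether the limit changed or the consumption did.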

This seems to be a known issue with Jest on Node.js v16.11.0+ and has already been reported on GitHub.

Do multiple node.js "requires" impact production run time?

We are integrating Amazon's node.js SDK into our project and while I do not think it matters due to require's cache and the fact that everything is compiled, I could not find a site that definitively states that multiple requires will not affect performance in run time.
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by breaking out functionality into separate modules and then requiring on an as needed basis.

Well, require() is a synchronous operation so it should ONLY be used during server initialization, never during an actual request. Therefore, the performance of require() should only affect your server startup time, not your request handling time.
Second, require() does have a cache behind it. It matches the fully resolved path of the module you are attempting to load. So, if you call require(somePath) and a module at that same path has previously been loaded, then the module handle is just immediately returned from the cache. No module is loaded from disk a second time. The module code is not executed a second time.
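The caching behaviour described above can be observed directly. This sketch writes a throwaway module to a temporary directory (the file name `counter.js` and the global `__loadCount` are made up for the demonstration) and requires it twice.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a throwaway module that counts how many times its body executes.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "require-cache-"));
const modulePath = path.join(dir, "counter.js");
fs.writeFileSync(
  modulePath,
  "global.__loadCount = (global.__loadCount || 0) + 1;\n" +
    "module.exports = { loadedAt: Date.now() };\n"
);

const first = require(modulePath);  // reads from disk and executes the module
const second = require(modulePath); // served from the cache

console.log(first === second);   // true: the same exports object is returned
console.log(global.__loadCount); // 1: the module body ran only once
console.log(require.resolve(modulePath) in require.cache); // true: keyed by resolved path
```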
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
If you are requiring a module for the first time, it WILL block the event loop while loading that module because require() uses blocking, synchronous I/O when the module is not yet cached. That's why you should be doing this at server initialization time, not during a request handler.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by breaking out functionality into separate modules and then requiring on an as needed basis.
Breaking code into logical modules is good for ease of maintenance, ease of testing and ease of reuse, so it's definitely a good thing.
I have seen people go too far, where there are so many modules, each with only a few lines of code in them, that it backfires and makes the project unwieldy to work on, find things in, design test suites for, etc. So, there is a balance.

NodeJS: Most efficient way to use "require"

What is the best way to use NodeJS's require function? By this, I'm referring to the placement of the require statement. Is it better to load all dependencies at the beginning of the script, or as you need them, or does it not make a notable difference whatsoever?
This article has a lot of useful information regarding how require works, though I still can't come to a definitive conclusion as to which method would be most efficient.

Assuming you're using node.js for some sort of server environment, several things are generally true about that server environment:
You want fast response time to any given request.
The code that runs for processing requests should not use synchronous I/O operations because that seriously lessens the scalability of the server.
Server startup time is generally not something you need to optimize for (within reason) so if you're going to pay an initialization cost somewhere, it is usually better paid once at server startup time.
So, given that require() uses synchronous I/O when the module has not yet been cached, that means you really don't generally want to be doing require() operations inside a request handler. And, you want fast response times for your request handlers so you don't want require() calls inside your handler anyway.
All of this leads to a general rule of thumb: load the necessary modules at startup time into module-level variables that you can reuse from one request to the next, and don't load modules inside your request handlers.
In addition to all of this, if you put all your require() statements in a block near the top of your module, it makes your module a lot more self-documenting about what other modules it depends on and how it initializes those modules. If require() statements are sprinkled all over the code, then it makes it a lot harder for a developer to see what this module is using without a lot more study of the code.
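As a sketch of that rule of thumb: every dependency is required in a block at the top and kept in a module-level variable, and the request handler only uses what is already loaded. The handler logic (hashing the request URL) and the port are arbitrary choices for the example.

```javascript
// All dependencies are loaded once, synchronously, while the module initializes.
const http = require("http");
const crypto = require("crypto");

// The handler contains no require() calls; it reuses the module-level variables.
function handleRequest(req, res) {
  const digest = crypto.createHash("sha256").update(req.url).digest("hex");
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(digest);
}

const server = http.createServer(handleRequest);
// server.listen(3000); // uncomment to serve; the port is an arbitrary choice
```

Reading the first two lines is enough to see what this module depends on.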

It depends what performance characteristics you're looking for.
require() is not cheap; it has to read the JS file from disk, parse it, and execute any top-level code (and do all of that recursively for all files require()d by that file).
If you put all of your require()s on top, your code may take more time to start, but it won't suddenly slow down later. (note that moving the require() further down in the synchronous top-level code will make no difference beyond order of execution).
If you only require() other modules when first used asynchronously, your first request may be noticeably slower, as Node parses all of your dependencies. This also means that any errors from dependencies won't be caught until later. (note that all require() calls are cached, so the second request won't do any more work)
The other disadvantage to scattering require() calls throughout your code is that it makes it less readable; it's very nice to easily see exactly what each file depends on up top.
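The point about dependency errors surfacing late can be illustrated with a deliberately missing module. The module name below is hypothetical and is assumed not to exist.

```javascript
// Hypothetical module name, assumed not to exist on disk.
const MISSING = "./this-module-does-not-exist";

function lazyHandler() {
  // The require() only runs when the handler is first called,
  // so the missing dependency is not noticed at startup.
  return require(MISSING);
}

// Startup reaches this point without any error.
let errorCode = null;
try {
  lazyHandler(); // the failure only appears on first use
} catch (err) {
  errorCode = err.code;
}

console.log(errorCode); // "MODULE_NOT_FOUND"
```

Had the require() been at the top of the file, the process would have failed immediately on startup instead.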

BreezeJS RequireJS optimized for production

Using BreezeJS, RequireJS, AngularJS with NodeJS and MongoDB as backend, I'm building a fat client application, with great success so far, as BreezeJS takes away the work to keep my domain model persisted. But it's growing and it now takes over five seconds to load all the files if they are not cached on localhost, which is catastrophic if you are trying to do a quick demo using a remote server.
R optimizer Warning:
bower_components/breezejs/breeze.debug.js has more than one anonymous define.
May be a built file from another build system like, Ender. Skipping
normalization.
Trying to run the compiled production file throws:
Uncaught Error: Mismatched anonymous define() module: function (){ return definition(global); }
(breeze.debug.js L10)
Has anyone gotten BreezeJS+RequireJS into production?

Take a look at the Todo-Require sample in the breeze.samples.js GitHub repo.

The Todo-KO-Require sample shows you how to code with require but it doesn't show you how to package things for production. You will suffer if you're asking require to download every individual file on demand.
You need to optimize with bundling and minification ... a topic outside of the breeze purview and not something we are in a hurry to produce. Perhaps you'd like to take that bull by the horns and share with the rest of us.
Why worry?
[update, 2 July 2014]
Let's take a step back and rediscover the point of all this. What is require doing for you?
I've used it with KO as a vehicle for dependency injection. That's its role in Durandal.
Angular comes with its own DI which reduces the role of require in an Ng app to asynchronous file loader. That's usually "meh" for me, in part because one soon encounters the file-loading-flurry that you describe. That leads to bundling which is a headache and can as easily be done with other tooling.
I see the value in large applications with dynamically loaded modules. But Ng is woeful in this regard quite apart from the async file loading. Something they'll address in v.2.
I'm happy to leave you to a contrary opinion. So let's consider what would happen if we can't fix this problem. What if breeze cannot be optimized with r?
My instinct is that it isn't really optimal to bundle breeze with anything else anyway!
The minimized breeze is rather large in itself. It is not evident to me that you would gain any performance advantage at all by bundling it with your application assets. Sure you want to keep the number of server requests down. But are two requests with 1/2 the payload slower than one big request? Do you know for your target environment?
I'm not the kind of pedant who insists that every script be delivered by require. It's trivial to load BreezeJS separately with a script tag and then make it available to other require-aware modules (I shall assume you know how to do this). What would be horrible about that?
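One way to do what is described above is to register the already-loaded global as a named AMD module. This is a sketch under assumptions: it assumes the library exposes a global named `breeze` when no AMD loader is present at the time its script runs (so its script tag would need to come before require.js), and the helper `exposeGlobalAsModule` is hypothetical. `define(id, deps, factory)` is the standard named-module form.

```javascript
// Hypothetical helper: exposes an already-loaded global as a named AMD module.
function exposeGlobalAsModule(defineFn, root, moduleId, globalName) {
  defineFn(moduleId, [], function () {
    return root[globalName];
  });
}

// In the browser, once the library's script tag and require.js have loaded:
if (typeof define === "function" && define.amd) {
  // Assumption: the library registered itself as window.breeze.
  exposeGlobalAsModule(define, window, "breeze", "breeze");
}
```

Other modules can then list 'breeze' as a dependency as usual, while the optimizer never has to process the library file.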
While we look forward to your repro sample (see my comment below), I may have difficulty justifying priority attention to this issue. Convince me otherwise.

I managed to compile my project by leaving out breeze, with a small adjustment to the breeze mongo dataservice file header, using this r optimizer config:
paths: {
    'breeze': 'empty:',
    'breeze-dataservice-mongo': 'empty:'
}
Breeze mongodataservice can be included as soon as it conforms like lib/breeze-angular.
(function () {
    "use strict";
    requirejs.config({
        paths: {
            'breeze': 'bower_components/breezejs/breeze.debug',
            'breeze-dataservice-mongo': 'lib/breeze.dataService.mongo'
        }
    });
    require(['angular', 'jquery', 'core/logger', 'fastclick', 'core/index', 'domready!'], function (angular, $, logger, fastClick) {
        logger.info('iaGastro client is booting');
        fastClick.attach(document.body);
        angular.bootstrap(document, ['iaApp']);
    });
})();
Leaving out SaveQueueing completely, I think I can find a different solution for my concurrent save error..
@Ward:
RequireJS does static file loading, like my domain classes, also templates and json files. Now it also concatenates all my files and minifies them with one more parameter. It's probably the docs, which are not the best, because I feel I'm not the only one sometimes misunderstanding RequireJS..
Also, its error messages can be frustrating (circular dependencies..).

Working with large Backbone collections

We're designing a backbone application, in which each server-side collection has the potential to contain tens of thousands of records. As an analogy - think of going into the 'Sent Items' view of an email application.
In the majority of Backbone examples I've seen, the collections involved are at most 100-200 records, and therefore fetching the whole collection and working with it in the client is relatively easy. I don't believe this would be the case with a much larger set.
Has anyone done any work with Backbone on large server-side collections?
Have you encountered performance issues (especially on mobile devices) at a particular collection size?
What decision(s) did you take around how much to fetch from the server?
Do you download everything or just a subset?
Where do you put the logic around any custom mechanism (Collection prototype for example?)

Yes, at about 10,000 items, older browsers could not handle the display well. We thought it was a bandwidth issue, but even locally, with as much bandwidth as a high-performance machine could throw at it, JavaScript just kinda passed out. This was true on Firefox 2 and IE7; I haven't tested it on larger systems since.
We were trying to fetch everything. This didn't work for large datasets. It was especially pernicious with Android's browser.
Our data was in a tree structure, with other data depending upon the presence of data in the tree structure. The data could change due to actions from other users, or other parts of the program. Eventually, we made the tree structure fetch only the currently visible nodes, and the other parts of the system verified the validity of the datasets on which they depended independently. This is a race condition, but in actual deployment we never saw any problems. I would have liked to use socket.io here, but management didn't understand or trust it.
Since I use CoffeeScript, I just inherited from Backbone.Collection and created my own superclass, which also instantiated a custom sync() call. The syntax for invoking a superclass's method is really useful here:
class Dataset extends BaseAccessClass
  initialize: (attributes, options) ->
    Dataset.__super__.initialize.apply(@, arguments)
    # Customizations go here.

Like Elf said, you should really paginate loading data from the server. You'd save a lot of load on the server from downloading items you may not need. Just creating a collection with 10k models locally in Chrome takes half a second. It's a huge load.
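A library-agnostic sketch of paginated loading. The in-memory "server", the `fetchPage` function and the `PagedCollection` class are all hypothetical stand-ins; with Backbone, the same idea could be built on `collection.fetch` with paging parameters passed in `data` and `remove: false` to keep earlier pages.

```javascript
// Hypothetical in-memory "server" with 10,000 records, standing in for a real endpoint.
const allRecords = Array.from({ length: 10000 }, (_, i) => ({ id: i + 1 }));

// Hypothetical fetcher: returns one page of records plus the total count.
async function fetchPage(page, pageSize) {
  const start = (page - 1) * pageSize;
  return { items: allRecords.slice(start, start + pageSize), total: allRecords.length };
}

class PagedCollection {
  constructor(fetcher, pageSize) {
    this.fetcher = fetcher;
    this.pageSize = pageSize;
    this.items = [];
    this.total = null; // unknown until the first page arrives
    this.nextPage = 1;
  }

  hasMore() {
    return this.total === null || this.items.length < this.total;
  }

  // Loads the next page only; call it when the user scrolls near the end.
  async loadNext() {
    if (!this.hasMore()) return [];
    const { items, total } = await this.fetcher(this.nextPage, this.pageSize);
    this.items.push(...items);
    this.total = total;
    this.nextPage += 1;
    return items;
  }
}

const sentItems = new PagedCollection(fetchPage, 50);
sentItems.loadNext().then((page) => {
  console.log(page.length, sentItems.items.length); // 50 50
});
```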
You can put the work on another physical CPU thread by using a worker and then use transient objects to send it to the main thread in order to render it on the DOM.
Once you have a collection that big rendering in the DOM, lazy rendering will only get you so far. The memory will slowly increase until it crashes the browser (that will be quick on tablets). You should use object pooling on the elements. It will allow you to set a small max size for the memory and keep it there.
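A minimal sketch of object pooling, not the PerfView implementation: a fixed number of row objects is created once and recycled as rows scroll in and out of view. The class name `ElementPool` is hypothetical, and plain objects stand in for DOM nodes so the sketch runs outside a browser; in a page the factory could be `() => document.createElement("li")`.

```javascript
class ElementPool {
  constructor(createElement, maxSize) {
    this.createElement = createElement; // factory for new row elements
    this.maxSize = maxSize;             // hard cap on elements ever created
    this.free = [];                     // elements currently not on screen
    this.created = 0;
  }

  // Reuse a released element if possible, otherwise create one up to the cap.
  acquire() {
    if (this.free.length > 0) return this.free.pop();
    if (this.created < this.maxSize) {
      this.created += 1;
      return this.createElement();
    }
    return null; // pool exhausted: a row must be released first
  }

  // Return an element that scrolled out of view so it can be reused.
  release(element) {
    this.free.push(element);
  }
}

// Plain objects stand in for DOM elements here.
const pool = new ElementPool(() => ({ text: "" }), 2);
const rowA = pool.acquire();
const rowB = pool.acquire();
pool.release(rowA);           // rowA scrolled off screen
const rowC = pool.acquire();  // the same object is handed back
rowC.text = "model #3";
console.log(rowC === rowA, pool.created); // true 2
```

Because the pool never creates more than `maxSize` elements, memory used by rendered rows stays bounded no matter how large the collection is.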
I'm building a PerfView for Backbone that can render 1,000,000 models and scroll at 120FPS on Chrome. The code is all up on GitHub: https://github.com/puppybits/BackboneJS-PerfView. It's commented, since there are a lot of other optimizations you'd need to display large data sets.
