System-wide deduplication of identical node modules - node.js

I'm working on a Javascript project, and as it so happens one of my dependencies pulls in puppeteer, which in turn downloads a whole copy of Chromium into my node_modules. My larger project is split into multiple Javascript packages, so I end up with multiple identical copies of Chromium among other stuff.
Is there a way to deduplicate these packages system wide? Note, npm dedupe seems to do something completely different to what I want.
I imagine there would be a module repository in my home directory which contains every package I need (in every version needed), and then in the local node_modules directories would contain only symlinks to the repository. This seems like an incredibly obvious optimisation, but I can't find any way to do it in npm. If not in npm, is it maybe possible in yarn?
As an added complication, this should also work on Windows (where symbolic link support has historically been not so good).

It seems the following command does what I want:
npm config set link -g
Then delete node_modules, and do npm install again. It should be much smaller now.
The documentation says:
If true, then local installs will link if there is a suitable globally installed package.
Note that this means that local installs can cause things to be installed into the global space at the same time. The link is only done if one of the two conditions are met:
The package is not already installed globally, or
the globally installed version is identical to the version that is being installed locally.
I am not sure if this has any negative side-effects - for example clobbering the global namespace with commands I don't want. For now, it seems to work fine.

Related

Can npm symlink node modules to a master directory instead of redownloading?

With npm, when a package requires other packages it creates a tree structure of dependencies. Sometimes a lot of these dependencies depend on the same packages from other packages.
I was wondering, would it be possible to make npm so all packages are stores in the global node_modules and any dependency is just symlinked back to the top of the global node_modules. I understand the version issue, and that can just be handled by storing the package with the version name appended, then symlinking to the proper version.
I feel this would speed up installs and reduce disk usage for duplicate files.
(Is this what npm3 is supposed to do?)
Yes, what you propose would be possible (at least on Linux the symlinks are resolved as expected).
npm (in none of its versions) however does not benefit from symlinks. To gain some of the benefits you proposed, newer versions of npm work as follows: if some package is needed multiple times, npm installs the package as high as possible in the dependency tree. This enables using the same dependency by multiple packages.
For example, no matter how many (sub-)dependencies depend on somedep v. ^1.x.x you got only one copy of somedep. This will probably be placed directly in the root node_modules, so that any sub-dependency can require it.
Older versions of npm do not do this automatically, however, you can invoke the similar effect by running 'npm dedupe'.
Note however, that this approach is weaker than proposed in the question: If 3 of your dependencies depend on somedep v. ^1.x.x and 3 other dependencies depend on somedep v. ^2.x.x, npm obviously cannot put both of these somedeps to the parent node_modules.
Also, check out ied project: https://github.com/alexanderGugel/ied . It does something similar to what you propose, but sadly, it installs only one version of each dependency, which is quite limiting.

nvm vs nave vs n | packages processing comparsion

Node.js is sometimes confusing when it comes to version management...
I am trying to arrange various projects as i am doing with ruby projects. For example:
With ruby i can create file such as .rvmrc and fill with something like rvm --create use 1.9.3#my-app
This thing creates and uses all gems specifically to configured gemset. Which allows to have various options for any kind of project, and switch easily among them. So ruby does this in one place.
I want to achieve this for node.js projects.
Node works differently. I want to know the details about that, and especially of each node version management tool.
The point is to know which version management tool for which goal...
And why there are so many.
More accurately: i want npm install <package-name> to chosen node version. And after switching to other versions, this installed package to be missing, or have different version installed before (or certain one). Just like gemset is working.
I've been looking for clarification too:
Both allow switching & installing between node versions.
nvm will symlink the different versions to /usr/local/bin/node, and n will move your node installs to the path (/usr/local/bin/node).
n downloads and installs binary files, and nvm downloads, compiles, installs from the source.
I don't fully understand the latter part of your question, but in regards having control with node projects/apps, you can use npm install [package_name] --save-dev to save your npms within your 'project'.
These npm module versions (^semver) get detailed in your package.json file, for example "gulp": "^3.8.5" is different from "gulp": "3.8.5" (the later being specific to v3.8.5, and the ^3.8.5 means allowing any future version of 3, but not 4.0.0)
The differences between npm and gem is that npm installs the specified packages in the local node_modules folder (the current working directory using the --save-dev), so you have less worries with cross project module versions.
Important note: Running --save (instead of --save-dev) installs any missing dependencies.
I hope that helps a little :o)
Just tried to install nvm and it works for switching from one version to another. In header of nave.sh it says "# This program contains parts of narwhal's "sea" program, as well as bits borrowed from Tim Caswell's "nvm"", so you might try both and see the tiniest difference. Also check the "popularity" of each and contributors to get some insight). There is also a nodeenv which uses python, but I don't any reason why use python here. So, my answer would be no big difference.

Understanding npm and Node.js install location for modules

I've been using Node.js and npm for a few weeks with great success and have started to question the best practice for installing local modules. I understand the Global vs Local argument, however, my question has more to do with where to place a local install. Let's say that I have a project located at ~/ProjectA/ which is version controlled and worked on by multiple developers. When initially playing with Node.js and npm I wasn't aware of the default local installation paths and just simply installed the necessary modules in a default terminal which resulted in a installation path of ~/node_modules. What this ended up doing is requiring all the other developers working on the project to install the modules on their own machines in order to run the application. Having seen where some of the developers ran npm install I'm still really surprised that it worked on their machines at all (I guess it relates to how Node.js and require() looks for modules), but needless to say, it worked.
Now that the project is getting past the "toying around" stage, I would like to setup the project folder correctly. So, my question is, should the modules be installed at ~/ProjectA/node_modules and therefore be part of the version controlled project files, or should it continue to be located at a developer-machine specific location...or does it not really matter at all?
I'm just looking for a little "best-practice" guidance on this one and what others do when setting up your projects.
I think that the "best practice" here is to keep the dependencies within the project folder.
Almostly all Node projects I've seen so far (I'm a Node developer has about 8 months now) do that.
You don't need to version control the dependencies. That's how I manage my Node projects:
Keep the versions locked in the package.json file, so everyone gets the same working version, or use the npm shrinkwrap command in your project root.
Add the node_modules folder to your VCS ignore file (I use git, so mine is .gitignore)
Be happy, you're done!

How to manage internal Node.JS modules

What would be the preferred way to handle internal modules within a node.js application?
I mean, currently we have a fairly big application and there are several dependencies, installed via npm. We have node_modules in .gitignore. Therefore it's not in the repository. We use npm shrinkwrap to freeze the modules versions, so, upon deployment, an npm install puts together eveything we need into node_modules.
The problem is, since out app is getting bigger, we want to split it to smaller modules. Now, if I create a foo module and put it in node_modues, I need to allow it in the repo, which does not seem so neat to have both ignored and checked out modules in node_modules.
We can not publish these on npm registry because they are not really "public".
Is there any obvious solution that I'm not aware of?
I agree with your aesthetic of not mixing 3rd-party non-repo stuff with your in-house revision-controlled contents.
The NODE_PATH search path variable can help you here, you can use a different directory with contents under revision control.
http://nodejs.org/api/modules.html
Depending on your appetite for flexibility vs complexity, you can consider a build step that actually copies your various node modules from wherever you keep them, into an output directory (probably "named node_modules" but not necessarily).
A common solution is to store the modules in a private Git repository somewhere, and then use a git:// URL as a dependency; npm knows how to clone repositories as dependencies.

Advantages of bundledDependencies over normal dependencies in npm

npm allows us to specify bundledDependencies, but what are the advantages of doing so? I guess if we want to make absolutely sure we get the right version even if the module we reference gets deleted, or perhaps there is a speed benefit with bundling?
Anyone know the advantages of bundledDependencies over normal dependencies?
For the quick reader : this QA is about the package.json bundledDependencies field, not about the package.
What bundledDependencies do
"bundledDependencies" are exactly what their name implies. Dependencies that should be inside your project. So the functionality is basically the same as normal dependencies. They will also be packed when running npm pack.
When to use them
Normal dependencies are usually installed from the npm registry.
Thus bundled dependencies are useful when:
you want to re-use a third party library that doesn't come from the npm registry or that was modified
you want to re-use your own projects as modules
you want to distribute some files with your module
This way, you don't have to create (and maintain) your own npm repository, but get the same benefits that you get from npm packages.
When not to use bundled dependencies
When developing, I don't think that the main point is to prevent accidental updates though. We have better tools for that, namely code repositories (git, mercurial, svn...) or now lock files.
To pin your package versions, you can use:
Option1: Use the newer NPM version 5 that comes with node 8. It uses a package-lock.json file (see the node blog and the node 8 release)
Option2: use yarn instead of npm.
It is a package manager from facebook, faster than npm and it uses a yarn.lock file. It uses the same package.json otherwise.
This is comparable to lockfiles in other package managers like Bundler
or Cargo. It’s similar to npm’s npm-shrinkwrap.json, however it’s not
lossy and it creates reproducible results.
npm actually copied that feature from yarn, amongst other things.
Option3: this was the previously recommended approach, which I do not recommend anymore. The idea was to use npm shrinkwrap most of the time, and sometimes put the whole thing, including the node_module folder, into your code repository. Or possibly use shrinkpack. The best practices at the time were discussed on the node.js blog and on the joyent developer websites.
See also
This is a bit outside the scope of the question, but I'd like to mention the last kind of dependencies (that I know of): peer dependencies. Also see this related SO question and possibly the docs of yarn on bundledDependencies.
One of the biggest problems right now with Node is how fast it is changing. This means that production systems can be very fragile and an npm update can easily break things.
Using bundledDependencies is a way to get round this issue by ensuring, as you correctly surmise, that you will always deliver the correct dependencies no matter what else may be changing.
You can also use this to bundle up your own, private bundles and deliver them with the install.
Other advantage is that you can put your internal dependencies (application components) there and then just require them in your app as if they were independent modules instead of cluttering your lib/ and publishing them to npm.
If/when they are matured to the point they could live as separate modules, you can put them on npm easily, without modifying your code.
I'm surprised I didn't see this here already, but when carefully selected, bundledDependencies can be used to produce a distributable package from npm pack that will run on a system where npm is not configured. This is helpful if you have e.g. a system that's not networked / not on the internet: bring your package over on a thumb drive (or whatever) and unpack the tarball, then npm run or node index.js and it Just Works.
Maybe there's a better way to bundle up your application to run "offline", but if there is I haven't found it.
Operationally, I look at bundledDependencies as a module's private module store, where dependencies is more public, resolved among your module and its dependencies (and sub-dependencies). Your module may rely on an older version of, say, react, but a dependency requires latest-and-greatest. Your package/install will result in your pinned version in node_modules/$yourmodule/node_modules/react, while your dependency will get their version in node_modules/react (or node_modules/$dependency/node_modules/react if they're so inclined).
A caveat: I recently ran into a dependency that did not properly configure its dependency on react, and having react in bundledDependencies caused that dependent module to fail at runtime.

Resources