NodeJS and NPM : problems following recommendation to check modules into git


I'm having problems following the 'official' recommendation to check all external dependencies into git (article http://www.mikealrogers.com/posts/nodemodules-in-git.html, linked from the FAQ).
How do you make sure that not only top-level dependencies are checked in? Most npm modules currently do not follow the recommendation: they all have their node_modules in .gitignore. Just deleting their .gitignore seems risky.
For compiled modules, the article recommends checking in only the sources and running 'npm rebuild' at deploy time. Unfortunately, 'npm rebuild' does not do a 'clean make' for all modules (despite the bugfix https://github.com/isaacs/npm/issues/1872 being included in npm version 1.0.106, which I'm using). This means that I have to prevent compile targets from being checked in (otherwise I would have object code compiled for the developer machine on the production machine, without it being overwritten by npm rebuild). But how do I do this? Unfortunately, the modules don't have a common compile output directory, so just git-ignoring "node_modules//build" and "/node_modules//out/" (as mentioned in this good article: eng.yammer.com/blog/2012/1/4/managing-nodejs-dependencies-and-deployments-at-yammer.html) won't help in every case.
Short version: how do you make sure that production servers use the exact same version of all dependent modules as you use during development?

UPDATE: there is now npm shrinkwrap which solves the problem of locking down exact dependency versions, even of dependencies' dependencies! More info here.
Checking in node_modules can be problematic, as the environment it runs on may differ from user to user, so what is compiled in one environment may not work in another. It would also fill up your changelogs and repositories with third-party code. I take it that is the conclusion you've come to with the short version of your question, so let me address that.
Short version: how do you make sure that production servers use the exact same version of all dependent modules as you use during development?
Inside your package.json there should be a "dependencies": {} entry; if it is not there, add it. To accomplish what you want, add your dependencies as the keys and their exact versions as the values, e.g. "dependencies": { "docpad": "2.5.0", "mocha": "1.1.0" }
However, generally (it depends on the author) upgrades to the revision number (the x.x.X number) are just bugfixes and safe. You can allow such changes with "dependencies": { "docpad": "2.5.x", "mocha": "1.1.x" }, which saves you from having to update your package.json and do a release every time there is a bugfix release. You can even do things like 2.x if you wish.
This is the solution I've come to use for all of my modules, as it ensures that even 6 months later or whatever the module will still work - whereas doing something like >= 2.0.0 means when v3 of a dependency comes out, your module will probably be unusable at that time. Ensuring you stick to specific versions "guarantees" stability over time.
For reference you can see how I've done it in my open-source node.js modules here
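To make the difference between exact versions and ranges concrete, here is a minimal sketch that flags dependency specs which are not pinned to one exact version. The names `findUnpinned` and `EXACT_VERSION` are illustrative and not part of npm, and the regular expression is a deliberate simplification of semver rather than a full parser.

```javascript
// Sketch: report dependencies whose spec is a range or wildcard instead of
// an exact "major.minor.patch" version. Illustrative helper, not an npm API.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findUnpinned(manifest) {
  const deps = manifest.dependencies || {};
  return Object.keys(deps).filter((name) => !EXACT_VERSION.test(deps[name]));
}

// Example data taken from the answer above: one exact spec, one wildcard.
const manifest = {
  dependencies: { docpad: '2.5.0', mocha: '1.1.x' },
};

// Lists the names that could resolve to different versions over time.
console.log(findUnpinned(manifest));
```

A check like this could run in a pre-commit hook or CI step if you decide to allow only exact versions.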

About the .gitignore of your dependencies (in "node_modules"), npm 1.1 ignores the .gitignore files, so that they are not installed;
npm 1.1 will exclude .gitignore files from the things it installs.
npm 1.0 did not have this feature, so you have to be careful about that.
Deleting them recursively is fine:
find node_modules -name .gitignore | xargs rm
But, in npm 1.1, you never have to do this, because it excludes them
from the install automatically.
That's coming from the chief himself (Isaac), and it's here and seems to cover pretty much everything. The "extraneous" problem I have must be something silly I've done, I'll try a clean setup.
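For setups where `find` and `xargs` are not available, the same recursive cleanup can be sketched in Node itself. This is only an illustration of the shell one-liner above; `removeGitignores` is a made-up helper name, and the sketch assumes Node 10.10 or later for the `withFileTypes` option.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively delete every file named ".gitignore" below `dir` and return
// the paths that were removed. Symlinked directories are not followed,
// because a symlink's directory entry does not report itself as a directory.
function removeGitignores(dir) {
  const removed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      removed.push(...removeGitignores(full));
    } else if (entry.name === '.gitignore') {
      fs.unlinkSync(full);
      removed.push(full);
    }
  }
  return removed;
}
```

Calling `removeGitignores('node_modules')` from the project root would mirror the `find ... | xargs rm` command; as the quote says, this should only be needed with npm 1.0.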

Related

Configuring package-lock.json to be source of dependency truth

I had the exact same question as Do I need both package-lock.json and package.json? (tldr; "what's the difference between package.json and package-lock.json?") and found some really great answers in there. However it leaves me with a few other very similar-related questions that I don't see answered elsewhere.
For instance, what if package.json and package-lock.json conflict with one another? Say package.json says to use some-lib-2.* (any 2.x version of some-lib) but package-lock.json is configured to use some-lib-1.18.4? Is there an error? Is preference given to either file as the "source of dependency truth"?
I like the idea of one file to manage my specific dependencies, and so I feel like I'm leaning towards:
Not specifying libraries or version in package.json at all; and
Using package-lock.json to specify the exact versions of each module/library my project uses
Is this possible to do? If so are there any special configurations that I need to make? Do I track both files in version control, or is there ever any reasons why I would not want to track these in git/VCS?

You use the command line (npm install [optional args]) to update both files.
npm, together with your command line invocation, decides what the acceptable range of versions is for each dependency and writes those ranges to package.json. It then picks a version within that range, uses it at build time and runtime, and writes that exact version to package-lock.json.
So you want to place both files in version control so you have repeatable builds and any developers checking out your project will immediately be able to build the project with the same versions of the same dependencies
And the only time you edit package.json directly is if you don't want to allow a range of versions for a particular dependency and want to cherry pick the exact version to use. You make the edit, you save, you run npm install [options] and package-lock.json will be updated to use that version as well
For what it's worth, this is terribly confusing and advocates the anti-pattern of not managing your dependencies. It allows developers to think it's OK to just pull in the latest version of a given dependency, even if that version changes from build to build. That leads to bug creep in your application, non-repeatable builds and all sorts of headaches.
I would strongly advocate for always specifying the exact version you want for all your direct dependencies: no more ranges or wildcards please.
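The relationship between the two files can be sketched as follows: package.json holds the requested range, the lock file holds the exact version that was picked. The helper name `lockedVersion` and the sample data are hypothetical. The sketch assumes the usual lock file layouts, with entries under `packages["node_modules/<name>"]` for lockfileVersion 2 and 3, and under `dependencies[<name>]` for lockfileVersion 1.

```javascript
// Sketch: look up the exact version a lock file records for a dependency.
// Handles the `packages` map (lockfileVersion 2/3) and falls back to the
// older `dependencies` map (lockfileVersion 1).
function lockedVersion(lock, name) {
  const fromPackages = lock.packages && lock.packages['node_modules/' + name];
  if (fromPackages) return fromPackages.version;
  const fromDeps = lock.dependencies && lock.dependencies[name];
  return fromDeps ? fromDeps.version : undefined;
}

// Hypothetical data: a range in package.json, an exact version in the lock.
const manifest = { dependencies: { 'some-lib': '2.x' } };
const lock = {
  lockfileVersion: 3,
  packages: { 'node_modules/some-lib': { version: '2.4.1' } },
};

for (const [name, range] of Object.entries(manifest.dependencies)) {
  console.log(name, 'range:', range, 'locked:', lockedVersion(lock, name));
}
```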

How do I exclude insecure package.json transient dependencies?

I have a package.json that gives a load of security warnings. Looking at the first critical item I see its open#0.0.5 which hasn't been updated for five years. Looking at npm ll it is included by npm#6.5.0 where I am using the latest that was updated about two weeks ago.
I would like to remove the insecure dependencies. In the Java world, the Maven package manager lets you exclude certain transitive dependencies. Ideally, with npm or another Node package manager, I should be able to exclude dependencies with vulnerabilities. Then I can retest that my app works and not see any security errors. Is there a way to quickly exclude anything that has a security vulnerability from my package.json? If there isn't a way to do this, what approaches can I take to ensure that no insecure packages are used by my application?
Update: Although "npm": "^6.5.0" is specified in the package.json I was building it with an older npm which was picking up the critical issue mentioned above. I fixed all the issues with ./node_modules/.bin/npm audit fix --force

By definition, you can't exclude a package that a dependency you are using relies on. In other words, if you require package A, and package A claims it is dependent on package B, then removing package B will cause A to either stop working altogether or begin behaving erratically.
Unfortunately this does happen, and your options include:
Ignoring the security warning.
Replacing package A with something else (applies in some cases and not others).
Asking the maintainer of package A to upgrade the version of package B they rely on, possibly opening a pull request yourself.
In your case, though, I'm not sure if your investigation is complete yet - I don't see open in npm's dependency list. Might be worth scrapping your node_modules and re-running npm install, then check again to see who is using open.
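One way to see who is pulling in a package, besides running `npm ls open`, is to scan the lock file. The sketch below is illustrative only: `findDependents` is a made-up helper, `some-tool` is a made-up package name, and the code assumes a lockfileVersion 2 or 3 `packages` map in which each entry may list its own `dependencies`.

```javascript
// Sketch: return the lock-file keys of all entries that declare `target`
// as a dependency or optional dependency.
function findDependents(lock, target) {
  const packages = lock.packages || {};
  return Object.keys(packages).filter((key) => {
    const entry = packages[key];
    const declared = Object.assign({}, entry.dependencies, entry.optionalDependencies);
    return Object.prototype.hasOwnProperty.call(declared, target);
  });
}

// Hypothetical lock data; only `open` and its version come from the question.
const lock = {
  packages: {
    '': { dependencies: { 'some-tool': '^1.0.0' } },
    'node_modules/some-tool': { version: '1.2.0', dependencies: { open: '0.0.5' } },
    'node_modules/open': { version: '0.0.5' },
  },
};

console.log(findDependents(lock, 'open'));
```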

This specific warning targets your lockfile, and can easily be fixed by removing yarn.lock or package-lock.json and reinstalling the dependencies.

The Yarn package manager has a feature, resolutions, with which you can force fixed versions of insecure third-party libraries.
See
How do I override nested dependencies with `yarn`?
NPM has something similar.
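As a rough illustration of what such an override looks like in package.json, the sketch below adds an entry for a nested dependency to both Yarn's `resolutions` field and npm's `overrides` field (the latter assumes npm 8.3 or later). The helper name `pinNested`, the package name `some-tool`, and the version `1.2.3` are placeholders, not recommendations.

```javascript
// Sketch: return a copy of a package.json object in which a nested
// dependency is forced to one version for both Yarn and npm.
function pinNested(manifest, name, version) {
  return Object.assign({}, manifest, {
    resolutions: Object.assign({}, manifest.resolutions, { [name]: version }),
    overrides: Object.assign({}, manifest.overrides, { [name]: version }),
  });
}

// Hypothetical manifest; the version passed below is a placeholder.
const manifest = { name: 'my-app', dependencies: { 'some-tool': '^1.0.0' } };
console.log(JSON.stringify(pinNested(manifest, 'open', '1.2.3'), null, 2));
```

After changing either field you would reinstall and retest, since the dependent package was never tested against the forced version.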

Can npm symlink node modules to a master directory instead of redownloading?

With npm, when a package requires other packages it creates a tree structure of dependencies. Sometimes a lot of these dependencies depend on the same packages from other packages.
I was wondering, would it be possible to make npm store all packages in the global node_modules, with every dependency just symlinked back to the top of the global node_modules? I understand the version issue, and that can be handled by storing each package with the version appended to its name, then symlinking to the proper version.
I feel this would speed up installs and reduce disk usage for duplicate files.
(Is this what npm3 is supposed to do?)

Yes, what you propose would be possible (at least on Linux the symlinks are resolved as expected).
However, no version of npm uses symlinks for this. To gain some of the benefits you proposed, newer versions of npm work as follows: if a package is needed multiple times, npm installs it as high as possible in the dependency tree. This lets multiple packages share the same copy of the dependency.
For example, no matter how many (sub-)dependencies depend on somedep v. ^1.x.x you got only one copy of somedep. This will probably be placed directly in the root node_modules, so that any sub-dependency can require it.
Older versions of npm do not do this automatically; however, you can get a similar effect by running 'npm dedupe'.
Note however, that this approach is weaker than proposed in the question: If 3 of your dependencies depend on somedep v. ^1.x.x and 3 other dependencies depend on somedep v. ^2.x.x, npm obviously cannot put both of these somedeps to the parent node_modules.
Also, check out the ied project: https://github.com/alexanderGugel/ied . It does something similar to what you propose, but sadly, it installs only one version of each dependency, which is quite limiting.
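Hoisting works because Node looks for a required package in the node_modules directory of the requiring file's folder and then in each parent folder in turn. The sketch below reproduces that lookup order for POSIX-style absolute paths. `lookupDirs` is an illustrative name; Node itself exposes the real list as `module.paths`.

```javascript
// Sketch: list the node_modules directories that would be searched, in
// order, for a require() issued from a file inside `fromDir`.
// Assumes POSIX-style absolute paths such as "/app/node_modules/a/lib".
function lookupDirs(fromDir) {
  const parts = fromDir.split('/').filter(Boolean);
  const dirs = [];
  for (let i = parts.length; i >= 0; i--) {
    // A folder that is itself named node_modules is not searched for a
    // nested node_modules folder.
    if (i > 0 && parts[i - 1] === 'node_modules') continue;
    dirs.push('/' + parts.slice(0, i).concat('node_modules').join('/'));
  }
  return dirs;
}

console.log(lookupDirs('/app/node_modules/a/lib'));
```

A copy of somedep placed in /app/node_modules is therefore visible to every package installed below /app, while a package that needs a different version can keep its own copy in its nested node_modules.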

Check in node_modules vs. shrinkwrap

Checking in node_modules was the community standard, but now we also have the option to use shrinkwrap. The latter makes more sense to me, but there is always the chance that someone did a "force publish" and introduced a bug. Are there any additional drawbacks?

My favorite post/philosophy on this subject goes all the way back (a long time in node.js land) to 2011:
https://web.archive.org/web/20150116024411/http://www.futurealoof.com/posts/nodemodules-in-git.html
To quote directly:
If you have an application, that you deploy, check in all your dependencies in to node_modules. If you use npm do deploy, only define bundleDependencies for those modules. If you have dependencies that need to be compiled you should still check in the code and just run $ npm rebuild on deploy.
Everyone I’ve told this too tells me I’m an idiot and then a few weeks later tells me I was right and checking node_modules in to git has been a blessing to deployment and development. It’s objectively better, but here are some of the questions/complaints I seem to get.
I think this is still the best advice.
The force-publish scenario is rare and npm shrinkwrap would probably work for most people. But if you're deploying to a production environment, nothing gives you the peace-of-mind like checking in the entire node_modules directory.
Alternately, if you really, really don't want to check in the node_modules directory but want a better guarantee there hasn't been a forced push, I'd follow the advice in npm help shrinkwrap:
If you want to avoid any risk that a byzantine author replaces a package you're using with code that breaks your application, you could modify the shrinkwrap file to use git URL references rather than version numbers so that npm always fetches all packages from git.
Of course, someone could run a weird git rebase or something and modify a git commit hash... but now we're just getting crazy.

npm FAQ directly answers this:
Check node_modules into git for things you deploy, such as websites
and apps.
Do not check node_modules into git for libraries and modules
intended to be reused.
Use npm to manage dependencies in your dev
environment, but not in your deployment scripts.
cited from npm FAQ

Advantages of bundledDependencies over normal dependencies in npm

npm allows us to specify bundledDependencies, but what are the advantages of doing so? I guess if we want to make absolutely sure we get the right version even if the module we reference gets deleted, or perhaps there is a speed benefit with bundling?
Anyone know the advantages of bundledDependencies over normal dependencies?

For the quick reader : this QA is about the package.json bundledDependencies field, not about the package.
What bundledDependencies do
"bundledDependencies" are exactly what their name implies. Dependencies that should be inside your project. So the functionality is basically the same as normal dependencies. They will also be packed when running npm pack.
When to use them
Normal dependencies are usually installed from the npm registry.
Thus bundled dependencies are useful when:
you want to re-use a third party library that doesn't come from the npm registry or that was modified
you want to re-use your own projects as modules
you want to distribute some files with your module
This way, you don't have to create (and maintain) your own npm repository, but get the same benefits that you get from npm packages.
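For illustration, the sketch below reads the field from a package.json object and returns the names that would be packed. It assumes that npm accepts both spellings, `bundledDependencies` and `bundleDependencies`, and that the value is either an array of package names or `true` to bundle every dependency. The helper name `bundledNames` and the package names in the sample are made up.

```javascript
// Sketch: resolve the list of bundled package names from a manifest.
function bundledNames(manifest) {
  const field = manifest.bundledDependencies !== undefined
    ? manifest.bundledDependencies
    : manifest.bundleDependencies;
  if (field === true) return Object.keys(manifest.dependencies || {});
  return Array.isArray(field) ? field : [];
}

// Hypothetical manifest: one internal module is bundled, one dependency
// still comes from the registry.
const manifest = {
  name: 'my-app',
  dependencies: { 'my-internal-lib': '1.0.0', 'registry-lib': '2.0.0' },
  bundledDependencies: ['my-internal-lib'],
};

console.log(bundledNames(manifest));
```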
When not to use bundled dependencies
When developing, I don't think that the main point is to prevent accidental updates though. We have better tools for that, namely code repositories (git, mercurial, svn...) or now lock files.
To pin your package versions, you can use:
Option1: Use the newer NPM version 5 that comes with node 8. It uses a package-lock.json file (see the node blog and the node 8 release)
Option2: use yarn instead of npm.
It is a package manager from facebook, faster than npm and it uses a yarn.lock file. It uses the same package.json otherwise.
This is comparable to lockfiles in other package managers like Bundler
or Cargo. It’s similar to npm’s npm-shrinkwrap.json, however it’s not
lossy and it creates reproducible results.
npm actually copied that feature from yarn, amongst other things.
Option3: this was the previously recommended approach, which I do not recommend anymore. The idea was to use npm shrinkwrap most of the time, and sometimes put the whole thing, including the node_modules folder, into your code repository. Or possibly use shrinkpack. The best practices at the time were discussed on the node.js blog and on the Joyent developer websites.
See also
This is a bit outside the scope of the question, but I'd like to mention the last kind of dependencies (that I know of): peer dependencies. Also see this related SO question and possibly the docs of yarn on bundledDependencies.

One of the biggest problems right now with Node is how fast it is changing. This means that production systems can be very fragile and an npm update can easily break things.
Using bundledDependencies is a way to get round this issue by ensuring, as you correctly surmise, that you will always deliver the correct dependencies no matter what else may be changing.
You can also use this to bundle up your own, private bundles and deliver them with the install.

Another advantage is that you can put your internal dependencies (application components) there and then just require them in your app as if they were independent modules, instead of cluttering your lib/ or publishing them to npm.
If/when they are matured to the point they could live as separate modules, you can put them on npm easily, without modifying your code.

I'm surprised I didn't see this here already, but when carefully selected, bundledDependencies can be used to produce a distributable package from npm pack that will run on a system where npm is not configured. This is helpful if you have e.g. a system that's not networked / not on the internet: bring your package over on a thumb drive (or whatever) and unpack the tarball, then npm run or node index.js and it Just Works.
Maybe there's a better way to bundle up your application to run "offline", but if there is I haven't found it.

Operationally, I look at bundledDependencies as a module's private module store, where dependencies is more public, resolved among your module and its dependencies (and sub-dependencies). Your module may rely on an older version of, say, react, but a dependency requires latest-and-greatest. Your package/install will result in your pinned version in node_modules/$yourmodule/node_modules/react, while your dependency will get their version in node_modules/react (or node_modules/$dependency/node_modules/react if they're so inclined).
A caveat: I recently ran into a dependency that did not properly configure its dependency on react, and having react in bundledDependencies caused that dependent module to fail at runtime.
