One day I got curious about node_modules in frameworks and UI libraries such as React. After searching around, I found that there should be no changes in node_modules unless the user really needs them, so here are my questions.
Why shouldn't there be changes in node_modules?
Even when I change the code, there is no change in the result. Why does this happen? Even after deleting a file or folder inside node_modules, nothing changed. (I thought it would show an error, but it worked OK...)
When we start the framework (like npm start in React), does npm download the external files (for example, from GitHub) every time and place them in the DOM? If so, are the files in node_modules just read-only copies?
Could someone give me an answer?
node_modules contains the libraries / packages / modules (whatever you want to call them) written by the open-source community. They can depend on each other. If you change one of those files without reviewing the impact on its dependents, the code may crash when it runs.
However, not every file or line of code is required for each execution. Most of the time, a package can do far more than what your code actually needs. If your code doesn't depend on the files you changed, your project will still run happily.
npm start doesn't download files automatically; npm install does. So files in node_modules are not read-only. However, in many cases node_modules is excluded from git commits. In a server environment, packages are freshly pulled from the registry rather than copied from your local machine, so your changes to packages won't be deployed unless you explicitly arrange for that.
Technically you can modify the files in node_modules and never run npm update - not a good commercial practice, but acceptable for a personal project, if you are the sole programmer and fully control when packages get updated.
Well, if you change a module inside node_modules, an npm update will eventually overwrite your code, and you will lose your changes, possibly without even knowing where the problem is.
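If a local change really does need to survive reinstalls and updates, one common approach (my addition here, not something the answers above prescribe) is the third-party patch-package tool, which records your edits as patch files that get reapplied after every install. A minimal sketch, using a hypothetical dependency called some-lib:

npm install --save-dev patch-package
# edit node_modules/some-lib/index.js as needed, then record the change:
npx patch-package some-lib
# this writes patches/some-lib+1.2.3.patch, which you commit to git;
# adding "postinstall": "patch-package" to the scripts in package.json
# reapplies the patch automatically after each npm install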
Related
I'm using Angular with an ASP.NET MVC project, and I've moved the codebase to another path on my hard drive. When I build, I get errors complaining about not being able to find packages. I don't think this is so much an Angular issue (I'm using the System.js module loader) but rather a Node issue related to finding packages.
The fix so far has been to simply delete everything in node_modules and fetch it all again. Is there a way to avoid having to do this? Otherwise, if I check my code into our source control system and someone else pulls it down, they will run into this issue as well.
[update]
When I make a copy of the project, it includes node_modules as well. I intend to check these into source control too, so that we can control when packages get updated and manage the dependency issues that might result.
[update 2]
Well, I think I need to go back and review what I'm doing. I never liked the idea of keeping node_modules in source control to begin with, and if I can find a way to manage the "breaking changes" caused by package updates, then I can forgo the mess and bloat of keeping node_modules in my source control system.
https://www.sitepoint.com/beginners-guide-node-package-manager/
When someone else pulls the code, or when you move your code to another folder, you have to run an npm install command so it can download the packages.
If you have already downloaded them, don't worry about running npm install again, because it takes them from the cache.
Other people will need to download them the first time.
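In practice the workflow is just this (a sketch; npm ci requires a committed package-lock.json and a reasonably recent npm):

# after cloning or moving a project, restore its dependencies
npm install
# or, for a clean install that exactly matches package-lock.json
# (it deletes node_modules first):
npm ci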
Hope it helps!
I recently helped out on a project where I added a really small dependency; in fact, it contained only a regular expression (https://www.npmjs.com/package/is-unc-path).
The feedback I got from the developer of the project was that he tries to minimize third-party dependencies if they can be implemented easily; if I understand him correctly, he asked me to just copy the code instead of adding another dependency.
To me, adding a new dependency looks just like putting some lines of code into an extra file in the repo. In addition, the developers get informed by an update if the code needs a change.
Is it just a religious conviction that drives a developer to do this? Are there any costs (performance-wise, space-wise, etc.) to adding a dependency?
I once had a dispute with a manager concerning third-party libraries; the problem was even greater because he came to believe that you should version the node_modules folder.
The source of any conflict is usually ignorance.
His arguments were:
you should deliver a working product to the client, without requiring him to do any extra work like npm install
if GitHub or npm is down at the moment you run npm install on the server, what will you do?
if the library you install has a bug, who will be responsible?
My arguments were:
versioning node_modules is not going to work, because of how package dependencies work: each library downloads its own dependencies into its own node_modules, and your git repository quickly grows to hundreds of MB. Deploys become slower and slower; downloading half a GB of code every time takes time. npm uses a module-caching mechanism, so if nothing has changed it will not needlessly re-download code.
the problem with left-pad was painful, but after that npm implemented a lockfile (package-lock.json), and now each package is pinned to an exact version with an integrity hash (see the lockfile sketch after this list).
and GitHub and npm are not single-instance services; they run in the cloud.
when installing a dependency, there are always some considerations, and community best practices usually come down to: 1. Does the repo have unit tests? 2. The download count. 3. When was the latest update?
the Node.js ecosystem is built on modularity; node is not popular by luck, but because of how it was designed for creating and reusing modules. Sometimes working in the Node.js environment feels like putting Lego pieces together to build your toy. This is the main reason development in Node.js is so fast: people just reuse stuff.
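For illustration, a package-lock.json entry looks roughly like this (the values here are made up; the point is that the exact version, the resolved tarball URL, and an integrity hash are all pinned):

"node_modules/left-pad": {
  "version": "1.3.0",
  "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
  "integrity": "sha512-..."
}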
In the end he stuck to his own ideas, and I left the project :D
I assume that when developing an npm project, every git branch (or whatever version control system you use) probably points to a different set of node_modules on the filesystem. Is that true? How does that work? Does it pose any problems for disk space, etc.?
Or perhaps, since node_modules is most commonly .gitignore'd, are the node_modules files shared between git branches? Again, how would/does that work?
*Note that Node.js / npm is fundamentally different from other platforms/languages in that dependencies are typically stored locally to a project, rather than in some central location on a machine.
By convention, one should not commit any files, libraries, or binaries that can be generated or pulled in from an external source. This includes node_modules; since it is made readily available* once you run npm install, there's no reason or incentive** to put it into source control. At worst, it will bloat your repository, filling your diffs with things you simply don't control and don't necessarily want to review.
I would not expect different git branches of an npm project to contain different node_modules folders. I'd expect only the one node_modules folder, and if a branch gave me fits about dependencies, I'd reinstall the dependencies (and note it down, to be sure that something else hadn't gone awry).
As an addendum, any files or folders in .gitignore are simply not indexed or tracked by Git. If the contents of those files or folders change, Git is none the wiser. This also means, when switching between branches, the contents of the files or folders in .gitignore remain the same.
*: Provided that the library you're using isn't suddenly yanked. Or the repository is not impacted by a colossal DDoS.
**: There may be some incentive to do this, given that the reliability of certain npm packages hasn't been 100% this year, but that's a team- and architecture-driven decision, and I doubt that placing node_modules into source control is the most ideal and convenient way to deal with it.
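The conventional setup is a one-line ignore entry, while the manifest and lockfile stay tracked so installs are reproducible:

# .gitignore
node_modules/
# package.json and package-lock.json remain committed, so
# npm install can faithfully regenerate node_modules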
There are two schools of thought, and both have merit.
1) Never check in node_modules and rebuild on deploy/install
This approach relies heavily on npm and the connectivity of your deploy environment. node_modules is downloaded and installed (and/or compiled) each time the deploy runs.
Positives:
Your repository is much smaller.
NPM modules are installed in the environment they will run on.
Concerns:
Tied to a 3rd party for sources - Go read about that whole left-pad thing. If one dependency cannot be downloaded, your entire build system is hung out to dry. "Cranky and paranoid old timers" will cite this as the reason to check everything in (or to run your own private npm registry somewhere).
Branch management - As you mentioned in the question, some branches might not have the same dependencies. Dev1 adds a new feature using a new package. Now Dev2 runs the dev branch, everything is broken, and they need to know to npm install the new package. More subtle is the case where a package's version changed (now you need npm update, as npm install will say nothing has changed), or where node_modules was upgraded to work on "new feature 10" but needs to be cleared out to "downgrade" and fix "prior bug 43". If you are in active development with a team of more than 2-3, watch out for this one (see the shell sketch after this list).
Build Time - If it is a concern, it takes a little longer to download and install everything. Or a lot longer.
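The shell sketch referenced above - the brute-force but reliable way to resync after switching branches (the branch name is hypothetical):

git checkout feature/new-feature-10   # switch to the other branch
rm -rf node_modules                   # clear modules built for the old branch
npm install                           # reinstall what this branch declares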
2) Always check in everything you can
This approach includes node_modules as part of the repo.
Positives:
Not dependent on 3rd-party sources. You have what you need to run. Your code can live on its own forever, and it does not matter if npm is down or a repo is deleted.
Branches are independent, so new features from Dev1 are automatically included when Dev2 switches to that branch.
Deploy time is shorter because not much needs to be installed.
Concerns:
Repository is much larger. Clones of code take longer as there are many more files.
Pull Requests need extra care. If a package is updated (or installed) along with core code, the PR is a mess and sometimes unintelligible: "500 files changed", when really you updated a package and changed two lines of core code. It can help to break it into two PRs - one that is a mess (the package update) and one that is actually reviewable (the core code change). Again, be prepared for this one. The packages will not change too often, but your code review takes a little longer (or a little more care) when they do.
OS-dependent packages can break. Basically, anything that is installed/compiled with gyp can be OS-dependent (among others). Most packages are "pure JS" and, being just scripts, run everywhere. But imagine all your devs run and test on OSX while you deploy to Linux: you cannot check in packages that were compiled on a Mac, because they will not run on Linux. An odd workaround for this is to define most packages as dev dependencies (--save-dev) and the ones that need compiling as normal ("production", --save) dependencies, then run npm install --production so the dev dependencies are not installed (they are already present) but the compiled ones are (see the sketch below).
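A sketch of that workaround (the package names are illustrative: bcrypt is a typical gyp-compiled package, lodash a typical pure-JS one):

# pure-JS packages go in devDependencies (checked in and portable)
npm install --save-dev lodash
# OS-dependent, compiled packages go in regular dependencies
npm install --save bcrypt
# on the deploy target, install (and compile) only the regular dependencies
npm install --production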
Conclusions
It depends. (Don't you hate hearing that all the time? : )
Depending on your team and your concerns, you might take either approach. Both have their merits, and you will decide which is more beneficial to you. Both have drawbacks as well, so be aware of those before you get bitten!
Personally, I ignore node_modules, but I have a different package.json in different branches, and when I switch I reinstall the dependencies.
Two branches having different sets of node modules happens when one branch is in the development phase and the other is your production branch. In such cases the development branch will have more node modules than production. If I am not wrong, any other scenario might get you into trouble.
Pushing node_modules to a remote version control repository is bad practice, so just rely on npm install whenever you clone a branch or pull code, to download any new node modules added to package.json.
Since you don't keep node_modules in your repository, you need to install the node modules again, and each branch might have its own requirements: you might update your server.js with a new dependency, and you also need to make sure the newly added dependencies are present on your production server as well.
To be completely specific:
I am writing a Node.js app that is intended to be a websocket bot for Slack.
A Node project exists that abstracts the majority of the Slack API. (It is NOT an npm module.)
I'm not overly familiar with grunt, etc., but I can get the dependencies to install and use all this code by placing my own mybot.js in the root folder of this git clone and running node mybot.js, with mybot.js based on the files in the example folder.
Committing to my own repository, I don't want to commit any of the aforementioned project code - it's not mine! I do, however, want it as a dependency. Unfortunately, this code by Slack is not an npm module, which would make that easy. The project has a /bin folder and a /src folder full of CoffeeScript, etc., which grunt builds into .js files.
The Slack project code has its own dependencies. In my way of thinking, those are sub-dependencies for me, or cascading dependencies: my project depends on whatever the Slack project depends on.
I would like to be able to update my project with updates (manually, or via a build) from the git repo of the Slack project as needed.
It seems there must be a way to include this project as a dependency and, once built, properly reference its bin and src folder objects (bin/slack, src/message, client, channel, user, etc.) without committing it to my own repository. It would be especially great if it could live in a subfolder separate from my own model definitions. In a way, this seems no different to me than including jQuery in my website layout via a CDN: I'm only asking for the jQuery project, and depending on my link flavor, I can get a specific version or the latest version, etc.
So, it turns out the comment by Ben pointing me to the slack-client npm module on npmjs.com was the help I really needed. I just didn't know how to ask the right question, I think.
And while I hate to look a gift horse in the mouth, a little more than a link, Ben, would probably have saved me another three hours. Perhaps: "It is an npm module, not just a project from GitHub." But thank you, even if it took me a while to decipher what you were saying.
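For what it's worth, even when a project is not published to the registry, npm can install it directly from a git URL, which covers exactly this situation. A hedged sketch with a hypothetical repository and tag:

{
  "dependencies": {
    "slack-client": "git+https://github.com/example/slack-client.git#v1.0.0"
  }
}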
What exactly should I put in .npmignore?
Tests? Stuff like .travis.yml, .jshintrc? Anything that isn't needed when running the module (except the readme)?
I can't find any guidance on this.
As you probably found, npm doesn't really state specifically what should go in there; rather, it has a list of files that are ignored by default. Many people don't even use it, since everything in your .gitignore is ignored by npm by default if .npmignore doesn't exist. Additionally, many files are ignored by default regardless of settings, and some files are always excluded from being ignored, as outlined in the link above.
There is not much official guidance on what should always be in there, because it is basically a subset of .gitignore, but from what I've gathered from using node for five-ish years, here's what I've come up with.
Note: By "production" I mean any time your module is used by someone, rather than being developed on itself.
Pre-release cross-compiled sources
Pros: If you are using a language that compiles to JavaScript, you can precompile before release and not include the .coffee files in your package, while still tracking them in your git repository.
Build file leftovers
Pros: People using things like node-gyp might have object files generated during a build that should never go into the package.
Cons: These should always go into the .gitignore anyway. You must place them here if you are already using a .npmignore file, as from npm's point of view it overrides .gitignore.
Tests
Pros: Less baggage in your production code.
Cons: You cannot run tests in live environments, on the slim chance there is a system-specific failure, such as an out-of-date version of node causing a test to fail.
Continuous integration settings/Meta files
Pros: Again, less baggage. Things such as .travis.yml are not required for using, testing, or viewing the code.
Non-readme docs and code examples
Pros: Less baggage. Some people are of the school of thought that if you cannot express at least minimum viable functionality in your readme, your module is too big.
Cons: People cannot see exhaustive documentation and code examples on their own file system. They would have to visit the repository (which also requires an internet connection).
Github-pages objects
Pros: You certainly don't need to litter your releases with CNAME files or placeholder index.html files if your module serves double duty as a gh-pages repository as well.
bower.json and friends
Pros: If you decide to build your dependencies in prior to release, end users don't need to install bower and then install more things with it. I would personally keep that stuff in the package; when I do an npm install, I should be relying only on npm and no other external sources.
Basically, you should only use it if there is something you wish to keep out of your npm package but checked in to your module's repo. It's not a long list of items, but npm would rather build in the functionality than have people stuck with irrelevant objects in their packages.
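Putting the categories above together, a typical .npmignore might look like this (the entries are illustrative; adjust to your layout):

# tests and CI/meta files
test/
coverage/
.travis.yml
.jshintrc
# pre-release compiled sources and build leftovers
src/*.coffee
build/
# non-readme docs, examples, and gh-pages objects
docs/
examples/
CNAME
index.html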
I agree with lante's short and succinct answer and SamT's big one:
You should not include your tests in your package.
Your package should contain only production runtime files.
That will make your package simpler and faster to download.
My contribution to those answers:
.npmignore is the blacklist way to select package files. In a more practical way, you can instead whitelist the files you want included in your package, using the files field in your package.json:
{
  "files": [
    "lib/",
    "index.js"
  ]
}
I think that's simpler, more future-proof, and has better semantics ;)
Just to clarify: anytime someone does npm install your-library, npm will download all the source files the package includes. The files matched by the .npmignore in the source of your-library are excluded when the lib is published, so users of your-library won't download them.
Know that people installing your library just need your library to run; anything else is unnecessary.
For example, when someone installs a library, they probably don't care about your .travis.yml or .jshintrc files, or even some images, Grunt files, documentation, etc.
.npmignore lets your npm package contain fewer files, making it faster to download.
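A quick way to check exactly what users will download (the tarball name depends on your package's name and version):

npm pack                          # builds the same tarball that npm publish uploads
tar -tzf your-library-1.0.0.tgz   # list every file the package will ship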
Don't include your tests. Oftentimes tests are like 5x the size of the actual codebase. As long as your tests are on GitHub, etc., that's good enough.
But what you absolutely should do is test your NPM package in its published format. Create some smoke tests that reside in the actual codebase, but are not part of the test suite.
You can read about testing your package after tarballing it, here:
https://github.com/ORESoftware/r2g
How to test an `npm publish` result, without actually publishing to NPM?
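A minimal sketch of that idea: pack the tarball, install it into a throwaway directory, and require it (paths and names are illustrative):

npm pack                                    # run from your library's root
mkdir /tmp/smoke && cd /tmp/smoke
npm init -y
npm install /path/to/your-library-1.0.0.tgz
node -e "require('your-library')"           # fails loudly if the published form is broken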