How to collaborate with multiple npm publishers on one module - node.js

How can we organize our file system and processes if there are multiple publishers on an npm module? Do we need a common repository (e.g. Git), or is there a smart way to use npm's own publishing & updating process?
The main issue I can't get my head around is that the initial publisher of the package is not able to get the latest version from within the package itself, is he? Unless he installs it as a dependency of another package and then updates & publishes from within that dependency.

It's certainly possible to use npm in this way, although it's probably not what it's intended for. The initial publisher of the package will create the necessary structure, e.g. using npm init, then do an npm publish to make it available to your other publishers.
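For illustration, a minimal sketch of that flow (the user and package names here are placeholders):
npm init                        # scaffold the package.json
npm publish                     # publish the first version to the registry
npm owner add alice my-package  # give a co-publisher the right to publish
Each listed owner can then make changes, bump the version (e.g. with npm version patch), and run npm publish themselves.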
A couple of things to consider. First, unless you have a tightly controlled system for taking turns to edit, you'll almost certainly get merge conflicts when multiple publishers make changes simultaneously. A version control system like Git can help resolve these conflicts.
Secondly, you may not want your intermediate versions to be publicly available, for many reasons - it's likely you'll want to push out some (possibly incomplete) changes for your co-publishers to build on. Or you just might not want your code to be out in the wild during development. So you may wish to consider a private repository if you do go down this route - e.g. Sinopia, or one of the hosted solutions.
Hope that helps. For info, I use a combination of Mercurial (for version control) and a private npm repo (Sinopia).

Related

When to add a dependency? Are there cases where I should rather copy the functionality?

I recently helped out on a project where I added a really small dependency - in fact, it only contained a regular expression (https://www.npmjs.com/package/is-unc-path).
The feedback I got from the developer of the project was that he tries to minimize third-party dependencies if the functionality can be implemented easily; in other words, if I understand correctly, he was asking me to just copy the code instead of adding another dependency.
To me, adding a new dependency looks just like putting some lines of code into an extra file in the repo. In addition, the developers will be notified by an update if the code needs a change.
Is it just a religious conviction that drives a developer to do this? Or are there actual costs (performance-wise, space-wise, etc.) to adding a dependency?
I once had some disputes with one of my managers concerning third-party libraries; the problem was even greater because he had come to believe that you should version the node_modules folder.
The source of any conflict is usually ignorance.
His arguments were:
you should deliver a working product to the client, without requiring them to do any extra jobs like npm install
if GitHub or npm is down at the moment you run npm install on the server, what will you do?
if a library that you install has a bug, who will be responsible?
My arguments were:
versioning node_modules is not going to work because of how package dependencies work: each library downloads its own node_modules dependencies, so your git repository will quickly grow to hundreds of MB, and deploys will get slower and slower, since downloading half a GB of code each time takes time. npm uses a module caching mechanism, so if nothing has changed it will not download code needlessly.
the problem with left-pad was painful, but after that npm implemented a locking system, and now you lock each package to a specific version and integrity hash (see the snippet after this list).
and GitHub and npm are not single-instance services; they run in the cloud.
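For illustration, a package-lock.json entry looks roughly like this (the integrity hash is elided) - note that it pins a version and a content hash rather than a git commit:
"left-pad": {
  "version": "1.3.0",
  "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
  "integrity": "sha512-..."
}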
When installing a dependency you always have some criteria in mind, and there are community best practices; usually they boil down to:
1. Does the repo have unit tests?
2. The download count.
3. When was the latest update?
The Node.js ecosystem is built on modularity; node is not popular because of luck, but because of how it was designed for creating and reusing modules. Sometimes working in the node.js environment feels like putting Lego pieces together and building your toy. This is the main reason for super-fast development in node.js: people just reuse stuff.
In the end he stuck to his own ideas, and I left the project :D

Does every git branch of an NPM project have different node_modules dependencies?

I assume that when developing an NPM project, every git branch (or whatever version control system you use) probably points to a different set of node_modules on the filesystem. Is that true? How does that work? Does it pose any problems for disk space, etc.?
Or perhaps, since node_modules is most commonly .gitignore'd, the node_modules files are shared between Git branches? Again, how would/does that work?
*Note that Node.js / NPM is fundamentally different from other platforms/languages in that dependencies are typically stored locally to a project rather than in some central location on a machine.
By convention, one should not add any files, libraries or binaries which can be generated or pulled in from an external source. This includes things like node_modules; since that is made readily available* once you do npm install, there's no reason or incentive** to want to put it into your source control. At worst, it will bloat your repository, filling your diffs with things you simply don't control and don't necessarily want to review.
I would not expect different Git branches of an NPM project to contain different node_modules folders. I'd only expect the one node_modules folder, and if a branch gave me fits about dependencies, I'd look to reinstall the dependencies (and note it down to be sure that something else hadn't gone awry).
As an addendum, any files or folders in .gitignore are simply not indexed or tracked by Git. If the contents of those files or folders change, Git is none the wiser. This also means, when switching between branches, the contents of the files or folders in .gitignore remain the same.
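For reference, the conventional .gitignore entry is a single line:
node_modules/
With that in place, switching branches leaves whatever is currently installed on disk untouched; only a fresh npm install changes it.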
*: Provided that the library you're using isn't suddenly yanked. Or the repository is not impacted by a colossal DDoS.
**: There may be some incentive to do this given that the reliability of certain NPM packages hasn't been 100% this year, but that's a team and architecture-driven decision, and I doubt that placing it into source control is the most ideal and convenient way to deal with it.
There are two schools of thought, and both have merit.
1) Never check in node_modules and rebuild on deploy/install
This approach relies heavily on npm and the connectivity of your deploy environment: node_modules is downloaded and installed (and/or compiled) each time the deploy runs.
Positives:
Your repository is much smaller.
NPM modules are installed in the environment they will run on.
Concerns:
Tied to 3rd party for sources - Go read about that whole left-pad thing. If one dependency cannot be downloaded, your entire build system is hung out to dry. "Cranky and paranoid old timers" will cite this as the reason to check everything in (or run your own private NPM somewhere).
Branch management - Like you mentioned in the question, some branches might not have the same dependencies. Dev1 adds a new feature and uses a new package. Now Dev2 runs the dev branch or whatever, and everything is broken; they need to know to npm install the new package. More subtle is the case where an npm package's version has changed (now you need npm update, as npm install will say nothing has changed), or where their node_modules was upgraded to work on "new feature 10" but they need to clear everything out to "downgrade" and go fix "prior bug 43". If you are in active development with a team of more than 2-3, watch out for this one.
Build Time - If it is a concern, it takes a little longer to download and install everything. Or a lot longer.
2) Always check in everything you can
This approach includes node_modules as part of the repo.
Positives:
Not dependent on 3rd party sources. You have what you need to run. Your code can live on its own forever, and it does not matter if npm is down or a repo is deleted.
Branches are independent, so new features from Dev1 are automatically included when Dev2 switches to that branch.
Deploy time is shorter because not much needs to be installed.
Concerns:
Repository is much larger. Clones of code take longer as there are many more files.
Pull Requests need extra care. If a package is updated (or installed) along with core code, the PR is a mess and sometimes unintelligible. "500 files changed", but really you updated a package and changed two lines of core code. It can help to break this down into two PRs - one that is a mess (the package update) and one that is actually reviewable (the core code change). Again, be prepared for this one. The packages will not change too often, but your code review takes a little longer (or a little more care) when they do.
OS-dependent packages can break. Basically anything that is installed/compiled with gyp can be OS-dependent (among others). Most packages are "pure JS" and, being just scripts, run everywhere. But imagine all your devs run and test on OS X while you deploy to Linux: you cannot check in packages that were compiled on a Mac, because they will not run on Linux. An odd workaround for this is to define most packages as "dev dependencies" (--save-dev) and the ones that need compiling as normal ("production", --save), then run npm install --production so the dev dependencies are not installed (they are already present), but the others are (see the sketch below).
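As a sketch of that workaround, with bcrypt standing in for any gyp-compiled module (the versions are illustrative):
"dependencies": {
  "bcrypt": "^0.8.5"
},
"devDependencies": {
  "lodash": "^4.17.0"
}
On the deploy target, npm install --production then compiles and installs only bcrypt for that OS, while the pure-JS packages under devDependencies are already present in the checked-in node_modules.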
Conclusions
It depends. (Don't you hate hearing that all the time? : )
Depending on your team and your concerns, you might go either approach. Both have their merits, and you will decide which is more beneficial to you. Both have drawbacks as well, so be aware of those before you get bit!
Personally I ignore node_modules, but I have a different package.json in each branch, and when I switch I reinstall the dependencies.
Two branches having different sets of node modules happens in the scenario where one branch is in the development phase and the other is your production branch; in such cases the development branch will have more node modules than production. If I'm not wrong, any other scenario might get you into trouble.
Pushing node_modules to a remote version control repository is bad practice, so just rely on npm install whenever you clone a branch or pull the code, to download any new node modules added to package.json.
Since you don't have node_modules in your actual repository, you need to install the node modules again, and each branch might have its own requirements: you might update your server.js with a new dependency, and you also need to make sure these newly added node dependencies end up on your production server as well.

How to deal with local package dependencies in nodejs with npm

How should we deal with local packages that are a dependency in other local packages?
For simplicity's sake, say we have the following packages
api - express application
people - a package to deal with people
data-access - a package that deals with data access
And then the dependencies are
api depends on people
people depends on data-access
Currently we have these dependencies setup as file dependencies.
I.e. api package.json would have
"dependencies": {
"people": "file:../people"
}
Trouble with this is that we're finding it a PITA when we make updates to one package and want those changes in the other packages that depend on it.
The options we have thought of are:
npm install - but this won't overwrite previously installed packages if changes are made, so we have to delete the old one from the node_modules directory and re-run npm install... which can be niggly if the package dependency is deep.
npm link - we're not sold on the idea because it doesn't survive version control... Just thinking about it now, maybe we could have some kind of local build script that would run the npm link commands for us - that way it could survive version control (see the sketch after this list). Would that be a grunt job?
grunt - we haven't dived too deep into this one yet, but it feels like a good direction. A little bit of googling and we came across this: https://github.com/ahutchings/grunt-install-dependencies
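For reference, a sketch of what such a script could run, using the example packages from the question (npm link with no argument registers the current package globally; with an argument it symlinks that package into the current node_modules):
cd data-access && npm link
cd ../people && npm link data-access && npm link
cd ../api && npm link people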
So, what option would work best for our situation?
Are there other options that we haven't thought of yet?
P.S. we're a .NET shop doing a PoC in node, so assume we know nothing!
P.P.S. if you strongly believe we're setting up our project incorrectly and we shouldn't have smaller individual packages, let me know in the comments with a link to some reading on the subject.
So, I agree that going with 'many small packages' is usually a good idea. Check out 12factor.net if you haven't already.
That said, in specific answer to your question I'd say your best bet is to consider mainly how you want to maintain them.
If the 'subcomponents' are all just parts of your app (as, for example, data-access implies), then I'd keep them in the same folder structure, not map them in package.json at all, and just require them where you need them. In this case, everything versions together and is part of the same git repository.
If you really want to or need to keep them all in separate git repositories, then you can do npm link, but to be honest I've found it more useful to just use the URL syntax in package.json:
"dependencies": {
  "people": "git://path.to.git/repo#version.number"
}
Then, when you want to explicitly update one of your dependencies, you just have to bump the version number in your package.json and run npm install again.

Does the case for not including Node modules in version control also apply to Composer packages?

In doing research on whether Node's node_modules should be checked into your version control repository, the most recent consensus seems to be that you should include it for applications you deploy.
Sources:
Check in node_modules vs. shrinkwrap
Should I check in node_modules to git when creating a node.js app on Heroku?
https://www.npmjs.org/doc/misc/npm-faq.html#Should-I-check-my-node_modules-folder-into-git
Reading these arguments led me to question whether Composer's vendor directory should also be checked into version control. Composer's documentation suggests that you don't:
Should I commit the dependencies in my vendor directory?
The general recommendation is no. The vendor directory [...] should be added to .gitignore/svn:ignore/etc.
The best practice is to then have all the developers use Composer to install the dependencies. Similarly, the build server, CI, deployment tools etc should be adapted to run Composer as part of their project bootstrapping.
While it can be tempting to commit it in some environment, it leads to a few problems:
Large VCS repository size and diffs when you update code.
Duplication of the history of all your dependencies in your own VCS.
[...]
Contrasting that argument is this one (source):
Doesn’t checking in node_modules create a lot of noise in the source tree that isn’t related to my app?
No, you’re wrong, this code is used by your app, it’s part of your app, pretending it’s not will get you in to trouble. You depend on other people’s code and they are just as likely to write new bugs as you are, probably more so. Checking all of that code in to source control gives you a way to audit every line that ever changed in your application. It allows you to use $ git bisect locally and be ensured that it’s the same as in production and that every machine in production is identical. No more tracking down unknown changes in dependencies, all the changes, in every line, are viewable in source control.
In summary, the question is this: Why would one gitignore (i.e. not version control) node_modules but not do the same for Composer's vendor/ directory?
The reasons to commit external dependencies are:
it's easier to deploy with git pull
the code used is directly included in the commit anyone checks out
Arguments against this are:
Git is not a deployment tool
all dependency managers have a way to make sure the exact code used can be fetched again
I don't know anything about npm, but for Composer that last point is implemented by committing composer.lock.
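A minimal sketch of that workflow:
composer install    # resolves dependencies and writes composer.lock
git add composer.json composer.lock
git commit -m "lock dependency versions"
# later, on another machine or a CI server:
composer install    # installs exactly the versions recorded in composer.lock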
I don't think the "audit code" argument is valid in every case. If you write software that itself gets audited, and consequently needs all its libraries audited, then probably all code changes do need to be preserved. That isn't true in the general case.
git bisect still works fine with a committed composer.lock. It does require installing the dependencies at every bisect step, but that is just one simple step, probably already done in the automatic test suite run anyway.
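For example, the install step can be folded into the bisect run itself (run-tests.sh is a hypothetical test script):
git bisect start HEAD v1.0    # bad revision, then last known-good revision
git bisect run sh -c "composer install --quiet && ./run-tests.sh"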
The last thing to worry about is when packages you use go offline. This really is more of a problem with Composer, because there is no central hosting of the downloadable releases (npm probably does this). If this is a problem, either commit the code (and try to figure out how to update that missing package in the future), or set up an instance of Satis to create a local copy of the code you use.
Putting all your modules in your VCS makes it really heavy to download or upload. For example, I work on two node.js projects, and the total size of the node_modules directories is between 250MB and 500MB, whereas the whole codebase with assets is generally less than 40MB. Every Node.js developer likes Node's lightness, so the code must stay easy to download and share.
For the second point, an alternative way to avoid regressions is to be more restrictive about the dependency versions in your package.json. You will find more information here.
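For example, pinning an exact version rather than a semver range (express is used here purely for illustration):
"dependencies": {
  "express": "4.13.3"
}
With a range like "^4.13.3", npm may install any 4.x release at or above 4.13.3; the exact "4.13.3" always resolves to the same code.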
Finally, the best argument is to take a look at the famous modules everybody knows:
express ignores node_modules
mocha ignores node_modules
q ignores node_modules
...
The more I research this the more I'm starting to think that there is no one correct answer to this, just different opinions as well as pros and cons of each method.
This blog, looking from the context of Bower, does a good job of weighing the pros and cons of each: http://addyosmani.com/blog/checking-in-front-end-dependencies/.
In short: At least for right now, weigh the pros and cons and decide what best fits your situation.

Can I run a private npm repository without replicating the public repository?

I'm writing a number of pieces of code (for internal use) using node.js and want to store the modules (packaged up for npm) in a package repository, for distribution to the various machines they will be installed on.
Ideally, I'd like a solution similar to Debian's apt repositories in which I can run a private repository server and configure npm to use a list of repositories to install from (When installing "foo", if "foo" is known by my private server install it from there, otherwise install it from the public server).
However, it looks like the npm registry configuration key only accepts a single URL.
Is there a way to achieve what I want?
The closest I've been able to find have been:
Mirroring the public repository locally and adding my packages on top of it… but I don't want to keep that amount of data (2.5G and still downloading) replicated on AWS.
Hosting all my packages in git repositories and installing from there (which is more of a hassle).
Hosting static packages on HTTP (as far as I can tell, this would prevent me from automatically getting "the latest version"). I suppose I could do something with symlinks, but that is still less flexible than git, requires full URLs (which need to be kept up to date), and doesn't give a searchable repository.
I just set this up for my work. Here's what I did:
Set up an empty npm registry: I followed the instructions from this fork of npmjs.org, which adds much improved documentation.
Set up Kappa: I used Kappa, a great npm proxy from PayPal. (I'm guessing they have a very similar use case to most people who want a private repository; this was exactly what I wanted.)
Set up npm_lazy (optional): I wanted a nice cache of frequently used packages in case npmjs.org went down, so I added npm_lazy in front of the whole thing as a caching layer.
The whole thing took two days(ish) to get up and running. As a side note, if you're worried about people pushing to the public registry by accident, I recommend adding this to your package.json:
"publishConfig": { "registry": "http://my-registry.example.com" },
This really is just a bit of paranoia; once you set up your npm to point at your Kappa/npm_lazy instance, Kappa handles publishing to your private repository for you.
Note: Kappa will only ever publish to the first repository in its config. If you need to publish to both your private registry and the public one, you will need to work out your own solution.
In your package.json, you can use any URL that points to a valid npm-packed module. I use an S3 bucket with a name that is hard to guess.
npm pack
s3cmd put *.tgz s3://path-to-your-bucket
S3 is just an example; you could use any means of placing a file on a web server, and it can even be protected via basic auth.
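Consumers can then point their package.json straight at the tarball (the bucket and module names here are hypothetical):
"dependencies": {
  "my-module": "https://my-secret-bucket.s3.amazonaws.com/my-module-1.0.0.tgz"
}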
I believe PayPal's Kappa project would suit your need.
Here is an article describing PayPal's Kraken project and how Kappa fits in.
I understand it wasn't available at the time of Quentin's question, but perhaps this will be useful for others that come along here.
npm-registry-client GitHub issue #42 lists several ways to create your own repository mirror, namely:
https://github.com/cnpm/cnpmjs.org
https://github.com/rlidwka/sinopia
https://github.com/rlidwka/sinopia#similar-existing-things
https://github.com/davglass/registry-static
Overall, it seems to me that you can get the best answers by searching through the issues in repositories owned by https://github.com/npm, or by asking your question there.
Based only on listening to a recent episode of NodeUp (#37?), I think you may want to have a look at irisnpm. From what I remember it's a service which gives you a merged set of the public modules and your own private modules.
As Dominic Barnes suggested, we could replicate only the _design documents (CouchDB table schemas):
How to replicate design documents only?
However, you would still need to check whether any of the actual data is needed.
You could replicate the modules you need and then write a proxy server which looks for a module in your replication. If a module doesn't exist, it could pipe the request to NPM and return the result from there.
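A bare-bones sketch of that idea in Node (the local-replica lookup is left as a stub; everything here falls through to the public registry):
var http = require('http');
var https = require('https');

http.createServer(function (req, res) {
  // A real implementation would first look the module up in the local
  // replica and serve it from there if found. Fallback: proxy to npm.
  https.get('https://registry.npmjs.org' + req.url, function (upstream) {
    res.writeHead(upstream.statusCode, upstream.headers);
    upstream.pipe(res);
  });
}).listen(4873);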
