pre-cache node_modules in Docker container - node.js

It frustrates me that CI builds for projects which use Node tool chains such as Grunt and Gulp take quite a long time, the bulk of which is consumed by npm install.
I've tried to set up a Docker image, pre-baked with all of the node_module dependencies in the npm cache (each at the same fixed release as declared in my package.json file), but even then the build still takes a few minutes when all it really should need to do is to copy a few directories from the npm cache into my project's node_modules.
I've set cache-min to 9999999, but it still seems to take much longer than it shoul need to.
I've looked local-npm and npm_lazy but they seem over the top, and the former takes ages to install - I suspect that it's trying to download every single npm module in existence - I only need a limited number and don't need to be running a web server to serve them from within the Docker container.
...am I missing something? There must be a faster way to run a CI build...

I was able to get it to work by using .npmrc to point to the npm cache within the docker container. I would suggest you to docker exec into your container and run npm config list | grep cache to ensure that the cache is used.

Related

Is it mandatory for each deployment to production for remove node modules and run npm install?

I use vuetify (vue)
Is it mandatory for each deployment to production for remove node modules and run npm install? Or just run npm run build?
I have two option :
Option 1 : Every deployment, I run the npm run build directly
Option 2 :
Delete the contents of dist folder
Delete node_modules folder
npm install
npm run build
Which is the best option?
npm install
This command installs a package, and any packages that it depends on. If the package has a package-lock or shrinkwrap file, the installation of dependencies will be driven by that, with an npm-shrinkwrap.json taking precedence if both files exist. See package-lock.json and npm-shrinkwrap.
If you did not install or update the package before releasing the project, you do not need to execute npm install, otherwise, you need to execute it to ensure that dependent packages on the production environment is consistent with your local dependent package version.
If you are using an automatic build deployment tool like jenkins, for convenience you can execute the install command before each build. It's okay.
Imagine more environments, not just a production:
development
testing1
staging
uat
production
Can we upload the npm run build result (compressed js) or node_modules to our git repository? ANSWER IS NOT!!. So if you need to have a version of your app running in any of these environments, you must to execute npm run build. And this command needs the classic npm run install. I think this last sentence, answer your question.
(ADVICE) Docker to the rescue
assumption 1 your client-side app (vue) is not complex(no login, no session, no logout, etc ), you could publish it using a basic nginx, apache, basic-nodejs.
assumption 2 you are able to have one more server for docker private repository. Also if you are in google, amazon o azure, this service is ready to use, of course a payment is required
In one line, with docker you must execute just one time npm install and npm run build. Complete flow is:
developer push some changes to the git repository
manually or automatically a docker build in launched.
inside Dockerfile, npm install and npm run build is executed. Also a minimal server with nodejs (example) is configured pointing to your builded assets
your new docker image is uploaded to your docker private repository
that's all
If your quality assurance team needs to perform some tests to your new app, just a docker image download is required. If everything is ok, you pass to the next stage (staging or uat) or production. Steps will be the same: just download the docker image.
Optimizations
Use docker stages to split build and start steps
If your app does not have complex flows(no login, no session, no logout, etc ), replace node basic server with a simple nginx
I need login and logout
In this case, nginx or apache does not helps you because they are a simple static servers.
You could use a minimal nodejs code like this:
https://github.com/jrichardsz/nodejs-static-pages/blob/master/server.js
Adding /login , /logout, etc
Or use my server:
https://github.com/utec/geofrontend-server
which has a /login, /logout and other cool features for example: How are you planning to pass your backend api urls to your vue app in any of your environments?.

Continuous integration: Where to build the project?

I have a Jenkins server on which I observe a private git repository for changes, which then triggers a pipeline script (the repository contains a nodejs app). In this pipeline script I need to do the following steps:
Install dependencies (npm install)
Build my application (npm run build, which creates a dist folder)
Build a docker container (docker build) and run the container (which runs a script in the dist folder)
Which of the following two options would be the recommended way to do this, and why?
Option A: Run npm install and npm run build in the jenkins pipeline and copy the dist folder to the docker container during the docker build. This would allow me to only install runtime dependencies in the docker container using npm install --only=production, therefore reducing the image size significantly.
Option B: Run npm install and npm run build during docker build (In the Dockerfile). This would allow me to run the docker container outside the CI server if I have to (I don't have a use case for it now, but it seems cleaner because it is more independent). However, the image size would significantly increase and I am not sure if this is the recommended way.
Any suggestions?
I would choose option B.
The reason behind it would be that there are some npm packages that runs a node-gyp, gcc, and other platform-dependent builds.
Look at the popular bcrypt package as an example.
Going with option A would mean that your docker and Jenkins machine need to hold the same infra for such builds which is not common, to say the least.

Nodejs private module and Docker containers

I've got a nodejs project that references a module I wrote and hosted by private github repo. Dependencies in package.json look something like this:
"dependencies": {
... other stuff ...
"my_module": "git+https://github.com/me/mymodule.git",
}
That's fine, but I'd like to create a Docker container for the application, but I don't want git inside the container. I know I can host via private npm repos, but I'd love to find a way to have the build process pull the source (including that module) and then copy it to the container.
I'm fine with doing an npm install in the container, but it will not like the git dependency. Alternatively, I don't want to do an npm install on the build machine because I want the freedom to choose any container I want... I don't want the build machine to snag windows binaries to a mongo module, for example, and copy those to my debian container.
One option I considered was putting the dependency to "my_module" in devDependencies, then within the Docker container do "npm install --production", then copy the one module over. That's just inconsistent with the intent of devDependencies.
Any better/recommended solutions? I'm open to not hosting the module in github if there's a better way (but I use it on a few projects that only make sense for this client).
Theres a pretty easy solution to this. Build the node application
npm install etc
Then in your dockerfile include the COPY command, telling it where the node projects install directory is, and where you want it to copy to.
Edit:
To address the issue brought up by #angelok you should use npm rebuild once it's copied into the docker image so that it builds with the correct dependencies relative to the OS of the Docker image instead of the OS in which the node packages were initially installed. See docs for rebuild here.

Docker and node_modules - put them in a layer, or a volume?

I'm planning a docker dev environment and doubtful whether running npm install as a cached layer is a good idea.
I understand that there are ways to optimize dockerfiles to avoid rebuilding node_modules unless package.json changes, however I don't want to completely rebuild node_modules every time package.json changes either. A fresh npm install takes over 5 minutes for us, and changes to package.json happen reasonably frequently. For someone reviewing pull requests and switching branches quite often, they could have to suffer through an infuriating amount of 5 minute npm installs each day.
Wouldn't it be better in cases like mine to somehow install node_modules into a volume so that it persists across builds, and small changes to package.json don't result in the entire dependency tree being rebuilt?
Yes. Don't rebuild node_modules over and over again. Just stick them in a data container and mount it read only. You can have a central process rebuild node_modules now and then.
As an added benefit, you get a much more predictable build because you can enforce that everyone uses the same node modules. This is critical if you want to be sure that you actually test the same thing that you're planning to put in production.
Something like this (untested!):
docker build -t my/module-container - <<END_DOCKERFILE
FROM busybox
RUN mkdir -p /usr/local/node
VOLUME /usr/local/node
END_DOCKERFILE
docker run --name=module-container my/module-container
docker run --rm --volumes-from=module-container \
-v package.json:/usr/local/node/package.json \
/bin/bash -c "cd /usr/local/node; npm install"
By now, the data container module-container will contain the modules specified by package.json in /usr/local/node/node_modules. It should now be possible to mount it in the production containers using --volume-from=module-container.

Speeding up the npm install

I am trying to speed up the npm install during the build process phase. My package.json has the list of packages pretty much with locked revisions in it. I've also configured the cache directory using the command
npm config set cache /var/tmp/npm-cache --global
However, on trying to install using npm install -g --cache, I find that this step isn't reducing the time to install by just loading the packages from cache as I would expect. In fact, I doubt if it's even using the local cache to look up packages first.
Proposing two more modern approches:
1) npm ci
Use npm ci, which is available from npm version 5.7.0 (although I recommend 5.7.1 and upwards because of the broken release) - this requires package-lock.json to be present and it skips building your dependency tree off of your package.json file, respecting the already resolved dependency URLs in your lock file.
A very quick
boost for your CI/CD envs (our build time was cut down to a quarter of the original!) and/or to make sure all your developers sit on the same versions of dependencies during development (without having to hard-code strict versions in your package.json file).
Note however that npm ci removes the node_modules/ directory before installing, so it won't benefit from any caching strategies.
2) npm i --prefer-offline
Use the --prefer-offline flag with your regular npm install / npm i. With this approach, you need to make sure you've cached your node_modules/ directory between builds (in a CI/CD environment). If it fails to find packages locally with the specific version, it falls back to the network safely.
You can also add --no-audit --progress=false to reduce pre-install checks and remove the progress bar (latter is only a very slight improvement)
For pure npm solution, you may try
npm install --prefer-offline --no-audit --progress=false
Prefer offline may not be useful for the first run.
As suggested by #Daniel Serodio
You could also include your node_modules folder inside your repository but you should probably zip it first than add to repo, and while installing you can unzip it and just
npm rebuild
(which works cross platform) it is quite fast.
This would also give you the benefit of full control over all your dependencies.
Also you can set the process flag to false to increase your speed by 2x.
npm set progress=false
Read source for more info
Update:
You can also use pnpm for this
npm i -g pnpm
This basically use local cached modules (i have heard its better then YARN)
It's better to install pnpm package using the following command:
npm i -g pnpm
pnpm uses hard links and symlinks to save one version of a module only ever once on a disk. When using npm or Yarn for example, if you have 100 projects using the same version of lodash, you will have 100 copies of lodash on disk. With pnpm, lodash will be saved in a single place on the disk and a hard link will put it into the node_modules where it should be installed.
As an example I can mention that whenever you want to install the dependencies of package.json file, what you should do is simply that enter the pnpm i and it handles the other things by itself.
UPDATE: The original answer is from 2014. I wouldnt recommend checking in node_modules, as there are definitly better options around speeding up the install especially for a ci pipeline, eg. npm ci --only=production
You could also include your node_modules folder inside your repository (you are probably using git), and just npm rebuild (which works cross platform) on build/deploy processes, and is pretty fast.
This would also give you the benefit of full control over all your dependencies (I know that's what shrinkwrap usually should be used for)
Edit:
Also you can set the progress flag to false to increase your speed by at least 20%. This works only with npm#v3.x.x, and there will be hopefully fixes for that soon (see second link)
npm set progress=false
Tweet about finding
Github Issue Cause identification
As very modern solution you can start to use Docker.
Docker allows you virtualize and pre-define as image the current state of your code, including installed npm-modules and other goodies.
Once the docker image for your infrastructure/env is built locally, or retrieved from remote repository, it will be stored on the host machine, and you can spin server in seconds.
Another benefit of it is that you use same virtualized code infrastructure on any machine where you deploy your code.
Docker speeds up install/deployment processes and is widely used technology.
To start using docker is enough to (all the snippets are just mock/example for pre-setup and are not by any means most robust/elegant solution) :
1) Install docker and docker-compose using manuals and get some basic understanding of it at docker.com
2) Write Dockerfile file in root of your application
FROM node:6.9.5
RUN mkdir /usr/local/app
WORKDIR /usr/local/app
COPY package.json package.json
RUN npm install
3) create docker-compose.yml in the root of your project with such content:
version: "2"
server:
hostname: server
container_name: server
image: server
build: .
command: sh -c 'NODE_ENV=development PORT=8080 node app.js'
ports:
- "8080:8080"
volumes: #list of folders and files to use
- ${PWD}/server:/usr/local/server
- ${PWD}/app.js:/usr/local/app.js
4) To start server you will need to docker-compose up -d. To see the logs docker-compose logs -f server. If you will restart your server it will do it in seconds once it built the image already at once.
Then it will cache build layers locally so next run will take only few seconds.
I know this might be bit of a robust solution, but I am sure it is have most potential/flexibility and is widely used in industry. And while it requires some learning for anyone who did not use Docker before, in my humble oppinion, it is the best one for your problem.
Nothing helped me more than disabling antivirus (Windows Defender in my case) I got from 2:30 to 1 minute.
With npm-cache package I got to ~30 secs.
I tried to use yarn, which is very fast, but was randomly failing in my case.
We have been trying to solve this problem to speed up our deployments.
We have settled on using pac, which follows the principles in the other answers. It zips the npm modules and includues them in your repo so you don't have a million files in your commits and code reviews and you can just unzip/rebuild for the target machine.
https://www.npmjs.com/package/pac

Resources