How to cache node_modules on Docker build with version?

To cache node_modules, I add package.json first and then run npm install inside the Docker image, which works great. But I also need to keep a version inside package.json, and on each deploy/build I increment the version number.
Because package.json has changed, Docker does not cache node_modules.
How can I cache node_modules in this scenario?
FROM node
# If needed, install system dependencies here
# Add package.json before rest of repo for caching
ADD package.json /app/
WORKDIR /app
RUN npm install
ADD . /app
# If needed, add additional RUN commands here

You can achieve this caching by comparing a BUILD_VERSION build argument against the package.json version.
ARG BUILD_VERSION=0.0.0
Give BUILD_VERSION a default value, and keep it equal to the package.json version whenever you want to skip the npm installation step.
For example, if the version in package.json is 0.0.0, the build version should also be 0.0.0 to skip installation.
FROM node:alpine
WORKDIR /app
ARG BUILD_VERSION=0.0.0
COPY package.json /app/package.json
RUN echo BUILD_VERSION is $BUILD_VERSION and package.json version is $(node -e "console.log(require('/app/package.json').version);")
RUN if [ "${BUILD_VERSION}" != "$(node -e "console.log(require('/app/package.json').version);")" ]; then \
        echo "ARG version and package.json version differ, installing node modules"; \
        npm install; \
    else \
        echo "npm installation process skipped"; \
    fi
To skip the npm installation during the build, run the build command with:
docker build --no-cache --build-arg BUILD_VERSION=0.0.0 -t test-cache-image .
Now, if you want to install node_modules, just change the build argument; it works as you expect, but with more control than relying on Docker's cache tracking:
docker build --no-cache --build-arg BUILD_VERSION=0.0.1 -t test-cache-image .
This will install node_modules whenever the package.json version does not match BUILD_VERSION.
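If you don't want to look up the version by hand, one convenience (my addition, not part of the original answer) is to read it from package.json on the host, so installation is skipped even right after a version bump:

docker build --no-cache \
    --build-arg BUILD_VERSION=$(node -p "require('./package.json').version") \
    -t test-cache-image .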

Related

What is the optimal way to exploit Docker multi-stage builds?

I'm using multi-stage builds in my Dockerfile (the first stage is a BUILD, and the second is the RUN).
I want to know if, in my second stage, I should copy the node_modules folder or run an npm i. What is the optimal way?
Note: all the apk packages that I install in the first stage are required to run npm ci properly (I had many errors: node-gyp, etc.)
# Build container stage
FROM node:alpine AS BUILD_IMAGE
RUN apk --no-cache add -u --virtual build-dependencies \
    g++ gcc libgcc libstdc++ linux-headers make python3
WORKDIR /app
COPY package*.json ./
RUN npm ci && npm cache clean --force && apk del build-dependencies
COPY . .
RUN npm run lint
RUN npm run tsc
RUN npm prune --production
# Run container stage
FROM node:alpine AS app
WORKDIR /app
COPY package*.json ./
# Should I copy the `node_modules` folder or
# should I run an `npm i` ? What is the optimal method?
COPY --from=BUILD_IMAGE /app/dist ./dist
COPY --from=BUILD_IMAGE /app/node_modules ./node_modules
# Clean dev packages
EXPOSE 8080
# Run the container with a non-root User
USER node
CMD [ "node", "dist/src/app.js" ]
I always run npm install (or npm ci) inside Docker when building the image, because if you copy node_modules from your dev environment into the image, then depending on the OS you develop on, many of the binaries inside node_modules will be wrong (Windows/Mac => Linux); likewise, if you develop on Ubuntu and your image is based on Alpine, you will have problems.
The best option is always to make a layer just for the node_modules build and then build the application layer on top of it => you get cache hits and faster builds (a minimal sketch follows the note below).
Note: if you want to copy node_modules from Windows, use WSL2 (Ubuntu) and base your images on Ubuntu (Debian); then you don't need to worry about those errors.
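A minimal sketch of that layering, assuming the app's entry point is dist/src/app.js as in the question (stage name and flags are illustrative; older npm versions use --only=production instead of --omit=dev):

# Dependency layer: rebuilt only when package*.json changes
FROM node:alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Application layer on top of the cached dependency layer
FROM node:alpine
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
USER node
CMD ["node", "dist/src/app.js"]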

Run npm update in docker without using the cache on that specific update

Background:
I'm writing code in Node.js, using npm and Docker. I'm trying to get my Dockerfile to use the cache when I build it so it doesn't take too long.
We have a "common" repo that we use to keep logic that is used in a variety of repositories, and this gets propagated as npm packages.
The problem:
I want the Dockerfile NOT to use the cache for my "common" package.
Docker file:
FROM node:12-alpine as X
RUN npm i npm@latest -g
RUN mkdir /app && chown node:node /app
WORKDIR /app
RUN apk add --no-cache python3 make g++ tini \
    && apk add --update tzdata
USER node
COPY package*.json ./
COPY .npmrc .npmrc
RUN npm install --no-optional && npm cache clean --force
ENV PATH /app/node_modules/.bin:$PATH
COPY . .
package.json has these lines:
"dependencies": {
  "@myorg/myorg-common-repo": "~1.0.13",
I have tried adding these lines in a variety of places and nothing seems to work:
RUN npm uninstall @myorg/myorg-common-repo && npm install @myorg/myorg-common-repo
RUN npm update @myorg/myorg-common-repo --force
Any ideas on how I can get Docker to build and not use the cache for @myorg/myorg-common-repo?
So I finally managed to solve this using this answer:
What we want to do is invalidate the cache for a specific block in the Dockerfile and then run our update command. This is done by adding a build argument to the command (CLI or Makefile) like so:
docker-compose -f docker-compose-dev.yml build --build-arg CACHEBUST=0
and then adding this additional block to the Dockerfile:
ARG CACHEBUST=1
USER node
RUN npm update @myorg/myorg-common-repo
This does what we want.
ARG CACHEBUST=1 invalidates the cache whenever the build argument's value changes, so the npm update command runs fresh instead of being served from cache.
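For context, a sketch of where that block might sit in the question's Dockerfile (the placement is my assumption, not from the original answer): everything above the ARG stays cached, and only the layers from the ARG down are rebuilt when CACHEBUST changes.

COPY package*.json ./
COPY .npmrc .npmrc
RUN npm install --no-optional && npm cache clean --force
# Layers below this ARG are rebuilt whenever CACHEBUST's value changes
ARG CACHEBUST=1
USER node
RUN npm update @myorg/myorg-common-repo
COPY . .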

npm unable to find correct version of package within Docker

I am attempting to perform npm install within a docker image. As part of the package.json, I need version 1.8.8 of react-pattern-library. Within the docker image, only version 0.0.1 appears to be available.
If I locally run
npm view react-pattern-library versions
I can see version 1.8.8
However, the same command within my Dockerfile only shows version 0.0.1.
Can anyone tell me what configuration setting I need so that the correct version can be found during my Docker build?
docker build -t jhutc/molly-ui .
Contents of Dockerfile
FROM node:10
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm#5+)
#COPY package*.json ./
COPY package.json ./
RUN npm set strict-ssl false
ENV HTTP_PROXY="http://proxy.company.com:8080"
ENV HTTPS_PROXY="https://proxy.company.com:8080"
RUN echo $HTTP_PROXY
RUN echo $HTTPS_PROXY
RUN npm view react-pattern-library versions
#RUN npm install
Try deleting the package-lock.json and running npm install again; a stale lock file can pin npm's resolution to an old version.
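For example, run locally and then rebuild (this assumes the stale lock file is what pins the old version):

rm package-lock.json
npm install
docker build -t jhutc/molly-ui .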

Multi stage Dockerfile leads to running out of space

As my code (a Node.js application) changes more often than the (npm) dependencies do, I've tried to build something like a cache in my CI.
I'm using a multi-stage Dockerfile. In it I run npm install for all, and only, prod dependencies. Later they are copied to the final image so that it is much smaller. Great.
The build also gets super fast if no dependency has changed.
However, over time the disk fills up, so I have to run docker prune ... to get the space back. But when I do this, the cache is gone.
So if I run a prune after each pipeline in my CI, I do not get the 'cache functionality' of the multi-stage Dockerfile.
### 1. Build
FROM node:10.13 AS build
WORKDIR /home/node/app
COPY ./package*.json ./
COPY ./.babelrc ./
RUN npm set progress=false \
    && npm config set depth 0 \
    && npm install --only=production --silent \
    && cp -R node_modules prod_node_modules
RUN npm install --silent
COPY ./src ./src
RUN ./node_modules/.bin/babel ./src/ -d ./dist/ --copy-files

### 2. Run
FROM node:10.13-alpine
RUN apk --no-cache add --virtual builds-deps \
    build-base \
    python
WORKDIR /home/node/app
COPY --from=build /home/node/app/prod_node_modules ./node_modules
COPY --from=build /home/node/app/dist .
EXPOSE 3000
ENV NODE_ENV production
CMD ["node", "app.js"]
If your CI system lets you have multiple docker build steps, you could split this into two Dockerfiles.
# Dockerfile.dependencies
# docker build -f Dockerfile.dependencies -t me/dependencies .
FROM node:10.13
...
RUN npm install
# Dockerfile
# docker build -t me/application .
FROM me/dependencies:latest AS build
COPY ./src ./src
RUN ./node_modules/.bin/babel ./src/ -d ./dist/ --copy-files
FROM node:10.13-alpine
...
CMD ["node", "app.js"]
If you do this, then you can delete unused images after each build:
docker image prune
The most recent build of the dependencies image will have a tag, so it won't be "dangling" and won't be removed by the prune. On each build the tag will move off the previous image (if anything changed), so that previous build becomes dangling and this sequence will clean it up. This will also delete the "build" stage images, though as you note, if anything changed to trigger a build it will probably be in the src tree, so forcing a rebuild there is reasonable.
In this specific circumstance, just using the latest tag is appropriate. If the final built images have some more unique tag (based on a version number or timestamp, say) and they're stacking up, then you might need to do some more creative filtering of that image list to clean them up.
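As one starting point for that filtering, Docker's built-in time-based filter can be used (the 24-hour window here is arbitrary; note that -a also removes non-dangling unreferenced images):

docker image prune -a --filter "until=24h"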

Bumping package.json version without invalidating docker cache

I'm using a pretty standard Dockerfile to containerize a Node.js application:
# Simplified version
FROM node:alpine
# Copy package.json first for docker build's layer caching
COPY package.json package-lock.json foo/
RUN npm install
COPY src/ foo/
RUN npm run build
Breaking up my COPY into two parts was advantageous because it allowed Docker to cache the (long) npm install step.
Recently, however, I started bumping my package.json version using semver. This had the side effect of invalidating the Docker cache for the npm install step, lengthening my build times significantly.
Is there an alternative caching strategy I can use so that npm install only runs when my dependencies change?
Here's my take on this, based on other answers, but shorter and using jq:
Dockerfile:
FROM endeveit/docker-jq AS deps
# https://stackoverflow.com/a/58487433
# To prevent cache invalidation from changes in fields other than dependencies
COPY package.json /tmp
RUN jq '{ dependencies, devDependencies }' < /tmp/package.json > /tmp/deps.json
FROM node:12-alpine
WORKDIR /app
COPY --from=deps /tmp/deps.json ./package.json
COPY package-lock.json .
RUN npm ci
# https://docs.npmjs.com/cli/ci.html#description
COPY . .
RUN npm run build
LABEL maintainer="Alexey Vishnyakov <n3tn0de@gmail.com>"
I extract the dependencies and devDependencies fields to a separate file, then on the next build step I copy it from the previous step as package.json (COPY --from=deps /tmp/deps.json ./package.json).
After RUN npm ci, COPY . . will overwrite the gutted package.json with the original one (you can test this by adding RUN cat package.json after the COPY . . command).
Note that npm scripts like postinstall won't run, since they're not present in the file during npm ci; they also won't run if npm ci runs as root without --unsafe-perm.
Either run such commands after COPY . ., or/and (if needed) include them via jq (changing the command will invalidate the cache layer), or add --unsafe-perm.
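For illustration, here is what the first jq filter keeps when run against a hypothetical package.json (jq emits null for absent fields):

echo '{"name":"app","version":"1.2.3","dependencies":{"express":"^4.17.1"}}' \
  | jq '{ dependencies, devDependencies }'
# {
#   "dependencies": {
#     "express": "^4.17.1"
#   },
#   "devDependencies": null
# }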
Dockerfile:
FROM endeveit/docker-jq AS deps
COPY package.json /tmp
RUN jq '{ dependencies, devDependencies, peerDependencies, scripts: (.scripts | { postinstall }) }' < /tmp/package.json > /tmp/deps.json
# keep postinstall script
FROM node:12-alpine
WORKDIR /app
COPY --from=deps /tmp/deps.json ./package.json
COPY package-lock.json .
# RUN npm ci --unsafe-perm
# allow postinstall to run from root (security risk)
RUN npm ci
# https://docs.npmjs.com/cli/ci.html#description
RUN npm run postinstall
...
You can add an additional "preparation" step in your Dockerfile that creates a temporary package.json where the "version" field is fixed. This file is then used while installing dependencies and afterwards replaced by the "real" package.json.
As all of this happens during the Docker build process, your actual source repository is not touched (so you can use the environment variable npm_package_version both during your build and when running the docker script, e.g. to tag) and the solution is portable:
Dockerfile:
# PREPARATION
FROM node:lts-alpine as preparation
COPY package.json package-lock.json ./
# Create temporary package.json where version is set to 0.0.0
# – this way the cache of the build step won't be invalidated
# if only the version changed.
RUN ["node", "-e", "\
const pkg = JSON.parse(fs.readFileSync('package.json', 'utf-8'));\
const pkgLock = JSON.parse(fs.readFileSync('package-lock.json', 'utf-8'));\
fs.writeFileSync('package.json', JSON.stringify({ ...pkg, version: '0.0.0' }));\
fs.writeFileSync('package-lock.json', JSON.stringify({ ...pkgLock, version: '0.0.0' }));\
"]
# BUILD
FROM node:lts-alpine as build
# Install deps, using temporary package.json from preparation step
COPY --from=preparation package.json package-lock.json ./
RUN npm ci
# Copy source files (including "real" package.json) and build app
COPY . .
RUN npm run build
If you think inlining the Node script is iffy (I like it, because this way the entire Docker build process can be found in the Dockerfile), you can of course extract it to a separate JS file:
create-tmp-pkg.js:
const fs = require('fs');
const pkg = JSON.parse(fs.readFileSync('package.json', 'utf-8'));
const pkgLock = JSON.parse(fs.readFileSync('package-lock.json', 'utf-8'));
fs.writeFileSync('package.json', JSON.stringify({ ...pkg, version: '0.0.0' }));
fs.writeFileSync('package-lock.json', JSON.stringify({ ...pkgLock, version: '0.0.0' }));
and change your preparation step to:
# PREPARATION
FROM node:lts-alpine as preparation
COPY package.json package-lock.json create-tmp-pkg.js ./
# Create temporary package.json where version is set to "0.0.0"
# – this way the cache of the build step won't be invalidated
# if only the version changed.
RUN node create-tmp-pkg.js
I spent some time thinking about this. Fundamentally, I'm cheating because the package.json file is, in fact, changed, which means anything that circumvents the cache invalidation technically makes the build not reproducible.
For my purposes, however, I care more about build time than strict cache correctness. Here's what I came up with:
build-artifacts.js
/*
Used to keep docker cache fresh despite package.json version bumps.
In this script
- copy package.json to package-artifact.json
- zero package.json version
In Docker
- copy package.json
- run npm install normal
- copy package-artifact.json to package.json (undo-build-artifacts.js accomplishes this with a conditional check that package-artifact exists)
*/
const fs = require('fs');
const package = fs.readFileSync('package.json', 'utf8');
fs.writeFileSync('package-artifact.json', package);
const modifiedPackage = { ...JSON.parse(package), version: '0.0.0' };
fs.writeFileSync('package.json', JSON.stringify(modifiedPackage));
const packageLock = fs.readFileSync('package-lock.json', 'utf8');
fs.writeFileSync('package-lock-artifact.json', packageLock);
const modifiedPackageLock = { ...JSON.parse(packageLock), version: '0.0.0' };
fs.writeFileSync('package-lock.json', JSON.stringify(modifiedPackageLock));
undo-build-artifacts.js
const fs = require('fs');
const hasBuildArtifacts = fs.existsSync('package-artifact.json');
if (hasBuildArtifacts) {
const package = fs.readFileSync('package-artifact.json', 'utf8');
const packageLock = fs.readFileSync('package-lock-artifact.json', 'utf8');
fs.writeFileSync('package.json', package);
fs.writeFileSync('package-lock.json', packageLock);
fs.unlinkSync('package-artifact.json');
fs.unlinkSync('package-lock-artifact.json');
}
These two files serve to relocate package.json and package-lock.json, replacing them with artifacts that have zeroed-out versions. These artifacts are used in the Docker build and are replaced with the original versions once npm install completes.
I run build-artifacts.js in a Travis CI before_script, and undo-build-artifacts.js in the Dockerfile itself (after I npm install). undo-build-artifacts.js includes a check for the build artifacts, meaning the Docker container can still build even if build-artifacts.js hasn't run. That keeps the container portable enough in my books. :)
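Pieced together, the flow might look like this (my reconstruction; the file placement and ordering are assumptions, not from the original answer):

# .travis.yml (fragment)
before_script:
  - node build-artifacts.js    # zero versions before `docker build` runs

# Dockerfile (fragment)
COPY package*.json ./              # zeroed versions -> stable cache key
RUN npm install
COPY . .                           # brings the *-artifact.json files along
RUN node undo-build-artifacts.js   # restore the original package files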
I went about this a bit differently. I just ignore the version in package.json and leave it set to 1.0.0. Instead, I add a file version.json, and then I use a script like the one below for deploying.
This approach won't work if you need to publish to npm, since the version there will never change.
version.json
{"version":"1.2.3"}
deploy.sh
#!/bin/sh
VERSION=`node -p "require('./version.json').version"`
#docker build
docker pull node:10
docker build . -t mycompany/myapp:v$VERSION
#commit version tag
git add version.json
git commit -m "version $VERSION"
git tag v$VERSION
git push origin
git push origin v$VERSION
#push Docker image to repo
docker push mycompany/myapp:v$VERSION
I normally just update the version file manually, but if you want something that works like npm version you can use a script like the one below, which uses the semver package.
patch.js
var semver = require('semver')
var fs = require('fs')
var version = require('./version.json').version
var patch = semver.inc(version, 'patch')
fs.writeFile('./version.json', JSON.stringify({'version': patch}), (err) => {
if (err) {
console.error(err)
} else {
console.log(version + ' -> ' + patch)
}
})
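Typical usage, assuming patch.js and deploy.sh above live in the repo root:

node patch.js   # e.g. prints "1.2.3 -> 1.2.4" and rewrites version.json
sh deploy.sh    # builds, tags, commits, and pushes v1.2.4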
Based on n3tn0de's answer
I changed the Dockerfile to be
######## Preparation
FROM node:12-alpine AS deps
COPY package.json package-lock.json ./
RUN npm version --allow-same-version 1.0.0
######## Building
FROM node:12-alpine
WORKDIR /app
COPY --from=deps package.json package-lock.json ./
RUN npm ci
COPY . .
EXPOSE 80
CMD ["npm", "start"]
This approach avoids using two different Docker images (less to download and less to store) and fixes/avoids any issues in package.json.
Another option: pnpm now has pnpm fetch, which uses only the lockfile, so you are free to make other changes to package.json.
This requires switching from npm/yarn to pnpm.
Example from: https://pnpm.io/cli/fetch
FROM node:14
RUN curl -f https://get.pnpm.io/v6.16.js | node - add --global pnpm
# pnpm fetch does require only lockfile
COPY pnpm-lock.yaml ./
RUN pnpm fetch --prod
ADD . ./
RUN pnpm install -r --offline --prod
EXPOSE 8080
CMD [ "node", "server.js" ]
Patching the version can be done without jq, using basic sed:
FROM alpine AS temp
COPY package.json /tmp
RUN sed -e 's/"version": "[0-9]\+\.[0-9]\+\.[0-9]\+",/"version": "0.0.0",/' \
    < /tmp/package.json > /tmp/package-v0.json
FROM node:14.5.0-alpine
....
COPY --from=temp /tmp/package-v0.json package.json
...
The sed regex assumes that the version value follows the semver scheme (e.g. 1.23.456).
The other assumption is that the "version": "xx.xx.xx", string is not found elsewhere in the file. The "," at the end of the pattern helps lower the probability of false positives. Of course, check it against your package.json file beforehand to be safe.
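One way to run that check locally, assuming a GNU-compatible sed (BusyBox sed in the alpine image also accepts the \+ operator):

sed -e 's/"version": "[0-9]\+\.[0-9]\+\.[0-9]\+",/"version": "0.0.0",/' package.json | grep '"version"'
# expect exactly one line: "version": "0.0.0",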
Steps:
- remove the version from package.json
- install packages for production
- copy to the production image
Benefits:
- you can freely patch the package.json version without invalidating the Docker cache
- if dependencies were not changed, no unnecessary npm install is done for production (packages don't change that frequently)
In practice:
# prepare package
FROM node:14-alpine AS package
COPY ./package.json ./package-lock.json ./
RUN node -e "['./package.json','./package-lock.json'].forEach(n => { \
        let p = require(n); \
        p.version = '0.0.0'; \
        fs.writeFileSync(n, JSON.stringify(p)); \
    });"

# install deps
FROM node:14-alpine AS build
COPY --from=package package*.json ./
RUN npm ci --only=production

# production
FROM node:14-alpine
...
COPY . .
COPY --from=build ./node_modules ./node_modules
...
