Why COPY package*.json ./ precedes COPY . .?

In this Node.js tutorial on Docker:
https://nodejs.org/en/docs/guides/nodejs-docker-webapp/
What is the point of COPY package*.json ./?
Isn't everything copied over with COPY . .?
The Dockerfile in question:
FROM node:8
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm@5+)
COPY package*.json ./
RUN npm install
# If you are building your code for production
# RUN npm install --only=production
# Bundle app source
COPY . .
EXPOSE 8080
CMD [ "npm", "start" ]

This is a common pattern in Dockerfiles (in all languages). The npm install step takes a long time, but you only need to run it when the package dependencies change. So it's typical to see one step that just installs dependencies, and a second step that adds the actual application, because it makes rebuilding the container go faster.
You're right that this is essentially identical if you're building the image once; you get the same filesystem contents out at the end.
Say this happens while you're working on the package, though. You've changed some src/*.js file, but haven't changed the package.json. You run npm test and it looks good. Now you re-run docker build. Docker notices that the package*.json files haven't changed, so it uses the same image layer it built the first time without re-running anything, and it also skips the npm install step (because it assumes running the same command on the same input filesystem produces the same output filesystem). So this makes the second build run faster.

When building an image, Docker works with a layer-based architecture: each instruction you write in a Dockerfile becomes a layer, and that layer gets cached. Copying the package*.json files first is an optimization in the Dockerfile, because we want to run npm install only when a dependency is added to the project. With package*.json copied into the image filesystem first, every successive build re-runs npm install only when the dependencies have changed, and then just copies everything else into the image filesystem. Docker does not check the cache for the layers that come after a changed layer; it simply executes them. So placing COPY . . after npm install saves us from re-running npm install every time the host source tree is copied into the image.

Related

Copying node_modules into a dockerfile vs installing them

I’ve been tasked with dockerizing our Node app at work. When it comes to node_modules, I’m in a bit of a disagreement with our lead dev.
He is advocating for something like this in the dockerfile. His reasoning is that the docker image will be more deterministic, and on that point I don’t entirely disagree with him.
COPY node_modules ./
I am advocating for something like this. My reasoning.. that’s essentially how everyone on the internet says to do it, including the Node docs and the Docker docs. I wish I was arguing from a technical perspective, but I just can’t seem to find anything that specifically addresses this.
COPY package*.json ./
RUN npm install
So who is right? What would be the downsides associated with the first option?

I'd almost always install Node packages from inside the container rather than COPYing them from the host (probably via RUN npm ci if I was using npm).
If the host environment doesn't exactly match the container environment, COPYing the host's node_modules directory may not work well (or at all). The most obvious case of this is using a macOS or Windows host with a Linux container, where if there are any C extensions or other binaries they just won't work. It's also conceivable that there would be problems if the Node versions don't match exactly. Finally, an individual developer might have npm installed an additional package or a different version, and the image would vary based on who's building it.
Also consider the approach of using a multi-stage build to have both development and production versions of node_modules; that way you do not include build-only tools like the tsc TypeScript compiler in the final image. If you have two different versions of node_modules then you can't COPY a single tree from the host, you must install it in the Dockerfile.
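# Build stage: full dependencies, including build-only tools
FROM node AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Final stage: production dependencies plus the built output only
FROM node
WORKDIR /app
COPY package*.json ./
ENV NODE_ENV=production
RUN npm ci
COPY --from=build /app/build /app/build
CMD ["node", "/app/build/index.js"]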

How to avoid npm build node.js packages every time when creating a Docker image via Circle CI?

I deploy my Node.js app via an AWS ECS Docker container using Circle CI.
However, each time I build a new image it runs npm build (because it's in my Dockerfile) and downloads and builds all the node modules again every time. Then it uploads a new image to the AWS ECS repository.
As my environment stays the same I don't want it to build those packages every time. So do you think it is possible for Docker to actually update an existing image rather than building a new one from scratch with all the modules every time? Is this generally a good practice?
I was thinking the following workflow:
Check if there are any new Node packages compared to the previous image
If yes, run npm build
If not, just keep the old node_modules folder, don't run build and simply update the code
Deploy
What would be the best way to do that?
Here's my Dockerfile
FROM node:12.18.0-alpine
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . .
COPY package.json package-lock.json* ./
RUN npm install
RUN npm install pm2 -g
EXPOSE 3000
CMD [ "pm2-runtime", "ecosystem.config.js"]
My Circle CI workflow (from the .circleci/config.yml):
workflows:
  version: 2.1
  test:
    jobs:
      - test
      - aws-ecr/build-and-push-image:
          create-repo: true
          no-output-timeout: 10m
          repo: 'stage-instance'

Move the COPY . . line after the RUN npm install line.
The way Docker's layer caching works, it will skip re-running a RUN line if it knows that it's already run it. So given this Dockerfile fragment:
FROM node:12.18.0-alpine
WORKDIR /usr/src/app
COPY package.json package-lock.json* ./
RUN npm install
Docker keeps track of the hashes of the files it COPYs in. When it gets to the RUN line, if the image up to this point is identical to one it's previously built, it will also skip over the RUN line.
If you have COPY . . first, then if any file in your source tree changes, it will invalidate the layer cache for everything afterwards. If you only copy package.json and the lock file first, then npm install only gets re-run if either of those two files change.
(CircleCI may or may not perform the same layer caching, but "install dependencies, then copy the application in" is a typical Docker-level optimization.)
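For reference, here is a sketch of what the whole Dockerfile from the question might look like with that reordering applied. It reuses the image tag, port and pm2 setup from the question. Moving the global pm2 install above the COPY lines is my own assumption (it depends on nothing in the source tree), and the mkdir line is dropped because WORKDIR creates the directory.

```dockerfile
FROM node:12.18.0-alpine
WORKDIR /usr/src/app

# Global tool: depends on nothing in the source tree, so this layer stays cached
RUN npm install pm2 -g

# Dependency manifests only; this layer changes only when they change
COPY package.json package-lock.json* ./
RUN npm install

# Application source last; an edit here invalidates only this layer and later ones
COPY . .

EXPOSE 3000
CMD [ "pm2-runtime", "ecosystem.config.js" ]
```

This assumes a .dockerignore that excludes node_modules, so that COPY . . does not overwrite the modules installed inside the image with a copy from the host.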

Node Docker Build and Production Container Best Practices

I have a Node project that uses MongoDB. For automated testing, we use Mongo Memory Server.
For Mongo Memory Server, Alpine is not supported by Mongo, so it can't run on Alpine images.
From the docs:
There isn't currently an official MongoDB release for alpine linux. This means that we can't pull binaries for Alpine (or any other platform that isn't officially supported by MongoDB), but you can use a Docker image that already has mongod built in and then set the MONGOMS_SYSTEM_BINARY variable to point at that binary. This should allow you to use mongodb-memory-server on any system on which you can install mongod.
I can run all of my tests in a Docker container using a Node base image, but for production I want to use an Alpine image to save on memory.
so my Dockerfile looks something like this.
FROM node:x.x.x as test
WORKDIR /app
COPY . /app
RUN npm install
# we use TypeScript, this runs the transpilation
RUN npm run build
# runs our automated tests
RUN npm test
FROM node:x.x.x-alpine
WORKDIR /app
COPY --from=test /app/src /app/src
COPY --from=test /app/package.json /app/package.json
COPY --from=test /app/package-lock.json /app/package-lock.json
COPY --from=test /app/config /app/config
COPY --from=test /app/scripts /app/scripts
RUN npm install --production
RUN npm run build
Doing smoke testing, the resulting Alpine image seems to work okay. I assume it is safe because I install the modules in the alpine image itself.
I am wondering, is this the best practice? Is there a better way to do something like this? That is, for Node specifically, have a larger test container and a small production container safely.

A few points:
If you are building twice, what is the point of the multi-stage build? I don't do much Node stuff, but the reason you would want a multi-stage build is that you build your application with npm run build in one stage, then take those artifacts, copy them to the final image and serve/run them in some way. In the Go world it would be something like building in a builder stage and then just running the binary.
You always want the things that change most often on top of the union file system, that is, in the latest layers. What it means is that instead of copying the entire application code and then running npm install, you should copy just package.json and run npm install on it. That way Docker can cache the result of npm install and save on downloading the Node packages if package.json has not changed. Your application code changes way more often than package.json.
In the second stage, same idea. If you have to, copy package.json first and run npm install, then copy the rest of the stuff.
You can have more stages if you want. The name of the game is to get the leanest and cleanest final stage image. That's the one that goes to the registry. Everything else can be and should be left out.
Hope it helps.
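A rough sketch of how those points could be combined for the Dockerfile in the question. It assumes several things that are not in the question: the TypeScript build writes to a directory called dist (a hypothetical name, use whatever your build script emits), the compiled output is plain JavaScript that runs unchanged on Alpine, and the Node version is supplied at build time through a NODE_VERSION build argument, since the question elides it.

```dockerfile
# No default on purpose: supply it with --build-arg NODE_VERSION=<your version>
ARG NODE_VERSION

# Test stage: full (non-Alpine) image so Mongo Memory Server can run
FROM node:${NODE_VERSION} AS test
WORKDIR /app
# Manifests first, so npm install is cached until they change
COPY package.json package-lock.json ./
RUN npm install
COPY . .
RUN npm run build
RUN npm test

# Production stage: Alpine image, production dependencies installed in place
FROM node:${NODE_VERSION}-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install --production
# "dist" is a hypothetical build output directory; the build is not repeated here
COPY --from=test /app/dist ./dist
COPY --from=test /app/config ./config
COPY --from=test /app/scripts ./scripts
```

The production node_modules is still installed inside the Alpine stage rather than copied from the test stage, so any native modules are built against Alpine.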

Docker build from Dockerfile hangs indefinitely and occasionally crashes with error 'failed to start service utility VM'

I am currently using Docker Desktop for Windows and following this tutorial for using Docker and VSCode ( https://scotch.io/tutorials/docker-and-visual-studio-code ) and when I am attempting to build the image, the daemon is able to complete the first step of the Dockerfile, but then hangs indefinitely on the second step. Sometimes, but very rarely, after an indeterminate amount of time, it will error out and give me this error
failed to start service utility VM (createreadwrite): CreateComputeSystem 97cb9905dbf6933f563d0337f8321c8cb71e543a242cddb0cb09dbbdbb68b006_svm: The operation could not be started because a required feature is not installed.
(extra info: {"SystemType":"container","Name":"97cb9905dbf6933f563d0337f8321c8cb71e543a242cddb0cb09dbbdbb68b006_svm","Layers":null,"HvPartition":true,"HvRuntime":{"ImagePath":"C:\\Program Files\\Linux Containers","LinuxInitrdFile":"initrd.img","LinuxKernelFile":"kernel"},"ContainerType":"linux","TerminateOnLastHandleClosed":true})
I have made sure that virtualization is enabled on my machine, uninstalled and reinstalled Docker, uninstalled Docker and deleted all files related to it before reinstalling, as well as making sure that the experimental features are enabled. These are fixes that I have found from various forums while trying to find others who have had the same issue.
Here is the Dockerfile that I am trying to build from. I have double checked with the tutorial that it is correct, though it's still possible that I missed something (outside of the version number in the FROM line).
FROM node:10.13-alpine
ENV NODE_ENV production
WORKDIR /usr/src/app
COPY ["package.json", "package-lock.json*", "npm-shrinkwrap.json*", "./"]
RUN npm install --production --silent && mv node_modules ../
COPY . .
EXPOSE 3000
CMD npm start
I would expect the image to build correctly as I have followed the tutorial to a T. I have even full reset and started the tutorial over again and I'm still getting this same issue where it hangs indefinitely.

well, you copy some files two times. I would not do that.
so for the minimum change to your Dockerfile I would try:
FROM node:10.13-alpine
ENV NODE_ENV production
WORKDIR /usr/src/app
COPY . .
RUN npm install --production --silent && mv node_modules ../
EXPOSE 3000
CMD npm start
I would also think about the && mv node_modules ../ part, if it is really needed.
If you don't do it already I advise you to write a .dockerignore file right next to your Dockerfile with the minimum content of:
/node_modules
so that your local node_modules directory does not get also copied while building the image (saves time).
hope this helps.

How should I Accomplish a Better Docker Workflow?

Every time I change a file in the nodejs app I have to rebuild the docker image.
This feels redundant and slows my workflow. Is there a proper way to sync the nodejs app files without rebuilding the whole image again, or is this a normal usage?

It sounds like you want to speed up the development process. In that case I would recommend to mount your directory in your container using the docker run -v option: https://docs.docker.com/engine/userguide/dockervolumes/#mount-a-host-directory-as-a-data-volume
Once you are done developing your program, build the image and start the container without the -v option.

What I ended up doing was:
1) Using volumes with the docker run command - so I could change the code without rebuilding the docker image every time.
2) I had an issue with node_modules being overwritten because a volume acts like a mount - fixed it with node's PATH traversal.
Dockerfile:
FROM node:5.2
# Create our app directories
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
RUN npm install -g nodemon
# This will cache npm install
# and persist node_modules in the parent directory,
# even after the volume is mounted over /usr/src/app
COPY package.json /usr/src/
RUN cd /usr/src && npm install
#Expose node's port
EXPOSE 3000
# Run the app
CMD nodemon server.js
Command-line:
to build:
docker build -t web-image .
to run:
docker run --rm -v $(pwd):/usr/src/app -p 3000:3000 --name web web-image

You could also change the instructions so that Docker looks in the build context (the directory passed as the argument to docker build) for the package.json file, copies just that file into the current working directory of the container, then runs npm install, and only afterwards copies over everything else, like so:
# Specify base image
FROM node:alpine
WORKDIR /usr/app
# Install some dependencies
COPY ./package.json ./
RUN npm install
COPY ./ ./
# Setup default command
CMD ["npm", "start"]
You can make as many changes to your source files as you want and it will not invalidate the cache for the steps up to and including npm install.
The only time that npm install will be executed again is if we make a change to that step or any step above it.
So unless you make a change to the package.json file, npm install will not be executed again.
We can test this by running docker build -t <tagname>/<project-name> .
Now I have made a change to the Dockerfile, so you will see some steps re-run and eventually a successfully built and tagged image.
Docker detected the change to that step and re-ran it and every step after it, but not the npm install step.
The lesson here is that yes, the order in which these instructions are placed in a Dockerfile does make a difference.
It's nice to segment out these operations to ensure you are only copying the bare minimum at each step.
