Copying node_modules into a dockerfile vs installing them

I’ve been tasked with dockerizing our Node app at work. When it comes to node_modules, I’m in a bit of a disagreement with our lead dev.
He is advocating for something like this in the dockerfile. His reasoning is that the docker image will be more deterministic, and on that point I don’t entirely disagree with him.
COPY node_modules ./
I am advocating for something like this. My reasoning is essentially that this is how everyone on the internet says to do it, including the Node docs and the Docker docs. I wish I were arguing from a technical perspective, but I just can’t seem to find anything that specifically addresses this.
COPY package*.json ./
RUN npm install
So who is right? What would be the downsides associated with the first option?

I'd almost always install Node packages from inside the container rather than COPYing them from the host (probably via RUN npm ci if I was using npm).
If the host environment doesn't exactly match the container environment, COPYing the host's node_modules directory may not work well (or at all). The most obvious case of this is using a macOS or Windows host with a Linux container, where any C extensions or other binaries just won't work. It's also conceivable that there would be problems if the Node versions don't match exactly. Finally, an individual developer might have npm installed an additional package or a different version, and the image would vary based on who's building it.
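As a minimal sketch of the install-inside-the-container approach: npm ci installs exactly what package-lock.json records and fails if the lock file and package.json disagree, which covers most of the determinism concern. The index.js entry point is an assumption for illustration, and this assumes a .dockerignore that excludes node_modules so the host's copy is never sent to the build.

```dockerfile
FROM node
WORKDIR /app
# Copy only the manifest and lock file so the install layer can be cached
COPY package.json package-lock.json ./
# Install the exact versions recorded in package-lock.json
RUN npm ci
# Copy the application source (node_modules excluded via .dockerignore)
COPY . .
# Hypothetical entry point; replace with your own
CMD ["node", "index.js"]
```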
Also consider using a multi-stage build to have both development and production versions of node_modules; that way you do not include build-only tools like the tsc TypeScript compiler in the final image. If you have two different versions of node_modules then you can't COPY a single tree from the host; you must install it in the Dockerfile.
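FROM node AS build
WORKDIR /app
COPY package*.json ./
# Install all dependencies, including devDependencies such as tsc
RUN npm ci
COPY . .
# Compile the application; assumes package.json defines a "build" script that writes to /app/build
RUN npm run build
# Final stage: production dependencies only
FROM node
WORKDIR /app
COPY package*.json ./
# With NODE_ENV=production, npm ci skips devDependencies
ENV NODE_ENV=production
RUN npm ci
COPY --from=build /app/build /app/build
CMD ["node", "/app/build/index.js"]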

Related

Is it necessary to copy dependencies in the Dockerfile when using containers for dev only?

I want to create a dev environment for a Node app with Docker. I have seen examples of Dockerfiles with configurations similar to the following:
FROM node:14-alpine
WORKDIR /express-api-server
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["node", "index.js"]
I know that you can use volumes in docker-compose.yml to map host directories to directories in the containers, so you can make changes in your code and save data in a Mongo database, and preserve those changes locally when deleting or stopping the container.
My question is: if I want to use the container for dev purposes only, is there any benefit to copying package.json and package-lock.json and installing dependencies in the image?
I can use volumes to map node_modules and the package files alongside the code, so there's no need for me to take those actions when building the image the first time.

Correct, both work.
You just need to balance the pros and cons.
The most obvious advantage is that having one Dockerfile for dev and prod is easier and guarantees that the environment is the same.
I personally have a single Dockerfile for dev / test / prod for maximum consistency, and I mount a volume with the code and dependencies for dev.
When I do "npm install", I do it on the host. It instantly restarts the project without needing to rebuild. Then when I want to publish to prod I do a docker build, which rebuilds everything and ignores the mounts.
If you do it like me, check that the host's Node.js version and the container's Node.js version are the same.

How to avoid rebuilding Node.js packages every time when creating a Docker image via CircleCI?

I deploy my Node.js app to an AWS ECS Docker container using CircleCI.
However, each time I build a new image it runs npm build (because it's in my Dockerfile) and downloads and builds all the node modules again every time. Then it uploads a new image to the AWS ECS repository.
As my environment stays the same I don't want it to build those packages every time. So do you think it is possible for Docker to actually update an existing image rather than building a new one from scratch with all the modules every time? Is this generally a good practice?
I was thinking the following workflow:
Check if there are any new Node packages compared to the previous image
If yes, run npm build
If not, just keep the old node_modules folder, don't run build and simply update the code
Deploy
What would be the best way to do that?
Here's my Dockerfile
FROM node:12.18.0-alpine
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . .
COPY package.json package-lock.json* ./
RUN npm install
RUN npm install pm2 -g
EXPOSE 3000
CMD [ "pm2-runtime", "ecosystem.config.js"]
My CircleCI workflow (from .circleci/config.yml):
workflows:
  version: 2.1
  test:
    jobs:
      - test
      - aws-ecr/build-and-push-image:
          create-repo: true
          no-output-timeout: 10m
          repo: 'stage-instance'

Move the COPY . . line after the RUN npm install line.
The way Docker's layer caching works, it will skip re-running a RUN line if it knows that it's already run it. So given this Dockerfile fragment:
FROM node:12.18.0-alpine
WORKDIR /usr/src/app
COPY package.json package-lock.json* ./
RUN npm install
Docker keeps track of the hashes of the files it COPYs in. When it gets to the RUN line, if the image up to this point is identical to one it's previously built, it will also skip over the RUN line.
If you have COPY . . first, then if any file in your source tree changes, it will invalidate the layer cache for everything afterwards. If you only copy package.json and the lock file first, then npm install only gets re-run if either of those two files change.
(CircleCI may or may not perform the same layer caching, but "install dependencies, then copy the application in" is a typical Docker-level optimization.)
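Applied to the Dockerfile from the question, the reordering might look something like the sketch below. It only rearranges the question's own instructions; the mkdir is dropped on the assumption that WORKDIR creates the directory, and the global pm2 install is moved ahead of COPY . . so that it is cached too.

```dockerfile
FROM node:12.18.0-alpine
WORKDIR /usr/src/app
# Dependency manifests first: these layers are reused until the files change
COPY package.json package-lock.json* ./
RUN npm install
RUN npm install pm2 -g
# Application source last: a code change invalidates only the layers from here on
COPY . .
EXPOSE 3000
CMD ["pm2-runtime", "ecosystem.config.js"]
```

With this ordering, a build after a source-only change should reuse the cached npm install layers, provided the build environment keeps the layer cache between runs.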

Node Docker Build and Production Container Best Practices

I have a Node project that uses MongoDB. For automated testing, we use Mongo Memory Server.
Alpine is not supported by MongoDB, so Mongo Memory Server can't run on Alpine images.
From the docs:
There isn't currently an official MongoDB release for alpine linux. This means that we can't pull binaries for Alpine (or any other platform that isn't officially supported by MongoDB), but you can use a Docker image that already has mongod built in and then set the MONGOMS_SYSTEM_BINARY variable to point at that binary. This should allow you to use mongodb-memory-server on any system on which you can install mongod.
I can run all of my tests in a Docker container using a Node base image, but for production I want to use an Alpine image to save on memory.
So my Dockerfile looks something like this.
FROM node:x.x.x AS test
WORKDIR /app
COPY . /app
RUN npm install
# we use TypeScript, this runs the transpilation
RUN npm run build
# runs our automated tests
RUN npm test
FROM node:x.x.x-alpine
WORKDIR /app
COPY --from=test /app/src /app/src
COPY --from=test /app/package.json /app/package.json
COPY --from=test /app/package-lock.json /app/package-lock.json
COPY --from=test /app/config /app/config
COPY --from=test /app/scripts /app/scripts
RUN npm install --production
RUN npm run build
Doing smoke testing, the resulting Alpine image seems to work okay. I assume it is safe because I install the modules in the alpine image itself.
I am wondering, is this the best practice? Is there a better way to do something like this? That is, for Node specifically, have a larger test container and a small production container safely.

A few points:
If you are building twice, what is the point of the multi-stage build? I don't do much Node work, but the reason you would want a multi-stage build is that you build your application with npm run build in one stage, then take those artifacts, copy them into the final image, and serve or run them in some way. In the Go world it would be something like building in a builder stage and then just running the binary.
You always want the most frequently changing things at the top of the union file system, that is, as late in the Dockerfile as possible. This means that instead of copying the entire application code and then running npm install, you should copy just package.json and run npm install on it. That way Docker can cache the result of npm install and skip downloading the packages when package.json has not changed. Your application code changes far more often than package.json.
The same idea applies to the second stage. If you have to, copy package.json first and run npm install, then copy the rest of the files.
You can have more stages if you want. The name of the game is to get the leanest and cleanest final-stage image. That's the one that goes to the registry. Everything else can and should be removed.
Hope it helps.
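A rough sketch of how those points could fit together for the setup in the question is shown below. Several things here are assumptions, not taken from the question: the NODE_VERSION default, the dist output directory, and the dist/index.js entry point are hypothetical, and package.json is assumed to define "build" and "test" scripts.

```dockerfile
# Hypothetical version; pin this to whatever the project actually uses
ARG NODE_VERSION=20

# Stage 1: full (non-Alpine) image, so mongodb-memory-server can use an official mongod binary
FROM node:${NODE_VERSION} AS test
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Compile once, then run the automated tests against the result
RUN npm run build
RUN npm test

# Stage 2: small Alpine image with production dependencies only
FROM node:${NODE_VERSION}-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# With NODE_ENV=production, npm ci skips devDependencies
ENV NODE_ENV=production
RUN npm ci
COPY --from=test /app/config ./config
# Assumes the compiler writes its output to /app/dist (hypothetical path)
COPY --from=test /app/dist ./dist
CMD ["node", "dist/index.js"]
```

The production modules are still installed inside the Alpine stage, so any native modules are built for Alpine, while the compiled JavaScript is copied from the test stage instead of being built a second time.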

Docker build from Dockerfile hangs indefinitely and occasionally crashes with error 'failed to start service utility VM'

I am currently using Docker Desktop for Windows and following this tutorial for using Docker and VSCode ( https://scotch.io/tutorials/docker-and-visual-studio-code ) and when I am attempting to build the image, the daemon is able to complete the first step of the Dockerfile, but then hangs indefinitely on the second step. Sometimes, but very rarely, after an indeterminate amount of time, it will error out and give me this error
failed to start service utility VM (createreadwrite): CreateComputeSystem 97cb9905dbf6933f563d0337f8321c8cb71e543a242cddb0cb09dbbdbb68b006_svm: The operation could not be started because a required feature is not installed.
(extra info: {"SystemType":"container","Name":"97cb9905dbf6933f563d0337f8321c8cb71e543a242cddb0cb09dbbdbb68b006_svm","Layers":null,"HvPartition":true,"HvRuntime":{"ImagePath":"C:\\Program Files\\Linux Containers","LinuxInitrdFile":"initrd.img","LinuxKernelFile":"kernel"},"ContainerType":"linux","TerminateOnLastHandleClosed":true})
I have made sure that virtualization is enabled on my machine, uninstalled and reinstalled Docker, uninstalled Docker and deleted all files related to it before reinstalling, as well as making sure that the experimental features are enabled. These are fixes that I have found from various forums while trying to find others who have had the same issue.
Here is the Dockerfile that I am trying to build from. I have double-checked against the tutorial that it is correct (apart from the version number in the FROM line), though it's still possible that I missed something.
FROM node:10.13-alpine
ENV NODE_ENV production
WORKDIR /usr/src/app
COPY ["package.json", "package-lock.json*", "npm-shrinkwrap.json*", "./"]
RUN npm install --production --silent && mv node_modules ../
COPY . .
EXPOSE 3000
CMD npm start
I would expect the image to build correctly as I have followed the tutorial to a T. I have even full reset and started the tutorial over again and I'm still getting this same issue where it hangs indefinitely.

Well, you copy some files twice. I would not do that.
So as a minimal change to your Dockerfile I would try:
FROM node:10.13-alpine
ENV NODE_ENV production
WORKDIR /usr/src/app
COPY . .
RUN npm install --production --silent && mv node_modules ../
EXPOSE 3000
CMD npm start
I would also think about whether the && mv node_modules ../ part is really needed.
If you don't have one already, I advise you to write a .dockerignore file right next to your Dockerfile with at least the following content:
/node_modules
so that your local node_modules directory is not also copied while building the image (this saves time).
Hope this helps.

Why does COPY package*.json ./ precede COPY . .?

In this Node.js tutorial on Docker:
https://nodejs.org/en/docs/guides/nodejs-docker-webapp/
What is the point of COPY package*.json ./?
Isn't everything copied over with COPY . .?
The Dockerfile in question:
FROM node:8
# Create app directory
WORKDIR /usr/src/app
# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm#5+)
COPY package*.json ./
RUN npm install
# If you are building your code for production
# RUN npm install --only=production
# Bundle app source
COPY . .
EXPOSE 8080
CMD [ "npm", "start" ]

This is a common pattern in Dockerfiles (in all languages). The npm install step takes a long time, but you only need to run it when the package dependencies change. So it's typical to see one step that just installs dependencies, and a second step that adds the actual application, because it makes rebuilding the container go faster.
You're right that this is essentially identical if you're building the image once; you get the same filesystem contents out at the end.
Say this happens while you're working on the package, though. You've changed some src/*.js file, but haven't changed the package.json. You run npm test and it looks good. Now you re-run docker build. Docker notices that the package*.json files haven't changed, so it uses the same image layer it built the first time without re-running anything, and it also skips the npm install step (because it assumes running the same command on the same input filesystem produces the same output filesystem). So this makes the second build run faster.

When building an image, Docker works with a layer-based architecture: each instruction you write in a Dockerfile produces a layer, and that layer gets cached. Copying the package*.json files first is an optimization: we want to run npm install only when the project's dependencies change. With package*.json copied into the image filesystem first, each successive build re-runs npm install only when a dependency has been added or changed, and then simply copies everything else into the image filesystem. Once one layer changes, Docker does not check the cache for any layer after it; it just re-executes every subsequent instruction. So this ordering saves us from re-running npm install every time we copy the source tree from the host into the image.
