Node Docker Build and Production Container Best Practices

I have a Node project that uses MongoDB. For automated testing, we use Mongo Memory Server.
MongoDB does not provide official binaries for Alpine, so Mongo Memory Server can't run on Alpine images.
From the docs:
There isn't currently an official MongoDB release for alpine linux. This means that we can't pull binaries for Alpine (or any other platform that isn't officially supported by MongoDB), but you can use a Docker image that already has mongod built in and then set the MONGOMS_SYSTEM_BINARY variable to point at that binary. This should allow you to use mongodb-memory-server on any system on which you can install mongod.
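For example, if the test image already has mongod installed, the variable can point at that binary (the path below is only an illustration; use wherever mongod actually lives in your image):
ENV MONGOMS_SYSTEM_BINARY=/usr/bin/mongod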
I can run all of my tests in a Docker container using a Node base image, but for production I want to use an Alpine image to keep the image small.
So my Dockerfile looks something like this:
FROM node:x.x.x as test
WORKDIR /app
COPY . /app
RUN npm install
RUN npm run build # we use TypeScript, this runs the transpilation
RUN npm test # runs our automated tests
FROM node:x.x.x-alpine
WORKDIR /app
COPY --from=test /app/src /app/src
COPY --from=test /app/package.json /app/package.json
COPY --from=test /app/package-lock.json /app/package-lock.json
COPY --from=test /app/config /app/config
COPY --from=test /app/scripts /app/scripts
RUN npm install --production
RUN npm run build
Doing smoke testing, the resulting Alpine image seems to work okay. I assume it is safe because I install the modules in the alpine image itself.
I am wondering, is this the best practice? Is there a better way to do something like this - that is, for Node specifically, to safely have a larger test container and a small production container?

A few points:
If you are building twice, what is the point of the multi-stage build? I don't do much Node work, but the reason you would want a multi-stage build is that you build your application with npm run build, take those artifacts, copy them into the final image, and serve/run them in some way. In the Go world it would be building in a builder stage and then just running the binary.
You always want the things that change most often in the upper layers of the union file system. What that means is that instead of copying the entire application code and then running npm install, you should copy just package.json (and the lock file) and run npm install on that first. That way Docker can cache the result of npm install and avoid re-downloading the node modules if nothing above has changed. Your application code changes far more often than package.json.
The same idea applies to the second stage: if you have to install there, copy package.json first, run npm install, and then copy the rest.
You can have more stages if you want. The name of the game is to get the leanest and cleanest final-stage image - that's the one that goes to the registry. Everything else can be, and should be, discarded.
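Putting those points together, a sketch of how the Dockerfile could be restructured (assuming the TypeScript build writes to dist/ and the app's entry point is dist/index.js; adjust the paths and the CMD to your project):
FROM node:x.x.x AS build
WORKDIR /app
# dependency manifests first, so this layer stays cached until they change
COPY package.json package-lock.json ./
RUN npm ci
# application code last
COPY . .
RUN npm run build # TypeScript transpilation
RUN npm test # automated tests run in the full build stage
FROM node:x.x.x-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package.json package-lock.json ./
# production dependencies only (--production on older npm versions)
RUN npm ci --omit=dev
# copy only the built output from the build/test stage
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]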
Hope it helps.

Related

Best way to build a Docker image

I have just started to learn Docker. I have a Node.js web application and built two images of it using two different Dockerfiles. The two images are very different in size. Can someone tell me why they differ so much even though they use the same Alpine, Node, and npm versions, and which one is the recommended way to build a Node.js application image?
First Dockerfile which created a 91.92MB image:
FROM alpine
RUN apk add --update nodejs npm curl
COPY . /src
WORKDIR /src
RUN npm install
EXPOSE 8080
ENTRYPOINT ["node", "./app.js"]
Second Dockerfile which created a 259.57MB image:
FROM node:19-alpine3.16
RUN apk add --update npm curl
COPY . /src
WORKDIR /src
RUN npm install
CMD [ "node", "./app.js" ]
If it meets your technical and non-technical needs, using the prebuilt node image is probably simpler and easier to update, and I'd generally recommend using it.
The node image already includes npm; you do not have to apk add it. Moreover, the node image's own Dockerfile doesn't use apk to install Node, but instead installs it either from a tar file or from source. So the apk add npm call winds up installing an entire second copy of Node that you don't need; that's probably the largest difference between the two image sizes.
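As a sketch, the second Dockerfile could simply rely on the node and npm that ship with the base image (curl is kept here only because the original installed it):
FROM node:19-alpine3.16
# node and npm are already in the base image; only add what the app itself needs
RUN apk add --no-cache curl
COPY . /src
WORKDIR /src
RUN npm install
CMD [ "node", "./app.js" ]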
The base image you start from hugely affects your final image size, assuming you are adding the same stuff to each base image. In your case, you are comparing a base alpine image (5.5 MB) to a base node image (174 MB). The base alpine image has very little in it, while the node image, although itself based on Alpine, at least has Node in it, and presumably a fair amount of extra stuff too. Whether you do or do not need that extra stuff is not for me to say.
You can examine the Dockerfile used to build any of these public images if you wish to see exactly what was added. You can also use tools like dive to examine a local image's layers.

Docker Can't Parallelize NPM Install Despite Running Parallel Stages

I've been working on a fullstack project that's end-to-end JavaScript, and have a Dockerfile that simplifies to this
FROM node:18-slim AS backend-builder
WORKDIR /backend-staging
COPY ./backend/package.* .
RUN npm install
# bring in other files
FROM node:18-slim AS frontend-builder
WORKDIR /frontend-staging
COPY ./frontend/package.* .
RUN npm install
# bring in other files, run webpack, etc
FROM node:18-slim AS final
WORKDIR /core
COPY ./package.* . # root file w/ some non-webpack'd externals and scripts to access after the container builds
RUN npm install
COPY --from=backend-builder /backend-staging/build/. .
COPY --from=frontend-builder /frontend-staging/build ./webapp
# setup some postinstall things, set start command, fin
When I run docker build, after pulling the node:18-slim image, I see that BuildKit parallelization takes place across my stages, with all three npm install commands showing up in the output at the same time. However, the actual installation seems to go one stage at a time: the output first appears from the final stage and runs to completion, then the backend, then the frontend, before the copying and whatnot resumes. It almost seems like there's a mutex on access to the actual npm program, so while the commands can be scheduled in parallel, the execution is first-come-first-served and effectively single-threaded.
I was reading this question, but I seem to have a different problem, as the build stages are definitely being parallelized; their timers all start simultaneously. It's the commands within the build stages that are serialized across stages, for seemingly no reason.
I'm accessing the Docker engine/daemon via Docker Desktop integration w/ WSL2, with buildkit enabled through my Docker Desktop config. The exact command I'm running to build is docker build . -t 'some-image-tag'.

Copying node_modules into a dockerfile vs installing them

I’ve been tasked with dockerizing our Node app at work. When it comes to node_modules, I’m in a bit of a disagreement with our lead dev.
He is advocating for something like this in the Dockerfile. His reasoning is that the Docker image will be more deterministic, and on that point I don't entirely disagree with him.
COPY node_modules ./
I am advocating for something like this. My reasoning: that's essentially how everyone on the internet says to do it, including the Node docs and the Docker docs. I wish I were arguing from a technical perspective, but I just can't seem to find anything that specifically addresses this.
COPY package*.json ./
RUN npm install
So who is right? What would be the downsides associated with the first option?
I'd almost always install Node packages from inside the container rather than COPYing them from the host (probably via RUN npm ci if I was using npm).
If the host environment doesn't exactly match the container environment, COPYing the host's node_modules directory may not work well (or at all). The most obvious case is using a macOS or Windows host with a Linux container, where any C extensions or other native binaries simply won't work. It's also conceivable that there would be problems if the Node versions don't match exactly. Finally, an individual developer might have npm installed an additional package or a different version, and the image would vary depending on who builds it.
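A related detail worth checking either way: keep the host's node_modules out of the build context so a broad COPY . . can never pick it up. A minimal .dockerignore might look like this (the entries are illustrative; adjust to your project):
node_modules
npm-debug.log
.git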
Also consider the approach of using a multi-stage build to have both development and production versions of node_modules; that way you do not include build-only tools like the tsc TypeScript compiler in the final image. If you have two different versions of node_modules then you can't COPY a single tree from the host; you must install them in the Dockerfile.
FROM node AS build
WORKDIR /app
COPY package*.json .
RUN npm ci
COPY . .
RUN npm run build
FROM node
WORKDIR /app
COPY package*.json .
ENV NODE_ENV=production
RUN npm ci
COPY --from=build /app/build /app/build
CMD ["node", "/app/build/index.js"]

How to avoid npm build node.js packages every time when creating a Docker image via Circle CI?

I deploy my Node.js app via an AWS ECS Docker container using CircleCI.
However, each time I build a new image it runs npm install (it's in my Dockerfile) and downloads and builds all the node modules again, then uploads a new image to the AWS ECR repository.
As my environment stays the same, I don't want it to rebuild those packages every time. Is it possible for Docker to update an existing image rather than building a new one from scratch with all the modules every time? Is this generally a good practice?
I was thinking the following workflow:
Check whether there are any new Node packages compared to the previous image
If yes, run npm install
If not, keep the old node_modules folder, skip the install, and simply update the code
Deploy
What would be the best way to do that?
Here's my Dockerfile
FROM node:12.18.0-alpine
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . .
COPY package.json package-lock.json* ./
RUN npm install
RUN npm install pm2 -g
EXPOSE 3000
CMD [ "pm2-runtime", "ecosystem.config.js"]
My Circle CI workflow (from the ./circleci/config.yml):
workflows:
  version: 2.1
  test:
    jobs:
      - test
      - aws-ecr/build-and-push-image:
          create-repo: true
          no-output-timeout: 10m
          repo: 'stage-instance'
Move the COPY . . line after the RUN npm install line.
The way Docker's layer caching works, it will skip re-running a RUN line if it knows that it's already run it. So given this Dockerfile fragment:
FROM node:12.18.0-alpine
WORKDIR /usr/src/app
COPY package.json package-lock.json* ./
RUN npm install
Docker keeps track of the hashes of the files it COPYs in. When it gets to the RUN line, if the image up to this point is identical to one it's previously built, it will also skip over the RUN line.
If you have COPY . . first, then if any file in your source tree changes, it will invalidate the layer cache for everything afterwards. If you only copy package.json and the lock file first, then npm install only gets re-run if either of those two files change.
(CircleCI may or may not perform the same layer caching, but "install dependencies, then copy the application in" is a typical Docker-level optimization.)
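Applied to the Dockerfile in the question, the reordered version could look like this (same instructions, with the broad COPY moved after the install steps):
FROM node:12.18.0-alpine
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
# dependency manifests and installs first, so these layers are reused while they stay unchanged
COPY package.json package-lock.json* ./
RUN npm install
RUN npm install pm2 -g
# application code last; changing it no longer invalidates the npm install layers
COPY . .
EXPOSE 3000
CMD [ "pm2-runtime", "ecosystem.config.js"]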

Continuous integration: Where to build the project?

I have a Jenkins server that watches a private Git repository for changes, which then triggers a pipeline script (the repository contains a Node.js app). In this pipeline script I need to do the following steps:
Install dependencies (npm install)
Build my application (npm run build, which creates a dist folder)
Build a docker container (docker build) and run the container (which runs a script in the dist folder)
Which of the following two options would be the recommended way to do this, and why?
Option A: Run npm install and npm run build in the jenkins pipeline and copy the dist folder to the docker container during the docker build. This would allow me to only install runtime dependencies in the docker container using npm install --only=production, therefore reducing the image size significantly.
Option B: Run npm install and npm run build during docker build (In the Dockerfile). This would allow me to run the docker container outside the CI server if I have to (I don't have a use case for it now, but it seems cleaner because it is more independent). However, the image size would significantly increase and I am not sure if this is the recommended way.
Any suggestions?
I would choose option B.
The reason is that some npm packages run node-gyp, gcc, and other platform-dependent builds.
Look at the popular bcrypt package as an example.
Going with option A would mean that your Docker image and your Jenkins machine need to share the same build infrastructure, which is not common, to say the least.
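If image size is the main worry with option B, a multi-stage build gives you both: npm install and npm run build run inside Docker, so node-gyp and friends compile against the image's own platform, and only the build output plus pruned runtime dependencies land in the final image. A rough sketch, assuming the build writes to dist/ and the entry point is dist/index.js (both are assumptions, as is the node:18 tag; adjust to your app):
FROM node:18 AS build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# drop devDependencies so only runtime modules are carried into the final stage
RUN npm prune --production
FROM node:18-slim
WORKDIR /app
ENV NODE_ENV=production
COPY --from=build /app/package*.json ./
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]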
