Best way to build a Docker image - node.js

I have just started to learn Docker. I have a Node.js web application and built two images of it using two different Dockerfiles. The two images differ significantly in size. Can someone tell me why they are so different in size even though they use the same Alpine, Node, and npm versions, and which one is the recommended way to build a Node.js application image?
First Dockerfile which created a 91.92MB image:
FROM alpine
RUN apk add --update nodejs npm curl
COPY . /src
WORKDIR /src
RUN npm install
EXPOSE 8080
ENTRYPOINT ["node", "./app.js"]
Second Dockerfile which created a 259.57MB image:
FROM node:19-alpine3.16
RUN apk add --update npm curl
COPY . /src
WORKDIR /src
RUN npm install
CMD [ "node", "./app.js" ]

If it meets your technical and non-technical needs, using the prebuilt node image is probably simpler and easier to update, and I'd generally recommend using it.
The node image already includes npm; you do not have to apk add it. Moreover, the node image's own Dockerfile doesn't use apk to install Node, but instead installs it either from a tar file or from source. So the apk add npm call winds up installing an entire second copy of Node that you don't need; that's probably the largest difference between the two image sizes.
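If you go with the prebuilt image, a sketch of your second Dockerfile without the redundant apk packages (curl is kept only because your original installed it) might look like this:
FROM node:19-alpine3.16
# node and npm are already in the base image; only add what you actually need
RUN apk add --no-cache curl
WORKDIR /src
COPY package.json package-lock.json* ./
RUN npm install
COPY . .
EXPOSE 8080
CMD ["node", "./app.js"]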

The base image you start from hugely affects your final image size, assuming you are adding the same stuff to each base image. In your case, you are comparing a base alpine image (5.5 MB) to a base node image (174 MB). The base alpine image has very little in it, while the node image - while based on Alpine - at least has Node in it, and presumably lots of extra stuff too. Whether you do or do not need that extra stuff is not for me to say.
You can examine the Dockerfile used to build any of these public images if you wish to see exactly what was added. You can also use tools like dive to examine a local image's layers.
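To see where the space actually goes, you can compare the two images layer by layer; for example (the image names here are placeholders for whatever you tagged your builds as):
# Show each layer and its size for both images
docker history my-app:alpine
docker history my-app:node
# Or browse a single image's layers and filesystem interactively
dive my-app:node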

Related

How to avoid rebuilding Node.js packages with npm every time a Docker image is created via CircleCI?

I deploy my Node.js app as a Docker container to AWS ECS using CircleCI.
However, each time I build a new image, it runs npm install (because it's in my Dockerfile) and downloads and builds all the node modules again. Then it uploads the new image to the AWS ECR repository.
As my environment stays the same, I don't want it to build those packages every time. So do you think it is possible for Docker to actually update an existing image rather than building a new one from scratch with all the modules every time? Is this generally a good practice?
I was thinking the following workflow:
Check if there are any new Node packages compared to the previous image
If yes, run npm build
If not, just keep the old node_modules folder, don't run build and simply update the code
Deploy
What would be the best way to do that?
Here's my Dockerfile
FROM node:12.18.0-alpine
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . .
COPY package.json package-lock.json* ./
RUN npm install
RUN npm install pm2 -g
EXPOSE 3000
CMD [ "pm2-runtime", "ecosystem.config.js"]
My CircleCI workflow (from .circleci/config.yml):
workflows:
  version: 2.1
  test:
    jobs:
      - test
      - aws-ecr/build-and-push-image:
          create-repo: true
          no-output-timeout: 10m
          repo: 'stage-instance'
Move the COPY . . line after the RUN npm install line.
The way Docker's layer caching works, it will skip re-running a RUN line if it knows that it's already run it. So given this Dockerfile fragment:
FROM node:12.18.0-alpine
WORKDIR /usr/src/app
COPY package.json package-lock.json* ./
RUN npm install
Docker keeps track of the hashes of the files it COPYs in. When it gets to the RUN line, if the image up to this point is identical to one it's previously built, it will also skip over the RUN line.
If you have COPY . . first, then if any file in your source tree changes, it will invalidate the layer cache for everything afterwards. If you only copy package.json and the lock file first, then npm install only gets re-run if either of those two files change.
(CircleCI may or may not perform the same layer caching, but "install dependencies, then copy the application in" is a typical Docker-level optimization.)
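Applied to the Dockerfile from the question, the reordered version would look something like this:
FROM node:12.18.0-alpine
# WORKDIR creates the directory if it doesn't exist yet
WORKDIR /usr/src/app
# Copy only the dependency manifests first, so the npm install layer
# below stays cached until one of these files changes
COPY package.json package-lock.json* ./
RUN npm install
RUN npm install pm2 -g
# Now copy the rest of the source; code changes no longer
# invalidate the npm install layers above
COPY . .
EXPOSE 3000
CMD [ "pm2-runtime", "ecosystem.config.js"]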

Node Docker Build and Production Container Best Practices

I have a Node project that uses MongoDB. For automated testing, we use Mongo Memory Server.
Mongo Memory Server doesn't support Alpine, because MongoDB doesn't publish binaries for Alpine Linux, so it can't run on Alpine-based images.
From the docs:
There isn't currently an official MongoDB release for alpine linux. This means that we can't pull binaries for Alpine (or any other platform that isn't officially supported by MongoDB), but you can use a Docker image that already has mongod built in and then set the MONGOMS_SYSTEM_BINARY variable to point at that binary. This should allow you to use mongodb-memory-server on any system on which you can install mongod.
I can run all of my tests in a Docker container using a Node base image, but for production I want to use an Alpine image to save on memory.
So my Dockerfile looks something like this:
FROM node:x.x.x as test
WORKDIR /app
COPY . /app
RUN npm install
RUN npm run build # we use Typescript, this runs the transpilation
RUN npm test # runs our automated tests
FROM node:x.x.x-alpine
WORKDIR /app
COPY --from=test /app/src /app/src
COPY --from=test /app/package.json /app/package.json
COPY --from=test /app/package-lock.json /app/package-lock.json
COPY --from=test /app/config /app/config
COPY --from=test /app/scripts /app/scripts
RUN npm install --production
RUN npm run build
Doing smoke testing, the resulting Alpine image seems to work okay. I assume it is safe because I install the modules in the alpine image itself.
I am wondering, is this the best practice? Is there a better way to do something like this? That is, for Node specifically, have a larger test container and a small production container safely.
A few points:
If you are building twice, what is the point of the multi-stage build? I don't do much Node work, but the reason you would want a multi-stage build is that you build your application with npm in one stage, take those artifacts, copy them into the final image, and serve/run them in some way. In the Go world it would be something like compiling in a builder stage and then just running the binary in the final stage.
You always want to have the most frequently changing things at the top of the union file system. What this means is that instead of copying the entire application code and then running npm install, you should copy just package.json and run npm install against it. That way Docker can cache the result of npm install and avoid re-downloading the node modules if nothing has changed above that layer. Your application code changes far more often than package.json.
In the second stage, the same idea applies: if you have to, copy package.json first and run npm install, then copy the rest.
You can have more stages if you want. The name of the game is to get the leanest and cleanest final-stage image; that's the one that goes to the registry. Everything else can be, and should be, removed.
Hope it helps.
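As a rough sketch of that idea, assuming the TypeScript build emits a dist/ directory and dist/app.js is the compiled entry point (adjust the paths and the node tag to your project), you could build and test once in the first stage and copy only the artifacts into the final image:
# Build-and-test stage: compile once, run the tests here
FROM node:x.x.x AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install
COPY . .
# Transpile TypeScript once (output directory assumed to be dist/)
RUN npm run build
# Run the automated tests; the image build fails if they fail
RUN npm test

# Final stage: lean Alpine image with production dependencies only
FROM node:x.x.x-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install --production
# Copy only the compiled output and runtime files from the build stage
COPY --from=build /app/dist ./dist
COPY --from=build /app/config ./config
COPY --from=build /app/scripts ./scripts
# Entry point is assumed; point it at your compiled main file
CMD ["node", "dist/app.js"]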

How to dynamically change content in node project run through docker

I have an AngularJS application that I'm running using Docker.
The Dockerfile looks like this:
FROM node:6.2.2
RUN npm install --global gulp-cli && \
npm install --global bower
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app/
COPY bower.json /usr/src/app/
RUN npm install && \
bower install -F --allow-root --config.interactive=false
COPY . /usr/src/app
ENV GULP_COMMAND serve:dist
ENTRYPOINT ["sh", "-c"]
CMD ["gulp $GULP_COMMAND"]
Now when I make any changes in, say, an HTML file, they don't dynamically show up on the web page. I have to stop the container, remove it, build the image again, remove the earlier image, and then restart the container from the new image. Do I have to do this every time? (I'm new to Docker, and I guess this issue is because my source code is not put into a volume, but I don't know how to do that using the Dockerfile.)
You are correct, you should use volumes for things like this. During development, give the container the same volumes as the COPY directories. The mount will override them with whatever is on your machine, with no need to rebuild the image or even restart the container. Perfect for development.
When actually baking your images for production, you remove the volumes, leave the COPY in, and you'll get a deterministic container. I would recommend you read through this article here: https://docs.docker.com/storage/volumes/.
In general, there are 3 ways to do volumes.
Define them in your Dockerfile using VOLUME.
Personally, I've never done this. I don't really see the benefits of this over the other two methods. I believe it is more common to do this when your volume is meant to act as a permanent data store, not so much when you're just trying to use your live dev environment.
Define them when calling docker run.
docker run ... -v $(pwd)/src:/usr/src/app ...
This is great, because if the COPY in your Dockerfile is ./src /usr/src/app, then -v temporarily overrides that directory while the container is running, but the copied files are still there for deployment when you don't use -v.
Use docker-compose.
My personal recommendation. Docker Compose massively simplifies running containers. For the sake of simplicity, it essentially just calls docker run ... for you, but automates the arguments based on a given docker-compose.yml config.
Create a dev service specifying the volumes you want to mount, other containers you want it linked to, etc. Then bring it up using docker-compose up ... or docker-compose run ... depending on what you need.
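For example, a minimal docker-compose.yml sketch for this AngularJS setup might look like the following (the service name and published port are placeholders; adjust them to whatever gulp serves on):
version: "3"
services:
  dev:
    build: .
    volumes:
      # Mount the local source over the directory the Dockerfile COPYs into
      - .:/usr/src/app
    ports:
      - "3000:3000"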
Smart use of volumes will DRAMATICALLY reduce your development cycle. Would really recommend looking into them.
Yes, you need to rebuild every time the files change, since you only modify the files that are outside of the container. In order to apply the changes to the files IN the container, you need to rebuild the container.
Depending on the use case, you could either make the Docker Container dynamically load the files from another repository, or you could mount an external volume to use in the container, but there are some pitfalls associated with either solution.
If you want to keep your container running as you add files, you could also use a variation (see the sketch after these steps):
Mount a volume to any other location e.g. /usr/src/staging.
While the container is running, if you need to copy new files into the container, copy them into the location of the mounted volume.
Run docker exec -it <container-name> bash to open a bash shell inside the running container.
Run a cp /usr/src/staging/* /usr/src/app command to copy all new files into the target folder.
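Put together, that variation might look something like this (the image and container names are just examples):
# Start the container with a staging directory mounted from the host
docker run -d --name web -v $(pwd)/staging:/usr/src/staging my-image
# Later: drop new files into ./staging on the host, then copy them
# into the app directory inside the running container
docker exec web sh -c "cp -r /usr/src/staging/* /usr/src/app/"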

cannot build docker image

I have been trying to build a Docker image by using this Dockerfile:
FROM mhart/alpine-node:base-6
MAINTAINER techhadmin
COPY ./package.json src/
RUN cd src && npm install
COPY . /src
WORKDIR /src
EXPOSE 3000
CMD ["npm", "start"]
But I receive this error:
/bin/sh: npm: not found
The command '/bin/sh -c cd src && npm install' returned a non-zero code: 127
Any idea how I can solve this?
Read the docs:
https://hub.docker.com/r/mhart/alpine-node/
It says:
# If you need npm, don't use a base tag
# RUN npm install
So don't use the base-6 tag; change the FROM image to something like 7:
FROM mhart/alpine-node:7
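With that change, a sketch of the corrected Dockerfile from the question (essentially only the FROM line changes):
FROM mhart/alpine-node:7
MAINTAINER techhadmin
COPY ./package.json src/
RUN cd src && npm install
COPY . /src
WORKDIR /src
EXPOSE 3000
CMD ["npm", "start"]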
You are seeing this error message because when you tried to run npm install, there is no copy of npm available.
You are using alpine as the base image.
By default, alpine is a small image, so it has a very limited set of programs inside of it.
So if you are trying to run Node.js on an alpine image, you need to do some additional work.
To solve it, you have two options:
Find a different base image - you can try to find a base image that already has Node and npm inside of it.
Run alpine with some additional commands that attempt to install npm inside of it.
In other words, use someone else's work, or build it up yourself from scratch.
I recommend finding an image preconfigured with npm inside of it. You can navigate to DockerHub, which is a repository of images.
There is an official Node repository in DockerHub.
https://hub.docker.com/_/node
So you could do something like this:
# Specify base image
FROM node:alpine
WORKDIR /usr/app
# Install some dependencies
COPY ./package.json ./
RUN npm install
# Setup default command
CMD ["npm", "start"]
The nice thing about node:alpine is that you will not get any unnecessary extra packages - just a stripped-down version of Node.js plus the basics such as ping, cat, ls, and so on.
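If you want to check what a candidate base image ships before committing to it, you can run it directly; for example:
# Check the Node and npm versions bundled in the image
docker run --rm node:alpine node --version
docker run --rm node:alpine npm --version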

How should I Accomplish a Better Docker Workflow?

Every time I change a file in the Node.js app I have to rebuild the Docker image.
This feels redundant and slows down my workflow. Is there a proper way to sync the Node.js app files without rebuilding the whole image again, or is this normal usage?
It sounds like you want to speed up the development process. In that case I would recommend to mount your directory in your container using the docker run -v option: https://docs.docker.com/engine/userguide/dockervolumes/#mount-a-host-directory-as-a-data-volume
Once you are done developing your program, build the image and start the container without the -v option.
What I ended up doing was:
1) Using volumes with the docker run command - so I could change the code without rebuilding the docker image every time.
2) I had an issue with node_modules being overwritten because a volume acts like a mount - fixed it by relying on Node's module resolution: node_modules is installed in the parent directory (/usr/src), which Node searches when the app requires modules.
Dockerfile:
FROM node:5.2
# Create our app directories
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
RUN npm install -g nodemon
# This will cache npm install
# and persist node_modules in /usr/src,
# even after the volume is mounted (which overwrites /usr/src/app)
COPY package.json /usr/src/
RUN cd /usr/src && npm install
#Expose node's port
EXPOSE 3000
# Run the app
CMD nodemon server.js
Command-line:
to build:
docker build -t web-image .
to run:
docker run --rm -v $(pwd):/usr/src/app -p 3000:3000 --name web web-image
You could also change the instructions so that Docker looks in the directory specified by the build context argument of docker build, finds the package.json file, copies just that into the working directory of the container, runs npm install, and only afterwards COPYs over everything else, like so:
# Specify base image
FROM node:alpine
WORKDIR /usr/app
# Install some dependencies
COPY ./package.json ./
RUN npm install
# Copy over everything else
COPY ./ ./
# Setup default command
CMD ["npm", "start"]
You can make as many changes as you want to your source files, and they will not invalidate the cache for any of the steps above the final COPY.
The only time npm install will be executed again is if we make a change to that step or any step above it.
So unless you make a change to the package.json file, npm install will not be executed again.
We can test this by running docker build -t <tagname>/<project-name> .
Now I have made a change to the Dockerfile, so you will see some steps re-run and eventually the image successfully built and tagged.
Docker detected the change to that step and re-ran every step after it, but not the npm install step.
The lesson here is that the order in which these instructions are placed in a Dockerfile does make a difference.
It's nice to segment out these operations to ensure you are only copying the bare minimum at each step.
