How to remove development dependencies in production docker images - node.js

When shipping a dockerized Node.js app to production, is it correct to ship an image that contains development dependencies?
I don't mean the devDependencies listed in package.json; I mean gcc, python, node-gyp, and other *-dev packages containing a bunch of headers and static libraries.
All of them are needed only to compile some node dependencies (like node-sass).
An idea could be a two-stage build: one image with all the *-dev dependencies, build everything in there, and export the results to a new image with just the binaries (a sketch of this follows below).
Pros: the final "production" image is small.
Cons: it's not the standard way to build images.
In general, any compiled software I want to distribute in Docker images should not contain the compilers, headers, and tools used to build the binaries.
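A minimal sketch of that two-stage idea using Docker's multi-stage build syntax (available since Docker 17.05). The base tag, build script, and output directory are assumptions for illustration, not taken from the question:

# build stage: contains node-gyp, compilers and headers
FROM node:16-alpine AS build
RUN apk add --no-cache python3 make g++
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci                        # compiles native addons such as node-sass here
COPY . .
RUN npm run build                 # assumes a "build" script exists in package.json
RUN npm prune --production        # drop devDependencies, keep compiled addons

# runtime stage: no compilers, headers or node-gyp
FROM node:16-alpine
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist   # assumes the build output lands in dist/
CMD ["node", "dist/index.js"]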

If you want something not to end up in your final image, you have to run all the related commands in a single layer (one RUN statement).
Something like the following (pseudo code):
RUN install dev-dependencies && build your-project && uninstall dev-dependencies
Only one layer is created for that RUN statement, and it won't contain the dev dependencies.
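For example, on an Alpine-based node image this could look like the following; the package names are an assumption and depend on what your native modules need:

RUN apk add --no-cache --virtual .build-deps python3 make g++ \
 && npm ci \
 && apk del .build-deps

Because the install, the compilation, and the cleanup all happen inside the same RUN, the compilers and headers never end up in any committed layer.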

The image would not get smaller if you removed the dependencies in a later instruction, because the earlier layers still contain them.
Alternatively, try the (experimental) --squash option of docker build, available since Docker 1.13.
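For example (the image name is illustrative, and the daemon must have experimental features enabled):

docker build --squash -t myapp:latest .

The layers produced by the build are squashed into a single layer, so files that were deleted in later instructions no longer take up space in the final image.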

The answer to the OP's question depends on how many images the OP or their company maintains for production.
There are a few possible strategies:
1. If the number of maintained images is low to medium and the system architecture is not very complex and does not use dozens of images at once, the simplest and easiest solution to maintain is the best one. You can approach it with a single build, or a two-step build if you want to use the compiled source as a base for containers that could carry different content (in this case the second stage could even be done during docker-compose up, i.e. at system start-up).
2. You can remove dev-only dependencies (as other answers suggested) if it's necessary to keep the image slim, if there are a lot of running containers using the same image, or if the size of the compiled files is huge. This will lengthen the build process but will result in a smaller image.
3. The third approach is completely different: if there is a compile step, use a CI pipeline that compiles the assets independently within a separate container (a CI runner) and produces a versioned artifact. Store it somewhere accessible to the deployment (S3, a CDN, or private storage) and then just fetch it from there during the production build, or use the files hosted there directly (in the case of a CDN).
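A rough sketch of what fetching such an artifact in the production Dockerfile could look like; the URL, version and paths are made up for illustration:

FROM node:16-alpine
ARG ASSET_VERSION=1.0.0
WORKDIR /usr/src/app
# pull the versioned bundle produced by the CI runner (hypothetical storage URL)
RUN wget -qO- "https://artifacts.example.com/myapp/${ASSET_VERSION}/bundle.tar.gz" \
  | tar -xzf -
CMD ["node", "server.js"]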

Related

How to install npm dependencies efficiently for multiple dockerized node.js projects in a monorepo?

I have a monorepo project containing, let's say, 10 packages.
Each and every one of these packages has a Dockerfile with a separate npm install command. Each npm install takes about 10 minutes on its own, so building all the images takes 10 x 10 minutes, which is unacceptable.
What I came up with so far: creating a Dockerfile where I copy the whole monorepo at once and do a single npm install, which takes only ~15 minutes since there are a lot of common dependencies. I can then use this as the base image for all the other images. The only problem is that this base image is 5 GB, which is huge, because it encapsulates the node_modules folder, which now contains ALL the dependencies from every project.
Copying only the necessary folders from node_modules? Is there a command in npm that lists all the dependencies for a package.json file (with all the peer dependencies etc.) so I can filter what I copy from the base image's node_modules folder to the final image (thus reducing size while not downloading common dependencies multiple times)? Maybe an already established tool that can do just that?
Bundling node_modules? Another idea I had is to bundle the app, so the final bundle would only include the code the package actually uses; with multi-stage builds the final image could then be very small. This is what I'm already doing for the frontend of the application (with Vite, or CRA previously), but I have not managed to get it working for Node.js, presumably because pg-native ships binaries that cannot be bundled. Nevertheless, it would be the BEST solution to bundle everything into a 5 MB js file and install only the problematic dependencies like pg-native afterwards. I have not found any tutorial or description about this whatsoever, so I'm not terribly confident it can even be achieved, and I've got no clue why.
Ridiculously ineffective answers I've found so far include:
reducing dependencies - it's not possible, I checked three times with depcheck
installing only the prod dependencies - only saves a few MBs, nothing significant
using multi-stage builds - it's a good solution to start a fresh image if you have build artifacts in the "builder" image, but in my case it does not reduce the node_modules size; it only helps in the frontend image, because there the node_modules are bundled into a js file and not needed at runtime.
using a node:x-alpine image - I'm using it already; it saves 500 MB, but that's also insignificant compared to the 3-4 GB of node_modules I have to copy around everywhere.
Finally, a little explanation of why this is important to me: I'm using Buildx with the docker-container driver, which is necessary to take advantage of caching via the gha or local cache types set via the --cache-to option. The docker-container driver has a very slow phase, called "sending tarball", whose duration is directly related to image size. Downloading and uploading the cache from GHA's cache storage is also very much affected by the cache size. So on paper, with ~600 MB images instead of ~4000 MB images, the whole process could be sped up by 6-7x. That's a huge improvement, not even mentioning how cost effective it would be, since in the cloud everything is billed by seconds of computing power / storage used.
Cheers
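For reference, a rough sketch of the shared-base-image approach described above; the image names, package layout and workspace setup are assumptions, and this only illustrates the layering rather than solving the node_modules size problem:

# Dockerfile.base -- install the whole monorepo's dependencies once (assumes npm workspaces)
FROM node:18-alpine
WORKDIR /repo
COPY package.json package-lock.json ./
COPY packages ./packages
RUN npm ci

# packages/api/Dockerfile -- one of the ten per-package images
FROM monorepo-base:latest        # the image built from Dockerfile.base
WORKDIR /repo/packages/api
CMD ["node", "src/index.js"]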

Dynamically specify the Dockerfile base image based on the package.json defined node version

I am currently working in an environment where there are multiple node projects, some running on a 14.* node docker image, some on 16.*.
The problem I faced was that I didn't realize the Dockerfile was obsolete, so the app would not run: it had been developed under 16.* while the Dockerfile specified a 14.* node environment.
I was wondering if there would be a way to reduce the amount of code that has to be modified if our organization decides to start implementing projects in node versions other than the ones we currently use. After thinking about it, I ended up with two main lines of thought:
The environment is set (Dockerfile), the app should be developed under the environment specifications
The environment needs to be set according to the app
After some research I ran into this article about dynamic image specification. This would be pretty nice, as we could dynamically pass the version of the node image we want for our environment as an argument.
This would require two things:
As a dev, I must define the node version in my package.json
A script must be able to read from that package.json and launch the docker build with the parameter it got and possibly catch the error if it is not defined.
Is this a recommended pattern to work with? I believe it would reduce the amount of manual code changes in case of version updates, but given the lack of documentation around this use case, I don't feel like it is a common thing.
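A sketch of how the two pieces could fit together; the engines field, script, and tag convention are assumptions rather than an established standard:

# Dockerfile -- the version is injected at build time
ARG NODE_VERSION=16
FROM node:${NODE_VERSION}-alpine
WORKDIR /app
COPY . .
RUN npm ci
CMD ["node", "index.js"]

# build.sh -- read engines.node from package.json and fail if it is missing
# (assumes engines.node holds a plain version like "16", not a semver range)
NODE_VERSION=$(node -p "require('./package.json').engines.node" 2>/dev/null)
if [ -z "$NODE_VERSION" ] || [ "$NODE_VERSION" = "undefined" ]; then
  echo "engines.node is not defined in package.json" >&2
  exit 1
fi
docker build --build-arg NODE_VERSION="$NODE_VERSION" -t myapp .

An ARG declared before the first FROM can be used in the FROM line itself, which is what makes the base image selectable from outside the Dockerfile.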

Should I use Docker to deploy a library of functions?

I see that Docker is intended to deploy applications, but what about libraries? For instance I have a library called RAILWAY that is a set of headers, binary code libraries, and command line tools.
I was thinking the output of the railway CI/CD pipeline could be a docker image that is pushed to a registry. Any application that wants to use railway must then be built using docker: it would just put FROM railway:latest and COPY --from=railway ... in its Dockerfile, and copy whatever it needs from the library image into its own image.
Is this a normal use-case?
I could use a Debian package for railway, but Azure Artifacts does not support Debian packages (only NuGet and npm). And docker is just so damn easy!
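For context, the pattern described above would look roughly like this in a consuming application's Dockerfile; the paths and file names are illustrative, since they depend on how the railway image lays out its files:

FROM railway:latest AS railway
FROM debian:bookworm-slim
COPY --from=railway /usr/local/include/railway /usr/local/include/railway
COPY --from=railway /usr/local/lib/librailway.a /usr/local/lib/
COPY --from=railway /usr/local/bin/railway /usr/local/bin/
# ...then build/run the application against the copied headers, libraries and tools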
Most languages have their own systems for distributing and managing dependencies (like NuGet which you mentioned) which you should use instead.
The problem with your suggestion is that it's not as simple as "applications use libraries"; it's rather "applications use libraries, which use libraries, which use libraries, which use...".
E.g. if your app wants to use libraries A and B, but library A also uses library B itself, how do you handle that in your setup? Is there a binary for B in A's docker image that gets copied over? Does it overwrite the binary for B that you copied earlier? What if they're different versions with different methods in them?

How to speed up CI build times when using docker?

I currently use Docker + Travis CI to test and deploy my app. This works great locally because I have data volumes for things like node_modules, and Docker's layer caching speeds up builds.
However, when I push the code to Travis it has to rebuild and install everything from scratch, and it takes forever! Travis doesn't support caching Docker layers at the moment. Is there some other way to speed up my builds, or another similar tool that allows Docker layer caching?
You might want to investigate how i3wm has solved a similar problem.
The main developer has written about the design behind his Travis CI workflow. Quoting the relevant part:
The basic idea is to build a Docker container based on Debian testing and then run all build/test commands inside that container. Our Dockerfile installs compilers, formatters and other development tools first, then installs all build dependencies for i3 based on the debian/control file, so that we don't need to duplicate build dependencies for Travis and for Debian.
This solves the immediate issue nicely, but comes at a significant cost: building a Docker container adds quite a bit of wall clock time to a Travis run, and we want to give our contributors quick feedback. The solution to long build times is caching: we can simply upload the Docker container to the Docker Hub and make subsequent builds use the cached version.
We decided to cache the container for a month, or until inputs to the build environment (currently the Dockerfile and debian/control) change. Technically, this is implemented by a little shell script called ha.sh (get it? hash!) which prints the SHA-256 hash of the input files. This hash, appended to the current month, is what we use as tag for the Docker container, e.g. 2016-03-3d453fe1.
See our .travis.yml for how to plug it all together.
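A hedged sketch of that caching scheme in shell; the image name and build command are placeholders (the real ha.sh script lives in the i3 repository):

# tag = current month + hash of the files that define the build environment
TAG="$(date +%Y-%m)-$(cat Dockerfile debian/control | sha256sum | cut -c1-8)"
IMAGE="myorg/ci-base:$TAG"

# reuse the cached container if it exists, otherwise build and publish it
docker pull "$IMAGE" || { docker build -t "$IMAGE" . && docker push "$IMAGE"; }

# run the actual build/tests inside the cached container
docker run --rm -v "$PWD:/src" -w /src "$IMAGE" make check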

Testing elixir release build with exrm

I am building a phoenix application with exrm.
Good practice suggests that I should run my tests against the same binary I'll be pushing to production.
Exrm gives me the ability to deploy phoenix on machines that don't have Erlang or Elixir installed, which makes pulling docker images faster.
Is there a way to run mix test against the binary built by exrm?
It should be noted that releases aren't a single binary file. Sure, they are packaged into a tarball, but that is just to ease deployment; what the tarball contains is effectively the .beam files generated with MIX_ENV=prod mix compile, plus ERTS (if you are bundling it), the Erlang/Elixir .beam files, and the boot scripts/config files for starting the application, etc.
So in short your code will behave identically in a release as it would when running with MIX_ENV=prod (assuming you ran MIX_ENV=prod mix release). The only practical difference is whether or not you've correctly configured your application for being packaged in a release, and testing this boils down to doing a test deployment to /tmp/<app> and booting it to make sure you didn't forget to add dependencies to applications in mix.exs.
The other thing you'd need to test is hot upgrades/downgrades, if you are using them. In that case you need to do test deploys locally to make sure the upgrade/downgrade is applied as expected, since exrm generates default .appup files for you, which may not always do the correct thing, or everything you need them to do; in that case you need to edit them as appropriate. I do this by deploying to /tmp/<app>, starting up the old version, then deploying the upgrade tarball to /tmp/<app>/releases/<new version>/<app>.tar.gz, running /tmp/<app>/bin/<app> upgrade <version>, and testing that the application was upgraded as expected; then I run the downgrade command for the previous version to see if it rolls back properly. The nature of the testing varies depending on the code changes you've made, but that's the gist of it.
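As a rough illustration of that manual test flow (the application name, versions, and tarball paths are placeholders and depend on your exrm setup):

# build the release
MIX_ENV=prod mix release

# test deployment of the current version
mkdir -p /tmp/myapp
tar -xzf rel/myapp/releases/0.1.0/myapp.tar.gz -C /tmp/myapp
/tmp/myapp/bin/myapp start

# deploy the upgrade tarball for the new version, apply it, then try rolling back
mkdir -p /tmp/myapp/releases/0.2.0
cp rel/myapp/releases/0.2.0/myapp.tar.gz /tmp/myapp/releases/0.2.0/
/tmp/myapp/bin/myapp upgrade 0.2.0
/tmp/myapp/bin/myapp downgrade 0.1.0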
Hopefully that helps answer your question!

Resources