Why does docker image grow so fast after installing sudo?

Why does docker image grow so fast after installing sudo? - linux

I have a local docker image based on oracle linux for installing oracle database. This is more a docker question that a DB question.
There are some fairly large data files (~12GB).
Normally when I do a docker commit it increases the size of the image slightly with anything newly added.
I installed sudo on the machine and all of a sudden every time I commit the image it starts doubling the size of the image (12GB->24GB->50GB).
I guess I can find ways to live without sudo but does anyone have a guess why merely running "yum install sudo" as root would cause this huge image size? Im at a loss what other details to provide.
Nothing fancy in the docker commit:
docker commit my-container the-image:v1
I've gone through the entire image with du -sh and there are no additional data files that would explain the size.
I do have some mount points and volumes but those should not be included in the image from what I can tell and even if they are the image size ends up bigger than the sum of volumes.
Any chance its writing swap space to the image? I have the database shut down and no idea what that would have to do with sudo command being installed or not.

Im using the following instead of sudo. Essentially there is an entrypoint script that has to extract a tar with the database bits (into a docker volume). I was having some permission issues extracting the data. It seems like maybe docker always changes the owner of a volume to root and I needed it to be user oracle. Anyway I added this in my script and its working fine:
echo "myRootPassword" | su -c "chown oracle:oinstall /ORCL" root
I guess I am still curious why adding sudo causes this issue but not curious to waste any more time on it.
You can replicate it by running a base image from ubuntu.
commit the base image to a second image
Then run and ssh into the second image and install sudo.
Lastly commit the image to a 3rd image. The third image image is like 1GB bigger than the second image.

I guess I had a fundamental misunderstanding of how the images get layered when you do a commit. Some times it seems like the commits create a slightly larger image and some times it must take the previous image and add the current image (12g+12g). I guess this has nothing to do with sudo after all - I just happened to keep adding sudo at the wrong time so it seemed related.
The issue is Im trying to build a database image and its very hard to do it all in 1 Docker file. So I was creating a base image and running the oracle install in the running container using x11 forwarding. Oracle DB installation is a pain to figure out in silent mode and Ive spent the last full day trying to get the response file correct. Anyway closing this issue as its nothing to do with sudo.

Related

Slow Docker build image - Sending context is very slow

when I build a docker image from a docker-compose with "context" configured, it need a lot of time to complete.
The step that keep too much time is "Sending context...": it need 20 minutes for a 85MB folder.
The issue appear both in Ubuntu 20.04 and MacOS (using colima as virualization engine).
The folder I need to send is a nodejs project, so it has a lot of small files in the node_modules folder, but I can not excclude it becouse I need to run the node project in the container.
Is there a way to speed up the Sending context step?

I've found a solution: from docker-compose 1.25.1, docker support the BuildKit, a different way to build the image that solve this issue.
You only need to update the docker-compose and set this variable:
export DOCKER_BUILDKIT=1
https://docs.docker.com/develop/develop-images/build_enhancements/

How to list all files accessed after running docker?

I have to deal with some very large vendor support packages for embedded development —- I’ve used docker successfully just as a means of keeping their installs segmented away from the rest of my system and for the sake of environment reproducibility. That works great, but often these installs are monoliths, including a ton of files and functionality I don’t need, especially in a CI environment. And moving giant, slow-to-recreate docker images around is a pain.
So, in the interest of teasing out just the features I need, and porting them to a much smaller image, I’m wondering:
Can I run a docker image, performing some CI-relevant task, and then find all the files that were accessed in the duration the docker image was running?
The plan after that would be to copy all those files into a tarfile or similar, then use that for specialized images in the future. So as an alternative question... is that plan worth pursuit?
Thanks :) -Chloë

Maybe it will not answer exactly to your question, however it may help.
You can check what is happening in the container by
checking its logs through the docker container logs command.
checking the modification performed in its filesystem through the docker diff command.
Here is an example
# run a ubuntu container
$ docker run -it --rm --name focal ubuntu:focal
# run a command in the container
$ echo "test" > test.txt
# messages in the logs
$ docker container logs --follow --details focal
# root#aa86b4988bfe:/# echo "test" > test.txt
# checking the differences
$ docker diff focal
# A /test.txt

Why is COPY in docker build not detecting updates

I run a build on a node application and then use the artifacts to build a docker image. The COPY command that moves my source in place isn't detecting changes to the source files after a build; its just using the cache.
Step 9/12 : COPY server /home/nodejs/app/server
---> Using cache
---> bee2f9334952
Am I doing something wrong with COPY or is there a way to not cache a particular step?

I found this in the Docker documentation:
For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.
So, as far as I understand, the cache should be invalidated. You can use the --no-cache command-line option to make sure. If you get the correct behavior with --no-cache and an incorrect behavior without it, you would have discovered a bug and should report it.

This was interesting. I found out that COPY WAS working, it just looked like it wasn't.
I was rebuilding the images and restarting my containers, but the container was still using the old image. I had to remove my containers, and then when I started them up they used the newer image that was created, and I could see my changes.
Here is another thread that deals with this more accurately diagnosed (in my case).

For me, the problem was in my interpretation of Docker build output. I did not realize not only the last version of a layer is cached, but also all previous ones.
I was testing cache invalidation by changing a single file back and forth. After the first change, the cache was invalidated OK, but after changing back, the layer was taken from cache, which seemed as if the invalidation logic did not work properly.
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache

It is likely a bug, but hard to replicate. It happens to me in Jenkins builds when I copy a new file to existing folder that used to be copied its entirety using single Dockerfile COPY command. To make cache invalidation work correctly (and avoid rebuilding earlier layers as --no-cache would), it is necessary to run docker build --tag <REPO>/<IMAGE> . on the host (outside of Jenkins).

You could try with ADD instead. It will invalidate the cache for the copy. The bad side is that it will also invalidate the cache for the other commands after it. If your ADD is in the last steps it shouldn't impact to much the build process.
Note: The first encountered ADD instruction will invalidate the cache for all following instructions from the Dockerfile if the contents of have changed. This includes invalidating the cache for RUN instructions. See the Dockerfile Best Practices guide for more information. https://docs.docker.com/engine/reference/builder/#add

Had the same issue. After considering #Nick Brady's post (thanks for the suggestion!), here is my current update procedure that seems to be working fine:
svn update --non-interactive --no-auth-cache --username UUU --password PPP
docker build . -f deploy/Dockerfile -t myimage
docker stop mycontainer
docker rm mycontainer
docker run --name=mycontainer -p 80:3100 -d --restart=always \
--env-file=deploy/.env.production myimage
The magic here is to not simply restart the container (docker restart mycontainer), as this would actually stop and run again the old container that was instantiated from a previous version of myimage. Stopping and destroying the old container and running a new one instead results in a fresh container instantiated from the newly built myimage.

From the point of view of Docker this is just like any other command.
Docker sees that this line didn't change, so it caches it.
Similarly if you have a curl command in your Dockerfile, Docker doesn't fetch the URL just to change if it changed. It checks if the command changed or not, not it's result.

Cannot install inside docker container

I'm quite new at docker, but I'm facing a problem I have no idea how to solve it.
I have a jenkins (docker) image running and everything was fine. A few days ago I created a job so I can run my nodejs tests every time a pull request is made. one of the job's build steps is to run npm install. And the job is constantly failing with this error:
tar (child): bzip2: Cannot exec: No such file or directory
So, I know that I have to install bzip2 inside the jenkins container, but how do I do that? I've already tried to run docker run jenkins bash -c "sudo apt-get bzip2" but I got: bash: sudo: command not found.
With that said, how can I do that?
Thanks in advance.

Answer to this lies inside the philosophy of dcoker containers. Docker containers are/should be immutable. So, this is what you can try to fix this issue.
Treat your base image i.e, jenkins as starting point.
login to this base image and install bzip2.
commit these changes and this should result in a new image.
Now use above image from step 3 to install any other package like npm.
Now commit above image.
Note: To execute commands in much controlled way, I always prefer to use something like this;
docker exec -it jenkins bash
In nutshell, answer to both of your current issues lie in the fact that images are immutable so to make any change that will get propagated is to commit them and use newly created image to make further changes. I hope this helps.

Lots of issues here, but the biggest one is that you need to build your images with the tools you need rather than installing inside of a running container. As techtrainer mentions, images are immutable and don't change (at least from your running container), and containers are disposable (so any changes you make inside them are lost when you restart them unless your data is stored outside the container in a volume).
I do disagree with techtrainer on making your changes in a container and committing them to an image with docker commit. This will work, but it's the hand built method that is very error prone and not easily reproduced. Instead, you should leverage a Dockerfile and use docker build. You can either modify the jenkins image you're using by directly modifying it's Dockerfile, or you can create a child image that is FROM jenkins:latest.
When modifying this image, the Jenkins image is configured to run as the user "jenkins", so you'll need to switch to root to perform your application installs. The "sudo" app is not included in most images, but external to the container, you can run docker commands as any user. From the cli, that's as easy as docker run -u root .... And inside your Dockerfile, you just need a USER root at the top and then USER jenkins at the end.
One last piece of advice is to not run your builds directly on the jenkins container, but rather run agents with your needed build tools that you can upgrade independently from the jenkins container. It's much more flexible, allows you to have multiple environments with only the tools needed for that environment, and if you scale this up, you can use a plugin to spin up agents on demand so you could have hundreds of possible agents to use and only be running a handful of them concurrently.

How to deploy a Docker image to make changes in the local environment?

EDIT +2=Just fyi, i am a root user which means i do not have type out superuser do (sudo) every time i do a authorized only cmd.
Alright so after about 24 hours of researching Docker i am a little upset if i got my facts straight.
As a quick recap, docker serves as a way to write code or configuration file changes for a specific web service, run environment, virtual machines, all from the cozy confines of a linux terminal/text file. This is beyond a doubt an amazing feature: to have code or builds you made on one computer work on an unlimited number of other machines is truly a breakthrough. While i am annoyed that the terminology is wrong with respect to whats containers and what are images (images are save points of layers of code that are made from dockers servers or can be created from containers which require a base image to go off of. Dockerfiles serve as a way to automate the build process of making images by running all the desired layers and roll them into one image so it can be accessed easily.).
See the catch is with docker is that "sure it can be deployed on a variety of different operating systems and use their respective commands". But those commands do not really come to pass on say something like the local environment though. While running some tests on a dockerbuild working with centos, the basic command structure goes
FROM centos
RUN yum search epel
RUN yum install -y epel-release.noarch
RUN echo epel installed!
So this works within the docker build and says it succesfully installs it.
The same can be said with ubuntu by running an apt-cache instead of yum. But going back to the centos VM, it DOES NOT state that epel has been installed because when attempting to run the command of
yum remove epel-release.noarch
it says "no packages were to be removed yet there is a package named ...". So then, if docker is able to be multi-platform why can it not actually create those changes on the local platform/image we are targeting? The docker builds run a simulation of what is going to happen on that particular environment but i can not seem to make it come to pass. This just defeats one of my intended purposes of the docker if it can not change anything local to the system one is using, unless i am missing something.
Please let me know if anyone has a solution to this dilemma.
EDIT +1=Ok so i figured out yesterday what i was trying to do was to view and modify the container which can be done by doing either docker logs containerID or docker run -t -i img /bin/sh which would put me into an interactive shell to make container changes there. Still, i want to know if theres a way to make docker comunicate to the local environment from within a container.

So, I think you may have largely missed the point behind Docker, which is the management of containers that are intentionally isolated from your local environment. The idea is that you create containerized applications that can be run on any Docker host without needing to worry about the particular OS installed or configuration of the host machine.
That said, there are a variety of ways to break this isolation if that's really what you want to do.
You can start a container with --net=host (and probably --privileged) if you want to be able to modify the host network configuration (including interface addresses, routing tables, iptables rules, etc).
You can parts of (or all of) the host filesystem as volumes inside the container using the -v command line option. For example, docker run -v /:/host ... would expose the root of your host filesystem as /host inside the container.
Normally, Docker containers have their own PID namespace, which means that processes on the host are not visible inside the container. You can run a container in the host PID namespace by using --pid=host.
You can combine these various options to provide as much or as little access to the host as you need to accomplish your particular task.
If all you're trying to do is install packages on the host, a container is probably the wrong tool for the job.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string