I am trying to build a Docker image inside Docker (docker-in-docker, dind). The image is very large, and the build fails with the error "no space left on device".
Setup:
I am running this on the TeamCity agent Docker image, with the docker-in-docker configuration.
Does my host machine need more memory or more disk space? Does docker-in-docker build in memory or on disk?
It uses disk.
A thorough explanation of how and why docker-in-docker works is in this article.
I originally asked this question with TeamCity's agent in mind, but I wanted to generalize it. The industry standard seems to be to never actually run docker-in-docker, because it can cause data corruption, and most use cases can be solved with a docker-to-docker solution (explained below). Nevertheless, the nearly-docker-in-docker implementations are still referred to as docker-in-docker in some CI documentation, even when they are not true docker-in-docker solutions.
The docker-to-docker workaround, generally, is to expose the host's Docker daemon to the container via a volume mount, i.e. docker run -v /var/run/docker.sock:/var/run/docker.sock ... Whether you expose the host daemon to a container, or expose a dind container to another container, the Docker engine is running either at the host level or as a container in the first level of Docker. In both cases, it uses disk.
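A minimal sketch of the docker-to-docker approach (the docker:cli image and the format string are just one way to demonstrate it; any image containing the docker CLI works):

```shell
# Docker-to-docker: the container talks to the HOST's daemon through the
# mounted socket, so builds consume the host's disk, not the container's.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker:cli \
  docker info --format '{{ .DockerRootDir }}'
# The path printed (typically /var/lib/docker) lives on the host filesystem.
```

Any docker build issued through that socket is therefore bounded by the host's free disk space, which is what the question was really asking about.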
Hope this clarifies.
Related
Looking for some recommendations on how to report Linux host metrics, such as CPU and memory utilization and disk usage stats, from within a Docker container. The host will contain a number of Docker containers.
One thought was to run top and other basic Linux commands from outside the container and push their output into a container folder that has the appropriate authorization, so that it can be consumed.
Another thought was to use the Docker API to run docker stats for the containers, but I am not sure this is best, as it may not report on other processes running on the host that are not containerized.
A third option would be to somehow execute something like top and other commands on the host from within the container; this option would be the most ideal for my situation.
I was just looking for some proven design patterns that others have used. Also, I don't have the ability to install a bunch of tools on the host, as this is a customer host and I have no control over what is already installed.
You may run your container in privileged mode, but be aware that this could compromise the host's security, as your container will no longer be in a sandboxed environment.
docker run -it --privileged --pid=host alpine:3.8 sh
When the operator executes docker run --privileged, Docker will enable access to all devices on the host as well as set some configuration in AppArmor or SELinux to allow the container nearly all the same access to the host as processes running outside containers on the host. Additional information about running with --privileged is available on the Docker Blog.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
Good reference: https://security.stackexchange.com/a/218379
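To tie this back to the metrics question: with --pid=host the container shares the host's PID namespace, so the usual /proc-based tools see host processes, not just the container's own. A sketch (the alpine:3.8 tag is just the image used above):

```shell
# Host processes are visible because /proc reflects the shared PID namespace.
docker run --rm --pid=host alpine:3.8 ps

# Host memory stats, read straight from the host's /proc:
docker run --rm --pid=host alpine:3.8 cat /proc/meminfo
```

Note that --privileged is only needed if you also want device-level access; for read-only process and memory stats, --pid=host alone is usually enough and is less dangerous.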
Could someone explain to me what happens when you map (in a volume) your vendor or node_modules directories?
I had some speed problems with my Docker environment and read that I don't need to map the vendor files there, so I excluded them in the docker-compose.yml file, and it instantly became much faster.
So I wonder: what happens under the hood when you have the vendor files mapped in your volume, and what happens when you don't?
Could someone explain that? I think this information would be useful to more than just me.
Docker does some complicated filesystem setup when you start a container. You have your image, which contains your application code; a container filesystem, which gets lost when the container exits; and volumes, which have persistent long-term storage outside the container. Volumes break down into two main flavors, bind mounts of specific host directories and named volumes managed by the Docker daemon.
The standard design pattern is that an image is totally self-contained. Once I have an image I should be able to push it to a registry and run it on another machine unmodified.
git clone git@github.com:me/myapp
cd myapp
docker build -t me/myapp . # requires source code
docker push me/myapp
ssh me@othersystem
docker run me/myapp # source code is in the image
# I don't need GitHub credentials to get it
There are three big problems with using volumes to store your application code or your node_modules directory:
It breaks the "code goes in the image" pattern. In an actual production environment, you wouldn't want to push your image and also separately push the code; that defeats one of the big advantages of Docker. And if you're hiding every last byte of code in the image behind a volume during the development cycle, you're never actually running what you're shipping out.
Docker considers volumes to contain vital user data that it can't safely modify. That means that, if your node_modules tree is in a volume, and you add a package to your package.json file, Docker will keep using the old node_modules directory, because it can't modify the vital user data you've told it is there.
On macOS in particular, bind mounts are extremely slow, and if you mount a large application into a container it will just crawl.
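For reference, the mount pattern the question describes (bind-mount the source but exclude the dependency directory) is usually written by masking it with an anonymous volume. A sketch with plain docker run (the paths, image tag, and command are hypothetical):

```shell
# Bind-mount the source tree for live editing, but mask node_modules with
# an anonymous volume so the copy built into the image is used instead of
# the slow (and possibly platform-incompatible) bind-mounted one.
docker run --rm \
  -v "$PWD":/app \
  -v /app/node_modules \
  -w /app node:18 npm test
```

Problem 2 above is exactly the catch with this pattern: once that anonymous volume exists, Docker keeps reusing its contents even after you change package.json and rebuild the image.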
I've generally found three good uses for volumes: storing actual user data across container executions; injecting configuration files at startup time; and reading out log files. Code and libraries are not good things to keep in volumes.
For front-end applications in particular there doesn't seem to be much benefit to trying to run them in Docker. Since the actual application code runs in the browser, it can't directly access any Docker-hosted resources, and there's no difference if your dev server runs in Docker or not. The typical build chains involving tools like Typescript and Webpack don't have additional host dependencies, so your Docker setup really just turns into a roundabout way to run Node against the source code that's only on your host. The production path of building your application into static files and then using a Web server like nginx to serve them is still right in Docker. I'd just run Node on the host to develop this sort of thing, and not have to think about questions like this one.
I am new to Docker and am running an Ubuntu container on Arch Linux. I use it for debugging and building software with an older version of gcc. I was running low on disk space and stumbled upon logs, which I was able to truncate. I don't need the logs, but I don't want to lose my existing container, which I created some time back. The solutions I have come across (disable logging through drivers, or set the rotate size to 0m) are, in my understanding, applied when creating new containers, but I want to apply them to an existing one.
You can create an image of that container with docker commit, remove the container with docker rm, and then use the --log-driver none option with docker run.
If you're new to Docker, consider that it's best to use ephemeral containers of a given image. You can also maintain a Dockerfile to recreate that image with docker build.
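A sketch of that sequence (the container and image names here are hypothetical placeholders):

```shell
# 1. Snapshot the existing container's filesystem as a new image
docker commit my_old_container my_saved_image

# 2. Remove the old container (its JSON log file is deleted with it)
docker rm my_old_container

# 3. Recreate the container from the snapshot with logging disabled
docker run -it --log-driver none --name my_new_container my_saved_image
```

Note that docker commit captures the filesystem but not volumes or most runtime options, so flags like port mappings and mounts need to be repeated on the new docker run.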
How can I ensure that a Docker container is secure, especially when using third-party containers or base images?
Is it correct that, when using a base image, it may start arbitrary services or mount arbitrary partitions of the host filesystem under the hood, and potentially send sensitive data to an attacker?
So if I use a third-party container whose Dockerfile appears safe, should I traverse the whole linked list of base images (potentially very long) to ensure the container is actually safe and does what it intends to do?
How can I ensure the trustworthiness of a Docker container in a systematic and definite way?
Consider Docker images similar to Android/iOS mobile apps. You are never quite sure if they are safe to run, but the probability of an app being safe is higher when it comes from an official source such as Google Play or the App Store.
More concretely, Docker images coming from Docker Hub go through security scans, the details of which are as yet undisclosed. So the chances of pulling a malicious image from Docker Hub are low.
However, one can never be paranoid enough when it comes to security. There are two ways to make sure all images coming from any source are secure:
Proactive security: do a security source-code review of each Dockerfile corresponding to a Docker image, including the base images, as you have already noted in the question.
Reactive security: run Docker Bench, open-sourced by Docker Inc., which runs as a privileged container and checks for known malicious runtime activities by containers.
In summary: whenever possible, use Docker images from Docker Hub. Perform security code reviews of Dockerfiles. Run Docker Bench or any other equivalent tool that can catch malicious activities performed by containers.
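For the reactive option, Docker Bench itself runs as a container. A trimmed-down invocation along the lines of the project's README (the exact set of mounts may need adjusting for your host; recent versions of the README bind-mount additional host paths read-only):

```shell
# Runs the CIS Docker Benchmark checks against the host daemon,
# its configuration, and all running containers.
docker run --rm --net host --pid host --userns host \
  --cap-add audit_control \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /etc:/etc:ro \
  --label docker_bench_security \
  docker/docker-bench-security
```

The broad host access (--net host, --pid host, the /etc mount) is what lets it audit the host, so only run it on machines you administer.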
References:
Docker security scanning formerly known as Project Nautilus: https://blog.docker.com/2016/05/docker-security-scanning/
Docker bench: https://github.com/docker/docker-bench-security
Best practices for Dockerfile: https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/
Docker images are self-contained, meaning that unless you run the container with volumes or a shared network mode, it has no way of accessing the host's filesystem or network stack.
For example if I run an image inside a container by using the command:
docker run -it --network=none ubuntu:16.04
This starts the ubuntu:16.04 container with no mounts into the host's storage and no network stack shared with the host. You can test this by running ifconfig inside the container and on your host and comparing the output.
Regarding checking what the image/base image does: the conclusion from the above is that it can do nothing harmful to your host (unless you mount /important/directory_on_host into the container, and the container removes its contents after starting).
You can check what an image/base image does when run by inspecting its Dockerfile(s) or docker-compose.yml files.
To me, VOLUME in a Dockerfile doesn't seem to be doing anything, whereas -v on the command line actually makes a directory available inside the container.
When I read the Docker manual for VOLUME, it is not really clear to me why I would ever want to write it in the Dockerfile rather than just on the command line.
Defining the volume in the Dockerfile doesn't expose the volume to the host by default. Instead, it sets up a volume that other Docker containers can link to. This is commonly used in a "data container" configuration, where you start a container with the sole purpose of persisting data. Here's a simple example:
docker run -d --name docker_data docker/image1
docker run -d --volumes-from docker_data --name new_container docker/image2
Notice the --volumes-from flag.
See http://container-solutions.com/understanding-volumes-docker/ for a more thorough explanation.
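To see what VOLUME in a Dockerfile actually produces at run time, you can inspect the container it creates (reusing the docker_data container name from the example above; the Dockerfile contents are assumed):

```shell
# If docker/image1's Dockerfile contains e.g. `VOLUME /data`, starting the
# container creates an anonymous volume. Inspect where it lives on the host:
docker inspect --format '{{ json .Mounts }}' docker_data

# Anonymous volumes also appear, under hash-like names, in:
docker volume ls
```

This is the practical difference from -v: VOLUME declares in the image that a path holds persistent data, and Docker creates an anonymous volume there if the operator doesn't map one; -v lets the operator choose the mapping explicitly at run time.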
In addition to the accepted answer, another consideration for using volumes is performance. The layered filesystems used by Docker (typically AUFS or Devicemapper, depending on which Linux distribution you're using) aren't the fastest and may become a bottleneck in high-throughput scenarios (for example, databases or caching directories).
Volumes, on the other hand, even if not explicitly mapped to a host directory, are still simple bind mounts to the host file system, allowing a higher throughput when writing data.
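A crude way to observe the difference yourself (entirely illustrative; absolute numbers depend on your storage driver, disk, and page cache):

```shell
# Write 1 GB into the container's layered filesystem; dd reports throughput.
docker run --rm ubuntu:16.04 \
  dd if=/dev/zero of=/test.img bs=1M count=1024 conv=fsync

# Write the same amount into a named volume, which bypasses the layered
# driver and goes through a plain bind mount to the host filesystem.
docker run --rm -v benchvol:/data ubuntu:16.04 \
  dd if=/dev/zero of=/data/test.img bs=1M count=1024 conv=fsync
```

The gap is largest for small random writes across many layers; a single large sequential write like this understates it, but it shows the mechanism being compared.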
For further reading, there's an interesting paper by IBM on this topic, which contains some interesting conclusions regarding the performance impact of using Docker volumes (emphasis mine):
AUFS introduces significant overhead, which is not surprising since I/O is going through several layers, [...]. Applications that are filesystem or disk intensive should bypass AUFS by using volumes. [...] Although containers themselves have almost no overhead, Docker is not without performance gotchas. Docker volumes have noticeably better performance than files stored in AUFS.