To me, the VOLUME instruction in a Dockerfile doesn't seem to be doing anything, whereas -v on the command line actually makes a directory available inside the container.
When I read the Docker manual for VOLUME, it is not really clear to me why I would ever want to write it in the Dockerfile rather than just passing it on the command line.
Defining the volume in the Dockerfile doesn't expose the volume to the host by default. Instead, it declares the volume so that other containers can mount the volume(s) from that container. This is commonly used in a "Data Container" configuration, where you start a container with the sole purpose of persisting data. Here's a simple example:
docker run -d --name docker_data docker/image1
docker run -d --volumes-from docker_data --name new_container docker/image2
Notice the --volumes-from flag.
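For reference, the Dockerfile behind an image like docker/image1 (a made-up name from the example above) might declare the volume like this:
# Hypothetical Dockerfile for docker/image1
FROM busybox
VOLUME /data
# /data is declared as a volume; containers started with --volumes-from docker_data can reach it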
See http://container-solutions.com/understanding-volumes-docker/ for a more thorough explanation.
In addition to the accepted answer, another consideration for using volumes is performance. The layered filesystems used by Docker (typically AUFS or Devicemapper, depending on which Linux distribution you're using) aren't the fastest and may become a bottleneck in high-throughput scenarios (for example, databases or caching directories).
Volumes, on the other hand, even if not explicitly mapped to a host directory, are still simple bind mounts to the host file system, allowing a higher throughput when writing data.
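As a small illustration (the image and volume names are placeholders), a write-heavy data directory can be placed on a named volume so that its I/O bypasses the layered filesystem:
# db_data lives outside the container's layered filesystem, so writes skip AUFS/Devicemapper
docker run -d -v db_data:/var/lib/mysql mysql:5.7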
For further reading, there's an interesting paper by IBM on this topic, which contains some notable conclusions regarding the performance impact of using Docker volumes (emphasis mine):
AUFS introduces significant overhead which is not surprising since I/O is going through several layers, [...]. Applications that are filesystem or disk intensive should bypass AUFS by using volumes. [...]
Although containers themselves have almost no overhead, Docker is not without performance gotchas. Docker volumes have noticeably better performance than files stored in AUFS.
I am trying to build a docker image inside docker (docker-in-docker, dind). The image is very large and it is failing to build with the error no space left on device.
Setup:
I am running this on the teamcity agent docker image, with the docker-in-docker configuration
Does my host machine need more memory or more disk space? Does docker-in-docker build in memory or on disk?
It uses disk.
A thorough explanation of how and why docker-in-docker works is in this article.
I originally asked this question with TeamCity's agent in mind, but I wanted to generalize it. It seems to be the industry standard to never actually run docker-in-docker, because it can cause data corruption, and most use cases can be solved with a docker-to-docker solution (explained below). Nevertheless, these nearly-docker-in-docker implementations are still referred to as docker-in-docker in some CI documentation, even if they are not true docker-in-docker solutions.
The docker-to-docker workaround, generally, is to expose the host's docker daemon to the container via a volume mount, i.e. docker run -v /var/run/docker.sock:/var/run/docker.sock ... Whether you expose the daemon's socket to a container or expose the dind image to another container, in both cases the docker engine runs at the host level or as a container in the first level of docker, which means it uses disk.
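A minimal sketch of that docker-to-docker setup, assuming the official docker CLI image (any image containing a docker client would do):
# The container talks to the host's daemon through the mounted socket,
# so anything it builds or runs lands on the host's disk.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker \
  docker ps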
Hope this clarifies.
Reading through the docker documentation I found this passage (located here):
Block-level storage drivers such as devicemapper, btrfs, and zfs perform better for write-heavy workloads (though not as well as Docker volumes).
So does this mean that one should always use docker volumes when expecting lots of persistent writing?
The container-local filesystem never stores persistent data, so you don't have a choice but to mount something into the container if you want data to live on after the container exits. The "block-level storage drivers" you quote discuss particular install-time options for how images and containers are stored, and aren't related to any particular volume or bind-mount implementation.
As far as performance goes, my general expectation is that the latency of disk I/O will far outweigh any overhead of any particular implementation. Without benchmarking any particular implementation, on a native Linux host, I would expect a named volume, a bind-mount, and writes to the container filesystem to be more or less similar.
From a programming point of view, you will probably get better long-term performance improvement from figuring out how to have fewer disk accesses (for example, by grouping together related database requests into a single transaction) than by trying to optimize the Docker-level storage.
The one prominent exception to this is that bind mounts on macOS are known to be very slow, and you should avoid them if your workload involves substantial disk access. (This includes both reading and writing, and includes some interpreted languages that want to read in every possible source file at startup time.) If you're managing something like database storage where you can't usefully access the files directly anyway, use a named volume. For your application code, COPY it into an image in a Dockerfile and do not overwrite it at run time.
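A minimal sketch of that split, with placeholder image, path, and volume names (not a prescription for any particular stack):
# Dockerfile: application code is baked into the image at build time
FROM python:3.9-slim
WORKDIR /app
COPY . /app

# At run time, mutable state goes into a named volume rather than a bind mount
docker run -d -v app_data:/app/data myorg/myapp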
should always use docker volumes when expecting lots of persistent writing?
It depends.
Yes, you want some kind of storage external to the container for any persistent data, since data written inside the container is lost when that container is removed.
Whether that should be a host bind or a named volume depends on how you need to manage that data. A host volume is a bind mount to the host filesystem. It gives you direct access to the data, but that direct access also comes with uid/gid permission issues and loses the initialization feature of named volumes.
A named volume with all the defaults is just a bind mount to a folder under /var/lib/docker, so performance would be the same as a host volume if the underlying filesystem is the same. That said, a named volume can be configured to mount just about anything you can do with the mount command.
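For example, the local volume driver accepts roughly the same options as the mount command, so a named volume can point at an NFS export (the server address and export path below are placeholders):
# Create a named volume backed by an NFS share, then mount it like any other volume
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.10,rw \
  --opt device=:/exports/data \
  my_nfs_volume
docker run -d -v my_nfs_volume:/data myorg/myapp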
Since each of these options can have varying underlying filesystem, and the performance difference comes from that underlying filesystem choice, there's no way to answer this in any generic sense. Hence, it depends.
I know multiple Docker containers can be run on the same host, but can they be used securely, like isolated instances? I want to run multiple secure and sandboxed containers such that no container can affect or access the others.
For instance, can I serve nginx and apache containers which listen on different ports, with full trust that each container can only access its own files, resources, etc.?
In some sense you are asking the million dollar question with containers, and to be clear, IMHO there is no black and white answer to the question "is the platform/technology secure enough." It is a big (and important) enough question that the number of startups--not to mention the amount of funding they've received--around container security is appreciable!
As noted in another answer, isolation for containers is realized through an assortment of Linux kernel capabilities (namespaces and cgroups), and adding more security to these capabilities is yet another set of technologies like seccomp, apparmor (or SELinux), user namespaces, or general hardening of the container runtime & node it is installed on (e.g. via the CIS benchmark guidelines). Out of the box default installation and default runtime parameters are probably not good enough for generically trusting in the kernel isolation primitives of Linux. However, this depends greatly on the trust level of what you are running across your container workloads. For example, is this all in-house within one organization? Can workloads be submitted from external sources? Obviously the spectrum of possibilities may greatly impact your level of trust.
If your use case is potentially narrow (for example, you mention serving web content from nginx or apache), and you are willing to do some work on base image creation, minimization and hardening, then adding a --read-only root filesystem, a capability-limiting apparmor and seccomp profile, and a bind-mounted content area plus a writeable area with no executables and ownership by an unprivileged user, might together be enough for a specific use case.
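As a non-authoritative sketch of what that combination might look like for the stock nginx image (the seccomp profile path, volume name, and exact capability list are assumptions; official images often need a few extra capabilities such as CHOWN/SETUID/SETGID added back):
# Read-only root filesystem, capabilities dropped except binding the port,
# no privilege escalation, a custom seccomp profile, small tmpfs scratch areas,
# and the served content mounted read-only from a volume.
docker run -d \
  --read-only \
  --cap-drop ALL --cap-add NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --security-opt seccomp=/path/to/custom-seccomp.json \
  --tmpfs /var/run --tmpfs /var/cache/nginx \
  -v web_content:/usr/share/nginx/html:ro \
  -p 8080:80 \
  nginx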
However, there is no guarantee that a currently unknown security escape won't become a "0day" for Linux containers in the future, and that has led to the promotion of lightweight virtualization that marries container isolation with actual hardware-level virtualization through shims from hyper.sh or Intel Clear Containers, as two examples. This is a happy medium between running a fully virtualized OS with another container runtime and trusting kernel isolation with a single daemon on a single node. There is still a performance cost and memory overhead to adding this layer of isolation, but it is much less than a fully virtualized OS, and work continues to make this less of a performance impact.
For a deeper set of information on all the "knobs" available for tuning container security, a presentation I gave last year several times is available on slideshare as well as via video from Skillsmatter.
The incredibly thorough "Understanding and Hardening Linux Containers" by Aaron Grattafiori is also a great resource with exhaustive detail on many of the same topics.
Filesystem isolation (as well as memory and process isolation) is a core feature of docker containers, based on Linux kernel capabilities.
But if you wanted to be completely sure, you would deploy your containers on different nodes (each managed by their own docker daemons), each node being a VM (Virtual Machine) on your host, ensuring a complete sandbox.
Then a docker swarm or Kubernetes would be able to orchestrate those nodes and their containers, and make them communicate.
This is normally not needed when you have just a few linked containers: a single docker daemon should be able to manage them in isolation. You could use user namespaces for additional isolation.
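User-namespace remapping, for instance, is a daemon-wide setting (a sketch; it requires a daemon restart and changes the file ownership docker expects on existing images and volumes):
# /etc/docker/daemon.json
{
  "userns-remap": "default"
}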
Plus, using nodes to separate containers implies different machines or different VM within the same machine.
And one big difference between a VM and a container is that a VM will preempt resources (allocate a fixed minimal amount of disk/memory/CPU), which means you cannot launch a hundred VMs, one per container. This is in contrast to a single docker instance, where a container that does nothing won't consume much disk space/memory/CPU at all.
How can I ensure that a docker container will be secure, especially when using third-party containers or base images?
Is it correct that, when using a base image, it may start arbitrary services or mount arbitrary partitions of the host filesystem under the hood, and potentially send sensitive data to an attacker?
So if I use a third-party container whose Dockerfile appears to prove the container is safe, should I traverse the whole chain of base images (potentially very long) to ensure the container is actually safe and does what it intends to do?
How can I ensure the trustworthiness of a docker container in a systematic and definite way?
Consider Docker images to be similar to Android/iOS mobile apps. You are never quite sure whether they are safe to run, but the probability is higher when they come from an official source such as Google Play or the App Store.
More concretely, Docker images coming from Docker Hub go through security scans, the details of which are as yet undisclosed. So the chances of pulling a malicious image from Docker Hub are low.
However, one can never be paranoid enough when it comes to security. There are two ways to make sure all images coming from any source are secure:
Proactive security: Do a security source code review of each Dockerfile corresponding to the Docker image, including the base images, as you have already noted in the question.
Reactive security: Run Docker Bench, open sourced by Docker Inc., which runs as a privileged container and looks for known malicious activity by containers at runtime.
In summary, whenever possible use Docker images from Docker Hub. Perform security code reviews of Dockerfiles. Run Docker Bench or any other equivalent tool that can catch malicious activities performed by containers.
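Invoking Docker Bench typically looks roughly like the following; the exact mounts and flags vary between versions, so check the project README listed below:
# Runs as a privileged-ish container that inspects the host's docker configuration
docker run -it --net host --pid host --cap-add audit_control \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /etc:/etc:ro \
  --label docker_bench_security \
  docker/docker-bench-security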
References:
Docker security scanning formerly known as Project Nautilus: https://blog.docker.com/2016/05/docker-security-scanning/
Docker bench: https://github.com/docker/docker-bench-security
Best practices for Dockerfile: https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/
Docker images are self-contained, meaning that unless you run them with volume mounts or a shared network mode, they have no way of accessing your host's network stack or storage.
For example, if I run an image in a container using the command:
docker run -it --network=none ubuntu:16.04
This will start a container from the ubuntu:16.04 image with no mounts into the host's storage and no network stack shared with the host. You can test this by running ifconfig inside the container and on your host and comparing the output.
Regarding checking what the image/base-image does, the conclusion from the above is: nothing harmful to your host (unless you mount your /important/directory_on_host into the container and, after it starts, the container removes the contents).
You can check what an image/base-image contains and does by checking its Dockerfile(s) or docker-compose.yml files.
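When the Dockerfile for an image is not published, docker history at least shows the commands that created each layer (ubuntu:16.04 below is just a stand-in for whatever image you are inspecting):
docker history --no-trunc ubuntu:16.04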
I am learning Docker storage and I am not clear about Docker storage drivers.
What is docker's storage driver in layman's terms?
How is it different from the Backing Filesystem that the docker info command shows?
If someone wants to write their own storage driver, how would they do that?
I suggest you go and look at the presentation from one of the docker developers: http://www.slideshare.net/Docker/docker-storage-drivers
What is docker's storage driver in layman's terms?
When you use the FROM instruction in a Dockerfile you are referring to a base image. Rather than copying everything into a new image, you share the contents (a.k.a. fs layers); this is what is known as a copy-on-write (holy cow!) filesystem. The docker storage driver simply selects which kind of COW implementation to use (AUFS, BTRFS, ...). If you imagine your images as layers depending on each other, you get a graph.
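A tiny sketch of that layering (the image and files here are made up): each instruction adds a layer on top of the shared, read-only base layers, and the storage driver stitches them together into a single filesystem:
FROM ubuntu:16.04
# the base image layers are shared with every other image built FROM ubuntu:16.04
RUN apt-get update && apt-get install -y nginx
# adds one layer containing only the files changed by this command
COPY index.html /usr/share/nginx/html/
# another thin layer containing just this file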
How is it different from the Backing Filesystem that the docker info command shows?
It's the difference between the logical and the physical representation. The filesystem may be mounted as ext4 (where docker is installed), but the docker daemon layers its COW semantics on top of it.
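You can see both reported together with docker info; look for the "Storage Driver:" line and the "Backing Filesystem:" line beneath it (field names can differ slightly between docker versions):
docker info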
If someone wants to write their own storage driver, how would they do that?
Go and take a look at the graphdriver (manages the graph of layers).
https://github.com/docker/docker/tree/master/daemon/graphdriver