What is a Docker storage driver? (Linux)

I am learning Docker storage and I am not clear about Docker storage drivers.
What is docker's storage driver in layman's terms?
How is it different from the Backing Filesystem that the docker info command shows?
If someone wants to write their own storage driver, how would they do that?

I suggest you go and look at the presentation from one of the docker developers: http://www.slideshare.net/Docker/docker-storage-drivers
What is docker's storage driver in layman's terms?
When you use the FROM command in a Dockerfile you are referring to a base image. Rather than copying everything into a new image, Docker shares the contents (the filesystem layers); this is what is known as a copy-on-write (holy cow!) filesystem. The Docker storage driver is just which kind of COW implementation to use (AUFS, BTRFS, ...). If you picture your images as layers that depend on each other, you get a graph.
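As a quick illustration (assuming you have some image pulled locally; ubuntu here is only an example), you can ask the daemon which COW implementation it is using and list the read-only layers that make up an image; two images built FROM the same base share those bottom layers on disk:
docker info --format '{{.Driver}}'
docker image inspect --format '{{json .RootFS.Layers}}' ubuntu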
How is it different from the Backing Filesystem that the docker info command shows?
It's the difference between the logical and the physical representation. The backing filesystem may be mounted as ext4 (where Docker stores its data), while the Docker daemon layers its COW semantics on top of it through the storage driver.
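For example, on a host where /var/lib/docker sits on an ext4 partition, the relevant part of docker info typically looks something like this (the driver and filesystem will vary with your install):
Storage Driver: overlay2
 Backing Filesystem: extfs
The second line is the real on-disk filesystem; the first is the COW implementation Docker builds on top of it.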
If someone wants to write their own storage driver, how would they do that?
Take a look at the graphdriver package (it manages the graph of layers):
https://github.com/docker/docker/tree/master/daemon/graphdriver
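Each driver lives in its own subpackage there and registers itself with the daemon; once built in, you select it when starting the daemon. A minimal sketch of switching drivers (the driver name here is only an example, and changing it effectively starts you with a fresh image and container store):
dockerd --storage-driver=overlay2
# or persistently, in /etc/docker/daemon.json:
#   { "storage-driver": "overlay2" }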

Related

Can kubernetes provide a pod with an emptyDir volume from the host backed by a specific filesystem different than the host's?

I know this is a bit weird, but I'm building an application that makes small local changes to ephemeral file/folder systems and needs to sync them with a store of record. I am using NFS right now, but it is slow, not super scalable, and expensive. Instead, I'd love to take advantage of btrfs or zfs snapshotting for efficient syncing of snapshots of a small local filesystem, and push the snapshots into cloud storage.
I am running this application in Kubernetes (in GKE), which uses GCP VMs with ext4 formatted root partitions. This means that when I mount an emptyDir volume into my pods, the folder is on an ext4 filesystem I believe.
Is there an easy way to get an ephemeral volume mounted with a different filesystem that supports these fancy snapshotting operations?
No. Nor does GKE offer that kind of low-level control anyway, but the rest of this answer presumes you've managed to create a local mount of some kind. The easiest answer is a hostPath mount; however, that requires you to manually account for multiple similar pods on the same host so they don't collide. A newer option is an ephemeral CSI volume combined with a CSI plugin that basically reimplements emptyDir. https://github.com/kubernetes-csi/csi-driver-host-path gets most of the way there, but it 1) would require more work for this use case and 2) is explicitly not supported for production use. Failing either of those, you can move the whole kubelet data directory onto another mount, though that might not accomplish what you are looking for.
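If you do prepare such a mount on the node yourself, a hostPath volume is the simplest way to consume it. A minimal sketch (the pod name and host path are hypothetical; embedding the pod name in the path is one way to handle the collision problem mentioned above):
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    hostPath:
      path: /mnt/btrfs-scratch/scratch-demo
      type: DirectoryOrCreate
EOF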

Are Docker volumes a better option for write-heavy operations than binding directories directly?

Reading through the Docker documentation, I found this passage (located here):
Block-level storage drivers such as devicemapper, btrfs, and zfs perform better for write-heavy workloads (though not as well as Docker volumes).
So does this mean that one should always use Docker volumes when expecting lots of persistent writing?
The container-local filesystem never stores persistent data, so you have no choice but to mount something into the container if you want data to live on after the container exits. The "block-level storage drivers" you quote discuss particular install-time options for how images and containers are stored, and aren't related to any particular volume or bind-mount implementation.
As far as performance goes, my general expectation is that the latency of disk I/O will far outweigh any overhead of any particular implementation. Without benchmarking any particular implementation, on a native Linux host, I would expect a named volume, a bind-mount, and writes to the container filesystem to be more or less similar.
From a programming point of view, you will probably get better long-term performance improvement from figuring out how to have fewer disk accesses (for example, by grouping together related database requests into a single transaction) than by trying to optimize the Docker-level storage.
The one prominent exception to this is that bind mounts on MacOS are known to be very slow, and you should avoid them if your workload involves substantial disk access. (This includes both reading and writing, and includes some interpreted languages that want to read in every possible source file at startup time.) If you're managing something like database storage where you can't usefully access the files directly anyway, use a named volume. For your application code, COPY it into an image in a Dockerfile and do not overwrite it at run time.
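For reference, the difference in the docker run syntax is just whether the left-hand side of -v is a path or a bare name (the names and paths here are only examples):
# bind mount: a host path on the left-hand side, giving you direct access to the files
docker run --rm -v "$PWD/appdata:/data" busybox sh -c 'echo hello > /data/file'
# named volume: a bare name on the left-hand side, managed by Docker under its own data directory
docker run --rm -v appdata:/data busybox sh -c 'echo hello > /data/file'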
should always use docker volumes when expecting lots of persistent writing?
It depends.
Yes, you want some kind of storage external to the container for any persistent data, since data written inside the container is lost when that container is removed.
Whether that should be a host bind or a named volume depends on how you need to manage that data. A host volume is a bind mount to the host filesystem. It gives you direct access to that data, but that direct access also comes with uid/gid permission issues and loses the initialization feature of named volumes.
A named volume with all the defaults is just a bind mount to a folder under /var/lib/docker, so performance would be the same as a host volume if the underlying filesystem is the same. That said, a named volume can be configured to mount just about anything you can mount with the mount command.
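For example, the default local driver can put a tmpfs (or an NFS export, etc.) behind a named volume; the names and options here are only an illustration:
docker volume create --driver local --opt type=tmpfs --opt device=tmpfs --opt o=size=100m scratch
docker run --rm -v scratch:/scratch busybox df -h /scratch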
Since each of these options can have a different underlying filesystem, and the performance difference comes from that underlying filesystem choice, there's no way to answer this in any generic sense. Hence, it depends.

How do Docker images and layers work?

Actually I am new to the Docker ecosystem and I am trying to understand how exactly a container works on top of a base image. Does the base image get loaded into the container?
I have been through the Docker docs, where it is said that a read-write layer (the container layer) is formed on top of the image layers. But what I am confused about is: the image is immutable, right? Then where is the image running? Is it inside the Docker engine in the VM, and how does the container actually come into play?
how exactly does a container work on a base image?
Does the base image get loaded into the container?
Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server.
Like FreeBSD Jails and Solaris Zones, Linux containers are self-contained execution environments -- with their own isolated CPU, memory, block I/O, and network resources (using the cgroups kernel feature) -- that share the kernel of the host operating system. The result is something that feels like a virtual machine, but sheds all the weight and startup overhead of a guest operating system.
That being said, each distribution has its own official Docker image (in the Docker library), shipped with minimal binaries, built according to Docker's best practices, and ready to build on.
What I am confused about is: the image is immutable, right? Where is the image running? Is it inside the Docker engine in the VM, and how does the container actually come into play?
Docker originally used AUFS, and still uses it by default on Debian; on other distributions it uses AUFS-like filesystems such as OverlayFS. AUFS provides layering. Each image consists of layers, and these layers are read-only. Each container has a read/write layer on top of its image layers. The read-only layers are shared between containers, so you get storage space savings. A container sees the union mount of all its image layers plus its own read/write layer.
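You can see this split on a running system; on an overlay-type driver, the image layers show up as read-only "lower" directories and the container's read/write layer as an "upper" directory (the image and container names here are just examples):
docker run -d --name web nginx
docker inspect --format '{{json .GraphDriver.Data}}' web
# shows LowerDir (shared, read-only image layers) and UpperDir (this container's read/write layer)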

Is there a performance impact to having docker containers with different OS base boxes?

When building a Docker container, we choose a base image to inherit from, which is often a Linux OS image (like ubuntu, debian, or boot2docker). Does it have a performance impact whether multiple containers running on the same host share the same parent OS image?
[A great answer would explain why the answer is the case, whether elements of the OS are shared between separate containers and any best practices around choosing what dependencies to use when building docker containers.]
Short answer: yes for disk space, maybe for RAM.
Docker uses a union filesystem, which allows containers based on the same images to share files, until a file is modified at which point it is duplicated. (This is called copy-on-write.) So using the same base images will save you some disk space. Of course, disk space is rarely a limiting factor these days, so I usually wouldn't consider that a "performance impact".
Meanwhile, some of the Docker storage drivers—aufs and OverlayFS, but not btrfs and devicemapper—lead to certain shared libraries being shared in RAM. So if you have multiple containers based on the same image, they won't load duplicate copies of e.g. libc in RAM, which if you have lots and lots of containers could make a difference. (Source: http://permalink.gmane.org/gmane.comp.sysutils.docker.devel/1385)
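As a rough way to see the disk-space side of this, start two containers from the same image and look at Docker's usage report; the image's layers are stored and counted once, not once per container (the image name is just an example):
docker run -d --name web1 nginx
docker run -d --name web2 nginx
docker system df -v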

Why would I want to use VOLUME inside a Dockerfile?

To me, the VOLUME in a Dockerfile doesn't seem to be doing anything, whereas -v on the command line actually makes a directory available inside the container.
When I read the Docker manual for VOLUME, it is not really clear to me why I would ever want to write it in the Dockerfile rather than just on the command line.
Defining the volume in the Dockerfile doesn't expose the volumes to the host by default. Instead, it sets up the volume so that other containers can link to the volume(s) of that container. This is commonly used in a "Data Container" configuration, where you start a container with the sole purpose of persisting data. Here's a simple example:
docker run -d --name docker_data docker/image1
docker run -d --volumes-from docker_data --name new_container docker/image2
Notice the --volumes-from flag.
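For this pattern to work, the image behind docker_data (docker/image1 in the example) needs a VOLUME instruction in its Dockerfile, e.g. VOLUME /data (the path is hypothetical), so that Docker creates an anonymous volume for that path when the container starts. You can check which volumes a container exposes this way (a sketch, reusing the container name from the example above):
docker inspect --format '{{json .Mounts}}' docker_data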
See http://container-solutions.com/understanding-volumes-docker/ for a more thorough explanation.
In addition to the accepted answer, another consideration for using volumes is performance. The layered filesystems used by Docker (typically AUFS or devicemapper, depending on which Linux distribution you're using) aren't the fastest and may become a bottleneck in high-throughput scenarios (for example, databases or caching directories).
Volumes, on the other hand, even if not explicitly mapped to a host directory, are still simple bind mounts to the host file system, allowing a higher throughput when writing data.
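You can see where a named volume actually lives on the host (the volume name is just an example; the path shown is the typical default for the local driver):
docker volume create appdata
docker volume inspect --format '{{.Mountpoint}}' appdata
# typically prints something like /var/lib/docker/volumes/appdata/_data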
For further reading, there's an interesting paper by IBM on this topic, which contains some notable conclusions regarding the performance impact of using Docker volumes (emphasis mine):
AUFS introduces significant overhead which is not surprising since I/O is going through several layers, [...]. Applications that are filesystem or disk intensive should bypass AUFS by using volumes. [...]
Although containers themselves have almost no overhead, Docker is not without performance gotchas. Docker volumes have noticeably better performance than files stored in AUFS.
