Is it safe to mount /dev into a Docker container

Is it safe to mount /dev into a Docker container - linux

I'm affected by an issue described in moby/moby/27886, meaning loop devices I create in a Docker container do not appear in the container, but they do on the host.
One of the potential workarounds is to mount /dev into the Docker container, so something like:
docker run -v /dev:/dev image:tag
I know that it works, at least on Docker Engine and the Linux kernel that I currently use (20.10.5 and 5.4.0-70-generic respectively), but I'm not sure how portable and safe it is.
In runc's libcontainer/SPEC for filesystems I found some detailed information on /dev mounts for containers, so I'm a bit worried that mounting /dev from the host might cause unwanted side-effects. I even found one problematic use-case that was fixed in 2018...
So, is it safe to mount /dev into a Docker container? Is it guaranteed to work?
I'm not asking if it works, because it seems it works, I'm interested if this is a valid usage, natively supported by Docker Engine.

Related

Docker: mount filesystem in shared volume

docker volume create minty
docker run -v minty:/Minty:rw mango
docker run -v minty:/Minty:rw banana
The mango container then creates several empty folders in /Minty and mounts filesystems on them. Unfortunately, the banana container can see the empty folders, but can't see any of the mounted filesystems.
I presume this is to do with Docker running each container in its own namespace or something. Does anybody know how to fix this?
I've seen several answers that claim to fix this, by making use of "data containers" and the --volumes-from option. However, it appears that data containers are a deprecated feature now. Regardless, I tried it, and it doesn't seem to make any difference; the second container sees only the empty mount points, not the newly-mounted filesystems.
Even bind-mounting a folder to the host doesn't allow the host to see the newly-mounted filesystems.

How to customize golang-docker image to use golang for scripting?

I came across this blog: using go as a scripting language and tried to create a custom image that I can use to run golang scripts i.e.
FROM golang:1.15
RUN go get github.com/erning/gorun
RUN mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
RUN echo ':golang:E::go::/go/bin/gorun:OC' | tee /proc/sys/fs/binfmt_misc/register
It fails with error:
mount: /proc/sys/fs/binfmt_misc: permission denied.
ERROR: Service 'go_saga' failed to build : The command '/bin/sh -c mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc' returned a non-zero code: 32
It's readonly file system so can't change the permissions as well. The task I'm trying to achieve here is well documented here. Please help me with following questions:
Is that even possible i.e. mount /proc/sys/fs/binfmt_misc and write to the file: /proc/sys/fs/binfmt_misc/register ?
If Yes, how to do that ?
I guess, it would be great, if we could run golang scripts in the container.

First a quick disclaimer that I haven't done this binfmt trick to run go scripts. I suppose it might work, but I just use go run when I want to run something on the fly.
There's a lot to unpack in this. Container isolation runs an application with a shared kernel in an isolated environment. The namespaces, cgroups, and security settings are designed to prevent one container from impacting other containers or the host.
Why is that important? Because /proc/sys/fs/binfmt_misc is interacting with the kernel, and pushing a change to that would be considered a container escape since you're modifying the underlying host.
The next thing to cover is building an image vs running a container. When you build an image with the Dockerfile, you are defining the image filesystem and some metadata (labels, entrypoint, exposed ports, etc). Each RUN command executes that command inside a temporary container, based on the previous step's result, and when the command finishes it captures the changes to the container filesystem. When you mount another filesystem, that doesn't change the underlying container filesystem, so even if you could, the mount command would be a noop during the image build.
So if this is possible, you'll need to do it inside the container rather than during build time, that container will need to be privileged since doing things like mounting filesystems and modifying /proc requires access not normally given to containers, and you'll be modifying the host kernel in the process. You'd need to make the container entrypoint run the mount and register the binfmt_misc entry, and figure out what to do if the entry is already setup/registered, but possibly to a different directory in another container.
As an aside, when dealing with binfmt_misc and containers, the F flag is very important, though in your use case it's important that you don't have it. Typically you need the F flag so the binary is found on the host filesystem rather than searched for within the container filesystem namespace. The typical use case of binfmt_misc and containers is configuring the host to be able to run containers for different architectures, e.g. Docker Desktop can run amd64, arm64, and a bunch of other platforms today using this.
In the end, if you want to run a container as a one off to run a go command as a script, I'd skip the binfmt misc trick and make an entrypoint that does a go run instead. But if you're using the container for longer run processes where you want to periodically run a go file as a script, you'll need to do that in the container, and as a privileged container that has the ability to escape to the host.

Ubuntu docker image doesn't contain any /dev/sdX block devices?

On normal linux machine or VM, "ls /dev" will show a lot of sdX or hdX hard disk information.
But today I just tried latest docker and ubuntu image. ls shows this:
console core fd full mqueue null ptmx pts random shm stderr stdin stdout tty urandom zero
I don't see anything called sdX or hdX.
Why is this? How docker ubuntu store anything without having a /dev/sdX?

Docker doesn't create virtual machines, it creates containers to run an application in an isolated space. Including physical hardware devices would allow those applications to escape from the container isolation, so they are not provided by default. You can include specific devices inside the container with the docker run --device ... cli flag, and you will most likely need to include --privileged give the root user back various capabilities that are removed. None of this is recommended for running untrusted containers.

Docker on the mac separates internal and external file ownerships; not so on linux

Running docker on the Mac, with a centos image, I see mounted volumes taking on the ownership of the centos (internal) user, while on the filesystem the ownership is mine (mdf:mdf).
Using the same centos image on RHEL 7, I see the volumes mounted, but inside, in centos, the home dir and the files all show my uid (1055).
I can do a recursive chown to, e.g., insideguy:insideguy, and all looks right. But back in the host filesystem, the ownerships have changed to some other person in the registry that has the same uid as was selected for insideguy(1001) when useradd was executed.
Is there some fundamental limitation in docker for Linux that makes this happen?
As another side effect, in our cluster one cannot chown on a mounted filesystem, even with sudo privileges; only on a local filesystem. So the desire to keep the docker home directories in, e.g., ~/dockerhome, fails because docker seems to be trying (and failing) to perform some chowns (not described in the Dockerfile or the start script, so assumed to be part of the --volume treatment). Placed in /var or /opt with appropriate ownerships, all goes well.
Any idea what's different between the two docker hosts?
Specifics: OSX 10.11.6; docker v1.12.1 on mac, v1.12.2 on RHEL 7; centos 7

There is a fundamental limitation to Docker on OS X that makes this happen: that is the fact that Docker only runs on Linux.
When running Docker on other platforms, this requires first setting up a Linux VM (historically through VirtualBox, although more recently other options are available) and then running Docker inside that VM.
Because Docker is running natively on Linux, it is sharing filesystems directly with the host when you use something like docker run -v /host/path:/container/path. So if inside the container you run chown userA somefile and user A has userid 1001, and on your host that user id belongs to userB, then of course when you look at the files on the host they will appear to be owned by userB. There's no magic here; this is just how Unix file permissions work. You get the same behavior if, say, you were to move a disk or NFS filesystem from one host to another that had conflicting entries in their local /etc/passwd files.
Most Docker containers are running as root (or at least, not as your local user). This means that any files created by a process in Docker will typically not be owned by you, which can of course cause problems if you are trying to access a filesystem that does not permit this sort of access. Your choices when using Docker are pretty the same choices you have when not using Docker: either ensure that you are running containers as your own user id -- which may not be possible, since many images are built assuming they will be running as root -- or arrange to store files somewhere else.
This is one of the reasons why many people discourage the use of host volume mounts, because it can lead to this sort of confusion (and also because when interacting with a remote Docker API, the remote Docker daemon doesn't have any access to your local host filesystem).
With Docker for Mac, there is some magic file sharing that goes on to expose your local filesystem to the Linux VM (for example, with VirtualBox, Docker may use the shared folders feature). This translation layer is probably the cause of the behavior you've noted on OS X with respect to file ownership.

Mount SMB/CIFS share within a Docker container

I have a web application running in a Docker container. This application needs to access some files on our corporate file server (Windows Server with an Active Directory domain controller). The files I'm trying to access are image files created for our clients and the web application displays them as part of the client's portfolio.
On my development machine I have the appropriate folders mounted via entries in /etc/fstab and the host mount points are mounted in the Docker container via the --volume argument. This works perfectly.
Now I'm trying to put together a production container which will be run on a different server and which doesn't rely on the CIFS share being mounted on the host. So I tried to add the appropriate entries to the /etc/fstab file in the container & mounting them with mount -a. I get mount error(13): Permission denied.
A little research online led me to this article about Docker security. If I'm reading this correctly, it appears that Docker explicitly denies the ability to mount filesystems within a container. I tried mounting the shares read-only, but this (unsurprisingly) also failed.
So, I have two questions:
Am I correct in understanding that Docker prevents any use of mount inside containers?
Can anyone think of another way to accomplish this without mounting a CIFS share on the host and then mounting the host folder in the Docker container?

Yes, Docker is preventing you from mounting a remote volume inside the container as a security measure. If you trust your images and the people who run them, then you can use the --privileged flag with docker run to disable these security measures.
Further, you can combine --cap-add and --cap-drop to give the container only the capabilities that it actually needs. (See documentation) The SYS_ADMIN capability is the one that grants mount privileges.

yes
There is a closed issue mount.cifs within a container
https://github.com/docker/docker/issues/22197
according to which adding
--cap-add SYS_ADMIN --cap-add DAC_READ_SEARCH
to the run options will make mount -t cifs operational.
I tried it out and:
mount -t cifs //<host>/<path> /<localpath> -o user=<user>,password=<user>
within the container then works

You could use the smbclient command (part of the Samba package) to access the SMB/CIFS server from within the Docker container without mounting it, in the same way that you might use curl to download or upload a file.
There is a question on StackExchange Unix that deals with this, but in short:
smbclient //server/share -c 'cd /path/to/file; put myfile'
For multiple files there is the -T option which can create or extract .tar archives, however this looks like it would be a two step process (one to create the .tar and then another to extract it locally). I'm not sure whether you could use a pipe to do it in one step.

You can use a Netshare docker volume plugin which allows to mount remote CIFS/Samba as volumes.

Do not make your containers less secure by exposing many ports just to mount a share. Or by running it as --privileged
Here is how I solved this issue:
First mount the volume on the server that runs docker.
sudo mount -t cifs -o username=YourUserName,uid=$(id -u),gid=$(id -g) //SERVER/share ~/WinShare
Change the username, SERVER and WinShare here. This will ask your sudo password, then it will ask password for the remote share.
Let's assume you created WinShare folder inside your home folder. After running this command you should be able to see all the shared folders and files in WinShare folder. In addition to that since you use the uidand gid tags you will have write access without using sudo all the time.
Now you can run your container by using -v tag and share a volume between the server and the container.
Let's say you ran it like the following.
docker run -d --name mycontainer -v /home/WinShare:/home 2d244422164
You should be able to access the windows share and modify it from your container now.
To test it just do:
docker exec -it yourRunningContainer /bin/bash
cd /Home
touch testdocfromcontainer.txt
You should see testdocfromcontainer.txt in the windows share.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string