Run each Docker container in a specific user namespace configuration

Run each Docker container in a specific user namespace configuration - linux

Problem:
I am trying to mount a directory as Docker volume in such a way,
that a user, which is created inside a container could write
into a file in that volume. And at the same time, the file should
be at least readable to my user lape outside the container.
Essentially, I need to remap a user UID from container user namespace to a specific UID on the host user namespace.
How can I do that?
I would prefer answers that:
do not involve changing the way how Docker daemon is run;
and allows a possibility to configure container user namespace for each container separately;
do not require rebuilding the image;
I would accept answer that shows a nice solution using Access Control Lists as well;
Setup:
This is how the situation can be replicated.
I have my Linux user lape, assigned to docker group, so I
can run Docker containers without being root.
lape#localhost ~ $ id
uid=1000(lape) gid=1000(lape) groups=1000(lape),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),121(lpadmin),131(sambashare),999(docker)
Dockerfile:
FROM alpine
RUN apk add --update su-exec && rm -rf /var/cache/apk/*
# I create a user inside the image which i want to be mapped to my `lape`
RUN adduser -D -u 800 -g 801 insider
VOLUME /data
COPY ./entrypoint.sh /entrypoint.sh
ENTRYPOINT ["sh", "/entrypoint.sh"]
entrypoint.sh:
#!/bin/sh
chmod 755 /data
chown insider:insider /data
# This will run as `insider`, and will touch a file to the shared volume
# (the name of the file will be current timestamp)
su-exec insider:insider sh -c 'touch /data/$(date +%s)'
# Show permissions of created files
ls -las /data
Once the is built with:
docker build -t nstest
I run the container:
docker run --rm -v $(pwd)/data:/data nstest
The output looks like:
total 8
4 drwxr-xr-x 2 insider insider 4096 Aug 26 08:44 .
4 drwxr-xr-x 31 root root 4096 Aug 26 08:44 ..
0 -rw-r--r-- 1 insider insider 0 Aug 26 08:44 1503737079
So the file seems to be created as user insider.
From my host the permissions look like this:
lape#localhost ~ $ ls -las ./data
total 8
4 drwxr-xr-x 2 800 800 4096 Aug 26 09:44 .
4 drwxrwxr-x 3 lape lape 4096 Aug 26 09:43 ..
0 -rw-r--r-- 1 800 800 0 Aug 26 09:44 1503737079
Which indicates that the file belongs to uid=800 (that is the insider user which does not even exist outside the Docker namespace).
Things I tried already:
I tried specifying --user parameter to docker run, but it seems it can only map which user on the host is mapped to uid=0 (root) inside the docker namespace, in my case the insider is not root. So it did not really work in this case.
The only way how I achieved insider(uid=800) from within container, to be seen as lape(uid=1000) from host, was by adding --userns-remap="default" to the dockerd startup script, and adding dockremap:200:100000 to files /etc/subuid and /etc/subgid as suggested in documentation for --userns-remap. Coincidentally this worked for me, but it is not sufficient solution, because:
it requires reconfigure the way how the Docker daemon runs;
requires to do some arithmetic on user ids: '200 = 1000 - 800', where 1000 is the UID my user on the host, and 800 the UID is of the insider user;
that would not even work if the insider user would need to have a higher UID than my host user;
it can only configure how user namespaces are mapped globally, without a way to have unique configuration per container;
this solution kind of works but it is a bit too ugly for practical usage.

If you just need a read access for your user, the simplest will be to add the read permissions for all files and subdirectories in /data with acls outside of docker.
Add default acl: setfacl -d -m u:lape:-rx /data.
You will also need to give access to the directory itself: setfacl -m u:lape:-rx /data.
Are there any obstacles for such a solution?

Related

Create a user on the docker host from inside a container

I am trying to create a user "foo" on the docker host from within a container, but it fails.
The following files are volume-mounted read-write in the container:
/etc/group:/etc/group:rw
/etc/gshadow:/etc/gshadow:rw
/etc/passwd:/etc/passwd:rw
/etc/shadow:/etc/shadow:rw
When running the following command as root inside the container:
adduser --debug --system --shell /bin/bash --group foo
Then the output is:
Selecting UID from range 100 to 999 ...
Selecting GID from range 100 to 999 ...
Adding system user `foo' (UID 130) ...
Adding new group `foo' (GID 139) ...
/sbin/groupadd -g 139 foo
groupadd: failure while writing changes to /etc/group
adduser: `/sbin/groupadd -g 139 foo' returned error code 10. Exiting.
Permissions of these files look okay to me. Both on the docker host as well as inside the container, permissions are the same.
-rw-r--r-- 1 root root 1167 apr 14 12:51 /etc/group
-rw-r----- 1 root shadow 969 apr 14 12:51 /etc/gshadow
-rw-r--r-- 1 root root 3072 apr 14 12:51 /etc/passwd
-rw-r----- 1 root shadow 1609 apr 14 12:51 /etc/shadow
I have also tried chattr -i on these files, but it still fails.
Is there some other file that I have overlooked and needs to be mounted? Is it even possible what I am trying to achieve?

When you mount an individual file, you end up mounting the inode of that file with the bind mount. And when you write to the file, many tools create a new file, with a new inode, and replace the existing file with that. This avoids partial reads, and other file corruption risks if you were to modify the file in place.
What you are attempting to do is likely a very bad idea, it's the very definition of a container escape, allowing the container to setup credentials on the host. If you really need host access, I'd mount the folder in a different location because containers have other files that are automatically mounted in /etc. So you could say /etc:/host/etc and access the files in the container under /host/etc. Just realize that's even a larger security hole.
Note, if the entire goal is to avoid permission issues between the host and the container, there are much better ways to do this, but that would be an X-Y problem.

How to work with files which belong to the subuser namespace under Linux?

I am using docker on ubuntu 16.04 with user id mapping (user namepsaces) enabled. I have following settings:
/etc/passwd
myusername:x:1000:1000:,,,:/home/myusername:/bin/bash
/etc/subuid
myusername:100000:65536
/etc/subguid
myusername:100000:65536
When I start a container the files are being correctly mapped from 0 (root) to my subuid 100000.
host
-rw-r--r-- 1 100000 100000 0 Mär 30 13:05 testfile
container
rw-r--r-- 1 root root 0 Mar 30 13:05 testfile
I can read the file on the host machine, but I cannot edit it. My assumption was that 100000 is "my" subuid, so I can edit those files. How can I achieve that those files are accessible by myusername without sudo?

I'm not sure how to fix this with user namespace mapping but you can work around it with ACL's.
If you don't mind leaking some UID information into the container, you can add an ACL to the directory for your host user. ACL's sit on top of the standard POSIX permissions.
To set a default ACL on the parent directory, that new entries inherit:
setfacl -d -m u:1000:rwx volume_dir/
To set the ACL on all existing files and directories in a directory:
setfacl -R -m u:1000:rwX volume_dir/
The X auto detects directories and sets them to executable but skips making files executable. Unfortunately this type of differentiation is not available on the default ACL.

Mount a file in read/write mode for all in Docker

On my MacOS laptop I mounted a file in my newly created container using:
docker run --name mediawiki --link mysql:mysql -p 80:80 -v /Users/poiuytrez/Downloads/LocalSettings.php:/var/www/html/LocalSettings.php
--rm poiuytrez/mediawiki:1.25.3
However, apache seems to have issues to read the file. We can learn by running a bash command in the container that the read permissions is not applied for all:
root#078252e20671:/var/www/html# ls -l LocalSettings.php
-rw-r----- 1 1000 staff 4857 Nov 18 15:44 LocalSettings.php
I tried the same process on docker installed on a Linux Debian 8 machine and I am getting:
root#16e34a9b169d:/var/www/html# ls -l LocalSettings.php
-rw-r--r-- 1 www-data www-data 4858 Nov 19 13:32 LocalSettings.php
which is much better for me.
How to add the read permissions for everybody without doing a chmod a+r on boot2docker/dockermachine?
I am using Docker 1.8.3

In docker-machine and boot2docker your /Users directory are mapped inside the virtual-machine at the same path, so when you map the volume like:
-v /Users/poiuytrez/Downloads/LocalSettings.php:/var/www/html/LocalSettings.php
actually is the boot2docker directory that you are mounting inside the container, so there is 2 levels.
You can see that the LocalSettings.php owner does not exist inside the container, so when you ls -l the user id are showing in your case userid 1000 and group staff.
-rw-r----- 1 1000 staff 4857 Nov 18 15:44 LocalSettings.php
1000 staff
Try to see the owner and the permissions inside boot2docker vm with boot2docker ssh or docker-machine ssh <you-machine-name> and ls -l inside it.
Other approach is to add an user with id 1000 inside your container and run your web server as this user.
You can also add a fix-permission.sh script to your container run command.
In Docker roadmap there are some improvements in user namespace to come in the next releases. I saw this article some days ago:
http://integratedcode.us/2015/10/13/user-namespaces-have-arrived-in-docker/
I hope it solves this ownership issues.

LocalSettings.php was -rw-r----- on my Mac. So it was the same in the container...

Can't expose a fuse based volume to a Docker container

I'm trying to provide my docker container a volume of encrypted file system for internal use.
The idea is that the container will write to the volume as usual, but in fact the host will be encrypting the data before writing it to the filesystem.
I'm trying to use EncFS - it works well on the host, e.g:
encfs /encrypted /visible
I can write files to /visible, and those get encrypted.
However, when trying to run a container with /visible as the volume, e.g.:
docker run -i -t --privileged -v /visible:/myvolume imagename bash
I do get a volume in the container, but it's on the original /encrypted folder, not going through the EncFS. If I unmount the EncFS from /visible, I can see the files written by the container. Needless to say /encrypted is empty.
Is there a way to have docker mount the volume through EncFS, and not write directly to the folder?
In contrast, docker works fine when I use an NFS mount as a volume. It writes to the network device, and not to the local folder on which I mounted the device.
Thanks

I am unable to duplicate your problem locally. If I try to expose an encfs filesystem as a Docker volume, I get an error trying to start the container:
FATA[0003] Error response from daemon: Cannot start container <cid>:
setup mount namespace stat /visible: permission denied
So it's possible you have something different going on. In any case, this is what solved my problem:
By default, FUSE only permits the user who mounted a filesystem to have access to that filesystem. When you are running a Docker container, that container is initially running as root.
You can use the allow_root or allow_other mount options when you mount the FUSE filesystem. For example:
$ encfs -o allow_root /encrypted /other
Here, allow_root will permit the root user to have acces to the mountpoint, while allow_other will permit anyone to have access to the mountpoint (provided that the Unix permissions on the directory allow them access).
If I mounted by encfs filesytem using allow_root, I can then expose that filesystem as a Docker volume and the contents of that filesystem are correctly visible from inside the container.

This is definitely because you started the docker daemon before the host mounted the mountpoint. In this case the inode for the directory name is still pointing at the hosts local disk:
ls -i /mounts/
1048579 s3-data-mnt
then if you mount using a fuse daemon like s3fs:
/usr/local/bin/s3fs -o rw -o allow_other -o iam_role=ecsInstanceRole /mounts/s3-data-mnt
ls -i
1 s3-data-mnt
My guess is that docker does some bootstrap caching of the directory names to inodes (someone who has more knowledge of this than can fill in this blank).
Your comment is correct. If you simply restart docker after the mounting has finished your volume will be correctly shared from host to your containers. (Or you can simply delay starting docker until after all your mounts have finished mounting)
What is interesting (but makes complete since to me now) is that upon exiting the container and un-mounting the mountpoint on the host all of my writes from within the container to the shared volume magically appeared (they were being stored at the inode on the host machines local disk):
[root#host s3-data-mnt]# echo foo > bar
[root#host s3-data-mnt]# ls /mounts/s3-data-mnt
total 6
1 drwxrwxrwx 1 root root 0 Jan 1 1970 .
4 dr-xr-xr-x 28 root root 4096 Sep 16 17:06 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 bar
[root#host s3-data-mnt]# docker run -ti -v /mounts/s3-data-mnt:/s3-data busybox /bin/bash
root#5592454f9f4d:/mounts/s3-data# ls -als
total 8
4 drwxr-xr-x 3 root root 4096 Sep 16 16:05 .
4 drwxr-xr-x 12 root root 4096 Sep 16 16:45 ..
root#5592454f9f4d:/s3-data# echo baz > beef
root#5592454f9f4d:/s3-data# ls -als
total 9
4 drwxr-xr-x 3 root root 4096 Sep 16 16:05 .
4 drwxr-xr-x 12 root root 4096 Sep 16 16:45 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 beef
root#5592454f9f4d:/s3-data# exit
exit
[root#host s3-data-mnt]# ls /mounts/s3-data-mnt
total 6
1 drwxrwxrwx 1 root root 0 Jan 1 1970 .
4 dr-xr-xr-x 28 root root 4096 Sep 16 17:06 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 bar
[root#host /]# umount -l s3-data-mnt
[root#host /]# ls -als
[root#ip-10-0-3-233 /]# ls -als /s3-stn-jira-data-mnt/
total 8
4 drwxr-xr-x 2 root root 4096 Sep 16 17:28 .
4 dr-xr-xr-x 28 root root 4096 Sep 16 17:06 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 bar

You might be able to work around this by wrapping the mount call in nsenter to mount it in the same Linux mount namespace as the docker daemon, eg.
nsenter -t "$PID_OF_DOCKER_DAEMON" encfs ...
The question is whether this approach will survive a daemon restart itself. ;-)

why it is not possible to modify file in a directory, where i have read/write group rights

I am currently messing around on my linux system and now I have the following situation.
The directory /srv/http has the following permissions set:
drwxrwxr-x 2 root httpdev 80 Jun 13 11:48 ./
drwxr-xr-x 6 root root 152 Mar 26 13:56 ../
-rwxrwxr-x 1 root httpdev 8 Jun 13 11:48 index.html*
I have created the group httpdev before with the command:
groupadd httpdev
and added my user sighter with:
gpasswd -a sighter httpdev
Then I have set the permissions as above using the chown and chmod commands.
But now I am not allowed to modify the index.html file or create a new file, as user sighter ,with touch like that:
<sighter [bassment] ~http> touch hallo.php
touch: cannot touch `hallo.php': Permission denied
What do I understand wrong. I was expecting that I can do what I want there then the group has all the rights.
The following Output is for your information.
<sighter [bassment] ~http> cat /etc/group | grep sighter
...
httpdev:x:1000:sighter
...
The used linux-distro is archlinux.

Adding a user to a group does not affect currently running sessions. So you have to logout and login again or use su - sighter to login.
After this you should be able to do what you want to do.

You're not in the right group. You need to log out and back in again. Also, superuser.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Run each Docker container in a specific user namespace configuration - linux

Related

Create a user on the docker host from inside a container

How to work with files which belong to the subuser namespace under Linux?

Mount a file in read/write mode for all in Docker

Can't expose a fuse based volume to a Docker container

why it is not possible to modify file in a directory, where i have read/write group rights

Categories

Resources