Linux Named Pipe Mounted on Docker Volume Showing as Regular File

I am trying to use a named pipe to send certain commands from a dockerised guest application to be run on the host.
I am aware of the risks and this is not public facing, so please no comments about not doing this.
I have a named pipe configured on the host using:
sudo mkfifo -m a+rw /path/to/pipe/file
When I check the created pipe with ls -la file, it shows that the pipe has been created and that the intended permissions are set:
prw-rw-rw- 1 root root 0 Feb 2 11:43 file
When I test it from the host by echoing a command into the pipe, the command runs successfully.
Input
echo "echo test" > file
Output
[!] Starting listening on named pipe: file
test
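The listener that prints "[!] Starting listening on named pipe" is not shown in the question. Here is a minimal POSIX sh sketch of what such a host-side loop could look like. The function names are hypothetical, and it assumes one command per line.

```shell
#!/bin/sh
# Hypothetical sketch of a host-side FIFO listener; the asker's script is not shown.

# Read newline-separated commands from the FIFO and run each one with sh.
# Opening the FIFO blocks until a writer connects; the loop ends at EOF,
# i.e. when the writer closes its end of the pipe.
run_pipe_once() {
  while IFS= read -r cmd; do
    # </dev/null stops the command from swallowing later lines of the pipe.
    sh -c "$cmd" </dev/null
  done < "$1"
}

# Serve forever: re-open the FIFO each time a writer disconnects.
listen_on_pipe() {
  echo "[!] Starting listening on named pipe: $1"
  while :; do
    run_pipe_once "$1"
  done
}
```

Started on the host with listen_on_pipe /path/to/pipe/file, it should reproduce the Input/Output shown above.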
The problem appears to be within my Docker container. I have created a volume and mounted the named pipe from the host. However, when I start an sh session in the container and run ls -la, the named pipe appears as a normal file, without the p type flag and the permissions it has on the host:
/hostpipe # ls -la
total 12
drwxr-xr-x 2 root root 4096 Feb 1 16:25 .
drwxr-xr-x 1 root root 4096 Feb 2 11:44 ..
-rw-r--r-- 1 root root 11 Feb 2 11:44 file
Running the same echo "echo test" > file (or similar commands) from within the guest does not work.
The host is a Linux desktop on baremetal.
Linux desktop 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
And the guest is an Alpine image
FROM python:3.8-alpine
and
Linux b16a4357fcf5 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 Linux
Any idea what is going wrong here?

The issue was how the container was set up. I was using a regular volume, of the kind used for persisting data, rather than one for mounting host drives and files. I had to change my definition to use type: bind.
Using volumes without the bind parameter does not allow use of the host file system functionality and only allows data sharing.
Before
volumes:
  - static_data:/vol/static
  - ./web:/web
  - /opt/named_pipes/:/hostpipe
After
volumes:
  - static_data:/vol/static
  - ./web:/web
  - type: bind
    source: /opt/named_pipes/
    target: /hostpipe
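To confirm whether a mount exposes the pipe itself rather than a plain file, check the file type on both sides. This is a small sketch that assumes the paths from this question (/opt/named_pipes/file on the host, /hostpipe/file in the container). It uses only POSIX test operators, so it should also work in the Alpine image's BusyBox shell.

```shell
# Print the type of PATH: "fifo", "regular file", "other" or "missing".
describe_path() {
  if [ -p "$1" ]; then
    echo "fifo"
  elif [ -f "$1" ]; then
    echo "regular file"
  elif [ -e "$1" ]; then
    echo "other"
  else
    echo "missing"
  fi
}
```

Run describe_path /opt/named_pipes/file on the host and describe_path /hostpipe/file in the container. Once the mount works, both should print fifo.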

Related

Can you bind the default network interface of the host into the container to read network stats?

I have a project where I read system information from the host inside a container. So far I have CPU, RAM and storage working, but network turns out to be a little harder. I am using the Node.js library https://systeminformation.io/network.html, which reads the network stats from /sys/class/net/.
The only solution I have found so far is to use --network host. That does not seem like the best way, though: it breaks a lot of other networking-related things, and I cannot assume that everybody who uses my project is fine with that.
I have tried --add-host=host.docker.internal:host-gateway as well, but while it does show up in /etc/hosts, it does not add a network interface to /sys/class/net/.
My knowledge on Docker and Linux is very limited, so does someone know if there is any other way?
My current workaround is to use readlink -f /sys/class/net/$(ip addr show | awk '/inet.*brd/{print $NF; exit}') to get the real path to the network statistics of the default interface, and then mount it at a made-up path in the container. Because of that, I am not using the systeminformation library for network stats right now. I would still like something a bit more reliable, ideally officially supported by Docker. I am fine with something that is not compatible with systeminformation, though.
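A possibly more robust way to find the default interface is to read it from /proc/net/route instead of matching the first inet line that has brd. The interface you want is the one carrying the IPv4 default route, where both destination and mask are 00000000. This is a sketch, not what systeminformation does. Run it on the host: inside a container without host networking, the table shows the container's own routes.

```shell
# Print the interface of the IPv4 default route from a file in
# /proc/net/route format (columns: Iface Destination Gateway Flags
# RefCnt Use Metric Mask ...). Values are hex; 00000000 means 0.0.0.0.
default_iface() {
  awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { print $1; exit }' \
    "${1:-/proc/net/route}"
}
```

It can be dropped into the existing workaround as readlink -f "/sys/class/net/$(default_iface)".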
There is a way to enter the host network namespace after starting the container. This can be used to run one process in the container in the container network namespace and another process in the host network namespace. Communication between the processes can be done using a unix domain socket.
Alternatively, you can mount a new instance of sysfs that reflects the host network namespace. If I understood correctly, this is what you really need.
For this to work you need access to the host network namespace (I mount /proc/1/ns/net into the container for this purpose). Additionally, the capabilities CAP_SYS_PTRACE and CAP_SYS_ADMIN are needed.
# /proc/1 is the 'init' process of the host which is always running in host network namespace
$ docker run -it --rm --cap-add CAP_SYS_PTRACE --cap-add CAP_SYS_ADMIN -v /proc/1/ns/net:/host_ns_net:ro debian:bullseye-slim bash
root@8b40f2f48808:/# ls -l /sys/class/net
lrwxrwxrwx 1 root root 0 Jun 2 21:09 eth0 -> ../../devices/virtual/net/eth0
lrwxrwxrwx 1 root root 0 Jun 2 21:09 lo -> ../../devices/virtual/net/lo
# enter the host network namespace
root@8b40f2f48808:/# nsenter --net=/host_ns_net bash
# now we are in the host network namespace and can see the host network interfaces
root@8b40f2f48808:/# mkdir /sys2
root@8b40f2f48808:/# mount -t sysfs nodevice /sys2
root@8b40f2f48808:/# ls -l /sys2/class/net/
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp2s0 -> ../../devices/pci0000:00/0000:00:1c.1/0000:02:00.0/net/enp2s0
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp3s0 -> ../../devices/pci0000:00/0000:00:1c.2/0000:03:00.0/net/enp3s0
[...]
root@8b40f2f48808:/# ls -l /sys2/class/net/enp2s0/
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_assign_type
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_len
-r--r--r-- 1 root root 4096 Oct 25 2021 address
-r--r--r-- 1 root root 4096 Oct 25 2021 broadcast
[...]
# Now you can switch back to the original network namespace
# of the container; the dir "/sys2" is still accessible
root@8b40f2f48808:/# exit
Putting this together for non-interactive usage:
Use docker run with the following parameters:
docker run -it --rm --cap-add CAP_SYS_PTRACE --cap-add CAP_SYS_ADMIN -v /proc/1/ns/net:/host_ns_net:ro debian:bullseye-slim bash
Execute these commands in the container before starting your node app:
mkdir /sys2
nsenter --net=/host_ns_net mount -t sysfs nodevice /sys2
After nsenter (and mount) exits, you are back in the network namespace of the container. In theory you could drop the extended capabilities now.
Now you can access the network devices under /sys2/class/net.
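Once the host sysfs is visible under /sys2, each interface's counters are in its statistics subdirectory. Here is a minimal sketch for reading two of them. The function name is hypothetical, and the root defaults to the /sys2 mount from the steps above.

```shell
# Print rx/tx byte counters for IFACE from a sysfs tree rooted at
# SYSFS_ROOT (default /sys2, the host sysfs mounted via nsenter above).
iface_bytes() {
  iface=$1
  stats="${2:-/sys2}/class/net/$iface/statistics"
  printf 'rx_bytes=%s tx_bytes=%s\n' \
    "$(cat "$stats/rx_bytes")" "$(cat "$stats/tx_bytes")"
}
```

For example, iface_bytes enp2s0 reads the counters under /sys2. Passing a second argument such as /sys_host points it at a different location where the host's /sys is mounted.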
You could mount the host's /sys/class/net/ directory as a volume in your container and patch the systeminformation package to read from your custom path instead of the default one. The changes would need to be made in lib/network.js. You can see in that file that the directory is hardcoded throughout, so just do a find-and-replace in your local copy to change all instances of the default path.
An easy way is to mount the whole "/sys" filesystem of the host into the container. Either mount it at a new location (e.g. /sys_host) or over-mount the original "/sys" in the container:
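The find-and-replace could be scripted roughly as follows. This sketch assumes the path appears literally as /sys/class/net in the file. The install location node_modules/systeminformation/lib/network.js and the mount point /sys_host are only illustrative. A patch like this is lost whenever the package is reinstalled.

```shell
# Rewrite every literal "/sys/class/net" in FILE so that it points at
# NEW_ROOT/class/net instead (e.g. NEW_ROOT=/sys_host).
# '|' is the sed delimiter so slashes need no escaping; NEW_ROOT must
# therefore not contain '|' or '&'.
patch_net_path() {
  file=$1
  new_root=$2
  sed "s|/sys/class/net|$new_root/class/net|g" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For example: patch_net_path node_modules/systeminformation/lib/network.js /sys_host.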
# docker run -it --rm -v /sys:/sys:ro debian:bullseye-slim bash
root@b84df3184dce:/# ls -l /sys/class/net/
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp2s0 -> ../../devices/pci0000:00/0000:00:1c.1/0000:02:00.0/net/enp2s0
lrwxrwxrwx 1 root root 0 Oct 25 2021 enp3s0 -> ../../devices/pci0000:00/0000:00:1c.2/0000:03:00.0/net/enp3s0
[...]
root@b84df3184dce:/# ls -l /sys/class/net/enp2s0/
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_assign_type
-r--r--r-- 1 root root 4096 Oct 25 2021 addr_len
-r--r--r-- 1 root root 4096 Oct 25 2021 address
-r--r--r-- 1 root root 4096 Oct 25 2021 broadcast
[...]
Please be aware that this way the container has access to the whole "/sys" filesystem of the host. On the plus side, the relative links from the network interfaces to the PCI devices still work.
If you don't need write access, mount it read-only by appending ":ro" to the mount specification, as in the example above.

Two identical NFS shares, but only one of the two gives Stale file handle errors

I have a Linux (raspbian) server:
$ uname -a
Linux hester 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux
With two directories that have the same user/group/permissions:
$ ls -ld /mnt/storage/gitea/ /mnt/storage/hester/
drwxr-xr-x 2 nobody nogroup 26 Mar 2 10:20 /mnt/storage/gitea/
drwxr-xr-x 3 nobody nogroup 21 Feb 21 11:26 /mnt/storage/hester/
These two directories are exported with the same parameters in the exports file:
$ cat /etc/exports
/mnt/storage/hester 192.168.1.15(rw,sync,no_subtree_check)
/mnt/storage/gitea 192.168.1.15(rw,sync,no_subtree_check)
On another machine (the 192.168.1.15 mentioned in the exports file) I mount both successfully:
$ mount /mnt/storage/gitea/
$ echo $?
0
$ mount /mnt/storage/hester/
$ echo $?
0
But now weird things happen:
$ ls -l /mnt/storage/
ls: cannot access '/mnt/storage/gitea': Stale file handle
total 0
d????????? ? ? ? ? ? gitea
drwxr-xr-x 3 nobody nogroup 21 Feb 21 11:26 hester
I really can't figure out what the source of the error is and, above all, where I could look for a difference between the two.
I'm open to suggestions for further investigation or answers to my doubts. Thanks in advance for any useful input!
I finally found the solution, which was to explicitly add an fsid option in exports:
$ cat /etc/exports
/mnt/storage/hester 192.168.1.15(rw,sync,fsid=20,no_subtree_check)
/mnt/storage/gitea 192.168.1.15(rw,sync,fsid=21,no_subtree_check)
I'm not entirely sure why this works. From the man page I get that "NFS needs to be able to identify each filesystem that it exports. Normally it will use a UUID for the filesystem (if the filesystem has such a thing) or the device number of the device holding the filesystem (if the filesystem is stored on the device)."
Both of these mountpoints are on the same filesystem, so according to the man page they would get the same fsid. That seems to cause the same directory to be exported for both, so I think it means each export needs its own fsid.
One more note: /mnt/storage is an XFS filesystem on a RAID3 array, which could also have confused NFS about device UUIDs.
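The fix relies on each export having its own fsid, so it can help to check for accidentally repeated values before reloading the export table with exportfs -ra. Below is a sketch using awk. It counts each fsid at most once per line, so a single export listed for several clients with the same fsid is not flagged.

```shell
# Print every fsid=... value that appears on more than one
# (non-comment) line of an exports file.
duplicate_fsids() {
  awk '
    /^[ \t]*#/ { next }
    {
      split("", seen)            # reset the per-line set portably
      line = $0
      while (match(line, /fsid=[^,)]*/)) {
        id = substr(line, RSTART, RLENGTH)
        if (!(id in seen)) { seen[id] = 1; count[id]++ }
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END { for (id in count) if (count[id] > 1) print id }
  ' "${1:-/etc/exports}"
}
```

With the exports file above, duplicate_fsids /etc/exports prints nothing, since fsid=20 and fsid=21 are distinct.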

Mount a file in read/write mode for all in Docker

On my MacOS laptop I mounted a file in my newly created container using:
docker run --name mediawiki --link mysql:mysql -p 80:80 -v /Users/poiuytrez/Downloads/LocalSettings.php:/var/www/html/LocalSettings.php \
  --rm poiuytrez/mediawiki:1.25.3
However, Apache seems to have trouble reading the file. Running a bash shell in the container shows that read permission is not granted to everybody:
root@078252e20671:/var/www/html# ls -l LocalSettings.php
-rw-r----- 1 1000 staff 4857 Nov 18 15:44 LocalSettings.php
I tried the same process with Docker installed on a Debian 8 Linux machine and got:
root@16e34a9b169d:/var/www/html# ls -l LocalSettings.php
-rw-r--r-- 1 www-data www-data 4858 Nov 19 13:32 LocalSettings.php
which is much better for me.
How can I add read permission for everybody without doing a chmod a+r on boot2docker/docker-machine?
I am using Docker 1.8.3
In docker-machine and boot2docker, your /Users directory is mapped inside the virtual machine at the same path, so when you map the volume like this:
-v /Users/poiuytrez/Downloads/LocalSettings.php:/var/www/html/LocalSettings.php
it is actually the boot2docker VM's directory that you are mounting inside the container, so there are two levels of mapping.
You can see that the owner of LocalSettings.php does not exist inside the container, so ls -l shows the raw ID instead, in your case user ID 1000 and group staff:
-rw-r----- 1 1000 staff 4857 Nov 18 15:44 LocalSettings.php
Check the owner and permissions inside the boot2docker VM by connecting with boot2docker ssh or docker-machine ssh <your-machine-name> and running ls -l there.
Another approach is to add a user with ID 1000 inside your container and run your web server as that user.
You can also add a fix-permission.sh script to your container run command.
The Docker roadmap includes some user namespace improvements for the next releases. I saw this article some days ago:
http://integratedcode.us/2015/10/13/user-namespaces-have-arrived-in-docker/
I hope they solve these ownership issues.
LocalSettings.php was -rw-r----- on my Mac. So it was the same in the container...

Programmatically create a btrfs file system whose root directory has a specific owner

Background
I have a test script that creates and destroys file systems on the fly, used in a suite of performance tests.
To avoid running the script as root, I have a disk device /dev/testdisk that is owned by a specific user testuser, along with a suitable entry in /etc/fstab:
$ ls -l /dev/testdisk
crw-rw---- 1 testuser testuser 21, 1 Jun 25 12:34 /dev/testdisk
$ grep testdisk /etc/fstab
/dev/testdisk /mnt/testdisk auto noauto,user,rw 0 0
This allows the disk to be mounted and unmounted by a normal user.
Question
I'd like my script (which runs as testuser) to programmatically create a btrfs file system on /dev/testdisk such that the root directory is owned by testuser:
$ mount /dev/testdisk /mnt/testdisk
$ ls -la /mnt/testdisk
total 24
drwxr-xr-x 3 testuser testuser 4096 Jun 25 15:15 .
drwxr-xr-x 3 root root 4096 Jun 23 17:41 ..
drwx------ 2 root root 16384 Jun 25 15:15 lost+found
Can this be done without running the script as root, and without resorting to privilege escalation (use of sudo) within the script?
Comparison to other file systems
With ext{2,3,4} it's possible to create a filesystem whose root directory is owned by the current user, with the following command:
mkfs.ext{2,3,4} -F -E root_owner /dev/testdisk
Workarounds I'd like to avoid (if possible)
I'm aware that I can use the btrfs-convert tool to convert an existing (possibly empty) ext{2,3,4} file system to btrfs format. I could use this workaround in my script (by first creating an ext4 filesystem and then immediately converting it to btrfs), but I'd rather avoid it if there's a way to create the btrfs file system directly.

Can't expose a fuse based volume to a Docker container

I'm trying to give my Docker container a volume on an encrypted file system for internal use.
The idea is that the container will write to the volume as usual, but in fact the host will be encrypting the data before writing it to the filesystem.
I'm trying to use EncFS - it works well on the host, e.g:
encfs /encrypted /visible
I can write files to /visible, and those get encrypted.
However, when trying to run a container with /visible as the volume, e.g.:
docker run -i -t --privileged -v /visible:/myvolume imagename bash
I do get a volume in the container, but it maps to the original, unencrypted /visible directory on disk instead of going through EncFS. If I unmount the EncFS from /visible, I can see the files written by the container. Needless to say, /encrypted is empty.
Is there a way to have docker mount the volume through EncFS, and not write directly to the folder?
In contrast, docker works fine when I use an NFS mount as a volume. It writes to the network device, and not to the local folder on which I mounted the device.
Thanks
I am unable to duplicate your problem locally. If I try to expose an encfs filesystem as a Docker volume, I get an error trying to start the container:
FATA[0003] Error response from daemon: Cannot start container <cid>:
setup mount namespace stat /visible: permission denied
So it's possible you have something different going on. In any case, this is what solved my problem:
By default, FUSE only permits the user who mounted a filesystem to have access to that filesystem. When you are running a Docker container, that container is initially running as root.
You can use the allow_root or allow_other mount options when you mount the FUSE filesystem. For example:
$ encfs -o allow_root /encrypted /other
Here, allow_root will permit the root user to have access to the mountpoint, while allow_other will permit anyone to have access to the mountpoint (provided that the Unix permissions on the directory allow them access).
If I mount my encfs filesystem using allow_root, I can then expose that filesystem as a Docker volume, and the contents of that filesystem are correctly visible from inside the container.
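One related detail: when a non-root user mounts the FUSE filesystem, FUSE only accepts allow_root or allow_other if user_allow_other is enabled in /etc/fuse.conf. A small sketch of a check for that (the function name is hypothetical):

```shell
# Succeed if the FUSE config (default /etc/fuse.conf) enables
# user_allow_other, which non-root users need for -o allow_root/allow_other.
fuse_allows_other() {
  grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "${1:-/etc/fuse.conf}" 2>/dev/null
}
```

For example: fuse_allows_other && encfs -o allow_root /encrypted /visible.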
This is definitely because you started the Docker daemon before the host mounted the mountpoint. In this case the inode for the directory name still points at the host's local disk:
ls -i /mounts/
1048579 s3-data-mnt
Then, if you mount it using a FUSE daemon like s3fs:
/usr/local/bin/s3fs -o rw -o allow_other -o iam_role=ecsInstanceRole /mounts/s3-data-mnt
ls -i
1 s3-data-mnt
My guess is that Docker does some bootstrap caching of directory names to inodes (someone who knows more about this than I do can fill in this blank).
Your comment is correct. If you simply restart Docker after the mounting has finished, your volume will be correctly shared from the host to your containers. (Or you can simply delay starting Docker until all your mounts have finished mounting.)
What is interesting (but makes complete sense to me now) is that after exiting the container and unmounting the mountpoint on the host, all of my writes from within the container to the shared volume magically appeared (they had been stored at the original inode on the host machine's local disk):
[root@host s3-data-mnt]# echo foo > bar
[root@host s3-data-mnt]# ls /mounts/s3-data-mnt
total 6
1 drwxrwxrwx 1 root root 0 Jan 1 1970 .
4 dr-xr-xr-x 28 root root 4096 Sep 16 17:06 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 bar
[root@host s3-data-mnt]# docker run -ti -v /mounts/s3-data-mnt:/s3-data busybox /bin/bash
root@5592454f9f4d:/mounts/s3-data# ls -als
total 8
4 drwxr-xr-x 3 root root 4096 Sep 16 16:05 .
4 drwxr-xr-x 12 root root 4096 Sep 16 16:45 ..
root@5592454f9f4d:/s3-data# echo baz > beef
root@5592454f9f4d:/s3-data# ls -als
total 9
4 drwxr-xr-x 3 root root 4096 Sep 16 16:05 .
4 drwxr-xr-x 12 root root 4096 Sep 16 16:45 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 beef
root@5592454f9f4d:/s3-data# exit
exit
[root@host s3-data-mnt]# ls /mounts/s3-data-mnt
total 6
1 drwxrwxrwx 1 root root 0 Jan 1 1970 .
4 dr-xr-xr-x 28 root root 4096 Sep 16 17:06 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 bar
[root@host /]# umount -l s3-data-mnt
[root@host /]# ls -als
[root@ip-10-0-3-233 /]# ls -als /s3-stn-jira-data-mnt/
total 8
4 drwxr-xr-x 2 root root 4096 Sep 16 17:28 .
4 dr-xr-xr-x 28 root root 4096 Sep 16 17:06 ..
1 -rw-r--r-- 1 root root 4 Sep 16 17:11 bar
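Delaying Docker until the mounts are in place, as suggested above, can be scripted. One way is to poll until the directory sits on a different device than its parent, which is how a FUSE mount such as s3fs shows up. This is a sketch with hypothetical function names. It will not detect a bind mount from the same filesystem; where util-linux is available, mountpoint -q is an alternative.

```shell
# Succeed if DIR is on a different device than its parent directory,
# e.g. after a FUSE filesystem (s3fs, encfs) has been mounted on it.
is_separate_mount() {
  [ "$(stat -c %d "$1")" != "$(stat -c %d "$1/..")" ]
}

# Poll once per second until DIR is a separate mount; give up after
# TIMEOUT seconds (default 60) and return non-zero.
wait_for_mount() {
  dir=$1
  left=${2:-60}
  while [ "$left" -gt 0 ]; do
    is_separate_mount "$dir" && return 0
    sleep 1
    left=$((left - 1))
  done
  return 1
}
```

For example: wait_for_mount /mounts/s3-data-mnt 120 && systemctl restart docker.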
You might be able to work around this by wrapping the mount call in nsenter to mount it in the same Linux mount namespace as the Docker daemon (-m selects the target's mount namespace), e.g.:
nsenter -t "$PID_OF_DOCKER_DAEMON" -m encfs ...
The question is whether this approach will survive a daemon restart itself. ;-)
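Whether this is the problem at all depends on whether dockerd really runs in a different mount namespace from your shell. Comparing the /proc/<pid>/ns/mnt links shows that. Here is a sketch; reading another process's link generally needs root, and pidof dockerd is just one way to get the daemon's PID.

```shell
# Print the mount-namespace identifier of PID, e.g. "mnt:[4026531840]".
mount_ns_of() {
  readlink "/proc/$1/ns/mnt"
}

# Succeed if PID_A and PID_B share a mount namespace; return 2 if either
# namespace link cannot be read.
same_mount_ns() {
  a=$(mount_ns_of "$1") || return 2
  b=$(mount_ns_of "$2") || return 2
  [ "$a" = "$b" ]
}
```

If same_mount_ns "$(pidof dockerd)" $$ fails as root, mounts made from your shell may not be visible to the daemon.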

Resources