Kubernetes: in-memory volume shared between pods - linux

I need a shared volume accessible from multiple pods for caching files in RAM on each node.
The problem is that the emptyDir volume provisioner (which supports Memory as its medium) is available in Volume spec but not in PersistentVolume spec.
Is there any way to achieve this, except by creating a tmpfs volume manually on each host and mounting it via local or hostPath provisioner in the PV spec?
Note that Docker itself supports such volumes:
docker volume create --driver local --opt type=tmpfs --opt device=tmpfs \
--opt o=size=100m,uid=1000 foo
I don't see any reason why Kubernetes doesn't support this. Or maybe it does, but it's not obvious?
I tried playing with local and hostPath PVs with mountOptions, but it didn't work.
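For context, within a single pod an emptyDir with medium: Memory is already shareable between containers; the limitation is only across pods. A minimal sketch (pod and container names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ram-cache-demo        # illustrative name
spec:
  volumes:
    - name: cache
      emptyDir:
        medium: Memory        # tmpfs-backed; usage counts against the pod's memory limit
        sizeLimit: 100Mi
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "date > /cache/ts; sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
    - name: reader
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
```

Both containers see the same tmpfs, but a second pod on the same node gets its own separate emptyDir.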

An emptyDir volume is tied to the lifetime of a pod, so it can't be shared between multiple pods.
What you're requesting is an additional feature, and if you look at the GitHub discussion below, you'll see you're not the first to ask for it:
consider a tmpfs storage class
Also, regarding your point that Docker supports tmpfs volumes: yes, it does, but you can't share such a volume between containers. From the documentation:
Limitations of tmpfs mounts:
Unlike volumes and bind mounts, you can't share tmpfs mounts between containers.
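Until such a feature exists, the workaround the question already hints at — a tmpfs prepared on each node and exposed via hostPath — can be sketched like this (names, paths, and sizes are illustrative; the tmpfs must be mounted on the node beforehand, e.g. via /etc/fstab):

```yaml
# On each node, first: mount -t tmpfs -o size=100m tmpfs /mnt/ram-cache
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ram-cache-pv          # illustrative name
spec:
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteOnce
  storageClassName: ram-cache
  hostPath:
    path: /mnt/ram-cache      # the tmpfs mounted on the node
```

Pods on that node claiming this PV then share the same RAM-backed directory, at the cost of managing the tmpfs mounts yourself.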

Related

Docker: mount filesystem in shared volume

docker volume create minty
docker run -v minty:/Minty:rw mango
docker run -v minty:/Minty:rw banana
The mango container then creates several empty folders in /Minty and mounts filesystems on them. Unfortunately, the banana container can see the empty folders, but can't see any of the mounted filesystems.
I presume this has to do with Docker running each container in its own mount namespace. Does anybody know how to fix this?
I've seen several answers that claim to fix this, by making use of "data containers" and the --volumes-from option. However, it appears that data containers are a deprecated feature now. Regardless, I tried it, and it doesn't seem to make any difference; the second container sees only the empty mount points, not the newly-mounted filesystems.
Even bind-mounting a folder to the host doesn't allow the host to see the newly-mounted filesystems.
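The usual culprit here is mount propagation: Docker bind mounts default to rprivate, so mounts created inside one mount namespace don't propagate out. A hedged sketch using a bind mount with rshared propagation instead of a named volume (the host directory name is illustrative, and the host path must itself be a shared mount):

```shell
# Make the host directory a shared mount point so sub-mounts propagate.
sudo mkdir -p /mnt/minty
sudo mount --bind /mnt/minty /mnt/minty
sudo mount --make-shared /mnt/minty

# Both containers bind-mount it with rshared propagation; filesystems
# mounted inside mango under /Minty should then become visible to
# banana and to the host.
docker run -d --privileged \
  --mount type=bind,source=/mnt/minty,target=/Minty,bind-propagation=rshared mango
docker run -d \
  --mount type=bind,source=/mnt/minty,target=/Minty,bind-propagation=rshared banana
```

Note the mounting container needs privileges (or the specific capabilities) to perform mounts at all.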

Bind volumes that contain mount points mounted within a Docker container

I have a privileged Docker container that mounts a custom filesystem with FUSE.
This is achieved by bind-mounting /dev and /sys from the host into the container and running some custom software that accesses a block device, e.g. /dev/sdX, inside the container to mount a custom FS on some mount point, let's say /mnt/some_mountpoint_inside_the_container (everything still happens inside the container).
Now I would like to access, from the host, this mount point that is mounted inside the Docker container, but to no avail. So far, I have tried:
In my docker-compose.yaml, I defined a bind-mounted volume from host to container, e.g.:
...
volumes:
- /mnt/mountpoint_at_host:/mnt/some_mountpoint_inside_the_container
...
Then I FUSE-mounted the custom FS inside the container on /mnt/some_mountpoint_inside_the_container. It seems that even though I have added files in /mnt/mountpoint_at_host on my host, the changes are not reflected within the container (i.e. ls -al /mnt/some_mountpoint_inside_the_container inside the container returns nothing). Only AFTER I have unmounted /mnt/some_mountpoint_inside_the_container within the container can the files created on the host be found in the container.
I have also tried to bind mount a parent folder:
...
volumes:
- /mnt/mountpoint_at_host:/mnt/parent_folder
...
Then I created a folder on my host: mkdir -p /mnt/mountpoint_at_host/the_real_mntpt.
I have then again, FUSE mounted the custom FS in the docker container on:
/mnt/parent_folder/the_real_mntpt.
But still, changes on the host are not reflected on the container side, or on the underlying block device.
Is there any way I can access, from the host, a mount point that is mounted within the container? I have thought of approaches like creating an NFS service within the container after I have FUSE-mounted the FS and then exposing the NFS port to the host, but that seems a bit inefficient.
EDIT: I am using Ubuntu with docker.io/docker-compose from apt-get. The container itself is a CentOS 8.
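For the docker-compose case, the same mount-propagation idea can be expressed with the long volume syntax (a sketch; the service name is illustrative, propagation support depends on the compose file version, and the host path must be a shared mount):

```yaml
services:
  fuse-mounter:               # illustrative service name
    privileged: true
    volumes:
      - type: bind
        source: /mnt/mountpoint_at_host
        target: /mnt/parent_folder
        bind:
          propagation: rshared   # let mounts made inside propagate to the host
```

With rshared propagation, the FUSE mount created at /mnt/parent_folder/the_real_mntpt inside the container should appear under /mnt/mountpoint_at_host on the host.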

Is it safe to mount /dev into a Docker container

I'm affected by an issue described in moby/moby/27886, meaning loop devices I create in a Docker container do not appear in the container, but they do on the host.
One of the potential workarounds is to mount /dev into the Docker container, so something like:
docker run -v /dev:/dev image:tag
I know that it works, at least on Docker Engine and the Linux kernel that I currently use (20.10.5 and 5.4.0-70-generic respectively), but I'm not sure how portable and safe it is.
In runc's libcontainer/SPEC for filesystems I found some detailed information on /dev mounts for containers, so I'm a bit worried that mounting /dev from the host might cause unwanted side-effects. I even found one problematic use-case that was fixed in 2018...
So, is it safe to mount /dev into a Docker container? Is it guaranteed to work?
I'm not asking if it works, because it seems it works, I'm interested if this is a valid usage, natively supported by Docker Engine.
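As a narrower alternative to bind-mounting all of /dev, specific device nodes can be passed through with --device, which limits the blast radius (the device names here are illustrative for the loop-device case):

```shell
# Expose only the loop control device and a couple of loop devices,
# rather than the host's entire /dev tree.
docker run --rm \
  --device /dev/loop-control \
  --device /dev/loop0 \
  --device /dev/loop1 \
  image:tag
```

This doesn't solve the dynamic-creation problem from moby/moby/27886 (newly created loop devices still won't appear), but it avoids giving the container every host device.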

Dropping privileges inside of the container

One of my images requires mounting of devices. Thus, it needs cap_sys_admin when starting. However, I'd like to drop this capability once it is no longer needed.
Is there some way of dropping the capability at a later stage?
You should consider using a volume to do the mount instead of requiring the container to do it from inside.
For example, instead of doing:
docker run --cap-add SYS_ADMIN ...
and then calling mount inside:
mount -t nfs server:/some/path /local/path
Instead, you can create a volume using the 'local' driver like so:
docker volume create -d local -o type=nfs -o device=:/some/path -o o=addr=server,rw my_volume
And then use it when you run the container:
docker run -v my_volume:/local/path ...
When the container starts, the host will handle doing the mount, and the filesystem will be available to the container. The container needs no added capabilities.
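The same local-driver NFS volume can be declared in docker-compose, if that's how the container is run (a sketch reusing the server and path placeholders from the docker volume create command above):

```yaml
services:
  app:
    image: image:tag          # illustrative image
    volumes:
      - my_volume:/local/path

volumes:
  my_volume:
    driver: local
    driver_opts:
      type: nfs
      o: addr=server,rw
      device: ":/some/path"
```

Compose creates the volume on first use, and the host performs the NFS mount when the container starts.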

Disk out of space on Azure Web app on Linux

I'm having trouble building and deploying new Docker containers on Azure Web App on Linux.
The error logs claim it's out of space, and when looking at disk usage through Kudu I can see that I'm indeed out of space.
> df -H gives:
Filesystem Size Used Avail Use% Mounted on
none 29G 28G 0 100% /
/dev/sda1 29G 28G 0 100% /etc/hosts
I have deployed several Docker containers in web apps before and removed them as well, but it seems they are still taking up space.
Creating a new App Service plan without anything deployed gives about 5.7G of free space.
I can't seem to run docker commands from the Kudu terminal, so I'm not able to check how many images there are and can't figure out how to clean up space. Also, sudo isn't available.
Does anyone have any ideas about how to free up some space?
Your disk was indeed full of Docker images. I have cleared them off; you should be unblocked.
This is a known issue that we will have a fix for soon. Iterating and deploying new containers is a common scenario, and the goal is that this should be completely abstracted away and you should not have to worry about this.
I believe my coworker and I ran into this issue when pulling images from a repository on Azure. The images would not be cleared after running docker-compose pull, yet did not appear to be present on the primary node.
We would see the following upon sshing onto that node:
> ssh username@server.eastus.cloudapp.azure.com -A -p 2200
> df -h
Filesystem Size Used Avail Use% Mounted on
# ...
/dev/sda1 29G 2.0G 26G 8% /
We would still encounter space issues. After some debugging, we found these results differed when attached to a container itself:
> docker-compose exec container_name /bin/bash
> df -h
Filesystem Size Used Avail Use% Mounted on
# ...
/dev/sda1 29G 29G 0G 100% /etc/hosts
The following snippet worked to clear all images not in use without issue:
docker rmi $(docker images --filter "dangling=true" -q --no-trunc)
Note that --no-trunc is required; without it, docker complains that the images don't actually exist.
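On Docker 1.13 and later, the same cleanup is available through the built-in prune commands, which avoid the sub-shell and the --no-trunc quirk:

```shell
# Remove dangling images only (equivalent to the docker rmi snippet above):
docker image prune -f

# Or reclaim more aggressively: stopped containers, unused networks,
# dangling images, and the build cache.
docker system prune -f
```

Whether either is runnable depends on having access to the docker CLI on the node, which the original question notes is not available from the Kudu terminal.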
