How to increase shared memory (/dev/shm) of an Azure Container Instance?

I am running Selenium Standalone Firefox as an Azure Container Instance. To fix the "failed to decode response from marionette" error that often occurs when running Protractor tests, I need to increase the shared memory of the container.
It is not possible to pass the shared-memory size as a parameter to the az container create command that I am using in my pipeline.
I tried to pass it as a command-line script to be executed after the container is deployed:
--command-line "/bin/sh -c 'sudo mount -o remount,size=2G /dev/shm'"
but it does not work because the container is read-only, and unfortunately, according to https://feedback.azure.com/forums/602224-azure-container-instances/suggestions/33870166-aci-support-for-privileged-container it is not possible to run a container instance in privileged mode to allow write access.
Do you have any ideas?
Thanks,
Magda

This is not supported and would be very hard to support, since it opens up a lot of risk for the VM running the different container groups.
The underlying memory/CPU is shared with other users; allowing an oversized /dev/shm could hide the real memory usage of the container and hence affect other containers running on the same node.
This request has been made in the past, see below.
https://feedback.azure.com/forums/602224-azure-container-instances/suggestions/37442194-allow-specifying-the-size-of-the-dev-shm-filesyst
I would suggest looking at the Kubernetes alternative: it supports emptyDir volumes with medium: Memory, which creates the right kind of temporary directory for your need.
You can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot and any files you write will count against your container's memory limit.
https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
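For illustration, a minimal pod spec along these lines (image name and size are placeholders) mounts a RAM-backed emptyDir over /dev/shm:

apiVersion: v1
kind: Pod
metadata:
  name: selenium-firefox
spec:
  containers:
    - name: firefox
      image: selenium/standalone-firefox   # placeholder image
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm              # replaces the default 64 MB shm mount
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory                     # tmpfs; usage counts against the memory limit
        sizeLimit: 2Gi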

Related

Does Docker manage the filesystem like a standalone OS?

I have a program I'm running in a docker container. After 10-12 hours of run, the program terminated with filesystem-related errors (FileNotFoundError, or similar).
I'm wondering if the disk space got filled up or a similar filesystem-related issue or there was a problem in my code (e.g one process deleted the file pre-maturely).
I don't know much about Docker's management of files and wonder whether it creates and manages its own filesystem inside the container or not. Here are three possibilities I'm considering; I mainly wonder whether #1 could be the case:
1. If Docker manages its own filesystem, could it be that although disk space is available on the host machine, the container ran out of its own storage space? (I've seen similar issues regarding running out of memory for a process that has memory limits artificially imposed using cgroups.)
2. Could it be that the host filesystem ran out of space and the files got corrupted or maybe didn't get written correctly?
3. There is some bug in my code.
This is likely a bug in your code. Most programs print the error they encounter, and when a program encounters out-of-space, the error returned by the filesystem is: "No space left on device" (errno 28 ENOSPC).
If you see FileNotFoundError, that means the file is missing. My best theory is that it's coming from your consumer process.
It's still possible though, that the file doesn't exist because the producer ran out of space and you didn't handle the error correctly - you'll need to check your logs.
It might also be a race condition, depending on your application. There's really not enough details to answer that.
As to the title question:
By default, Docker just overlay-mounts an empty directory from the host's filesystem into the container, so the amount of free space in the container is the same as the amount on the host.
If you're using volumes, that depends on the storage driver you use. As @Dan Serbyn mentioned, the default limit for the devicemapper driver is 10 GB. The overlay2 driver - the default driver - doesn't have that limitation.
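As a quick way to check this (a rough sketch; the alpine tag is only an example), compare the free space the host reports with what a fresh container sees:

df -h /var/lib/docker                  # free space in the host's Docker storage area
docker run --rm alpine:3.8 df -h /     # with overlay2 this reports the same filesystem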
In Docker setups that use the devicemapper storage driver, there is a default limitation on container storage of 10 GB.
You can check the disk space that containers are using by running the following command:
docker system df
It's also possible that the file your container is trying to access has access-level restrictions. Try to make it available to Docker or maybe to everybody (chmod 777 file.txt).

Obtaining Linux host metrics from within a docker container

Looking for some recommendations on how to report Linux host metrics such as CPU and memory utilization and disk usage stats from within a Docker container. The host will contain a number of Docker containers.
One thought was to run top and other basic Linux commands from outside the container and push the output into a container folder that has the appropriate authorization so that it can be consumed.
Another thought was to use the Docker API to run docker stats for the containers, but I'm not sure this is the best approach, as it may not report on other processes running on the host that are not containerized.
A third option would be to somehow execute something like top and other commands on the host from within the container; this option would be the most ideal for my situation.
I was just looking for some proven design patterns that others have used. Also, I don't have the ability to install a bunch of tools on the host, as this is a customer host and I don't have control over what is already installed.
You may run your container in privileged mode, but be aware that this could compromise host security, as your container will no longer be in a sandboxed environment.
docker run -d --privileged --pid=host alpine:3.8 sh
When the operator executes docker run --privileged, Docker will enable access to all devices on the host as well as set some configuration in AppArmor or SELinux to allow the container nearly all the same access to the host as processes running outside containers on the host. Additional information about running with --privileged is available on the Docker Blog.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
Good reference: https://security.stackexchange.com/a/218379
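As a lighter-weight variation on the same idea, much of /proc is not namespaced, so a container started with --pid=host can read host-wide load and memory figures directly, and host disk usage can be read from a read-only bind mount of the host root (a rough sketch; the image tag is only an example):

docker run --rm --pid=host alpine:3.8 cat /proc/loadavg /proc/meminfo
docker run --rm -v /:/host:ro alpine:3.8 df -h /host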

Reliably targeting correct Azure Managed Disk on a Linux VM using an Ansible playbook

How do I reliably partition and mount the file system of an Azure Managed Disk on a Linux VM using Ansible playbooks?
I can create an Azure Managed Disk with azure_rm_manageddisk and assign it to a VM instance. My issue starts when I'm trying to take the disk into use. I just don't know how to reliably target the correct managed disk anymore for partitioning and file system mounting.
Neither azure_rm_manageddisk nor azure_rm_manageddisk_info seems to return a reliable, unambiguous ID for the disk that could be referenced from the OS side.
I don't think the disk even shows up on blkid before it has been partitioned.
Microsoft has documented that
By default when you create a VM, Azure provides you with an OS disk (/dev/sda) and a temporary disk (/dev/sdb). All additional disks you add show up as /dev/sdc, /dev/sdd, /dev/sde and so on.
(source: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/optimization)
but this doesn't seem reliable. I think I saw my VM with a setup different from this right after creation, and it is definitely going to change after a reboot. So no trusting /dev/sdc, in my opinion. A rerun of a playbook could cause all kinds of havoc if the block device files aren't stable. I have actually already managed to make my root partition visible at /media/my_data_disk_mount.
Is this just something I will have to handle manually? Seems odd. It's such a common thing to do.
There's also /dev/disk/azure/resource for example, but that seemed to lead to messy results also.
(source: https://learn.microsoft.com/en-us/azure/virtual-machines/troubleshooting/troubleshoot-device-names-problems)
Maybe something with the LUN numbers?
According to the messages, you want to find the correct disk and get its UUID to mount it. What you are thinking is right. You can use the LUN of the disk to determine which one you want: the command tree /dev/disk/azure shows which device (for example /dev/sdc) is attached at which LUN (for example lun1), and you can also see which disk uses lun1 in the Azure portal. Once you have initialized the disk, you can use the command sudo blkid to get the UUID.
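To tie this back to Ansible, here is a rough, unverified sketch that targets the disk through its LUN symlink, assuming the data disk is attached at LUN 1, the standard Azure udev rules expose /dev/disk/azure/scsi1/lun1 and lun1-part1, and the community.general and ansible.posix collections are installed (mount point and filesystem type are placeholders):

- name: Partition, format and mount the data disk by LUN
  hosts: myvm                          # hypothetical inventory group
  become: true
  vars:
    data_disk: /dev/disk/azure/scsi1/lun1
  tasks:
    - name: Create a single partition spanning the disk
      community.general.parted:
        device: "{{ data_disk }}"
        label: gpt
        number: 1
        state: present

    - name: Create an ext4 filesystem on the new partition
      community.general.filesystem:
        fstype: ext4
        dev: "{{ data_disk }}-part1"

    - name: Mount the partition via its stable LUN path
      ansible.posix.mount:
        path: /data
        src: "{{ data_disk }}-part1"
        fstype: ext4
        state: mounted

If you prefer mounting by UUID, you could run blkid against the -part1 device after the filesystem task and feed the result into the mount task instead.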

Is there a way to restrict untrusted container scheduler?

I have an application which I'd like to give the privilege to launch short-lived tasks and schedule these as docker containers. I was thinking of doing this simply via docker run.
As I want to make the attack surface as small as possible, I treat the application as untrusted. As such, it can potentially run arbitrary docker run commands (if the codebase contained a bug, the container was compromised, input was improperly escaped somewhere, etc.) against a predefined Docker API endpoint.
This is why I'd like to restrict that application (effectively a scheduler) in some ways:
prevent --privileged use
enforce --read-only flag
enforce memory & CPU limits
I looked at couple of options:
SELinux
The SELinux policies would need to be set at the host level and then propagated inside the containers via the --selinux-enabled flag at the daemon level. The scheduler can, however, override this anyway via run --privileged.
seccomp profiles
these are only applied at the time of launching the container (seccomp flags are available for docker run)
AppArmor
this can (again) be overridden at the scheduler level via --privileged
docker daemon --exec-opts flag
only a single option is actually available for this flag (native.cgroupdriver)
It seems that Docker is designed to trust container schedulers by default.
Does anyone know if this is a design decision?
Is there any other possible solution available with the current latest Docker version that I missed?
I also looked at Kubernetes and its Limit Ranges & Resource Quotas, which can be applied to K8S namespaces and looked interesting, assuming there's a way to force certain schedulers to only use certain namespaces. This would, however, increase the scope of the problem to operating a K8S cluster.
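For reference, a minimal sketch of what such a namespace-level constraint could look like (namespace name and values are placeholders):

apiVersion: v1
kind: LimitRange
metadata:
  name: task-limits
  namespace: scheduler-tasks           # hypothetical namespace the scheduler is confined to
spec:
  limits:
    - type: Container
      default:                         # limits applied when a container does not set its own
        cpu: 500m
        memory: 256Mi
      max:                             # hard ceiling for any container in this namespace
        cpu: "1"
        memory: 512Mi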
Running Docker on a Unix platform should be compatible with nice, or so I would think at first. Looking a little more closely, it looks like you need something like --cpuset-cpus="0,1".
From the second link: "The --cpu-quota looks to be similar to the --cpuset-cpus ... allocate one or a few cores to a process, it's just time managed instead of processor number managed."
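For reference, a rough sketch of those per-run limits combined with the flags the question wants to enforce (image, command and values are placeholders); note that they only help if the caller actually passes them, which is exactly the trust problem described above:

# --read-only gives the container a read-only root filesystem,
# --memory sets a hard memory limit,
# --cpuset-cpus pins the container to CPUs 0 and 1,
# --cpu-quota caps it at roughly 50% of one CPU per 100 ms scheduling period.
docker run -d --read-only --memory=256m --cpuset-cpus="0,1" \
  --cpu-quota=50000 alpine:3.8 sleep 3600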

LXC without chroot

Is there any way to use LXC for resource management using process groups without creating containers? I am working on a service that runs arbitrary code inside a sandbox, for which I am only interested in hardware resource management. I don't want any chrooting; I just want these process groups to have access to the main file system.
I was told that LXC is lightweight, but all the examples that I see create a new container (i.e. a directory with a full OS) for every LXC process. I don't really see how this is much lighter than any other VM solution.
So is there any way that LXC can be used to control and manage multiple process groups, without creating separate containers for each and every one of them?
LXC isn't a monolithic system. It's a collection of kernel features that can be used to isolate processes in various different ways, and a userspace tool to use all of these features together to create full-fledged containers. But the individual features are still usable on their own, without LXC. Furthermore, LXC does not require a chroot, and even when you give it a chroot, you can bind-mount directories from the host system into the container, sharing those particular directory trees between the host and the container.
For instance, cgroups are used by LXC to set resource limits on containers. But they can be used to set resource limits on groups of processes without using the LXC tools at all. You can manipulate /sys/fs/cgroup/memory or /sys/fs/cgroup/cpuacct directly, to put processes into cgroups that limit the amount of memory or CPU they are allowed to use. Or if you're on a system using systemd, you can control the memory limits for a group of processes using MemoryLimit=200M or the like in the .service file for a given service.
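For instance, on a host with the cgroup v1 memory controller mounted at /sys/fs/cgroup/memory, a 200 MB cap for a group of processes can be set up by hand roughly like this (a sketch; the group name is arbitrary):

sudo mkdir /sys/fs/cgroup/memory/sandbox
echo 200M | sudo tee /sys/fs/cgroup/memory/sandbox/memory.limit_in_bytes   # cap the group at 200 MB
echo $$ | sudo tee /sys/fs/cgroup/memory/sandbox/cgroup.procs              # move the current shell (and its children) into the group

On a systemd-managed host, something like systemd-run --scope -p MemoryLimit=200M <command> achieves the same without touching the hierarchy directly.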
If you want to use LXC to do lightweight resource management, you can do that with or without a chroot. When starting an LXC container, you can choose which resources you want to isolate; so you could create a container with only a virtualized network, and nothing else; or a container with only memory limits, but sharing everything else with the host. The only things that will be isolated are those specified in the configuration file for your container. For example, lxc ships with several example container definitions that only isolate the network; they share a root partition and almost everything else with the host. Here's how to run a container identical to the host system except it has no network interface:
sudo lxc-execute -n foo -f /usr/share/doc/lxc/examples/lxc-no-netns.conf /bin/bash
If you want some files to be shared with the host, but not others, you have two choices; you could use a shared root directory, and mount over the files that you want to be different in the container; or you could use a chroot, but mount the files that you do want to share in the container.
For example, here's the configuration for a container that shares everything with the host except for /home; it instead bind-mounts /home/me/fake-home over /home within the container:
lxc.mount.entry = /home/me/fake-home /home none rw,bind 0 0
Or if you want to have a completely different root, but still share some directories like /usr, you can bind mount a few directories into a directory, and use that as the root of the filesystem.
So you have lots of options, and can choose to isolate just one component, more than one, or as many as LXC supports, depending on your needs.
