I have a program running in a Docker container. After 10-12 hours of running, the program terminated with filesystem-related errors (FileNotFoundError, or similar).
I'm wondering whether the disk filled up (or some similar filesystem issue occurred), or whether there was a problem in my code (e.g. one process deleted a file prematurely).
I don't know much about how Docker manages files, and I wonder whether Docker creates and manages its own filesystem inside the container. Here are the three possibilities I'm considering; I mainly wonder whether #1 could be the case:
1. If Docker manages its own filesystem, could it be that although disk space is available on the host machine, the container ran out of its own storage space? (I've seen similar issues with processes running out of memory when a limit is artificially imposed using cgroups.)
2. Could it be that the host filesystem ran out of space and the files got corrupted or weren't written correctly?
3. There is some bug in my code.
This is likely a bug in your code. Most programs print the error they encounter, and when a program runs out of disk space, the error returned by the filesystem is "No space left on device" (errno 28, ENOSPC), not FileNotFoundError.
If you see FileNotFoundError, the file is genuinely missing. My best theory is that the error comes from your consumer process.
It's still possible, though, that the file doesn't exist because the producer ran out of space and the error wasn't handled correctly - you'll need to check your logs.
It might also be a race condition, depending on your application. There really aren't enough details here to say.
As to the title question:
By default, Docker overlay-mounts an empty directory from the host's filesystem into the container, so the amount of free space in the container is the same as on the host.
If you're using volumes, it depends on the storage driver. As #Dan Serbyn mentioned, the default limit for the devicemapper driver is 10 GB. The overlay2 driver - the current default - doesn't have that limitation.
With the devicemapper storage driver, Docker imposes a default limit of 10 GB on container storage.
You can check the disk space that containers are using by running the following command:
docker system df
It's also possible that the file your container is trying to access has permission restrictions. Try making it readable by the container's user - or, as a quick (insecure) test, by everybody (chmod 777 file.txt).
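To see which storage driver your daemon uses, and to check free space from the container's point of view (the container name below is just an example):

```shell
# The storage driver in use (overlay2 has no per-container size cap)
docker info --format '{{.Driver}}'

# Free space as seen from inside a running container
docker exec mycontainer df -h /
```

If `df -h /` inside the container shows 100% usage while the host has space, you are hitting a driver- or quota-imposed limit rather than a host full-disk condition.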
Related
I have run into a problem today where I am unable to connect via SSH to my Google Cloud VM instance running debian-10-buster. SSH has been working until today when it suddenly lost connection while docker was running. I've tried rebooting the VM instance and resetting, but the problem still persists. This is the serial console output on GCE, but I am not sure what to look for in that, so any help would be highly appreciated.
Another weird thing is that earlier today before the problem started, my disk usage was fine and then suddenly I was getting a bunch of errors that the disk was out of space even after I tried clearing up a bunch of space. df showed that the disk was 100% full to the point where I couldn't even install ncdu to see what was taking the space. So then I tried rebooting the instance to see if that would help and that's when the SSH problem started. Now I am unable to connect to SSH at all (even through the online GCE interface), so I am not sure what next steps to take.
Your system has run out of disk space for the boot (root) file system.
The error message is:
Root filesystem has insufficient free space
Shut down the VM, resize the disk larger in the Google Cloud web console, and then restart the VM.
Provided there are no uncorrectable file system errors, your system will start up, resize the partition and file system, and be fine.
If you have modified the boot disk (restructured partitions, added additional partitions, etc.), then you will need to repair and resize manually.
I wrote an article on resizing the Debian root file system. My article goes into more detail than you need, but I do explain the low level details of what happens.
Google Cloud – Debian 9 – Resize Root File System
I am running Selenium Standalone Firefox as an Azure Container Instance. To fix the error that often occurs when running Protractor tests ("failed to decode response from marionette"), I need to increase the container's shared memory.
It is not possible to pass this as a parameter to the az container create command that I am using in my pipeline.
I tried passing it as a command-line script to be executed after the container is deployed:
--command-line "/bin/sh -c 'sudo mount -o remount,size=2G /dev/shm'"
but it does not work because the container filesystem is read-only, and unfortunately, according to https://feedback.azure.com/forums/602224-azure-container-instances/suggestions/33870166-aci-support-for-privileged-container it is not possible to run a container instance in privileged mode to allow writes.
Do you have any ideas ?
Thanks,
Magda
This is not supported, and would be very hard to support, since it opens up a lot of risk for the VM running the different container groups.
The underlying memory/CPU is shared with other users; allowing a larger /dev/shm could hide the real memory usage of a container and so affect other containers running on the same node.
This request has been made in the past. see below.
https://feedback.azure.com/forums/602224-azure-container-instances/suggestions/37442194-allow-specifying-the-size-of-the-dev-shm-filesyst
I would suggest looking at the Kubernetes alternative: it supports emptyDir volumes with medium "Memory", which creates the right kind of temporary directory for your need.
You can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that, unlike disks, tmpfs is cleared on node reboot, and any files you write count against your container's memory limit.
https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
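A sketch of what that looks like for this use case (pod and volume names are illustrative): an emptyDir with medium: Memory mounted over /dev/shm gives the browser a larger shared-memory area.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selenium-firefox
spec:
  containers:
    - name: firefox
      image: selenium/standalone-firefox
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory    # tmpfs; counts against the container's memory limit
        sizeLimit: 2Gi
```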
This is a duplicate of a post I created in the Docker forum; I'll close one of the two once the problem is solved. But since no one has answered in the Docker forum and my problem persists, I'm posting it again here, looking forward to an answer.
I would like to expose a server monitoring app as a Docker container. The app I have written relies on /proc to read system information like CPU utilization or disk stats. Thus I have to forward the information provided in the host's /proc virtual file system to my Docker container.
So I made a simple image (using the first or second intro on docker website: Link) and started it:
docker run -v=/proc:/host/proc:ro -d hostfiletest
Assuming the running container could read from /host/proc to obtain information about the host system.
I fired up a console inside the container to check:
docker exec -it {one of the funny names the container gets} bash
And checked the content of /host/proc.
The easiest way to check was reading /host/proc/sys/kernel/hostname - that should yield the hostname of the VM I am working on.
But I get the hostname of the container, while /host/proc/uptime gives me the correct uptime of the VM.
Do I miss something here? Maybe something conceptual?
Docker version 17.05.0-ce, build 89658be running on Linux 4.4.0-97-generic (VM)
Update:
I found several articles describing how to run a specific monitoring app inside a container using the same approach I mentioned above.
Update:
Just tried using an existing Ubuntu image - same behavior. Running the image privileged and with --pid=host doesn't help.
Greetings
Peepe
The reason for this problem is that /proc is not a normal filesystem. According to procfs, it is an interface for accessing kernel data and system information. The interface provides a file-like structure, which can mislead people into thinking it is a normal directory.
Files in /proc are also not normal files: they are empty (size = 0). You can check this yourself:
$ stat /proc/sys/kernel/hostname
File: /proc/sys/kernel/hostname
Size: 0 Blocks: 0 IO Block: 1024 regular empty file
So the file doesn't hold any data; when you read it, the kernel dynamically returns the corresponding system information.
To answer your question: /proc/sys/kernel/hostname is just an interface to access the hostname, and depending on where you access that interface - on the host or in the container - you get the corresponding hostname. This also applies when you use a bind mount (-v /proc:/host/proc:ro), since the bind mount only provides an alternative view of /proc. If you read /host/proc/sys/kernel/hostname, the kernel returns the hostname of the box you are in (the container).
In short, think of /proc/sys/kernel/hostname as a mirror: if the host stands in front of it, it reflects the host; if the container does, it reflects the container.
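You can observe the mirror effect directly (the alpine image here is just a convenient example):

```shell
# On the host: prints the host's hostname
cat /proc/sys/kernel/hostname

# In a container that bind-mounts the host's /proc: still prints the
# *container's* hostname, because the kernel answers for the reading process
docker run --rm -v /proc:/host/proc:ro alpine \
    cat /host/proc/sys/kernel/hostname
```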
I know it's a few months later now, but I came across the same problem today.
In my case I was using psutil in Python to read disk stats of the hosts from inside a docker container.
The solution was to mount the whole host filesystem read-only into the Docker container with -v /:/rootfs:ro and point psutil at it with psutil.PROCFS_PATH = '/rootfs/proc'.
Now psutil.disk_partitions() lists all partitions from the host filesystem. As the hostname is also contained within the proc hierarchy, I guess this also works for other host system information, as long as the retrieving command points to /rootfs/proc.
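As a minimal sketch of the same idea without psutil: any procfs file can be read through a configurable root, which would be "/rootfs/proc" inside a container set up with the mount above (the function names here are illustrative):

```python
from pathlib import Path

def parse_uptime(text):
    """/proc/uptime holds two floats: seconds since boot and
    aggregate idle seconds; return the uptime."""
    uptime_seconds, _idle_seconds = (float(x) for x in text.split())
    return uptime_seconds

def read_uptime(procfs_path="/proc"):
    """Read uptime from a procfs mount; pass '/rootfs/proc' inside a
    container that bind-mounts the host filesystem at /rootfs."""
    return parse_uptime(Path(procfs_path, "uptime").read_text())
```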
I'm currently trying to log in to one of my instances on Google Cloud but find myself unable to do so. The machine escaped my attention and its hard disk got completely full. Of course I want to free some disk space and make sure the server can restart, but I'm facing some issues.
First off, I have found the guide on increasing the size of the persistent disk (https://cloud.google.com/compute/docs/disks/add-persistent-disk). I followed that and already set it 50 GB which should be fine for now.
However, because the disk is full, I cannot make any SSH connection: the attempt simply times out because there is absolutely no space for the SSH daemon to write to its log. Without any form of connection I cannot free disk space and/or run the "resize2fs" command.
Furthermore, I have already tried different approaches:
- I don't seem to be able to change the boot disk to something else.
- I created a snapshot and tried to increase the disk size on a new instance created from that snapshot, but it has the same problem (the filesystem is stuck at 15 GB).
- I am not allowed to mount the disk as an additional disk in another instance.
Currently I'm pretty much out of ideas. The important data on the disk was backed up, but I'd rather get the settings working as well. Does anyone have any clues as to where to start?
[EDIT]
Currently still trying out new things. I have also tried running shutdown and startup scripts that remove /opt/* to free some temporary space, but the scripts either don't run or produce an error I cannot catch. Working nearly blind is pretty frustrating, I must say.
The next step for me is to try to get the snapshot locally. It should be doable using a bucket; I will let you know.
[EDIT2]
Getting a snapshot locally is not an option either, or so it seems. Images of Google Cloud instances can only be created or deleted, not downloaded.
I'm now out of ideas.
So I finally found the answer. These steps were taken:
1. In the GUI, I increased the size of the disk to 50 GB.
2. In the GUI, I detached the drive by deleting the machine while ensuring that I did not throw away the original disk.
3. In the GUI, I created a new machine with a sufficiently big hard disk.
4. On the command line (important!) I attached the disk to the newly created machine (the GUI option still has a bug...).
After that I could mount the disk as a secondary disk and perform all the operations I needed.
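For step 4, the command-line attach might look like this (instance, disk, zone, and device names are placeholders for your own):

```shell
# Attach the old boot disk to the rescue machine as a secondary disk
gcloud compute instances attach-disk rescue-vm \
    --disk old-boot-disk --zone europe-west1-b

# Then, inside rescue-vm, mount it and free up space
sudo mkdir -p /mnt/olddisk
sudo mount /dev/sdb1 /mnt/olddisk
```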
Keep in mind: by default, Google Cloud images do NOT use logical volume management, so pvresize/lvresize/etc. are not installed, and resize2fs might not work out of the box.
I am trying to set up a Docker image with a DB2 database.
The installation is completed without any problems, but I get the following error when I try to restart the database:
SQL1084C Shared memory segments cannot be allocated. SQLSTATE=57019
I based the Dockerfile on this one:
https://github.com/jeffbonhag/db2-docker
where he states that the same problem should be addressed by adding the command
sysctl kernel.shmmax=18446744073692774399
to allow the kernel to allocate more shared memory, but the error persists.
The Docker daemon itself runs in Ubuntu 14.04, which runs inside Parallels on macOS.
EDIT: After some search I found out that this is related to the following command:
UPDATE DB CFG FOR S0MXAT01 USING locklist 100000;
You are over-allocating the database memory heap, i.e. Docker is unable to satisfy the memory requirements. Have a look at the following link to the manuals; it gives a breakdown of what is located in database memory:
Bufferpools
The database heap
The locklist
The utility heap
The package cache
The catalog cache
The shared sort heap, if it is enabled
A 20% overflow area
You can fiddle around with (decrease) any of these heaps until Docker is happy.
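For example, the LOCKLIST value from the question could be dialed back down (the value below is only illustrative; LOCKLIST is measured in 4 KB pages):

```shell
# Shrink the lock list so the database memory set fits the container
db2 "UPDATE DB CFG FOR S0MXAT01 USING LOCKLIST 10000"
db2 terminate
```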
In case others run into this: if you're rolling your own container and leave memory set to AUTOMATIC, Db2 may try to allocate all the memory on the host, leading to this error. Sometimes the initial start works out fine, but you end up with odd crashes weeks or months down the line.
The "official" Db2 container (the developer community edition one) handles this. If you're building your own container, you'll likely need to set DATABASE_MEMORY and/or INSTANCE_MEMORY to reasonable limits based on the size of your container and restart Db2 in the container. This can be done in your entrypoint script.
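A sketch of what that could look like in an entrypoint script (the database name and values are placeholders; both parameters are in 4 KB pages):

```shell
# Cap instance and database memory instead of leaving them AUTOMATIC
db2 "UPDATE DBM CFG USING INSTANCE_MEMORY 1000000"         # ~4 GB
db2 "UPDATE DB CFG FOR MYDB USING DATABASE_MEMORY 500000"  # ~2 GB

# Restart the instance so the new limits take effect
db2stop force
db2start
```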