Unable to increase disk size on file system - linux

I'm currently trying to log in to one of my instances on Google Cloud, but I'm unable to do so. The machine escaped my attention and its hard disk filled up completely. I want to free some disk space so the server can restart, but I'm running into several issues.
First off, I found the guide on increasing the size of a persistent disk (https://cloud.google.com/compute/docs/disks/add-persistent-disk). I followed it and already set the disk to 50 GB, which should be fine for now.
However, at the file system level the disk is still full, so I cannot make any SSH connection. The error is simply a timeout, caused by the fact that there is no space left for the SSH daemon to write to its log. Without any form of connection I cannot free disk space or run the "resize2fs" command.
Furthermore, I have already tried several other approaches:
I do not seem to be able to change the boot disk to something else.
I created a snapshot and tried to increase the disk size on the new instance I created from that snapshot, but it has the same problem (the filesystem is stuck at 15 GB).
I am not allowed to mount the disk as an additional disk in another instance.
Currently I'm pretty much out of ideas. The important data on the disk was backed up, but I'd rather get the existing setup working again as well. Does anyone have any clues as to where to start?
[EDIT]
I'm currently still trying out new things. I have also tried running shutdown and startup scripts that remove /opt/* in order to free some temporary space, but the scripts either don't run or produce an error I cannot catch. Working nearly blind is pretty frustrating, I must say.
The next step for me would be to try to get the snapshot locally. It should be doable via a storage bucket; I will let you know.
[EDIT2]
Getting a snapshot locally does not seem to be an option either. Images of Google Cloud instances can only be created or deleted, not downloaded.
I'm now out of ideas.

So I finally found the answer. These steps were taken:
In the GUI I increased the size of the disk to 50 GB.
In the GUI I detached the drive by deleting the machine whilst ensuring that I did not throw away the original disk.
In the GUI I created a new machine with a sufficiently big hard disk.
On the command line (important!!) I attached the disk to the newly created machine (the GUI option still has a bug ...).
After that I could mount the disk as a secondary disk and perform all the operations I needed.
Keep in mind: by default, Google Cloud instances do NOT use logical volume management, so pvresize/lvresize/etc. are not installed and resize2fs might not work out of the box.
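For reference, here is a rough sketch of that flow using the gcloud CLI and standard Linux tools. The disk, instance, zone and device names are placeholders, and the device of the secondary disk may differ on your machine (check with lsblk):

# Grow the persistent disk (the filesystem itself is grown separately, later)
gcloud compute disks resize my-disk --size=50GB --zone=europe-west1-b

# Attach the full disk as a secondary disk to a rescue instance
gcloud compute instances attach-disk rescue-vm --disk=my-disk --zone=europe-west1-b

# On the rescue instance: mount it, free some space, then grow the partition and filesystem
sudo mount /dev/sdb1 /mnt
sudo rm -rf /mnt/opt/*       # example cleanup only
sudo umount /mnt
sudo growpart /dev/sdb 1     # from cloud-guest-utils, only needed if the partition must grow too
sudo e2fsck -f /dev/sdb1
sudo resize2fs /dev/sdb1

Afterwards the disk can be detached from the rescue instance and attached back to the original machine as its boot disk.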

Related

Does Docker manage the filesystem like a standalone OS?

I have a program running in a Docker container. After 10-12 hours of running, the program terminated with filesystem-related errors (FileNotFoundError, or similar).
I'm wondering whether the disk filled up (or a similar filesystem-related issue occurred), or whether there was a problem in my code (e.g. one process deleted the file prematurely).
I don't know much about Docker's management of files and wonder whether a container creates and manages its own filesystem or not. Here are three possibilities I'm considering; I mainly wonder whether #1 could be the case:
If Docker manages its own filesystem, could it be that although disk space is available on the host machine, the Docker container ran out of its own storage space? (I've seen similar issues with running out of memory for a process that has a memory limit artificially imposed using cgroups.)
Could it be that the host filesystem ran out of space and the files got corrupted or didn't get written correctly?
There is some bug in my code.
This is likely a bug in your code. Most programs print the error they encounter, and when a program runs out of space, the error returned by the filesystem is "No space left on device" (errno 28, ENOSPC).
If you see FileNotFoundError, that means the file is missing. My best theory is that it's coming from your consumer process.
It's still possible, though, that the file doesn't exist because the producer ran out of space and you didn't handle the error correctly; you'll need to check your logs.
It might also be a race condition, depending on your application. There's really not enough detail here to answer that.
As to the title question:
By default, Docker just overlay-mounts an empty directory from the host's filesystem into the container, so the amount of free space in the container is the same as the amount on the host.
If you're using volumes, that depends on the storage driver you use. As @Dan Serbyn mentioned, the default limit for the devicemapper driver is 10 GB. The overlay2 driver, which is the default driver, doesn't have that limitation.
In the current Docker version, there is a default limitation on the Docker container storage of 10 GB.
You can check the disk space that containers are using by running the following command:
docker system df
It's also possible that the file your container is trying to access has access-level restrictions. Try making it available to Docker, or to everybody (chmod 777 file.txt).
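To help rule the disk-space theories in or out, a few standard commands show which storage driver is in use and how much space is actually free; the container name below is a placeholder:

docker info --format '{{.Driver}}'    # storage driver in use (overlay2, devicemapper, ...)
docker system df -v                   # space used by images, containers, volumes and build cache
docker exec my-container df -h /      # free space as seen from inside the running container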

Recover deleted folder from Google VPS

We have a VPS running on Google Cloud which had a very important folder in a user directory. An employee of ours deleted that folder and we can't figure out how to recover it. I came across extundelete, but it seems the partition needs to be unmounted for it to work, and I don't understand how I would do that on Google Cloud. This project took more than a year, and that was the latest copy after a fire took out the last copy on our local servers.
Could anyone please help or guide me in the right direction?
Getting any files back from your VM's disk may be tricky (at best) or impossible (most probably) if the files got overwritten.
The easiest way would be to get them back from a copy or snapshot of your VM's disk. If you have a snapshot of the disk (taken either manually or automatically) from before the folder in question was deleted, then you will get your files back.
If you don't have any backups, then you may try to recover the files. I've found many guides and tutorials; let me link the ones I believe will help you the most:
Unix/Linux undelete/recover deleted files
Recovering accidentally deleted files
Get list of files deleted by rm -rf
------------- UPDATE -----------
Your last chance in this battle is to make two clones of the disk, then detach the original disk from the VM and attach one of the clones to keep your VM running. Use the second clone for any experiments, and keep the original untouched in case you mess up the second clone.
Now create a new Windows VM and attach your second clone as an additional disk. At this point you're ready to try various data recovery software:
UFS Explorer
Virtual Machine Data Recovery
There are plenty of others to try from too.
Another approach would be to create an image from the original disk and export it as a VMDK image (saving it to a storage bucket). Then download it to your local computer and use, for example, VMware VMDK Recovery or other specialized software for extracting data from virtual machine disk images.
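A rough sketch of that clone-and-attach workflow with the gcloud CLI; disk, instance, bucket and zone names are placeholders, and if the affected disk is the boot disk you will need to stop the VM before detaching it:

# Make two clones of the affected disk
gcloud compute disks create clone-keep --source-disk=original-disk --zone=us-central1-a
gcloud compute disks create clone-experiment --source-disk=original-disk --zone=us-central1-a

# Swap the original out of the VM and keep it untouched
gcloud compute instances detach-disk my-vm --disk=original-disk --zone=us-central1-a
gcloud compute instances attach-disk my-vm --disk=clone-keep --zone=us-central1-a

# Attach the second clone, read-only, to a separate recovery VM for the recovery tools
gcloud compute instances attach-disk recovery-vm --disk=clone-experiment --mode=ro --zone=us-central1-a

# Alternatively, export an image of the original disk as VMDK to a storage bucket
gcloud compute images create recovery-image --source-disk=original-disk --source-disk-zone=us-central1-a
gcloud compute images export --image=recovery-image --destination-uri=gs://my-bucket/recovery.vmdk --export-format=vmdk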

Unable to connect to SSH on Google Cloud VM Instance

I have run into a problem today where I am unable to connect via SSH to my Google Cloud VM instance running debian-10-buster. SSH had been working until today, when it suddenly lost connection while Docker was running. I've tried rebooting and resetting the VM instance, but the problem persists. This is the serial console output on GCE, but I am not sure what to look for in it, so any help would be highly appreciated.
Another weird thing is that earlier today, before the problem started, my disk usage was fine; then I suddenly got a bunch of errors saying the disk was out of space, even after I had cleared up a fair amount of space. df showed that the disk was 100% full, to the point where I couldn't even install ncdu to see what was taking up the space. So I tried rebooting the instance to see if that would help, and that's when the SSH problem started. Now I am unable to connect over SSH at all (even through the online GCE interface), so I am not sure what next steps to take.
Your system has run out of disk space for the boot (root) file system.
The error message is:
Root filesystem has insufficient free space
Shut down the VM, resize the disk in the Google Cloud web GUI, and then restart the VM.
Provided that there are no uncorrectable file system errors, your system will startup, resize the partition and file system, and be fine.
If you have modified the boot disk (partition restructuring, added additional partitions, etc) then you will need to repair and resize manually.
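If you prefer the command line over the web console, roughly the same steps with gcloud look like this (instance, disk and zone names are placeholders):

gcloud compute instances stop my-instance --zone=us-central1-a
gcloud compute disks resize my-instance --size=50GB --zone=us-central1-a   # the boot disk is usually named after the instance
gcloud compute instances start my-instance --zone=us-central1-a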
I wrote an article on resizing the Debian root file system. My article goes into more detail than you need, but I do explain the low level details of what happens.
Google Cloud – Debian 9 – Resize Root File System

Proxmox VE: How to create a raw disk and pass it through to a VM

I am searching for an answer on how to create and pass through a raw device to a VM using Proxmox. Through that I am hoping to get full control of the disk, including S.M.A.R.T. stats and disk spindown.
Currently I am using the SATA passthrough offered by Proxmox.
Unfortunately I have no clue how to create a raw disk file from my (empty) disk. Furthermore, I am not entirely certain how to bind it to the VM.
I hope someone knows the relevant steps.
Side notes:
This question is just one approach I want to try out to achieve a certain goal. For the sake of simplicity I have confined my question to the part above. However, if you have a better idea, feel free to give me a hint. So far I have tried a lot of things to achieve my ultimate goal.
Goal that I want to achieve:
I am using Proxmox VE 5.3-8 on an HP ProLiant Gen8 server. It hosts several VMs, among which OMV should serve as a NAS. Since the files will not be accessed very often, I opt for spinning down the drives.
My goal is reduction of noise and power savings.
Current status:
I passed through two disks by adding them to
/etc/pve/nodes/pve/qemu-server/vmid.conf
sata1: /dev/disk/by-id/{disk-id}
Through that I do see SMART stats, and everything except disk spindown works fine. Using virtio instead of SATA does not give me SMART values.
Using hdparm -y to put a drive to sleep does not work inside the VM. Doing the same on the Proxmox console results in the drive sleeping, but it wakes up a few seconds later.
Passing through the entire HBA is currently not an option.
I read in a forum that first installing Debian and then manually installing the Proxmox packages led to success. However, that was for Debian Jessie and three years ago.
Install Proxmox VE on Debian Stretch
Before I try this as a last resort, I want to make sure whether passing the disk through as a raw file will lead to the desired result.
Maybe someone has an idea on how to achieve my ultimate goal.
I do not have a clear answer to your question as far as "passing through" the disk goes, but I recently found a good-enough solution for my use case.
I have an HDD that I planned to use as a backup directory for VMs, but I also wanted to put other kinds of data on it and share the disk with any VM that wants it.
The solution I found is to format the disk using ZFS, then create mount points for different uses (vzdump backups, a shared NAS folder across VMs, an ISO mount point, etc.). I followed this guide: https://forum.level1techs.com/t/how-to-create-a-nas-using-zfs-and-proxmox-with-pictures/117375
I ended up installing Samba on the Proxmox host itself, with a config that shares some folders/mount points of the disk via SMB. Now the device appears as a normal disk over the network, with excellent read/write speed since everything is local.
Sorry that this post does not really "answer" your question (no SMART data or low-level things like that), but it does give you shared storage.
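For reference, a minimal sketch of that ZFS-plus-Samba setup, run on the Proxmox host; the pool, dataset, user and disk names are placeholders, and the linked guide covers the details:

# Create a ZFS pool on the spare disk and a dataset for shared data
zpool create tank /dev/disk/by-id/ata-EXAMPLE-DISK
zfs create tank/share

# Install Samba on the Proxmox host and export the dataset over SMB
apt install samba
cat >> /etc/samba/smb.conf <<'EOF'
[share]
   path = /tank/share
   read only = no
   guest ok = no
   valid users = myuser
EOF
smbpasswd -a myuser
systemctl restart smbd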

"Unstable" NFS mount point

First of all, this is the first time I'm posting a question on StackOverflow, so please don't kill me if I've done anything wrong.
There goes my issue:
We have a few dedicated servers with a well-known French provider. With one of those servers we recently acquired 5,000 GB of backup space which can be mounted via NFS, and that's what we've done.
The issue comes when backing up big files. Every night we back up several VMs running on that host, and we know for a fact that the backups are not being done properly (the file size differs a lot from one day to the next, plus we've checked the contents of the backups and there's stuff missing).
So it seems like the mount point is not stable and the backups are not being done properly. It looks like there are micro network cuts, and the hypervisor therefore finishes the current backup prematurely and starts with the next one.
This is how it's mounted right now:
xxx.xxx.xxx:/export/ftpbackup/xxx.ip-11-22-33.eu/ /NFS nfs auto,timeo=5,retrans=5,actimeo=10,retry=5,bg,soft,intr,nolock,rw,_netdev,mountproto=tcp 0 0
Any advice? Are there any parameters you would change?
We need to be sure that the NFS mount point is correctly working in order to have proper backups.
Thank you so much
By specifying "soft" as an option, you're saying that it's OK for the mount to be unreliable -- for the kernel to return an I/O error instead of running the I/O to completion when things are taking too long. Using a hard mount, without the "soft" option instructs the kernel to avoid returning I/O errors for timeouts.
This will fix your corrupted backups, but... your backup process will hang hard until I/O's complete. An alternative is to use much longer timeout values.
You're using TCP for the mount protocol, but not for NFS itself. If your server supports it, consider adding "tcp" to the options line.
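For illustration, a revised fstab entry along those lines, keeping the original export path but switching soft to hard, raising the timeout, and requesting TCP for NFS itself; the values are only a starting point:

xxx.xxx.xxx:/export/ftpbackup/xxx.ip-11-22-33.eu/ /NFS nfs auto,timeo=600,retrans=5,actimeo=10,retry=5,bg,hard,intr,nolock,rw,_netdev,tcp,mountproto=tcp 0 0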
