Backup and Decommission Instance stores in AWS - Linux

I have inherited some Instance Store-backed Linux AMIs that need to be archived and terminated. We run a Windows & VMWare environment, so I have little experience with Linux & AWS.
I have tried using the Windows EC2 command line tools to export to a VHD/VMDK disk image, but receive an error stating that the instance must be EBS-backed to do so.
What's the easiest way to get a complete backup? Keep in mind that we have no plans to actually use the instances again, this is just for archival purposes.

Assuming you have running instance-store instances (and not AMIs, which would mean you already have a backup), you can still create an AMI. It's not simple, and may not be worth the effort if you never plan to actually re-launch the instances, but the following page gives you a couple of options:
(1) create an instance-store backed AMI from a running instance
(2) subsequently create an EBS-backed AMI from the instance-store AMI
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/creating-an-ami-instance-store.html
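For reference, option (1) boils down to running the AMI tools on the instance and registering the uploaded bundle. A rough sketch, where the key/certificate paths, account ID, bucket name and AMI name are placeholders and the exact flags are covered in the linked guide:
# on the instance: bundle the root volume
sudo ec2-bundle-vol -k /tmp/pk.pem -c /tmp/cert.pem -u 111122223333 -r x86_64 -d /tmp/bundle
# upload the bundle to S3
ec2-upload-bundle -b my-archive-bucket/ami -m /tmp/bundle/image.manifest.xml -a $AWS_ACCESS_KEY -s $AWS_SECRET_KEY
# register the AMI from the uploaded manifest
aws ec2 register-image --image-location my-archive-bucket/ami/image.manifest.xml --name archived-instance-store-ami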
You can also do a sync of the filesystem directly to S3 or attach an EBS volume and copy the files there.
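If archiving the files (rather than a bootable image) is enough, a minimal sketch of the S3 route, assuming the AWS CLI is configured on the instance and the bucket/prefix are placeholders:
# copy the live filesystem to S3 for archival, skipping pseudo-filesystems
aws s3 sync / s3://my-archive-bucket/legacy-instance-01/ --exclude "proc/*" --exclude "sys/*" --exclude "dev/*"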

In the end, I used the dd command in combination with ssh to copy images of each relevant drive to offline storage. Here is a summary of the process:
SSH into the remote machine and run df -aTh to figure out which disks to back up
Log out of ssh
For each desired disk, run the following ssh command to create and download the disk image (changing the if= path to the desired disk): ssh root@[ipaddress] "dd if=/dev/sda1 | gzip -1 -" | dd of=outputfile.gz
Wait for the image to fully download. You may want to examine your network usage and make sure that an appropriate amount of incoming traffic occurs.
Double-check the disk images for completeness and mountability (see the sketch after these steps)
Terminate the Instance
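For the double-check step, a minimal sketch, assuming the image was taken from a single ext4 (or similar) partition; outputfile.gz and /mnt/check are placeholder names:
# decompress a copy of the image, keeping the original archive
gunzip -k outputfile.gz
# mount the raw image read-only through a loop device and spot-check the contents
mkdir -p /mnt/check
sudo mount -o loop,ro outputfile /mnt/check
ls /mnt/check
sudo umount /mnt/check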

Related

New Azure data disk does not appear when running lsblk from virtual machine's cli

I am attempting to create and attach a new data disk to an Azure linux VM per these instructions: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/attach-disk-portal
Azure Portal reported that the disk was created and attached to my VM successfully, and I can see it listed as a data disk under "Disks" for that VM in Azure Portal. However, when I run lsblk from the VM's command line, as instructed under "Find the disk" in the documentation, the new disk does not appear in the listing. Therefore I can't proceed in setting up the disk.
How can I get the disk to show up in lsblk, or at least begin to diagnose why it didn't? The VM is running Ubuntu 20.04, in case that matters.
For what it's worth, immediately before this, I executed the same process to add a different data disk to a different VM and it went very smoothly, so there seems to be some particular problem with this VM.
If the VM was running when you added the disk, you need to rescan for the new disk. Rebooting will work, but you can rescan without rebooting.
If the sg3-utils package is installed, you can use rescan-scsi-bus.sh to rescan. If not, you can use the following:
# run as root: trigger a rescan on every SCSI host adapter
for h in $(ls /sys/class/scsi_host); do
    echo '- - -' > "/sys/class/scsi_host/$h/scan"
done
For more information, refer to this document: Virtual Hard Disk is added, but not showing using lsblk -d command
I've been in touch with Azure support and they've diagnosed (through some back-end method) that the storage module (hv_storvsc) on the virtual machine is down. They have stated that the only solution is to reboot the virtual machine.
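If you want to verify that diagnosis yourself before rebooting, a quick check with standard tools (the exact output will vary):
# confirm the Hyper-V storage driver is loaded
lsmod | grep hv_storvsc
# look for errors logged by the storage driver
dmesg | grep -i storvsc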

Unable to connect to SSH on Google Cloud VM Instance

I have run into a problem today where I am unable to connect via SSH to my Google Cloud VM instance running debian-10-buster. SSH has been working until today when it suddenly lost connection while docker was running. I've tried rebooting the VM instance and resetting, but the problem still persists. This is the serial console output on GCE, but I am not sure what to look for in that, so any help would be highly appreciated.
Another weird thing is that earlier today before the problem started, my disk usage was fine and then suddenly I was getting a bunch of errors that the disk was out of space even after I tried clearing up a bunch of space. df showed that the disk was 100% full to the point where I couldn't even install ncdu to see what was taking the space. So then I tried rebooting the instance to see if that would help and that's when the SSH problem started. Now I am unable to connect to SSH at all (even through the online GCE interface), so I am not sure what next steps to take.
Your system has run out of disk space for the boot (root) file system.
The error message is:
Root filesystem has insufficient free space
Shut down the VM, increase the size of the disk in the Google Cloud web console, and then restart the VM.
Provided that there are no uncorrectable file system errors, your system will start up, resize the partition and file system, and be fine.
If you have modified the boot disk (restructured the partitions, added additional partitions, etc.), you will need to repair and resize manually.
I wrote an article on resizing the Debian root file system. My article goes into more detail than you need, but I do explain the low-level details of what happens.
Google Cloud – Debian 9 – Resize Root File System
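If you prefer the gcloud CLI over the web GUI, a rough sketch of the same steps (the instance name, disk name and zone are placeholders; a boot disk usually shares its instance's name):
# stop the VM so the boot disk can be resized safely
gcloud compute instances stop my-instance --zone=us-central1-a
# grow the boot disk (disks can only be made larger)
gcloud compute disks resize my-instance --size=50GB --zone=us-central1-a
# start the VM again; the root partition and file system are grown on boot
gcloud compute instances start my-instance --zone=us-central1-a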

Using a CLI to recover a disk image saved with clonezilla

I have set up a live CentOS 7 that is booted via PXE if the client is connected to a specific network port.
Once Linux has booted, a small script I wrote compares whether a newer image version is available on a central host than the one already deployed on the client. This is done by comparing the contents of a versions file. If there is a newer version, the image should be deployed to the client; otherwise only parts of the image (qcow2 files) should be replaced, to save time.
Since the image is up to 1 TB, I do not want to apply it in every case. It would also take too long.
On the client there is a volume group consisting of LVM logical volumes of different sizes, as well as "normal" partitions (like /dev/sda1).
Is there a way to deploy a whole partition structure using a cli?
I have already figured out this approach to recover one disk out of the whole system.
But it would take a lot of effort to script around that to get the destination structure I want.
I found out that there is no way to run Clonezilla as a CLI (which I honestly cannot understand). I tried using parts of the Clonezilla live ISO with the ocs-sr command, but I got stuck somewhere and it always gives me an "unknown command" error.
For my case the best would be something like:
clonezilla --restore /path/to/images/folder --dest /dev
which applies all images in the image folder generated by Clonezilla to the client.
Any help is highly appreciated.
I've found that using Clonezilla's preparation hook does the trick for me. You can use the ocs_prerun boot parameter, which will run a script before Clonezilla does anything.
If you are stuck with a company-hardened image, you can try this to set up an (Ubuntu) Linux with the needed programs on it.
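For reference, a rough sketch of how the restore itself can be driven from the command line once the Clonezilla live environment is up (my-image and sda are placeholders for your image directory and target disk; check ocs-sr's help for the exact options on your version):
# unattended restore of the saved image "my-image" onto disk sda,
# e.g. launched via the ocs_live_run boot parameter or from your own deployment script
sudo ocs-sr -g auto -e1 auto -e2 -r -j2 -b -p true restoredisk my-image sda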

Move docker data volume containers between CoreOS hosts

For some scenarios a clustered file system is just too much. This is, if I got it right, the use case for the data volume container pattern. But even CoreOS needs updates from time to time. If I'd still like to minimise the downtime of applications, I'd have to move the data volume container along with the app container to another host while the old host is being updated.
Are there any existing best practices? A solution that is mentioned quite often is to "back up" a container with docker export on the old host and docker import on the new host. But this would involve scp-ing tar files to another host. Can this be managed with fleet?
@brejoc, I wouldn't call this a solution, but it may help:
Alternatives:
1: Use another OS which does have clustering, or at least doesn't prevent it. I am now experimenting with CentOS.
2: I've created a couple of tools that help in some use cases. First tool, retrieves data from S3 (usually artifacts), and is uni-directional. Second tool, which I call 'backup volume container', has a lot of potential in it, but requires some feedback. It provides a 2-way backup/restore for data, from/to many persistent data stores including S3 (but also Dropbox, which is cool). As it is implemented now, when you run it for the first time, it would restore to the container. From that point on, it would monitor the relevant folder in the container for changes, and upon changes (and after a quiet period), it would back up to the persistent store.
Backup volume container: https://registry.hub.docker.com/u/yaronr/backup-volume-container/
File sync from S3: https://registry.hub.docker.com/u/yaronr/awscli/
(docker run yaronr/awscli aws s3 etc etc - read aws docs)
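As a side note on the docker export/import route from the question: docker export does not include the contents of data volumes, so a commonly used pattern is to tar the volume through a throwaway container and copy the archive across. A rough sketch, where datacontainer, /data and the host name are placeholders:
# on the old host: archive the data container's volume into the current directory
docker run --rm --volumes-from datacontainer -v $(pwd):/backup busybox tar cvf /backup/data.tar /data
# copy the archive to the new host
scp data.tar core@newhost:
# on the new host: recreate the data volume container and restore the archive into it
docker run -v /data --name datacontainer busybox true
docker run --rm --volumes-from datacontainer -v $(pwd):/backup busybox tar xvf /backup/data.tar -C /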

Should I put database and CMS files on a separate EBS or S3?

Is it possible, or even advisable, to use an EBS volume that persists after instance termination to store database/website files, and to reattach it to a new Amazon instance in case of failure? Or should I back up a volume bundle to S3? Also, I need an application to accelerate terminal window functions intelligently. Can you tell I'm a Linux noob?
We do this with our Nexus installation - the data is stored on a separate EBS volume that's regularly snapshotted, but the root disk isn't (since we can use Puppet to create a working Nexus instance using the latest base AMI, Java, Tomcat and Nexus versions). The one drawback of this approach (vs. your other approach of backing up to S3) is that you can't retrieve the data outside of AWS if needed - if that is an important use case, I'd recommend either uploading a volume bundle or a .tar.gz backup to S3.
However, in your case, if you have a single EBS-backed EC2 instance acting as your CMS server, you could run it with a large root volume and keep that regularly backed up (either using EBS snapshots or by backing up a .tar.gz to S3). If you're not particularly familiar with Linux, that's likely the easiest way to make sure all your data is backed up. And if you ever need to extract only the data, you can do so by attaching that volume (or a volume created from a snapshot of it) to another machine - you'd also have access to all the config files, which may be of use.
Bear in mind that if you only want to run your server some of the time, you can always stop the instance rather than terminate it - the EBS volumes will remain. Once you take a snapshot your data is safe - if part of an EBS volume fails but it hasn't been modified since the last snapshot, AWS will transparently restore it from the EBS snapshot data.
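For reference, a minimal sketch of both backup routes with the AWS CLI (the volume ID, bucket name and paths are placeholders):
# snapshot the EBS volume that holds the database/CMS files
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "CMS data backup"
# or: archive the files and push the archive to S3
tar czf cms-backup.tar.gz /var/www /var/lib/mysql
aws s3 cp cms-backup.tar.gz s3://my-backup-bucket/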
