LVM snapshot of mounted filesystem - linux

I'd like to programmatically make a snapshot of a live filesystem in Linux, preferably using LVM. I'd like not to unmount it because I've got lots of files open (my most common scenario is that I've got a busy desktop with lots of programs).
I understand that because of kernel buffers and general filesystem activity, data on disk might be in some more-or-less undefined state.
Is there any way to "atomically" unmount an FS, make an LVM snapshot and mount it back? It would be OK if the OS blocked all activity for a few seconds to do this. Or maybe some kind of atomic "sync+snapshot"? A kernel call?
I don't know if it is even possible...

You shouldn't have to do anything for most Linux filesystems. It should just work without any effort at all on your part. The snapshot command itself hunts down mounted filesystems using the volume being snapshotted and calls a special hook that checkpoints them in a consistent, mountable state and does the snapshot atomically.
Older versions of LVM came with a set of VFS lock patches that would patch various filesystems so that they could be checkpointed for a snapshot. But with new kernels that should already be built into most Linux filesystems.
This intro on snapshots claims as much.
And a little more research reveals that for kernels in the 2.6 series the ext series of filesystems should all support this. ReiserFS probably also. And if I know the btrfs people, that one probably does as well.

I know that ext3 and ext4 in Red Hat Enterprise Linux, Fedora and CentOS automatically checkpoint when an LVM snapshot is created. That means there is never any problem mounting the snapshot because it is always clean.
I believe XFS has the same support. I am not sure about other filesystems.
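As a minimal sketch of what the answers above describe, the snapshot itself is a single lvcreate call, and LVM takes care of quiescing the mounted filesystem. The volume group and volume names (vg0, home, home_snap) and the 5G copy-on-write size are assumptions for illustration, not values from the question. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: prints the command unless DRY_RUN=0 is set (run as root then).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_lv ORIGIN_LV SNAP_NAME COW_SIZE
# All three arguments are hypothetical example values.
snapshot_lv() {
    # The origin stays mounted; lvcreate suspends it briefly while the
    # snapshot is set up.
    run lvcreate --snapshot --size "$3" --name "$2" "$1"
}

snapshot_lv /dev/vg0/home home_snap 5G
```

The snapshot can then be mounted read-only somewhere else (for example with mount -o ro /dev/vg0/home_snap /mnt/snap), backed up, and dropped with lvremove once it is no longer needed.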

It depends on the filesystem you are using. With XFS you can use xfs_freeze -f to sync and freeze the FS, and xfs_freeze -u to activate it again, so you can create your snapshot from the frozen volume, which should be in a safe state.
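A hedged sketch of the freeze, snapshot, thaw sequence follows. It uses fsfreeze from util-linux, which works like xfs_freeze for filesystems that support freezing. The mount point /home and the placeholder snapshot command are assumptions. The fsfreeze man page notes that an explicit freeze is unnecessary for device-mapper and LVM snapshots, so this is mainly useful with other snapshot mechanisms. The snapshot command must not write to the frozen filesystem, or it will block.

```shell
#!/bin/sh
# Dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# with_frozen_fs MOUNTPOINT COMMAND [ARGS...]
# Freezes the filesystem, runs COMMAND, and thaws again even if COMMAND fails.
# The body runs in a subshell, so its variables do not leak into the caller.
with_frozen_fs() (
    mnt=$1
    shift
    run fsfreeze --freeze "$mnt" || exit 1
    status=0
    "$@" || status=$?
    run fsfreeze --unfreeze "$mnt"
    exit "$status"
)

# /home and the echo are placeholders for a real mount point and snapshot tool.
with_frozen_fs /home echo "snapshot command goes here"
```

Wrapping the thaw this way matters because a filesystem that stays frozen blocks every process that tries to write to it.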

Is there any way to "atomically" unmount an FS, make an LVM snapshot and mount it back?
It is possible to snapshot a mounted filesystem, even when the filesystem is not on an LVM volume. If the filesystem is on LVM, or it has built-in snapshot facilities (e.g. btrfs or ZFS), then use those instead.
The below instructions are fairly low-level, but they can be useful if you want the ability to snapshot a filesystem that is not on an LVM volume, and can't move it to a new LVM volume. Still, they're not for the faint-hearted: if you make a mistake, you may corrupt your filesystem. Make sure to consult the official documentation and dmsetup man page, triple-check the commands you're running, and have backups!
The Linux kernel has an awesome facility called the Device Mapper, which can do nice things such as create block devices that are "views" of other block devices, and of course snapshots. It is also what LVM uses under the hood to do the heavy lifting.
In the below examples I'll assume you want to snapshot /home, which is an ext4 filesystem located on /dev/sda2.
First, find the name of the device mapper device that the partition is mounted on:
# mount | grep home
/dev/mapper/home on /home type ext4 (rw,relatime,data=ordered)
Here, the device mapper device name is home. If the path to the block device does not start with /dev/mapper/, then you will need to create a device mapper device, and remount the filesystem to use that device instead of the HDD partition. You'll only need to do this once.
# dmsetup create home --table "0 $(blockdev --getsz /dev/sda2) linear /dev/sda2 0"
# umount /home
# mount -t ext4 /dev/mapper/home /home
Next, get the block device's device mapper table:
# dmsetup table home
home: 0 3864024960 linear 9:2 0
Your numbers will probably be different. The device target should be linear; if yours isn't, you may need to take special considerations. If the last number (start offset) is not 0, you will need to create an intermediate block device (with the same table as the current one) and use that as the base instead of /dev/sda2.
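The two checks just described (a single linear target, start offset 0) can be scripted before touching anything. This is only a sketch: check_linear_table is a hypothetical helper name, and it assumes the table format shown above, with or without the leading "home: " prefix.

```shell
#!/bin/sh
# check_linear_table: reads `dmsetup table NAME` output on stdin and verifies
# that it is a single linear entry whose start offset is 0.
check_linear_table() {
    # Drop an optional leading "name: " prefix, then inspect the fields:
    # <start> <length> linear <device> <offset>
    sed 's/^[^ ]*: //' | awk '
        NR == 1 { target = $3; offset = $5 }
        NR > 1  { multi = 1 }
        END {
            if (NR == 0)            { print "empty table"; exit 1 }
            if (multi)              { print "multiple table entries"; exit 1 }
            if (target != "linear") { print "target is " target ", not linear"; exit 1 }
            if (offset != 0)        { print "start offset is " offset ", not 0"; exit 1 }
            print "ok"
        }'
}

# Sample line from the text above; on a real system use:
#   dmsetup table home | check_linear_table
echo 'home: 0 3864024960 linear 9:2 0' | check_linear_table
```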
In the above example, home is using a single-entry table with the linear target. You will need to replace this table with a new one, which uses the snapshot target.
Device mapper provides three targets for snapshotting:
The snapshot target, which saves writes to the specified COW device. (Note that even though it's called a snapshot, the terminology is misleading, as the snapshot will be writable, but the underlying device will remain unchanged.)
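The snapshot-origin target, which sends writes to the underlying device, but also sends the old data that the writes overwrote to the specified COW device.
The snapshot-merge target, which merges the changes recorded on a COW device back into the underlying device.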
Typically, you would make home a snapshot-origin target, then create some snapshot targets on top of it. This is what LVM does. However, a simpler method would be to simply create a snapshot target directly, which is what I'll show below.
Regardless of the method you choose, you must not write to the underlying device (/dev/sda2), or the snapshots will see a corrupted view of the filesystem. So, as a precaution, you should mark the underlying block device as read-only:
# blockdev --setro /dev/sda2
This won't affect device-mapper devices backed by it, so if you've already re-mounted /home on /dev/mapper/home, it should not have a noticeable effect.
Next, you will need to prepare the COW device, which will store changes since the snapshot was made. This has to be a block device, but can be backed by a sparse file. If you want to use a sparse file of e.g. 32GB:
# dd if=/dev/zero bs=1M count=0 seek=32768 of=/home_cow
# losetup --find --show /home_cow
/dev/loop0
Obviously, the sparse file shouldn't be on the filesystem you're snapshotting :)
Now you can reload the device's table and turn it into a snapshot device:
# dmsetup suspend home && \
dmsetup reload home --table \
"0 $(blockdev --getsz /dev/sda2) snapshot /dev/sda2 /dev/loop0 PO 8" && \
dmsetup resume home
If that succeeds, new writes to /home should now be recorded in the /home_cow file, instead of being written to /dev/sda2. Make sure to monitor the size of the COW file, as well as the free space on the filesystem it's on, to avoid running out of COW space.
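One way to monitor COW usage is to parse dmsetup status, which for a snapshot target reports allocated/total sectors followed by the metadata sectors. The helper name cow_usage and the sample numbers below are illustrative assumptions, a sketch rather than a finished monitoring tool.

```shell
#!/bin/sh
# cow_usage: reads one `dmsetup status NAME` line for a snapshot target on
# stdin and prints how full the COW device is.
cow_usage() {
    sed 's/^[^ ]*: //' | awk '
        $3 != "snapshot"         { print "not a snapshot target"; exit 1 }
        $4 !~ /^[0-9]+\/[0-9]+$/ { print "snapshot state: " $4; exit 1 }
        {
            split($4, s, "/")
            printf "%d%% of COW space used\n", s[1] * 100 / s[2]
        }'
}

# Illustrative status line; on a real system use:
#   dmsetup status home | cow_usage
echo '0 3864024960 snapshot 281688/2097152 1104' | cow_usage
```

A non-numeric fourth field (such as Invalid or Overflow) means the snapshot has already run out of space or been invalidated, so the helper reports it instead of a percentage.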
Once you no longer need the snapshot, you can merge it (to permanently commit the changes in the COW file to the underlying device), or discard it.
To merge it:
replace the table with a snapshot-merge target instead of a snapshot target:
# dmsetup suspend home && \
dmsetup reload home --table \
"0 $(blockdev --getsz /dev/sda2) snapshot-merge /dev/sda2 /dev/loop0 P 8" && \
dmsetup resume home
Next, monitor the status of the merge until all non-metadata blocks are merged:
# watch dmsetup status home
...
0 3864024960 snapshot-merge 281688/2097152 1104
Note the 3 numbers at the end (X/Y Z). The merge is complete when X = Z.
Next, replace the table with a linear target again:
# dmsetup suspend home && \
dmsetup reload home --table \
"0 $(blockdev --getsz /dev/sda2) linear /dev/sda2 0" && \
dmsetup resume home
Now you can dismantle the loop device:
# losetup -d /dev/loop0
Finally, you can delete the COW file.
# rm /home_cow
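The "merge is complete when X = Z" check from the monitoring step can be automated instead of watched by eye. The sketch below assumes the status format shown above; merge_done and wait_for_merge are hypothetical helper names, and the 5 second polling interval is arbitrary.

```shell
#!/bin/sh
# merge_done: reads `dmsetup status NAME` on stdin.
# Exit status: 0 = merge finished (X == Z), 1 = still merging,
#              2 = the line is not a snapshot-merge status.
merge_done() {
    sed 's/^[^ ]*: //' | awk '
        BEGIN { rc = 2 }
        NR == 1 && $3 == "snapshot-merge" {
            split($4, s, "/")
            rc = (s[1] + 0 == $5 + 0) ? 0 : 1
        }
        END { exit rc }'
}

# wait_for_merge NAME: polls until the merge finishes; needs root.
wait_for_merge() {
    while :; do
        rc=0
        dmsetup status "$1" | merge_done || rc=$?
        case $rc in
            0) return 0 ;;
            1) sleep 5 ;;
            *) echo "unexpected status for $1" >&2; return 1 ;;
        esac
    done
}

# Sample line from the text above.
if echo '0 3864024960 snapshot-merge 281688/2097152 1104' | merge_done; then
    echo "merge complete"
else
    echo "still merging"
fi
```

Only after wait_for_merge home returns successfully is it safe to reload the linear table and detach the loop device.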
To discard the snapshot, unmount /home, follow steps 3-5 above (restore the linear table, detach the loop device, delete the COW file), and remount /home. Although Device Mapper will allow you to do this without unmounting /home, it doesn't make sense (since the running programs' state in memory won't correspond to the filesystem state any more), and it will likely corrupt your filesystem.

I'm not sure if this will do the trick for you, but you can remount a file system as read-only. mount -o remount,ro /lvm (or something similar) will do the trick. After you are done with your snapshot, you can remount read-write using mount -o remount,rw /lvm.

FS corruption is "highly unlikely", as long as you never work in any kind of professional environment. Otherwise you'll meet reality, and you might try blaming "bit rot" or "hardware" or whatever, but it all comes down to having been irresponsible. Freeze/thaw (as mentioned a few times, and only if called properly) is sufficient outside of database environments. For databases, you still won't have a transaction-complete backup, and if you think a backup that rolls back some transaction is fine when restored, see the opening sentence.
Depending on the activity, you might have just added another 5-10 minutes of downtime if you ever need that backup.
Most of us can easily afford that, but it cannot be general advice.
Be honest about downsides, guys.

Related

mount a sd-card image - change files on a partition and write back

I want to mount an IMG file (which has more than one partition on it), change some files on one (ext4) partition and write the result back to this image.
One way would be to write the image to an SD card, change the files there and make an image again. But I don't have an SD card writer(!) and I think this way is a bit complex anyway. I tried it once on a different computer; it works, but it's very complex and time-consuming. Trying with a "loopback device" didn't work for me.
Can someone tell me how to do this on Ubuntu (for example with a loopback device)?
You have to create a loopback device with:
losetup -P /dev/loop0 file
then it will show all partitions on that file in the form of:
/dev/loop0
/dev/loop0p1
/dev/loop0p2
Here is the relevant entry from man losetup:
-P, --partscan
Force the kernel to scan the partition table on a newly created loop device.
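Putting it together, a sketch of the full edit cycle could look like the following. The function names, the image file name and the mount point are assumptions for illustration. The functions that call losetup and mount need root and are only defined here, not run.

```shell
#!/bin/sh
# part_dev LOOPDEV N: name of partition N on a partition-scanned loop device,
# e.g. /dev/loop0 and 2 give /dev/loop0p2.
part_dev() {
    echo "${1}p${2}"
}

# mount_image_partition IMAGE N MOUNTPOINT  (needs root)
# Attaches the image, mounts partition N, and prints the loop device used.
mount_image_partition() (
    image=$1
    part=$2
    mnt=$3
    loopdev=$(losetup --find --show --partscan "$image") || exit 1
    if ! mount "$(part_dev "$loopdev" "$part")" "$mnt"; then
        losetup -d "$loopdev"   # do not leave the loop device attached
        exit 1
    fi
    echo "$loopdev"
)

# unmount_image_partition LOOPDEV MOUNTPOINT  (needs root)
# Unmounting flushes the changes into the image file; then detach the device.
unmount_image_partition() {
    umount "$2" && losetup -d "$1"
}

# Hypothetical session, as root:
#   dev=$(mount_image_partition sdcard.img 2 /mnt/img)
#   ... edit files under /mnt/img ...
#   unmount_image_partition "$dev" /mnt/img
part_dev /dev/loop0 2
```

Because the loop device writes straight through to the image file, no separate "write back" step is needed after unmounting.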

Deployment over GPRS to embedded devices

I've got quite a head scratcher here. We have multiple Raspberry Pis in the field, hundreds of kilometers apart. We need to be able to upgrade them remotely and safe(ish)ly, as local access can cost up to a few hundred euros.
The raspis run Raspbian; / is on an SD card mounted read-only to prevent corruption when power is cut (usually once a day). The SD cards are cloned from the same base image, but contain manually installed packages and modified files that might differ between devices. The raspis all have a USB flash drive as a more corruption-resistant RW drive and a script to format it on boot in case the drive is corrupted. They call home via a GPRS connection with varying reliability.
The requirements for the system are as follows:
Easy versioning of config files, scripts and binaries, at least /etc, /root and home, preferably with Git
Efficient up-/downgrade from any version to any other over GPRS -> transfer file deltas only
Possibility to automatically roll back recently applied patch, if connection is no longer working
Root file system cannot be in RW mode while downloading changes, the changes need to be stored locally before applying to /
The simple approach might be keeping a complete copy of the file system in a remote git repository, generating a diff file between commits, uploading the patch to the field and applying it. However, at the moment the files on different raspis are not identical. This means, at least when installing the system, the files would have to be synchronized through something similar to rsync -a.
The procedure should be along the lines of "save diff between / and ssh folder to a file on the USB stick, mount / RW, apply diff from file, mount / RO". Rsync does the diff-getting and applying simultaneously, so my first question becomes:
1 Does there exist something like rsync that can save the file deltas from local and remote and apply them later?
Also, I have never made a system like this and the draft is the "closest to legit I can come up with". There are a lot of moving parts here and I'm terrified that something I didn't think of beforehand will cause things to go horribly wrong. The rest of my questions are:
Am I way off base here and is there actually a smarter/safe(r) way to do this?
If not, what kind of best practices should I follow and what kind of things to be extremely careful with (to not brick the devices)?
How do I handle things like installing new programs? Bypass the package manager, install in /opt?
How to manage permissions/owners (root+1 user for application logic)? Just run everything as root and hope for the best?
Yes, this is a very broad question. This will not be a direct answer to your questions, but rather provide guidelines for your research.
One means to prevent file system corruption is to use an overlay file system (e.g., AUFS, UnionFS) where the root file system is mounted read-only and a tmpfs (RAM-based) or flash-based read-write file system is mounted "over" the read-only root. This requires your own init scripts, including use of the pivot_root command. Since nothing critical is mounted RW, the system robustly handles power outages. The gist is that before the pivot_root, the FS looks like
/ read-only root (typically flash)
/rw tmpfs overlay
/aufs AUFS union overlay of /rw over /
after the pivot_root
/ Union overlay (was /aufs)
/flash read only root (was /)
Updates to the /flash file system are done by remounting it read-write, doing the update, and remounting read-only. For example,
mount -oremount,rw <flash-device> /flash
cp -p new-some-script /flash/etc/some-script
mount -oremount,ro <flash-device> /flash
You may or may not immediately see the change reflected in /etc depending upon what is in the tmpfs overlay.
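Since power can be cut at any time, it helps to keep the read-write window as short as possible and to make sure the read-only remount happens even when the update command fails. A minimal sketch of such a wrapper follows. The names run and with_rw are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# with_rw MOUNTPOINT COMMAND [ARGS...]
# Remounts MOUNTPOINT read-write, runs COMMAND, then remounts it read-only
# again whether or not COMMAND succeeded. Returns the status of COMMAND.
with_rw() (
    mnt=$1
    shift
    run mount -o remount,rw "$mnt" || exit 1
    status=0
    "$@" || status=$?
    run mount -o remount,ro "$mnt" || exit 1
    exit "$status"
)

# Same update as in the example above, wrapped so that /flash never stays RW.
with_rw /flash run cp -p new-some-script /flash/etc/some-script
```

The remount back to read-only can itself fail if a process still has a file open for writing on /flash, so the wrapper reports that as an error rather than ignoring it.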
You may find yourself making heavy use of the chroot command especially if you decide to use a package manager. A quick sample
mount -t proc none /flash/proc
mount -t sysfs none /flash/sys
mount -o bind /dev /flash/dev
mount -o bind /dev/pts /flash/dev/pts
mount -o bind /rw /flash/rw
mount -oremount,rw <flash-device> /flash
chroot /flash
# do commands here to install packages, etc
exit # chroot environment
mount -oremount,ro <flash-device> /flash
Learn to use the patch command. There are also binary patch commands; see "How do I create binary patches?".
For super recovery when all goes wrong, you need hardware support with watchdog timers and the ability to do fail-safe boot from alternate (secondary) root file system.
Expect to spend significant amount of time and money if you want a bullet-proof product. There are no shortcuts.

Azure Linux remove and add another disk

I needed to increase the disk space on my Azure Linux VM, so we attached a new empty disk and followed the steps at http://azure.microsoft.com/en-in/documentation/articles/virtual-machines-linux-how-to-attach-disk . The only difference was that the newly added device ID was not found in /var/log/messages.
Now I need to add another disk. We attached it, but the problem is the first step, fdisk:
sudo fdisk /dev/sdc
I have no idea where the new disk is attached; I'm totally clueless. Also, what are the steps if I want to remove a disk altogether? I know umount will unmount a disk, but that doesn't necessarily detach the device from the instance. I want a total detachment.
Finally figured it out: the additional SCSI disks are named in order, /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde and so on. The MS tutorial talks about /dev/sdc because it is the third disk in the system: the first is your root volume, the second is your ephemeral temp storage, and this is the third. If /dev/sdc is not good enough for you and you want to remove it:
Remove entry from /etc/fstab file
umount /datadrive
You can now remove the attached disk from your Azure console.
Let's say that sdc is still there and you want to add another disk. Just attach it from the Azure console and follow the same steps as given in http://azure.microsoft.com/en-in/documentation/articles/virtual-machines-linux-how-to-attach-disk/#initializeinlinux . The only difference is that the new disk will be at /dev/sdd, so wherever the tutorial makes a partition at /dev/sdc1 you will use /dev/sdd1. That's pretty much it.
References
http://www.yolinux.com/TUTORIALS/LinuxTutorialAdditionalHardDrive.html
http://azure.microsoft.com/en-in/documentation/articles/virtual-machines-linux-how-to-attach-disk/#initializeinlinux
When adding more than one disk, it also becomes important to start using UUIDs in fstab. Search for uuid in this article for more.
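Device names such as /dev/sdd can change between boots once several disks are attached, which is why a UUID is the safer key in fstab. A small sketch of building such an entry follows. The helper name fstab_line, the example UUID, the ext4 type and the defaults,nofail options are assumptions, not values from the tutorial.

```shell
#!/bin/sh
# fstab_line UUID MOUNTPOINT FSTYPE
# Prints an /etc/fstab entry keyed by filesystem UUID instead of /dev/sdX.
# "nofail" is an assumed option choice so that boot continues if the disk
# is missing.
fstab_line() {
    printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# On a real system, list the disks and read the UUID of the new partition:
#   lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
#   uuid=$(sudo blkid -s UUID -o value /dev/sdd1)
# The UUID below is a made-up example value.
fstab_line 0a1b2c3d-1111-2222-3333-444455556666 /datadrive ext4
```

The printed line can be appended to /etc/fstab after a backup copy of the file has been made, and checked with sudo mount -a before rebooting.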

remount disk with running process

I have an embedded application that I am working on. To protect the data on this image, its partitions are mounted RO (this helps prevent flash errors when the power is lost unexpectedly, since I cannot guarantee clean shutdowns; you could pull the plug).
An application I am working on that needs to be protected resides on this RO partition; however, this program also needs to be able to change configuration files on the same RO file system. I have code that allows me to remount this partition RW as needed (e.g. for firmware updates), but this requires all the processes that are running from the read-only partition to be stopped (i.e. killall my_application). Hence it is not possible for my application to remount the partition it needs to modify without first killing itself (I am not sure which one is the chicken and which one is the egg, but you get the gist).
Is there a way to start my application in such a way that the entire binary is kept in RAM and there is no link back to the partition from which it was run that would make the unmount report the partition as busy?
Or alternatively is there a way to safely remount this RO partition without first killing the process running on it?
You can copy it to a tmpfs filesystem and execute it from there. A tmpfs filesystem stores all data in RAM, and sometimes in swap.
Passing the -o remount option to mount should also work.
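The tmpfs suggestion could be sketched roughly as follows. The helper name stage_in_ram and the paths in the comments are hypothetical. The tmpfs must not be mounted with noexec, or the copied binary cannot be executed from it.

```shell
#!/bin/sh
# stage_in_ram BINARY RAMDIR
# Copies BINARY into RAMDIR (assumed to be a tmpfs mount), makes the copy
# executable, and prints the path of the copy.
stage_in_ram() {
    cp -p "$1" "$2/" &&
        chmod +x "$2/$(basename "$1")" &&
        echo "$2/$(basename "$1")"
}

# Hypothetical boot-time usage, as root:
#   mkdir -p /run/myapp
#   mount -t tmpfs -o size=16m,mode=0755 tmpfs /run/myapp
#   exec "$(stage_in_ram /opt/app/my_application /run/myapp)"
```

A process started this way is backed by the tmpfs copy rather than by the file on the read-only partition, although any other files it opens there, including shared libraries, still keep that partition in use.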

how to edit initramfs to add a new partition after boot in CentOs

I want to add a new ext3 partition by editing existing scripts or adding new scripts in the initramfs in the boot folder of an installed CentOS. Then, by copying the new initramfs image to another installed CentOS and simply rebooting, the new partition (and its file system) should appear in that CentOS.
My problem is that I don't know which script in the initramfs I should change, or which new shell script I should write there. With which command, and how? For example, should I use the fdisk command? I tried, but without success.
Any help will be appreciated.
I'm gonna go on a guess here.
If I understand what you want to do correctly, what you want to do is to make another ext3 partition visible in CentOS.
If so, you want to make sure that the partition exists. Maybe you have a disk called /dev/sda; it might have 2 partitions:
sda1 ext3 mounted at /boot
and sda2 ext3 mounted at /
To view this use sudo blkid, fdisk -l or similar. These partitions are mounted at boot from the list found in /etc/fstab.
Say you still have space left on the disk. Use fdisk/gparted to create a new ext3 partition sda3. Add a line for that partition in /etc/fstab.
Now it should be accessible after a reboot or after sudo mount -a.
I might have completely misunderstood your question.
Edit #1
I think I understand what you want to do now, and you probably want to edit the /etc/fstab within the initramfs. I have never tried doing this and I don't know if it would work, but it should. In any event, unless you really need to mount the partitions in initramfs, use the above to mount it in fstab.
Extracting the contents of an initramfs and repacking an edited one is rather complex, but here are some links explaining how to extract and repack. They are for Gentoo, but there should be no difference:
http://forums.gentoo.org/viewtopic-t-429263-highlight-initramfs.html
http://forums.gentoo.org/viewtopic-t-383198-highlight-cpio.html
http://forums.gentoo.org/viewtopic-t-388539-highlight-.html
If you read those threads you should be able to use a script or C program to extract the contents of the initramfs, change the contents of /etc/fstab within the initramfs, and pack it back together. If you need to do this for different computers, they will need the same contents in the initramfs, or you will need to do this for each computer setup.

Resources