I've got quite a head-scratcher here. We have multiple Raspberry Pis in the field, hundreds of kilometers apart. We need to be able to upgrade them remotely in a reasonably safe way, as local access can cost up to a few hundred euros.
The Raspis run Raspbian; / is on an SD card mounted read-only to prevent corruption when power is cut (usually once a day). The SD cards are cloned from the same base image, but contain manually installed packages and modified files that might differ between devices. The Raspis all have a USB flash drive as a more corruption-resistant RW drive and a script to format it on boot in case the drive is corrupted. They call home via a GPRS connection of varying reliability.
The requirements for the system are as follows:
Easy versioning of config files, scripts and binaries, at least /etc, /root and home, preferably with Git
Efficient up-/downgrade from any version to any other over GPRS -> transfer file deltas only
Possibility to automatically roll back a recently applied patch if the connection is no longer working
The root file system cannot be in RW mode while downloading changes; the changes need to be stored locally before applying them to /
The simple approach might be keeping a complete copy of the file system in a remote git repository, generating a diff file between commits, uploading the patch to the field and applying it. However, at the moment the files on different Raspis are not identical. This means that, at least when installing the system, the files would have to be synchronized through something similar to rsync -a.
The procedure should be along the lines of "save diff between / and ssh folder to a file on the USB stick, mount / RW, apply diff from file, mount / RO". Rsync does the diff-getting and applying simultaneously, so my first question becomes:
1 Does there exist something like rsync that can save the file deltas from local and remote and apply them later?
Also, I have never made a system like this and the draft is the "closest to legit I can come up with". There are a lot of moving parts here and I'm terrified that something I didn't think of beforehand will cause things to go horribly wrong. The rest of my questions are:
Am I way off base here and is there actually a smarter/safe(r) way to do this?
If not, what kind of best practices should I follow and what kind of things to be extremely careful with (to not brick the devices)?
How do I handle things like installing new programs? Bypass the package manager and install in /opt?
How to manage permissions/owners (root+1 user for application logic)? Just run everything as root and hope for the best?
Yes, this is a very broad question. This will not be a direct answer to your questions, but rather provide guidelines for your research.
One way to prevent file system corruption is to use an overlay file system (e.g., AUFS, UnionFS) where the root file system is mounted read-only and a tmpfs (RAM-based) or flash-based read-write file system is mounted "over" the read-only root. This requires your own init scripts, including use of the pivot_root command. Since nothing critical is mounted RW, the system handles power outages robustly. The gist is that before the pivot_root, the FS looks like
/ read-only root (typically flash)
/rw tmpfs overlay
/aufs AUFS union overlay of /rw over /
after the pivot_root
/ Union overlay (was /aufs)
/flash read only root (was /)
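To make the layout above more concrete, here is a rough sketch of what the custom init might do. It assumes an AUFS-enabled kernel, that the mount points /rw, /aufs and /flash already exist on the read-only root, and that the script is started via init= on the kernel command line; the `run` helper and the `DRY_RUN` variable are my own additions so the sequence can be printed without touching a real system.

```shell
#!/bin/sh
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

setup_union_root() {
    run mount -t tmpfs tmpfs /rw                       # RAM-backed writable layer
    run mount -t aufs -o dirs=/rw=rw:/=ro none /aufs   # /rw stacked over read-only /
    run cd /aufs
    run pivot_root . flash                             # old root reappears as /flash
    run mount --move /flash/rw /rw                     # keep the tmpfs reachable at /rw
}

# On a real device the script would end with something like:
#   setup_union_root && exec chroot . /sbin/init
```

Treat the exact mount options as a starting point to check against your kernel's AUFS documentation, not as a tested recipe.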
Updates to the /flash file system are done by remounting it read-write, doing the update, and remounting read-only. For example,
mount -oremount,rw <flash-device> /flash
cp -p new-some-script /flash/etc/some-script
mount -oremount,ro <flash-device> /flash
You may or may not immediately see the change reflected in /etc depending upon what is in the tmpfs overlay.
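A small wrapper can make sure the flash goes back to read-only even when the copy fails. This is only a sketch: the function name, the `.new` suffix and the `DRY_RUN` switch are assumptions of mine, and the device name in the usage line is a placeholder.

```shell
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# update_flash_file DEVICE SRC DEST
# Copies under a temporary name and renames it, so DEST is never half-written.
update_flash_file() (
    dev=$1 src=$2 dest=$3
    run mount -o remount,rw "$dev" /flash || exit 1
    status=0
    run cp -p "$src" "$dest.new" && run mv "$dest.new" "$dest" || status=1
    run sync
    # Always go back to read-only, whatever happened above.
    run mount -o remount,ro "$dev" /flash || status=1
    exit "$status"
)
```

Usage would look like `update_flash_file /dev/mmcblk0p2 new-some-script /flash/etc/some-script`, run as root.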
You may find yourself making heavy use of the chroot command especially if you decide to use a package manager. A quick sample
mount -t proc none /flash/proc
mount -t sysfs none /flash/sys
mount -o bind /dev /flash/dev
mount -o bind /dev/pts /flash/dev/pts
mount -o bind /rw /flash/rw
mount -oremount,rw <flash-device> /flash
chroot /flash
# do commands here to install packages, etc
exit # chroot environment
mount -oremount,ro <flash-device> /flash
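The sample above never unmounts the bind mounts. A sketch that runs one command inside the chroot and then undoes everything in reverse order could look like this; the function name and the `DRY_RUN` switch are hypothetical, and the device name is whatever your flash root lives on.

```shell
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# with_flash_chroot DEVICE COMMAND [ARGS...]
with_flash_chroot() (
    dev=$1; shift
    run mount -t proc none /flash/proc
    run mount -t sysfs none /flash/sys
    run mount -o bind /dev /flash/dev
    run mount -o bind /dev/pts /flash/dev/pts
    run mount -o bind /rw /flash/rw
    run mount -o remount,rw "$dev" /flash
    run chroot /flash "$@"
    status=$?
    # Tear down in reverse order, whether or not the command succeeded.
    run mount -o remount,ro "$dev" /flash
    for m in rw dev/pts dev sys proc; do
        run umount "/flash/$m"
    done
    exit "$status"
)
```

The exit status of the command inside the chroot is passed through, so a failed package install can be detected by the caller.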
Learn to use the patch command. There are binary patch commands as well; see "How do I create binary patches?".
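As a minimal illustration of the diff/patch workflow for text files, assuming the old and new trees sit side by side under one parent directory (the function names are mine, and ownership, permissions and binaries are not handled):

```shell
# make_delta PARENT_DIR OLD NEW > delta.patch
# OLD and NEW are directory names directly below PARENT_DIR.
make_delta() (
    cd "$1" || exit 2
    # -r recurse, -u unified format, -N treat missing files as empty
    diff -ruN "$2" "$3"
    # diff exits 1 when it found differences; only >1 is a real error
    [ $? -le 1 ]
)

# apply_delta TARGET_DIR PATCH_FILE
apply_delta() {
    # The first pass changes nothing; it only checks that every hunk applies.
    patch -d "$1" -p1 --dry-run < "$2" >/dev/null || return 1
    patch -d "$1" -p1 < "$2"
}
```

The delta can be generated at home, shipped to the device's RW drive, and applied once / is remounted read-write. Checking with --dry-run first avoids leaving the tree half-patched when a hunk does not fit.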
For super recovery when all goes wrong, you need hardware support with watchdog timers and the ability to do fail-safe boot from alternate (secondary) root file system.
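On the software side, feeding a hardware watchdog can be as simple as writing to its device node for as long as a health check passes. A sketch, assuming the standard Linux /dev/watchdog interface; the function name and the health-check script in the usage comment are hypothetical:

```shell
# pet_watchdog DEVICE CHECK_COMMAND [ARGS...]
# Writes to the watchdog only if the health check succeeds.
pet_watchdog() {
    dev=$1; shift
    if "$@" >/dev/null 2>&1; then
        printf '.' > "$dev"    # any write resets the countdown
    else
        return 1               # stay silent and let the hardware reboot the board
    fi
}

# Example loop, run as root on the device:
#   while pet_watchdog /dev/watchdog /usr/local/bin/app-healthy; do sleep 10; done
```

A long-running daemon would normally keep the device open instead of reopening it on every ping; this version just keeps the idea visible.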
Expect to spend a significant amount of time and money if you want a bullet-proof product. There are no shortcuts.
Related
I am working on a linux computer which is locked down and used in kiosk mode to run only one application. This computer cannot be updated or modified by the user. When the computer crashes or freezes the OS rebuilds or modifies the ld-2.5.so file. This file needs to be locked down without allowing even the slightest change to it (there is an application resident which requires ld-2.5.so to remain unchanged and that is out of my control). Below are the methods I can think of to protect ld-2.5.so but wanted to run it by the experts to see if I am missing anything.
I modified the fstab to mount the EXT3 filesystem as EXT2 to disable journaling. Also set the DUMP and FSCK values to "0" to disable those processes.
Performed a "chattr +i ld-2.5.so" on the file but there are still system processes that can overwrite this protection.
I could attempt to trap the name of the processes which are hitting ld-2.5.so and prevent this.
Any ideas or hints would be greatly appreciated.
-Matt (CentOS 5.0.6)
chattr +i should be fine in most circumstances.
The ld-*.so files are under /usr/lib/ and /usr/lib64/. If /usr/ is a separate partition, you also might want to mount that partition read only on a kiosk system.
Do you have, by any chance, some automated updating/patching of said PC configured? ld-*.so is part of glibc and basically should only change if the glibc package is updated.
Does Linux need a writeable file system to function correctly? I'm just running a very simple init programme. Presently I'm not mounting any partitions. The kernel has mounted the root partition as read-only. Is Linux designed to be able to run with just a read-only file system as long as I stick to mallocs, readlines and text to standard out (puts), or does Linux require a writeable file system even to perform standard text input and output?
I ask because I seem to be getting kernel panics and complaints about the stack. I'm not trying to run a useful system at the moment. I already have a useful system on another partition. I'm trying to keep it as simple as possible so as I can fully understand things before adding in an extra layer of complexity.
I'm running a fairly standard x86-64 desktop.
No, a writable file system is not required. It is theoretically possible to run GNU/Linux with only a read-only file system.
In practice you probably want to mount /proc, /sys, /dev and possibly /dev/pts for everything to work properly. Note that even some bash commands require a writable /tmp, and some other programs need a writable /var.
You can always mount /tmp and /var as a ramdisk.
Yes and no. No, it doesn't need to be writeable if the system does almost nothing useful.
Yes, since you're running a desktop, it needs to be writeable.
Many processes actually need a writeable filesystem as many system calls can create files. e.g. Unix Domain Sockets can create files.
Also, many applications write into /var and /tmp.
The way to get around this is to mount the filesystem read-only and use a filesystem overlay to put an in-memory filesystem on top of it. That way the paths are writable, but the writes go to RAM and any changes are thrown away on reboot.
See: overlayroot
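If you want to see what such an overlay boils down to without the overlayroot package, here is a hand-rolled sketch using the kernel's overlay filesystem. The directory names are placeholders and `DRY_RUN` is my own switch for printing the commands:

```shell
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# mount_overlay LOWER_DIR RW_DIR MERGED_DIR
# LOWER_DIR is the read-only tree; RW_DIR receives a fresh tmpfs.
mount_overlay() {
    lower=$1 rw=$2 merged=$3
    run mount -t tmpfs tmpfs "$rw" &&
    # upperdir and workdir must live on the same filesystem
    run mkdir -p "$rw/upper" "$rw/work" &&
    run mount -t overlay overlay \
        -o "lowerdir=$lower,upperdir=$rw/upper,workdir=$rw/work" "$merged"
}
```

Everything written below the merged directory ends up in the tmpfs and disappears on reboot, while the lower tree is never modified.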
No, it's not required. For example, most distributions have a live version of Linux that boots from a CD or USB disk without actually using any back-end HDD.
Also, on normal installations the root partition is remounted read-only when corruption is detected on the disk. This way the system still comes up, with a read-only partition.
You need to capture the vmcore and the stack trace of the panic from the dmesg output to analyse further.
I'm developing a linux-based appliance using an alix 2d13.
I've developed a script that takes care of creating an image file, creating the partitions, installing the boot loader (syslinux), the kernel and the initrd, and putting the root filesystem files into the right partition.
Configuration files are on a tmpfs filesystem and get created on system startup by software that reads an XML file residing on its own partition.
I'm looking for a way to update the filesystem and I've considered two solutions:
the firmware update is a compressed file that could contain the kernel, initrd and/or the rootfs partition; on reboot, the initrd takes care of dd-ing the rootfs image to the right partition;
the firmware update is a compressed file that could contain two tar archives, one for the boot and one for the root filesystem.
Each solution has its own trade-offs:
- a filesystem image lets me get rid of any unused files, but it needs a lot of time and will wear out the CompactFlash memory quickly;
- an archive is smaller and needs less time to update, but the root filesystem will become chaotic in a short time.
An alternative solution could be to put a file list and a pre/post update script into the tar archive, so that any file not on the file list is deleted.
What do you think?
I used the following approach. It was somewhat based on the paper "Building Murphy-compatible embedded Linux systems," available here. I used the versions.conf stuff described in that paper, not the cfgsh stuff.
Use a boot kernel whose job is to loop-back mount the "main" root file system. If you need a newer kernel, then kexec into that newer kernel right after you loop-back mount it. I chose to put the boot kernel's complete init in initramfs, along with busybox and kexec (both statically linked), and my init was a simple shell script that I wrote.
One or more "main OS" root file systems exist on an "OS image" file system as disk image files. The boot kernel chooses one of these based on a versions.conf file. I only maintain two main OS image files, the current and fall-back file. If the current one fails (more on failure detection later), then the boot kernel boots the fall-back. If both fail or there is no fall-back, the boot kernel provides a shell.
System config is on a separate partition. This normally isn't upgraded, but there's no reason it couldn't be.
There are four total partitions: boot, OS image, config, and data. The data partition is for user application stuff that is intended for frequent writing. boot is never mounted read/write. OS image is only (re-)mounted read/write during upgrades. config is only mounted read/write when config stuff needs to change (hopefully never). data is always mounted read/write.
The disk image files each contain a full Linux system, including a kernel, init scripts, user programs (e.g. busybox, product applications), and a default config that is copied to the config partition on the first boot. The files are whatever size is necessary to fit everything in them. As long as I allow enough room for growth so that the OS image partition is always big enough to fit three main OS image files (during an upgrade, I don't delete the old fall-back until the new one is extracted), the main OS image can grow as needed. These image files are always (loop-back) mounted read-only. Using these files also takes out the pain of dealing with failures of upgrading individual files within a rootfs.
Upgrades are done by transferring a self-extracting tarball to a tmpfs. The beginning of this script remounts the OS image read/write, then extracts the new main OS image to the OS image file system, and then updates the versions.conf file (using the rename method described in the "murphy" paper). After this is done, I touch a stamp file indicating an upgrade has happened, then reboot.
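The rename method is roughly this: write the new file under a temporary name on the same file system, flush it, then rename it over the old one, so a reader sees either the old or the new content and never a partial file. A sketch (the function name is mine, and the versions.conf layout in the usage line is invented for illustration):

```shell
# atomic_replace FILE   (new content is read from stdin)
atomic_replace() {
    tmp="$1.tmp.$$"
    cat > "$tmp" || return 1
    sync                        # flush the new file before it becomes visible
    mv -f "$tmp" "$1" || return 1
    sync                        # flush the rename as well
}

# Example:
#   printf 'current=os-2.img\nfallback=os-1.img\n' | atomic_replace /os/versions.conf
```

The temporary file has to be in the same directory as the target; a rename across file systems degrades into copy-and-delete and loses the atomicity.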
The boot kernel looks for this stamp file. If it finds it, it moves it to another stamp file, then boots the new main OS image file. The main OS image file is expected to remove the stamp file when it starts successfully. If it doesn't, the watchdog will trigger a reboot, and then the boot kernel will see this and detect a failure.
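In shell terms, the decision the boot kernel's init makes could be sketched like this. The stamp file names `upgraded` and `trying` are placeholders; my real names and the versions.conf handling are left out.

```shell
# select_image STATE_DIR  -> prints "current" or "fallback"
select_image() {
    dir=$1
    if [ -e "$dir/upgraded" ]; then
        # Fresh upgrade: mark it as on trial and give the new image one chance.
        mv "$dir/upgraded" "$dir/trying" && echo current
    elif [ -e "$dir/trying" ]; then
        # The trial image never removed its stamp, so it did not come up.
        # The stamp stays in place so later boots keep choosing the fall-back.
        echo fallback
    else
        echo current
    fi
}
```

The main OS removes the `trying` stamp once it is up, which is what turns the next boot back into a normal one.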
You will note there are a few possible points of failure during an upgrade: syncing the versions.conf during the upgrade, and touching/removing the stamp files (three instances). I couldn't find a way to reduce these further and achieve everything I wanted. If anyone has a better suggestion, I'd love to hear it. File system errors or power failures while writing the OS image could also occur, but I'm hoping the ext3 file system will provide some chance of surviving in that case.
You can have a separate partition for updates (say Side1/Side2).
The existing kernel and rootfs are in Side1; put the update in Side2 and switch.
This way you reduce wear and increase the lifetime, but the device gets costlier.
You can quick-format the partitions before extracting the tar files. Or go with the image solution but use the smallest possible image and resize the filesystem after the dd (although that is not necessary for read-only storage).
I can find the inode of a device/socket with stat, so it seems like I could somehow "copy" this file for backup. Of course the obvious solution is "dd", but I have no idea what to do if the device is infinite (like the random one). Can I just copy the inode somehow?
These are referred to as "special files" or "special nodes". Copying their contents doesn't make sense, as the contents are generated in one way or another programmatically by the kernel as needed.
Programs like "tar" know how to copy the contents of the inode, which refers to the portion of the kernel that supports each of these different nodes. See the documentation of the "mknod" command for some more details.
And if you need one-liner to copy device nodes with tar, here it is:
cd /dev && tar -cpf- sda* | tar -xf- -C /some/destination/path/
Find out the major and minor numbers of the device file you need to copy, then use mknod to create a device file with the same major and minor numbers. The major number is used to look up the kernel's device switch table and call the proper kernel function (usually a device driver). The minor number is passed as a parameter to those functions (to select a different density, disk, etc.).
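A small helper can read the numbers with stat and print the matching mknod command. This sketch assumes a stat that understands -c with %F, %t and %T (GNU coreutils does); the function name is made up, and it only prints the command, because actually creating a device node needs root.

```shell
# mknod_command SRC_NODE DEST_PATH
# Prints the mknod command that would recreate SRC_NODE at DEST_PATH.
mknod_command() {
    case $(stat -c '%F' "$1") in
        'character special file') type=c ;;
        'block special file')     type=b ;;
        *) echo "not a device node: $1" >&2; return 1 ;;
    esac
    # stat prints major/minor in hexadecimal; convert them to decimal.
    major=$((0x$(stat -c '%t' "$1")))
    minor=$((0x$(stat -c '%T' "$1")))
    echo "mknod $2 $type $major $minor"
}
```

Running `mknod_command /dev/null /tmp/null-copy` on a Linux box prints `mknod /tmp/null-copy c 1 3`, which root can then execute.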
24 July 2022
There is one legitimate use case for copying (archiving) a socket.
I have a program that gathers and summarizes attribute data in a file system tree. In order to regression test, I created a directory that contains one example of every type of file the program might encounter. I run my program on this directory to test it whenever I alter the code.
It is necessary to backup this directory along with other more valuable data, and it is necessary to restore it, should the storage device fail.
tar is the program of choice, and of course tar can not archive a socket. Doing so in most situations is senseless - any program that uses the socket will have to delete it and recreate it before use.
In the case of the test directory, there is one named socket, for it is possible that my program will encounter such things and it needs to correctly gather attributes for a complete summary.
As noted by others, that socket is not useful for anything directly. It does, however, occupy a little storage space, much as an empty file occupies storage space. That is why you can see it in the directory listing.
You can copy it successfully with the command:
cp -ar --parents <path> <backup_device_directory>
and restore it with:
cp -ar --parents <backup_device_directory>/<path> <directory>
The socket is not useful for anything except probing its attributes with a program during a regression test.
Archiving it saves the trouble of having to remember to recreate it after a restoration. The extra nuisance of archiving the sockets is easily codified in a script and forgotten. That is what we all want - easy to use solutions whose implementation you can ignore after you have solved the problem.
You can copy from a working system as below to some shared location between the machines and copy from the shared location to the other system.
Machine A
cp -rf /dev/SRC shared_directory
Machine B
cp -rf shared_directory /dev/
I'd like to programmatically make a snapshot of a live filesystem in Linux, preferably using LVM. I'd like not to unmount it because I've got lots of files opened (my most common scenario is that I've got a busy desktop with lots of programs).
I understand that because of kernel buffers and general filesystem activity, data on disk might be in some more-or-less undefined state.
Is there any way to "atomically" unmount an FS, make an LVM snapshot and mount it back? It will be ok if the OS will block all activity for few seconds to do this task. Or maybe some kind of atomic "sync+snapshot"? Kernel call?
I don't know if it is even possible...
You shouldn't have to do anything for most Linux filesystems. It should just work without any effort at all on your part. The snapshot command itself hunts down mounted filesystems using the volume being snapshotted and calls a special hook that checkpoints them in a consistent, mountable state and does the snapshot atomically.
Older versions of LVM came with a set of VFS lock patches that would patch various filesystems so that they could be checkpointed for a snapshot. But with new kernels that should already be built into most Linux filesystems.
This intro on snapshots claims as much.
And a little more research reveals that for kernels in the 2.6 series the ext series of filesystems should all support this. ReiserFS probably also. And if I know the btrfs people, that one probably does as well.
I know that ext3 and ext4 in RedHat Enterprise, Fedora and CentOS automatically checkpoint when a LVM snapshot is created. That means there is never any problem mounting the snapshot because it is always clean.
I believe XFS has the same support. I am not sure about other filesystems.
It depends on the filesystem you are using. With XFS you can use xfs_freeze -f to sync and freeze the FS, and xfs_freeze -u to activate it again, so you can create your snapshot from the frozen volume, which should be in a safe state.
Is there any way to "atomically" unmount an FS, make an LVM snapshot and mount it back?
It is possible to snapshot a mounted filesystem, even when the filesystem is not on an LVM volume. If the filesystem is on LVM, or it has built-in snapshot facilities (e.g. btrfs or ZFS), then use those instead.
The below instructions are fairly low-level, but they can be useful if you want the ability to snapshot a filesystem that is not on an LVM volume, and can't move it to a new LVM volume. Still, they're not for the faint-hearted: if you make a mistake, you may corrupt your filesystem. Make sure to consult the official documentation and dmsetup man page, triple-check the commands you're running, and have backups!
The Linux kernel has an awesome facility called the Device Mapper, which can do nice things such as create block devices that are "views" of other block devices, and of course snapshots. It is also what LVM uses under the hood to do the heavy lifting.
In the below examples I'll assume you want to snapshot /home, which is an ext4 filesystem located on /dev/sda2.
First, find the name of the device mapper device that the partition is mounted on:
# mount | grep home
/dev/mapper/home on /home type ext4 (rw,relatime,data=ordered)
Here, the device mapper device name is home. If the path to the block device does not start with /dev/mapper/, then you will need to create a device mapper device, and remount the filesystem to use that device instead of the HDD partition. You'll only need to do this once.
# dmsetup create home --table "0 $(blockdev --getsz /dev/sda2) linear /dev/sda2 0"
# umount /home
# mount -t ext4 /dev/mapper/home /home
Next, get the block device's device mapper table:
# dmsetup table home
home: 0 3864024960 linear 9:2 0
Your numbers will probably be different. The device target should be linear; if yours isn't, you may need to take special considerations. If the last number (start offset) is not 0, you will need to create an intermediate block device (with the same table as the current one) and use that as the base instead of /dev/sda2.
In the above example, home is using a single-entry table with the linear target. You will need to replace this table with a new one, which uses the snapshot target.
Device mapper provides three targets for snapshotting:
The snapshot target, which saves writes to the specified COW device. (Note that even though it's called a snapshot, the terminology is misleading, as the snapshot will be writable, but the underlying device will remain unchanged.)
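The snapshot-origin target, which sends writes to the underlying device, but also sends the old data that the writes overwrote to the specified COW device.
The snapshot-merge target, which merges the changes recorded on the COW device back into the underlying device. It is used further down to commit a snapshot.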
Typically, you would make home a snapshot-origin target, then create some snapshot targets on top of it. This is what LVM does. However, a simpler method would be to simply create a snapshot target directly, which is what I'll show below.
Regardless of the method you choose, you must not write to the underlying device (/dev/sda2), or the snapshots will see a corrupted view of the filesystem. So, as a precaution, you should mark the underlying block device as read-only:
# blockdev --setro /dev/sda2
This won't affect device-mapper devices backed by it, so if you've already re-mounted /home on /dev/mapper/home, it should not have a noticeable effect.
Next, you will need to prepare the COW device, which will store changes since the snapshot was made. This has to be a block device, but can be backed by a sparse file. If you want to use a sparse file of e.g. 32GB:
# dd if=/dev/zero bs=1M count=0 seek=32768 of=/home_cow
# losetup --find --show /home_cow
/dev/loop0
Obviously, the sparse file shouldn't be on the filesystem you're snapshotting :)
Now you can reload the device's table and turn it into a snapshot device:
# dmsetup suspend home && \
dmsetup reload home --table \
"0 $(blockdev --getsz /dev/sda2) snapshot /dev/sda2 /dev/loop0 PO 8" && \
dmsetup resume home
If that succeeds, new writes to /home should now be recorded in the /home_cow file, instead of being written to /dev/sda2. Make sure to monitor the size of the COW file, as well as the free space on the filesystem it's on, to avoid running out of COW space.
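For the monitoring part, the usage can be pulled out of the `dmsetup status` line, whose snapshot field has the form allocated/total sectors. A sketch of a helper (the function name is mine) that reads one status line on stdin and prints the fill level in percent:

```shell
# cow_used_percent: reads one `dmsetup status` line, prints COW usage in percent
cow_used_percent() {
    read -r line || [ -n "$line" ] || return 1
    for word in $line; do
        case $word in
            [0-9]*/[0-9]*)
                # allocated/total sectors
                echo $(( ${word%/*} * 100 / ${word#*/} ))
                return 0 ;;
        esac
    done
    # e.g. the snapshot has been invalidated and reports no usage field
    echo "no usage field in: $line" >&2
    return 1
}

# Example: dmsetup status home | cow_used_percent
```

Something like this can run from cron and raise an alarm well before the COW device fills up, since a snapshot that runs out of COW space becomes unusable.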
Once you no longer need the snapshot, you can merge it (to permanently commit the changes in the COW file to the underlying device), or discard it.
To merge it:
replace the table with a snapshot-merge target instead of a snapshot target:
# dmsetup suspend home && \
dmsetup reload home --table \
"0 $(blockdev --getsz /dev/sda2) snapshot-merge /dev/sda2 /dev/loop0 P 8" && \
dmsetup resume home
Next, monitor the status of the merge until all non-metadata blocks are merged:
# watch dmsetup status home
...
0 3864024960 snapshot-merge 281688/2097152 1104
Note the 3 numbers at the end (X/Y Z). The merge is complete when X = Z.
Next, replace the table with a linear target again:
# dmsetup suspend home && \
dmsetup reload home --table \
"0 $(blockdev --getsz /dev/sda2) linear /dev/sda2 0" && \
dmsetup resume home
Now you can dismantle the loop device:
# losetup -d /dev/loop0
Finally, you can delete the COW file.
# rm /home_cow
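The "merge is complete when X = Z" check from the monitoring step can be scripted instead of watched by eye. A sketch, with a function name of my own choosing, that reads one `dmsetup status` line on stdin and succeeds only when the merge has finished:

```shell
# merge_finished: exit 0 when the snapshot-merge status line shows X == Z
merge_finished() {
    read -r line || [ -n "$line" ] || return 2
    set -- $line
    # Walk to the X/Y field; the word right after it is Z (metadata sectors).
    while [ $# -ge 2 ]; do
        case $1 in
            [0-9]*/[0-9]*)
                [ "${1%/*}" -eq "$2" ]
                return ;;
        esac
        shift
    done
    return 2    # no X/Y Z triple found
}

# Example: until dmsetup status home | merge_finished; do sleep 5; done
```

Only once this returns success should the table be switched back to linear and the loop device detached.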
To discard the snapshot, unmount /home, follow the last three merge steps above (restore the linear table, detach the loop device, delete the COW file), and remount /home. Although Device Mapper will allow you to do this without unmounting /home, it doesn't make sense (since the running programs' state in memory won't correspond to the filesystem state any more), and it will likely corrupt your filesystem.
I'm not sure if this will do the trick for you, but you can remount a file system as read-only. mount -o remount,ro /lvm (or something similar) will do the trick. After you are done your snapshot, you can remount read-write using mount -o remount,rw /lvm.
FS corruption is "highly unlikely" as long as you never work in any kind of professional environment. Otherwise you'll meet reality, and you might try blaming "bit rot" or "hardware" or whatever, but it all comes down to having been irresponsible. Freeze/thaw (as mentioned a few times, and only if called properly) is sufficient outside of database environments. For databases, you still won't have a transaction-complete backup, and if you think a backup that rolls back some transaction is fine when restored, see the starting sentence.
Depending on the activity, you might just have added another 5-10 minutes of downtime if you ever need that backup.
Most of us can easily afford that, but it can not be general advice.
Be honest about downsides, guys.