How can I quickly erase all partition information and data on partitions in Linux?

I'm testing a program to use on Raspberry Pi OS. A good part of what it does is read the partitioning info on the system drive, which (in this case) will be just two partitions, /boot and /, with no extras. I'm using a Python script that calls sfdisk. I do what so many examples show: I dump the partition info from the system drive, capture it as output, then use it as input to the command that partitions the target drive.
I'm doing this in Python with subprocess.run(). When the script writes the 2nd partition on the target drive, it creates it at a small size, and then I use parted to extend that partition to the end of the drive. In between tests, to wipe my data so I can start fresh, I've been using sfdisk to make one partition spanning the full drive. Also, I'm using USB memory sticks for testing at this point; eventually I'll generally be using actual drives or SD cards.
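In rough outline (device names here are placeholders: /dev/mmcblk0 for the system drive, /dev/sda for the target), the dump-and-apply part of the workflow looks something like:
# dump the system drive's partition table and apply the same layout to the target
sudo sfdisk -d /dev/mmcblk0 > layout.dump
sudo sfdisk /dev/sda < layout.dump
# then grow the 2nd partition on the target out to the end of the drive
sudo parted -s /dev/sda resizepart 2 100%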
The problem I'm finding is that the file structure on the target drive's partitions is persistent. (This whole paragraph is about the target drive ONLY.) If I divide it into 2 partitions (as I'll eventually need to), I find that /boot, the small 1st partition, still has all the files from the previous use of that partition. If I've tried to wipe the information by making only one big partition on the drive, that single partition still shows the original /boot files. And if I split it into 2 partitions, the partitions land in the same locations as when I normally write a Raspbian image, and the files on both /boot and the root partition are still there.
So repartitioning, with the partitions in the same locations, leaves the files from the previous incarnation of each partition intact in the same sectors.
For testing, I'd like to wipe out all that information so I start fresh with each run, but I do not want to just use dd to write gigabytes of 0s or 1s across the full drive.
What can I do to make sure:
The partition table is wiped out between tests
Any directory structure or file information for the partitions is wiped out so there are no files still surviving on any partitions when I start my testing?

A "nice" thing about linux filesystems is that they are separate from partition tables. This has saved me in the past when partition tables have been accidentally deleted or corrupted - recreate the partition table and the filesystem is still there! For your use case, if you want the files to be "gone", you need to destroy the filesystem superblocks. Destroying just the first one is probably sufficient for your use case.
Using dd to overwrite just the first MB of each of your filesystems should get you what you need. So, if you're starting your first partition/FS on block 0, you could do something like
# write 1MB of zeros to wipe out /boot
dd if=/dev/zero of=/dev/path_to_your_device bs=1024 count=1024
That ought to wipe out the /boot filesystem. From there you'll need to calculate the start of your root volume and use dd's seek= option (the output-side counterpart of the skip= trick described at https://superuser.com/questions/380717/how-to-output-file-from-the-specified-offset-but-not-dd-bs-1-skip-n) to write a meg of zeros at the start of your root filesystem.
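For example, assuming the partitions are MiB-aligned (ROOT_START_MB is a placeholder for the root partition's starting offset in MiB, taken from your partition table):
# write 1MB of zeros at the start of the root filesystem;
# seek= skips that many 1MiB output blocks before writing
dd if=/dev/zero of=/dev/path_to_your_device bs=1M seek=ROOT_START_MB count=1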
Alternatively, if /boot is small, you can just write sizeof(/boot)+1MB (assuming the root filesystem starts immediately after /boot) and it'll overwrite the primary superblock of the root filesystem too, while saving you some calculations.
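As a sketch, if /boot were 256MiB (an assumed size; substitute your own) and the layout matched the assumptions above:
# 257MiB of zeros from the start of the device covers the /boot filesystem
# plus the first MiB of the root filesystem that follows it
dd if=/dev/zero of=/dev/path_to_your_device bs=1M count=257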
Note that the alternate superblocks will still exist, so if you (or someone) later wanted to get back what was there previously, recovery from an alternate superblock might be possible - except that whatever files were present in that first 1MB of disk would be corrupt due to the overwrite.

Related

Cloning only the filled portion of a read only raw data HDD source (without source partition resizing)

I often copy raw data from HDDs with FAT32 partitions at the file level. I would like to switch to bitwise cloning this raw data, which consists of thousands of 10MiB files written sequentially across a single FAT32 partition.
The idea is that the large archival HDD has a small partition containing a shadow directory structure with symbolic links to the separate raw-data image partitions. Each additional partition holds the aforementioned raw data, but is sized to only the space consumed on the source drive. The number of raw data files on each source drive can range from tens up through tens of thousands.
i.e.: [[sdx1][--sdx2--][-------------sdx3------------][--------sdx4--------][-sdx5-][...]]
Where 'sdx1' = directory of symlinks to sdx2, sdx3, sdx4, ... such that the user can browse to multiple partitions but it appears to them as if they're just in subfolders.
Optimally I'd like to find both a Linux and a Windows solution. If the process can be scripted or a software solution that exists can step through a standard workflow, that'd be best. The process is almost always 1) Insert 4 HDD's with raw data 2) Copy whatever's in them 3) Repeat. Always the same drive slots and process.
AFAIK, in order to clone a source partition without cloning all the free space, one conventionally must resize the source HDD partition first. Since I can't alter the source HDD in any way, how can I get around that?
One way would be to clone the entire source partition (incl. free space) and resize the target backup partition afterward, but that's not going to work out because of all the additional time that would take.
The goal is to retain bitwise accuracy and to save time (dd runs at about 200MiB/s whereas rsync runs at about 130MiB/s, but having to copy a ton of blank space every time makes that advantage moot). I'd also like to run with some kind of --rescue flag so that when bad clusters are hit on the source drive it just behaves like Clonezilla and writes ???????? in place of the bad clusters. I know I said "retain bitwise accuracy", but a bad cluster's a bad cluster.
If you think one of the COTS or GOTS tools like EaseUS, AOMEI, Paragon and whatnot can clone partitions as I've described, please point me in the right direction. If you think there's some way I can dd it up with a script that sizes up the source, makes a target partition of the right size, then modifies the target FAT to its correct size, chime in. I'd love to have many options, and so would future people with a similar use case to mine who stumble on this thread :)
Not sure if this will fit your use case, but it is very simple.
Syncthing (https://syncthing.net/) will sync the contents of 2 or more folders, and it works on both Linux and Windows.
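As a rough sketch on the Linux side (the Debian/Ubuntu package name is assumed; installation varies by distro):
# install and start Syncthing for the current user
sudo apt install syncthing
syncthing
# the web GUI then comes up at http://127.0.0.1:8384, where you add the
# folders to share and the remote devices to sync them with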

Cassandra backup to tape or real snapshots

Is there a way to back up Cassandra directly to tape (a streaming device)?
Or to perform real snapshots?
The snapshot Cassandra is referring to is not what I would call a snapshot.
It is more a consistent copy of the database files to a directory.
Regards Tomas
First, let's clarify the Cassandra write path, so we know what we need to back up. Writes come in and are first journaled in the commitlog, then written to the memtable, then eventually flushed to sstables. When sstables flush, the relevant commitlog segments are deleted.
If you want a consistent backup of Cassandra, you need at the very least the sstables, but ideally the sstables + commitlog, so you can replay any data between the commitlog and the most recent flush.
If you're using tape backup, you can treat the files on disk (both commitlog and sstables) as typical data files - you can tar them, rsync them, or copy them as needed, or point Amanda or whatever tape system you're using at the data file directory + commitlog directory, and it should just work. There's not a lot of magic there; just grab them and back them up. One of the more common backup processes involves using tablesnap, which watches for new sstables and uploads them to S3.
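For example, a bare-bones version of that approach might look like this (the directories are the usual package-install defaults and the staging path is a placeholder; adjust both to your setup):
# flush memtables to sstables, then grab the data and commitlog directories
nodetool flush
tar czf /mnt/tape-staging/cassandra-$(date +%F).tar.gz \
    /var/lib/cassandra/data /var/lib/cassandra/commitlog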
You can back up Cassandra directly to tape using SPFS.
SPFS is a file system for Spectrum Protect.
Just mount the SPFS file system where you want the backups to land.
E.g.:
mount -t spfs /backup
And back up Cassandra to this path.
All operations that go via this mountpoint (/backup) will automatically be translated into Spectrum Protect client API calls.
On the Spectrum Protect backup server, one can use any type of supported media.
For instance: CD, tape, VTL, SAS, SATA, SSD, cloud, etc.
In this way, you can easily back up your Cassandra data directly to a backup server.
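For instance, once the mount is in place, an ordinary copy of the data directory (the default package path is assumed here) ends up on the Spectrum Protect server:
# anything written under /backup is sent to Spectrum Protect via the client API
rsync -a /var/lib/cassandra/data/ /backup/cassandra-data/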

FreeNAS/ZFS with both RAID-Z and Mirror?

I'm considering switching to FreeNAS at the same time I'm acquiring some new disks for my home server. The end configuration will have a 1.5TB drive (currently the largest disk in the set) and two 3TB drives.
The "obvious" way to structure this (to me) would be to create partitions on the 3TB drives equal in size to the full 1.5TB drive, then RAID-Z those partitions together for 3TB of redundant storage. The remainder of the 3TB drives could be mirrored together for another 1.5TB of redundant storage. This seems like it gives me no wasted space, and a full 4.5TB of redundant storage to work with.
The problem is that I can't find anything that would let me treat these two segments as a single pool. I don't really care if any given data is written to parity vs. mirrored space, so long as it's all resilient to a single disk failure.
Am I stuck with two virtual spaces and allocating data between them, or is there a ZFS option I'm not finding that would let me pool the whole thing?
Technically you should be able to build a pool with two vdevs -- a RAID-Z vdev made of 3 partitions and a mirror vdev made of 2 partitions.
Something like this should work:
zpool create tank raidz da0p1 da1p1 da2p1 mirror da0p2 da1p2
That said, you don't want to do that for performance reasons. Reads and writes will be distributed across all vdevs and, as a result, across all your partitions for every chunk of data ZFS needs to write out. In the end your 3TB hard drives will have to do two seeks to access data on different partitions each time ZFS writes out a transaction group. Once data is written, similar seeks will be needed to read data that's not in ARC yet. At 10-20ms per seek, performance will be rather terrible.

Cassandra Scrub - define a destination directory for snapshot

In my C* 1.2.4 setup, I have a 200GB SSD for the data and a 500GB rotational drive for the commit logs.
During a scrub operation I had the unpleasant surprise of filling up my SSD with the snapshots. That made the Cassandra box unresponsive, but it kept showing its status as up in nodetool status.
I am wondering if there is a way to specify the target directory for snapshots when doing a scrub.
Otherwise, do you have ideas for workarounds?
I can do one column family at a time and then copy the snapshots folder, but I am open to smarter solutions.
Thanks,
H
Snapshots in Cassandra are created as hard links to your existing data files. This means that at the time the snapshot is taken, it takes up almost no extra space. However, it causes the old files to remain, so if you delete or update data, the old version is still there.
This means snapshots must be taken on the drive that stores the data. If you don't need the snapshot any more, just delete it with 'nodetool clearsnapshot' (see the nodetool help output for how to decide which snapshots to delete). If you want to keep the snapshot, you can move it elsewhere. It will only start using much disk space after a while, so you could keep it until you are happy the scrub didn't delete important data, then delete the snapshot.
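For example (the keyspace and column family names are placeholders, the data path is the usual 1.2-era default, and /bulk/backups stands in for the rotational drive):
# move the snapshot off the SSD, then release the hard links
cp -a /var/lib/cassandra/data/mykeyspace/mycf/snapshots /bulk/backups/mycf-snapshots
nodetool clearsnapshot mykeyspace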

read data from ext4 filesystem directly from the raw partition without mounting the file system

Is it possible to add data of fixed size to an ext4 image such that it's available at the last block of the partition (or say the last 100KB)? I want to be able to add data to the ext4 image such that I can read the data from the corresponding raw partition without any knowledge of the filesystem.
Is this possible?
You could build what you want using e2fslibs in e2fsprogs. That library gives you low-level access to reading the filesystem metadata.
As a first pass, you could dump the metadata about which blocks are in use to see whether the last 100KB worth of blocks are allocated or not. If not, just write over them.
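A rough sketch of that first pass with the standard e2fsprogs command-line tools (the device name and block numbers are placeholders; a 4KiB block size and a 262144-block filesystem are assumed):
# find the filesystem's block size and total block count
dumpe2fs -h /dev/sdX1 | grep -E 'Block (count|size)'
# test whether the last 25 blocks (about 100KB at 4KiB per block) are allocated
debugfs -R "testb 262119 25" /dev/sdX1
# if they are free, write the payload directly into that region
dd if=payload.bin of=/dev/sdX1 bs=4096 seek=262119 conv=notrunc
Note that nothing stops the filesystem from allocating those blocks later, so this only holds while the image stays effectively read-only or the blocks are otherwise reserved.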
You can use the e2image utility to dump or restore the filesystem metadata.
