Cassandra backup to tape or real snapshots - cassandra

Is there a way to backup Cassandra directly to tape (streaming device)?
Or to perform real snapshots?
The snapshot Cassandra is referring to is not what I want to call a snapshot.
It is more a consistent copy of the database files to a directory.
Regards Tomas

First, let's clarify the Cassandra write path, so we know what we need to back up. Writes come in and are first journaled in the commitlog, then written to the memtable, then eventually flushed to sstables. When sstables flush, the relevant commitlog segments are deleted.
If you want a consistent backup of Cassandra, you need at the very least the sstables, but ideally the sstables + commitlog, so you can replay any data between the commitlog and the most recent flush.
If you're using tape backup, you can treat the files on disk (both commitlog and sstables) as typical data files - you can tar them, rsync them, copy them as needed, or point amanda or whatever tape system you're using at the data file directory + commitlog directory, and it should just work - there's not a lot of magic there, just grab them and back them up. One of the more common backup processes involves using tablesnap, which watches for new sstables and uploads them to s3.

You can backup Cassandra directly to Tape using SPFS
SPFS is a file system for Spectrum Protect.
Just mount the SPFS file system where you want the backups to land.
Eg
mount -t spfs /backup
And backup Cassandra to this path.
All operations that goes via this mountpoint (/backup), will automatically be translated to Spectrum Protect Client API calls.
On the Spectrum Protect backup server, one can use any type of supported media.
For instance: CD, Tape, VTL, SAS, SATA, SSD, Cloud etc..
In this way, you can easily backup your Cassandra directly to a backup server.

Related

How can I quickly erase all partition information and data on partitions in LInux?

I'm testing a program to use on Raspberry Pi OS. A good part of what it does is read the partitioning info on the system drive, which is going to be (in this case), /boot and / and no extra partitions, just the two. I'm using a Python script that calls sfdisk. I do what so many examples show: I get the info from the system drive, read it as output, then use it as input to run the command to format the target drive.
I'm using Python and doing this with subprocess.run(). The script I'm writing, when it writes the 2nd partition on the target drive, writes it as a small size, then I use parted to extend the partition to the end of the drive. In between tests, to wipe my data so I can start fresh, I've been using sfdisk to make one partition for the full size of the drive. Also, I'm using USB memory sticks at this point for testing. I'll generally be using that for drives or using SD cards.
The problem I'm finding is that the file structure is persistent on the partitions on the target drive. (All this paragraph is about ONLY the target drive.) If I divide it up into 2 partitions (as I need to use, eventually), I find that /boot, the small 1st partition, still has all the files from previous usage of the partition. If I've tried to wipe the information by making only one big partition on the drive, I still see only, in that one partition, the original files for the /boot partition. If I split it into 2 partitions, the locations are going to be the same as when I normally make a Raspbian image and I find the files in both /boot and the system drive are still there.
So repartitioning, with the partitions in the same location, leaves me with the files still intact from the previous incarnation of a partition in the same sectors.
I'd like to, for testing, just wipe out all the information so I start fresh with each test, but I do not want to just use dd and write gigabytes of 0s or 1s to the full drive to wipe out the data.
What can I do to make sure:
The partition table is wiped out between tests
Any directory structure or file information for the partitions is wiped out so there are no files still surviving on any partitions when I start my testing?
A "nice" thing about linux filesystems is that they are separate from partition tables. This has saved me in the past when partition tables have been accidentally deleted or corrupted - recreate the partition table and the filesystem is still there! For your use case, if you want the files to be "gone", you need to destroy the filesystem superblocks. Destroying just the first one is probably sufficient for your use case.
Using dd to overwrite just the first MB of each of your filesystems should get you what you need. So, if you're starting your first partition/FS on block 0, you could do something like
# write 1MB of zeros to wipe out /boot
dd if=/dev/zero of=/dev/path_to_your_device bs=1024 count=1024
That ought to wipe out the /boot file system. From there you'll need to calculate the start of your root volume and you can use skip as per https://superuser.com/questions/380717/how-to-output-file-from-the-specified-offset-but-not-dd-bs-1-skip-n to write a meg of zeros at the start of your root filesystem.
Alternately, if /boot is small, you can just write sizeof(/boot)+1MB (assuming you start /root immediately after /boot) and it'll overwrite the primary superblock from /root too while saving you some calculations.
Note that the alternate superblocks will still exist, so at some point if you (or someone) wanted to get back what was there previously then recovery of alternate superblocks might be possible, except that whatever files were present in that first 1MB of disk would be corrupt due to overwrite.

How to modify the memtable flush time interval in cassandra?

I have enabled the incremental backup in the cassandra.yaml file. As I know when we enable incremental backups, cassandra will backup the data (in backups directory) only when the memtable is flushed. But what if the memtable is yet to be flushed? I won't be able to get the incremental backup right?. I know that for the memtable to be flushed there are certain conditions to be met such as time interval or memtable space. My question is how do I modify this so that even if I enter one record after the last snapshot, I can still backup entire data along with that latest entry?
Consider this example
Take the snapshot.
Clear incremental backup (backups directory)
Enter a record to a table.
Check for the incremental backup in backups directory. It is still empty.
Now how do I backup the record which is written after the last snapshot?In general how do we backup the entire upto-date data unless we take the snapshot?
You can flush the files manually with nodetool flush just before taking the backup. That way you'll always have the latest memtable flushed.
nodetool docs
If you want to backup a cluster without taking a snapshot you can do it by simply saving everything under /data folder from every node (this includes mainly the .db files stats files etc).
In order to not override files you should store it with the token information as well.
When you want to restore from this backup, you should spin up a cluster with the same number of nodes, and simply copy the data, one-to-one from each backed-up node to a restored node. Pay attention that you'll have to modify cassandra.yaml to include the relevant token in cassandra.yaml (as well as the peers/seeds/etc) for each restored node.
After all the data is copied, you can start C* process on all the nodes.

Proper cassandra keyspace restore procedure

I am looking for confirmation that my Cassandra backup and restore procedures are sound and I am not missing anything. Can you please confirm, or tell me if something is incorrect/missing?
Backups:
I run daily full backups of the keyspaces I care about, via "nodetool snapshot keyspace_name -t current_timestamp". After the snapshot has been taken, I copy the data to a mounted disk, dedicated to backups, then do a "nodetool clearsnapshot $keyspace_name -t $current_timestamp"
I also run hourly incremental backups - executing a "nodetool flush keyspace_name" and then moving files from the backup directory of each keyspace, into the backup mountpoint
Restore:
So far, the only valid way I have found to do a restore (and tested/confirmed) is to do this, on ALL Cassandra nodes in the cluster:
Stop Cassandra
Clear the commitlog *.log files
Clear the *.db files from the table I want to restore
Copy the snapshot/full backup files into that directory
Copy any incremental files I need to (I have not tested with multiple incrementals, but I am assuming I will have to overlay the files, in sequence from oldest to newest)
Start Cassandra
On one of the nodes, run a "nodetool repair keyspace_name"
So my questions are:
Does the above backup and restore strategy seem valid? Are any steps inaccurate or anything missing?
Is there a way to do this without stopping Cassandra on EVERY node? For example, is there a way to restore the data on ONE node, then somehow make it "authoritative"? I tried this, and, as expected, since the restored data is older, the data on the other nodes (which is newer) overwrites in when they sync up during repair.
Thank you!
There's two ways to restore Cassandra backups without restarting C*:
Copy the files into place, then run "nodetool refresh". This has the caveat that the rows will still be older than tombstones. So if you're trying to restore deleted data, it won't do what you want. It also only applies to the local server (you'll want to repair after)
Use "sstableloader". This will load data to all nodes. You'll need to make sure you have the sstables from a complete replica, which may mean loading the sstables from multiple nodes. Added bonus, this works even if the cluster size has changed. I'm not sure if ordering matters here (that is, I don't know if row timestamps are preserved through the load or if they're redefined during load)

moving Cassandra snapshots to a different disk/server/datacenter

I have Cassandra 1.2.6 cluster running on datacenter A, each node has a solid state drive with somewhat limited space (aprox 50% of disk space is free).
Now I need to implement somehow a way of having automatic backups of each node. Ideally I want to have a way of moving all of the cluster's datafiles to a different disk (standard cheaper disks), or even to a different server in the same datacenter A and possibly moving all the data once in a while to a datacenter B in a different location.
From what I've read I can use snapshots on each node to get the files to copy using whatever tool I want and in this case I have the option to move the data to a different disk/server/datacenter.
My question is, since each of my nodes is about 50% full, taking a snapshot will consume all that space? or the hard links will consume way less space than I anticipate?, if so, is there a better way of doing this, maybe with an already made tool, or everything should be custom made when it comes to this type of backups in Cassandra?
Thanks in advance!
A hard link just creates a new directory entry for the same file (http://en.wikipedia.org/wiki/Hard_link). So a snapshot takes up effectively zero space, but you'll want to clean it up after you're done copying it off to whatever your archive is, because when the "original" sstable is deleted (typically post-compaction), space won't be reclaimed as long as the snapshot reference is still there.
My impression is that tablesnap is the most popular tool for automating backups to s3. It also supports Cassandra incremental backups. If you want more control over where you're backing up to, DataStax OpsCenter supports running a custom script when it takes snapshots.

Cassandra Scrub - define a destination directory for snapshot

In my C* 1.2.4 setup, I have an ssd drive of 200Gb for the data and a rotational drive for commit logs of 500Gb.
I had the unpleasant surprise during a scrub operation to fill in my ssd drive with the snapshots. That made the cassandra box unresponsive but it kept the status as up when doing nodetool status.
I am wondering if there is a way to specify the target directory for snapshots when doing a scrub.
Otherwise if you have ideas for workarounds?
I can do a column family at a time and then copy the snapshots folder, but I am open for smarter solutions.
Thanks,
H
Snapshots in Cassandra are created as hard links to your existing data files. This means at the time the snapshot is taken, it takes up almost no extra space. However, it causes the old files to remain so if you delete or update data, the old version is still there.
This means snapshots must be taken on the drive that stores the data. If you don't need the snapshot any more, just delete it with 'nodetool clearsnapshot' (see the nodetool help output for how to decide which snapshots to delete). If you want to keep the snapshot then you can move it elsewhere. It will only start using much disk space after a while, so you could keep it until you are happy the scrub didn't delete important data then delete the snapshot.

Resources