Cassandra: concurrent snapshot and clear snapshot?

I have a daily cron job[1] which takes snapshots of Cassandra and uploads them to S3 buckets. After that, the snapshots are deleted.
However, there is also a pipeline job that takes snapshots of Cassandra, which I cannot modify. This job does not delete its snapshots after it's done; it relies on another daily cron job[2] to delete all snapshots (basically calling nodetool clearsnapshot).
My concern is that the daily cron job[2] might delete my snapshots, so my cron job[1] will not be able to upload them to the S3 buckets. What will happen if my nodetool snapshot and the other job's nodetool clearsnapshot run at the same time? Is there a way to require the daily cron job[2] to happen after my cron job[1]?

nodetool snapshot has the ability to tag snapshots. One way to solve this is to agree with the owner of the other process that every snapshot it takes is properly tagged.
Your backup procedure would then look something like:
nodetool snapshot -t backup
... upload to s3 ...
nodetool clearsnapshot -t backup
The other pipeline can have its own tag:
nodetool snapshot -t pipeline
And the cron job[2] that clears snapshots should include the pipeline's tag:
nodetool clearsnapshot -t pipeline
If there is no way to change the pipeline to include the tag, you can restrict the execution of cron job[2] so that it verifies no backup process is running (for example, by checking for a PID or lock file) before doing the clearsnapshot, as in the sketch below.
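A minimal sketch of the two cron jobs under that scheme; the lock-file path and the upload step are hypothetical placeholders, not part of the original setup:

#!/bin/bash
# cron job[1]: tagged backup, guarded by a lock file so job[2] can detect it
LOCK=/var/run/cassandra-backup.lock        # hypothetical lock-file location
touch "$LOCK"
nodetool snapshot -t backup
# ... upload the "backup" snapshot directories to S3 ...
nodetool clearsnapshot -t backup
rm -f "$LOCK"

#!/bin/bash
# cron job[2]: clears only the pipeline's snapshots, and skips the run while job[1] is active
LOCK=/var/run/cassandra-backup.lock
if [ ! -e "$LOCK" ]; then
    nodetool clearsnapshot -t pipeline
fi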

Is it safe to copy cassandra snapshot files over sstable files in a running node?

Edited after reading nodetool tagged questions.
We take snapshots of our single-node Cassandra database daily. If I want to restore a snapshot either on that node or on our staging server, which is running a different instance of Cassandra, my understanding is I have to:
nodetool disablegossip
nodetool disablebinary
nodetool drain
Copy the sstable files from the snapshot directories to the sstable directories under the keyspace directory.
Run nodetool refresh on each table.
Enable binary & gossip.
Is this sufficient to safely bring the snapshot sstable files in without cassandra overwriting them while I'm doing the refresh?
What is the opposite of nodetool drain?
Another edit: What about sstableloader? Should I use that instead? If so, how? I looked at the "documentation" and am none the wiser.
The steps you outlined aren't quite right. You don't shut down Cassandra, and you shouldn't just copy the files on top of the existing SSTables.
At a high level, the steps to restore table snapshots on a node are:
TRUNCATE the table you want to restore (will remove the SSTables from the data directories).
Copy the SSTables from data/ks_name/table-UUID/snapshots/snapshot_name subdirectory into the "live" data directory data/ks_name/table-UUID.
Run nodetool refresh -- ks_name table_name.
You will need to repeat these steps for each application table you want to restore. NOTE: Do NOT restore system tables, only application tables.
The detailed steps are documented in Restoring from a snapshot in Cassandra.
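As a rough illustration of those three steps on a single node, assuming a hypothetical keyspace ks_name, table table_name, snapshot snapshot_name, and the default /var/lib/cassandra/data directory (adjust all of these for your install):

# 1. Truncate the table (removes its live SSTables)
cqlsh -e "TRUNCATE ks_name.table_name;"

# 2. Copy the snapshot SSTables back into the live table directory
DATA=/var/lib/cassandra/data                       # hypothetical data directory
TABLE_DIR=$(ls -d "$DATA"/ks_name/table_name-*)    # resolves the table-UUID directory
cp "$TABLE_DIR"/snapshots/snapshot_name/* "$TABLE_DIR"/

# 3. Tell Cassandra to pick up the newly copied SSTables
nodetool refresh -- ks_name table_name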
To restore a snapshot into another cluster, I prefer to refer to this as "cloning". The procedure for cloning snapshots to another cluster depends on whether the source and destination clusters have identical configuration.
If both source and destination clusters are identical, follow the steps I documented here -- https://community.datastax.com/questions/4534/. I've explained what identical configuration means in this post.
If they are not identical, follow the steps I documented here -- https://community.datastax.com/questions/4477/. Cheers!

Elassandra backup solution

We have a three-node Cassandra/Elassandra cluster and I need to set up backups for it. I used the "nodetool snapshot" command for taking backups, but since we are also using Elasticsearch, do I need to take separate backups of the indices, or is a "nodetool snapshot" backup enough?
If a separate backup is required for the indices, can you please suggest how to do the backup and restore? There is no proper documentation for Elassandra backup/restore.
Thanks
Since Elassandra = Elasticsearch + Cassandra, you need to back up Cassandra at the same time as Elasticsearch.
By design, Elassandra synchronously updates Elasticsearch indices on the Cassandra write path. Therefore, Elassandra can back up data by taking a snapshot of the Cassandra SSTables and the Elasticsearch Lucene files at the same time on each node, as follows:
For Cassandra SSTables use:
nodetool snapshot --tag <snapshot_name> <keyspace_name>
And for the index files, copy them with:
cp -al $CASSANDRA_DATA/elasticsearch.data/<cluster_name>/nodes/0/indices/<index_name>/0/index/(_*|segment*) $CASSANDRA_DATA/elasticsearch.data/snapshots/<index_name>/<snapshot_name>/
However, there is documentation on Elassandra covering backup and restore.
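A per-node sketch combining the two commands above; the keyspace, index, cluster, and snapshot names are hypothetical placeholders, $CASSANDRA_DATA is assumed to point at the node's data directory, and bash brace expansion stands in for the (_*|segment*) pattern shown above:

SNAPSHOT=backup_$(date +%Y%m%d)          # snapshot name used for both halves
KEYSPACE=my_keyspace                     # hypothetical keyspace name
INDEX=my_index                           # hypothetical Elasticsearch index name
CLUSTER=my_cluster                       # hypothetical Elassandra cluster name

# 1. Snapshot the Cassandra SSTables
nodetool snapshot --tag "$SNAPSHOT" "$KEYSPACE"

# 2. Hard-link the Lucene files for the index into a snapshot directory
mkdir -p "$CASSANDRA_DATA/elasticsearch.data/snapshots/$INDEX/$SNAPSHOT"
cp -al "$CASSANDRA_DATA/elasticsearch.data/$CLUSTER/nodes/0/indices/$INDEX/0/index/"{_*,segment*} \
    "$CASSANDRA_DATA/elasticsearch.data/snapshots/$INDEX/$SNAPSHOT/"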

How to tell when nodetool snapshot is finished on Cassandra

I am writing a Python script that creates Cassandra backups and uploads them to external storage. The Cassandra cluster consists of three nodes, and every node has two JBOD disks.
The script starts the nodetool snapshot command; this command starts a snapshot creation job on the cluster and exits immediately while the snapshot creation process continues. Now the script should wait until snapshot creation is finished and then continue with the upload. The question is: how can I get the snapshot creation status?
nodetool snapshot creates hard links to the SSTables. When the command exits, the snapshot is complete. It's typically very fast.
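So the script can simply treat the command's successful exit as completion. A minimal sketch, where the keyspace name and the upload step are hypothetical placeholders:

set -e                                    # stop if any step fails
TAG="backup_$(date +%Y%m%d%H%M)"
nodetool snapshot -t "$TAG" my_keyspace   # returns only after the hard links exist
# The snapshot is already complete here, so the upload can start right away,
# e.g. by syncing each */snapshots/$TAG directory to external storage.
# ... upload to external storage ...
nodetool clearsnapshot -t "$TAG"          # free the hard links once uploaded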

Cassandra - What precautions should I take while deleting the backup files?

As per documentation at http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/operations/ops_backup_incremental_t.html,
As with snapshots, Cassandra does not automatically clear incremental backup files. DataStax recommends setting up a process to clear incremental backup hard-links each time a new snapshot is created.
So is it safe to trigger deletion of all the files in the backups directory immediately after invoking a snapshot?
How can I check whether the snapshot was not only invoked successfully, but also completed successfully?
What if I end up deleting a backup hard link which got created "just after" invoking the snapshot, but before the moment I triggered the deletion of the backup files?

Backups folder in Opscenter keyspace growing really huge

We have a 10-node Cassandra cluster. We configured a repair in OpsCenter. We find there is a backups folder created for every table in the OpsCenter keyspace, and it keeps growing huge. Is there a solution to this, or do we have to manually delete the data in each backups folder?
First off, backups are different from snapshots; you can take a look at the backup documentation for OpsCenter to learn more.
Incremental backups:
From the DataStax docs:
"When incremental backups are enabled (disabled by default), Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows storing backups offsite without transferring entire snapshots. Also, incremental backups combine with snapshots to provide a dependable, up-to-date backup mechanism.
...
As with snapshots, Cassandra does not automatically clear incremental backup files. DataStax recommends setting up a process to clear incremental backup hard-links each time a new snapshot is created."
You must have turned on incremental backups by setting incremental_backups to true in cassandra.yaml.
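To check whether that is actually the case on a node, something like the following works; the config path is the usual package default and may differ on your install, and the nodetool subcommands shown exist only on newer Cassandra versions:

grep '^incremental_backups' /etc/cassandra/cassandra.yaml   # expect: incremental_backups: true
nodetool statusbackup     # reports whether incremental backups are currently enabled
nodetool disablebackup    # turn them off at runtime (not persisted across restarts)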
If you are interested in a backup strategy, I recommend you use the OpsCenter Backup Service instead. That way, you're able to control granularly which keyspace you want to back up and push your files to S3.
Snapshots
Snapshots are hard links to old (no longer used) SSTables. Snapshots protect you from yourself: for example, if you accidentally truncate the wrong keyspace, you'll still have a snapshot of that table that you can bring back. If you end up with too many snapshots, there are a couple of things you can do:
Don't run Sync repairs
This is related to repairs because synchronous repairs generate a snapshot each time they run. To avoid this, run parallel repairs instead (the -par flag, or by setting the number of repairs in the OpsCenter config file; see the note below), for example:
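A one-line sketch of the nodetool side, assuming a Cassandra 2.0-era nodetool where sequential (snapshot-based) repair is the default; the keyspace name is a hypothetical placeholder:

nodetool repair -par my_keyspace    # parallel repair: does not take a snapshot per run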
Clear your snapshots
If you have too many snapshots and need to free up space (say, once you have backed them up to S3 or Glacier), go ahead and use nodetool clearsnapshot to delete them. This will free up space. You can also remove them manually from your file system, but nodetool clearsnapshot removes the risk of rm -rf'ing the wrong thing. For example:
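A quick sketch of that cleanup, run per node; listsnapshots is available on reasonably recent nodetool versions, and the tag is whatever your snapshots were created with:

nodetool listsnapshots                 # see which snapshots exist and how much space they hold
nodetool clearsnapshot -t <tag_name>   # remove one named snapshot on this node
nodetool clearsnapshot                 # or remove all snapshots on this node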
Note: You may also be running repairs too fast if you don't have a ton of data (check my response to this other SO question for an explanation and the repair service config levers).
