I am learning Cassandra now-a-days, i have successfully backup and restore tables or keyspace mentioned in this URL.
But i am looking for following options
1)Take complete backup of a keyspace at different location other then mentioned directory in cassandra.yaml. -t option create directory in snapshot folder not different HDD location.
2) Or backup/restore procedure same like mysql.
Thanks
You have a few options...for small amounts of data, you can use COPY to backup / restore from csv:
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/copy_r.html
For larger stuff, you've got the right link. You essentially take a snapshot (which puts it in the folder you mention), and then use something like tar to zip the files and output to a different directory. This is what we're doing in production... we clear the previous snapshot, take a snapshot and tar the folder to a network backup.
Related
I am new to cassandra and trying to figure out how to restore data from Snapshots+incremental backups onto a new node.
Below is my understanding :
1.take snapshot of the cassandra node
2.enable incremental backups
3.copy the snapshots / incremental backups to remote location
4.for restore data on new node :
- install cassandra
- copy the keyspace / table structure plus snapshot data on the new node from remote location
Now my real question is where do i place the incremental backup files ?
do they go in same backups folder on new node under /keyspace/table/backups or i need to copy these files to the main folder /keyspace/table ?
Also how to copy when I have multiple incremental backup files ?
Edited after reading nodetool tagged questions.
We take snapshots of our single node cassandra database daily. If I want to restore a snapshot either on that node, or on our staging server which is running a different instance of cassandra, my understanding is I have to:
nodetool disablegossip
nodetool disablebinary
nodetool drain
Copy the sstable files from the snapshot directories to the sstable directories under the keyspace directory.
Run nodetool refresh on each table.
Enable binary & gossip.
Is this sufficient to safely bring the snapshot sstable files in without cassandra overwriting them while I'm doing the refresh?
What is the opposite of nodetool drain?
Another edit: What about sstableloader? Should I use that instead? If so, how? I looked at the "documentation" and am none the wiser.
The steps you outlined isn't quite right. You don't shutdown Cassandra and you shouldn't just copy the files on top of the existing SSTables.
At a high level, the steps to restore table snapshots on a node are:
TRUNCATE the table you want to restore (will remove the SSTables from the data directories).
Copy the SSTables from data/ks_name/table-UUID/snapshots/snapshot_name subdirectory into the "live" data directory data/ks_name/table-UUID.
Run nodetool refresh -- ks_name table_name.
You will need to repeat these steps for each application table you want to restore. NOTE: Do NOT restore system tables, only application tables.
The detailed steps are documented in Restoring from a snapshot in Cassandra.
To restore a snapshot into another cluster, I prefer to refer to this as "cloning". The procedure for cloning snapshots to another cluster depends on whether the source and destination clusters have identical configuration.
If both source and destination clusters are identical, follow the steps I documented here -- https://community.datastax.com/questions/4534/. I've explained what identical configuration means in this post.
If they are not identical, follow the steps I documented here -- https://community.datastax.com/questions/4477/. Cheers!
We have a regular backup of our cluster and we store schema and snapshot back up on aws s3 on daily basis.
Somehow we have lost all the data and while recovering the data from backup we are able to recover schema but while copying snapshots files to /var/lib/cassandra/data directory its not showing up the data in the tables.
After copying the data we have done nodetool refresh -- keyspace table but still nothing is working out.
could you please help on this ?
Im new at Apache Cassandra, but my first focus at this topic was the Backup.
If you want to restore from a Snapshot (on new node/cluster) you have to shut down Cassandra on any node and clear any existing data from these folders:
/var/lib/cassandra/data -> If you want to safe your System Keyspaces so delete only your Userkeyspaces folders
/var/lib/cassandra/commitlog
/var/lib/cassandra/hints
/var/lib/cassandra/saved_cashes
After this, you have to start Cassandra again (the whole Cluster). Create the Keyspace like the one you want to restore and the table you want to restore. In Your Snapshot folder you will find a schema.cql script for the creation of the table.
After Creating the Keyspaces an tables again, wait a moment (time depends on the ammount of nodes in your cluster and keypsaces you want to restore.)
Shut down the Cassandra Cluster again.
Copy the Files from the Snapshot folder to the new folders of the tables you want to restore. Do this on ALL NODES!
After copying the files, start the nodes one by one.
If all nodes are running, run the nodetool repair command.
If you try to check the data via CQLSH, so think of the CONSISTENCY LEVEL! (ALL/QUORUM)
Thats the way, wich work at my Cassandra cluster verry well.
The general steps to follow for restoring a snapshot is:
1.Shutdown Cassandra if still running.
2.Clear any existing data in commitlogs, data and saved caches directories
3.Copy snapshots to relevant data directories
4.Copy incremental backups to data directory (if incremental backups are enabled)
If required, set restore_point_in_time parameter in commitlog_archiving.properties to
restore point.
5.Start Cassandra.
6.Run repair
So try running repair after copying data.
We have a 10 node Cassandra cluster. We configured a repair in Opscenter. We find there is a backups folder created for every table in Opscenter keyspace. It keeps growing huge. Is there a solution to this, or do we manually delete the data in each backups folder?
First off, Backups are different from snapshots - you can take a look at the backup documentation for OpsCenter to learn more.
Incremental backups:
From the datastax docs -
When incremental backups are enabled (disabled by default), Cassandra
hard-links each flushed SSTable to a backups directory under the
keyspace data directory. This allows storing backups offsite without
transferring entire snapshots. Also, incremental backups combine with
snapshots to provide a dependable, up-to-date backup mechanism.
...
As with snapshots, Cassandra does not automatically clear
incremental backup files. DataStax recommends setting up a process to
clear incremental backup hard-links each time a new snapshot is
created.
You must have turned on incremental backups by setting incremental_backups to true in cassandra yaml.
If you are interested in a backup strategy, I recommend you use the OpsCenter Backup Service instead. That way, you're able to control granularly which keyspace you want to back up and push your files to S3.
Snapshots
Snapshots are hardlinks to old (no longer used) SSTables. Snapshots protect you from yourself. For example you accidentally truncate the wrong keyspace, you'll still have a snapshot for that table that you can bring back. There are some cases when you have too many snapshots, there's a couple of things you can do:
Don't run Sync repairs
This is related to repairs because synchronous repairs generate a Snapshot each time they run. In order to avoid this, you should run parallel repairs instead (-par flag or by setting the number of repairs in the opscenter config file note below)
Clear your snapshots
If you have too many snapshots and need to free up space (maybe once you have backed them up to S3 or glacier or something) go ahead and use nodetool clearsnapshots to delete them. This will free up space. You can also go in and remove them manually from your file system but nodetool clearsnapshots removes the risk of rm -rf ing the wrong thing.
Note: You may also be running repairs too fast if you don't have a ton of data (check my response to this other SO question for an explanation and the repair service config levers).
I have a development machine that had a single-node Cassandra 2.1.2 setup. The main drive failed, but the secondary drive, mounted at /var, is good. I was able to connect this drive to another system with a working OS and Cassandra install and mount it.
I can see the files for my database under /var/cassandra/data// and would like to recover that data. Nothing outside of /var survived -- no configs or binaries.
Is it as simple as copying that directory to the /var/cassandra/data/ directory on the good system, or is there some other more detailed procedure for recovering/importing that data?
All you need to remember from your config files - is what was the partitioner. Now you should save your data files somewhere safe, create another single-node cluster, then follow the procedure described here.
The files in your saved data folder is what in these manuals is referred to as snapshot.