If I take snapshots on every node in a 10-node cluster, how do I restore them into a 5-node cluster where every node has a stronger CPU and more storage?
The traditional way to restore sstable backups is by copying the sstable files to the data directory and calling 'refresh' to load the data into the running cluster. If the topology has changed, if you're unable to access the data directory, if you have filename collisions, or if you don't have sufficient room or time to deal with lots of nodes having a ton of data they don't own, then nodetool refresh may be less than ideal.
However, Cassandra includes a bundled tool called 'sstableloader', which reads sstables from disk and streams their contents into a running cluster, sending each row to the nodes that currently own it. sstableloader may therefore be a good fit for loading data from your sstables into the cluster without worrying about the changed topology.
More info is available: http://www.pythian.com/blog/bulk-loading-options-for-cassandra/
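As a rough sketch of how that might look once the snapshot files from all ten old nodes have been collected on one machine: the snippet below only builds the sstableloader command lines, one per table directory, and does not run them. The staging layout (staging_root/<node>/<keyspace>/<table>/), the host addresses and the helper name build_loader_commands are assumptions made for illustration. The -d option is sstableloader's list of initial contact hosts, and the last argument is a directory whose final two path components are the keyspace and table names.

```python
from pathlib import Path


def build_loader_commands(staging_root, contact_hosts):
    """Return one sstableloader argv per <node>/<keyspace>/<table> directory.

    Assumed (hypothetical) layout: staging_root/<node>/<keyspace>/<table>/*.db
    Keeping one directory per source node avoids filename collisions between
    SSTables that came from different nodes.
    """
    commands = []
    for table_dir in sorted(Path(staging_root).glob("*/*/*")):
        # Skip anything that is not a directory holding SSTable data files.
        if not table_dir.is_dir() or not any(table_dir.glob("*-Data.db")):
            continue
        commands.append(
            ["sstableloader", "-d", ",".join(contact_hosts), str(table_dir)]
        )
    return commands
```

Each returned list could be passed to subprocess.run once the schema exists on the 5-node cluster; printing the commands first and running one by hand is probably the safer way to start.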
Edited after reading nodetool tagged questions.
We take snapshots of our single-node Cassandra database daily. If I want to restore a snapshot either on that node, or on our staging server which is running a different instance of Cassandra, my understanding is I have to:
nodetool disablegossip
nodetool disablebinary
nodetool drain
Copy the sstable files from the snapshot directories to the sstable directories under the keyspace directory.
Run nodetool refresh on each table.
Enable binary & gossip.
Is this sufficient to safely bring the snapshot sstable files in without Cassandra overwriting them while I'm doing the refresh?
What is the opposite of nodetool drain?
Another edit: What about sstableloader? Should I use that instead? If so, how? I looked at the "documentation" and am none the wiser.
The steps you outlined aren't quite right. You don't need to shut down Cassandra, and you shouldn't just copy the files on top of the existing SSTables.
At a high level, the steps to restore table snapshots on a node are:
TRUNCATE the table you want to restore (will remove the SSTables from the data directories).
Copy the SSTables from the data/ks_name/table-UUID/snapshots/snapshot_name subdirectory into the "live" data directory data/ks_name/table-UUID.
Run nodetool refresh -- ks_name table_name.
You will need to repeat these steps for each application table you want to restore. NOTE: Do NOT restore system tables, only application tables.
The detailed steps are documented in Restoring from a snapshot in Cassandra.
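For what it's worth, the copy and refresh steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the helper name restore_table_snapshot is made up, the table is expected to have been truncated already (for example from cqlsh), and manifest.json and schema.cql are treated as snapshot metadata that does not need to be copied. The function copies the snapshot files into the live directory and returns the nodetool refresh command rather than running it.

```python
import shutil
from pathlib import Path

# Assumption: these snapshot files are metadata, not SSTable components.
SNAPSHOT_METADATA = {"manifest.json", "schema.cql"}


def restore_table_snapshot(table_dir, snapshot_name):
    """Copy one table's snapshot into its live directory.

    table_dir is the live directory, e.g. data/ks_name/table-UUID.
    Returns the nodetool refresh argv to run afterwards.
    """
    table_dir = Path(table_dir)
    snapshot_dir = table_dir / "snapshots" / snapshot_name

    # Step 1 (TRUNCATE) must already have happened: refuse to mix old and new data.
    if any(table_dir.glob("*-Data.db")):
        raise RuntimeError(f"{table_dir} still holds live SSTables; TRUNCATE first")

    # Step 2: copy the SSTable components next to where Cassandra expects them.
    for component in sorted(snapshot_dir.iterdir()):
        if component.is_file() and component.name not in SNAPSHOT_METADATA:
            shutil.copy2(component, table_dir / component.name)

    # Step 3: derive keyspace and table names from the path for nodetool refresh.
    keyspace = table_dir.parent.name
    table = table_dir.name.rsplit("-", 1)[0]  # strip the trailing table ID
    return ["nodetool", "refresh", "--", keyspace, table]
```

You would call it once per application table, on each node, and run the returned command after checking that the copied files are owned by the user Cassandra runs as.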
Restoring a snapshot into another cluster is something I prefer to call "cloning". The procedure for cloning snapshots to another cluster depends on whether the source and destination clusters have identical configuration.
If both source and destination clusters are identical, follow the steps I documented here -- https://community.datastax.com/questions/4534/. I've explained what identical configuration means in this post.
If they are not identical, follow the steps I documented here -- https://community.datastax.com/questions/4477/. Cheers!
I have the following Cassandra question:
A few days ago I developed an application using C# and a single-node Cassandra database. While the application was in production, a power failure occurred and the Cassandra commitlog got corrupted. Because of this the Cassandra node would not start, so I moved all the commitlog files to another directory and started the node.
Recently I noticed that the data from the day of the power failure is not available in the database. I still have all the commitlog files, and I know the name of the corrupted one.
Can you please suggest whether there is a way to recover the data using the commitlog files?
Also, how can I avoid commitlog corruption, so that data loss in production can be avoided?
Thank you.
There is no way to restore the node to its previous state if your commit logs are corrupted and you have no SSTables.
If your commit logs are healthy (meaning they are not corrupted), then you just need to restart your node. The commit logs will be replayed, which rebuilds the memtable(s) and flushes generation-1 SSTables to disk.
What you can do going forward is force SSTables to be written.
You can do that from the apache-cassandra/bin directory by running
nodetool flush
So if you are wary of losing commit logs, you can rebuild your node to a previous state from the SSTables created above using
nodetool.bat refresh [keyspace] [columnfamily].
Alternatively, you can also create snapshots:
nodetool snapshot
This command will take a snapshot of all keyspaces on the node. You also have the option of creating incremental backups, but these only keep a record of the latest operations.
For more info try reading
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsNodetool_r.html
I suggest you also consider adding more nodes and increasing the replication factor to avoid such scenarios in future.
Hope it helps!
We have set up a backup/restore procedure for our Cassandra production environment via snapshots. The snapshot files, schema and token ring information are copied to S3.
The production cluster is a 3-node-cluster with a replication factor of 3.
For development and test, I would like to restore the snapshots from production into separate clusters. To save money and to keep maintenance easy, it would be nice to restore only the snapshot from one production node. Since we are using a replication factor of 3 in a 3-node cluster, each snapshot should have all rows. Consistency is also not important for our use case.
Is it possible (and how) to restore only a single snapshot?
All of your data should exist on all 3 nodes, so copying the sstables from any 1 node to your test cluster should be sufficient. Running a repair shortly beforehand may be a good idea if you are worried about consistency.
First create the same schema on the test cluster. Then take a snapshot on one production node with nodetool snapshot -t cloneme. Once it completes, copy all the sstables from the snapshot folder that is created (cloneme) into the equivalent table's folder on your test cluster. Then run nodetool refresh.
It gets much more complicated if you have a different topology (more nodes, different RF), but since you're going with "every node has all the data" it's pretty trivial.
Worth mentioning that OpsCenter has a feature to automate the copying of a backup to other clusters.
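One detail in the copy step that tends to trip people up: table directories are named table-ID, and the ID usually differs between two clusters where the schema was created separately, so "the equivalent table's folder" has to be matched by table name. Below is a minimal sketch of that matching. The function names, the directory arguments and the default tag cloneme are illustrative assumptions, and the code only plans the copies instead of performing them.

```python
from pathlib import Path


def table_name(table_dir):
    """Strip the trailing ID from a directory name like 'users-<id>'."""
    return table_dir.name.rsplit("-", 1)[0]


def plan_clone_copies(source_keyspace_dir, target_keyspace_dir, tag="cloneme"):
    """Return (source_file, destination_file) pairs for one keyspace.

    Assumes each table has exactly one directory on the target cluster.
    """
    targets = {
        table_name(d): d for d in Path(target_keyspace_dir).iterdir() if d.is_dir()
    }
    plan = []
    for src_table in sorted(Path(source_keyspace_dir).iterdir()):
        snapshot_dir = src_table / "snapshots" / tag
        if not snapshot_dir.is_dir():
            continue  # this table has no snapshot with that tag
        dst_table = targets.get(table_name(src_table))
        if dst_table is None:
            raise LookupError(
                f"no directory for table {table_name(src_table)!r} on the target; "
                "create the schema there first"
            )
        for component in sorted(snapshot_dir.iterdir()):
            if component.is_file():
                plan.append((component, dst_table / component.name))
    return plan
```

The resulting pairs can be fed to shutil.copy2 (or turned into scp commands if the clusters are on different machines), followed by nodetool refresh for each table.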
I am trying to back up the whole cluster consistently. What are the different ways to back up and restore a Cassandra cluster?
If you are using the DataStax Enterprise version, then the easiest way is to perform the backups and restore using OpsCenter.
If you are using the DataStax Community or open-source version of Cassandra, then use nodetool snapshot to create backups of tables and/or keyspaces.
Please bear in mind that SSTables are immutable, i.e. they never change once they are written to disk. So unlike RDBMS data files, SSTables are not updated.
To perform a snapshot cluster-wide, use SSH tools such as pssh to perform parallel snapshots on all nodes.
More information on the snapshot utility is available here.
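If pssh is not available, the same idea can be sketched with Python's standard library: start nodetool snapshot over ssh on every node at roughly the same time, using one shared tag so the snapshots can be found later. The host names, the tag and the function names here are assumptions for illustration, and the example presumes passwordless ssh to each node. Note that starting the snapshots in parallel gets them close together in time, which is not the same as a true point-in-time snapshot of the cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def snapshot_command(host, tag, keyspace=None):
    """Build the ssh command that takes a tagged snapshot on one node."""
    remote = ["nodetool", "snapshot", "-t", tag]
    if keyspace:
        remote += ["--", keyspace]  # limit the snapshot to one keyspace
    return ["ssh", host] + remote


def run(argv):
    """Run one command and return its exit code."""
    return subprocess.run(argv, capture_output=True, text=True).returncode


def snapshot_cluster(hosts, tag, keyspace=None, runner=run):
    """Take the snapshot on all hosts in parallel; return {host: exit_code}."""
    commands = [snapshot_command(h, tag, keyspace) for h in hosts]
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        codes = list(pool.map(runner, commands))
    return dict(zip(hosts, codes))
```

A non-zero exit code for any host means that node's snapshot should be retried before the backup is considered complete. The runner parameter is there so the function can be tried out with a stub before pointing it at real nodes.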
There are several ways to restore from snapshots. One way is to re-load the data using the sstableloader tool, where the data is read back into the cluster. Another way is to copy the SSTable directory from the snapshot and run nodetool refresh. Finally, you can replace the existing data with the snapshot and restart the node.
More information on backups and restores is available here.
I'm facing a problem: Cassandra is not storing the data, although the commit log is working. I checked the locations given in the configuration .yaml file. Cassandra has created a folder named MYKEYSPACENAME, but the data is not being stored there.
Is there something that I need to do to store the data? I'm using Cassandra version 1.0.7.
It sounds like everything is working normally. When Cassandra receives data to write, it first writes to a commit log and to an in memory data structure (called a Memtable). Once the Memtable is full Cassandra will flush it to an SSTable on disk. You can force Cassandra to flush its Memtables using nodetool:
nodetool flush [keyspace] [cfnames]
This is not something that you need to do for normal operation of a Cassandra ring. Cassandra will eventually flush the Memtables to disk. If for some reason one of your Cassandra machines goes down, when it restarts it will replay the commit log so you will not lose any previously received writes.
The Cassandra wiki has more information on Memtables and SSTables.
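To make the write path described above a little more concrete, here is a toy model of it in Python. It is not how Cassandra is implemented; the class name ToyNode, the tiny memtable limit and the use of plain dicts and lists are all simplifications chosen for illustration. It shows why no data files appear until a flush, and why a restart before the flush does not lose writes.

```python
class ToyNode:
    """A toy model of the write path; not Cassandra's actual implementation."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # durable, append-only record of writes
        self.memtable = {}    # in-memory structure, lost on a crash
        self.sstables = []    # immutable "files" written by flushes

    def write(self, key, value):
        # Every write goes to the commit log and to the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Like `nodetool flush`: turn the memtable into a new SSTable.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def crash_and_restart(self):
        # The memtable is lost, then rebuilt by replaying the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            if key in sstable:
                return sstable[key]
        return None
```

With a limit of 3, two writes leave sstables empty (the situation in the question), a crash and restart still returns both values, and the third write triggers the first flush.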