Restore Cassandra snapshot to new keyspace in same cluster

I've found documentation on restoring a keyspace snapshot to the same keyspace and also restoring it to a new cluster. However, I'm trying to make a copy of a keyspace in Cassandra and cannot find how to restore a snapshot to a new keyspace. Does anyone know if this is possible or have other recommendations on how to make a copy of the keyspace?

Step 1:
In your new keyspace, redefine the column families the same way they were defined in the old keyspace. You can get the list of commands by running this CQL against the old keyspace:
DESCRIBE KEYSPACE <old_keyspace_name>;
Note that the new keyspace's replication factor, etc., should remain the same as the old one's.
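A minimal sketch of step 1, assuming the old and new keyspaces are named old_ks and new_ks (both names are illustrative):
# dump the old schema, rename the keyspace in it, and replay it
cqlsh -e "DESCRIBE KEYSPACE old_ks" > old_ks_schema.cql
sed 's/old_ks/new_ks/g' old_ks_schema.cql > new_ks_schema.cql
cqlsh -f new_ks_schema.cql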
Step 2 (do this on each node):
Under the old keyspace folder inside the Cassandra data directory, there should be a snapshots folder per ColumnFamily. Copy the SSTables directly from the snapshot folders to the relevant ColumnFamily folders of the new keyspace inside the Cassandra data directory.
Step 3:
Do a rolling restart, and run repair on each node.
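A minimal sketch of steps 2 and 3 on one node, assuming keyspaces old_ks/new_ks and table and snapshot names that are purely illustrative:
# Step 2: copy the snapshot SSTables of each table into the new keyspace's table folder
cp /var/lib/cassandra/data/old_ks/mytable-<old_table_uuid>/snapshots/<snapshot_name>/*.db \
   /var/lib/cassandra/data/new_ks/mytable-<new_table_uuid>/
# Step 3: once all tables on all nodes are copied, restart each node in turn, then on every node:
nodetool repair new_ks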

Related

Cassandra - alter Keyspace without downtime

I'm coming from a relational database background. In PostgreSQL or MySQL, an ALTER statement locks the entire table.
I have a Cassandra cluster (3 nodes) that uses SimpleStrategy for all the keyspaces. One keyspace has 4 tables and each table has 500GB of data.
So if I alter the keyspace to change the strategy, is there any lock or block?
The change itself is fast; a keyspace in Cassandra is not a physical object, it's just metadata. But when you change replication, you need to run a repair operation, as per the documentation:
Simply altering the keyspace may lead to faulty data replication.
If you have one datacenter you may not hit this problem, but it is still better to run the repair.
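For example, changing the strategy and then repairing would look roughly like this (keyspace name, datacenter name and replication factor are illustrative):
ALTER KEYSPACE my_keyspace WITH replication = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
# then, on each node in turn
nodetool repair my_keyspace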

Lost data after running nodetool decommission

I have a 3 node cluster with 1 seed and nodes in different zones, all running in GCE with GoogleCloudSnitch.
I wanted to change the hardware on each node, so I started by adding a new seed in a different region, which joined the cluster perfectly. Then I ran "nodetool decommission" and, once it was done, the node was down, and "nodetool status" showed it was no longer in the cluster, I removed the node. I did this for all nodes, and lastly I did it on the "extra" seed in the different region, just to remove it and get back to a 3 node cluster.
We lost data! What could the problem be? I saw a command, "nodetool rebuild", which I ran and actually got some data back. "nodetool cleanup" didn't help either. Should I have run "nodetool flush" prior to "decommission"?
At the time of running "decommission" most keyspaces had ..
{'class' : 'NetworkTopologyStrategy', 'europe-west1' : 2}
Should I have first altered the keyspaces to include the new region/datacenter, which would be 'europe-west3' : 1 since only one node exists in that datacenter? I also noted that some keyspaces in the cluster mistakenly had ..
{ 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
Could this have caused the loss of data? It seems the data was lost in the SimpleStrategy keyspaces.
(Disclaimer: I'm a ScyllaDB employee)
Did you first add new nodes to replace the ones you were decommissioning, and configure the keyspace replication strategy accordingly? (You only mentioned the new seed node in your description; you did not say whether you did this for the other nodes.)
Your data loss can very well be a result of the following:
Not altering the keyspaces to include the new region/zone with the proper replication strategy and replication factor.
Keyspaces that were configured with SimpleStrategy (a non-topology-aware replication policy) and a replication factor of 1. This means the data was stored on only one node, and once that node went down and was decommissioned, you basically lost the data.
Did you by any chance take snapshots and store them outside your cluster? If you did, you could try to restore them.
I would highly recommend reviewing these procedures for a better understanding of the proper way to perform what you intended:
http://docs.scylladb.com/procedures/add_dc_to_exist_dc/
http://docs.scylladb.com/procedures/replace_running_node/
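For reference, before decommissioning the original nodes, each keyspace would need to be altered to replicate into the new datacenter and the data rebuilt there, roughly along these lines (keyspace name and replication factors are illustrative):
ALTER KEYSPACE my_keyspace WITH replication = {'class' : 'NetworkTopologyStrategy', 'europe-west1' : 2, 'europe-west3' : 1};
# on each node in the new datacenter, stream the existing data from the old one
nodetool rebuild europe-west1
# only after repairs complete, decommission the old nodes one at a time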

Cassandra Drop Keyspace Snapshot Cleaning

I was reading in the Cassandra documentation that:
Cassandra takes a snapshot of the keyspace before dropping it. In Cassandra 2.0.4 and earlier, the user was responsible for removing the snapshot manually.
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/drop_keyspace_r.html
This would imply that in versions after Cassandra 2.0.4, this is done automatically. If so, what configuration parameter (if any) sets the time before snapshot is automatically removed when doing a DROP KEYSPACE?
For example, in the case of DROP TABLE, gc_grace_seconds is
the number of seconds after data is marked with a tombstone (deletion marker) before it is eligible for garbage-collection.
I believe this reference is not accurate; Cassandra does not automatically clean up snapshots for you.
Cassandra won’t clean up the snapshots for you
http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#snapshot-before-compaction
You can remove snapshots using the nodetool clearsnapshot command, or manually delete the directories & files yourself (this is safe as snapshots are just file hard-links).
Note also that gc_grace_seconds is not related to snapshots; it is used during compactions only.
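A quick sketch of cleaning up such a snapshot manually (the snapshot tag and keyspace name are illustrative; check what nodetool listsnapshots actually reports on your node):
# list the snapshots present on this node
nodetool listsnapshots
# remove a specific snapshot for a specific keyspace
nodetool clearsnapshot -t <snapshot_tag> mykeyspace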

Cassandra: Migrate keyspace data from Multinode cluster to SingleNode Cluster

I have a keyspace in a multi-node cluster in a QA environment. I want to copy that keyspace to my local single-node cluster. Is there any direct way to do this? I can't afford to write code like an SSTableLoader implementation at this point in time. Please suggest the quickest way.
Make sure you have plenty of free disk space on your new node and that you've properly set replication factor and consistency levels in your tests/build for your new, single node "cluster"
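For example, on the single-node cluster the keyspace would typically end up defined (or altered) with a replication factor of 1 (keyspace name is illustrative):
ALTER KEYSPACE my_keyspace WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1};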
First, restore the exact schema from the old cluster to your new node. After that the data can be loaded in two ways:
1.) Execute the "sstableloader" utility on every node in your old cluster and point it at your new node. sstableloader is token aware, but in your case it will end up shipping all data to your new, single node cluster.
sstableloader -d NewNode /Path/To/OldCluster/SStables
2.) Snapshot the keyspace and copy the raw sstable files from the snapshot folders of each table in your old cluster to your new node. Once they're all there, copy the files to their corresponding table directory and run "nodetool refresh."
# Rinse and repeat for all tables
nodetool snapshot -t MySnapshot
cd /Data/keyspace/table-UUID/snapshots/MySnapshot/
rsync -avP ./*.db User@NewNode:/NewData/Keyspace/table-UUID
...
# when finished, exec the following for all tables in your new node
nodetool refresh keyspace table
Option #1 is probably best because it will stream the data and compact it naturally on the new node. It's also less manual work. Option #2 is good, quick, and dirty if you don't have a direct line from one cluster to the other. Either way, you likely won't notice much difference since it's presumably a relatively small keyspace for QA.

Cassandra keyspace disappeared leading to data loss

I was adding a node (cassandra-03) to my Cassandra 2.1.8 cluster (2 existing nodes, cassandra-01 and cassandra-02, 160+GB each, 1 keyspace), following http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html.
At stage #3 (after restarting each node), I realized that on my existing nodes (cassandra-01 and cassandra-02) my keyspace had disappeared, but the data is still on the filesystem.
nodetool status gives the expected output (a 3-node cluster), except for the load column (I was expecting 160GB on cassandra-01 and cassandra-02), where I only see a few KB.
I moved forward on step #4 and ran nodetool cleanup on cassandra-01. It worked in a few seconds, but my keyspace is still missing.
I re-created my keyspace via cqlsh, hoping cassandra will use the data sitting on the filesystem, with no luck.
Nothing weird on the logs, as far as I can tell.
How could I get my keyspace data back?
I wasn't able to use the SSTable files in my new keyspace (created with the same name as the original one), so I used sstableloader tool to reinject my data into my newly created keyspace (with all the tables created):
$ sudo mv /var/lib/cassandra/data/mykeyspace /otherlocation/mykeyspace
$ sstableloader -d <host> -f /etc/cassandra/cassandra.yaml -v /otherlocation/mykeyspace/tablename-<token>;
