Cassandra - copying sstable snapshot from one cluster to another

I know there are several similar questions out there, but I'm still confused about this. As there is a need for this mechanism (copying data from one cluster to another), I'm looking for a little clarification.
Let's assume a very simple scenario. I want to copy a table from one Cassandra cluster (C1) to another (C2). The table I'm copying is called "item".
Let's assume the node count of each cluster is the same (source and target clusters have 4 nodes each). Not sure whether that matters or not.
I'm attempting to use snapshots and sstableloader to do the trick. I have been able to create a snapshot and copy the snapshot files from C1:N1 (cluster 1 node 1: .../myspace/item-xxxxxx/snapshot/######) to the target table directory on C2:N1 (cluster 2 node 1: .../myspace/item-xxxxxx). I used sstableloader to load the data and ran nodetool repair. Perfect. The only problem is that since the loaded snapshot came from only one of the source nodes, I only "restored" part of the data (about 485 of the 1k rows).
So I'm thinking I'll copy the snapshot from C1:N2 to C2:N1 as well and load it up. The problem is that all of the table files already exist on C2:N1. If I copy the snapshot files from C1:N2 into the table directory on C2:N1, I'll blow away the files that are already there. I didn't check all 4 target nodes, but I did check node 2 of the target, and the item table directory already existed there too with data files. I'm guessing all of the nodes on the target have data files, so I'm stuck with how to sstableload the other 3 source node snapshot files.
So long story short (if that's possible):
How am I supposed to load multiple source snapshot files (one set from each host on the source cluster) into a target cluster? And to complicate matters, does it matter if the source and target clusters have a different number of nodes? (I would think that having fewer nodes on the target would potentially be a bigger problem.)
What is really needed here, in my opinion, is a way to run sstableloader on the SOURCE cluster and have it stream the data to a target cluster. That would make life a lot easier, I would think.
Thanks in advance.
-Jim

There are two options for bulk loading, and it seems you may have them semi-merged together. You are mostly referring to the "copy the sstables" mechanism, which is pretty manual and may not be worth the trouble unless performance of the restore is the top priority. Using sstableloader is different and doesn't require any of that.
The sstableloader tool will connect to a node, find all the nodes in that node's cluster, and use the connection to build metadata/discovery. It will split and stream the sstables that you select to the target cluster into the appropriate token ranges (you won't need the repair). You can run sstableloader from the source cluster's nodes and point it at the destination cluster; you don't need to copy the sstables over yourself (although if the clusters are in different DCs, copying first may be a bit faster).
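For example, a minimal sketch of that workflow, assuming hypothetical paths, snapshot tag, and target address (sstableloader infers the keyspace and table names from the last two path components, so the snapshot files are staged into a matching directory first):

    # Stage the snapshot into a keyspace/table-shaped directory
    # (paths and snapshot tag are hypothetical).
    mkdir -p /tmp/load/myspace/item
    cp /var/lib/cassandra/data/myspace/item-*/snapshots/<snapshot_tag>/* /tmp/load/myspace/item/
    # -d takes contact point(s) in the TARGET cluster; repeating this
    # on each source node streams that node's share of the data.
    sstableloader -d <target_node_ip> /tmp/load/myspace/item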
If you have OpsCenter, these steps can be automated for you through a GUI: https://docs.datastax.com/en/opscenter/5.2/opsc/online_help/services/opscBackupCloneCluster.html

Related

Cassandra: How to find node with matching token for restoring to newer cluster?

I want to restore data from an existing cluster to a newer cluster. I want to do so using the method of copying the snapshot SSTables from the old cluster to the keyspaces of the newer cluster, as explained in http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html.
The same document says, " ... the snapshot must be copied to the correct node with matching tokens". What does "node with matching tokens" really mean?
My current cluster has 5 nodes, and each node has num_tokens: 256. I am going to create another cluster with the same number of nodes, the same num_tokens, and the same schema. Do I need to follow the ring order while copying SSTables to the newer cluster? How do I find the matching target node for a given source node?
I tried the command "nodetool ring" to check whether I can use token values to match nodes up. But this command gives all the tokens for each host. How can I get the single token value (the one which determines the position of the node in the ring)? If I can get it, then I can find the matching nodes as well.
With vnodes it's really hard to copy the sstables over correctly, because it's not just one assigned token that you have to reassign, but 256. To do what you're asking you need to do some additional steps, described at http://datascale.io/cloning-cassandra-clusters-fast-way/. Basically, reassign the 256 tokens of each node to a new node in the other cluster so the ring is the same. The article you listed describes loading onto the same cluster, which is a lot simpler because you don't have to worry about different topologies. Worth noting that even in that scenario, if a new node was added or a node was removed since the snapshot, it will not work.
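A rough sketch of that token reassignment, assuming a hypothetical node address ("nodetool ring" prints one row per token, with the token in the last column):

    # Collect the 256 tokens a given source node owns as a comma list.
    nodetool ring | awk '$1 == "10.0.0.1" {print $NF}' | paste -sd, -
    # Pin that list on the matching target node in cassandra.yaml
    # before its first start:
    #   num_tokens: 256
    #   initial_token: <comma-separated list from above>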
The safest bet will be to use sstableloader: it will walk through the sstables and distribute the data to the appropriate nodes. It also opens up the possibility of making changes without worrying whether everything is correct, and it ensures everything lands on the correct nodes, so there are no worries about human error. Each node in the original cluster can just run sstableloader on its sstables against the new cluster, and you will parallelize the work pretty well.
I would strongly recommend you use this opportunity to decrease the number of vnodes to 32. The 256 default is excessive and absolutely horrible for rebuilds, Solr indexes, Spark, and, most of all, it ruins repairs. Especially if you use incremental repairs (the default), the additional ranges will cause many more anticompactions and much more load. If you use sstableloader on each sstable, it will just work. Increasing your streaming throughput in cassandra.yaml will potentially speed this up a bit as well.
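As a sketch, assuming a Debian-style config path, those two tweaks on the new cluster's nodes might look like:

    # Lower the vnode count before a node's FIRST start (it cannot be
    # changed on a node that already has data).
    sudo sed -i 's/^num_tokens:.*/num_tokens: 32/' /etc/cassandra/cassandra.yaml
    # Raise streaming throughput (Mb/s) for the bulk load; also
    # adjustable at runtime:
    nodetool setstreamthroughput 400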
If by chance you're using OpsCenter, this backup-and-restore to a new cluster is automated as well.

Cassandra - avoid nodetool cleanup

If we have added new nodes to a C* ring, do we need to run "nodetool cleanup" to get rid of the data that has now been assigned elsewhere? Or is this going to happen anyway during normal compactions?
During normal compactions, does C* remove data that no longer belongs on this node, or do we need to run "nodetool cleanup" for that? Asking because "cleanup" takes forever and crashes the node before finishing.
If we need to run "nodetool cleanup", is there a way to find out which nodes now have data they should no longer own? (I.e., data that now belongs on the new nodes but is still present on the old nodes because no one removed it. This is the data that "nodetool cleanup" would remove.) We have RF=3 and two data centers, each of which has a complete copy of the data. I assume we need to run cleanup on all nodes in the data center where we have added nodes, because each row on a new node used to be on another node (the primary), plus two copies (replicas) on two other nodes.
If you are on Apache Cassandra 1.2 or newer, cleanup checks the metadata on the files so that it only does something if it needs to. So you are safe to just run it on every node, and only those nodes with extra data will do anything. The data will not be removed during the normal compaction process; you have to call cleanup to remove it.
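A minimal sketch of that, assuming hypothetical host names and SSH access (restricting cleanup to one keyspace is optional but keeps the scope down):

    # Run serially, one node at a time, to limit the I/O impact.
    for h in node1 node2 node3 node4; do
        ssh "$h" nodetool cleanup my_keyspace
    done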
What I found helpful is to just compare how much space each node occupies in the data folder (for me it was /var/lib/cassandra/data). Some things like snapshots might differ between the nodes, but when you see that newer nodes use much less disk space than older ones, it might be because the older ones never had a cleanup after the newer ones were added. While you are there, you can also check what the biggest .db file in there is, and check whether your storage has enough free space to store another file of that size. Cleanup seems to copy the data of the .db files into new ones, minus the data that is now on other nodes, so you might need that extra space while it runs.
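For example, a quick check along those lines, assuming the default data path (the directory layout under it varies a bit by version):

    # Per-node size of the data directory (compare across nodes).
    du -sh /var/lib/cassandra/data
    # Largest single SSTable data file, to estimate the temporary
    # headroom a cleanup rewrite may need.
    ls -lhS /var/lib/cassandra/data/*/*/*-Data.db | head -n 1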

Cassandra multiple disk per node setup

Intro
I have a Cassandra 1.2 cluster, and all the nodes have SSDs. Now I want to add more disks to the existing nodes, but I want to be able to choose which tables are stored on which disks.
Problem
For example, node 1 will have 3 SSDs and 1 regular disk drive, and I want all the column families except one (let's call it the "discord" table) to be stored on the SSDs only; the final table, "discord", needs to be stored on the regular disk.
According to the documentation this should be possible; however, the only way of doing it that I can see is:
Setting up Cassandra to use multiple data_files_directories in cassandra.yaml.
Creating the tables.
Creating a link from the data directory on each SSD to the directory on the hard disk where I want to store the column family.
Question
Is this the only way of doing it? Or is there a simpler way of configuring a node to work this way?
You can set multiple directories using the data_file_directories property, but the data is distributed over the folders internally by Cassandra. You cannot choose which keyspace or column family goes to each directory.
So the symbolic link is the way to go, in my opinion.
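A sketch of that symlink approach, with hypothetical paths, a placeholder table-directory name, and a Debian-style service setup:

    # Stop the node cleanly, relocate the table directory to the HDD,
    # and link it back into place in the SSD data directory.
    nodetool drain
    sudo service cassandra stop
    mv /ssd1/cassandra/data/myspace/discord-xxxxxx /hdd/cassandra/
    ln -s /hdd/cassandra/discord-xxxxxx /ssd1/cassandra/data/myspace/discord-xxxxxx
    sudo service cassandra start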

copy data from cassandra to cassandra

I have a production cluster of 20 nodes with replication factor 3, and I want to copy part of the data, ~600GB (including the 3 replicas), to my test environment, which has a replication factor of only 1.
I know we can use sstableloader, but do we need to copy all 600GB over the network to the other cluster?
Is there a way to move only one copy of the data to the other cluster?
What is the best way to do it?
I am assuming you are using RandomPartitioner. What you should do depends on how many nodes are in your test environment.
In the case of SimpleStrategy:
A. If you are using 20 nodes in your test environment:
1. Assign the same tokens to each node in your test environment as in production.
2. Use nodetool snapshot on all nodes at the same time.
3. Copy the data from the snapshot directories of each production node to the test node with the same token.
4. To change the replication factor to 1, simply update the keyspace with the new replication settings, as described at http://wiki.apache.org/cassandra/Operations#Replication (see the cqlsh sketch after this list).
5. Run cleanup on each node.
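For step 4, a minimal sketch using CQL (on Cassandra 1.2 or newer), assuming the keyspace is named "myspace":

    cqlsh -e "ALTER KEYSPACE myspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};"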
B. If you are using fewer nodes than in production:
1. Evenly assign tokens to the new nodes to get a balanced ring.
2. Use nodetool snapshot on all nodes at the same time (see the loop sketched after this list).
3. You will have to copy all the data from all the nodes in production to each node in your test environment.
4. If you are using LeveledCompaction, make sure you remove metadata.json from the data directory of each column family using that compaction before starting the node. This makes LeveledCompaction re-compact and regroup the levels correctly on your new setup.
5. Same as step 4 above.
6. Same as step 5 above.
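A sketch of the coordinated snapshot step (step 2 in both lists), assuming SSH access and hypothetical keyspace and tag names; "nodetool status" prefixes live nodes with UN:

    # Take an identically tagged snapshot on every production node.
    for h in $(nodetool status | awk '/^UN/ {print $2}'); do
        ssh "$h" nodetool snapshot -t test_clone myspace &
    done
    wait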
You can skip the snapshot and copy the data directories straight over if you don't care about the restored data being consistent to a point in time for testing.
Things to consider:
This process affects your disk I/O dramatically. If you are doing it on a live cluster, use the snapshot to at least lock the state at a point in time, and copy gradually.
In the case of NetworkTopologyStrategy:
You can repeat the process above, but only copy from a combination of nodes that are in one rack and together hold 100% of the data. If you absolutely care about possible missed writes to nodes on other racks that were not replicated to the nodes in this rack, then you will have to copy everything from all the nodes as above.
Ideal solution:
If you are going to do this every day for testing, like I do for my company, you will want to build some automation around it. The best automation for backup and restore, in my opinion, is Netflix's Priam: https://github.com/Netflix/Priam
I have production backups stored in S3. The code brings up new machines in test and assigns the same tokens for one zone; I set the Priam snapshot time to the range from the last day's backup, and the test nodes then automatically receive the data from the S3 backups.
Hope my answer helped you.

How does cassandra split keyspace data when multiple directories are configured?

I have configured two separate data directories in the cassandra.yaml file, as given below:
data_file_directories:
- E:/Cassandra/data/var/lib/cassandra/data
- K:/Cassandra/data/var/lib/cassandra/data
When I create a keyspace and insert data, the keyspace gets created in both directories and the data gets scattered across them. What I want to know is: how does Cassandra split the data between multiple directories? And what is the rule behind this?
You are using the JBOD feature of Cassandra when you add multiple entries under data_file_directories. Data is spread evenly over the configured drives proportionate to their available space.
This also lets you take advantage of the disk_failure_policy setting. You can read about the details here:
http://www.datastax.com/dev/blog/handling-disk-failures-in-cassandra-1-2
In short, you can configure Cassandra to keep going, doing what it can, if a disk becomes full or fails completely. This has advantages over RAID0 (where you would effectively have the same capacity as JBOD) in that you do not have to restore the whole data set from backup (or do a full repair) but just run a repair for the missing data. On the other hand, RAID0 provides higher throughput (depending on how well you know how to tune RAID arrays to match filesystem and drive geometry).
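As a sketch, assuming a Debian-style config path, opting into the more JBOD-friendly policy might look like this (the values in 1.2 are stop, best_effort, and ignore):

    # With best_effort, the node blacklists the failed disk and keeps
    # serving what it can from the remaining data directories.
    sudo sed -i 's/^disk_failure_policy:.*/disk_failure_policy: best_effort/' /etc/cassandra/cassandra.yaml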
If you have the resources for a fault-tolerant, more performant RAID setup (like RAID10, for example), you may want to just use a single directory for simplicity. Most deployments are starting to lean towards the density route, though, using JBOD rather than systems-level tolerance.
You can read about the thought process behind the development of this issue here:
https://issues.apache.org/jira/browse/CASSANDRA-4292
From what I am able to gather about how the keyspace is split between multiple data directories: based on the maximum available space and the load on each directory, SSTables of the same column family are written to different data directories.
