Cassandra: Can't one use snapshots to rapidly scale out a cluster? - cassandra

This details how to replicate data to a new cluster:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html
Can't a similar scheme be used to rapidly scale out a cluster with existing data? Say take a snapshot of all the nodes, copy them all to new nodes, set the tokens in the yaml, set the peers to point to the old instances, and then join them up?
Won't they be treated like nodes that once were part of the cluster and were rejoined?

That won't work, because snapshots are specific to the node on which they are taken. Once you add (or remove) a node, the token ranges on all nodes are recalculated, and you immediately invalidate any existing snapshots. Restoring a snapshot to another node would appear to work, but that node would only serve the data which happened to match its token ranges.
Worse, it would claim ownership of whatever data matches its token ranges, whether or not the snapshot you restored actually contained that data. Not a good scenario.
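You can see the effect yourself by capturing the ring layout before and after a node joins; the commands below are a minimal sketch (nodetool output formats differ slightly between versions):

    # capture token-to-node ownership before the topology change
    nodetool ring > ring-before.txt

    # ... add or remove a node, wait for it to show as UN in `nodetool status` ...

    # capture it again and compare: every differing line is a range whose owner changed,
    # so snapshots taken before the change no longer line up with the new ownership
    nodetool ring > ring-after.txt
    diff ring-before.txt ring-after.txt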

Related

Cassandra: How to find node with matching token for restoring to newer cluster?

I want to restore data from an existing cluster to a newer cluster. I want to do so by copying the snapshot SSTables from the old cluster into the keyspaces of the newer cluster, as explained in http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html.
The same document says, "... the snapshot must be copied to the correct node with matching tokens". What does "node with matching tokens" really mean?
My current cluster has 5 nodes, each with num_tokens: 256. I am going to create another cluster with the same number of nodes, the same num_tokens, and the same schema. Do I need to follow the ring order while copying SSTables to the newer cluster? How do I find the matching target node for a given source node?
I tried the command "nodetool ring" to check if I can use token values to match nodes, but this command lists all the tokens for each host. How can I get the single token value (which determines the position of the node in the ring)? If I can get it, then I can find the matching nodes as well.
With vnodes it's really hard to copy the sstables over correctly, because it's not just one assigned token that you have to reassign, but 256. To do what you're asking you need the additional steps described at http://datascale.io/cloning-cassandra-clusters-fast-way/. Basically, reassign the 256 tokens of each node to a new node in the other cluster so the ring is the same (a sketch of this is below). The article you listed describes loading onto the same cluster, which is a lot simpler because you don't have to worry about different topologies. Worth noting that even in that scenario, if a node was added or removed since the snapshot, it will not work.
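As a rough sketch of that token-cloning step, assuming a nodetool whose ring output has the token in the last column (check your version's output) and a hypothetical source node 10.0.0.1 paired with one new node in the other cluster:

    # collect the 256 tokens owned by the source node, as one comma-separated line
    nodetool ring | awk '/10\.0\.0\.1/ {print $NF}' | paste -sd, - > tokens-10.0.0.1.txt

    # on the matching node in the new cluster, BEFORE its first start, pin those tokens
    # in cassandra.yaml:
    #   num_tokens: 256
    #   initial_token: <contents of tokens-10.0.0.1.txt>
    # then start the node, and repeat for every node pair so the new ring matches the old one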
The safest bet is to use sstableloader: it will walk through each sstable and distribute the data to the appropriate nodes. That also opens up the possibility of making changes to the target cluster without worrying whether everything still lines up, and it removes the risk of human error, since it guarantees everything ends up on the correct nodes. Each node in the original cluster can run sstableloader on its own sstables against the new cluster, which parallelizes the work pretty well.
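For example (hostnames and paths are illustrative; sstableloader infers the keyspace and table from the last two components of the path):

    # run on each node of the original cluster, once per table directory,
    # pointing -d at one or more nodes of the NEW cluster
    sstableloader -d new-node1,new-node2 /var/lib/cassandra/data/my_keyspace/my_table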
I would strongly recommend you use this opportunity to decrease the number of vnodes to 32. The 256 default is excessive and absolutely horrible for rebuilds, Solr indexes, Spark, and most of all it ruins repairs. Especially if you use incremental repairs (the default), the additional ranges will cause many more anticompactions and much more load. If you use sstableloader on each sstable it will just work. Increasing your streaming throughput in cassandra.yaml will potentially speed this up a bit as well.
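The two knobs mentioned above look roughly like this (property names as in the 2.x/3.x cassandra.yaml; change num_tokens only before a node bootstraps for the first time):

    # cassandra.yaml on the new cluster, before first start:
    #   num_tokens: 32
    #   stream_throughput_outbound_megabits_per_sec: 400    # default is 200

    # or raise streaming throughput on a running node without a restart:
    nodetool setstreamthroughput 400
    nodetool getstreamthroughput    # verify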
If by chance you're using OpsCenter, this backup-and-restore-to-a-new-cluster workflow is automated as well.

cassandra cluster, 1 table, how to plan forward

I am planning to create an application that will use just 1 Cassandra table. The replication factor will probably be 2 or 3. I might start with 2 Cassandra servers and then keep adding servers as needed. But I am not sure if I need to pre-plan anything so that the table is distributed uniformly when I add more servers. Are there any best practices or things I need to be aware of? I read about tokens (http://www.datastax.com/docs/1.1/initialize/token_generation), but I am not sure what I need to do.
I suppose the keys have to be distributed uniformly in the cluster, so:
how will that happen, i.e. when I add the 2nd server and the 1st one already has, say, 1 million keys?
do I need to pre-plan the keyspace or tables?
I can suggest two things.
First, when designing your schema, pick a good partition key (1st column in the primary key). You need to ensure a couple of things:
There are enough distinct values that the data can be distributed across an arbitrary number of nodes. For example, sex would be a bad partition key, because it only has two values and therefore the data could only ever be spread across two nodes.
The distribution across different partition key values is more or less uniform. For example, country might not be the best choice, because most of your rows will probably fall into just a few countries.
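As an illustration (a made-up schema, run through cqlsh): partitioning by a high-cardinality column such as user_id spreads rows across the whole ring, whereas partitioning by country would pile most rows onto a handful of partitions:

    cqlsh -e "
      CREATE KEYSPACE IF NOT EXISTS demo
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

      -- good: user_id has enough distinct values to spread evenly across nodes
      CREATE TABLE IF NOT EXISTS demo.users_by_id (
        user_id uuid,
        country text,
        name    text,
        PRIMARY KEY (user_id)
      );
      -- risky: a few populous countries would own most of the data
      -- CREATE TABLE demo.users_by_country (country text, user_id uuid, name text,
      --                                     PRIMARY KEY (country, user_id));
    "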
Secondly, to ease deployment of new nodes later consider setting up your cluster to use virtual nodes (vnodes). If you do that you will be able to skip a few steps when expanding your cluster.
To configure virtual nodes, set num_tokens in cassandra.yaml to more than 1. This will decide how many virtual nodes your node will have. A recommended value is 256.
Later, when you add new nodes, make sure auto_bootstrap is set to true in cassandra.yaml for your new nodes (it defaults to true). Then you configure the network parameters as usual to match your cluster, and finally start the node. It should automatically bootstrap and start streaming the appropriate data. After everything has settled down, you can run nodetool cleanup on your other nodes to make sure they purge the redundant data they are no longer responsible for.
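A condensed sketch of adding one node with vnodes (addresses are placeholders; start Cassandra however your platform does it):

    # cassandra.yaml on the new node, before first start:
    #   cluster_name: same value as the existing nodes
    #   num_tokens: 256
    #   auto_bootstrap: true                         # the default
    #   seed_provider / seeds: "10.0.0.1,10.0.0.2"   # a couple of existing nodes

    # start the node, then watch it join:
    nodetool status        # wait until the new node is listed as UN (Up/Normal)

    # afterwards, on each of the OLD nodes:
    nodetool cleanup       # drop the data they no longer own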
For more detailed documentation, please see http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html

copy data from cassandra to cassandra

I have a production cluster of 20 nodes with replication factor 3, and I want to copy a part of the data, ~600 GB (including the 3 replicas), to my test environment, which uses a replication factor of 1.
I know we can use sstableloader, but do we need to copy all 600 GB over the network to the other cluster?
Is there a way to move only one copy of the data to the other cluster?
What is the best way to do it?
I am assuming you are using RandomPartitioner. What you need to do depends on how many nodes are in your test environment.
In the case of SimpleStrategy:
A. If you have 20 nodes in your test environment:
1. Assign each node in your test environment the same token as its counterpart in production;
2. Use nodetool snapshot on all nodes at the same time;
3. Copy the data from the snapshot directories of each production node to the test node with the same token;
4. To change the replication factor to 1, simply update the keyspace with the new replication setting, as described here: http://wiki.apache.org/cassandra/Operations#Replication
5. Run cleanup on each node.
B. If you are using fewer nodes than production:
1. Evenly assign tokens to the new nodes to get a balanced ring;
2. Use nodetool snapshot on all nodes at the same time;
3. Copy all the data from all the nodes in production to each node in your test environment. If you are using LeveledCompaction, make sure you remove metadata.json from the data directory of any column family using that compaction strategy before starting the node, so that LeveledCompaction recompacts and groups the levels correctly on your new setup;
4. Same as step 4 above;
5. Same as step 5 above.
You can skip the snapshot and copy the data directories straight over if you don't care about consistency of data at a point in time in your restored version for testing.
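A sketch of case A for a single node pair (hostnames, keyspace, table, and paths are made up, and the data directory layout differs a little between versions; case B is the same except that every test node receives the data from all production nodes):

    # 1. pin the test node's token to match its production counterpart
    #    (cassandra.yaml on the test node, before first start)
    #      initial_token: <token of prod-node-01, taken from `nodetool ring`>

    # 2. snapshot the production node (repeat on every node at roughly the same time)
    nodetool snapshot -t clone_for_test my_keyspace

    # 3. copy the snapshot for each column family to the matching test node
    rsync -a /var/lib/cassandra/data/my_keyspace/my_table/snapshots/clone_for_test/ \
        test-node-01:/var/lib/cassandra/data/my_keyspace/my_table/

    # 4. drop the replication factor on the test cluster
    cqlsh test-node-01 -e "ALTER KEYSPACE my_keyspace WITH replication =
        {'class': 'SimpleStrategy', 'replication_factor': 1};"

    # 5. on each test node, purge data it does not own
    nodetool cleanup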
Things to consider:
This process affects your disk I/O dramatically. If you are doing it on a live cluster, use the snapshot to at least lock in the state at a point in time and copy it over gradually.
In case of NetworkTopologyStrategy:
You can repeat the process above, but only copy from a combination of nodes that are all in one rack and together hold 100% of the data. If you absolutely care about possible missed writes to nodes in other racks that were not replicated to the nodes in this rack, then you will have to copy everything from all the nodes as above.
Ideal solution:
If you are going to do this every day for testing, like I do for my company, you will want to build some automation around it. The best automation for backup and restore, in my opinion, is Netflix's Priam: https://github.com/Netflix/Priam
I have production backups stored in S3. The automation brings up new machines in test and assigns them the same tokens as one zone of production; I set the Priam snapshot time to the range covering the previous day's backup, and the test nodes then automatically receive the data from the S3 backups.
Hope my answer helped you.

What would be the exact procedure to add new nodes to a Cassandra cluster so that the cluster remains balanced?

I've read the relevant documentation I could find, but I still have doubts.
What I read
From http://wiki.apache.org/cassandra/Operations#Moving_nodes
If you add nodes to your cluster your ring will be unbalanced and only way to get perfect balance is to compute new tokens for every node and assign them to each node manually by using nodetool move command.
and from http://www.datastax.com/docs/1.1/operations/cluster_management#adding-capacity-to-an-existing-cluster
If you need to increase capacity by a non-uniform number of nodes, you must recalculate tokens for the entire cluster, and then use nodetool move to assign the new tokens to the existing nodes. After all nodes are restarted with their new token assignments, run a nodetool cleanup to remove unused keys on all nodes
But I'm not clear on the order of these things.
Could you explain how to do it in the following scenario?
I'm using cassandra 1.1.9, so no virtual nodes are in use.
I have a cluster ring with 5 nodes, and each owns 20%
Their tokens are
0
34028236692093846346337460743176821145
68056473384187692692674921486353642291
102084710076281539039012382229530463436
136112946768375385385349842972707284582
I want to add 2 additional nodes.
What steps do I have to follow? I know I should install and configure cassandra, use the original 5 as seeds, and calculate their new tokens, but in what order should I move the data using nodetool move? Is it one at a time?
What happens with the data when I move the first one? Is it available at all times?
Should I start the two new nodes before moving the original 5 to their new tokens?
A step by step guide would be ideal.
Please note that I need to do it pre version 1.2
The new tokens should be
0
24305883351495604533098186245126300818
48611766702991209066196372490252601636
72917650054486813599294558735378902454
97223533405982418132392744980505203272
121529416757478022665490931225631504090
145835300108973627198589117470757804908
calculated using 2^127/7 * {0-6}.
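You can double-check that arithmetic from the shell, for example with bc (RandomPartitioner's token space runs from 0 to 2^127 - 1):

    for i in $(seq 0 6); do
        echo "$i * (2^127 / 7)" | bc
    done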
What steps do I have to follow?
in what order should I move the data using nodetool move?
You should
Bootstrap one new node at 48611766702991209066196372490252601636
Bootstrap the other new node at 121529416757478022665490931225631504090
Move 34028236692093846346337460743176821145 to 24305883351495604533098186245126300818
Move 68056473384187692692674921486353642291 to 72917650054486813599294558735378902454
Move 102084710076281539039012382229530463436 to 97223533405982418132392744980505203272
Move 136112946768375385385349842972707284582 to 145835300108973627198589117470757804908
(I tried to minimise the amount of data transferred - this might not be optimal, but it is close enough not to make much difference given the imbalance of data you probably already have.)
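Expressed as commands, the plan looks roughly like this (a sketch for Cassandra 1.1, where tokens are pinned via initial_token in cassandra.yaml; run each nodetool move on the node being moved, and let each step finish before starting the next):

    # new node 6: cassandra.yaml -> initial_token: 48611766702991209066196372490252601636
    # start it, wait until `nodetool ring` shows it as Normal
    # new node 7: cassandra.yaml -> initial_token: 121529416757478022665490931225631504090
    # start it, wait again

    nodetool move 24305883351495604533098186245126300818    # on the node currently at 34028...
    nodetool move 72917650054486813599294558735378902454    # on the node currently at 68056...
    nodetool move 97223533405982418132392744980505203272    # on the node currently at 102084...
    nodetool move 145835300108973627198589117470757804908   # on the node currently at 136112...

    # finally, on every node:
    nodetool cleanup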
Is it one at a time?
You should bootstrap one node at a time and move one token at a time. This avoids placing excess load on the cluster while streaming data.
What happens with the data when I move the first one? Is it available at all times?
Data is fully available during the move. The node participates in reads and writes for the old and new range so you can read and write during the move.
Should I start the two new nodes before moving the original 5 to their new tokens?
Always better to have more nodes in the cluster - if you moved first, you'd have some nodes with twice as much data as the others.
From Cassandra 1.2, keeping a cluster balanced when adding nodes is very easy, thanks to the new vnodes (multiple tokens per node) feature. Cassandra now automatically balances the cluster for you. If you upgrade from an earlier version you will have to activate the vnodes feature yourself.

how to efficiently manage cassandra initial token?

I'm a new Cassandra user. I know that there is an initial token configuration and how to generate the token values.
The question is: if I have an existing cluster with x nodes and I want to add one or more additional nodes, should I reconfigure all the nodes with new tokens (according to newly generated values)?
Or is there a more efficient way to manage this?
If you're looking for what the best practices are for handling such tasks, take a look at this section of the Cassandra 1.0 docs dedicated to token strategy.
Shortened version of your options, from the documentation:
Add capacity by doubling the cluster size -- [..] nodes can keep their existing token assignments, and new nodes are assigned tokens that bisect (or trisect) the existing token ranges.
Recalculate new tokens for all nodes and move nodes -- [..] you will have to recalculate tokens for the entire cluster. Existing nodes will have to have their new tokens assigned using nodetool move.
Add one node at a time and leave initial_token empty -- [..] splits the token range of the heaviest loaded node and places the new node into the ring at that position. [..] not result in a perfectly balanced ring, but it will alleviate hot spots.
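For the first option, the token for each new node is just the midpoint of an existing range; for example, with bc and two tokens from a balanced RandomPartitioner ring:

    # new token bisecting the range (0, 34028236692093846346337460743176821145)
    echo "(0 + 34028236692093846346337460743176821145) / 2" | bc
    # -> 17014118346046923173168730371588410572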
If you were seeking a management solution Priam (from Netflix) might be worth looking at. It's open source and Apache-licensed, but requires some amount of configuration and is probably only worth investing [time] in for larger clusters.
