How to avoid "Bootstrap Token collision" between cassandra nodes initialized concurrently? - cassandra

I crafted an OpenStack HOT template to automatically deploy a single-datacenter, 3-node Cassandra cluster (1 seed node only) from scratch. However, I'm unable to create it deterministically without bootstrap token collisions, since all Cassandra nodes are initialized at the same time: sometimes all 3 nodes are shown by nodetool status, sometimes one is ignored. I have an idea for working around this by using a volume shared between the virtual machines to serialize the initialization processes, but that would be a last resort.
Any ideas? Should I try to manually allocate the initial tokens (How? dis/advantages?)?
I'm using the default configuration with auto_bootstrap=false for all nodes (Cassandra version: 4.0.6).
Thank you
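One way to allocate the initial tokens by hand is to space them evenly over the partitioner's token range and set one value per node as initial_token in cassandra.yaml. The sketch below is only an illustration: it assumes the default Murmur3Partitioner (token range -2**63 to 2**63 - 1) and a single token per node (num_tokens: 1), and the helper name initial_tokens is made up for this example.

```python
def initial_tokens(node_count):
    """Return evenly spaced tokens over the Murmur3 range [-2**63, 2**63 - 1].

    Assumption: one token per node (num_tokens: 1) and Murmur3Partitioner.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count  # width of each node's slice of the ring
    return [-2**63 + i * step for i in range(node_count)]


# Print one initial_token line per node of a 3-node cluster.
for node, token in enumerate(initial_tokens(3), start=1):
    print(f"node {node}: initial_token: {token}")
```

Because every node gets a fixed, distinct token, the result no longer depends on the order in which the nodes start. The trade-off is that tokens have to be recalculated by hand whenever the number of nodes changes.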

Related

Cassandra cluster - Migrating all hosts in cluster

I am using Cassandra (3.5) with 20 nodes, 10 in data center 1 and 10 in data center 2, holding a huge amount of data. All hosts are, say, legacy hosts. Now we have newer-generation hosts, say generation-2.
I have tried adding new nodes and decommissioning old nodes, but this is time consuming.
Q1: How can I migrate all hosts from the legacy hosts to generation-2 hosts? What is the best approach for that?
Q2: What would be the rollback strategy?
Q3: Finally, how can I validate the data once I have migrated to the generation-2 hosts?
If you are just replacing the nodes with newer hardware, keeping the same number of nodes, then it's simple (these operations should be done for every node):
1. Prepare the new installation on every new node, with configuration identical to the existing nodes except for the IP addresses, but don't start the nodes.
2. (Optional) Disable autocompaction with nodetool disableautocompaction. This can make step 5 faster.
3. Copy the data from the old node to the new node using rsync (this could take a long time).
4. Execute nodetool drain and stop the old node.
5. Use rsync to synchronize the changes that happened since the initial copy (this should be relatively fast).
6. Make sure that the old node won't start again (for example, remove the Cassandra package), otherwise it could cause chaos.
7. Start the new node.
This works because a Cassandra node is identified by a UUID that is stored in the system.local table, so changing the IP address doesn't affect operations.
P.S. In the future, if you need to replace a node that has died completely (rather than as described here), use the node replacement procedure. In that case you won't stream data twice, as happens when you decommission a node and then re-add it.
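The order of the hardware-swap steps above can be written down as a small plan generator. This is only a sketch that builds the list of actions and does not run anything. The data directory /var/lib/cassandra/, the host name old-node-1, the rsync flags and the function name migration_plan are assumptions for illustration, not part of the original answer.

```python
def migration_plan(old_host, data_dir="/var/lib/cassandra/"):
    """Return (where, action) pairs for swapping one node onto new hardware.

    Assumption: rsync is run on the new node and pulls from the old one.
    """
    sync = f"rsync -a --delete {old_host}:{data_dir} {data_dir}"
    return [
        ("new", "install Cassandra with the old node's configuration, do not start it"),
        ("old", "nodetool disableautocompaction"),  # optional
        ("new", sync),                              # initial, slow copy
        ("old", "nodetool drain"),
        ("old", "stop the Cassandra service"),
        ("new", sync),                              # final, fast catch-up copy
        ("old", "remove the Cassandra package"),
        ("new", "start the Cassandra service"),
    ]


for where, action in migration_plan("old-node-1"):
    print(f"[{where} node] {action}")
```

The same rsync command appears twice on purpose: the second run only transfers what changed after the first copy, which keeps the time the node is offline short.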

Adding new node cassandra 3.x

We are in the process of adding new nodes to our existing cassandra 3.x cluster. Basically following the steps outlined in this article (https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html).
Currently, our 3.x cluster does incremental repairs. What I'm not 100 percent sure about is whether we need to do anything after we do the node cleanup. Specifically, are our newly added nodes set up to do incremental repairs after following the procedure listed above?
Thanks
Marshall
Adding a node involves rebalancing the token distribution among the nodes in the cluster and bootstrapping the new node. Repair is a regular maintenance process and is not necessarily needed when adding a node.
For the token redistribution part, here is a simplified example: suppose you initially have 20 tokens and 4 nodes. If the distribution is random enough, each node is the primary node for 5 tokens. When you add a node and change the configuration, the 20 tokens are distributed across 5 nodes, and each node is the primary node for 4 tokens. When you bootstrap the new node, it knows which tokens belong to it and streams the data for those tokens from the existing nodes. After bootstrapping is done, nodetool cleanup on the existing nodes removes the data for tokens that no longer belong to them. To sum up, bootstrapping a new node redistributes tokens and streams data to the new node based on that distribution. You don't need repair in this process to stream data; cleanup removes the data whose ownership has changed.
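The 20-token example can be played through in a few lines. This is a toy model of the arithmetic only, not Cassandra's actual token allocation: the node names n1 to n5, the round-robin starting layout and the function add_node are assumptions made for the illustration.

```python
from collections import Counter


def add_node(ownership, new_node):
    """Give new_node an equal share by taking tokens evenly from current owners.

    ownership maps token -> owning node. Returns (new ownership, moved tokens).
    """
    nodes = set(ownership.values())
    target = len(ownership) // (len(nodes) + 1)  # tokens per node afterwards
    per_node = Counter(ownership.values())
    result = dict(ownership)
    moved = []
    for token in sorted(ownership):
        owner = ownership[token]
        if per_node[owner] > target and len(moved) < target:
            result[token] = new_node  # the new node streams this token's data
            per_node[owner] -= 1      # cleanup later removes it from the old owner
            moved.append(token)
    return result, moved


# 20 tokens spread over 4 nodes: 5 tokens each.
before = {t: f"n{t % 4 + 1}" for t in range(20)}
after, moved = add_node(before, "n5")
print(Counter(before.values()))
print(Counter(after.values()))
print("tokens streamed to n5:", moved)
```

In this model only the moved tokens change owner: the new node streams their data during bootstrap, and cleanup later removes that data from the previous owners.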
However, repair is a regular process that guarantees data consistency, and the choice between incremental and full repair is out of scope when it comes to adding a node.
Good reference to read under the hood on what happens when you change the topology of a cluster

Enabling vNodes in Cassandra 1.2.8

I have a 4 node cluster and I have upgraded all the nodes from an older version to Cassandra 1.2.8. Total data present in the cluster is of size 8 GB. Now I need to enable vNodes on all the 4 nodes of cluster without any downtime. How can I do that?
As Nikhil said, you need to increase num_tokens and restart each node. This can be done one node at a time with no downtime.
However, increasing num_tokens doesn't cause any data to redistribute so you're not really using vnodes. You have to redistribute it manually via shuffle (explained in the link Lyuben posted, which often leads to problems), by decommissioning each node and bootstrapping back (which will temporarily leave your cluster extremely unbalanced with one node owning all the data), or by duplicating your hardware temporarily just like creating a new data center. The latter is the only reliable method I know of but it does require extra hardware.
In conf/cassandra.yaml you will need to comment out the initial_token parameter and enable the num_tokens parameter (by default 256, I believe). Do this for each node. Then you will have to restart the Cassandra service on each node and wait for the data to get redistributed throughout the cluster. 8 GB should not take too much time (provided your nodes are all in the same cluster), and read requests will still be served, though you might see degraded performance until the redistribution of data is complete.
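The configuration change itself (comment out initial_token, enable num_tokens) can be scripted per node. The following is a minimal sketch that works on the file's text line by line; the function name enable_vnodes and the sample configuration are made up for illustration, and a real cassandra.yaml contains many more settings that pass through unchanged.

```python
def enable_vnodes(yaml_text, num_tokens=256):
    """Comment out initial_token and set num_tokens in cassandra.yaml text."""
    out = []
    wrote_num_tokens = False
    for line in yaml_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("initial_token:"):
            out.append("# " + line)  # disable the single manual token
        elif stripped.startswith(("num_tokens:", "# num_tokens:", "#num_tokens:")):
            if not wrote_num_tokens:  # keep exactly one active num_tokens line
                out.append(f"num_tokens: {num_tokens}")
                wrote_num_tokens = True
        else:
            out.append(line)
    if not wrote_num_tokens:
        out.append(f"num_tokens: {num_tokens}")
    return "\n".join(out) + "\n"


# Hypothetical three-line excerpt of a configuration file.
sample = "cluster_name: 'Test'\ninitial_token: 0\n# num_tokens: 256\n"
print(enable_vnodes(sample), end="")
```

The script only edits the file; each node still has to be restarted afterwards for the setting to take effect.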
EDIT: Here is a potential strategy to migrate your data:
Decommission two nodes of the cluster. The token-space should get distributed 50-50 between the other two nodes.
On the two decommissioned nodes, remove the existing data, and restart the Cassandra daemon with a different cluster name and with the num_tokens parameter enabled.
Migrate the 8 GB of data from the old cluster to the new cluster. You could write a quick script in python to achieve this. Since the volume of data is small enough, this should not take too much time.
Once the data is migrated in the new cluster, decommission the two old nodes from the old cluster. Remove the data and restart Cassandra on them, with the new cluster name and the num_tokens parameter. They will bootstrap and data will be streamed from the two existing nodes in the new cluster. Preferably, only bootstrap one node at a time.
With these steps, you should never face a situation where your service is completely down. You will be running with reduced capacity for some time, but again since 8GB is not a large volume of data you might be able to achieve this quickly enough.
TL;DR;
No, you need to restart the servers once the config has been edited.
The problem is that enabling vnodes means a lot of the data is redistributed randomly (the docs say in a vein similar to the classic ‘nodetool move’

Cassandra ring configuration

I am trying to connect Apache Cassandra nodes into a ring. They are not Datastax versions, but Cassandra 1.2.8 from the Apache website. When trying to add one as the seed of the other I get following exception:
Unable to find compaction strategy class 'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'
Before that, I changed "listen_address" and "rpc_address" to the local IP address of each node. As the next step, I added one IP as a seed on the other node. The nodes start up and an exception is printed, but both nodes run fine until a restart. After restarting either node, the exception is printed and the nodes do not run.
This is very strange - I do not have any DSE components.
Did you previously use any DSE components? If you did and are using the same data directory on any of your nodes, it may find old column families that were created with this compaction strategy. If you have no data you want in the data directories on all your nodes, you should clear them by stopping all nodes, deleting the directories, then starting the nodes.
Or if you have any DSE nodes still up, they may be joining the new cluster and propagating their schema, so creating column families with this compaction strategy. You can find out by looking in the logs and seeing which nodes try to connect. If any aren't from your 1.2.8 ring then this is probably the cause.
That error means you either had a DSE Analytics node in your ring at some point, or you restored your schema from someplace that had an Analytics node.
I would check if you have the folder /etc/dse/ on your VM, that would mean DSE was installed there.
To just wipe the node and start from scratch schema wise, you can stop the node, remove the /system/schema_* folders, then start the node. When it starts it will have no schema. Re-create any keyspaces/column families you had before, and they will get read from disk.
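Before removing anything, it can help to list exactly which directories the schema_* pattern matches. The sketch below only lists them and deletes nothing; the function name schema_dirs is hypothetical, and the data directory is a parameter because its location depends on the installation.

```python
import glob
import os


def schema_dirs(data_dir):
    """Return the schema_* directories under <data_dir>/system, sorted."""
    pattern = os.path.join(data_dir, "system", "schema_*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
```

With the node stopped, the returned paths are the ones to remove, for example with shutil.rmtree, once the list has been checked by eye.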

Adding a new node to existing cluster

Is it possible to add a new node to an existing cluster in cassandra 1.2 without running nodetool cleanup on each individual node once data has been added?
It probably isn't but I need to ask because I'm trying to create an application where each user's machine is a server allowing for endless scaling.
Any advice would be appreciated.
Yes, it is possible. But you should be aware of the side-effects of not doing so.
nodetool cleanup purges keys that are no longer allocated to that node. According to the Apache docs, these keys count against the allocated data for that node, which can cause the auto bootstrap process for the next node to not properly balance the ring. So depending on how you are bringing new user machines into the ring, this may or may not be a problem.
Also keep in mind that nodetool cleanup only needs to be run on nodes that lost keyspace to the new node - i.e. adjacent nodes, not all nodes, in the cluster.
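To see which node loses data to a newcomer, consider a simplified ring where each node has a single token and owns the range from the previous token (exclusive) up to its own token (inclusive). The sketch assumes that model with a replication factor of 1; with vnodes or a higher replication factor more nodes are affected. The node names and the function node_losing_range are made up for the example.

```python
import bisect


def node_losing_range(ring, new_token):
    """Return the node that currently owns new_token and will hand it over.

    ring maps token -> node; a node owns (previous token, its own token].
    """
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, new_token)  # first token >= new_token
    return ring[tokens[i % len(tokens)]]       # wrap around past the last token


ring = {0: "A", 100: "B", 200: "C"}
print(node_losing_range(ring, 150))  # the new node takes (100, 150] from this node
```

In this model, that single node is the only one holding data it no longer owns, so it is the only one where nodetool cleanup has anything to remove.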
