How to clean Cassandra cluster

I had set up a 50-node Apache Cassandra cluster.
I took one node and wanted to install DSE on it and make it a single-node DSE cluster.
I have removed /var/lib/cassandra and /var/log/cassandra.
I have truncated the system.peers table on the single node.
When I start DSE Cassandra on this node, I still see the remaining nodes doing handshakes and being added to this cluster.
What is the best way to completely remove any traces of the existing Cassandra cluster from this node?

You need to change the cluster_name directive in cassandra.yaml to a different name from the rest of the cluster.
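For completeness, a rough end-to-end sequence for wiping the node and isolating it from the old cluster might look like the following (a sketch only; the service name and the data and config paths are typical package-install defaults and may differ on your machine):

sudo service dse stop                                     # stop the node before touching anything on disk
sudo rm -rf /var/lib/cassandra/* /var/log/cassandra/*     # wipe data, commitlog, saved_caches, logs
# in cassandra.yaml, give the node its own cluster_name and make it its own
# (and only) seed so it can no longer handshake with the old nodes:
#   cluster_name: 'Single Node DSE Cluster'
#   seeds: "<this_node_ip>"
sudo service dse start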

Related

Migrate Datastax Enterprise Cassandra to Apache Cassandra

We are currently using DSE 4.8 and 5.12. We want to migrate to Apache Cassandra. Since we don't use Spark or Search, we thought we could save some money by moving to Apache. Can this be achieved without downtime? I see sstableloader works the other way around. Can anyone share the steps to follow to migrate from DSE to Apache Cassandra? Something like this, but from DSE to Apache:
https://support.datastax.com/hc/en-us/articles/204226209-Clarification-for-the-use-of-SSTABLELOADER
Figure out what version of Apache Cassandra is being run by DSE. Based on the DSE documentation, DSE 4.8.14 uses Apache Cassandra 2.1 and DSE 5.1 uses Apache Cassandra 3.11.
The simplest way to do this is to build another DC (a logical DC, per Cassandra) and add it to the existing cluster.
As usual, run nodetool rebuild {from-old-DC} on the new DC nodes and let Cassandra take care of streaming data to the new Apache Cassandra nodes naturally.
Once data streaming is completed, based on the LoadBalancingPolicy being used by the applications, switch their local_dc to DC2 (the new DC). Once the new DC starts taking traffic, shut down the nodes in the old DC (say DC1) one by one.
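In commands, that flow might look roughly like this (a sketch; the keyspace name, DC names, and replication factors are placeholders, not taken from the question):

# make sure every application keyspace replicates to the new DC, e.g. in cqlsh:
#   ALTER KEYSPACE my_ks WITH replication =
#     {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
# then, on each new Apache Cassandra node in DC2, stream the data over from the old DC
nodetool rebuild DC1
# and watch progress while it runs
nodetool netstats | grep -i receiving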
Alter the keyspaces dse_system and dse_security so they no longer use the Everywhere replication strategy.
On non-seed nodes, clean up the Cassandra data directory.
Turn on the replace option in cassandra-env.sh.
Start the instance.
Monitor the streaming process using the command nodetool netstats | grep Receiving.
Change the seed node definitions and do a rolling restart before finally migrating the previous seed nodes. A rough per-node sketch follows below.
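Per node, that replacement procedure could look roughly like this (a sketch; the keyspace replication settings, paths, and the JVM flag are based on common practice and are not copied verbatim from the answer above):

# 1. move the DSE keyspaces off the Everywhere strategy, e.g. in cqlsh:
#    ALTER KEYSPACE dse_system WITH replication =
#      {'class': 'NetworkTopologyStrategy', 'DC1': 3};
#    ALTER KEYSPACE dse_security WITH replication =
#      {'class': 'NetworkTopologyStrategy', 'DC1': 3};
# 2. on the non-seed node being migrated, stop it and clean its data directory
sudo service dse stop
sudo rm -rf /var/lib/cassandra/data/*
# 3. enable the replace option in cassandra-env.sh (same-address replace):
#    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<this_node_ip>"
# 4. start the Apache Cassandra instance and monitor streaming
sudo service cassandra start
nodetool netstats | grep Receiving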

How to reuse a cassandra node after decommission?

I had a cluster with 2 nodes (node 1 and node 2).
After decommissioning node 2, I wanted to use the server as a fresh Cassandra database for other purposes, but as soon as I restart it this message appears:
org.apache.cassandra.exceptions.ConfigurationException: This node was
decommissioned and will not rejoin the ring unless
cassandra.override_decommission=true has been set, or all existing
data is removed and the node is bootstrapped again
So I removed all existing data.
But I don't want the node to be bootstrapped again (nor rejoin the previous ring); I want it to be a fresh, new, pure Cassandra database to be used.
The old node is not on the seed list.
Cassandra version: 3.9
EDIT: I think I was misunderstood, sorry for that. After the decommission I want to have:
Db1: node 1
Db2: node 2
Two different databases with no correlation, totally separated. That's because we want to reuse the machine where node 2 is hosted to deploy a Cassandra DB in another environment.
Don't use override_decommission. That flag is only used for rejoining the same cluster.
You should remove all data files on the node (Cassandra will recreate the system tables on start). Most importantly, you need to change the seed in cassandra.yaml. I suspect that it is still the IP of node 1, so you need to change it to node 2 (itself).
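A minimal sketch of that, assuming the default package paths and that <node2_ip> stands in for node 2's own address:

# on node 2, with Cassandra stopped
sudo rm -rf /var/lib/cassandra/*          # data, commitlog, saved_caches
# in cassandra.yaml, point the node at itself instead of node 1:
#   cluster_name: 'SomeNewCluster'
#   seeds: "<node2_ip>"
sudo service cassandra start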
Use option
cassandra.override_decommission: true
Use that option, cassandra.override_decommission=true. Also, be aware of the definition of cluster_name in cassandra.yaml:
The name of the cluster. This setting prevents nodes in one logical
cluster from joining another. All nodes in a cluster must have the
same value.
So, to be sure, also use another value for cluster_name option in cassandra.yaml.
Try these steps:
run in cqlsh: UPDATE system.local SET cluster_name = 'new_name' WHERE key='local';
nodetool flush in order to persist the data
nodetool decommission
stop node
change name in cassandra.yaml
clean the node: sudo rm -rf /var/lib/cassandra/* /var/log/cassandra/* (but I would just move those files somewhere else until you get to the state that you want)
start node
Please check 1, 2
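Put together, those steps might look like this (a sketch; the new cluster name, the service name, and the paths are placeholders):

# 1-2. rename the cluster in the local system table and persist it
cqlsh -e "UPDATE system.local SET cluster_name = 'new_name' WHERE key='local';"
nodetool flush
# 3-4. leave the old ring and stop the node
nodetool decommission
sudo service cassandra stop
# 5. set cluster_name: 'new_name' in cassandra.yaml to match
# 6. clean the old state (or, safer, move it aside)
sudo mkdir /var/lib/cassandra.old && sudo mv /var/lib/cassandra/* /var/lib/cassandra.old/
# 7. start the node as a fresh, standalone Cassandra instance
sudo service cassandra start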

DSE changing cassandra to use vnodes

We want to deploy a DSE cluster of 3 nodes, where each node is an Analytics node running Spark.
We want to use vnodes in Cassandra, because they enable much more even data distribution and easier adding of nodes. We deploy DSE on AWS, using one of the available AMI images.
Since DSE by default deploys the Cassandra cluster using single-token nodes, we have to manually change the cassandra.yaml file on all the nodes.
According to datastax documentation, I should:
uncomment the num_tokens field (I left the default value of 256)
leave the initial_token field unassigned
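For reference, after these edits the relevant lines in cassandra.yaml should look roughly like this (a sketch of the intended end state; the config path shown is a typical DSE package location and may differ on the AMI):

grep -E '^(#[ ]*)?(num_tokens|initial_token)' /etc/dse/cassandra/cassandra.yaml
# expected output after editing:
#   num_tokens: 256
#   # initial_token: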
After that, when I run the nodetool status command, I see that my cluster still uses single-token mode.
According to this, I should restart the nodes in the cluster so that the changes take effect.
But after the nodes are restarted, either through OpsCenter or the AWS console, I get errors, the nodes are in an unresponsive state, and I cannot use nodetool on my nodes; it fails with the error:
Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused'.
Is there something that I am doing wrong?
How to enable vnodes on DSE when deployed using AMI image?
Thank you

How can the seemingly odd behavior in Cassandra cluster be explained?

I created an Apache Cassandra 2.1.2 cluster of 50 nodes. I named the cluster "Test Cluster", the default. Then, for some testing, I separated one node out of the 50-node cluster. I shut down Cassandra, deleted the data dirs, and flushed with nodetool. Then I edited the single-node cluster and called it "Single Node Test Cluster". I edited the seeds, cluster_name and listen_address fields appropriately. I also set up JMX correctly. Now here is what happens.
1. When I run nodetool status on the single node, I see only one node as up and running. If I run nodetool describecluster, I see the new cluster name - "Single Node Test Cluster".
2. When I run nodetool commands on one of the 49 nodes, I see a total of 50 nodes, with the single node shown as down, and I see the cluster name as "Test Cluster".
3. There are DataStax agents installed on each node and I had also set up OpsCenter to monitor the cluster. In OpsCenter, I still see 50 nodes as up and the cluster name as "Test Cluster".
So my question is: why am I seeing these 3 different depictions of the same topology, and is this expected?
Another issue is that when I start Cassandra on the single node, I still see that it somehow tries to communicate with the other nodes, and I keep getting a cluster name mismatch (Test Cluster != Single Node Test Cluster) WARN on the console before the single-node cluster starts.
Is this expected, or is this a bug in Cassandra?
Yes, if you remove a node from your cluster you need to inform the rest of the cluster that it is gone.
You do that by decommissioning the node while it is still in the cluster, or by running nodetool removenode from another node when the node is already gone, i.e. you no longer have access to the box.
If you do neither of the above, you'll still see the node in the others' system.peers tables.
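In commands, that looks roughly like this (a sketch; the host ID is whatever nodetool status reports for the departed node):

# preferred: run on the node that is leaving, while it is still up
nodetool decommission
# if the box is already gone, run from any surviving node instead
nodetool status                 # note the Host ID of the dead node
nodetool removenode <host-id>
# the leftover entry can also be inspected directly, e.g. in cqlsh:
#   SELECT peer, host_id FROM system.peers;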

Cannot change the number of tokens from 1 to 256

I am using Cassandra 2.0 and the cluster has been set up with 3 nodes. nodetool status and nodetool ring show all three nodes. I have specified tokens for all the nodes.
I followed the below steps to change the configuration in one node:
1) sudo service cassandra stop
2) updated cassandra.yaml (to update thrift_framed_transport_size_in_mb)
3) sudo service cassandra start
The specific node did not start successfully and system.log shows the exception below:
org.apache.cassandra.exceptions.ConfigurationException: Cannot change
the number of tokens from 1 to 256
What is the best mechanism to restart the node without losing the existing data on the node or in the cluster?
Switching from non-vnodes to vnodes has been a slightly tricky proposition for C*, and the mechanism previously provided for performing this switch (shuffle) is somewhat notorious for instability.
The easiest way forward is to start fresh nodes (in a new datacenter) with vnodes enabled and to transfer data to those nodes via repair.
I was also getting this error while I was trying to change the number of tokens from 1 to 256. To solve this I tried the following:
Scenario:
I have a 4-node DSE (4.6.1) Cassandra cluster. Let's say their FQDNs are: d0.cass.org, d1.cass.org, d2.cass.org, d3.cass.org. Here, the nodes d0.cass.org and d1.cass.org are the seed providers. My aim is to enable vnodes by changing the num_tokens attribute in the cassandra.yaml file.
Procedure to be followed for each node (one at a time):
Run nodetool decommission on one node: nodetool decommission
Kill the Cassandra process on the decommissioned node. Find the process id for DSE Cassandra using ps ax | grep dse and kill <pid>.
Once the decommissioning of the node is successful, go to one of the remaining nodes and check the status of the cassandra cluster using nodetool status. The decommissioned node should not appear in the list.
Go to one of the active seed providers and run nodetool rebuild.
On the decommissioned node, open the cassandra.yaml file and uncomment num_tokens: 256. Save and close the file. If this node was originally a seed provider, make sure that its IP address is removed from the seeds: list in the cassandra.yaml file. If this is not done, the stale information it has about the cluster topology will interfere with the new topology being provided by the new seed node. On a successful start, it can be added again to the seed list.
Restart the remaining cluster, either using the corresponding option in OpsCenter or by manually stopping Cassandra on each node and starting it again.
Finally, start Cassandra on the decommissioned node using the dse cassandra command.
This should work.
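Condensed into commands, the per-node part of that procedure might look like this (a sketch; it assumes a typical DSE package install, so paths and the way DSE is started may differ in your environment):

# on the node being converted
nodetool decommission
ps ax | grep dse                # find the DSE Cassandra process id
kill <pid>
# from one of the remaining nodes, confirm it has left the ring
nodetool status
# on the converted node, edit cassandra.yaml:
#   num_tokens: 256
#   (and drop this node's own address from the seeds: list if it was a seed)
# restart the rest of the cluster (via OpsCenter or one node at a time), then bring this node back
dse cassandra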
