nodetool removenode allows a dead node to rejoin when it comes back online - cassandra

I removed a dead node from my Cassandra cluster using nodetool removenode.
Running nodetool status afterwards shows that the deleted node no longer exists. However, when the dead node comes back online, nodetool status shows the removed node as UP.
Is there any command that prevents a dead node from rejoining the cluster when it comes back online?

Once you remove the node, you should change the cluster name in cassandra.yaml on that node. This will prevent it from rejoining.
Other methods, such as changing the seeds, may or may not work depending on how the node was expelled from the cluster: it may still have the cluster information cached and use that instead of the list of seed nodes.
Changing the cluster name, however, will reliably prevent it from rejoining.
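A minimal sketch of the rename, run on a scratch copy so it is safe to execute anywhere (on a real node you would edit cassandra.yaml in place; the path, e.g. /etc/cassandra/cassandra.yaml, varies by installation, and the new name 'Retired Node' is just an illustration):

```shell
# Simulate the cassandra.yaml edit on a temporary file; on a real node,
# edit the installed cassandra.yaml instead (path varies by install).
CONF=$(mktemp)
printf "cluster_name: 'Test Cluster'\n" > "$CONF"
# Rename the cluster so the removed node can no longer rejoin the old one:
sed "s/^cluster_name:.*/cluster_name: 'Retired Node'/" "$CONF"
```

On startup the node then fails the saved-vs-configured cluster name check, which is the behaviour this answer relies on.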

Related

nodetool decommission is behaving strangely

I tried removing a node from a cluster by issuing "nodetool decommission",
and watched nodetool netstats to see how much data was being streamed to the other nodes, which all looked fine.
After the node was decommissioned, running nodetool status on a few of the remaining nodes (not the one I decommissioned) shows some peers with status 'DN', while on other nodes the same peers show 'UN'.
I'm quite confused about why the status differs between nodes after the decommission.
Am I missing any steps before or after?
Any comments/help are highly appreciated!
If the gossip information is not the same on all nodes, you should do a rolling restart of the cluster. That will reset gossip on all nodes.
Was the node you removed a seed node? If it was, don't forget to remove its IP from cassandra.yaml on all nodes.
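The rolling restart can be scripted. This is a dry-run sketch: the hostnames, ssh access, and a systemd-managed Cassandra service are all assumptions, and the commands are only echoed (drop the leading echo to actually execute them, one node at a time):

```shell
# Dry-run of a rolling restart; node names are placeholders.
NODES="node1.example.com node2.example.com node3.example.com"
for host in $NODES; do
  # Drain first so memtables are flushed before the restart.
  echo ssh "$host" "nodetool drain && sudo systemctl restart cassandra"
done
```

Wait for each node to come back UN in nodetool status before moving to the next.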

Cassandra removing node: is it possible to stop decommission and launch removenode?

I have an issue with a Cassandra node in a 10-node cluster.
I first launched a decommission on that node to remove it from the cluster.
The decommission is currently running, but the load on this node is such that it is taking forever, and I would like to go faster.
What I thought of doing was to stop this node and launch a removenode from another one.
The DataStax documentation explains that we should use decommission or removenode depending on the up/down status of the node, but there is no information about removenode while the targeted node already has leaving status.
So my question is: is it possible to launch a removenode of a stopped node while it already has leaving status?
So my question is: is it possible to launch a removenode of a stopped node while it already has leaving status?
I had to do this last week, so "yes" it is possible.
Just be careful, though. At the time, I was working on bringing up a new DC in a non-production environment, so I didn't care about losing the data that was on the node (or in the DC, for that matter).
What I thought of doing was to stop this node and launch a removenode from another one.
You can do exactly that. Get the Host ID of the node you want to drop, and run:
$ nodetool removenode 2e143c2b-0571-4c5d-22d5-9a2668648710
And if that gets stuck, Ctrl+C out of it, and (on the same node) you can run:
$ nodetool removenode force
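Getting the Host ID for the removenode call can be scripted from nodetool status output. The line below is a captured sample so the sketch runs anywhere; in practice, pipe real `nodetool status` output through the same awk filter:

```shell
# Extract the Host ID of a down (DN) node from `nodetool status` output.
# `sample` stands in for real output; the Host ID is the second-to-last
# column (Load spans two fields, so counting from the end is safer).
sample='DN  10.0.0.7  512.3 KB  256  ?  2e143c2b-0571-4c5d-22d5-9a2668648710  rack1'
printf '%s\n' "$sample" | awk '/^DN/ {print $(NF-1)}'
```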
Decommissioning a node in Cassandra can only be stopped by restarting that node.
Its status will then change from UL back to UN.
This approach has been tested, and the Cassandra cluster worked well afterwards.
After following this safe approach, trigger nodetool removenode to keep the data consistent.

How can the seemingly odd behavior in Cassandra cluster be explained?

I created an Apache Cassandra 2.1.2 cluster of 50 nodes. I named the cluster "Test Cluster", the default. Then, for some testing, I separated one node out of the 50-node cluster. I shut down Cassandra on it, deleted its data directories, and flushed it with nodetool. Then I reconfigured that single node as its own cluster, named "Single Node Test Cluster", editing the seeds, cluster_name, and listen_address fields appropriately. I also set up JMX correctly. Now here is what happens.
1. When I run nodetool status on the single node, I see only one node as up and running. If I run nodetool describecluster, I see the new cluster name - "Single Node Test Cluster"
2. When I run nodetool commands on one of the 49 nodes, I see total 50 nodes with the single node as down and I see cluster name as "Test Cluster"
3. There are datastax-agents installed on each node and I had also setup OpsCenter to monitor the cluster. In OpsCenter, I still see 50 nodes as up and cluster name as "Test Cluster"
So my question is: why am I seeing these 3 different depictions of the same topology, and is this expected?
Another issue is that when I start Cassandra on the single node, I can see that it still tries to communicate with the other nodes, and I keep getting cluster name mismatch (Test Cluster != Single Node Test Cluster) WARN messages on the console before the single-node cluster starts.
Is this expected, or is this a bug in Cassandra?
Yes, if you remove a node from your cluster you need to inform the rest of the cluster that it is gone.
You do that by decommissioning the node while it is still in the cluster, or by running nodetool removenode from another node when the node is gone, i.e. when you no longer have access to the box.
If you do neither of the above, you'll still see the node in the others' system.peers table.
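You can check for a lingering entry by querying system.peers on any live node. The sketch below only prints the query; the node address in the comment is a placeholder:

```shell
# CQL to list peers still known to a node. Run it against any live node,
# e.g.: cqlsh 10.0.0.1 -e "$CQL"  (address is a placeholder).
CQL='SELECT peer, host_id FROM system.peers;'
echo "$CQL"
```

A removed node that still appears in this result is the symptom described above.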

Cannot change the number of tokens from 1 to 256

I am using Cassandra 2.0 and the cluster has been set up with 3 nodes. nodetool status and ring show all three nodes. I have specified tokens for all the nodes.
I followed the steps below to change the configuration on one node:
1) sudo service cassandra stop
2) updated cassandra.yaml (to update thrift_framed_transport_size_in_mb)
3) sudo service cassandra start
The specific node did not start successfully and system.log shows the exception below:
org.apache.cassandra.exceptions.ConfigurationException: Cannot change
the number of tokens from 1 to 256
What is best mechanism to restart the node without losing the existing data in the node or cluster ?
Switching from non-vnodes to vnodes has been a slightly tricky proposition for C*, and the mechanism previously provided for performing this switch (shuffle) is slightly notorious for instability.
The easiest way forward is to start fresh nodes (in a new datacenter) with vnodes enabled and to transfer data to those nodes via repair.
I was also getting this error while I was trying to change the number of tokens from 1 to 256. To solve this I tried the following:
Scenario:
I have a 4-node DSE (4.6.1) Cassandra cluster. Let's say their FQDNs are: d0.cass.org, d1.cass.org, d2.cass.org, d3.cass.org. Here, the nodes d0.cass.org and d1.cass.org are the seed providers. My aim is to enable vnodes by changing the num_tokens attribute in the cassandra.yaml file.
Procedure to be followed for each node (one at a time):
Run nodetool decommission on one node: nodetool decommission
Kill the cassandra process on the decommissioned node. Find the process id for DSE Cassandra using ps ax | grep dse and kill <pid>
Once the decommissioning of the node is successful, go to one of the remaining nodes and check the status of the cassandra cluster using nodetool status. The decommissioned node should not appear in the list.
Go to one of the active seed_providers and type nodetool rebuild
On the decommissioned node, open the cassandra.yaml file and uncomment num_tokens: 256. Save and close the file. If this node was originally a seed provider, make sure that its IP address is removed from the seeds: list in the cassandra.yaml file of the other nodes. If this is not done, the stale information it has about the cluster topology will interfere with the new topology being provided by the new seed node. On successful start, it can be added back to the seed list.
Restart the remaining cluster either using the corresponding option in opscenter or manually stopping cassandra on each node and starting it again.
Finally, start cassandra on it using dse cassandra command.
This should work.
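The cassandra.yaml change in the steps above can be sketched as follows, on a scratch copy so it is safe to run anywhere. The real file path varies by install, and removing a leftover initial_token line is an assumption for nodes that previously used a single token:

```shell
# Simulate enabling vnodes in cassandra.yaml on a temporary copy;
# edit the real installed file on an actual node.
CONF=$(mktemp)
printf '# num_tokens: 256\ninitial_token: 12345\n' > "$CONF"
# Uncomment num_tokens and drop the old single-token setting:
sed -e 's/^# *num_tokens: 256/num_tokens: 256/' -e '/^initial_token:/d' "$CONF"
```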

How to balance the cassandra cluster while node is DOWN

Our cluster is unbalanced and most of the data is on one node. Now the node that has most of the data is dead because it ran out of space.
How can we bring the node UP in read-only mode and rebalance the cluster?
We are using vnodes and DSE 4.0.3.
There is no explicit read only mode for Cassandra. As such you're likely to need to temporarily add some disk space to the node to get it online and then rebalance the cluster.
If that's not an option then removing snapshots can sometimes give you enough space to get going. Running nodetool cleanup can also help if it's not previously been run.
If you're using vnodes, a common problem arises if you've converted an old-style token node to vnodes: the node will just grab an even range of tokens that maps to its original range. If the other nodes in the cluster have randomly generated tokens, this leads to a huge imbalance between them. Decommissioning the node and then re-adding it should resolve the problem.
The output of nodetool ring will show whether that's happened. Actually, the chances are a decommission and re-add will be the solution in any case.
Use nodetool drain to stop receiving writes on a certain node.
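A dry-run sketch of the space-recovery commands mentioned above: each one is echoed rather than executed (drop the leading echo to run them on the affected node once it is back up):

```shell
# Commands are echoed as a dry run; remove `echo` to execute for real.
echo nodetool clearsnapshot   # delete snapshots to reclaim disk space
echo nodetool cleanup         # drop data this node no longer owns
echo nodetool drain           # flush memtables and stop accepting writes
```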
