nodetool decommission is behaving strangely - cassandra

I tried removing a node from a cluster by issuing "nodetool decommission"
and watched nodetool netstats to see how much data was being streamed to the other nodes, which all looked fine.
After the node was decommissioned, running nodetool status on some nodes (not the one I decommissioned) shows a few nodes with the status 'UD', while other nodes show 'UN'.
I'm confused about why the status is not the same on all nodes after decommissioning the node.
Am I missing any steps before and after?
Any comments/help would be highly appreciated!

If the gossip information is not the same on all nodes, you should do a rolling restart of the cluster. That will reset gossip on all nodes.
Was the node you removed a seed node? If it was, don't forget to remove its IP from the cassandra.yaml on all nodes.
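For example, a minimal per-node sketch of such a rolling restart (assuming a systemd-managed package install; repeat on one node at a time):
nodetool drain                     # flush memtables and stop accepting connections
sudo systemctl restart cassandra   # restart the service on this node
nodetool status                    # wait for this node to show UN before moving to the next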

Related

Cassandra 2.2.8: routing traffic from a node using GossipingPropertyFileSnitch

Using Cassandra 2.2.8 with GossipingPropertyFileSnitch.
I'm repairing a node and compacting a large number of SSTables. To alleviate load on that node's CPU, I want to route incoming web traffic to the other nodes in the cluster.
Could you please share how I can route traffic to the other nodes in the cluster so this node can keep its CPU for the major maintenance work?
Thanks in advance.
Provided you have a replication factor and consistency level that can handle a node being down, you can stop the node from serving client requests during the compactions:
nodetool disablebinary
nodetool disablethrift
This will prevent your client application from sending requests to the node and stop it acting as coordinator, but it will still receive the mutations from writes so it won't fall behind. If you want to reduce load further you can completely remove it with
nodetool disablebinary
nodetool disablethrift
nodetool disablegossip
But make sure you enable gossip again before max_hint_window_in_ms, which is defined in cassandra.yaml (default 3 hours), has elapsed. If you don't, the hints for that node will expire and not be delivered, leading to a consistency issue that will not be resolved without a repair.
Once you reconnect, wait for the pending and active hints to drop to 0 before disabling gossip again. Note: pending will always show +1 since it has a regular scheduled task, so wait for 1, not zero.
You can check the hint thread pool with OpsCenter, nodetool tpstats, or via JMX with org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintedHandoff,name=PendingTasks and org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintedHandoff,name=ActiveTasks.
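For example, a rough sequence for bringing the node back and watching the hint pools (the grep pattern is only an illustration; exact pool names vary by version):
nodetool enablegossip
nodetool enablethrift
nodetool enablebinary
nodetool tpstats | grep -i hint    # wait until ActiveTasks is 0 and PendingTasks settles at 1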

Temporarily change a multi-node cluster to a single node

I have configured Cassandra 3.0.9 on 3 nodes but I have to use only 1 node for some time. I have disconnected the other 2 nodes from the network and also removed the entries for both nodes from cassandra.yaml and the rackdc and topology properties files.
When I check nodetool status it still shows both of the down nodes. When I try to execute any query in cqlsh it gives me the error below:
OperationTimedOut: errors={'127.0.0.1': 'Request timed out while waiting for schema agreement. See Session.execute_async and Cluster.max_schema_agreement_wait.'}, last_host=127.0.0.1
Warning: schema version mismatch detected; check the schema versions of your nodes in system.local and system.peers.
How I can resolve this?
That's not how you remove a node from a Cassandra cluster. In fact, what you're doing is quite dangerous. Typically, you'd use nodetool decommission. If your other two nodes are still intact and just offline, I suggest bringing them back online temporarily and let decommission do its thing.
I'm going to also throw this out there - it's possible you're missing a good portion of your data with the steps you did above unless all keyspaces had RF=3. Cassandra distributes data evenly between the nodes in a respective DC. The decommission step I mention above redistributes the data.
Now if you don't have the other 2 nodes available to run a nodetool decommission, you may have to remove them with nodetool removenode and, in the worst case, nodetool assassinate.
Check these docs for reference and the full steps to removing a node: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddingRemovingNodeTOC.html
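For reference, a hedged sketch of those removal commands, run from one of the live nodes (the host ID and IP below are placeholders taken from nodetool status):
nodetool status                            # note the Host ID of the dead node
nodetool removenode <host-id-of-dead-node>
nodetool removenode status                 # check progress of the re-streaming
nodetool assassinate <ip-of-dead-node>     # last resort only; skips re-streaming data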

nodetool removenode allows a dead node to rejoin when it comes back online

I removed a dead node from my Cassandra cluster using nodetool removenode.
Running nodetool status afterwards indicates that the deleted node no longer exists. However, when the dead node comes back online, nodetool status indicates that the removed node is UP.
Is there any command that prevents the dead node from rejoining the cluster when it comes back online?
Once you remove the node, you should change the cluster name in its cassandra.yaml. This will prevent the node from rejoining.
The other methods, like changing seeds, may or may not work, depending on how the node was expelled from the cluster. It may still have the cluster info in its cache and will then use that instead of the list of seed nodes.
But changing the cluster name will 100% prevent it from rejoining.
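For example, a minimal sketch of that change on the removed node ('retired-node-01' is just a placeholder name):
# cassandra.yaml on the removed node
cluster_name: 'retired-node-01'
On startup the node may also refuse to start with a saved-versus-configured cluster name mismatch until its local system tables are cleared, which equally keeps it out of the original cluster.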

Cannot change the number of tokens from 1 to 256

I am using Cassandra 2.0 and the cluster has been set up with 3 nodes. nodetool status and nodetool ring show all three nodes. I have specified tokens for all the nodes.
I followed the steps below to change the configuration on one node:
1) sudo service cassandra stop
2) updated cassandra.yaml (to update thrift_framed_transport_size_in_mb)
3) sudo service cassandra start
That specific node did not start successfully, and system.log shows the exception below:
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the number of tokens from 1 to 256
What is the best way to restart the node without losing the existing data on the node or in the cluster?
Switching from non-vnodes to vnodes has been a slightly tricky proposition for C*, and the mechanism previously provided for this switch (shuffle) is somewhat notorious for instability.
The easiest way forward is to start fresh nodes (in a new datacenter) with vnodes enabled and to transfer data to those nodes via repair.
I was also getting this error while I was trying to change the number of tokens from 1 to 256. To solve this I tried the following:
Scenario:
I have a 4-node DSE (4.6.1) Cassandra cluster. Let's say their FQDNs are d0.cass.org, d1.cass.org, d2.cass.org, and d3.cass.org. Here, the nodes d0.cass.org and d1.cass.org are the seed providers. My aim is to enable vnodes by changing the num_tokens attribute in the cassandra.yaml file.
Procedure to be followed for each node (one at a time):
Run nodetool decommission on the node.
Kill the Cassandra process on the decommissioned node. Find the process id for DSE Cassandra using ps ax | grep dse and kill <pid>.
Once the decommissioning of the node is successful, go to one of the remaining nodes and check the status of the cluster using nodetool status. The decommissioned node should not appear in the list.
Go to one of the active seed providers and run nodetool rebuild.
On the decommissioned node, open the cassandra.yaml file and uncomment num_tokens: 256. Save and close the file. If this node was originally a seed provider, make sure its IP address is removed from the seeds: list in cassandra.yaml. If this is not done, the stale cluster-topology information it holds will interfere with the new topology being provided by the new seed node. After a successful start, it can be added back to the seed list.
Restart the remaining cluster, either using the corresponding option in OpsCenter or by manually stopping and starting Cassandra on each node.
Finally, start Cassandra on the decommissioned node using the dse cassandra command.
This should work.
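A condensed, command-level sketch of the sequence above for a single node (the pid is a placeholder; paths and service commands depend on your install):
nodetool decommission
ps ax | grep dse                   # find the DSE Cassandra process id
kill <pid>
# edit cassandra.yaml: uncomment num_tokens: 256 and, if this node was a seed,
# remove its IP from the seeds: list
dse cassandra                      # start the node again with vnodes enabled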

How to repair Cassandra. I am not sure how to use nodetool after changing the snitch

I am an intern without much experience, so I am sure you guys can help me out here a bit. I get all kinds of nodetool syntax from Google and I am not sure how to use it or when I should use it.
My 3-node cluster was only showing 1 node in the ring, so I changed the snitch to RackInferringSnitch and restarted Cassandra. Now do I have to run nodetool repair? How?
I am concerned that nodetool ring isn't seeing all of your nodes. Fix that first before running nodetool repair. Otherwise, nodetool repair has no other nodes to talk to and will not complete correctly.
To get your ring intact, make sure all of your nodes are configured to gossip on the same network. Make sure they all have at least one seed node which is configured the same on all nodes and is up. Make sure they all have the same cluster name. If that fails, bring them all down and then bring them back up one by one.
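A quick sketch of checks to run on each node to confirm they agree (the config path assumes a package install; adjust as needed):
grep -E 'cluster_name|seeds|listen_address' /etc/cassandra/cassandra.yaml
nodetool describecluster           # cluster name and schema versions should match on every node
nodetool status                    # all nodes should eventually show UN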
