How to repair Cassandra: I am not sure how to use nodetool after changing the snitch

I am an intern without much experience, so I am sure you can help me out here. I find all kinds of nodetool syntax on Google, and I am not sure how or when to use it.
My 3-node cluster was only showing 1 node on the ring, so I changed the snitch to RackInferringSnitch and restarted Cassandra. Do I now have to run nodetool repair? How?

I am concerned that nodetool ring isn't seeing all of your nodes. Fix that first before running nodetool repair. Otherwise, nodetool repair has no other nodes to talk to and will not complete correctly.
To get your ring intact, make sure all of your nodes are configured to talk gossip on the same network. Make sure they all have at least one seed node which is configured the same on all nodes and is up. Make sure they all have the same cluster name. If that fails, bring them all down and then bring them back up one by one.
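The checks above can be sketched as a small script. This is only an illustration: the function name `check_ring_config` and the dict layout are hypothetical, and it assumes you have already copied the `cluster_name` and seed list out of each node's cassandra.yaml by hand. It does not check whether the seed is actually up.

```python
def check_ring_config(configs):
    """Return a list of problems found across node configs.

    `configs` maps a node address to a dict with the keys 'cluster_name'
    and 'seeds' (a list of addresses), copied from that node's
    cassandra.yaml. The layout is an assumption made for this sketch.
    """
    problems = []

    # Every node must use exactly the same cluster name.
    names = {cfg["cluster_name"] for cfg in configs.values()}
    if len(names) > 1:
        problems.append("cluster_name differs: " + ", ".join(sorted(names)))

    # The seed list should be configured the same way on all nodes.
    seed_lists = {tuple(sorted(cfg["seeds"])) for cfg in configs.values()}
    if len(seed_lists) > 1:
        problems.append("seed lists differ between nodes")

    # Every node needs at least one seed.
    for node, cfg in sorted(configs.items()):
        if not cfg["seeds"]:
            problems.append(f"{node} has no seed node configured")

    return problems


# Made-up example values, not taken from the question.
configs = {
    "10.0.0.1": {"cluster_name": "Test Cluster", "seeds": ["10.0.0.1"]},
    "10.0.0.2": {"cluster_name": "Test Cluster", "seeds": ["10.0.0.1"]},
    "10.0.0.3": {"cluster_name": "test cluster", "seeds": []},
}
for problem in check_ring_config(configs):
    print(problem)
```

An empty result only means these settings agree; the nodes still have to be able to reach each other over the network.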

Related

cassandra enable hints and repair

I am adding a new node to my Cassandra cluster, which currently has 5 nodes. The nodes have hints turned on, and I am also running repairs using Cassandra Reaper. The node addition is taking forever and the other nodes are becoming unresponsive. I am running Cassandra 3.11.13.
Questions:
As I understand it, hints are used to make sure writes are correctly propagated to all replicas:
Cassandra is designed to remain available if one of its nodes is down or unreachable. However, when a node is down or unreachable, it needs to eventually discover the writes it missed. Hints attempt to inform a node of missed writes, but are a best effort, and aren't guaranteed to inform a node of 100% of the writes it missed.
Repairs do something similar:
Repair synchronizes the data between nodes by comparing their respective datasets for their common token ranges, and streaming the differences for any out of sync sections between the nodes.
If I am running repairs with Cassandra Reaper, do I need to disable hints?
If hints are enabled and repairs are carried out, does that cause double writes of data on the nodes?
Is it okay to run a repair while a node is joining?
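To make the quoted description of repair more concrete, here is a toy model of "compare the common token ranges and find the out-of-sync sections". It is a deliberately simplified sketch: the range names, the dict layout and both function names are made up for illustration, and it hashes each range as a whole, which is not how Cassandra's own comparison is implemented.

```python
import hashlib


def range_digest(rows):
    """Hash the rows of one token range in a deterministic order."""
    digest = hashlib.sha256()
    for key in sorted(rows):
        digest.update(f"{key}={rows[key]};".encode())
    return digest.hexdigest()


def out_of_sync_ranges(replica_a, replica_b):
    """Return the common token ranges whose contents differ."""
    common = set(replica_a) & set(replica_b)
    return sorted(
        name
        for name in common
        if range_digest(replica_a[name]) != range_digest(replica_b[name])
    )


# replica_b missed the write for k3 (for example because a hint was lost).
replica_a = {"range-1": {"k1": "v1"}, "range-2": {"k2": "v2", "k3": "v3"}}
replica_b = {"range-1": {"k1": "v1"}, "range-2": {"k2": "v2"}}
print(out_of_sync_ranges(replica_a, replica_b))
```

Only the ranges reported as different would need data streamed between the replicas; ranges that already match are left alone.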

Nodetool rebuild not working reliably on Cassandra 3.11.3

I presently have a Cassandra 3.11.3 cluster with a single DC. I recently added another DC to my cluster, following the instructions at
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
As per the instructions, I ran 'nodetool rebuild -ks -- dc1' on each of the nodes. However, this rebuild command did not actually work as intended: my data is partially missing on the new nodes. I know this because I sampled the data in the new DC through my app using consistency LOCAL_ONE. I don't see the data replenish through read repair either. I should also mention that there were no errors in the logs following the rebuild command, so everything appeared to have succeeded.
What am I missing here? Is there a known issue reported on this?
You should run nodetool rebuild -- <existing DC> on each node of the new data center. This command pulls all keyspace data from the existing data center based on the allocated tokens and RF. To ensure consistency, please run a full repair on the nodes as well.
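As a rough sketch of that sequence, the snippet below only builds and prints the commands in order; it does not run anything. The host addresses are made up, `rebuild_plan` is a hypothetical helper, and using `nodetool -h <host>` assumes JMX is reachable remotely. If it is not, run the same commands locally on each node.

```python
def rebuild_plan(new_dc_nodes, source_dc):
    """Return the nodetool commands to run, in order, as argument lists."""
    commands = []
    # First stream the data from the existing DC to every new node.
    for host in new_dc_nodes:
        commands.append(["nodetool", "-h", host, "rebuild", "--", source_dc])
    # Then run a full repair on every new node.
    for host in new_dc_nodes:
        commands.append(["nodetool", "-h", host, "repair", "--full"])
    return commands


plan = rebuild_plan(["10.0.1.1", "10.0.1.2"], "dc1")
for command in plan:
    print(" ".join(command))
```

Printing the plan first makes it easy to check that no node of the new DC was skipped before anything is executed.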

nodetool decommission is behaving strangely

I tried removing a node from a cluster by issuing "nodetool decommission", and I watched nodetool netstats to see how much data was being streamed to the other nodes, which all looked fine.
After the node was decommissioned, running nodetool status on a few nodes (not the one I decommissioned) shows the status of a few nodes in the cluster as 'UD', while a few nodes show 'UN'.
I'm quite confused about why the status behaves like this and is not the same on all nodes after decommissioning the node.
Am I missing any steps before or after?
Any comments or help are highly appreciated!
If gossip information is not the same on all nodes, then you should do a rolling restart of the cluster. That will make gossip reset on all nodes.
Was the node you removed a seed node? If it was, don't forget to remove its IP from the seeds list in cassandra.yaml on all nodes.
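One way to confirm that the nodes really disagree is to save the nodetool status output from each node and compare the views. The sketch below does that on pasted text; the sample output is made up and shortened, and `parse_status` and `find_disagreements` are hypothetical helper names.

```python
def parse_status(output):
    """Map each node address to its two-letter code from `nodetool status` text."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        # Node rows start with a two-letter uppercase code such as UN or DN.
        if len(parts) >= 2 and len(parts[0]) == 2 and parts[0].isalpha() and parts[0].isupper():
            states[parts[1]] = parts[0]
    return states


def find_disagreements(views):
    """Return the addresses that different observers report differently."""
    addresses = set()
    for states in views.values():
        addresses.update(states)
    disagreements = {}
    for address in sorted(addresses):
        seen = {obs: states.get(address, "missing") for obs, states in views.items()}
        if len(set(seen.values())) > 1:
            disagreements[address] = seen
    return disagreements


# Illustrative output only; real output has more columns and values.
seen_from_a = """Datacenter: dc1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Rack
UN  10.0.0.1  1.1 GiB  256     rack1
UN  10.0.0.2  1.0 GiB  256     rack1
"""
seen_from_b = seen_from_a.replace("UN  10.0.0.2", "DN  10.0.0.2")

views = {"a": parse_status(seen_from_a), "b": parse_status(seen_from_b)}
print(find_disagreements(views))
```

Any address listed in the result is a node whose gossip state is not agreed on across the cluster.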

What are the reasons to restart a Cassandra cluster?

I know of just one reason to restart the cluster, listed below.
All the nodes have the same hardware configuration.
1. When I update the cassandra.yaml file
Are there other reasons?
What you are asking about is a rolling restart of a Cassandra cluster. There are many reasons to restart a Cassandra cluster. I'm just mentioning some below:
When you update any value in cassandra.yaml (as you mentioned above).
When nodetool gets stuck. For example, you ran nodetool repair and cancelled the command, but the repair is still stuck in the background, so you can't start another nodetool repair.
When you are adding a new node to the cluster and get stream_failed due to the nproc limit. In that case your running nodes can go down because of this issue and stay stuck in that state.
When you don't want to use sstableloader and need to restore your data from snapshots. In that case you copy the snapshots into the data directory on each node and do a rolling restart.
When you are about to upgrade your Cassandra version.
For example, when you are upgrading the Cassandra version.
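For reference, a rolling restart just means restarting the nodes one at a time and waiting for each one to come back before touching the next. A minimal sketch of that loop follows. The `run` and `is_up_normal` callbacks are hypothetical placeholders you would have to supply yourself (for example via SSH), and the service command is an assumption that depends on how Cassandra was installed.

```python
def rolling_restart(nodes, run, is_up_normal):
    """Restart nodes one by one; stop at the first node that does not return.

    `run(node, command)` executes a shell command on a node and
    `is_up_normal(node)` reports whether the node is back as UN.
    Both are placeholders supplied by the caller.
    """
    restarted = []
    for node in nodes:
        # Flush memtables and stop accepting traffic before stopping.
        run(node, "nodetool drain")
        # Assumed service command; adjust to your installation.
        run(node, "sudo service cassandra restart")
        if not is_up_normal(node):
            raise RuntimeError(f"{node} did not come back as UN; stopping")
        restarted.append(node)
    return restarted


# Dry run that only records what would be executed.
log = []
done = rolling_restart(
    ["10.0.0.1", "10.0.0.2"],
    run=lambda node, command: log.append((node, command)),
    is_up_normal=lambda node: True,
)
print(done)
```

Stopping at the first failure keeps the rest of the cluster serving traffic while you look at the node that did not come back.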

How to balance the cassandra cluster while node is DOWN

Our cluster is unbalanced and most of the data is on one node. Now the node which has most of the data is dead because it ran out of disk space.
How can we bring the node UP in read-only mode and rebalance the cluster?
We are using vnodes and DSE 4.0.3.
There is no explicit read only mode for Cassandra. As such you're likely to need to temporarily add some disk space to the node to get it online and then rebalance the cluster.
If that's not an option then removing snapshots can sometimes give you enough space to get going. Running nodetool cleanup can also help if it's not previously been run.
If you're using vnodes, then a common problem is a node that was converted from an old-style single token to vnodes. The node will just grab an even range of tokens that maps to its original range. If the other nodes in the cluster have randomly generated tokens, this leads to a huge imbalance between them. Decommissioning the node and then re-adding it should resolve the problem.
The output of nodetool ring will show us if that's happened. Actually the chances are a decom and re-add will be the solution in any case.
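To illustrate the kind of imbalance meant here, this sketch computes how much of the ring each node owns from its tokens alone. It is a simplified model: the token values and node names are made up, it assumes the Murmur3Partitioner token range, and it ignores the replication factor.

```python
RING_SIZE = 2**64  # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1


def ownership(tokens_by_node):
    """Return the fraction of the token ring each node owns as primary."""
    ring = sorted(
        (token, node)
        for node, tokens in tokens_by_node.items()
        for token in tokens
    )
    owned = {node: 0 for node in tokens_by_node}
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # wraps around to the last token for i == 0
        # Each token owns the range (previous, token]; a lone token owns it all.
        owned[node] += (token - previous) % RING_SIZE or RING_SIZE
    return {node: size / RING_SIZE for node, size in owned.items()}


Q = 2**60
tokens = {
    # Converted node: evenly spaced tokens that cover its old single range.
    "converted": [-6 * Q, -4 * Q, -2 * Q, 0],
    # Other nodes: tokens scattered over the rest of the ring.
    "node-b": [2 * Q, 6 * Q],
    "node-c": [4 * Q, 7 * Q],
}
print(ownership(tokens))
```

In this made-up ring the converted node still owns more than half of the data, which is the pattern to look for in the nodetool ring output.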
Use nodetool drain to stop receiving writes on a certain node.
