Cassandra unable to repair

I have a 6-node Cassandra cluster and I've tested the following scenario:
I turn off 3 nodes, and on the remaining 3 nodes I drop a table and re-create it. After the 3 nodes come back up, I'm unable to run a repair; it says
[Uzbekistan#Gentoo]: nodetool repair --full
Repair command #2 failed with error Got negative replies from endpoints [ ip's of nodes that i turned off ]
and in the logs of one of the nodes that I turned off:
ERROR [AntiEntropyStage:1] 2020-08-21 16:13:12,497 RepairMessageVerbHandler.java:177 - Table with id 6a483210-e395-11ea-8da8-990844948c57 was dropped during prepare phase of repair
Why does this happen, and how can I fix it? Thanks.

You have a schema disagreement between the nodes of the cluster. If you run nodetool describecluster, you will see it. To resolve it, restart all of the nodes and run nodetool describecluster again. Once there is no schema mismatch, you should be able to run the repair.
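As a minimal sketch of that check and fix (the systemd service name is an assumption; adjust for your install):

nodetool describecluster
# A healthy cluster lists exactly one entry under "Schema versions";
# two or more entries means a schema disagreement.
# To clear it, restart the nodes one at a time:
nodetool drain                      # flush memtables and stop accepting traffic
sudo systemctl restart cassandra    # service name is an assumption
nodetool describecluster            # re-check once the node is back up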

Related

While restarting one node other nodes are showing down in the Cassandra cluster

Whenever I restart any Cassandra node in my cluster, a few minutes later other nodes show as down, and sometimes other nodes hang as well. We need to restart those nodes to bring the services back up.
During the restart the cluster seems unstable: one node after another shows stress and DN status. The JVM and nodetool services are running fine, but when we describe the cluster those nodes show as unreachable.
We don't have much traffic or load in our environment. Can you please give me any suggestions?
Cassandra version is 3.11.2
Do you see any error/warning in your system.log after the restart of the node?
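If it helps, a quick way to gather that (the log path assumes a default package install):

nodetool status                                                # which nodes show DN, and from where
grep -E 'ERROR|WARN' /var/log/cassandra/system.log | tail -50  # recent problems after the restart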

nodetool failed, check server logs - error during repair

I am using Cassandra v2.1.13 in a cluster with 21 nodes across 3 DCs. This is our new cluster, migrated using sstableloader, and it is online now. No repairs have been done before. When I ran nodetool repair for the first time after it came online, on the seed nodes, I got this error:
[2019-01-04 06:36:31,897] Repair command #21 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
The error above says nodetool failed, but the repair command finished. I checked the logs; there are no WARN/ERROR messages. I performed the repair on 3 seed nodes in the 3 DCs, one node at a time, and got the same error each time.
What does this error mean exactly? Is it because nodetool lost its connection after the repair? Did the repair complete successfully on these 3 nodes? Is it safe to continue repairing the other nodes in the cluster despite this error message?
Can someone please help me understand these questions and troubleshoot this issue?
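One thing you could try, since repair progress is also written to the server log: search system.log on the repaired nodes for the repair sessions themselves instead of relying on nodetool's exit status (the log path below assumes a default install):

grep -i repair /var/log/cassandra/system.log | grep -iE 'finished|failed|session' | tail -50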

nodetool decommission is behaving strangely

I tried removing a node from a cluster by issuing "nodetool decommission",
and I watched nodetool netstats to see how much data was being streamed to the other nodes, which all looked fine.
After the node was decommissioned, running nodetool status on some nodes (not the one I decommissioned) shows a few nodes with status 'DN', while other nodes show them as 'UN'.
I'm quite confused about why the status differs from node to node instead of being the same everywhere after the decommission.
Am I missing any steps before or after?
Any comments/help are highly appreciated!
If the gossip information is not the same on all nodes, you should do a rolling restart of the cluster. That will reset gossip on every node.
Was the node you removed a seed node? If it was, don't forget to remove its IP from the seeds list in cassandra.yaml on all nodes.
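A minimal sketch of that rolling restart, one node at a time (the service name is an assumption):

# First compare the ring view as seen from several nodes:
nodetool status
nodetool gossipinfo | grep STATUS
# Then restart each node in turn, waiting for it to show UN before moving on:
nodetool drain
sudo systemctl restart cassandra
nodetool status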

What are the reasons to restart a Cassandra cluster?

So far I have only one reason to restart the cluster, below.
All the nodes have the same hardware configuration.
1. When I update the cassandra.yaml file.
Are there other reasons?
What you are asking about is a rolling restart of a Cassandra cluster. There are many reasons to restart a Cassandra cluster; I'll mention some below.
When you update any value in cassandra.yaml (as you mentioned above).
When nodetool gets stuck somehow, for example: you ran nodetool repair and cancelled the command, but it stayed stuck in the background, so you cannot issue another nodetool repair command.
When you are adding a new node to the cluster and streaming fails due to the nproc limit; the nodes of the running cluster can go down from this issue and hang in that state.
When you don't want to use sstableloader and need to restore your data from snapshots; in that case you copy your snapshots into the data directory on each node and do a rolling restart (see the sketch below).
When you are about to upgrade your Cassandra version.
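For the snapshot-restore case above, a rough per-node sketch (the keyspace, table, snapshot tag, and paths are hypothetical; nodetool refresh is an alternative to a full restart):

nodetool snapshot my_keyspace                    # take the snapshot up front
# later, on each node, copy the snapshot files back into the table's data directory:
cp /var/lib/cassandra/data/my_keyspace/my_table-*/snapshots/<tag>/* \
   /var/lib/cassandra/data/my_keyspace/my_table-*/
nodetool refresh my_keyspace my_table            # load the copied SSTables, or do a rolling restart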

Temporarily change a multi-node cluster to a single node

I have configured Cassandra 3.0.9 on 3 nodes, but I have to use only 1 node for some time. I disconnected the other 2 nodes from the network and also removed the entries for both nodes from cassandra.yaml and the rackdc and topology files.
When I check nodetool status it still shows both of the down nodes. When I try to execute any query in cqlsh it gives me the error below:
OperationTimedOut: errors={'127.0.0.1': 'Request timed out while waiting for schema agreement. See Session.execute_async and Cluster.max_schema_agreement_wait.'}, last_host=127.0.0.1
Warning: schema version mismatch detected; check the schema versions of your nodes in system.local and system.peers.
How can I resolve this?
That's not how you remove a node from a Cassandra cluster. In fact, what you're doing is quite dangerous. Typically, you'd use nodetool decommission. If your other two nodes are still intact and just offline, I suggest bringing them back online temporarily and letting decommission do its thing.
I'm also going to throw this out there: it's possible you're missing a good portion of your data with the steps you did above, unless all keyspaces had RF=3. Cassandra distributes data evenly between the nodes in a given DC, and the decommission step I mention above redistributes that data.
Now, if you don't have the other 2 nodes available to run nodetool decommission, you may have to remove them with nodetool removenode and, in the worst case, nodetool assassinate.
Check these docs for reference and the full steps to removing a node: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddingRemovingNodeTOC.html
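A rough sketch of that removal path, in order of preference (the host ID and IP are placeholders; take the real Host ID from nodetool status):

nodetool status                      # note the Host ID of each DN node
nodetool removenode <host-id>        # remove a dead node by its Host ID
nodetool removenode status           # watch the re-streaming progress
nodetool assassinate <ip-address>    # last resort only, if removenode hangs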
