Could not replace dead Cassandra node due to Host ID collision issue

We have installed Cassandra 3.9 on 6 EC2 nodes. One of the nodes died and showed DN status in nodetool status, so I am trying to replace it based on the instructions provided here:
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
In short, that means starting Cassandra with the -Dcassandra.replace_address or -Dcassandra.replace_address_first_boot flag. However, this does not seem to be working.
I am receiving the error:
java.lang.RuntimeException: Host ID collision between active endpoint
I tried to remove the node using nodetool removenode as well and then tried again, but everything I have tried has been in vain.
The machine is not a seed node. I want to start it directly without using replace, but I would definitely like to know why replace isn't working.

I agree, you definitely need to remove the problem node.
I tried to remove the node using nodetool removenode as well and then tried again, but everything I have tried has been in vain.
Sometimes nodetool removenode appears to do nothing or even hangs. At this point, I'd suggest the following:
Execute your removenode command.
After a minute or two, press Ctrl+C to return to your command prompt.
Verify that removenode is actually doing something:
$ nodetool removenode status
Expedite the removal process with the "force" command:
$ nodetool removenode force
Once that node is gone, try replacing it again. Note that if the IPs are the same, you may still need to use the replace_address line in cassandra-env.sh.
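For reference, a minimal sketch of what that looks like, assuming a packaged install where JVM options live in cassandra-env.sh (the IP is a placeholder for the dead node's address):
# in cassandra-env.sh on the replacement node, before its first start
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.5"   # placeholder IP
With plain replace_address (as opposed to replace_address_first_boot), remove the line again once the replacement has finished bootstrapping.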

Related

How do I bring back a Cassandra 2.0 node that's been down for a long time

We have a Cassandra 2.0.17 cluster with 3 DCs, where each DC has 8 nodes and RF of 3. We have not been running regular repairs on it.
One node has been down for 2 months due to hardware issue with one of the drives.
We finally got a new drive to replace the faulty one, and are trying to figure out the best way to bring the node back into the cluster.
We initially thought to just run nodetool repair, but from my research so far it seems that would only be appropriate if the node had been down for less than gc_grace_seconds, which is 10 days.
It seems like that would mean removing the node and then adding it back in as a new node.
Someone mentioned somewhere that rather than completely removing the node and then bootstrapping it back in, I could potentially use the same procedure used for replacing a node, with the replace_address flag (or replace_address_first_boot if available), to replace the node with itself. But I couldn't find any real documentation or case studies of doing this.
It seems like this is not a typical situation - normally, either a node goes down for a short period of time and you can just run repair on it, or it needs to be replaced altogether. But it's hard to find much prior art on our exact use case.
What would be the best options for bringing this node back into service in a safe way, ideally with the least amount of impact to the rest of the cluster?
Is repair really not a good option here?
Also, whatever the answer is, how would I monitor the process and ensure that it's successful?
So here's what I would do:
If you haven't already, run a removenode on the "dead" node's host ID.
Fire up the old node, making sure that it is not a seed node and that auto_bootstrap is either true or not specified (it defaults to true unless explicitly set otherwise).
It should join right back in and re-stream its data.
You can monitor its progress by running nodetool netstats | grep Already, which returns a per-node streaming status showing completion progress as the number of files streamed versus total files.
The advantage of doing it this way is that the node will not attempt to serve requests until bootstrapping is completed.
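A minimal sketch of those steps (the host ID is a placeholder; take the real one from the Host ID column of nodetool status):
$ nodetool status                          # note the Host ID of the DN node
$ nodetool removenode <host-id>            # run from any live node; <host-id> is a placeholder
$ nodetool netstats | grep Already         # on the rejoining node, shows files streamed vs. total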
If you run into trouble, feel free to comment here or ask for help in the cassandra-admins channel on DataStax's Discord server.
You have already mentioned that you are aware the node has to be removed if it has been down for more than gc_grace_seconds.
What would be the best options for bringing this node back into service in a safe way, ideally with the least amount of impact to the rest of the cluster? Is repair really not a good option here?
That is the answer: you cannot safely bring that node back if it has been down for more than gc_grace_seconds. It needs to be removed to prevent deleted data from reappearing.
https://stackoverflow.com/a/69098765/429476
From https://community.datastax.com/questions/3987/one-of-my-nodes-powered-off.html
Erick Ramirez answered (May 12, 2020; edited Dec 3, 2021), accepted answer:
@cache_drive If the node has been down for less than the smallest gc_grace_seconds, it should be as simple as starting Cassandra on the node then running a repair on it.
If the node has been down longer than the smallest GC grace, you will need to wipe the node clean including deleting all the contents of data/, commitlog/ and saved_caches/. Then replace the node "with itself" by adding the replace_address flag and specifying its own IP. For details, see Replacing a dead node. Cheers!
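A hedged sketch of that wipe-and-replace-with-itself procedure, assuming the default /var/lib/cassandra data locations and a packaged install (adjust paths and service commands to your environment; <node_own_ip> is a placeholder for the node's own address):
$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
# in cassandra-env.sh: JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<node_own_ip>"
$ sudo service cassandra start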

Does nodetool cleanup affect Apache Spark rdd.count() of a Cassandra table?

I've been tracking the growth of some big Cassandra tables using Spark rdd.count(). Up until now the behavior was consistent: the number of rows kept growing.
Today I ran nodetool cleanup on one of the seeds and, as usual, it ran for 50+ minutes.
And now rdd.count() returns one third of the rows it did before...
Did I destroy data using nodetool cleanup? Or is the Spark count unreliable and was it counting ghost keys? I got no errors during cleanup and the logs don't show anything out of the ordinary. It seemed like a successful operation, until now.
Update 2016-11-13
Turns out the Cassandra documentation set me up for the loss of 25+ million rows of data.
The documentation is explicit:
Use nodetool status to verify that the node is fully bootstrapped and all other nodes are up (UN) and not in any other state. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys that no longer belong to those nodes. Wait for cleanup to complete on one node before running nodetool cleanup on the next node. Cleanup can be safely postponed for low-usage hours.
Well, you check the status of the other nodes via nodetool status and they are all Up/Normal (UN), but here's the catch: you also need to run nodetool describecluster, where you might find that the schemas are not synced.
My schemas were not synced, yet I ran cleanup while all nodes were UN, up and running normally, as per the documentation. The Cassandra documentation does not mention running nodetool describecluster after adding new nodes.
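For anyone retracing this, the extra check is simply (a healthy cluster reports a single schema version shared by every node):
$ nodetool describecluster       # look under "Schema versions:" for exactly one UUID
$ nodetool cleanup <keyspace>    # only run once every node reports that same schema version; <keyspace> is a placeholder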
So I merrily added nodes, waited till they were UN (Up / Normal) and ran cleanup.
As a result, 25+ million rows of data are gone. I hope this helps others avoid this dangerous pitfall. Basically, the DataStax documentation sets you up to destroy data by recommending cleanup as a step in the process of adding new nodes.
In my opinion, that cleanup step should be taken out of the new-node procedure documentation altogether. It should be mentioned elsewhere that cleanup is good practice, but not in the same section as adding new nodes. It's like recommending rm -rf / as one of the steps for virus removal: sure, it will remove the virus...
Thank you Aravind R. Yarram for your reply, I came to the same conclusion as your reply and came here to update this. Appreciate your feedback.
I am guessing you might have either added/removed nodes from the cluster or decreased the replication factor before running nodetool cleanup. Until you run cleanup, Cassandra still reports the old key ranges as part of rdd.count(), since the old data still exists on those nodes.
Reference:
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCleanup.html

cassandra nodetool repair/upgrade

I have a cassandra cluster with version 2.0.9 running.
nodetool repair hasn't been run since the start (as scheduling these repairs was never requested).
Each node has around 8GB of data. That seems rather small to me.
When I try to run nodetool repair it seems to take forever (not finished after 2 days).
I don't see any progress. I've read threads that tell you to check compactionstats and netstats, but those indicate no traffic. However, the nodetool repair command never exits, which doesn't seem normal to me. I got messages about the system keyspace being repaired and being OK, but the actual data we put in doesn't return anything.
All nodes are up. I've checked system.log (CentOS 6, by the way) for errors and there aren't any. I've started a command that checks whether the number of commands and responses is still going up (which is the case), but I wonder whether that comes from something else or is directly linked to the nodetool repair.
There doesn't seem to be any IO/net saturation.
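For reference, the checks mentioned above are simply:
$ nodetool compactionstats    # pending and active compactions, including repair validations
$ nodetool netstats           # streaming activity between nodes during repair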
So yesterday I started the repair again with the tool range_repair.py.
For the last 12 hours there has been no additional output.
Last output was:
INFO 2015-11-01 20:55:46,268 repair line: 296 : [1/256] repairing range (-09214247901397780884, -09166106147119295777) in 100 steps for keyspace <all>
The main issue with this repair taking forever (or just hanging) is that we want to upgrade Cassandra for an app deployment. The procedure says to run nodetool repair first. Is that actually necessary before starting the upgrade? Maybe a newer nodetool repair works more efficiently (there is now also an incremental option).
Who can help me here? Thanks a lot in advance!
I'm not sure if this fully resolved the issue, but after doing a rolling restart of the whole cluster it seemed that nodetool repair was able to finish where it didn't before. For another keyspace I had an issue where I had to restart the process over and over again to make any progress. I used range_repair.py, which allowed me to skip to a certain token so I could slowly work my way up.
In the end I used the dry-run and steps options (1 step) and redirected the output to a file. Then I filtered out the first column with sed and executed that file. If a command seems to hang, you can note it down, Ctrl-C it, and rerun it afterwards. It generally succeeded the second or third time I ran it.
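A rough sketch of that workflow; the exact flag names are assumptions based on the description above, so check range_repair.py --help before copying anything:
$ python range_repair.py -k <keyspace> --dry-run --steps 1 > repair_steps.txt   # flag names assumed
$ sed 's/^[^ ]* //' repair_steps.txt > run_repair.sh                            # drop the leading column, keep the commands
$ sh run_repair.sh                                                              # rerun individual lines if one hangs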

How to restart a seed node after its process crashes?

Are there any differences between replacing a dead node and restarting a dead node, especially for seed nodes? I'm a little confused about how to restart a dead seed node.
When the process of a seed node crashes, should I restart it without making any changes to cassandra.yaml? Or, as when replacing a seed node, should I remove its IP address from the seeds list (cassandra.yaml) on each node?
The documentation is not clear about this; it only covers how to replace a dead node with another machine.
Thank you.
If you are simply restarting a dead seed node, then you shouldn't need to alter your cassandra.yaml file before the restart. As long as you have addressed whatever caused the node to die, and the node has not been down longer than gc_grace_seconds (see the note below), restarting shouldn't be an issue.
The concerns noted in the documentation you have linked center around replacing dead seed nodes. The problem with replacing a seed node is that the new node will not bootstrap into the cluster if it is configured as a seed. In that case, a different node in the cluster should be promoted to be a seed node.
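A minimal sketch of what that promotion looks like in cassandra.yaml (the IPs are placeholders; set the same seeds list on every node and do a rolling restart):
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.2,10.0.0.3"   # placeholder IPs of the nodes promoted to seeds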
Note: the About Deletes section of the documentation warns about bringing a node back that has been down a long time. Specifically, longer than the value set for gc_grace_seconds (or the shortest value set, if you have changed it on any individual tables).
...if a node is down longer than the grace period, the node can miss the delete because the tombstone disappears after gc_grace_seconds. Cassandra always attempts to replay missed updates when the node comes back up again. After a failure, it is a best practice to run node repair to repair inconsistencies across all of the replicas when bringing a node back into the cluster. If the node doesn't come back within gc_grace_seconds, remove the node, wipe it, and bootstrap it again.

Cassandra: nodetool repair not working

The Cassandra service on one of my nodes went down and we couldn't restart it because of corruption in one of the tables. So we tried rebuilding it by deleting all the data files and then starting the service. Once it showed up in the ring, we ran nodetool repair multiple times, but it hung, throwing the same error:
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/profile/AttributeKey/profile-AttributeKey-ib-1848-Data.db): corruption detected, chunk at 1177104 of length 11576.
This occurs after 6 GB of data is recovered. Also, my replication factor is 3, so the same data is fine on the other 2 nodes.
I am fairly new to Cassandra and am not sure what I am missing. Has anybody seen this issue with repair? I have also tried scrubbing, but it failed because of the corruption.
Please help.
Remove the corrupted SSTable files with rm /var/lib/cassandra/data/profile/AttributeKey/profile-AttributeKey-ib-1848-* and restart.
Scrub should not fail, please open a ticket to fix that at https://issues.apache.org/jira/browse/CASSANDRA.
First try nodetool scrub. If that does not fix it, shut down the node and run sstablescrub [yourkeyspace] [table]; that removes the corrupted SSTables that the online nodetool scrub utility could not handle. Then run a repair and you should be able to resolve the issue.
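A sketch of that sequence, using the keyspace and table from the error above:
$ nodetool scrub profile AttributeKey    # online scrub first
# if corruption persists: stop Cassandra, then run the offline scrubber
$ sstablescrub profile AttributeKey
# start Cassandra again and repair the table from its healthy replicas
$ nodetool repair profile AttributeKey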
