Cassandra nodetool repair/upgrade

I have a Cassandra cluster running version 2.0.9.
Nodetool repair has never been run since the start (scheduling these repairs was never requested).
Each node has around 8 GB of data, which seems rather small to me.
When I try to run nodetool repair it seems to take forever (not finished after 2 days).
I don't see any progress. I've been reading threads that tell you to check compactionstats and netstats, but those indicate no traffic. However, the nodetool repair command never exits, which doesn't seem normal to me. I got messages about the system keyspace being repaired and being OK, but for the keyspaces holding our actual data nothing is reported.
All nodes are up. I've checked system.log (CentOS 6, by the way) for errors and there aren't any. I've started a command that checks whether the number of commands and responses is still going up (which is the case), but I wonder whether that comes from something else or is directly linked to the nodetool repair.
There doesn't seem to be any I/O or network saturation.
So yesterday I started the repair again with the range_repair.py tool.
In the last 12 hours there has been no extra output.
The last output was:
INFO 2015-11-01 20:55:46,268 repair line: 296 : [1/256] repairing range (-09214247901397780884, -09166106147119295777) in 100 steps for keyspace <all>
The main issue with this repair taking forever (or just being hung) is that we want to upgrade Cassandra for an app deployment. The upgrade procedure says to do a nodetool repair first. Is this actually necessary before you start the upgrade? Maybe nodetool repair works more efficiently in the newer version (there is now also an incremental option).
Who can help me here? Thanks a lot in advance!

I'm not sure if this fully resolved the issue, but after doing a rolling restart of the whole cluster, nodetool repair was able to finish where it hadn't before. For another keyspace I hit an issue where I had to start the process over and over again to make any progress. I used range_repair.py, which allowed me to skip to a certain token so I could slowly work my way up.
In the end I used the dry-run and steps options (1 step) and directed the output to a file. Then I filtered out the first column with sed and executed that file. If a command seems to hang, you can note it down, Ctrl-C it and rerun it afterwards. Generally it succeeded the second or third time I ran it.
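Roughly, the workflow looked like this (a sketch from memory; the -k flag and the sed expression are illustrative, and the exact range_repair.py options and dry-run output format should be checked against its --help for your version):
$ range_repair.py -k my_keyspace --dry-run --steps 1 > repair_steps.txt
$ sed 's/^[^ ]* *//' repair_steps.txt > repair_steps.sh   # strip the leading column, keeping the repair commands
$ sh repair_steps.sh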

Related

Can I avoid running repair while compaction is going on in Cassandra cluster?

I have scheduled an incremental repair for every day, but while the repair is going on, our monitoring system reports COMPACTIONEXECUTOR_PENDING tasks.
I am wondering if I can introduce a check to make sure compaction is not running before I trigger the repair.
I should be able to check whether compaction is running by parsing the output of the nodetool netstats and compactionstats commands.
I will proceed with the repair if both of the following checks pass:
nodetool netstats output contains Not sending any streams.
nodetool compactionstats output contains pending tasks: 0
But I want to get some expert opinion before I proceed.
Is my understanding correct?
I don't want to end up in a situation where these checks always fail and the repair process never gets triggered.
Thanks.
Compaction occurs regularly in Cassandra, so I'm a bit worried that triggering repair only when pending_compactions=0 will result in repair not running often enough. It depends on your traffic of course; e.g. if you have few writes you won't do many compactions. You should probably add a maximum wait time for pending_compactions=0, so that if the condition still isn't true after the specified time, repair runs anyway.
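A minimal sketch of that idea, combining the two checks from the question with a maximum wait (the script and the keyspace name are illustrative, not a drop-in solution):
#!/bin/sh
# Wait for streaming and compactions to quiesce, but never postpone repair longer than MAX_WAIT seconds.
MAX_WAIT=3600
waited=0
until nodetool netstats | grep -q "Not sending any streams." && \
      nodetool compactionstats | grep -q "pending tasks: 0"; do
    [ "$waited" -ge "$MAX_WAIT" ] && break   # waited long enough, repair anyway
    sleep 60
    waited=$((waited + 60))
done
nodetool repair my_keyspace   # my_keyspace is a placeholder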
To answer your question: nodetool uses JMX to fetch MBeans from Cassandra. You can see all available metrics here: http://cassandra.apache.org/doc/latest/operating/metrics.html
You want this MBean:
org.apache.cassandra.metrics:type=Compaction,name=PendingTasks
You can create your own JMX Client like this: How to connect to a java program on localhost jvm using JMX?
Or you can use jmxterm: https://github.com/jiaqi/jmxterm
My understanding is you could use it like this (7199 being the default JMX port, and Value being the attribute on the metric MBean):
java -jar jmxterm-1.0.0-uber.jar
open localhost:7199
get -b org.apache.cassandra.metrics:type=Compaction,name=PendingTasks Value
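If you want to drive this from a script rather than the interactive shell, jmxterm can also be run non-interactively; something like this should work (assuming no JMX authentication; check jmxterm's --help for the exact option names):
echo "get -b org.apache.cassandra.metrics:type=Compaction,name=PendingTasks Value" | \
    java -jar jmxterm-1.0.0-uber.jar -l localhost:7199 -n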

Could not replace dead cassandra node due to Host ID collision issue

We have installed Cassandra 3.9 on 6 EC2 nodes. One of the nodes died for some reason and showed DN status in nodetool status, so I am trying to replace the node based on the instructions provided here:
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
In short, using the -Dcassandra.replace_address and -Dcassandra.replace_address_first_boot options when starting Cassandra. However, this does not seem to be working.
I am receiving the error:
java.lang.RuntimeException: Host ID collision between active endpoint
I tried to remove the node using nodetool removenode as well, and tried again. But whatever I tried seemed to be in vain.
The machine is not a seed node. I want to start it directly without using replace, but I would definitely like to know why replace isn't working.
I agree, you definitely need to remove the problem node.
I tried to remove the node using nodetool removenode as well, and tried again. But whatever I tried seemed to be in vain.
Sometimes nodetool removenode appears to do nothing or even hangs. At this point, I'd suggest the following:
Execute your removenode command.
After a minute or two, hit ctrl+c to return to your command prompt.
Verify that removenode is actually doing something:
$ nodetool removenode status
Expedite the removal process with the "force" command:
$ nodetool removenode force
Once that node is gone, try replacing it again. Note that if the IPs are the same, you may still need to use the replace_address line in cassandra-env.sh.
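For reference, on the replacement node that line usually looks something like this in cassandra-env.sh (substitute the dead node's IP, and remove the line again once the node has finished bootstrapping):
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<ip_of_dead_node>"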

Does nodetool cleanup affect Apache Spark rdd.count() of a Cassandra table?

I've been tracking the growth of some big Cassandra tables using Spark rdd.count(). Up until now the expected behavior was consistent: the number of rows was constantly growing.
Today I ran nodetool cleanup on one of the seeds and, as usual, it ran for 50+ minutes.
And now rdd.count() returns one third of the rows it did before....
Did I destroy data using nodetool cleanup? Or is the Spark count unreliable and was it counting ghost keys? I got no errors during cleanup and the logs don't show anything out of the usual. It seemed like a successful operation, until now.
Update 2016-11-13
Turns out the Cassandra documentation set me up for the loss of 25+ million rows of data.
The documentation is explicit:
Use nodetool status to verify that the node is fully bootstrapped and all other nodes are up (UN) and not in any other state. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys that no longer belong to those nodes. Wait for cleanup to complete on one node before running nodetool cleanup on the next node.
Cleanup can be safely postponed for low-usage hours.
Well, you check the status of the other nodes via nodetool status and they are all Up and Normal (UN). BUT here's the catch: you also need to run nodetool describecluster, where you might find that the schemas are not synced.
My schemas were not synced, and I ran cleanup when all nodes were UN, up and running normally, as per the documentation. The Cassandra documentation does not mention nodetool describecluster after adding new nodes.
So I merrily added nodes, waited till they were UN (Up / Normal) and ran cleanup.
As a result, 25+ million rows of data are gone. I hope this helps others avoid this dangerous pitfall. Basically, the DataStax documentation sets you up to destroy data by recommending cleanup as a step in the process of adding new nodes.
In my opinion, that cleanup step should be taken out of the new-node procedure documentation altogether. It should be mentioned elsewhere that cleanup is good practice, but not in the same section as adding new nodes. It's like recommending rm -rf / as one of the steps for virus removal: sure, it will remove the virus...
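For anyone following the same procedure, the extra check is roughly this (a sketch; the keyspace name is illustrative):
$ nodetool status                # all nodes should show UN
$ nodetool describecluster       # "Schema versions" must list a single version shared by all nodes
$ nodetool cleanup my_keyspace   # only once the schemas are in agreement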
Thank you Aravind R. Yarram for your reply; I came to the same conclusion and came here to post this update. I appreciate your feedback.
I am guessing you might have either added/removed nodes from the cluster or decreased the replication factor before running nodetool cleanup. Until you run the cleanup, I guess Cassandra still reports the old key ranges as part of rdd.count(), since the old data still exists on those nodes.
Reference:
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCleanup.html

"Lost notification" from nodetool repair

I'm often seeing the following message when running nodetool repair:
[2015-02-10 16:19:40,042] Lost notification. You should check server log for repair status of keyspace xxx
What does it really mean (and how to prevent it if it's dangerous)?
I'm using Cassandra 2.1.2 in a four-node cluster.
This message is not harmful by itself; it only means that nodetool lost track of the repair status. It does not affect the repair itself. It may be dangerous if you issue the next repair command as soon as the previous nodetool invocation returns (even though the actual repair may still be running), resulting in multiple concurrent repairs, which put much higher load on the system. I used to have a script (I don't have it any more) that monitored the logs for the repair cycle start/finish messages, precisely because of this "Lost notification" message, so as not to produce competing repairs.
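A rough sketch of what such a guard can look like (the exact log wording for repair start/finish differs between Cassandra versions, so treat the grep patterns and the keyspace name as placeholders to adapt):
#!/bin/sh
# Only start a new repair once the log shows as many finished repair commands as started ones.
LOG=/var/log/cassandra/system.log
started=$(grep -c "Starting repair command" "$LOG")
finished=$(grep -ci "repair command .* finished" "$LOG")
if [ "$started" -gt "$finished" ]; then
    echo "A previous repair still seems to be running; skipping this run."
    exit 0
fi
nodetool repair -pr my_keyspace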
This seems to be a known bug which has already been fixed in the latest releases.
You can always, as the error message suggests, check Cassandra's system log and collect information about the repair activity.
$ cd /var/log/cassandra/
$ cat system.log | grep repair
Please note that I am testing Cassandra 2.1.15 for some purposes and still encountered the problem.
Since it is not a major bug and does not really affect the repair process, I think it will stick around for some time.

Cassandra: nodetool repair not working

The Cassandra service on one of my nodes went down and we couldn't restart it because of corruption in one of the tables. So we tried rebuilding the node by deleting all the data files and then starting the service. Once it showed up in the ring we ran nodetool repair multiple times, but it hung, throwing the same error:
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/profile/AttributeKey/profile-AttributeKey-ib-1848-Data.db): corruption detected, chunk at 1177104 of length 11576.
This occurs after 6 GB of data is recovered. Also, my replication factor is 3, so the same data is fine on the other 2 nodes.
I am a little new to Cassandra and am not sure what I am missing; has anybody seen this issue with repair? I have also tried scrubbing, but it failed because of the corruption.
Please help.
Remove the corrupted SSTable files with rm /var/lib/cassandra/data/profile/AttributeKey/profile-AttributeKey-ib-1848-* and restart the node.
Scrub should not fail; please open a ticket to fix that at https://issues.apache.org/jira/browse/CASSANDRA.
First use nodetool scrub. If that does not fix it, shut down the node and run sstablescrub [yourkeyspace] [table]; that lets you remove the corrupted SSTables that the nodetool scrub utility could not handle. Then run a repair and you should be able to figure out the issue.
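Put together, the sequence looks roughly like this, using the keyspace and table from the error above (a sketch; the service commands depend on how Cassandra is installed and managed on your system):
$ nodetool scrub profile AttributeKey     # online scrub of the affected table
$ nodetool drain
$ sudo service cassandra stop
$ sstablescrub profile AttributeKey       # offline scrub while the node is down
$ sudo service cassandra start
$ nodetool repair profile AttributeKey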
