"Lost notification" from nodetool repair - cassandra

I'm often seeing the following message when running nodetool repair :
[2015-02-10 16:19:40,042] Lost notification. You should check server log for repair status of keyspace xxx
What does it really mean (and how to prevent it if it's dangerous)?
I'm using Cassandra 2.1.2 in four-node cluster.

This message is not harmful by itself. It only means that the nodetool lost the track of the repair status. It does not affect the repair itself. It may be dangerous if you issue next repair command upon completion of the previous command, therefore resulting in multiple concurrent repairs which produces much higher load on the system. I used to have a script (don't have it any more now) that was monitoring logs for the repair cycle start/finish messages triggered by the "lost notification" message in order not to produce competing repairs.

This seems to be a known bug which already has been fixed in the latest releases.

You can always go, as suggested by the error message, to check cassandra`s system log and collect information about the repair activity.
$ cd /var/log/cassandra/
$ cat system.log | grep repair
Please note that i am testing for some purposes a cassandra 2.1.15 and yet encountered the problem.
As consideration, since it is not a major bug, not really affecting the repair process, i think it will stick around some time.

Related

Sudden load spikes in Cassandra cluster

We recently started having problems with our Cassandra cluster. Maybe someone has ideas on how to fix this. We're running Cassandra 3.11.7 on a 40 node cluster. We are using replication factor = 3 and read/write at consistency level QUORUM.
Recently, a single node experienced a sudden spike in CPU load which then last for a while. During that period, we can observe a lot of dropped and queued MUTATIONs. If we restart Cassandra on the problematic node, one or two other nodes start to suffer of the same problem. We have examined log files and access patterns and have not yet been able to find the reason.
What could be the most common reasons for such behaviour? Where should we take a closer look? Has anyone already had similar experiences?
If we restart Cassandra on the problematic node, one or two other nodes start to suffer of the same problem.
First of all, when a single node presents a problem, restarting it generally achieves nothing. If anything, you'll clear the JVM heap...which will be quickly repopulated upon startup. Seriously, don't expect restarting a node to fix anything.
Has anyone already had similar experiences?
Yes, several times. For things not Cassandra related:
Are you in a cloud environment? Run iostat and look for things like high percentages of iowait and steal. Sometimes shared resources don't play well with others. If you don't have iostat, get it (yum install -y sysstat).
Check cron for all users. We once had an issue with a file integrity checker getting installed as a part of our base image, and it did exactly what you are talking about.
What could be the most common reasons for such behaviour? Where should we take a closer look?
For Cassandra related issues, I see a few possibilities:
Repairs. Check if the node is running a repair. You can see Merkle Tree calculations with nodetool compactionstats and repair streams with nodetool netstats.
Compactions. Check nodetool compactionstats. If this is it, you can try lowering your compaction throughput so that it doesn't affect normal operations.
Garbage Collection. Check the gc.log.* files. If it's GC, it can usually be fixed by reading up on and adjusting the GC settings. If there isn't anyone on your team who is a JVM GC expert, I recommend using G1GC as it removes a lot of the guesswork.
Do note that everything I mentioned above can never be fixed with a reboot. In fact, it's likely it'll pick right back up where it left off.

Is there a way to speed up cassandra nodeltool repair ?

I've 10 nodes of Cassandra Cluster and currently installed version is 3.0.13.
How I launched : nodetool repair -j 4 -pr
Would like to know if there are some configuration options to speed up this process, I still see "Anticompaction after repair" is in progress when i check for compactionstats.
The current state of the art way of doing repairs are subrange repairs running all the time. See http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html for some explanations:
While the idea behind incremental repair is brilliant, the implementation still has flaws that can cause severe damage to a production cluster, especially when using LCS and DTCS. The improvements and fixes planned for 4.0 will need to be thoroughly tested to prove they fixed incremental repair and allow it to be safely used as a daily routine.
That beeing said (or quoted), have a look at http://cassandra-reaper.io/ - a simple and easy tool managing your repairs.

Can I avoid running repair while compaction is going on in Cassandra cluster?

I have scheduled incremental repair for everyday. But while the repair is going on, our monitoring system reports COMPACTIONEXECUTOR_PENDING tasks.
I am wondering, if I can introduce a check, to see, if compaction is not running, before I trigger repair.
I should be able to check if compaction is running by parsing output of nodetool netstats and compactionstats command output.
I will proceed with repair if both of the following checks passes:
nodetool netstats output contains Not sending any streams.
nodetool compactionstats output contains pending tasks: 0
But I want to get some expert opinion before I proceed.
Is my understanding correct?
I don't want to get into situation, in which, these checks are failing always and repair process is not getting triggered at all.
Thanks.
Compaction is occurring regularly in Cassandra. So I'm a bit scared that only triggering repair when pending_compactions=0 will result in repair not running enough. But it depends on your traffic of course, e.g. if you have few writes you won't do many compactions. You should probably add a max wait time for pending_compactions=0 so that after a specified time if the condition is not true repair will run anyway.
To answer your question. Nodetool uses JMX to fetch MBeans in Cassandra. You can see all available MBeans here: http://cassandra.apache.org/doc/latest/operating/metrics.html
You want this MBean:
org.apache.cassandra.metrics:type=Compaction name=PendingTasks
You can create your own JMX Client like this: How to connect to a java program on localhost jvm using JMX?
Or you can use jmxterm: https://github.com/jiaqi/jmxterm
My understanding is you could use it like this:
java -jar jmxterm-1.0.0-uber.jar
get -b org.apache.cassandra.metrics:type=Compaction name=PendingTasks

cassandra nodetool repair/upgrade

I have a cassandra cluster with version 2.0.9 running.
Nodetool hasn't been running since the start (as it was not requested to schedule these repairs).
Each node has around 8GB of data. That seems rather small to me.
When I try to run nodetool repair it seems to take forever (not finished after 2 days).
I don't see any progress. I've been reading threads where they tell you to check compactionstats and netstats but those indicate no traffic. However the nodetool repair command never exits. That doesn't seem normal to me. I got messages about the system keyspace being repaired and being ok. However the actual data we put in it doesn't return anything.
All nodes are up. I've checked in the system.log (CentOS 6 BTW) for errors and there aren't any. I've started a command that checks if the number of commands and responses are still going up (which is the case) however I wonder if this might be from something else or if this is directly linked to the nodetool repair.
There doesn't seem to be any IO/net saturation.
So yesterday I started the repair again with a tool range-repair.py.
The last 12 hours there has been no extra output.
Last output was:
INFO 2015-11-01 20:55:46,268 repair line: 296 : [1/256] repairing range (-09214247901397780884, -09166106147119295777) in 100 steps for keyspace <all>
The main issue with this repair taking forever(or just repair being hung) is that we want to upgrade cassandra for app deployment. The procedure says do a nodetool repair first. Is this actually necessary before you start the upgrade? Maybe nodetool works more efficient (you now also have an incremental option).
Who can help me here? Thanks a lot in advance!
I'm not sure if this fully resolved the issue, however after doing a rolling restart of the whole cluster it seemed that nodetool repair was able to finish on where it didn't before. For another keyspace I got an issue that I had to start the process over and over again to get any progress. I used range_repair.py which allowed me to skip to a certain token so I could slowly go up.
In the end I used dry-run and steps option (1 step) and directed that to a file. Then I filtered the first column with sed and executed that file. If the command seems to hang you can note it down, CTRL-C it and rerun again afterwards. Generally it succeeded the second or third time I ran it.

How to speedup the bootstrap of single node

I have a single node Cassandra installation on my development machine (and very little experience with Cassandra). I always had very few data in the node and I experienced no problems. I inserted about 9,000 elements in a table today to experiment with a real world use case. When I start up the node the boot time is extremely long now. I get this in system.log
Replaying /var/lib/cassandra/commitlog/CommitLog-3-1388134836280.log
...
Log replay complete, 9274 replayed mutations
That took 13 minutes and is hardly bearable. I wonder if there is a way to store data in such a way that can be read at once without replaying the log. After all 9,000 elements are nothing and there must be a quicker way to boot. I googled for hints and searched into Cassandra's documentation but I didn't find anything. It's obvious that I'm not looking for the right things, would anybody be so kind to point me to the right documents? Thanks.
There are a few things that might help. The most obvious thing you can do is flush the commit log before you shutdown Cassandra. This is a good idea to do in production too. Before I stop a Cassandra node in production I'll run the following commands:
nodetool disablethrift
nodetool disablegossip
nodetool drain
The first two commands gracefully shut down connections to clients connected to this node and then to other nodes in the ring. The drain command flushes memtables to disk (sstables). This should minimize what needs to be replayed on startup.
There are other factors that can make startup take a long time. Cassandra opens all the SSTables on disk at startup. So the more column families and SSTables you have on disk the longer it will take before a node is able to start serving clients. There was some work done in the 1.2 release to speed this up (so if you are not on 1.2 yet you should consider upgrading). Reducing the number of SSTables would probably improve your start time.
Since you mentioned this was a development machine I'll also give you my dev environment observations. On my development machine I do a lot of creating and dropping column families and key spaces. This can cause some of the system CFs to grow significantly and eventually cause a noticeable slowdown. The easiest way to handle this is to have a script that can quickly bootstrap a new database and blow away all the old data in /var/lib/cassandra.

Resources