Is there a way to speed up Cassandra nodetool repair?

I have a 10-node Cassandra cluster, currently running version 3.0.13.
This is how I launched it: nodetool repair -j 4 -pr
I would like to know if there are any configuration options to speed up this process; I still see "Anticompaction after repair" in progress when I check nodetool compactionstats.

The current state-of-the-art way of doing repairs is subrange repairs running all the time. See http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html for some explanation:
While the idea behind incremental repair is brilliant, the implementation still has flaws that can cause severe damage to a production cluster, especially when using LCS and DTCS. The improvements and fixes planned for 4.0 will need to be thoroughly tested to prove they fixed incremental repair and allow it to be safely used as a daily routine.
That being said (or quoted), have a look at http://cassandra-reaper.io/ - a simple and easy tool for managing your repairs.
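If you want to see what a subrange repair looks like before handing the job to a tool, a single slice of the token ring can be repaired by hand; the token values below are made-up examples and the keyspace name is a placeholder:
nodetool repair -full -st -9223372036854775808 -et -9200000000000000000 my_keyspace
Reaper essentially automates generating, scheduling and retrying many such small token ranges, so no single repair session runs for very long.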

Related

When does Apache Cassandra scheduled repair become necessary operational practice?

Like eventual consistency, scheduled repair seems eventually useful to keep nodes from drifting too far away from each other.
We are trying to understand why and when "Scheduled Repair" becomes mandatory. We are relatively new to operating Cassandra and are progressively adopting it. Even though no scheduled repairs are configured, a few services have been working quite well for months.
Hence, we have a few questions about repair:
1) What is the statistical evidence that a developer can reliably look at to understand the immediate or eventual benefit of repair processes?
2) Is there any indicator (from logs or metrics) that warns ahead of time about the need for repair?
3) If we build a read-heavy (very rarely transactional) reference data system, do we still need to repair regularly?
4) If materialized views were mistakenly used in an application, should we abstain from repair until we rewrite the application without materialized views?
The answer is simple -- repair is part of the normal operation of Cassandra.
There are no metrics/statistics/indicators that determine when to run repairs. You just have to run repairs once every gc_grace_seconds. It's as simple as that.
By default, GC grace is 10 days, so for simplicity you should run repairs at least once a week if you're not using automated tools like Reaper -- the free, open-source tool for automated Cassandra repairs. Cheers!
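For example, a minimal cron entry for a weekly primary-range repair could look like the line below (keyspace name, schedule and log path are placeholders, and you may need the full path to nodetool; stagger the day/time per node so the whole cluster isn't repairing at once):
0 2 * * 0 nodetool repair -pr my_keyspace >> /var/log/cassandra/repair.log 2>&1
Reaper gives you the same effect with scheduling, retries and a UI on top.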

Sudden load spikes in Cassandra cluster

We recently started having problems with our Cassandra cluster. Maybe someone has ideas on how to fix this. We're running Cassandra 3.11.7 on a 40-node cluster. We are using replication factor = 3 and read/write at consistency level QUORUM.
Recently, a single node experienced a sudden spike in CPU load which then lasted for a while. During that period, we could observe a lot of dropped and queued MUTATIONs. If we restart Cassandra on the problematic node, one or two other nodes start to suffer from the same problem. We have examined log files and access patterns and have not yet been able to find the reason.
What could be the most common reasons for such behaviour? Where should we take a closer look? Has anyone already had similar experiences?
If we restart Cassandra on the problematic node, one or two other nodes start to suffer from the same problem.
First of all, when a single node presents a problem, restarting it generally achieves nothing. If anything, you'll clear the JVM heap...which will be quickly repopulated upon startup. Seriously, don't expect restarting a node to fix anything.
Has anyone already had similar experiences?
Yes, several times. For things not Cassandra related:
Are you in a cloud environment? Run iostat and look for things like high percentages of iowait and steal. Sometimes shared resources don't play well with others. If you don't have iostat, get it (yum install -y sysstat).
Check cron for all users. We once had an issue with a file integrity checker getting installed as a part of our base image, and it did exactly what you are talking about.
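If it helps, those two checks can be as simple as the following (standard iostat and crontab options; adjust the sample interval to taste, and run the crontab loop as root):
iostat -x 5 3
for u in $(cut -d: -f1 /etc/passwd); do echo "== $u =="; crontab -l -u "$u" 2>/dev/null; done
High %iowait or %steal in the iostat output points at storage or noisy neighbours; the loop prints every user's scheduled jobs so nothing hides in an unexpected crontab.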
What could be the most common reasons for such behaviour? Where should we take a closer look?
For Cassandra related issues, I see a few possibilities:
Repairs. Check if the node is running a repair. You can see Merkle Tree calculations with nodetool compactionstats and repair streams with nodetool netstats.
Compactions. Check nodetool compactionstats. If this is it, you can try lowering your compaction throughput (see the command sketch at the end of this answer) so that it doesn't affect normal operations.
Garbage Collection. Check the gc.log.* files. If it's GC, it can usually be fixed by reading up on and adjusting the GC settings. If there isn't anyone on your team who is a JVM GC expert, I recommend using G1GC as it removes a lot of the guesswork.
Do note that none of the things I mentioned above can be fixed with a reboot. In fact, it's likely they'll pick right back up where they left off.
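For reference, the repair and compaction checks above boil down to a handful of commands; the throughput value of 8 is only an illustrative temporary setting, not a recommendation:
nodetool compactionstats
nodetool netstats
nodetool getcompactionthroughput
nodetool setcompactionthroughput 8
Set the throughput back to your configured value afterwards (16 MB/s is the cassandra.yaml default).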

Cassandra repairs on TWCS

We have a 13-node Cassandra cluster (version 3.10) with a replication factor of 2 and read/write consistency level ONE.
This means that the cluster isn't fully consistent, but eventually consistent. We chose this setup to speed up the performance, and we can tolerate a few seconds of inconsistency.
The tables are set with TWCS with read repair disabled, and we don't run full repairs on them.
However, we've discovered that some entries of the data are replicated only once, and not twice, which means that when the not-updated node is queried it fails to retrieve the data.
My first question is how could this happen? Shouldn't Cassandra replicate all the data?
Now, if we choose to perform repairs, it will create overlapping tombstones, and therefore they won't be deleted when their time is up. I'm aware of the unchecked_tombstone_compaction property to ignore the overlap, but I feel like it's a bad approach. Any ideas?
So you've obviously made some deliberate choices regarding your client CL. You've opted to potentially sacrifice consistency for speed. You have achieved your goals, but you assumed that data would always make it to all of the other nodes in the cluster that it belongs to. There are no guarantees of that, as you have found out. How could that happen? There are multiple reasons, I'm sure, some of which include: networking issues, hardware overload (I/O, CPU, etc. - which can cause dropped mutations), cassandra/dse being unavailable for whatever reason, etc.
If none of your nodes have been "off-line" for at least a few hours (whether it be dse or the host being unavailable), I'm guessing your nodes are dropping mutations, and I would check two things:
1) nodetool tpstats
2) Look through your cassandra logs
For DSE: cat /var/log/cassandra/system.log | grep -i mutation | grep -i drop (and debug.log as well)
I'm guessing you're probably dropping mutations, and the cassandra logs and tpstats will record this (tpstats will only show counts since the last cassandra/dse restart). If you are dropping mutations, you'll have to try to understand why - typically it's some sort of load pressure causing it.
I have scheduled 1-second vmstat output that spools to a log continuously, with log rotation, so I can go back and check a few things out if our nodes start misbehaving. It could help.
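A crude version of that setup (log path and interval are arbitrary, and a proper deployment would use logrotate or a systemd unit rather than nohup) is just:
nohup vmstat -t 1 >> /var/log/vmstat.log 2>&1 &
The -t flag adds a timestamp to every sample, which makes it easy to line the numbers up with the Cassandra logs afterwards.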
That's where I would start. Either way, your decision to use read/write CL=1 has put you in this spot. You may want to reconsider that approach.
Consistency level ONE can sometimes create problems for many reasons: if data is not replicating to the cluster properly due to dropped mutations, cluster/node overload, high CPU, high I/O, or network problems, you can end up with data inconsistency. Read repair handles this problem some of the time, if it is enabled. You can run a manual repair to ensure consistency of the cluster, but in your case you may get some zombie data too.
I think, to avoid this kind of issue, you should consider a CL of at least QUORUM for writes, or you should run a manual repair within gc_grace_seconds (default is 10 days) for all the tables in the cluster.
Also, you can use incremental repair so that Cassandra runs repairs in the background for chunks of data. For more details, you can refer to the links below:
http://cassandra.apache.org/doc/latest/operating/repair.html or https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/tools/toolsRepair.html
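As a quick way to see the difference before changing any application code, you can run the same query at QUORUM from cqlsh (keyspace, table and key are placeholders, and this assumes cqlsh can reach the cluster):
cqlsh -e "CONSISTENCY QUORUM; SELECT * FROM my_keyspace.my_table WHERE id = 42;"
With RF = 2, QUORUM means both replicas must answer, so the read returns the row even if only one replica ever received it, at the cost of failing whenever either replica is down.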

nodetool repair taking a long time to complete

I am currently running Cassandra 3.0.9 in an 18-node configuration. We loaded quite a bit of data and are now running repairs against each node. My nodetool command is scripted to look like:
nodetool repair -j 4 -local -full
Using nodetool tpstats I see the 4 threads for repair, but they are repairing very slowly. I have thousands of repair sessions that are going to take weeks at this rate. The system log has repair entries, but also lists "Redistributing index summaries". Is this what is causing my slowness? Is there a faster way to do this?
Repair can take a very long time, sometimes days, sometimes weeks. You might improve things with the following:
Run a primary partition range repair (-pr). This will repair only the primary partition range of each node which, overall, will be faster (you still need to run a repair on each node, one at a time).
Using -j is not necessarily a big winner. Sure, you will repair multiple tables at a time, but you put much more load on your cluster, which can hurt your latency.
You might want to prioritize repairing the keyspaces / tables that are most critical to your application (see the command sketch after this list).
Make sure you keep your node density reasonable: 1 to 2 TB per node.
Prioritize repairing the nodes that were down for more than 3 hours (assuming max_hint_window_in_ms is set to its default value).
Prioritize repairing the tables in which you create tombstones (DELETE statements).
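As mentioned above, a primary-range, full repair scoped to a single high-priority table looks like this (keyspace and table names are placeholders):
nodetool repair -pr -full my_keyspace my_critical_table
Run it on every node in turn; repairing the critical tables first gets you consistent data where it matters without waiting for a cluster-wide repair of everything.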

How do I enable incremental repair on Cassandra 2.1.13?

I want to enable incremental repairs on Cassandra 2.1.13. This mailing list post states:
Yes, it should now be safe to just run a repair with -inc -par to migrate to incremental repairs
However, Datastax documentation states a six-step checklist is required.
Is it fine to simply do nodetool repair -inc -par? Are there any risks? The checklist worries me since a full repair can take a really long time and I risk getting timeouts if I have many small sstables due to compaction being disabled.
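For context, the two paths being weighed look roughly like this; the keyspace name is a placeholder, and the second path is only a compressed summary of the DataStax procedure, not a substitute for reading it:
nodetool repair -inc -par my_keyspace
versus the per-node migration checklist, which boils down to: disable autocompaction, run a full repair, stop the node, mark the already-repaired SSTables with the sstablerepairedset tool, then restart and re-enable autocompaction.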
