Cassandra nodetool repair best practices

This question applies to Cassandra 2.2
I am embarrassed to say that I still do not understand when I should be running a nodetool repair or, to be more precise, on which nodes.
So far, I understand that to ensure deletes are handled correctly I should run a repair at an interval shorter than gc_grace_seconds. So that's cool, got that bit.
Q. If I have a cluster of 9 nodes with a replication factor of 3, what type of repair do I run? More importantly, do I run the repair on every node, or just one node?
Q. If I have multiple data centers, does that change how I run repairs? Do I have to run them in each DC, or can it be coordinated from just one node in one DC?
I am hoping this is a trivial question and someone can just tell it like it is.

The nodetool repair command can be run on either a specified node or
on all nodes if a node is not specified. The node that initiates the
repair becomes the coordinator node for the operation.
If a node is not specified, the repair runs on all the nodes that are responsible for that partition range.
Run nodetool repair -pr on every node in the cluster to repair all
data. Otherwise, some ranges of data will not be repaired.
The nodetool repair -pr option is good for repairs across multiple datacenters.
Note: For Cassandra 2.2 and later, a recommended option for repairs across datacenters is -dcpar (--dc-parallel), which repairs the
datacenters in parallel.
Nodetool Repair

This is the recommendation from DataStax:
Run repair frequently enough that every node is repaired before
reaching the time specified in the gc_grace_seconds setting. Deleted
data is properly handled in the cluster if this requirement is met.
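The timing rule above comes down to simple arithmetic. The sketch below is only an illustration: the 3-day safety margin and the idea of giving every node an equal slot are assumptions made for the example, not DataStax guidance. The value 864000 seconds (10 days) is the default gc_grace_seconds, and 9 is the node count from the question.

```python
def repair_plan(gc_grace_seconds, node_count, margin_seconds):
    """Return (cycle_seconds, slot_seconds) for a rolling repair.

    cycle_seconds: time budget to repair every node once, kept below
                   gc_grace_seconds by margin_seconds (an assumed buffer).
    slot_seconds:  equal share of that budget for each node.
    """
    if margin_seconds >= gc_grace_seconds:
        raise ValueError("margin must be smaller than gc_grace_seconds")
    cycle_seconds = gc_grace_seconds - margin_seconds
    slot_seconds = cycle_seconds // node_count
    return cycle_seconds, slot_seconds


DAY = 86400
cycle, slot = repair_plan(gc_grace_seconds=10 * DAY, node_count=9, margin_seconds=3 * DAY)
print(f"full cycle every {cycle / DAY:.0f} days, one node every {slot / 3600:.1f} hours")
```

If a single node's repair regularly takes longer than its slot, the cycle no longer fits inside gc_grace_seconds and either the margin or the repair strategy needs revisiting.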

Related

How often should I run nodetool compact and repair in Cassandra?

We have a 14-node Cassandra cluster (v3.5).
Can someone enlighten me about compact and repair?
If I run it from one node, does it need to be run from all the nodes in the cluster?
nodetool compact
I see it is very slow. How often is this supposed to be run?
Same question regarding nodetool repair (all nodes or certain nodes in the cluster):
nodetool repair or
nodetool repair -pr
How often is this supposed to be run?
Compactions are part of the normal operation of Cassandra nodes. They run automatically in the background (otherwise known as minor compactions) and get triggered by each table's defined compaction strategy based on any combination of configured thresholds and compaction sub-properties. This video extract from the DS201 Cassandra Foundations course at the DataStax Academy talks about compactions in more detail.
It is not necessary for an operator/administrator to manually kick off compactions with nodetool compact. In fact, it is not recommended to trigger user-defined compactions (otherwise known as major compactions) because they can create problems down the track like the one I explained in this post -- https://community.datastax.com/questions/6396/.
Repairs, on the other hand, are something that needs to be managed by a cluster administrator. Since Cassandra has a distributed architecture, it is necessary to run repairs to keep the copies of the data consistent between replicas (Cassandra nodes).
Repairs need to be run at least once every gc_grace_seconds (configured per table). By default, GC grace is 10 days (864000 seconds) so most DB admins run a repair on each node once a week. This short video from the DS210 Cassandra Operations course provides a good overview of Cassandra repairs.
Running a partitioner range repair (with -pr flag) on a node repairs only the data that a node owns so it is necessary to run nodetool repair -pr on each node, one node at a time, until all nodes in the cluster have been repaired. This blog post by Jeremiah Jordan is a good explanation of why this is necessary.
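Why -pr has to be run on every node can be seen with a toy model of the ring. This is a deliberately simplified sketch, not how Cassandra computes ownership: it assumes one token range per node (no vnodes), a single datacenter, and replicas placed on the next nodes clockwise. The function names are made up for the illustration.

```python
def replicas_for_range(range_id, node_count, rf):
    """Nodes holding range `range_id`; the first one is its primary owner."""
    return [(range_id + k) % node_count for k in range(rf)]


def times_repaired(node_count, rf, primary_only, nodes_running_repair):
    """Count how often each range is repaired by the given repair runs."""
    counts = [0] * node_count
    for node in nodes_running_repair:
        for range_id in range(node_count):
            owners = replicas_for_range(range_id, node_count, rf)
            if primary_only:
                hit = owners[0] == node   # -pr: only ranges this node owns as primary
            else:
                hit = node in owners      # no -pr: every range this node replicates
            if hit:
                counts[range_id] += 1
    return counts


every_node = range(9)
print(times_repaired(9, 3, True, every_node))    # each range repaired once
print(times_repaired(9, 3, False, every_node))   # each range repaired rf times
print(times_repaired(9, 3, True, [0]))           # -pr on a single node only
```

In this model, -pr on all nine nodes touches each range exactly once, a plain repair on all nine nodes does every range three times, and -pr on a single node leaves the other eight ranges untouched.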
If you're interested, datastax.com/dev has free resources for learning Cassandra. The Cassandra Fundamentals series in particular is a good place to start. It is a collection of short online tutorials where you can quickly learn the basic concepts for free. Cheers!
If I run it from one node, does it need to be run from all the nodes in the cluster? nodetool compact: I see it is very slow. How often is this supposed to be run?
You should generally not run the nodetool compact command. Compactions are meant to run automatically behind the scenes unless they have been disabled. Running a compaction manually may create more problems and should be avoided in most cases. The automatic compactions that run behind the scenes should be able to handle your workload. If you feel your compactions are slow, you can tune them through the compaction-related parameters (mostly concurrent_compactors and compaction_throughput_mb_per_sec).
Same question regarding nodetool repair (all nodes or certain nodes
in the cluster): nodetool repair or nodetool repair -pr. How often is this
supposed to be run?
Repair is a maintenance task that should be run on all the nodes once within each gc_grace_seconds period. For example, the default gc_grace_seconds is 10 days, so repair must be run on all the nodes once in that 10-day period. You should schedule your repair to run regularly, once per gc_grace_seconds period. As for which option to use: if you are doing it yourself, you should run nodetool repair -pr on all the nodes, one by one.
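Running the repair "on all the nodes, one by one" can be scripted. The following is a minimal sketch rather than a production tool: the host names are hypothetical, it assumes nodetool is on the PATH and can reach each node's JMX port through -h, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def repair_commands(hosts, flags=("-pr",)):
    """Build one `nodetool -h <host> repair <flags>` command per host."""
    return [["nodetool", "-h", host, "repair", *flags] for host in hosts]


def run_rolling_repair(hosts, flags=("-pr",), dry_run=True):
    """Run the repairs strictly one after another; stop on the first failure."""
    for cmd in repair_commands(hosts, flags):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # blocks until this node's repair ends


# Hypothetical host names; dry_run=True only prints the commands.
run_rolling_repair(["cass-node-1", "cass-node-2", "cass-node-3"])
```

Because each command is only started after the previous one returns, no two nodes are repairing at the same time, which matches the one-node-at-a-time advice given in the answers.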

Nodetool repair -pr -full vs Nodetool repair -pr

I am running a cluster with 1 datacenter (6 nodes) and Cassandra 3.11.0 installed on each node, with replication factor 2. I know nodetool repair -pr will carry out a repair of the primary ranges on that node.
My question is: how is nodetool repair -pr -full different from nodetool repair -pr?
Which repair option should I use on a heavy load production system?
my question is how nodetool repair -pr -full is different from nodetool repair -pr?
So a "full" repair means that all data between the source and target nodes will be verified and repaired. Essentially, it's the opposite of an "incremental" repair, where only data not already marked as repaired is processed. These two options control how much data is repaired.
Once that is decided (incremental vs. full), -pr restricts the repair to the token ranges for which the node is the primary replica.
Additionally, note that since Cassandra 2.2 the default repair is incremental, so on Cassandra 3.x -pr on its own runs an incremental repair and -pr -full is needed to force a full one.
Which repair option should I use on a heavy load production system?
I'll second what Alex said, and also recommend Cassandra Reaper for this. It has the ability to schedule repairs for slower times, as well as allowing you to pause repairs which don't complete in time.
For production systems, it's better to use token range repair (using the -st/-et options) to limit the load on the nodes. Doing it manually can be tedious, but it can be automated with tools like Reaper, which track which token ranges have already been repaired and which have not.
It's recommended not to run incremental repair with the -pr option.
It leaves the non-primary replicas unrepaired, which is not good practice in the long run.
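To make the token range (-st/-et) approach more concrete, here is a rough sketch of splitting one token range into smaller pieces and building a repair command for each. It is not how Reaper works internally. The keyspace name and the start and end tokens are hypothetical: real values would come from the ring layout, and each requested range should lie within a range the node actually replicates. The -full flag assumes Cassandra 2.2 or later.

```python
def split_range(start, end, parts):
    """Split (start, end] into `parts` contiguous sub-ranges."""
    if parts < 1 or end - start < parts:
        raise ValueError("need parts >= 1 and a range wider than parts")
    step = (end - start) // parts
    bounds = [start + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def subrange_commands(keyspace, start, end, parts):
    """One `nodetool repair -full -st <s> -et <e> <keyspace>` per sub-range."""
    return [
        ["nodetool", "repair", "-full", "-st", str(s), "-et", str(e), keyspace]
        for s, e in split_range(start, end, parts)
    ]


# Hypothetical keyspace and token bounds, for illustration only.
for cmd in subrange_commands("my_keyspace", -3074457345618258603, 3074457345618258602, 4):
    print(" ".join(cmd))
```

Running the pieces one at a time keeps each repair session small, so a failure only costs one sub-range instead of the whole node.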

Cassandra repair taking forever and increasing disk usage

My team is using Apache Cassandra 3.0, not DSE, for our 10 node cluster. We have one DC and all nodes are 1 TB each.
Right now all the nodes have around 300 GB occupied, and the RF is 2. We have not run a manual (anti-entropy) repair in a long time. The problem I am facing now is that I started a repair on one of the nodes and it is taking forever. Is that normal? Also, the repair failed once and I am noticing an increase in disk usage on that node; it is ~400 GB now. How can I fix this behavior?
Incremental repairs (the default) will not work in this scenario. They are meant to be run from the beginning, so that each run never covers too much data. I would strongly recommend using subrange repairs. This can be a little difficult, but it can be automated with OpsCenter's repair service or Reaper.
You can use nodetool repair -pr -full:
-pr makes the node repair only the token ranges it owns as primary;
-full disables incremental repair which, as other people suggest, is not a good fit here.

Correct usage of nodetool repair command

What is the right method to run nodetool repair command?
In a 3-node Cassandra cluster in single datacenter, should we run nodetool repair or nodetool repair -pr ?
As per the Cassandra apache document http://cassandra.apache.org/doc/latest/operating/repair.html,
"By default, repair will operate on all token ranges replicated by the node you’re running repair on, which will cause duplicate work if you run it on every node. The -pr flag will only repair the “primary” ranges on a node, so you can repair your entire cluster by running nodetool repair -pr on each node in a single datacenter."
Running "nodetool repair" takes more than 5 minutes, but running "nodetool repair -pr" takes less time. So I want to know if "nodetool repair -pr" is the correct choice for a 3-node Cassandra cluster in a single datacenter.
Please advise.
Note that if you use -pr, you should not use -inc at the same time, because the two are not recommended together. So basically, as other people suggest: before 2.2, you can just run nodetool repair -pr on each node; from 2.2 onward, you'd better use nodetool repair -pr -full to suppress the incremental repair.
If you run repair periodically, the best way is to run it with the -pr option, thus repairing only the primary token range of each node. That way, once it has been run on every node, the whole ring will be repaired.
But if you haven't run repair until now, the best approach is probably to run a full repair first and then do maintenance repairs using the -pr option.
Also, note that the repair behavior differs depending on your Cassandra version. The default repair in 2.2 and afterwards is incremental. If you want to trigger a full repair you have to explicitly use the -full option. For versions prior to 2.2, check the documentation.
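The version rule described in these answers can be written down as a small helper. This is only a sketch of that rule as stated here (full repair by default before 2.2, incremental by default from 2.2 on); the function name is made up, and the flags for any other version should be checked against that version's nodetool documentation.

```python
def full_primary_repair_flags(version):
    """Flags for a full, primary-range repair on a (major, minor) version.

    Before 2.2 a plain repair is already full; from 2.2 on the default is
    incremental, so -full has to be passed explicitly.
    """
    flags = ["-pr"]
    if tuple(version) >= (2, 2):
        flags.append("-full")
    return flags


for version in [(2, 1), (2, 2), (3, 11)]:
    print(version, "nodetool repair " + " ".join(full_primary_repair_flags(version)))
```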

Cassandra nodetool repair options

I have a 15 node cluster with RF 3 (using vnodes). We are ingesting data into the 15 nodes from multiple clients. It turns out that one of the nodes has been down for a couple of days and it's now almost 200 GBs behind, the other nodes have approx 380 GB.
What sort of nodetool repair would you recommend here? I know that the nodetool repair operation is CPU intensive and this might affect the rate at which the clients can ingest into the cluster. There seem to be several nodetool repair options, such as -snapshot, -par, etc., and I was wondering if any of these would better suit my current scenario.
I'm trying to run the repair with the least performance hit possible on the cluster.
Thanks,
mskh
Unless you have already taken a snapshot to repair from, the -snapshot option won't do you any good.
Do you have multiple datacenters? If so, you could do a nodetool repair -local, which would only repair your node from nodes in its local datacenter. This is a good way to repair a node without affecting overall cluster performance.
Otherwise Rock's suggestion of repairing only the first partition range (in parallel) is worth trying, as well.
You can run sh nodetool repair -par on each node to keep the impact on the live cluster to a minimum.
Run sh nodetool cleanup once the repair is done.
