How to repair Cassandra nodes

I have a cassandra cluster with two datacenters.
In datacenter 2 I have a keyspace with replication factor 3.
I want to repair all keyspaces in datacenter 2.
I have tried to run:
nodetool repair --in-local-dc --full -j 4
But this command does not repair all keyspaces. Does anybody know if this is intended behaviour? The Cassandra logs do not indicate any problems.

So I have also had issues with multi-DC repairs when designating a source DC. I don't know if those DC-specific repair flags are buggy, but I have found that pretty much the best way to ensure that only specific nodes are involved in a repair is to specify each one.
nodetool repair keyspace_name -hosts 10.6.8.2 -hosts 10.6.8.3 -hosts 10.6.8.1 -hosts 10.6.8.5 -hosts 10.6.8.4 -hosts 10.1.3.1 -full
Note that my goal was to run this repair on 10.1.3.1 while SSH'd into it. The node you are running the repair on must also be specified with a -hosts flag. Also make sure that each node in the source DC is listed, otherwise you'll get errors about missing source token ranges.
Try that and see if it helps.
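If the goal is to cover every keyspace rather than just one, a simple wrapper is to loop over the keyspace names (a sketch; keyspace_one and keyspace_two are placeholders for your actual keyspaces):

# keyspace names below are placeholders; substitute your own
for ks in keyspace_one keyspace_two; do
    nodetool repair "$ks" -hosts 10.6.8.2 -hosts 10.6.8.3 -hosts 10.6.8.1 -hosts 10.6.8.5 -hosts 10.6.8.4 -hosts 10.1.3.1 -full
done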

Related

Nodetool repair -pr -full vs Nodetool repair -pr

I am running a single-datacenter cluster (6 nodes) with Cassandra 3.11.0 installed on each node and a replication factor of 2. I know that nodetool repair -pr will carry out a repair of the primary ranges on that node. My question is: how is nodetool repair -pr -full different from nodetool repair -pr?
Which repair option should I use on a heavily loaded production system?
My question is: how is nodetool repair -pr -full different from nodetool repair -pr?
So a "full" repair means that all data in the token ranges being repaired is verified between the source and target nodes and made consistent. Essentially, it's the opposite of an "incremental" repair, which only considers data that has not already been repaired (SSTables not yet marked as repaired). These two options control how much data is repaired.
Once that is decided (incremental vs. full), -pr further restricts the repair to the token ranges for which the node is a primary replica.
Also note that from Cassandra 2.2 onward incremental repair is the default, so -pr alone runs an incremental repair, while -pr -full forces a full repair of the primary ranges; the two are not the same.
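Concretely, on a 2.2–3.x node the two commands differ like this (a sketch):

nodetool repair -pr          # incremental repair (the default) of this node's primary ranges
nodetool repair -pr -full    # full repair of this node's primary ranges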
Which repair option should I use on a heavily loaded production system?
I'll second what Alex said, and also recommend Cassandra Reaper for this. It can schedule repairs for off-peak times, and it lets you pause repairs that don't complete in time.
For production systems, it's better to use token range (subrange) repair (the -st/-et options) to limit the load on the nodes. Doing this manually can be tedious, but it can be automated with tools like Reaper, which track which token ranges have already been repaired and which have not.
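For illustration, a single subrange repair of one keyspace could look like this (a sketch; the token values and the keyspace name my_keyspace are placeholders you would derive from your own ring):

# start/end tokens and keyspace name are placeholders
nodetool repair -st -9223372036854775808 -et -3074457345618258603 -full my_keyspace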
It is recommended not to run incremental repair with the -pr option: it will leave non-primary replicas unrepaired, which is not a good practice in the long run.

Getting error: "It is not possible to mix sequential repair and incremental repairs" when trying to do incremental repair

I'm trying to do incremental repair on my nodes and I'm following this guide here
After that I executed the command ./bin/nodetool repair --inc and it gave me the following error:
[2019-01-17 21:10:38,827] Nothing to repair for keyspace 'dse_perf'
[2019-01-17 21:10:38,835] Nothing to repair for keyspace 'system'
[2019-01-17 21:10:38,863] Starting repair command #5, repairing 768 ranges for keyspace dse_system (parallelism=SEQUENTIAL, full=false)
[2019-01-17 21:10:38,867] It is not possible to mix sequential repair and incremental repairs.
[2019-01-17 21:10:38,877] Starting repair command #6, repairing 512 ranges for keyspace my_keyspace (parallelism=SEQUENTIAL, full=false)
[2019-01-17 21:10:38,880] It is not possible to mix sequential repair and incremental repairs.
[2019-01-17 21:10:38,893] Starting repair command #7, repairing 512 ranges for keyspace system_traces (parallelism=SEQUENTIAL, full=false)
[2019-01-17 21:10:38,895] It is not possible to mix sequential repair and incremental repairs.
I don't understand what this actually means. I tried searching online, but what I found only mentions something about a system limitation, like here, and I'm not totally convinced that's what it's trying to say.
I'm doing this on Ubuntu 16.04. Any help would be appreciated. Thank you!
The guide that you're using is for a very old version of Cassandra.
Incremental repair has some implementation problems, so it was switched off as the default in DSE 5.1.3. Depending on your version of DSE, it is better to:
use OpsCenter's repair service to schedule the repairs in the most effective way; if you don't want to use OpsCenter, just use standard (non-incremental) repairs;
in DSE 6.0+, enable NodeSync on tables, which will perform repairs in the background (this can also be done via OpsCenter).
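If you still want to experiment with incremental repair on an older version where sequential repair is the default, the error message itself points to the conflict: an incremental repair cannot run in sequential mode. Forcing parallel mode should clear the error (a sketch; confirm that your version supports the -par flag):

nodetool repair -par -inc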

Correct usage of nodetool repair command

What is the right method to run nodetool repair command?
In a 3-node Cassandra cluster in single datacenter, should we run nodetool repair or nodetool repair -pr ?
As per the Apache Cassandra documentation, http://cassandra.apache.org/doc/latest/operating/repair.html:
"By default, repair will operate on all token ranges replicated by the node you’re running repair on, which will cause duplicate work if you run it on every node. The -pr flag will only repair the “primary” ranges on a node, so you can repair your entire cluster by running nodetool repair -pr on each node in a single datacenter."
Running nodetool repair takes more than 5 minutes, but running nodetool repair -pr takes less time. So I want to know whether nodetool repair -pr is the correct choice for a 3-node Cassandra cluster in a single datacenter.
Please advise.
Note that if you use -pr, you should not use -inc at the same time, because these two are not recommended together. So basically, as others have suggested: before 2.2, on each node you can just run nodetool repair -pr; from 2.2 onward, you should use nodetool repair -pr -full to suppress the incremental repair.
If you run repair periodically, the best way is to use the -pr option, repairing only the primary token ranges on each node; run on every node, this repairs the whole ring. But if you haven't run repair until now, it's probably best to run a full repair first and then do maintenance repairs with the -pr option.
Also note that the repair behaviour depends on your Cassandra version. The default repair in 2.2 and later is incremental; if you want to trigger a full repair, you have to pass the -full option explicitly. For versions prior to 2.2, check the documentation.
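As a sketch, on a 2.2+ node that sequence looks like this:

nodetool repair -full       # one-off: full repair of all ranges this node replicates
nodetool repair -pr -full   # afterwards: routine repair of primary ranges only, run on every node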

Corrupted system_auth keyspace

I have a cluster with 6 nodes. As the official documentation suggests, we need to change the replication factor of the system_auth keyspace to match the number of nodes.
Now system_auth seems to be corrupted, because there are many versions of system_auth, as shown below:
Some of the users could not be altered, since I got a null pointer exception:
I tried to use nodetool repair to fix it, but it didn't help.
Could anyone tell me what's wrong with my cluster and suggest how to resolve the problem?
Thanks!
Did you try running repair with -pr for system_auth across the cluster? Using -pr requires it to be run on every node.
nodetool repair -pr system_auth
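Since -pr only repairs each node's primary ranges, it has to run on all six nodes. A sketch of that (node1 through node6 are hypothetical hostnames):

# hostnames below are placeholders for your six nodes
for host in node1 node2 node3 node4 node5 node6; do
    ssh "$host" nodetool repair -pr system_auth
done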

Cassandra nodetool repair best practices

This question applies to Cassandra 2.2
I am embarrassed to say that I still do not understand when I should be running nodetool repair, or, more precisely, on which nodes.
So far, I understand that to ensure deletes are handled correctly, I should be repairing every node within each gc_grace_seconds window. So that's cool, got that bit.
Q. If I have a cluster of 9 nodes with a replication factor of 3, what type of repair do I run? More importantly, do I run the repair on every node, or just one node?
Q. If I have multiple data centers, does that change how I run repairs? Do I have to run them in each DC, or can it be coordinated from just one node in one DC?
I am hoping this is a trivial question and someone can just tell it how it is.
The nodetool repair command can be run against a single specified node, or against all nodes if no node is specified. The node that initiates the repair becomes the coordinator node for the operation. If no node is specified, the repair runs on all the nodes responsible for that partition range.
Run nodetool repair -pr on every node in the cluster to repair all data; otherwise, some ranges of data will not be repaired.
The nodetool repair -pr option is good for repairs across multiple datacenters.
Note: for Cassandra 2.2 and later, a recommended option for repairs across datacenters is -dcpar (--dc-parallel), which repairs the datacenters in parallel.
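Putting those together, a multi-DC maintenance repair run on each node might look like this (a sketch for 2.2+; -full is included because incremental is the default there):

nodetool repair -pr -full -dcpar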
This is the recommendation from the DataStax Nodetool Repair documentation:
Run repair frequently enough that every node is repaired before
reaching the time specified in the gc_grace_seconds setting. Deleted
data is properly handled in the cluster if this requirement is met.
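As a sketch, with the default gc_grace_seconds of 864000 (10 days), a weekly cron entry on each node keeps repairs comfortably inside that window (stagger the start times so all nodes don't repair at once):

# weekly, Sunday 02:00; stagger the schedule per node
0 2 * * 0 nodetool repair -pr -full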
