Repair status not 100% after repair - Cassandra

I have noticed that some tables show less than 100% "Percent repaired" in the nodetool tablestats output. I have manually executed repairs on all nodes (3-node cluster, RF=3) but the value doesn't seem to change.
Example output:
Table: users
SSTable count: 3
Space used (live): 66636
Space used (total): 66636
Space used by snapshots (total): 0
Off heap memory used (total): 688
SSTable Compression Ratio: 0.5731829674519404
Number of partitions (estimate): 162
Memtable cell count: 11
Memtable data size: 483
Memtable off heap memory used: 0
Memtable switch count: 27
Local read count: 120833
Local read latency: NaN ms
Local write count: 12094
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 91.54
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 568
Bloom filter off heap memory used: 544
Index summary off heap memory used: 112
Compression metadata off heap memory used: 32
Compacted partition minimum bytes: 30
Compacted partition maximum bytes: 1916
Compacted partition mean bytes: 420
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
The repair was run with nodetool repair -pr.
What is going on?

Percent repaired can be a misleading metric: it refers to the percentage of SSTable data marked as repaired, and it is only computed when certain conditions are met:
- the tables must not belong to system keyspaces
- the keyspace must have a replication factor greater than 1
- the repair must be incremental or full (non-subrange)
When you use nodetool repair -pr, you invoke a full repair restricted to the node's primary ranges; that kind of repair does not mark SSTables as repaired, so it will not update this value.
For more information on incremental repairs, I recommend this article from The Last Pickle. Since they took over maintenance of the Reaper tool, they have become an authority on repairs.
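As a minimal sketch (the keyspace name my_keyspace is hypothetical; users is the table from the question), an incremental repair that can advance this metric would be run on each node and then verified like this:
nodetool repair my_keyspace                # without --full or -pr this runs an incremental repair on 2.2+ and marks SSTables as repaired
nodetool tablestats my_keyspace.users      # re-check the "Percent repaired" line afterwards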

Executing nodetool repair -pr will repair the primary range owned by the node that command is executed on.
What does this mean? The node this command is executed on has data that it "owns", i.e., its primary range, but the node also holds data/replicas "owned" by other nodes. You are not repairing the replicas "owned" by other nodes.
Now, if you execute that command on every single node in the cluster (not data center), it will cover all the token ranges.
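For example, to cover all token ranges with primary-range repairs, the same command has to be run once per node (the host and keyspace names below are hypothetical):
nodetool -h node1 repair -pr my_keyspace
nodetool -h node2 repair -pr my_keyspace
nodetool -h node3 repair -pr my_keyspace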
EDIT / NOTE:
My answer did not properly address the question. Although what I wrote is accurate, the actual answer is the one above mine: the Percent repaired value is only meaningful for incremental repairs and is not affected by a full repair. (Incremental repair marks ranges as repaired while it works so it does not spend time re-repairing them later.)
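If you want to see what this metric is derived from, one sketch (the data directory and file names below are the 3.x defaults and may differ on your install) is to inspect the SSTable metadata directly; unrepaired SSTables report a repaired-at timestamp of 0:
sstablemetadata /var/lib/cassandra/data/my_keyspace/users-*/mc-*-big-Data.db | grep "Repaired at"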

Related

Does nodetool for Cassandra only gather data for a single node or for the entire cluster?

I have a 19-node Cassandra cluster for our internal service. If I log into a node and run nodetool commands like tablestats, does that gather stats just for that particular node or for the entire cluster?
The nodetool utility gathers data for the entire cluster, not just a single node.
For example, if you run a command like the following:
command:
nodetool tablestats musicdb.artist
result:
Keyspace: musicdb
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: artist
SSTable count: 1
Space used (live): 62073
Space used (total): 62073
Space used by snapshots (total): 0
Off heap memory used (total): 1400
SSTable Compression Ratio: 0.27975344141453456
Number of keys (estimate): 1000
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 1264
Bloom filter off heap memory used: 1256
Index summary off heap memory used: 128
Compression metadata off heap memory used: 16
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 149
Compacted partition mean bytes: 149
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0
The status of the table artist in keyspace musicdb shown above is for the entire cluster.
Most nodetool commands operate on a single node in the cluster if -h
is not used to identify one or more other nodes. If the node from
which you issue the command is the intended target, you do not need
the -h option to identify the target; otherwise, for remote
invocation, identify the target node, or nodes, using -h.
Nodetool Utility
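As an illustration of the -h option (the host address below is hypothetical; musicdb.artist is the table from the example), the same stats command can be pointed at a remote node, while omitting -h targets the node you are logged into:
nodetool -h 10.0.0.12 tablestats musicdb.artist   # stats reported by the remote node
nodetool tablestats musicdb.artist                # stats reported by the local node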

High Read and Write Latency in Cassandra 2.2.6

A similar question was already asked, but I think we have a slightly different problem:
We use a single-node Cassandra 2.2.6 installation (and are going to upgrade to the newest version). Right now we are getting horrible query times and occasional write timeouts.
Read Count: 21554802
Read Latency: 10.702975718589295 ms.
Write Count: 19437551
Write Latency: 27.806026818707767 ms.
Pending Flushes: 0
Table: -----
SSTable count: 5
Space used (live): 661310370
Space used (total): 661310370
Space used by snapshots (total): 704698632
Off heap memory used (total): 845494
SSTable Compression Ratio: 0.13491738106721324
Number of keys (estimate): 179623
Memtable cell count: 594836
Memtable data size: 8816212
Memtable off heap memory used: 0
Memtable switch count: 3343
Local read count: 21554802
Local read latency: 11,744 ms
Local write count: 19437551
Local write latency: 30,506 ms
Pending flushes: 0
Bloom filter false positives: 387
Bloom filter false ratio: 0,00024
Bloom filter space used: 258368
Bloom filter off heap memory used: 258328
Index summary off heap memory used: 34830
Compression metadata off heap memory used: 552336
Compacted partition minimum bytes: 180
Compacted partition maximum bytes: 12108970
Compacted partition mean bytes: 23949
Average live cells per slice (last five minutes): 906.885821915692
Maximum live cells per slice (last five minutes): 182785
Average tombstones per slice (last five minutes): 1.4321025078309697
Maximum tombstones per slice (last five minutes): 50
For comparison, here is a different table containing about 10M records and constructed quite similarly to the one above:
Read Count: 815780599
Read Latency: 0.1672932019580917 ms.
Write Count: 3083462
Write Latency: 1.5470194706469547 ms.
Pending Flushes: 0
Table: ------
SSTable count: 9
Space used (live): 5067447115
Space used (total): 5067447115
Space used by snapshots (total): 31810631860
Off heap memory used (total): 19603932
SSTable Compression Ratio: 0.2952622065160448
Number of keys (estimate): 12020796
Memtable cell count: 300611
Memtable data size: 18020553
Memtable off heap memory used: 0
Memtable switch count: 97
Local read count: 815780599
Local read latency: 0,184 ms
Local write count: 3083462
Local write latency: 1,692 ms
Pending flushes: 0
Bloom filter false positives: 7
Bloom filter false ratio: 0,00000
Bloom filter space used: 15103552
Bloom filter off heap memory used: 15103480
Index summary off heap memory used: 2631412
Compression metadata off heap memory used: 1869040
Compacted partition minimum bytes: 925
Compacted partition maximum bytes: 1916
Compacted partition mean bytes: 1438
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0193396020053622
Maximum tombstones per slice (last five minutes): 3
The difference is that the first table contains lots of maps and UDTs. A simple test in DevCenter, select * from ... limit 999; (omitting any Lucene indexes etc.), shows 183 ms for the second table and 1.8 s for the first one.
Could anybody help define a way to find the root cause?
Maximum live cells per slice (last five minutes): 182785
That is huge, probably from your maps and UDTs. Your data model is most likely the root cause. Walking 180k live cells to satisfy a single query will be very slow.
select * from ... limit 999;
Range queries are inherently slow. Try to design your tables such that you can answer your question from a single partition and you will get better results.
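As an illustration of single-partition access (the table and column names below are hypothetical), partitioning by the key you query on lets each read touch exactly one partition instead of scanning a range:
-- one partition per user, rows clustered by time
CREATE TABLE readings_by_user (
    username text,
    reading_time timestamp,
    value double,
    PRIMARY KEY (username, reading_time)
);
-- single-partition read instead of an unbounded range scan
SELECT * FROM readings_by_user WHERE username = 'alice' LIMIT 999;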
one node installation
Whenever there is a GC pause you will have a bad query; this is mitigated by adding more nodes so that pauses do not hurt as much (even better if you use client-side speculative retry in the driver).

Cassandra: read/s write/s

I'm trying to determine the throughput of my Cassandra cluster, and can't figure out how to use nodetool to accomplish that. Below is a sample output:
Starting NodeTool
Keyspace: realtimetrader
Read Count: 0
Read Latency: NaN ms.
Write Count: 402
Write Latency: 0.09648756218905473 ms.
Pending Flushes: 0
Table: currencies
SSTable count: 1
Space used (live): 5254
Space used (total): 5254
Space used by snapshots (total): 0
Off heap memory used (total): 40
SSTable Compression Ratio: 0.0
Number of keys (estimate): 14
Memtable cell count: 1608
Memtable data size: 567
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 402
Local write latency: 0.106 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0,00000
Bloom filter space used: 24
Bloom filter off heap memory used: 16
Index summary off heap memory used: 16
Compression metadata off heap memory used: 8
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 149
Compacted partition mean bytes: 149
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0
I run the command:
nodetool cfstats
to get this output, and then subtract an earlier "Local read count" value from a later one.
But I'm not sure what "Local" means here.
Does it mean it is local to that node, so in a ring of 5 nodes I should multiply the value by 5? Or will the simple subtraction give me the correct result?
Also, which JMX bean should I be looking at to get these numbers?
Have a look at this nodetool cfstats.
I think what you are looking for is 'Read Latency' and 'Write Latency'.
These fields indicate how fast your reads/writes are in your cluster.
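A rough way to estimate reads/s and writes/s (a sketch, not a definitive method; the keyspace/table name is the one from the question) is to sample the counters twice and divide the delta by the interval. The same counters are also exposed over JMX under org.apache.cassandra.metrics (for example the Count attribute of the per-table ReadLatency metric).
nodetool cfstats realtimetrader.currencies | grep -E "Local (read|write) count"
# wait a fixed interval, e.g. 60 seconds, then run it again
# reads/s  = (second read count  - first read count)  / 60
# writes/s = (second write count - first write count) / 60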

Cassandra Cluster with 2 Nodes got Read TimeOut/NoHostAvailable Exception

I am implementing a recommendation engine in .NET C# and using Cassandra to store the data. I am still new to C*; I just started using it 2 months ago. At the moment I have only 2 nodes in my cluster (single DC), deployed on Azure DS2 VMs (each with 7 GB RAM and 2 cores). I set RF=2 and CL=1 for both reads and writes. I set the timeouts in the yaml config file as below:
read_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 120000
counter_write_request_timeout_in_ms: 120000
request_timeout_in_ms: 120000
I set a lower read query timeout on the client side (30 seconds).
The data stored in Cassandra is user history, item counters, and recommended-items data. I created an API (hosted in an Equinix DC) for my recommendation engine. Its work is very simple: it only reads all recommended item IDs from the recommended_items table in C* every time a user opens the website page. That means the query for each user is very simple:
select * from recommended_items where username = <username>
When I did load testing with up to 500 users/threads, it was fine and very fast. But when the live site calls the API to read from the C* table, I get read timeouts very often, even though there are usually fewer than 20 concurrent users.
I monitor the Cassandra nodes' activity using Datadog, and I found that only node #2 keeps getting timeouts (the seed node is node #1, though my understanding is that the seed doesn't really matter except during the bootstrapping step). However, every time a timeout happens I try to query using cqlsh on both nodes, and node #1 is the one that returns an
OperationTimeOut Exception.
I have been trying to find the root cause of this issue. Does it have anything to do with the coordinator node being down (I read this article)? Or is it because I have only 2 nodes?
When the timeout happens (the webpage shows nothing) and I refresh the page that calls the API, it loads for a long time before showing nothing again (because of the timeout). But surprisingly, after a few minutes I get log entries showing that all those requests actually succeeded, even though the web page had been closed. It's as if the read requests were still running after the page was closed.
The exceptions look like these (they did not happen together):
None of the hosts tried for query are available (tried: 13.73.193.140:9042,13.75.154.140:9042)
OR
Cassandra timeout during read query at consistency LocalOne (0 replica(s) responded over 1 required)
Does anyone have any suggestions about my problem? Thank you.
output of cfstats .recommended_items
NODE #1
Read Count: 683
Read Latency: 2.970781844802343 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: recommendedvideos
Space used (live): 96034775
Space used (total): 96034775
Space used by snapshots (total): 40345163
Off heap memory used (total): 192269
SSTable Compression Ratio: 0.4405242717559795
Number of keys (estimate): 101493
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 376
Local read latency: 1.647 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 126928
Bloom filter off heap memory used: 126896
Index summary off heap memory used: 40085
Compression metadata off heap memory used: 25288
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 454826
Compacted partition mean bytes: 2201
Average live cells per slice (last five minutes): 160.28657799274487
Maximum live cells per slice (last five minutes): 2759
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0
NODE #2
Read Count: 733
Read Latency: 3.0032783083219647 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: recommendedvideos
Space used (live): 99145806
Space used (total): 99145806
Space used by snapshots (total): 15101127
Off heap memory used (total): 196008
SSTable Compression Ratio: 0.44063804831658704
Number of keys (estimate): 103863
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 453
Local read latency: 1.344 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 129056
Bloom filter off heap memory used: 129040
Index summary off heap memory used: 40856
Compression metadata off heap memory used: 26112
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 454826
Compacted partition mean bytes: 2264
Average live cells per slice (last five minutes): 170.7715877437326
Maximum live cells per slice (last five minutes): 2759
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0

Why is space usage 0 although I have already inserted >40k rows

Currently, I have a 3-node Cassandra cluster.
I created a table named events.
After inserting >40k rows, I ran the following command on each node:
nodetool -h localhost cfstats
This is the output from one of the nodes:
Table: events
SSTable count: 0
Space used (live): 0
Space used (total): 0
Space used by snapshots (total): 43516
Off heap memory used (total): 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 1
Memtable cell count: 102675
Memtable data size: 4224801
Memtable off heap memory used: 0
Memtable switch count: 1
Local read count: 0
Local read latency: NaN ms
Local write count: 4223
Local write latency: 0.085 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 0
Bloom filter off heap memory used: 0
Index summary off heap memory used: 0
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
To my surprise, Space used (live) and Space used (total) are 0. The other nodes also show 0 for Space used (live) and Space used (total).
However, when I perform a SELECT, I can retrieve the rows that were inserted previously.
Why are Space used (live) and Space used (total) 0 on all nodes?
Your Memtables have not yet flushed to disk. Flush is generally triggered by a few things:
The memtable reaching the max threshold size
A commit log segment responsible for data in that memtable expiring
User calling nodetool flush
If you insert 40k rows and then do nothing, as long as they fit comfortably in memory, they will stay in memory. You will see no permanent disk usage for those rows since there is no on-disk SSTable holding their values.
The persistence of those rows is guaranteed by the commit log, which stores mutations on disk in the order in which they occurred and can be replayed in case of node failure. The commit log is a rolling log, so when a commit-log segment is about to expire, Cassandra flushes the memtables holding data from that segment to on-disk SSTables.
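As a quick check (the keyspace name my_keyspace is hypothetical; events is the table from the question), you can force a flush and confirm that the space figures become non-zero:
nodetool flush my_keyspace events      # writes the memtable out as an SSTable
nodetool cfstats my_keyspace.events    # Space used (live/total) should now be greater than 0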
