Actual Storage Load of a Cassandra node

Is there any way to get the actual data size stored on a node? As far as I know, nodetool only outputs the compressed data size.
Thanks!

Have you tried using nodetool cfstats <keyspace>? That should break it down on a per-column-family basis (for each column family in the specified keyspace), and it should give you more detail on space usage.
aploetz@ubuntu:/var/lib/cassandra/data$ nodetool cfstats products
Keyspace: products
Read Count: 3515
Read Latency: 0.4077462304409673 ms.
Write Count: 5434
Write Latency: 0.04547313213102686 ms.
Pending Tasks: 0
Table: itemmaster
SSTable count: 3
Space used (live), bytes: 1156013
Space used (total), bytes: 1266953
SSTable Compression Ratio: 0.2963641232834859
...
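Note that cfstats also reports the SSTable Compression Ratio (compressed size divided by uncompressed size), so you can make a back-of-the-envelope estimate of the raw, uncompressed data size from the output above:
# uncompressed ≈ "Space used (live)" / "SSTable Compression Ratio"
echo "1156013 / 0.2963641232834859" | bc -l
# => ~3900650, i.e. roughly 3.7 MiB of raw data behind ~1.1 MiB on disk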

Related

Repair status not 100% after repair

I have noticed that some tables show less than 100% "Percent repaired" in the nodetool tablestats output. I have manually executed repairs on all nodes (3-node cluster, RF=3) but the value doesn't seem to change.
Example output:
Table: users
SSTable count: 3
Space used (live): 66636
Space used (total): 66636
Space used by snapshots (total): 0
Off heap memory used (total): 688
SSTable Compression Ratio: 0.5731829674519404
Number of partitions (estimate): 162
Memtable cell count: 11
Memtable data size: 483
Memtable off heap memory used: 0
Memtable switch count: 27
Local read count: 120833
Local read latency: NaN ms
Local write count: 12094
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 91.54
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 568
Bloom filter off heap memory used: 544
Index summary off heap memory used: 112
Compression metadata off heap memory used: 32
Compacted partition minimum bytes: 30
Compacted partition maximum bytes: 1916
Compacted partition mean bytes: 420
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
Repair was done with nodetool repair -pr.
What is going on?
Percent repaired seems to be a misleading metric, as it refers to the percentage of SSTables repaired, but there are some conditions for it to be computed:
- the table should not be in a system keyspace
- the table should have a replication factor greater than 1
- the repair should be incremental or full (non-subrange)
When you use nodetool repair -pr, that invokes a full repair that won't be able to update this value.
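So, for the metric to move, an incremental repair has to run instead; in versions where incremental repair is the default (2.2+), a plain invocation is enough (a sketch, with the keyspace name as a placeholder):
nodetool repair <keyspace>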
For more information regarding incremental repairs, I would recommend this article from The Last Pickle. Since they took over maintenance of the Reaper tool, they have become an authority on repairs.
Executing nodetool repair -pr will repair the primary range owned by the node that command is executed on.
What does this mean? The node this command is executed on has data that it "owns", i.e., its primary range, but the node also contains data/replicas "owned" by other nodes. You are not repairing the replicas "owned" by other nodes.
Now, if you execute that command on every single node in the cluster (not just every node in one data center), it will cover all the token ranges, as in the sketch below.
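A minimal sketch of that full-cluster pass (the hostnames are placeholders; list every node in the cluster):
for host in node1 node2 node3; do
  nodetool -h "$host" repair -pr
done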
EDIT / NOTE:
My answer did not properly address the question. Although what I wrote is accurate, the answer to the question is stated in the answer above mine: basically, the repaired percentage is a value used by incremental repair and is not affected by a full repair. (Incremental repair marks ranges as repaired as it works, so it does not spend time re-repairing them later.)

Does nodetool for cassandra only gather data for a single node or for the entire cluster?

I have a 19-node Cassandra cluster for our internal service. If I log into a node and run nodetool commands like tablestats, does that gather stats just for that particular node or for the entire cluster?
The nodetool utility for Cassandra gathers data for the entire cluster, not just a single node.
For example, if you run a command like this:
command:
nodetool tablestats musicdb.artist
result:
Keyspace: musicdb
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: artist
SSTable count: 1
Space used (live): 62073
Space used (total): 62073
Space used by snapshots (total): 0
Off heap memory used (total): 1400
SSTable Compression Ratio: 0.27975344141453456
Number of keys (estimate): 1000
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 1264
Bloom filter off heap memory used: 1256
Index summary off heap memory used: 128
Compression metadata off heap memory used: 16
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 149
Compacted partition mean bytes: 149
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0
The status of the table artist (which belongs to the keyspace musicdb) shown above is for the entire cluster.
Most nodetool commands operate on a single node in the cluster if -h is not used to identify one or more other nodes. If the node from which you issue the command is the intended target, you do not need the -h option to identify the target; otherwise, for remote invocation, identify the target node, or nodes, using -h.
Nodetool Utility
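For example, the same table stats can be pulled from a remote node via -h (a sketch; the hostname is a placeholder and 7199 is the default JMX port):
nodetool -h 192.168.1.86 -p 7199 tablestats musicdb.artist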

Cassandra Cluster with 2 Nodes got Read TimeOut/NoHostAvailable Exception

I am implementing a recommendation engine in .NET C# and using Cassandra to store the data. I am still new to C*; I started using it 2 months ago. At the moment I have only 2 nodes in my cluster (single DC), deployed on Azure DS2 VMs (each with 7 GB RAM, 2 cores). I set RF=2 and CL=1 for both reads and writes. I set the timeouts in the yaml config file as below:
read_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 120000
counter_write_request_timeout_in_ms: 120000
request_timeout_in_ms: 120000
I set a lower read query timeout (30 seconds) on the client side.
The data stored in Cassandra is user history, item counters, and recommended-items data. I created an API (hosted in an Equinix DC) for my recommendation engine. Its job is very simple: it only reads all recommended item IDs from the recommended_items table in C* every time a user opens the website page. That means the query for each user is very simple:
select * from recommended_items where username = <username>
When I did load testing with up to 500 users/threads, it was fine and very fast. But when the live site calls the API to read from the C* table, I get read timeouts very often, even though there are usually fewer than 20 concurrent users.
I monitor the Cassandra nodes' activity using Datadog, and I found that only node #2 keeps getting timeouts (the seed node is node #1, though as I understand it the seed doesn't really matter except during bootstrapping). However, every time the timeout happens I try to query using cqlsh on both nodes, and node #1 is the one that returns an OperationTimedOut exception.
I have been trying to find the root cause of this issue. Does it have anything to do with the coordinator node being down (I read this article)? Or is it because I have only 2 nodes?
When the timeout happens (the web page shows nothing) and I refresh the page that calls the API, it loads for a long time before showing nothing again (because of the timeout). But surprisingly, a few minutes later I get logs saying that all those requests actually succeeded, even though the web page had been closed. It's as if the read requests were still running after the page was closed.
The exceptions look like these (they didn't happen together):
None of the hosts tried for query are available (tried: 13.73.193.140:9042,13.75.154.140:9042)
OR
Cassandra timeout during read query at consistency LocalOne (0 replica(s) responded over 1 required)
Does anyone have any suggestions about my problem? Thank you.
Output of cfstats for .recommended_items:
NODE #1
Read Count: 683
Read Latency: 2.970781844802343 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: recommendedvideos
Space used (live): 96034775
Space used (total): 96034775
Space used by snapshots (total): 40345163
Off heap memory used (total): 192269
SSTable Compression Ratio: 0.4405242717559795
Number of keys (estimate): 101493
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 376
Local read latency: 1.647 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 126928
Bloom filter off heap memory used: 126896
Index summary off heap memory used: 40085
Compression metadata off heap memory used: 25288
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 454826
Compacted partition mean bytes: 2201
Average live cells per slice (last five minutes): 160.28657799274487
Maximum live cells per slice (last five minutes): 2759
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0
NODE #2
Read Count: 733
Read Latency: 3.0032783083219647 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: recommendedvideos
Space used (live): 99145806
Space used (total): 99145806
Space used by snapshots (total): 15101127
Off heap memory used (total): 196008
SSTable Compression Ratio: 0.44063804831658704
Number of keys (estimate): 103863
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 453
Local read latency: 1.344 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 129056
Bloom filter off heap memory used: 129040
Index summary off heap memory used: 40856
Compression metadata off heap memory used: 26112
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 454826
Compacted partition mean bytes: 2264
Average live cells per slice (last five minutes): 170.7715877437326
Maximum live cells per slice (last five minutes): 2759
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0

What do the different fields of nodetool cfstats mean?

When I run the command below:
nodetool -h localhost -p 7199 cfstats demodb
I get the following results. I can't draw any conclusions from them, and I can't decide whether my two-node Cassandra cluster is performing well or needs to be tuned.
Keyspace: demodb
Read Count: 81361
Read Latency: 0.04145315323066334 ms.
Write Count: 23114
Write Latency: 0.06758518646707623 ms.
Pending Tasks: 0
Table: schema1
SSTable count: 0
Space used (live), bytes: 0
Space used (total), bytes: 3560
SSTable Compression Ratio: 0.0
Number of keys (estimate): 0
Memtable cell count: 5686
Memtable data size, bytes: 3707713
Memtable switch count: 5
Local read count: 81361
Local read latency: 0.000 ms
Local write count: 23114
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
As far as I can see, the data in the table schema1 is still fully in memory ("SSTable count: 0"). At this point there is nothing to optimize. The statistics will be more helpful once you have more data and your in-memory state has been flushed to disk. It is too early to optimize anything.
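If you want to see on-disk numbers sooner, you can force a flush of the memtable to an SSTable and rerun cfstats; a minimal sketch, using the keyspace and table from the question:
# flush the memtable for demodb.schema1 to disk, then look at the stats again
nodetool -h localhost -p 7199 flush demodb schema1
nodetool -h localhost -p 7199 cfstats demodb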

Determining how full a Cassandra cluster is

I just imported a lot of data in a 9 node Cassandra cluster and before I create a new ColumnFamily with even more data, I'd like to be able to determine how full my cluster currently is (in terms of memory usage). I'm not too sure what I need to look at. I don't want to import another 20-30GB of data and realize I should have added 5-6 more nodes.
In short, I have no idea if I have too few/many nodes right now for what's in the cluster.
Any help would be greatly appreciated :)
$ nodetool -h 192.168.1.87 ring
Address DC Rack Status State Load Owns Token
151236607520417094872610936636341427313
192.168.1.87 datacenter1 rack1 Up Normal 7.19 GB 11.11% 0
192.168.1.86 datacenter1 rack1 Up Normal 7.18 GB 11.11% 18904575940052136859076367079542678414
192.168.1.88 datacenter1 rack1 Up Normal 7.23 GB 11.11% 37809151880104273718152734159085356828
192.168.1.84 datacenter1 rack1 Up Normal 4.2 GB 11.11% 56713727820156410577229101238628035242
192.168.1.85 datacenter1 rack1 Up Normal 4.25 GB 11.11% 75618303760208547436305468318170713656
192.168.1.82 datacenter1 rack1 Up Normal 4.1 GB 11.11% 94522879700260684295381835397713392071
192.168.1.89 datacenter1 rack1 Up Normal 4.83 GB 11.11% 113427455640312821154458202477256070485
192.168.1.51 datacenter1 rack1 Up Normal 2.24 GB 11.11% 132332031580364958013534569556798748899
192.168.1.25 datacenter1 rack1 Up Normal 3.06 GB 11.11% 151236607520417094872610936636341427313
-
# nodetool -h 192.168.1.87 cfstats
Keyspace: stats
Read Count: 232
Read Latency: 39.191931034482764 ms.
Write Count: 160678758
Write Latency: 0.0492021849459404 ms.
Pending Tasks: 0
Column Family: DailyStats
SSTable count: 5267
Space used (live): 7710048931
Space used (total): 7710048931
Number of Keys (estimate): 10701952
Memtable Columns Count: 4401
Memtable Data Size: 23384563
Memtable Switch Count: 14368
Read Count: 232
Read Latency: 29.047 ms.
Write Count: 160678813
Write Latency: 0.053 ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 115533264
Key cache capacity: 200000
Key cache size: 1894
Key cache hit rate: 0.627906976744186
Row cache: disabled
Compacted row minimum size: 216
Compacted row maximum size: 42510
Compacted row mean size: 3453
-
[default#stats] describe;
Keyspace: stats:
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Durable Writes: true
Options: [replication_factor:3]
Column Families:
ColumnFamily: DailyStats (Super)
Key Validation Class: org.apache.cassandra.db.marshal.BytesType
Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type/org.apache.cassandra.db.marshal.UTF8Type
Row cache size / save period in seconds / keys to save : 0.0/0/all
Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
Key cache size / save period in seconds: 200000.0/14400
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Replicate on write: true
Built indexes: []
Column Metadata:
(removed)
Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
Obviously, there are two types of memory -- disk and RAM. I'm going to assume you're talking about disk space.
First, you should find out how much space you're currently using per node. Check the on-disk usage of the Cassandra data dir (by default /var/lib/cassandra/data) with this command:
du -ch /var/lib/cassandra/data
You should then compare that to the size of your disk, which can be found with:
df -h
Only consider the df entry for the disk your Cassandra data is on, by checking the Mounted on column.
Using those stats, you should be able to calculate how full (in percent) the Cassandra data partition is. Generally you don't want to get too close to 100%, because Cassandra's normal compaction processes temporarily use more disk space. If you don't have enough headroom, a node can get caught with a full disk, which can be painful to resolve (as a side note, I occasionally keep a "ballast" file of a few gigs that I can delete just in case I need to free some extra space; see the sketch below). I've generally found that not exceeding about 70% disk usage is on the safe side for the 0.8 series.
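A sketch of that ballast trick (the path and size are arbitrary): preallocate a file you can delete instantly to give compaction room to finish:
# reserve ~4GB on the data disk that can be freed in an emergency
fallocate -l 4G /var/lib/cassandra/ballast
# (or: dd if=/dev/zero of=/var/lib/cassandra/ballast bs=1M count=4096)
# when the disk fills up: rm /var/lib/cassandra/ballast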
If you're using a newer version of Cassandra, then I'd recommend giving the Leveled Compaction strategy a shot to reduce temporary disk usage. Instead of potentially using twice as much disk space, the new strategy will at most use 10x a small, fixed SSTable size (5MB by default).
You can read more about how compaction temporarily increases disk usage on this excellent blog post from Datastax: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra It also explains the compaction strategies.
So, to do a little capacity planning, you can figure out how much more space you'll need. With a replication factor of 3 (what you're using above), adding 20-30GB of raw data would add 60-90GB after replication. Split between your 9 nodes, that's roughly 7-10GB more per node (see the arithmetic below). Does adding that kind of disk usage per node push you too close to having full disks? If so, you might want to consider adding more nodes to the cluster.
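The arithmetic, for reference:
# extra disk per node = raw data x replication factor / node count
echo $(( 20 * 3 / 9 )) $(( 30 * 3 / 9 ))   # => 6 10 (GB per node, low/high end)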
One other note is that your nodes' loads aren't very even -- from 2GB up to 7GB. If you're using the ByteOrderedPartitioner instead of the random one, that can cause uneven load and "hotspots" in your ring; you should consider using the random partitioner if possible. The other possibility is that you have extra data hanging around that needs to be taken care of (hinted handoffs and snapshots come to mind). Consider cleaning that up by running nodetool repair and nodetool cleanup on each node, one at a time (be sure to read up on what those do first!), as in the sketch below.
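A sketch of that rolling maintenance, one node at a time (the first three hosts from the ring output are shown; extend the list to all nine):
for host in 192.168.1.87 192.168.1.86 192.168.1.88; do
  nodetool -h "$host" repair
  nodetool -h "$host" cleanup
done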
Hope that helps.
