Why does nodetool status *keyspace* still show hundreds of MBs of data after TRUNCATE?

I ran TRUNCATE on my table from cqlsh on node .20.
Twenty minutes have passed since I issued the command, and the output of nodetool status *myKeyspace* still shows a lot of data on 4 out of 6 nodes.
I am using Cassandra 3.0.8.
192.168.178.20:/usr/share/cassandra$ nodetool status *myKeyspace*
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.178.24 324,57 MB 256 32,7% 4d852aea-65c7-42e1-b2bd-f38a320ec827 rack1
UN 192.168.178.28 650,86 KB 256 35,7% 82b67dc5-9f4f-47e9-81d7-a93f28a3e9da rack1
UN 192.168.178.30 155,68 MB 256 31,9% 28cf5138-7b61-42ca-8b0c-e4be1b5418ba rack1
UN 192.168.178.32 321,62 MB 256 33,3% 64e106ed-770f-4654-936d-db5b80aa37dc rack1
UN 192.168.178.36 640,91 KB 256 33,0% 76152b07-caa6-4214-8239-e8a51bbc4b62 rack1
UN 192.168.178.20 103,07 MB 256 33,3% 539a6333-c4ef-487a-b1e4-aac40949af4c rack1
The following command was run on node .24. It looks like there are still snapshots/backups being saved somewhere? But the amount of data, 658 MB for node .24, does not match the 324 MB reported by nodetool status. What's going on there?
192.168.178.24:/usr/share/cassandra$ nodetool cfstats *myKeyspace*
Keyspace: *myKeyspace*
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: data
SSTable count: 0
Space used (live): 0
Space used (total): 0
Space used by snapshots (total): 658570012
Off heap memory used (total): 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 0
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0,00000
Bloom filter space used: 0
Bloom filter off heap memory used: 0
Index summary off heap memory used: 0
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 3.790273556231003
Maximum live cells per slice (last five minutes): 103
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Note that the keyspace contains no tables other than the one I truncated. There might still be some index data from cassandra-lucene-index, though, if it is not cleared by TRUNCATE.

The keyspace argument to nodetool status only determines which replication factor and datacenters are used when computing ownership. The Load column covers all the SSTables, not just those of the one keyspace, in the same way that the IP address, host ID, and number of tokens are unaffected by the keyspace argument. status is more of a global check.
Space used by snapshots is expected to still hold the old data. When you truncate a table, Cassandra snapshots the data first (this can be disabled by setting auto_snapshot to false in cassandra.yaml). To clear all the snapshots you can use nodetool clearsnapshot <keyspace>.
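To see how much of a table's footprint is snapshot data rather than live data, the relevant cfstats fields can be extracted and converted to readable units. The following is a minimal Python sketch, not part of the original answer: the helper names `parse_cfstats` and `to_mib` are hypothetical, and the sample text is copied from the question's output for node .24.

```python
import re

# Excerpt of the cfstats output shown in the question (node .24).
SAMPLE = """\
Table: data
SSTable count: 0
Space used (live): 0
Space used (total): 0
Space used by snapshots (total): 658570012
"""

def parse_cfstats(text):
    """Return {field name: int} for every line of the form 'name: <integer>'."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)

stats = parse_cfstats(SAMPLE)
snapshot_bytes = stats["Space used by snapshots (total)"]
live_bytes = stats["Space used (live)"]
print(f"live: {to_mib(live_bytes):.1f} MiB, snapshots: {to_mib(snapshot_bytes):.1f} MiB")
```

In this sample all of the table's remaining disk usage is snapshot data, which is consistent with the auto-snapshot taken by TRUNCATE.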

Related

Does nodetool for cassandra only gather data for a single node or for the entire cluster?

I have a 19-node Cassandra cluster for our internal service. If I log into a node using nodetool and run commands like tablestats, etc, does that gather stats just for that particular node or for the entire cluster?
The nodetool utility for Cassandra gathers data for the entire cluster, not a single node.
For example, if you run a command like this:
command:
nodetool tablestats musicdb.artist
result:
Keyspace: musicdb
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: artist
SSTable count: 1
Space used (live): 62073
Space used (total): 62073
Space used by snapshots (total): 0
Off heap memory used (total): 1400
SSTable Compression Ratio: 0.27975344141453456
Number of keys (estimate): 1000
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 1264
Bloom filter off heap memory used: 1256
Index summary off heap memory used: 128
Compression metadata off heap memory used: 16
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 149
Compacted partition mean bytes: 149
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0
The status above for the table artist, which belongs to keyspace musicdb, is from the entire cluster.
Most nodetool commands operate on a single node in the cluster if -h
is not used to identify one or more other nodes. If the node from
which you issue the command is the intended target, you do not need
the -h option to identify the target; otherwise, for remote
invocation, identify the target node, or nodes, using -h.
Nodetool Utility
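Going by the documentation excerpt above, a command such as tablestats reports on the one node it connects to, so a cluster-wide figure has to be assembled from each node's output. The following Python sketch illustrates that idea under stated assumptions: the host names and the values for node2 and node3 are made up for illustration, the helper `field` is hypothetical, and the outputs are assumed to have been collected beforehand with something like `nodetool -h <host> tablestats musicdb.artist`.

```python
import re

# Hypothetical per-node excerpts of tablestats output.
# Only the node1 value comes from the example above; the rest are invented.
PER_NODE_OUTPUT = {
    "node1": "Space used (live): 62073\nNumber of keys (estimate): 1000\n",
    "node2": "Space used (live): 58110\nNumber of keys (estimate): 950\n",
    "node3": "Space used (live): 60550\nNumber of keys (estimate): 980\n",
}

def field(text, name):
    """Return the integer value of the line 'name: <integer>' in one node's output."""
    m = re.search(rf"^\s*{re.escape(name)}:\s*(\d+)\s*$", text, re.MULTILINE)
    if m is None:
        raise KeyError(name)
    return int(m.group(1))

# Sum the per-node values to get a cluster-wide total.
total_live = sum(field(out, "Space used (live)") for out in PER_NODE_OUTPUT.values())
print(f"cluster-wide live space: {total_live} bytes")
```

With a replication factor above 1, such a sum counts every replica, so it reflects total disk usage rather than the size of one copy of the data.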

Cassandra: read/s write/s

I'm trying to figure out the throughput of my Cassandra cluster, and can't figure out how to use nodetool to accomplish that. Below is a sample output:
Starting NodeTool
Keyspace: realtimetrader
Read Count: 0
Read Latency: NaN ms.
Write Count: 402
Write Latency: 0.09648756218905473 ms.
Pending Flushes: 0
Table: currencies
SSTable count: 1
Space used (live): 5254
Space used (total): 5254
Space used by snapshots (total): 0
Off heap memory used (total): 40
SSTable Compression Ratio: 0.0
Number of keys (estimate): 14
Memtable cell count: 1608
Memtable data size: 567
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 402
Local write latency: 0.106 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0,00000
Bloom filter space used: 24
Bloom filter off heap memory used: 16
Index summary off heap memory used: 16
Compression metadata off heap memory used: 8
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 149
Compacted partition mean bytes: 149
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0
I run the command:
nodetool cfstats
to get this, and then subtract the earlier "Local read count:" value from the later one.
But I'm not sure what "Local" means here.
Does it mean it's local to that node, so that in a ring of 5 nodes I should multiply the value by 5? Or will the simple subtraction give me the correct result?
Also, which JMX bean should I be looking at to get these numbers?
Have a look at the nodetool cfstats documentation.
I think what you are looking for is 'Read Latency' and 'Write Latency'.
These fields indicate how fast reads and writes are in your cluster.
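On the throughput part of the question: the read and write counts are cumulative counters, so a rate comes from two samples taken a known interval apart. The sketch below assumes that "Local" counts are per node (operations served by the node nodetool is connected to), so a cluster figure is the sum of the per-node rates rather than one node's value multiplied by the node count. The function name, node names, and sample values are hypothetical.

```python
def ops_per_second(count_before, count_after, interval_seconds):
    """Rate from two samples of a cumulative counter such as 'Local write count'."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if count_after < count_before:
        # Counters restart from zero when a node restarts; such a sample pair is unusable.
        raise ValueError("counter went backwards (node restart?)")
    return (count_after - count_before) / interval_seconds

# Hypothetical 'Local write count' samples taken 60 seconds apart on each node.
samples = {
    "node1": (402, 1002),
    "node2": (380, 920),
}

per_node = {node: ops_per_second(before, after, 60)
            for node, (before, after) in samples.items()}
cluster_rate = sum(per_node.values())
print(per_node, cluster_rate)
```

Because each write is applied on every replica, the summed local write rate counts replica-level operations; with a replication factor above 1 it will be higher than the rate of client requests.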

Cassandra Cluster with 2 Nodes got Read TimeOut/NoHostAvailable Exception

I am implementing a recommendation engine in .NET C#, and I am using Cassandra to store the data. I am still new to C*; I started using it 2 months ago. At the moment I have only 2 nodes in my cluster (single DC), deployed on Azure DS2 VMs (each has 7 GB RAM, 2 cores). I set RF=2 and CL=1 for both reads and writes. I set the timeouts in the yaml config file as below:
read_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 120000
counter_write_request_timeout_in_ms: 120000
request_timeout_in_ms: 120000
I set a lower read query timeout on the client side (30 seconds each).
The data stored in Cassandra is user history, item counters, and recommended-items data. I created an API (hosted in an Equinix DC) for my recommendation engine. Its work is very simple: it only reads all recommended item IDs from the recommended_items table in C* every time a user opens the website page. That means the query for each user is very simple:
select * from recommended_items where username = <username>
When I did load testing with up to 500 users/threads, it was fine and very fast. But when the online site calls the API to read from the C* table, I get read timeouts very often, even though there are usually fewer than 20 users at the same time.
I monitor the Cassandra nodes' activity using DataDog, and I found that only node #2 keeps getting timeouts (the seed node is node #1, though as I understand it the seed doesn't really matter except during bootstrapping). However, every time the timeout happens, I try to query using cqlsh on both nodes, and node #1 is the one that returns an OperationTimeOut exception.
I have been trying to find the root cause of this issue. Does it have anything to do with the coordinator node being down (I read this article)? Or is it because I have only 2 nodes?
When the timeout happens (the web page shows nothing) and I refresh the page that calls the API, it loads for a long time before showing nothing again (because of the timeout). Surprisingly, after a few minutes I get log entries showing that all those requests were actually successful, even though the web page has been closed. It's as if the read request was still running after the page was closed.
The exceptions look like these (they didn't happen together):
None of the hosts tried for query are available (tried: 13.73.193.140:9042,13.75.154.140:9042)
OR
Cassandra timeout during read query at consistency LocalOne (0 replica(s) responded over 1 required)
Does anyone have any suggestions for my problem? Thank you.
Output of cfstats .recommended_items:
NODE #1
Read Count: 683
Read Latency: 2.970781844802343 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: recommendedvideos
Space used (live): 96034775
Space used (total): 96034775
Space used by snapshots (total): 40345163
Off heap memory used (total): 192269
SSTable Compression Ratio: 0.4405242717559795
Number of keys (estimate): 101493
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 376
Local read latency: 1.647 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 126928
Bloom filter off heap memory used: 126896
Index summary off heap memory used: 40085
Compression metadata off heap memory used: 25288
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 454826
Compacted partition mean bytes: 2201
Average live cells per slice (last five minutes): 160.28657799274487
Maximum live cells per slice (last five minutes): 2759
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0
NODE #2
Read Count: 733
Read Latency: 3.0032783083219647 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Flushes: 0
Table: recommendedvideos
Space used (live): 99145806
Space used (total): 99145806
Space used by snapshots (total): 15101127
Off heap memory used (total): 196008
SSTable Compression Ratio: 0.44063804831658704
Number of keys (estimate): 103863
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 453
Local read latency: 1.344 ms
Local write count: 0
Local write latency: NaN ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 129056
Bloom filter off heap memory used: 129040
Index summary off heap memory used: 40856
Compression metadata off heap memory used: 26112
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 454826
Compacted partition mean bytes: 2264
Average live cells per slice (last five minutes): 170.7715877437326
Maximum live cells per slice (last five minutes): 2759
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0

Why space usage is 0 although I had already inserted >40k rows

Currently, I have 3 Cassandra nodes.
I created a table named events.
After inserting >40k rows, I ran the following command on each node.
nodetool -h localhost cfstats
This is the output from one of the nodes:
Table: events
SSTable count: 0
Space used (live): 0
Space used (total): 0
Space used by snapshots (total): 43516
Off heap memory used (total): 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 1
Memtable cell count: 102675
Memtable data size: 4224801
Memtable off heap memory used: 0
Memtable switch count: 1
Local read count: 0
Local read latency: NaN ms
Local write count: 4223
Local write latency: 0.085 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 0
Bloom filter off heap memory used: 0
Index summary off heap memory used: 0
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
To my surprise, Space used (live) and Space used (total) are 0. The other nodes also report 0 for Space used (live) and Space used (total).
However, when I perform a SELECT, I get back multiple rows that were inserted previously.
Why are Space used (live) and Space used (total) 0 on all nodes?
Your Memtables have not yet flushed to disk. Flush is generally triggered by a few things:
The memtable reaching the max threshold size
A commit log segment responsible for data in that memtable expiring
User calling nodetool flush
If you insert 40k rows and then do nothing, as long as they fit comfortably in memory, they will stay in memory. You will see no permanent disk usage for those rows since there is no on-disk sstable holding their values.
The persistence of those rows is guaranteed by the commit log, which stores mutations on disk in the order in which they occurred and can be replayed in case of node failure. The commit log is a rolling log, so when a commit log segment is about to expire, Cassandra flushes the memtable holding the data in that segment to an on-disk SSTable.
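The state described above can be recognised directly from the cfstats fields: no SSTables on disk while the memtable still holds cells. Here is a small Python sketch of that check, using the values from the question; the function name `data_only_in_memtable` is hypothetical.

```python
def data_only_in_memtable(stats):
    """True when a table has rows in memory but nothing flushed to SSTables yet."""
    return stats["SSTable count"] == 0 and stats["Memtable cell count"] > 0

# Values copied from the cfstats output for the events table in the question.
events = {
    "SSTable count": 0,
    "Space used (live)": 0,
    "Memtable cell count": 102675,
    "Memtable data size": 4224801,
}

print(data_only_in_memtable(events))
```

Running nodetool flush on the node writes the memtable out to an SSTable, after which Space used (live) should no longer be 0.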

What do the different fields of nodetool cfstats mean?

When I run the command below:
nodetool -h localhost -p 7199 cfstats demodb
I get the following results. I can't draw any conclusion from them; I cannot decide whether my two-node Cassandra cluster is performing well or needs to be tuned.
Keyspace: demodb
Read Count: 81361
Read Latency: 0.04145315323066334 ms.
Write Count: 23114
Write Latency: 0.06758518646707623 ms.
Pending Tasks: 0
Table: schema1
SSTable count: 0
Space used (live), bytes: 0
Space used (total), bytes: 3560
SSTable Compression Ratio: 0.0
Number of keys (estimate): 0
Memtable cell count: 5686
Memtable data size, bytes: 3707713
Memtable switch count: 5
Local read count: 81361
Local read latency: 0.000 ms
Local write count: 23114
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0
As far as I can see, the data in the table schema1 is still entirely in memory ("SSTable count: 0"). At this point there is nothing to optimize. The statistics will be more helpful once you have more data and the in-memory state has been flushed to disk. It is too early to optimize anything.
