Two Cassandra nodes with a replication factor of 2 yet different storage sizes

We have OpenNMS sending graph data to our Cassandra/Newts cluster, which consists of 2 Cassandra nodes. I've set the replication factor to 2 for the keyspace "newts".
I started the nodes at the same time and left them up for a while. I then ran "nodetool cfstats newts" on each node, and both nodes report the exact same write count.
However, if I go into the data directory "/db/newts" of each node and run "du -h", I see the following:
Node1 storage used: 36K
Node2 storage used: 12M
How can they differ in size if I set the replication factor to 2? I know they're connected to the same cluster, since "nodetool status" shows both nodes as "UN" (Up/Normal).
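For reference, the checks described above boil down to something like this on each node (a rough sketch; "/db/newts" is the data directory from this setup):
$ nodetool status                                  # both nodes should show UN
$ nodetool cfstats newts | grep -i "write count"   # per-table write counts
$ du -sh /db/newts                                 # on-disk size of the keyspace data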

In Cassandra, data is not written directly to disk; it moves through:
Commit log >> Memtable >> SSTables
The Cassandra documentation on the write path gives a good description of how data is written.
You can run:
nodetool flush
which will flush the memtables into SSTables on disk. After that, you should see the same SSTable size on both of your nodes.
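As a minimal sketch, assuming the keyspace and data directory from the question, you could flush just the newts keyspace and then compare again:
$ nodetool flush newts                             # force newts memtables to SSTables on disk
$ nodetool cfstats newts | grep -i "space used"    # size reported per table
$ du -sh /db/newts                                 # run on both nodes and compare
After the flush, both replicas should report comparable sizes, since with RF=2 each node owns a full copy of the data.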

Related

How to purge massive data from cassandra when node size is limited

I have a Cassandra cluster (Cassandra v3.11.11) with 3 data centers and replication factor 3. Each node has an 800GB NVMe drive, but one of the tables is taking up 600GB of data. This results in the following from nodetool status:
DataCenter 1:
node 1: 620GB (used) / 800GB (total)
DataCenter 2:
node 1: 610GB (used) / 800GB (total)
DataCenter 3:
node 1: 680GB (used) / 800GB (total)
I cannot install another disk in the server, and the server has no internet connection at all for security reasons. The only thing I can do is put some small files, such as scripts, onto it.
All Cassandra tables are set up with SizeTieredCompactionStrategy, so I end up with very big files: roughly one single 200GB SSTable (full table size 600GB). Since the cluster is running out of space, I need to delete data, which will introduce tombstones. But I have only 120GB of free space left for compaction or garbage collection. That means I can only retain at most 120GB of data, and to be safe for the Cassandra process, maybe even less.
I have already executed nodetool cleanup, so I am not sure if there is more room to free.
Is there any way to free 200GB of data from Cassandra, or can I only retain less than 120GB of data?
(If we remove 200GB of data, we have 400GB of data left; compaction/GC would then need 400GB + 680GB, which is bigger than the disk space.)
Thank you!
I personally would start by checking whether the whole space is occupied by actual data and not by snapshots - use nodetool listsnapshots to list them and nodetool clearsnapshot to remove them. If you took a snapshot for some reason, then after compaction the snapshots keep occupying space, because the original files they link to were removed.
The next step would be to try to clean up tombstones & deleted data from the small tables using nodetool garbagecollect, or nodetool compact with the -s option to split the table into files of different sizes. For the big table, I would try nodetool compact with the --user-defined option on the individual files (assuming there will be enough space for them). As soon as you free more than 200GB, you can run sstablesplit (the node should be down!) to split the big SSTable into small files (~1-2GB), so that when the node starts again the data will be compacted.
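Roughly, that sequence could look like the sketch below (keyspace, table, and SSTable paths are placeholders for your actual names and files):
# 1. check for and drop old snapshots
$ nodetool listsnapshots
$ nodetool clearsnapshot -t <snapshot_name>
# 2. reclaim tombstoned/deleted data from the smaller tables
$ nodetool garbagecollect my_keyspace my_small_table
$ nodetool compact -s my_keyspace my_small_table      # -s splits the output into several SSTables
# 3. compact individual SSTables of the big table, one file at a time
$ nodetool compact --user-defined /path/to/data/my_keyspace/my_big_table-<id>/<sstable>-Data.db
# 4. with the node stopped, split the huge SSTable into ~1GB chunks
$ sstablesplit --size 1000 /path/to/data/my_keyspace/my_big_table-<id>/<sstable>-Data.db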

Why does a fully replicated Cassandra cluster have node data size differences?

I have a 3-node Cassandra cluster (version 3.11.11) with replication factor 3. Only 2 of the nodes receive requests, and Node3 only syncs with the other 2 nodes.
In theory, each node should hold the same amount of data, but in practice the nodes end up with different data sizes.
We run nodetool repair daily; operations like compaction are done automatically with default settings.
What can be the reason for the size difference?
It ultimately comes down to how the data gets compacted in the long run. Compaction is a local, per-node process, and how the SSTables stack up on each node cannot be guaranteed, so I don't see any aberration here. In theory all nodes hold the same data logically, but physically it may vary. For example, Node3 may have old SSTables that are not getting compacted because of their size (if using STCS), while on the other nodes they have already been compacted, reducing the size on those nodes.
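A quick way to see this in practice (keyspace and table names below are placeholders) is to compare the SSTable layout per node rather than just the total size:
$ nodetool tablestats my_keyspace.my_table | grep -E "SSTable count|Space used"
$ nodetool compactionhistory | head -20
Run on each node, the same logical data will often show a different SSTable count and compaction history, which accounts for the difference in bytes on disk.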

Regarding Cassandra Table Size

How do I calculate the total size of a keyspace in Cassandra?
I have tried the nodetool cfstats and nodetool tablestats commands. They give a lot of information, but I am not sure which field provides the exact size.
Can anybody suggest any method to find out the size of a keyspace and a table in Cassandra?
"nodetool tablestats" replaces the older command "nodetool cfstats". In other words both are the same. Output of this command lists the size of each of the tables within a keyspace.
Amongst the output, you are looking for "Space used (total)" value. Its the Total number of bytes of disk space used by SSTables belonging to this table, including obsolete SSTables waiting to be GCd.
Since there could be multiple tables within a keyspace, you need to sum up "Space used (total)" for all tables belonging to a keyspace to get size occupied by keyspace.
Another alternative if you have SSH access to the nodes, is to get to Cassandra Data directory and issue "du -h" to get the size of each keyspace directory. Again sum up the directory size on all nodes for that keyspace (ignoring the snapshot sizes).
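As a rough sketch (the keyspace name and the default data directory path are assumptions; substitute your own), the per-table sums can be scripted on each node:
$ nodetool tablestats my_keyspace | grep "Space used (total)" | awk '{sum += $4} END {print sum " bytes"}'
$ du -sh --exclude=snapshots /var/lib/cassandra/data/my_keyspace   # alternative via the data directory
Remember that this is the size on a single node; with replication, the keyspace's total footprint is the sum across all nodes.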

Add new data center to cassandra cluster

I have the following datacenter-aware configuration:
Primary Datacenter: 3 node cluster, RF=3
Data size is more than 100GB per node
I would like to add a new data center (Secondary Datacenter: 3-node cluster, RF=3).
I know how to do that.
But the problem is: how do I sync data from the primary to the secondary quickly?
I tried "nodetool repair" (with various options) and "nodetool rebuild", but it takes a long time, close to 10 hours.
I am using Cassandra version 2.1.15.
nodetool rebuild is usually the fastest way to sync new nodes.
To speed it up you could try a couple of things:
If you have a lot of network bandwidth between the data centers, try increasing the cassandra.yaml parameter inter_dc_stream_throughput_outbound_megabits_per_sec. It defaults to 200 Mbps, so you could try a higher value.
You could also use a smaller replication factor than 3 in the new data center: for example, start with 1 to get it up and running as quickly as possible, then later alter the keyspace to a higher value and use repair to create the extra replicas.
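A sketch of both ideas (keyspace and data center names are placeholders; adjust to your topology):
# cassandra.yaml on the nodes that stream the data (the existing DC); requires a restart
inter_dc_stream_throughput_outbound_megabits_per_sec: 400
# bring the new DC in with RF=1 first, then stream from the existing DC
$ cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 1};"
$ nodetool rebuild DC1                             # run on each node in the new DC
# once it is up, raise the RF and repair to build the extra replicas
$ cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
$ nodetool repair my_keyspace                      # run on the nodes in the new DC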

Major compaction in Cassandra

I have a 4-node Brisk cluster with 2 Cassandra nodes in the Cassandra DC and 2 Brisk nodes in the Brisk DC. I stress-tested this setup with 10 million writes using the stress tool that ships with Cassandra.
On executing:
$ ./nodetool -h x.x.x.x compactionstats
pending tasks: 17
compaction type   keyspace    column family   bytes compacted   bytes total   progress
Major             Keyspace1   Standard1       45172473          60278166      74.94%
AFAIK, a major compaction is triggered manually via nodetool, but I can see that one has been triggered automatically.
Is this expected behavior? If so, in what situations can this occur?
Regards,
Tamil
From the doc:
Compactions are triggered when at least N SStables have been flushed
to disk, where N is tunable and defaults to 4.
"Minor" compactions merge sstables of similar size; "major" compactions merge all sstables in a given ColumnFamily.
Again from the doc:
A major compaction is triggered either via nodeprobe, or automatically:
Nodeprobe sends TreeRequest messages to all neighbors of the target
node: when a node receives a TreeRequest, it will perform a readonly
compaction to immediately validate the column family.
Automatic compactions will also validate a column family and broadcast
TreeResponses, but since TreeRequest messages are not sent to
neighboring nodes, repairs will only occur if two nodes happen to
perform automatic compactions within TREE_STORE_TIMEOUT of one
another.
You may find more info in the Cassandra documentation on compaction and anti-entropy repair.
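If you want to inspect or adjust the threshold N mentioned above, nodetool exposes it per column family (Keyspace1/Standard1 are the names from the compactionstats output above; exact options may vary by version):
$ nodetool -h x.x.x.x getcompactionthreshold Keyspace1 Standard1
$ nodetool -h x.x.x.x setcompactionthreshold Keyspace1 Standard1 4 32   # min and max SSTables per automatic compaction
With the defaults, a compaction starts automatically once about 4 similarly sized SSTables have accumulated; if it happens to include every SSTable in the column family, it shows up as a major compaction in compactionstats.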
