How to purge massive data from Cassandra when node size is limited

I have a Cassandra cluster (v3.11.11) with 3 data centers and replication factor 3. Each node has an 800GB NVMe drive, but one of the data tables is taking up 600GB. nodetool status reports the following:
DataCenter 1:
node 1: 620GB (used) / 800GB (total)
DataCenter 2:
node 1: 610GB (used) / 800GB (total)
DataCenter 3:
node 1: 680GB (used) / 800GB (total)
I cannot install another disk in the server, and the server has no internet connection at all for security reasons. The only thing I can do is put some small files, like scripts, onto it.
All Cassandra tables are set up with SizeTieredCompactionStrategy, so I end up with very big files: a single ~200GB SSTable (the full table is 600GB). Since the cluster is running out of space, I need to free data, which will introduce tombstones. But I only have 120GB of free space left for compaction or garbage collection. That means I can retain at most 120GB of data, and to be safe for the Cassandra process, maybe even less.
I have already executed nodetool cleanup, so I am not sure there is more room to free.
Is there any way to free 200GB of data from Cassandra, or can I only retain less than 120GB of data?
(If we remove 200GB of data, 400GB of data remains; compaction/GC would then need roughly 400GB on top of the ~680GB already used, which is more than the disk can hold.)
Thank you!

I personally would start by checking whether the whole space is occupied by actual data and not by snapshots: use nodetool listsnapshots to list them and nodetool clearsnapshot to remove them. If you took a snapshot for some reason, then after compaction the snapshots keep occupying space, because they hold on to the original files that were otherwise removed.
The next step would be to try to clean up tombstones and deleted data from the small tables using nodetool garbagecollect, or nodetool compact with the -s option to split the table into files of different sizes. For the big table, I would try nodetool compact with the --user-defined option on individual files (assuming there is enough space for them). As soon as you free more than 200GB, you can use sstablesplit (the node must be down!) to split the big SSTable into small files (~1-2GB), so when the node starts again the data will be compacted.
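For reference, the commands could look roughly like this (keyspace, table, and SSTable file names are placeholders; the mc-*-big-Data.db naming is just an example of Cassandra 3.11's file format):

nodetool listsnapshots
nodetool clearsnapshot -t <snapshot_name>
nodetool garbagecollect <keyspace> <small_table>
nodetool compact -s <keyspace> <small_table>
nodetool compact --user-defined /var/lib/cassandra/data/<keyspace>/<table_dir>/mc-1234-big-Data.db
sstablesplit --no-snapshot -s 1024 /var/lib/cassandra/data/<keyspace>/<table_dir>/mc-5678-big-Data.db

sstablesplit is an offline tool, so stop the node before running it; -s is the target file size in MB.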

Related

Why do nodes in a fully replicated Cassandra cluster have different data sizes?

I have a 3-node Cassandra cluster (version 3.11.11) with replication factor 3. Only 2 of the nodes receive requests, and node 3 only syncs with the other 2 nodes.
In theory, each node should hold the same amount of data, but in practice the nodes end up with different data sizes.
We run nodetool repair daily; operations like compaction run automatically with default settings.
What can be the reason for the size difference?
It ultimately comes down to how the data gets compacted over the long run. Compaction is a local process, and how SSTables stack up on each node cannot be guaranteed, so I don't see any aberration here. In theory all nodes hold the same data logically, but physically it may vary. For example, node 3 may have old SSTables that are not getting compacted because of their size (if using STCS), while the other nodes have already compacted theirs and reduced their size.
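A quick way to check whether that is what's happening (keyspace/table names and the data path are placeholders) is to compare the SSTable layout on each node:

nodetool tablestats <keyspace>.<table>
ls -lhS /var/lib/cassandra/data/<keyspace>/<table_dir>/*-Data.db

If one node shows a few huge SSTables in the listing while the others show smaller, more recently compacted ones, that explains the size gap.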

How to rebalance and reclaim disk space after adding a Cassandra node

I have a 12-node Cassandra cluster with a high data load, and disk space is nearing full capacity. I have expanded the cluster by adding 1 node and plan to add a couple more.
I can see that the data load per node dropped after adding the new node; however, disk usage has not gone down.
I am hesitant to run nodetool repair, as it may require additional disk space and the available space may not be sufficient.
There are suggestions to use nodetool cleanup, but it looks like this will also cause a temporary increase in disk usage.
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/tools/toolsCleanup.html
Please suggest whether there are better ways to clean up old data from the other nodes to reclaim disk space.
Unfortunately, nodetool cleanup is the only way to evict data that a node no longer owns after nodes are added to a cluster and thereby reclaim disk space.
In order for cleanup to work, it temporarily uses more space since it needs to re-compact SSTables to new ones. This can be problematic if you have really large SSTables that are several GBs in size and don't have a lot of disk space left.
You can work around this problem for large SSTables configured with SizeTieredCompactionStrategy by splitting them into smaller files on another server using the sstablesplit tool. I've documented the instructions in https://community.datastax.com/questions/6415/. Cheers!
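A hedged sketch of that workaround (host, paths, and SSTable names are examples; the other server needs the tools from the same Cassandra version, and the node should be stopped, or you should work from a snapshot, while copying):

scp /var/lib/cassandra/data/<keyspace>/<table_dir>/mc-1234-big-* other-host:/tmp/split/
sstablesplit --no-snapshot -s 1024 /tmp/split/mc-1234-big-Data.db

Then copy the resulting smaller SSTables back into the table directory in place of the original files and start the node. Copy all components of the SSTable (Data, Index, Summary, and so on), not just the Data.db file, since sstablesplit needs the full set.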

Cassandra compaction: does replication factor have any influence?

Let's assume that the total disk usage of all keyspaces is 100GB before replication and the replication factor is 3, making the total physical disk usage 100GB x 3 = 300GB.
We use the default compaction strategy (size-tiered), and let's assume the worst case, where Cassandra needs as much free space as the data itself to complete the compaction. Does Cassandra need 100GB (before replication) or 300GB (100GB x 3 with replication)?
In other words, when Cassandra needs free disk space for performing compaction, does the replication factor have any influence?
Compaction in Cassandra is local to a Node.
Now let's say you have a 3-node cluster, the replication factor is also 3, and the original data size is 100GB. This means each node holds 100GB worth of data.
Hence, on each node you will need 100GB of free space to compact the data present on that node.
TL;DR: the free space required for compaction is equal to the total data present on the node.
Because the data is replicated between the nodes, every node will need up to 100GB of free space, so it's 300GB in total across the cluster, but not on any one node.
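As a rough back-of-the-envelope check, assuming data is evenly distributed and the worst-case size-tiered compaction:

data per node ≈ raw data x RF / number of nodes = 100GB x 3 / 3 = 100GB
free space needed per node ≈ data per node = 100GB
free space needed cluster-wide ≈ 100GB x 3 nodes = 300GB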

Cassandra node almost out of space, but nodetool cleanup is increasing disk use?

One of our nodes was at 95% disk usage, so we added another node to the cluster hoping to rebalance, but disk usage didn't drop on the node. I tried running nodetool cleanup, assuming excess keys were on the node, but disk usage is increasing! Will cleanup actually reduce the size?
Yes it will, but you have to be careful: cleanup runs a compaction, which generates temporary files and tmp link files that increase disk usage until the cleaned-up, compacted table has been fully written.
So I would go into your data directory and figure out what your keyspace sizes are using
du -h -s *
Then individually clean up the smaller keyspaces (you can specify a keyspace with nodetool cleanup <keyspace>) until you have some overhead. To get an idea of how much space is being freed, tail the log and grep for cleaned compactions:
tail <system.log location> | grep 'eaned'
I'd recommend you don't try to clean up a keyspace that is more than half the size of your remaining disk space. Hopefully that is possible.
If you don't have enough space, you'll have to shut down the node, attach a bigger disk, copy the data files over to it, repoint cassandra.yaml at the new data directories, and then restart. This is useful when the SSDs are expensive and small but the main spinning disks are cheaper and bigger.
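Putting that together, a minimal sketch of the per-keyspace approach (paths are examples for a default package install):

du -sh /var/lib/cassandra/data/* | sort -h
nodetool cleanup <smallest_keyspace>
tail -f /var/log/cassandra/system.log | grep 'eaned'

Repeat the cleanup keyspace by keyspace, smallest first, re-checking free space between runs.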

DataStax Cassandra: remove and clean up one column family

After some IT cleanup, we are noticing that we should probably do a full cleanup / restore for one column family. We believe that Cassandra has duplicate data that it is not cleaning up. Is it possible to clear out and just have Cassandra rebuild a single column family from scratch or a snapshot?
During an upgrade, some of the nodes decided to rejoin the cluster rather than just restarting. During that process, nodetool netstats showed that nodes were streaming new data files to the original nodes. The cluster is stable, but disk usage grew substantially. I am thinking we will migrate to a new ring, but in the meantime I would like to see if I can reduce some disk usage. The ring is stable, and repairs look fine.
If we were able to clean up one column family, it would relieve disk space usage a lot.
nodetool cleanup is not reducing the size of the SSTables.
When a new node joins the cluster, it uses approximately 50% of the disk space of the other nodes.
We could do the dance of nodetool decommission && nodetool join, but that is not going to be fun :)
We have validated that the data in the ring is consistent, and repairs show that the data is consistent across the ring.
Adding a new node and successfully running repair means the data for the partition range(s) that has(have) been assigned to that node has been streamed to the new node.
If, after this has happened, you run nodetool cleanup, any data from the other nodes that is no longer needed is cleaned up.
If you still see that some of your nodes have more data than others, this may be because you have some wider rows in some of your partitions, or because your nodes are unbalanced. There should not be any data-duplication scenario (if you can prove there is one, it would be worth a JIRA ticket).
You can run rebalance in OpsCenter or manually re-assign your tokens if you are looking to spread out the data more evenly across your nodes (or design your data model to avoid the aforementioned wide rows).
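If you go the manual route instead of OpsCenter, a hedged sketch (only applies to clusters with a single token per node, i.e. no vnodes; the token value is a placeholder):

nodetool status <keyspace>
nodetool move <new_token>
nodetool cleanup

Check the Owns column in nodetool status for imbalance, run nodetool move on the node whose token you want to change (one node at a time), and then run nodetool cleanup on the nodes that gave up ranges.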
Use nodetool compact to clean up tombstones and compact all the updated versions of a record into a single record:
nodetool compact
