Cassandra snapshot folder size is too high

The folder size of one of the table (SSTable) directories after taking a snapshot is 1TB:
$ du -sh *
1001G user-820d7e50c85111eab874f3e361ecc166
Surprisingly, the size of the Cassandra snapshots folder inside that table directory was 785G (snp-2021-04-11-0400-01), and once I deleted the snapshot folder, the size of the table directory dropped to 281GB:
-bash-4.2$ du -sh *
281G user-820d7e50c85111eab874f3e361ecc166
My question is: why is the snapshot folder more than twice the size of the data folder? Is this normal in Cassandra?
My assumption was that Cassandra creates a copy of the SSTables in the snapshot folder, at the same size.

Cassandra doesn't copy the SSTables; it creates a hard link (just another name for the same file) from each original SSTable into the snapshots folder. When compaction happens, the original SSTable is deleted, but its data is kept on disk because it still has another name. And if you take snapshots often, and compaction happens often too, you'll accumulate a lot of links to old SSTables.
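You can reproduce the effect with plain files (a minimal sketch; the directory and file names are made up, this is not real Cassandra output):
$ mkdir -p data/snapshots/snap1
$ dd if=/dev/zero of=data/md-1-big-Data.db bs=1M count=1024    # stand-in for a 1GB SSTable
$ ln data/md-1-big-Data.db data/snapshots/snap1/md-1-big-Data.db    # a snapshot is just this hard link
$ rm data/md-1-big-Data.db    # compaction removes the original name...
$ du -sh data    # ...but the blocks stay on disk until the last link is gone
1.0G    data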
The solution is to periodically clean up snapshots - you can use the nodetool clearsnapshot command to delete selected snapshots (old backups, for example).
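For example, to remove the specific snapshot from the question (the keyspace name here is an assumption):
$ nodetool listsnapshots    # shows each snapshot and its true disk usage
$ nodetool clearsnapshot -t snp-2021-04-11-0400-01 -- my_keyspace    # my_keyspace is a placeholder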

Related

How do I find the size of a table in Cassandra Keyspaces?

I have used this command to get the size of a table:
nodetool cfstats -- <Keyspace>.<table name>
But I am not sure whether it is right or wrong, because when I upload more rows into my table, "Space used" does not change; only "Memtable data size" changes.
I just want to know how to find the size of a table in Cassandra keyspaces.
When a node receives a mutation (a write), it is first persisted to a commit log and then written to a memtable. Mutations/writes are not persisted to an SSTable until the memtable is flushed to disk.
If you want to force the memtables to be flushed to disk, run:
$ nodetool flush -- ks_name table_name
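After the flush, the size check from the question should reflect the new data (a sketch; ks_name and table_name are placeholders, and tablestats is the newer name for cfstats):
$ nodetool tablestats ks_name.table_name    # "Space used (live)" now includes the flushed SSTable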
For more info, see How data is written in Cassandra. Cheers!

How to purge massive data from Cassandra when node size is limited

I have a Cassandra cluster (Cassandra v3.11.11) with 3 data centers and replication factor 3. Each node has an 800GB NVMe drive, but one of the data tables is taking up 600GB. This results in the following from nodetool status:
DataCenter 1:
node 1: 620GB (used) / 800GB (total)
DataCenter 2:
node 1: 610GB (used) / 800GB (total)
DataCenter 3:
node 1: 680GB (used) / 800GB (total)
I cannot install another disk in the server, and the server has no internet connection at all for security reasons. The only thing I can do is put some small files, like scripts, onto it.
All Cassandra tables are set up with SizeTieredCompactionStrategy, so I end up with very big files: roughly one single 200GB SSTable (full table size 600GB). Since the cluster is running out of space, I need to free data, which will introduce tombstones. But I have only 120GB of free space left for compaction or garbage collection. That means I can only retain at most 120GB of data, and to be safe for the Cassandra process, maybe even less.
I already executed nodetool cleanup, so I am not sure if there is more room to free.
Is there any way to free 200GB of data from Cassandra? Or can I only retain less than 120GB of data?
(If we remove 200GB of data, we have 400GB of data left; compaction/GC will then need 400GB on top of the existing 680GB, which is bigger than the disk space...)
Thank you!
I personally would start by checking whether the whole space is occupied by actual data and not by snapshots - use nodetool listsnapshots to list them, and nodetool clearsnapshot to remove them. If you took snapshots for some reason, then after compaction they occupy real space, because the original files were removed.
The next step would be to try to clean up tombstones & deleted data from the small tables using nodetool garbagecollect, or nodetool compact with the -s option to split the output into files of different sizes instead of one big SSTable. For the big table I would try nodetool compact with the --user-defined option on individual SSTable files (assuming there will be enough space for them). As soon as you free more than 200GB, you can use sstablesplit (the node must be down!) to split the big SSTable into small files (~1-2GB), so that when the node starts again the data can be compacted in smaller chunks.
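A rough sketch of that sequence (keyspace, table, and SSTable paths are placeholders - check free space between steps):
$ nodetool listsnapshots
$ nodetool clearsnapshot    # in 3.11, with no arguments this removes all snapshots on the node
$ nodetool garbagecollect small_ks small_table    # drop deleted data from a small table
$ nodetool compact -s small_ks another_table    # -s splits the output into several smaller SSTables
$ nodetool compact --user-defined /var/lib/cassandra/data/big_ks/big_table-<id>/md-1234-big-Data.db
# with the node stopped, split the 200GB SSTable into ~1GB pieces:
$ sstablesplit --size 1000 /var/lib/cassandra/data/big_ks/big_table-<id>/md-1234-big-Data.db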

Increased disk space usage after nodetool cleanup - Apache Cassandra

We have an Apache Cassandra (version 3.11.4) cluster in production with five nodes in each of two DCs. We just added the last two nodes recently, and after the repairs finished, we started the cleanup 2 days ago. The nodes are quite big: /data has 2.8TB of mounted disk space, and Cassandra used around 48% of it before the cleanup.
Cleanup finished on the first node after ~14 hours (I don't think it broke: no errors in the log, and nodetool compactionstats says 0 pending tasks), and during the cleanup the disk usage increased up to 81% and has never gone back since.
Will Cassandra clean it up, and if yes, when? Or do we have to do something manually? We can't actually find any tmp files that could be removed manually, so we have no idea now. Has anyone met this use case and found a solution?
Thanks in advance!
Check the old snapshots - most probably you had many snapshots (from backups, truncated, or dropped tables) that were hard links to the data files (and so not consuming extra space). After nodetool cleanup the data files were rewritten and new files were created, while the hard links still point to the original files, consuming disk space. Use nodetool listsnapshots to get a list of existing snapshots, and nodetool clearsnapshot to remove the unnecessary ones.
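To see how much space snapshots are pinning, you can read the "Total TrueDiskSpaceUsed" line of nodetool listsnapshots, or sum the snapshot directories directly (the path below is the default data directory; adjust it to your layout, and the tag is a placeholder):
$ du -csh /var/lib/cassandra/data/*/*/snapshots | tail -1    # total across all tables
$ nodetool listsnapshots
$ nodetool clearsnapshot -t <old_snapshot_tag>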

Cassandra backup: plain copy disk files vs snapshots

We are planning to deploy a Cassandra cluster with 100 virtual nodes, storing at most 1TB of (compressed) data on each node. We're going to use host-local SSD disks.
The infrastructure team is used to plainly backing up whole partitions.
We've come across Cassandra Snapshots.
What is the difference between plainly copying the whole disk vs. Cassandra snapshots?
- Is there a size difference?
- Whole-partition backups also unnecessarily save uncompressed data that is being compacted; is that the motive behind snapshots?
There are a few benefits of using snapshots:
- The snapshot command flushes the memtables to SSTables and then creates the snapshot.
- Nodetool can be used to restore snapshots.
- Incremental backup functionality can also be leveraged.
- Snapshots create hard links to your data files, so taking them is much faster.
Note: Cassandra can only restore data from a snapshot when the table schema exists. It is recommended that you also back up the schema.
In both cases you have to make sure that the operation (snapshot or plain copy) runs at the same time on all nodes.
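A minimal backup along those lines might look like this (the tag and keyspace name are placeholders):
$ nodetool snapshot -t backup-$(date +%F) -- my_ks    # flushes memtables, then hard-links the SSTables
$ cqlsh -e 'DESCRIBE KEYSPACE my_ks' > my_ks_schema.cql    # keep the schema next to the snapshot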

Cassandra node almost out of space, but nodetool cleanup is increasing disk use?

One of our nodes was at 95% disk use, and we added another node to the cluster hoping to rebalance, but the disk space didn't drop on the node. I tried nodetool cleanup, assuming that excess keys were on the node, but the disk space is increasing! Will cleanup actually reduce the size?
Yes, it will, but you have to be careful: a cleanup compaction generates temporary files and tmp link files that increase disk usage until the cleaned-up, compacted table has been written.
So I would go into your data directory and figure out what your keyspace sizes are using:
du -h -s *
Then individually clean up the smaller keyspaces (you can specify a keyspace in the cleanup command with nodetool cleanup <keyspace>) until you have some overhead. To get an idea of how much space is being freed, tail the log and grep for cleaned compactions:
tail <system.log location> | grep 'eaned'
I'd recommend you don't try to clean up a keyspace that is more than half the size of your remaining disk space. Hopefully that is possible.
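As a sketch, working smallest-first (the data path is the default location; the keyspace name is a placeholder):
$ cd /var/lib/cassandra/data
$ du -s * | sort -n    # keyspaces, smallest first
$ nodetool cleanup small_ks    # then the next-smallest, and so on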
If you don't have enough space, you'll have to shut down the node, attach a bigger disk, copy the data files over to it, point the yaml at the new data directories, and then start the node back up. This is useful for setups where SSDs are expensive and small while the main spinning disks are cheaper and bigger.
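Roughly, that disk-swap procedure could look like this (the mount point and service name are assumptions; the yaml key is data_file_directories):
$ sudo systemctl stop cassandra
$ sudo rsync -a /var/lib/cassandra/data/ /mnt/bigdisk/data/
# edit cassandra.yaml:  data_file_directories: [ /mnt/bigdisk/data ]
$ sudo systemctl start cassandra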
