Cassandra dropped keyspaces still on HDD

I noticed an increase in the number of open files on my Cassandra cluster and went to check its health. Nodetool status reported only 300 GB in use per node of the 3 TB each has allocated.
Shortly thereafter I began to see heap OOM errors showing up in the Cassandra logs.
These nodes had been running for 3-4 months without issue, but had a series of test data sets populated onto and then dropped from them.
After checking the hard drives via the df command, I determined they were all between 90-100% full in a JBOD configuration.
Edit: further investigation shows that the remaining files are in the 'snapshots' subfolder, and the data subfolder itself has no db tables.
My question is: has anyone seen this? Why did compaction not free these tombstones? Is this a bug?

Snapshots aren't tombstones - they are a backup of your data.
As Highstead says, you can drop any unused snapshots via the clearsnapshot command.
You can disable the automatic snapshot facility via cassandra.yaml:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__auto_snapshot

Also check whether you have a non-default value of true for snapshot_before_compaction.
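To see where you stand, you can check both settings in your cassandra.yaml; a quick sketch (the path is illustrative, and the values shown are the stock defaults):
$ grep -E 'auto_snapshot|snapshot_before_compaction' /etc/cassandra/cassandra.yaml
auto_snapshot: true
snapshot_before_compaction: false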

Snapshots accumulate over the lifetime of the Cassandra cluster. They are not reflected in nodetool status output but still occupy disk space. In this case the snapshots consuming all the space were created when a table was dropped.
To retrieve a list of current snapshots, use the command nodetool listsnapshots.
This feature can be disabled by editing /etc/cassandra/cassandra.yaml and setting auto_snapshot to false. Alternatively, these snapshots can be purged via the command nodetool clearsnapshot <name>.
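A minimal sketch of that workflow (the snapshot name is a placeholder; note that on Cassandra 4.0+ clearing everything requires the --all switch, as a later answer points out):
$ nodetool listsnapshots                      # list snapshot names and sizes per table
$ nodetool clearsnapshot -t <snapshot_name>   # remove one snapshot by name
$ nodetool clearsnapshot --all                # or remove all snapshots (4.0+ syntax)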

Related

Disk space of Cassandra node is over 80%

I'm running 12 Cassandra nodes on AWS EC2 instances. Four of them are using almost 80% of their disk space, so compaction fails on those nodes. Because these are EC2 instances, I can't add more disk space to the existing data volumes on the fly, and I can't ask the IT team to add more nodes to scale out and spread the cluster, since disk usage on the other nodes is below 40%. Before fixing the unbalanced-cluster issue, is there any way to free up some disk space?
My question is: how can I find unused SSTables and move them to another partition so I can run compaction and free up some space?
Any other suggestions to free up disk space are welcome.
PS: I already dropped all the snapshots and backups.
If you are using vnodes, the difference in data sizes should not be that large. Before jumping to a solution, we must find the reason for the big difference in data sizes across nodes.
Look into the logs to see whether some big SSTable is corrupted, which can result in compaction failures and growing data sizes, or whether anything else in the logs points to the reason for the increasing disk usage.
We faced an issue in Cassandra 2.1.16 where, due to a bug, old SSTable files were not removed even after compaction. We read the logs and identified the files that could be removed. This is one example where reading the logs revealed the reason for the increased data size.
So you must identify the reason before picking a solution. If the situation is dire, you can identify keyspaces/tables that are not used during your main traffic, move their SSTables to a backup location to free space, and bring them back once your compaction process is over, as sketched below.
Warning: test any procedure before trying it in production.
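A rough sketch of that last-resort idea, with illustrative paths and names (stop the node before touching files on disk, and test this outside production first):
# Park the SSTables of an idle table on another partition, reclaim space,
# then move them back later. All paths and names here are examples.
$ nodetool drain && sudo systemctl stop cassandra
$ mv /var/lib/cassandra/data/idle_ks/idle_table-<id>/* /mnt/backup/idle_table/
$ sudo systemctl start cassandra
$ nodetool compact              # let compaction finish on the remaining data
# ...afterwards: stop the node again, move the files back, restart, and repair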

Cassandra Load status does not update (nodetool status)

Using nodetool status I can read out the Load of each node. Adding or removing data from a table should have a direct impact on that value. However, the value remains the same, no matter how many times the nodetool status command is executed.
The Cassandra documentation states that the Load value takes 90 seconds to update. Even allowing several minutes between runs of the command, the result is always wrong. The only way I was able to make this value update was to restart the node.
I don't believe it is relevant, but I should add that I am using Docker containers to create the cluster.
In the documentation that you linked, under Load it also says:
Because all SSTable data files are included, any data that is not cleaned up, such as TTL-expired cell or tombstoned data is counted.
It's important to note that when Cassandra deletes data, the data is marked with a tombstone and doesn't actually get removed until compaction. Thus, the load doesn't decrease immediately. You can force a major compaction with nodetool compact.
You can also try flushing the memtable if data is being added. Apache notes that:
Cassandra writes are first written to the CommitLog, and then to a per-ColumnFamily structure called a Memtable. When a Memtable is full, it is written to disk as an SSTable.
So you either need to add more data until the memtable is full, or you can run a nodetool flush (documented here) to force it.
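Putting both suggestions together, a minimal sequence would be (keyspace/table names are illustrative):
$ nodetool flush my_keyspace              # write current memtables to disk as SSTables
$ nodetool compact my_keyspace my_table   # force a major compaction to purge tombstones
$ nodetool status                         # Load should now reflect the change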

Cassandra data directory does not get updated with deletion

Currently, I am benchmarking a Cassandra database using the YCSB framework. During this time I have performed (batch) insertion and deletion of data quite regularly.
I am using the TRUNCATE command to delete keyspace rows. However, I am noticing that my Cassandra data directory swells up as the experiments proceed.
I have checked and can confirm that there is no data left in the keyspace, yet the data directory remains large. Is there a way to initiate a process so that Cassandra automatically releases the stored space, or does it happen over time?
When you use TRUNCATE, Cassandra will create snapshots of your data.
To disable this you will have to set auto_snapshot: false in the cassandra.yaml file.
If you are using DELETE, then Cassandra uses tombstones, i.e. your data will not be deleted immediately. The data is removed once compaction has run.
To remove previous snapshots, one can use the nodetool clearsnapshot command.
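For example, to confirm the space is held by TRUNCATE-created snapshots and then reclaim it (paths and the snapshot name are placeholders):
$ du -sh /var/lib/cassandra/data/<keyspace>/*/snapshots/   # size of each table's snapshots
$ nodetool clearsnapshot -t <snapshot_name>                # free that space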

Disk Space not freed up even after deleting keyspace from cassandra db and compaction

I created a keyspace and a table (column family) within it.
Let's say "ks.cf".
After entering a few hundred thousand rows into the column family cf, I checked the disk usage using df -h.
Then, I dropped the keyspace using the command DROP KEYSPACE ks from cqlsh.
Even after dropping it, the disk usage remains the same. I also ran nodetool compact, but no luck.
Can anyone help me out in configuring things so that disk usage is freed up after deleting the data/rows?
Ran into this problem recently. After dropping a table, a snapshot is made. This snapshot will allow you to roll the drop back if it was not intended. If you do, however, want that hard drive space back, you need to run:
nodetool -h localhost -p 7199 clearsnapshot
on the appropriate nodes. Additionally, you can turn snapshots off with auto_snapshot: false in your cassandra.yaml.
If you are just trying to delete rows, then you need to let the deletion go through the usual delete cycle (delete row -> tombstone creation -> compaction actually deletes the row).
If you completely want to get rid of your keyspace, check your Cassandra data folder (it is specified in your yaml file). In my case it is "/mnt/cassandra/data/". In this folder there is a subfolder for each keyspace (i.e. ks). You can completely delete the folder related to your keyspace.
If you want to keep the folder around, it is good to know that Cassandra creates a snapshot of your keyspace before dropping it, basically a backup of all of your data. Go into the 'ks' folder, find the snapshots subdirectory, and delete the snapshot related to your keyspace drop, as sketched below.
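A sketch of those on-disk steps, using the example path above (the table directory's id suffix and the snapshot name are placeholders):
$ ls /mnt/cassandra/data/ks/cf-<id>/snapshots/             # find the drop's snapshot
$ rm -rf /mnt/cassandra/data/ks/cf-<id>/snapshots/<snapshot_name>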
Cassandra does not clear snapshots automatically when you drop a table or keyspace. If auto_snapshot is enabled in cassandra.yaml, then every time you drop a table or keyspace, Cassandra captures a snapshot of that table. This snapshot lets you roll back the table data if the drop was a mistake. If you do want to clear that table's data from disk, you need to run the clearsnapshot command below to free the space.
nodetool -u XXXX -pw XXXXX clearsnapshot -t snapshotname
You can disable this auto_snapshot feature any time in cassandra.yaml.
nodetool cleanup removes all data from disk that is no longer needed there, i.e. data that the node is not responsible for. (clearsnapshot will clear all snapshots, which may not be what you want.)
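For example (keyspace name is illustrative):
$ nodetool cleanup my_keyspace   # discard data for token ranges this node no longer owns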
The nodetool command can be used to clean up all unused (i.e. previously dropped) table snapshots in one go (here issued inside a running bitnami/cassandra:4.0 Docker container):
$ nodetool --username <redacted> --password <redacted> clearsnapshot --all
Requested clearing snapshot(s) for [all keyspaces] with [all snapshots]
Evidence: space used by old table snapshots in the dicts keyspace:
a) before the cleanup:
$ sudo du -sch /home/<host_user>/cassandra_data/cassandra/data/data/<keyspace_name>/
134G /home/<redacted>/cassandra_data/cassandra/data/data/dicts/
134G total
b) after the cleanup:
$ sudo du -sch /home/<host_user>/cassandra_data/cassandra/data/data/<keyspace_name>/
4.0K /home/<redacted>/cassandra_data/cassandra/data/data/dicts/
4.0K total
Note: the accepted answer missed the --all switch (and the need to log in), but it still deserves to be upvoted.

Proper cassandra keyspace restore procedure

I am looking for confirmation that my Cassandra backup and restore procedures are sound and I am not missing anything. Can you please confirm, or tell me if something is incorrect/missing?
Backups:
I run daily full backups of the keyspaces I care about, via "nodetool snapshot keyspace_name -t current_timestamp". After the snapshot has been taken, I copy the data to a mounted disk dedicated to backups, then do a "nodetool clearsnapshot $keyspace_name -t $current_timestamp".
I also run hourly incremental backups, executing a "nodetool flush keyspace_name" and then moving files from the backups directory of each keyspace into the backup mountpoint.
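In shell form, one day of that full-backup cycle might look roughly like this (paths and the keyspace name are illustrative, and the per-table snapshot layout is preserved to avoid filename collisions):
$ TAG=$(date +%s)
$ nodetool snapshot -t "$TAG" my_keyspace                # daily full snapshot
$ for d in /var/lib/cassandra/data/my_keyspace/*/snapshots/"$TAG"; do
>   t=$(basename "$(dirname "$(dirname "$d")")")         # table directory name
>   mkdir -p /mnt/backup/"$TAG"/"$t" && cp -r "$d"/. /mnt/backup/"$TAG"/"$t"/
> done
$ nodetool clearsnapshot -t "$TAG" my_keyspace           # free the snapshot space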
Restore:
So far, the only valid way I have found to do a restore (and tested/confirmed) is to do this, on ALL Cassandra nodes in the cluster:
Stop Cassandra
Clear the commitlog *.log files
Clear the *.db files from the table I want to restore
Copy the snapshot/full backup files into that directory
Copy any incremental files I need to (I have not tested with multiple incrementals, but I am assuming I will have to overlay the files, in sequence from oldest to newest)
Start Cassandra
On one of the nodes, run a "nodetool repair keyspace_name"
So my questions are:
Does the above backup and restore strategy seem valid? Are any steps inaccurate or anything missing?
Is there a way to do this without stopping Cassandra on EVERY node? For example, is there a way to restore the data on ONE node, then somehow make it "authoritative"? I tried this, and, as expected, since the restored data is older, the data on the other nodes (which is newer) overwrites it when they sync up during repair.
Thank you!
There are two ways to restore Cassandra backups without restarting C*:
Copy the files into place, then run "nodetool refresh". This has the caveat that the restored rows will still be older than any tombstones, so if you're trying to restore deleted data, it won't do what you want. It also only applies to the local server (you'll want to run a repair afterwards). See the sketch after this list.
Use "sstableloader". This will load the data to all nodes. You'll need to make sure you have the SSTables from a complete replica, which may mean loading the SSTables from multiple nodes. As an added bonus, this works even if the cluster size has changed. I'm not sure if ordering matters here (that is, I don't know whether row timestamps are preserved through the load or redefined during it).
