Was reading in Cassandra Documentation that:
Cassandra takes a snapshot of the keyspace before dropping it. In Cassandra 2.0.4 and earlier, the user was responsible for removing the snapshot manually.
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/drop_keyspace_r.html
This would imply that in versions after Cassandra 2.0.4, this is done automatically. If so, what configuration parameter (if any) sets the time before snapshot is automatically removed when doing a DROP KEYSPACE?
For example, in the case of DROP TABLE, gc_grace_seconds is
the number of seconds after data is marked with a tombstone (deletion marker) before it is eligible for garbage-collection.
I believe this reference is not accurate, Cassandra does not automatically clean up snapshots for you.
Cassandra won’t clean up the snapshots for you
http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html#snapshot-before-compaction
You can remove snapshots using the nodetool clearsnapshot command, or manually delete the directories & files yourself (this is safe as snapshots are just file hard-links).
Note also that gc_grace_seconds is not related to snapshots, it is used during compactions only.
Related
We have an Apache Cassandra (version 3.11.4) cluster in production with 5-5 nodes in two DCs. We've just added the last two nodes recently and after the repairs has finished, we started the cleanup 2 days ago. The nodes are quite huge, /data has 2.8TB mounted disk space, Cassandra used around 48% of it before the cleanup.
Cleanup finished (I don't think it broke, no errors in log, and nodetool compactionstats says 0 pending tasks) on the first node after ~14 hours and during the cleanup the disk usage increased up to 81% and since then never gone back.
Will Cassandra clean it up and if yes, when, or do we have to do something manually? Actually we don't find any tmp files that could be removed manually, so we have no idea now. Did anyone met this usecase and has a solution?
Thanks in advance!
Check the old snapshots - most probably you had many snapshots (from backups, or truncated, or removed tables) that were a hard links to the files with data (and not consuming the space), and after nodetool cleanup, the data files were rewritten, and new files were created, while hard links still pointing to the original files, consuming the disk space. Use nodetool listsnapshots to get a list of existing snapshots, and nodetool clearsnapshot to remove not necessary snapshots.
We recently had a disk fail in one of our Cassandra node (its a 5 Cassandra 2.2 cluster with replication factor of 3). It took about a week or more to perform a full repair on that node. Each node contains 3/5 of the data and doing nodetool repair repaired 3/5 of the token ranges across all nodes. Now that its been repaired it will most likely repair faster since it did a incremental repair. I am wondering if its a good idea to perform periodic repairs on all nodes using nodetool repair -pr (We are at 2.2 and I think incremental repair is default in 2.2).
I think its a good idea because if performed periodically it will take less time to repair as it only needs to repair non repaired SStables. We also might've had instances where the nodes may've been down for more than the hinted handoff window and we probably didn't do anything about it.
Yes, its good practice to run scheduled incremental repair. Run repair frequently enough that every node is repaired before reaching the time specified in the gc_grace_seconds setting.
Also, it would be great if you run incremental repair on a frequent basis, combined with full repair less frequently like once per month/week. incremental repair would repair the SSTable which was not marked as repaired before, but full repair could take care more comprehensive case like SSTable rotting. check the reference from datastax:https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesWhen.html
We are planning to deploy a Cassandra cluster with 100 virtual nodes.
To store maximally 1TB (compressed) data on each node. We're going to use (host-) local SSD disks.
The infrustructure team is used to plainly backing up the whole partitions.
We've come across Cassandra Snapshots.
What is the difference between plainly copying the whole disk vs. Cassandra snapshots?
- Is there a size difference?
- Using whole partition backups, also unnecessarily saves uncompressed data that are being compacted, is that the motive behind snapshots?
There are few benefits of using snapshots:
Snapshot command will flush the memtable to ssTables and then creates snapshots.
Nodetool can be used to restore the snapshots.
Incremental backup functionality can also be leveraged.
Snapshots create hardlink of your data so it is much faster.
Note: Cassandra can only restore data from a snapshot when the table schema exists. It is recommended that you also backup the schema.
In both it is to be made sure that operation (snapshot or plain copy) run at same time on all the nodes.
We have Cassandra 1.1.1 servers with Leveled Compaction Strategy.
The system works so that there are read and delete operations. Every half a year we delete approximately half of the data while new data comes in. Sometimes it happens that disk usage goes up to 75% while we know that real data take about 40-50% other space is occupied by tombstones. To avoid disk overflow we force compaction of our tables by dropping all SSTables to Level 0. For that we remove .json manifest file and restart Cassandra node. (gc_grace option does not help since compaction starts only after level is filled)
Starting from Cassandra 2.0 the manifest file was moved to sstable file itself: https://issues.apache.org/jira/browse/CASSANDRA-4872
We are considering migration to Cassandra 2.x while we afraid we won't have such a possibility as forcing leveled compaction any more.
My question is: how could we achieve that our table has a disk space limit e.g. 150GB? (When the limit is exceeded it triggers compaction automatically). The question is mostly about Cassandra 2.x. While any alternative solutions for Cassandra 1.1.1 are also welcome.
It seems like I've found the answers myself.
There is tool sstablelevelreset starting from 2.x version which does similar level reset as deletion of manifest file. The tool is located in tools directory of Cassandra distribution e.g. apache-cassandra-2.1.2/tools/bin/sstablelevelreset.
Starting from Cassandra 1.2 (https://issues.apache.org/jira/browse/CASSANDRA-4234) there is tombstone removal support for Leveled Compaction Strategy which supports tombstone_threshold option. It gives the possibility of setting maximal ratio of tombstones in a table.
After some IT cleanup, we are noticing that we should probably do a full cleanup / restore for one column family. We believe that Cassandra has duplicate data that it is not cleaning up. Is it possible to clear out and just have Cassandra rebuild a single column family from scratch or a snapshot?
During an upgrade some of the nodes decided to rejoin the cluster, rather than just restarting. During that process nodetool netstats showed that nodes where transferring new data file into the original nodes. The cluster is stable, but the disk usage grew substantially. I am thinking that we will migrate to a new ring, but in the mean time I would like to see if I can reduce some disk usage. The ring is stable, and repairs are looking fine.
If we are able to cleanup one cf it would relieve disk space usage a ton.
nodetool cleanup is not reducing the size of the sstables.
If we have a new node join the cluster it is using approximately 50% of the disk space as the other nodes.
We could do the dance of nodetool decommision && nodetool join, but that is not going to be fun :)
We have validated that the data in the ring is consistent, and repairs show that the data is consistent across the ring.
Adding a new node and successfully running repair means the data for the partition range(s) that has(have) been assigned to that node has been streamed to the new node.
If, after this has happened, you run nodetool cleanup, any data from the other nodes that is no longer needed is cleaned up.
If you still see that some of your nodes have more data than others, this may be because you have some wider rows in some of your partitions, or because your nodes are unbalanced. There should not be any data duplication scenario (if you can prove this then it would be jira worthy).
You can run rebalance in OpsCenter or manually re-assign your tokens if you are looking to spread out the data more evenly across your nodes (or design your data model to avoid the aforementioned wide rows).
Use nodetool compact to clean up all the tombstones and compacts all the updated records into single record.
{nodetool compact}