How to compact sstables offline? - cassandra

I am using CQLSSTableWriter to write SSTables in an offline/bulk mode. Sort order is not enforced during the write operation. Is it possible to force a compaction before I use sstableloader to load the data into the Cassandra cluster?

SSTables are immutable in nature, and an SSTable is not just a single file: it consists of the data plus metadata components such as the Index.db file (check the DataStax docs for more details).
You should not compact manually, because the token range covered by each SSTable changes during compaction and the resulting SSTable will not have its data evenly distributed.
Compaction also produces a larger SSTable, and the node holding that SSTable can become a hotspot.
So it is better, and recommended, not to do it manually.

You can drain the node via nodetool drain and then safely continue your compactions.
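For the loading step mentioned in the question, a minimal sketch looks like the following (the host addresses and the output path are placeholders; CQLSSTableWriter writes into a keyspace/table directory layout, and sstableloader is pointed at that directory):
$ sstableloader -d 10.0.0.1,10.0.0.2 /path/to/output/my_keyspace/my_table
Any compaction of the loaded data is then left to the receiving nodes and their configured compaction strategy.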

Related

Is Cassandra tracking the number of deletions in SSTables to trigger a compaction?

I wonder whether Cassandra triggers a compaction (STCS or LCS) based on the number of deletions in SSTables. In LCS, as far as I know, Cassandra compacts SSTables to the next level only when a level is full. But a deletion record is usually small, so if only the SSTable size is used to decide whether a level is full, it may take a long time for a tombstone to be reclaimed.
I know RocksDB triggers compaction based on the number of deletions in SSTables, which helps reduce tombstones.
Yes, Cassandra's compaction can be triggered by the amount of deletions (a.k.a. tombstones) in an SSTable.
Have a look at the common options shared by all the compaction strategies, and specifically this parameter:
tombstone_threshold
How much of the SSTable should be tombstones for us to consider doing a single-SSTable compaction of that SSTable.
See doc here: https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/index.html
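As a sketch, the threshold can be tuned per table through the compaction options (my_ks.my_table is a placeholder; 0.2 is the documented default, so a lower value triggers single-SSTable tombstone compactions sooner). Note that the compaction map must restate the strategy class:
$ cqlsh -e "ALTER TABLE my_ks.my_table
            WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                               'tombstone_threshold': '0.1'};"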

Why is forcing major compaction on a table not ideal?

Consider a scenario where a table has partitions with thousands of deleted rows. When reading from the table, Cassandra has to scan over thousands of deleted rows before it gets to the live rows.
A common workaround is to manually run a compaction on a node to forcibly get rid of tombstones.
What are the downsides of forcing major compaction on a table (with nodetool compact) and what is the best practice recommendation?
Background
When forcing a major compaction on a table configured with the SizeTieredCompactionStrategy (STCS), all the SSTables on the node get compacted together into a single large SSTable. Due to its size, the resulting SSTable will likely never get compacted out since similar-sized SSTables are not available as compaction candidates. This creates additional issues for the nodes since tombstones do not get evicted and keep accumulating, affecting the cluster's performance.
Caveats
We understand that cluster administrators use major compaction as a way of evicting tombstones which have accumulated as a result of high-delete workloads which in most cases is due to an incorrect data model.
The recommendation in this post does NOT constitute a solution to the underlying issue users face. It should not be considered a long-term fix to the data model problem.
Recommendation
In Apache Cassandra 2.2, CASSANDRA-7272 introduced a huge improvement which splits the output of nodetool compact into multiple files which are 50% then 25% then 12.5% of the original table size until the smallest chunk is 50MB for tables using STCS.
When using major compaction as a last resort for evicting tombstones, use the --split-output (or shorthand -s) to take advantage of this new feature:
$ nodetool compact --split-output -- <keyspace> <table>
NOTE - This feature is only available in Cassandra 2.2 and newer versions.
Also see How to split large SSTables on another server. Cheers!

Cassandra read path

I'm learning Cassandra's read path.
According to some sources:
"When Cassandra receives the read request, data will be searched first in the Memtable, then data will be searched in SSTables and if data exists it is returned"
Also, I know that Memtables are periodically flushed to SSTables on disk.
My questions:
1. Are Memtables fully deleted from RAM after being flushed to SSTables?
2. Suppose we have a read request on a node that contains both Memtables and SSTables. Is it possible for Cassandra to serve the required data only from Memtables, without accessing SSTables? If yes, when is that possible, and how can Cassandra determine that the required data is stored only in Memtables and that no other related data is stored on disk (in SSTables)?
Short answer to the 2nd question: no. Cassandra will always check SSTables even if the data is in the memtable, because the data in the memtable could be older than the data in an SSTable; for example, if you explicitly set the write timestamp for records, or if data is replayed from hints on another node. When a memtable is flushed, its data is removed from memory. In some cases, though, you can use the row cache if you have data that is accessed often.
You can read more about read path in the DSE arch guide.
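As a sketch of the row cache mentioned above, caching is configured per table (my_ks.my_table and the row count are placeholders; the node also needs a non-zero row cache size, e.g. row_cache_size_in_mb in cassandra.yaml, for the row cache to be used):
$ cqlsh -e "ALTER TABLE my_ks.my_table
            WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};"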

Leveled Compaction Strategy with low disk space

We have Cassandra 1.1.1 servers with Leveled Compaction Strategy.
The workload consists of read and delete operations. Every half a year we delete approximately half of the data while new data comes in. Sometimes disk usage goes up to 75%, while we know that real data takes only about 40-50%; the rest is occupied by tombstones. To avoid disk overflow we force compaction of our tables by dropping all SSTables to level 0: we remove the .json manifest file and restart the Cassandra node. (The gc_grace option does not help, since compaction starts only after a level is filled.)
Starting from Cassandra 2.0 the manifest file was moved into the SSTable itself: https://issues.apache.org/jira/browse/CASSANDRA-4872
We are considering migrating to Cassandra 2.x, but we are afraid we will no longer have a way to force leveled compaction like this.
My question is: how can we give our table a disk space limit, e.g. 150 GB, such that compaction is triggered automatically when the limit is exceeded? The question is mostly about Cassandra 2.x, though alternative solutions for Cassandra 1.1.1 are also welcome.
It seems I've found the answers myself.
Starting from the 2.x versions there is a tool, sstablelevelreset, which performs the same level reset as deleting the manifest file. The tool is located in the tools directory of the Cassandra distribution, e.g. apache-cassandra-2.1.2/tools/bin/sstablelevelreset.
Also, starting from Cassandra 1.2 (https://issues.apache.org/jira/browse/CASSANDRA-4234), Leveled Compaction Strategy supports tombstone removal via the tombstone_threshold option, which lets you set the maximum ratio of tombstones in an SSTable.
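A minimal sketch of using the tool (my_keyspace and my_table are placeholders; the exact flag spelling may vary between versions, so check tools/bin in your distribution):
# stop the Cassandra node first, e.g. nodetool drain followed by a service stop
$ tools/bin/sstablelevelreset --really-reset my_keyspace my_table
# restart the node; all SSTables are back at level 0 and LCS recompacts them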

Cassandra control SSTable size

Is there a way to control the maximum size of an SSTable, for example 100 MB, so that when there is more than 100 MB of data for a column family, Cassandra creates the next SSTable?
Unfortunately the answer is not so simple: the sizes of your SSTables are determined by your compaction strategy, and there is no direct way to cap the maximum SSTable size.
SSTables are initially created when memtables are flushed to disk. The size of these flushed tables depends on your memtable settings and the size of your heap (memtable_total_space_in_mb being a large influencer), and they are typically pretty small. SSTables then get merged together as part of a process called compaction.
If you use Size-Tiered Compaction Strategy (STCS) you can end up with really large SSTables: STCS runs a minor compaction when there are at least min_threshold (default 4) SSTables of similar size, combining them into one file while expiring data and merging keys. Over time this can produce very large SSTables.
With Leveled Compaction Strategy (LCS) there is an sstable_size_in_mb option that controls a target size for SSTables. In general SSTables will be less than or equal to this size, unless you have a partition key with a lot of data ('wide rows').
I haven't experimented much with Date-Tiered Compaction Strategy yet, but it works similarly to STCS in that it merges files of the same size; it keeps data together in time order and has an option to stop compacting old data (max_sstable_age_days), which could be interesting.
The key is to find the compaction strategy that works best for your data and then tune its properties for your data model and environment.
You can read more about the configuration settings for compaction here and read this guide to help understand whether STCS or LCS is appropriate for you.
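As a sketch of the LCS option discussed above (my_ks.my_table is a placeholder, and 100 MB mirrors the size from the question), the target SSTable size is set through the table's compaction options:
$ cqlsh -e "ALTER TABLE my_ks.my_table
            WITH compaction = {'class': 'LeveledCompactionStrategy',
                               'sstable_size_in_mb': '100'};"
Keep in mind this is a target, not a hard cap: a single very wide partition can still produce a larger SSTable.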
