Are SSTables or HFiles merged above 1 TB? - cassandra

A major compaction merges all HFiles on a region server (HBase), or all SSTables on a node (Cassandra), into one big file.
When that time comes, are many SSTables (totalling more than 1 TB) merged into one?
Or is there some range bound on an SSTable or HFile that splits it into several parts, to ensure that a merge operation doesn't rewrite the whole server's data?
My question relates to the "Compaction" section of this link: http://wiki.apache.org/cassandra/MemtableSSTable

From what I have found, the SSTable produced by a major compaction is not split in Cassandra. Other LSM-tree databases rely in this case on a distributed file system, which splits the SSTable (or HFile, or CellStore in Hypertable) into several blocks (for example 64 MB each), but a major compaction must still merge all of these files into one new SSTable (which I think is inefficient).
There are tickets in JIRA to improve and redesign compaction for Cassandra, for example:
https://issues.apache.org/jira/browse/CASSANDRA-1608
You may also want to read my second, similar question:
How much data per node in Cassandra cluster?

How to set TTL on Cassandra sstable

We are using Cassandra 3.10 with a 6-node cluster.
Lately, we noticed that our data volume has increased drastically, by approximately 4 GB per day on each node.
We want to implement a more aggressive retention policy, in which we change the compaction strategy to TWCS with a 1-hour window size and set a TTL of a few days; this can be achieved via the table properties.
Since the ETL should be a slow process in order to lighten Cassandra's workload, it is possible that it will not finish extracting all the data before the TTL expires. So I wanted to know: is there a way for the ETL process to set TTL=0 on an entire SSTable once it is done extracting it?
TTL=0 is read as a tombstone. When next compacted, it would be written as a tombstone or purged, depending on your gc_grace. Beyond the overhead of writing the tombstones, it might be easier to just issue deletes, or to create SSTables that contain the necessary tombstones, than to rewrite all the existing SSTables. Whether range or point tombstones are more efficient will depend on your version and schema.
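For illustration, assuming a hypothetical time-series table `ks.events` with partition key `sensor_id` and clustering column `ts`, the two tombstone shapes look like this:

```sql
-- Assumed schema (hypothetical, for illustration only):
-- CREATE TABLE ks.events (sensor_id int, ts timestamp, value double,
--                         PRIMARY KEY (sensor_id, ts));

-- Point tombstone: marks a single row as deleted
DELETE FROM ks.events WHERE sensor_id = 42 AND ts = '2018-06-01 10:00:00';

-- Range tombstone: marks a whole slice of clustering rows in one partition
DELETE FROM ks.events WHERE sensor_id = 42 AND ts < '2018-06-01';
```

A single range tombstone covering many rows is typically much cheaper to write and compact than thousands of point tombstones, which is why the answer depends on your version and schema.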
An option that might be easiest is to use a different compaction strategy altogether, or a custom one like https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy. You can then simply purge data that has already been processed during compactions. This still depends quite a bit on your schema and how hard it would be to mark what has been processed or not.
You can set the TTL at both the table and query level. Once the TTL expires, the data is converted to tombstones. Based on the gc_grace_seconds value, the next compaction will clear those tombstones. You may also run a major compaction to clear tombstones, but this is generally not recommended in Cassandra, depending on the compaction strategy; with STCS, at least 50% free disk is required to run a healthy compaction.
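The retention policy described in the question can be expressed as table properties; a sketch, assuming the hypothetical `ks.events` table (the window size, TTL, and gc_grace values are examples, not recommendations):

```sql
ALTER TABLE ks.events WITH
  compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1'
  }
  AND default_time_to_live = 259200   -- 3 days, applied when a write carries no TTL
  AND gc_grace_seconds = 10800;       -- how long tombstones are kept before purging
```

With TWCS plus a uniform TTL, whole SSTables become fully expired and can be dropped without being rewritten, which is the cheap path the question is after.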

sstableexpiredblockers: what to do having blocking SSTables in Cassandra?

I have realized that some SSTables are not dropped even though they contain only tombstones.
Using a manual major compaction, these SSTables are removed.
Perhaps unchecked_tombstone_compaction needs to be set to true, along with tuning gc_grace_seconds.
I have seen the sstableexpiredblockers utility, which reveals the blocking SSTables that prevent an SSTable from being dropped.
During compaction, Cassandra can drop entire SSTables if they contain
only expired tombstones and if it is guaranteed to not cover any data
in other SSTables. This diagnostic tool outputs all SSTables that are
blocking other SSTables from being dropped.
I do not understand:
... if it is guaranteed to not cover any data in other SSTables ...
Since my compaction strategy is TimeWindowCompactionStrategy, all SSTables cover different time intervals.
I would like to know how to handle those blocking SSTables.
Just looked at this (a bit late). You may want to check whether an NTP server pool was set up during installation; a missing pool is a common reason for clocks not being in sync.
Also, experience suggests that instead of relying on a VM, it is a good idea to use a physical machine for this purpose.
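If single-SSTable tombstone compactions are being skipped, the unchecked_tombstone_compaction subproperty mentioned in the question can be enabled alongside the strategy; a sketch with an assumed table name and example values:

```sql
-- Hypothetical table; tombstone_threshold and unchecked_tombstone_compaction
-- are generic compaction subproperties, shown here with TWCS
ALTER TABLE ks.events WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1',
  'tombstone_threshold': '0.2',              -- consider an SSTable at 20% tombstones
  'unchecked_tombstone_compaction': 'true'   -- skip the overlap pre-check before trying
};
```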

Cassandra - What is the difference between TTL at the table level and inserting data with a TTL

I have a Cassandra 2.1 cluster where we insert data through Java with a TTL, as the requirement is to persist the data for 30 days.
But this causes a problem: files with old data and tombstones are kept on disk, so disk space is occupied by data that is no longer required. Repairs take a lot of time to clear this data (up to 3 days on a single node).
Is there a better way to delete the data?
I have come across this on DataStax:
Cassandra allows you to set a default_time_to_live property for an entire table. Columns and rows marked with regular TTLs are processed as described above; but when a record exceeds the table-level TTL, Cassandra deletes it immediately, without tombstoning or compaction. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html?hl=tombstone
Will the data be deleted more efficiently if I set the TTL at the table level instead of setting it each time while inserting?
Also, the documentation is for Cassandra 3, so will I have to upgrade to a newer version to get any benefits?
Setting default_time_to_live applies the default TTL to all rows and columns in your table - and if no individual TTL is set (and Cassandra has correct NTP time on all nodes), Cassandra can easily drop that data safely.
But keep some things in mind: your application is still able to set a specific TTL for a single row in your table - then normal processing will apply. On top of that, even if the data is TTL'd, it won't get deleted immediately - SSTables are still immutable, and the tombstones will only be dropped during compaction.
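The two ways of setting a TTL can be sketched as follows (the table, column names, and values are hypothetical):

```sql
-- Table-level default: applies when the write itself carries no TTL
ALTER TABLE ks.metrics WITH default_time_to_live = 2592000;  -- 30 days

-- Per-write TTL: overrides the table default for this row only
INSERT INTO ks.metrics (id, value) VALUES (1, 42.0) USING TTL 86400;  -- 1 day

-- Inspect the remaining TTL of a column
SELECT id, value, TTL(value) FROM ks.metrics WHERE id = 1;
```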
What could really help you a lot - just guessing - would be an appropriate compaction strategy:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs-compaction
TimeWindowCompactionStrategy (TWCS)
Recommended for time series and expiring TTL workloads.
The TimeWindowCompactionStrategy (TWCS) is similar to DTCS with
simpler settings. TWCS groups SSTables using a series of time windows.
During compaction, TWCS applies STCS to uncompacted SSTables in the
most recent time window. At the end of a time window, TWCS compacts
all SSTables that fall into that time window into a single SSTable
based on the SSTable maximum timestamp. Once the major compaction for
a time window is completed, no further compaction of the data will
ever occur. The process starts over with the SSTables written in the
next time window.
This helps a lot - when you choose your time windows correctly. All data in the last compacted SSTable will have roughly equal TTL values (hint: don't do out-of-order inserts or manual TTLs!). Cassandra keeps the youngest (i.e., latest) expiration time in the SSTable metadata, and once that time has passed, Cassandra simply deletes the entire SSTable, as all of its data is now obsolete. No compaction needed.
How do you run your repair? Incremental? Full? Reaper? How big is your cluster in terms of nodes and data?
The quick answer is yes. The way it is implemented is by deleting the SSTable(s) directly from disk. Deleting an SSTable without the need to compact will clear up disk space faster. But you need to be sure that all the data in a specific SSTable is "older" than the globally configured TTL for the table.
This is the feature referred to in the paragraph you quoted. It was implemented in Cassandra 2.0, so it should be part of 2.1.

Getting a database for Cassandra or building one from scratch?

So, I'm new to Cassandra and I was wondering what the best approach would be to learn it.
Should I first focus on the design of a database and build one from scratch?
I have also read that Cassandra is great at writing. How can one observe that? Is there open-source data that one can use? (I didn't really know where to look.)
A good starting point for Cassandra is the free online courses from DataStax (an enterprise-grade Cassandra distribution): https://academy.datastax.com/courses
And for Cassandra being good at writing data - have a look here: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html
The write path comes down to these points:
write the data into the commit log (append-only and sequential, no random I/O - it should therefore be on its own disk to prevent head movements; with an SSD this is not an issue)
write the data into a memtable (kept in memory - very fast)
So in terms of disk, a write is in the first place a simple append to the commit log. No data is written directly to the SSTables (it lives in the commit log and memtable, which is flushed to disk from time to time as an SSTable). Updates do not change an SSTable on disk (SSTables are immutable; an update is written separately with a new timestamp), and a delete does not remove data from SSTables (again, SSTables are immutable - instead a tombstone is written).
All updates and deletes produce new entries in the memtable and SSTables. To remove deleted data, and to get rid of old versions of data from updates, the SSTables on disk are compacted from time to time into a new one.
Also read about the different compaction strategies (they can help you get good performance), the replication factor (how many copies of your data the cluster should keep), and consistency levels (how Cassandra determines when a write or read is successful; hint: ALL is almost always wrong, look at QUORUM).
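The last two concepts can be sketched in CQL/cqlsh (the keyspace name and data-center name are assumptions):

```sql
-- Replication factor: keep 3 copies of every row in data center 'dc1'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- Consistency level (a cqlsh session setting, not part of the schema):
-- QUORUM = a majority of replicas (2 of 3) must acknowledge each read/write
CONSISTENCY QUORUM;
```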

Cassandra Leveled Compaction vs TimeWindowCompactionStrategy

The idea behind TimeWindowCompactionStrategy is that each SSTable has records from only one particular time window, instead of records from different time windows getting mixed with each other.
Doesn't Leveled Compaction result in something similar? SSTables are compacted with other SSTables from the same level, which are all from the same time window (i.e., SSTables at higher levels are always older). This looks very similar to DateTieredCompactionStrategy, except that the SSTable size is determined by a max size in MB instead of a time window.
LCS groups SSTables by size in a multilevel structure, while TWCS produces same-interval SSTables (thus a single-level structure) and has a limit on the number of buckets, so tables using TWCS require a TTL on all rows.
You are correct about the difference between DTCS and LCS.
P.S. I recommend watching the slides from the presentation by the author of TWCS to get the reasoning behind it.
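For comparison, the two strategies are configured like this (the table name and values are illustrative, not recommendations):

```sql
-- LCS: fixed-size SSTables arranged in levels; grouping is by size/level,
-- not by time
ALTER TABLE ks.t WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': '160'
};

-- TWCS: one final SSTable per time window; intended for TTL'd time-series data
ALTER TABLE ks.t WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1'
};
```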
