We are using Cassandra 3.10 with 6 nodes cluster.
lately, we noticed that our data volume increased drastically, approximately 4GB per day in each node.
We want to implement a more aggressive retention policy in which we will change the compaction to TWCS with 1-hour window size and set a few days TTL, this can be achieved via the table properties.
Since the ETL should be a slow process in order to lighten Cassandra workload it possible that it will not finish extracting all the data until the TTL, so I wanted to know is there a way for the ETL process to set TTL=0 on entire SSTable once it done extracting it?
TTL=0 is read as a tombstone. When next compacted it would be written tombstone or purged depending on your gc_grace. Other than the overhead of doing the writes of the tombstone it might be easier just to do a delete or create sstables that contain the necessary tombstones than to rewrite all the existing sstables. If its more efficient to do range or point tombstones will depend on your version and schema.
An option that might be easiest is to actually use a different compaction strategy all together or a custom one like https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy. You can then just purge data on compactions that have been processed. This still depends quite a bit on your schema on how hard it would be to mark whats been processed or not.
You should set TTL 0 on table and query level as well. Once TTL expire data will converted to tombstones. Based on gc_grace_seconds value next compaction will clear all the tombstones. you may run major compaction also to clear tombstones but it is not recommended in cassandra based on compaction strategy. if STCS atleast 50% disk required to run healthy compaction.
Related
I'm using LCS and a relatively large TTL of 2 years for all inserted rows and I'm concerned about the moment at which C* would drop the corresponding tombstones (neither explicit deletes nor updates are being performed).
From Missing Manual for Leveled Compaction Strategy, Tombstone Compactions in Cassandra and Deletes Without Tombstones or TTLs I understand that
All levels except L0 contain non-overlapping SSTables, but a partition key may be present in one SSTable in each level (aka distributed in all levels).
For a compaction to be able to drop a tombstone it must be sure that is compacting all SStables that contains de data to prevent zombie data (this is done checking bloom filters). It also considers gc_grace_seconds
So, for my particular use case (2 years TTL and write heavy load) I can conclude that TTLed data will be in highest levels so I'm wondering when those SSTables with TTLed data will be compacted with the SSTables that contains the corresponding SSTables.
The main question will be: Where are tombstones (from ttls) being created? Are being created at Level 0 so it will take a long time until it will end up in the highest levels (hence disk space will take long time to be freed)?
In a comment from About deletes and tombstones Alain says that
Yet using TTLs helps, it reduces the chances of having data being fragmented between SSTables that will not be compacted together any time soon. Using any compaction strategy, if the delete comes relatively late in the row history, as it use to happen, the 'upsert'/'insert' of the tombstone will go to a new SSTable. It might take time for this tombstone to get to the right compaction "bucket" (with the rest of the row) and for Cassandra to be able to finally free space.
My understanding is that with TTLs the tombstones is created in-place, thus it is often and for many reasons easier and safer to get rid of a TTLs than from a delete.
Another clue to explore would be to use the TTL as a default value if that's a good fit. TTLs set at the table level with 'default_time_to_live' should not generate any tombstone at all in C*3.0+. Not tested on my hand, but I read about this.
I'm not sure what it means with "in-place" since SSTables are immutable.
(I also have some doubts about what it says of using default_time_to_live that I've asked in How default_time_to_live would delete rows without tombstones in Cassandra?).
My guess is that is referring to tombstones being created in the same level (but different SStables) that the TTLed data during a compaction triggered by one of the following reasons:
"Going from highest level, any level having score higher than 1.001 can be picked by a compaction thread" The Missing Manual for Leveled Compaction Strategy
"If we go 25 rounds without compacting in the highest level, we start bringing in sstables from that level into lower level compactions" The Missing Manual for Leveled Compaction Strategy
"When there are no other compactions to do, we trigger a single-sstable compaction if there is more than X% droppable tombstones in the sstable." CASSANDRA-7019
Since tombstones are created during compaction, I think it may be using SSTable metadata to estimate droppable tombstones.
So, compactions (2) and (3) should be creating/dropping tombstones in highest levels hence using LCS with a large TTL should not be an issue per se.
With creating/dropping I mean that the same kind of compactions will be creating tombstones for expired data and/or dropping tombstones if the gc period has already passed.
A link to source code that clarifies this situation will be great, thanks.
Alain Rodriguez's answer from mailing list
Another clue to explore would be to use the TTL as a default value if
that's a good fit. TTLs set at the table level with 'default_time_to_live'
should not generate any tombstone at all in C*3.0+. Not tested on my hand,
but I read about this.
As explained on a parallel thread, this is wrong, mea culpa. I believe the rest of my comment still stands (hopefully :)).
I'm not sure what it means with "in-place" since SSTables are immutable.
My guess is that is referring to tombstones being created in the same
Yes, I believe during the next compaction following the expiration date,
the entry is 'transformed' into a tombstone, and lives in the SSTable that
is the result of the compaction, on the level/bucket this SSTable is put
into. That's why I said 'in-place' which is indeed a bit weird for
immutable data.
As a side idea for your problem, on 'modern' versions of Cassandra (I don't
remember the version, that's what 'modern' means ;-)), you can run
'nodetool garbagecollect' regularly (not necessarily frequently) during the
off-peak period. That might use the cluster resources when you don't need
them to claim some disk space. Also making sure that a 2 years old record
is not being updated regularly by design would definitely help. In the
extreme case of writing a data once (never updated) and with a TTL for
example, I see no reason for a 2 years old data not to be evicted
correctly. As long as the disk can grow, it should be fine.
I would not be too much scared about it, as there is 'always' a way to
remove tombstones. Yet it's good to think about the design beforehand
indeed, generally, it's good if you can rotate the partitions over time,
not to reuse old partitions for example.
I have a Cassandra 2.1 cluster where we insert data though Java with TTL as the requirement of persisting the data is 30 days.
But this causes problem as the files with old data with tombstones is kept on the disk. This results in disk space being occupied by data which is not required. Repairs take a lot of time to clear this data (upto 3 days on a single node)
Is there a better way to delete the data?
I have come across this on datastax
Cassandra allows you to set a default_time_to_live property for an entire table. Columns and rows marked with regular TTLs are processed as described above; but when a record exceeds the table-level TTL, Cassandra deletes it immediately, without tombstoning or compaction. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html?hl=tombstone
Will the data be deleted more efficiently if I set TTL at table level instead of setting each time while inserting.
Also, documentation is for Cassandra 3, so will I have to upgrade to newer version to get any benefits?
Setting default_time_to_live applies the default ttl to all rows and columns in your table - and if no individual ttl is set (and cassandra has correct ntp time on all nodes), cassandra can easily drop those data safely.
But keep some things in mind: your application is still able so set a specific ttl for a single row in your table - then normal processing will apply. On top, even if the data is ttled it won't get deleted immediately - sstables are still immutable, but tombstones will be dropped during compaction.
What could help you really a lot - just guessing - would be an appropriate compaction strategy:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs-compaction
TimeWindowCompactionStrategy (TWCS)
Recommended for time series and expiring TTL workloads.
The TimeWindowCompactionStrategy (TWCS) is similar to DTCS with
simpler settings. TWCS groups SSTables using a series of time windows.
During compaction, TWCS applies STCS to uncompacted SSTables in the
most recent time window. At the end of a time window, TWCS compacts
all SSTables that fall into that time window into a single SSTable
based on the SSTable maximum timestamp. Once the major compaction for
a time window is completed, no further compaction of the data will
ever occur. The process starts over with the SSTables written in the
next time window.
This help a lot - when choosing your time windows correctly. All data in the last compacted sstable will have roughly equal ttl values (hint: don't do out-of-order inserts or manual ttls!). Cassandra keeps the youngest ttl value in the sstable metadata and when that time has passed cassandra simply deletes the entire table as all data is now obsolete. No need for compaction.
How do you run your repair? Incremental? Full? Reaper? How big in terms of nodes and data is your cluster?
The quick answer is yes. The way it is implemented is by deleting the SStable/s directly from disk. Deleting an SStable without the need to compact will clear up disk space faster. But you need to be sure that the all the data in a specific sstable is "older" than the globally configured TTL for the table.
This is the feature referred to in the paragraph you quoted. It was implemented for Cassandra 2.0 so it should be part of 2.1
I am using awesome Cassandra DB (3.7.0) and I have questions about tombstone.
I have table called raw_data. This table has default TTL as 1 hour. This table gets new data every second. Then another processor reads one row and remove the row.
It seems like this raw_data table becomes slow at reading and writing after several days of running.
Is this because of deleted rows are staying as tombstone? This table already has TTL as 1 hour. Should I set gc_grace_period to something less than 10 days (default value) to remove tombstones quickly? (By the way, I am single-node DB)
Thank you in advance.
Deleting your data is the way to have tombstone problems. TTL is the other way.
It is pretty normal for a Cassandra cluster to become slower and slower after each delete, and your cluster will eventually refuse to read data from this table.
Setting gc_grace_period to less than the default 10 days is only one part of the equation. The other part is the compaction strategy you use. Indeed, in order to remove tombstones a compaction is needed.
I'd change my mind about my single-node cluster and I'd go with the minimum standard 3 nodes with RF=3. Then I'd design my project around something that doesn't explicitly delete data. If you absolutely need to delete data, make sure that C* runs compaction periodically and removes tombstones (or force C* to run compactions), and make sure to have plenty of IOPS, because compaction is very IO intensive.
In short Tombstones are used to Cassandra to mark the data is deleted, and replicate the same to other nodes so the deleted data doesn't re-appear. These tombstone will be stored in Cassandra till the gc_grace_period. Creating more tobestones might slow down your table. As you are using a single node Cassandra you don't have to replicate anything in other nodes, hence you can update your gc grace seconds to 1 day, which will not affect. In future if you are planning to add new nodes and data centers change this gc grace seconds.
I've got an append only Cassandra table where every entry has a ttl. Using SizeTieredCompaction the table seems to grow unbounded. Is there a way to ensure that sstables are checked for tombstoned columns more often?
Tombstones will be effectively deleted after the grace period (search for gc_grace_seconds) expires and a compaction occurs. You should check your YAML configuration for the parameter value (default is 864000 seconds, 10 days) and change it to something suitable for your needs. Beware that lowering this value has its own drawbacks e.g. row "resurrecting" if a node is down for more than the grace period.
Check the datastax documentation about deletes in Cassandra.
If you have high loads of input data and do not modify it, you may better use DateTieredCompactionStrategy as it will be more efficient processing and removing the tombstones.
We are running a Titan Graph DB server backed by Cassandra as a persistent store and are running into an issue with reaching the limit on Cassandra tombstone thresholds that is causing our queries to fail / timeout periodically as data accumulates. It seems like the compaction is unable to keep up with the number of tombstones being added.
Our use case supports:
High read / write throughputs.
High sensitivity to reads.
Frequent updates to node values in Titan. causing rows to be updated in Cassandra.
Given the above use cases, we are already optimizing Cassandra to aggressively do the following:
Aggressive compaction by using the levelled compaction strategies
Using tombstone_compaction_interval as 60 seconds.
Using tombstone_threshold to be 0.01
Setting gc_grace_seconds to be 1800
Despite the following optimizations, we are still seeing warnings in the Cassandra logs similar to:
[WARN] (ReadStage:7510) org.apache.cassandra.db.filter.SliceQueryFilter: Read 0 live and 10350 tombstoned cells in .graphindex (see tombstone_warn_threshold). 8001 columns was requested, slices=[00-ff], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
Occasionally as time progresses, we also see the failure threshold breached and causes errors.
Our cassandra.yaml file has the tombstone_warn_threshold to be 10000, and the tombstone_failure_threshold to be much higher than recommended at 250000, with no real noticeable benefits.
Any help that can point us to the correct configurations would be greatly appreciated if there is room for further optimizations. Thanks in advance for your time and help.
Sounds like the root of your problem is your data model. You've done everything you can do to mitigate getting TombstoneOverwhelmingException. Since your data model requires such frequent updates causing tombstone creation a eventual consistent store like Cassandra may not be a good fit for your use case. When we've experience these types of issues we had to change our data model to fit better with Cassandra strengths.
About deletes http://www.slideshare.net/planetcassandra/8-axel-liljencrantz-23204252 (slides 34-39)
Tombstones are not compacted away until the gc_grace_seconds configuration on a table has elapsed for a given tombstone. So even increasing your compaction interval your tombstones will not be removed until gc_grace_seconds has elapsed, with the default being 10 days. You could try tuning gc_grace_seconds down to a lower value and do repairs more frequently (usually you want to schedule repairs to happen every gc_grace_seconds_in_days - 1 days).
So everyone here is right. If you repair and compact frequently you an reduce your gc_grace_seconds number.
It may also however be worth considering that Inserting Nulls is equivalent to a delete. This will increase your number of tombstones. Instead you'll want to insert the UNSET_VALUE if you're using prepared statements. Probably too late for you, but if anyone else comes here.
The variables you've tuned are helping you expire tombstones, but it's worth noting that while tombstones can not be purged until gc_grace_seconds, Cassandra makes no guarantees that tombstones WILL be purged at gc_grace_seconds. Indeed, tombstones are not compacted until the sstable containing the tombstone is compacted, and even then, it will not be eliminated if there is another sstable containing a cell that is shadowed.
This results in tombstones potentially persisting a very long time, especially if you're using sstables that are infrequently compacted (say, very large STCS sstables). To address this, tools exist such as the JMX endpoint to forceUserDefinedCompaction - if you're not adept at using JMX endpoints, tools to do this for you automatically exist such as http://www.encql.com/purge-cassandra-tombstones/