Impact of not running Cassandra repair on time series data - cassandra

Data we store in Cassandra is pure time series with no manual deletes. Data gets deleted only by TTL.
For such use cases, is repair really needed? What is the impact of not running repair?

Tombstoned data is actually deleted only after gc_grace_seconds plus a compaction. If a table with tombstoned data is never compacted, you will be stuck with that data, and it will cause performance degradation.
If you don't run repair within the gc_grace period, dead data can come back to life. Here's the DataStax article on this (and on why you need to run repairs regularly):
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html
EDIT:
TTLed data isn't tombstoned at the moment it expires, but only when a compaction process runs (at least in 3.9). You will not see expired data in query results, even while there are no tombstones yet.
So, if there is a problem with a node and TTLed data doesn't get its tombstone on one compaction, it will get one on the next compaction, or will simply be deleted. Given this, and the fact that your data is NEVER deleted but only expires, and you don't have any overwrites to the same key, you don't have to run repairs for data consistency.
Still, regarding all of the above, I would recommend running repairs once in a while (with a much longer interval between them), in case something was accidentally written without going through your normal write path.
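To make the scenario this answer assumes concrete, here is a minimal CQL sketch of an append-only time-series table where data is removed only by TTL; the keyspace, table, and column names are hypothetical:

    -- Hypothetical append-only time-series table: no DELETEs, no overwrites.
    -- Rows expire via the table-level TTL, so repair becomes an
    -- anti-entropy safety net rather than a consistency requirement.
    CREATE TABLE IF NOT EXISTS metrics.readings (
        sensor_id uuid,
        ts        timestamp,
        value     double,
        PRIMARY KEY (sensor_id, ts)
    ) WITH default_time_to_live = 2592000  -- 30 days, in seconds
      AND gc_grace_seconds = 864000;       -- the 10-day default

Under these assumptions, an occasional repair (as recommended above) only guards against writes that bypassed the normal write path.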

If you set a TTL, Cassandra will mark the data with a tombstone after the time is exceeded. If you don't run repair regularly, a huge number of tombstones will build up and will affect Cassandra's performance.
After the number of seconds since the column's creation exceeds the TTL value, TTL data is considered expired and is no longer included in results. Expired data is marked with a tombstone on the next read on the read path, but it remains for a maximum of gc_grace_seconds. After this amount of time, the tombstoned data is automatically removed during the normal compaction and repair processes.
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html
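As an illustration of that lifecycle, here is a hedged CQL sketch (hypothetical keyspace, table, and column names) showing a TTL set at write time and inspected before expiry:

    -- Hypothetical example: write a row that expires after 60 seconds.
    INSERT INTO ks.events (id, payload) VALUES (uuid(), 'hello')
    USING TTL 60;

    -- TTL() returns the remaining seconds to live for a column.
    SELECT payload, TTL(payload) FROM ks.events;

    -- After 60 seconds the row disappears from results, even though the
    -- tombstone itself is only written later, on the read/compaction path.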

Related

How to set TTL on Cassandra sstable

We are using Cassandra 3.10 with a 6-node cluster.
Lately, we noticed that our data volume has increased drastically, by approximately 4GB per day on each node.
We want to implement a more aggressive retention policy in which we change the compaction to TWCS with a 1-hour window size and set a TTL of a few days; this can be achieved via the table properties.
Since the ETL should be a slow process in order to lighten the Cassandra workload, it is possible that it will not finish extracting all the data before the TTL expires, so I wanted to know: is there a way for the ETL process to set TTL=0 on an entire SSTable once it is done extracting it?
TTL=0 is read as a tombstone. When next compacted, it would be written out as a tombstone or purged, depending on your gc_grace. Given the overhead of writing all those tombstones, it might be easier just to issue deletes, or to create SSTables that contain the necessary tombstones, than to rewrite all the existing SSTables. Whether range or point tombstones are more efficient will depend on your version and schema.
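For instance, on Cassandra 3.x a single range tombstone can cover a whole processed time slice. A hedged sketch, reusing the hypothetical metrics.readings table with PRIMARY KEY (sensor_id, ts):

    -- One range delete writes a single range tombstone covering every
    -- row in the partition older than the cutoff - far cheaper than
    -- rewriting the SSTables that hold those rows.
    DELETE FROM metrics.readings
    WHERE sensor_id = 123e4567-e89b-12d3-a456-426655440000
      AND ts < '2018-01-01 00:00:00+0000';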
An option that might be easiest is to use a different compaction strategy altogether, or a custom one like https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy. You can then just purge the data on compaction once it has been processed. How hard it would be to mark what has and hasn't been processed still depends quite a bit on your schema.
You should set TTL=0 at the table and query level as well. Once the TTL expires, the data will be converted to tombstones. Based on the gc_grace_seconds value, the next compaction will clear all the tombstones. You may also run a major compaction to clear tombstones, but this is not generally recommended in Cassandra and depends on the compaction strategy; with STCS, at least 50% free disk space is required to run a healthy major compaction.

Tombstone in Cassandra

I have a Cassandra table with a TTL of 60 seconds, and I have a few questions about this:
1) I am getting the following warning
Read 76 live rows and 1324 tombstone cells for query SELECT * FROM xx.yy WHERE token(y) >= token(fc872571-1253-45a1-ada3-d6f5a96668e8) LIMIT 100 (see tombstone_warn_threshold)
What does this mean?
2) As per my reading, a tombstone is a deletion marker set in the case of a TTL (and removed after gc_grace_seconds)
i) So does that mean the data won't be deleted for 10 days?
ii) What are the consequences of it waiting 10 days?
iii) Why is the default such a long time - 10 days?
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tabProp.html
gc_grace_seconds 864000 [10 days] The number of seconds after data is marked with a tombstone (deletion marker) before it is eligible for garbage-collection. Cassandra will not execute hints or batched mutations on a tombstoned record within its gc_grace_period. The default value allows a great deal of time for Cassandra to maximize consistency prior to deletion. For details about decreasing this value, see garbage collection below.
3) I read that performing compaction and repair using nodetool will delete the tombstones. How frequently do we need to run these in the background, and what are the consequences?
This means that your query returned 76 "live" or non-deleted/non-obsoleted rows of data, and that it had to sift through 1324 tombstones (deletion markers) to accomplish that.
In the world of distributed databases, deletes are hard. After all, if you delete a piece of data from one node, and you expect that deletion to happen on all of your nodes, how would you know if it worked? Quite literally, how do you replicate nothing? Tombstones (delete markers) are the answer to that question.
i. The data is gone (obsoleted, rather). The tombstone(s) will remain for gc_grace_seconds.
ii. The "consequence" is that you'll have to put up with those tombstone warning messages for that duration, or find a way to run your query without having to scan over the tombstones.
iii. The idea behind the 10 days is that if the tombstones are collected too early, your deleted data will "ghost" its way back onto some nodes. 10 days gives you enough time to run a weekly repair, which ensures your tombstones are properly replicated before removal.
Compaction removes tombstones. Repair replicates them. You should run repair once per week. While you can run compaction on-demand, don't. Cassandra has its own thresholds (based on number and size of SSTable files) to figure out when to run compaction, and it's best not to get in its way. If you do, you'll be manually running compaction from there on out, as you'll probably never reach the compaction conditions organically.
The consequences, are that both repair and compaction take compute resources, and can reduce a node's ability to serve requests. But they need to happen. You want them to happen. If compaction doesn't run, your SSTable files will grow in number and size; eventually causing rows to exist over multiple files, and queries for them will get slow. If repair doesn't run, your data is at risk of not being in-sync.
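To illustrate point (ii): the warning in the question comes from a token-range scan that reads across partitions. A hedged sketch of a more targeted alternative, assuming y is the partition key of xx.yy, as the token(y) clause suggests:

    -- Instead of a token-range scan such as
    --   SELECT * FROM xx.yy WHERE token(y) >= token(...) LIMIT 100;
    -- restrict the query to one partition, so only that partition's
    -- tombstones have to be sifted through:
    SELECT * FROM xx.yy
    WHERE y = fc872571-1253-45a1-ada3-d6f5a96668e8
    LIMIT 100;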

Cassandra - What is the difference between TTL at the table level and inserting data with TTL

I have a Cassandra 2.1 cluster where we insert data through Java with a TTL, as the requirement is to persist the data for 30 days.
But this causes a problem: the files holding old, tombstoned data are kept on disk. This results in disk space being occupied by data that is no longer required. Repairs take a lot of time to clear this data (up to 3 days on a single node).
Is there a better way to delete the data?
I have come across this on DataStax:
Cassandra allows you to set a default_time_to_live property for an entire table. Columns and rows marked with regular TTLs are processed as described above; but when a record exceeds the table-level TTL, Cassandra deletes it immediately, without tombstoning or compaction. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html?hl=tombstone
Will the data be deleted more efficiently if I set the TTL at the table level instead of setting it on each insert?
Also, the documentation is for Cassandra 3, so will I have to upgrade to a newer version to get any of these benefits?
Setting default_time_to_live applies the default TTL to all rows and columns in your table - and if no individual TTL is set (and Cassandra has correct NTP time on all nodes), Cassandra can easily drop that data safely.
But keep some things in mind: your application is still able to set a specific TTL for a single row in your table - then normal tombstone processing will apply. On top of that, even if the data is TTLed, it won't get deleted immediately - SSTables are still immutable - but the tombstones will be dropped during compaction.
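A minimal CQL sketch of the difference (hypothetical keyspace and table names); a per-insert TTL overrides the table-level default:

    -- Table-level default: every write expires after 30 days unless
    -- the write itself specifies a TTL.
    ALTER TABLE ks.events WITH default_time_to_live = 2592000;

    -- This row ignores the default and expires after 1 hour instead,
    -- going through the normal tombstone/compaction path.
    INSERT INTO ks.events (id, payload) VALUES (uuid(), 'short-lived')
    USING TTL 3600;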
What could really help you a lot here - just guessing - would be an appropriate compaction strategy:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs-compaction
TimeWindowCompactionStrategy (TWCS)
Recommended for time series and expiring TTL workloads.
The TimeWindowCompactionStrategy (TWCS) is similar to DTCS with simpler settings. TWCS groups SSTables using a series of time windows. During compaction, TWCS applies STCS to uncompacted SSTables in the most recent time window. At the end of a time window, TWCS compacts all SSTables that fall into that time window into a single SSTable based on the SSTable maximum timestamp. Once the major compaction for a time window is completed, no further compaction of the data will ever occur. The process starts over with the SSTables written in the next time window.
This helps a lot - when you choose your time windows correctly. All data in the last compacted SSTable will have roughly equal TTL values (hint: don't do out-of-order inserts or manual per-write TTLs!). Cassandra keeps the latest expiration time in the SSTable metadata, and once that time has passed, Cassandra simply deletes the entire SSTable, as all the data in it is obsolete by then. No compaction needed.
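A hedged configuration sketch (hypothetical keyspace and table; window sizing should match your ingest pattern, e.g. the 1-hour window mentioned in the earlier question). Note that TWCS ships with Cassandra 3.0.8+/3.8+; for older versions it was available as an external strategy:

    -- Switch a time-series table to TWCS with 1-hour windows and a
    -- 30-day expiry; fully expired SSTables can then be dropped whole.
    ALTER TABLE ks.events
    WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': '1'
    }
    AND default_time_to_live = 2592000;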
How do you run your repair - incremental, full, Reaper? How big is your cluster in terms of nodes and data?
The quick answer is yes. The way it is implemented is by deleting the SSTable(s) directly from disk. Deleting an SSTable without the need to compact will clear up disk space faster. But you need to be sure that all the data in a specific SSTable is "older" than the globally configured TTL for the table.
This is the feature referred to in the paragraph you quoted. It was implemented in Cassandra 2.0, so it should be part of 2.1.

Cassandra Tombstoning warning and failure thresholds breached

We are running a Titan graph DB server backed by Cassandra as a persistent store, and we are running into an issue with reaching the limit on Cassandra tombstone thresholds, causing our queries to periodically fail / time out as data accumulates. It seems like compaction is unable to keep up with the number of tombstones being added.
Our use case supports:
High read / write throughputs.
High sensitivity to reads.
Frequent updates to node values in Titan, causing rows to be updated in Cassandra.
Given the above use cases, we are already optimizing Cassandra to aggressively do the following:
Aggressive compaction by using the leveled compaction strategy (LCS)
Setting tombstone_compaction_interval to 60 seconds
Setting tombstone_threshold to 0.01
Setting gc_grace_seconds to 1800
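Expressed as table properties, this tuning would look roughly like the following hedged sketch (the keyspace name is hypothetical; the table name is taken from the log line below):

    -- The aggressive tombstone-purging settings listed above.
    ALTER TABLE titan.graphindex
    WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'tombstone_compaction_interval': '60',
        'tombstone_threshold': '0.01'
    }
    AND gc_grace_seconds = 1800;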
Despite the above optimizations, we are still seeing warnings in the Cassandra logs similar to:
[WARN] (ReadStage:7510) org.apache.cassandra.db.filter.SliceQueryFilter: Read 0 live and 10350 tombstoned cells in .graphindex (see tombstone_warn_threshold). 8001 columns was requested, slices=[00-ff], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
Occasionally, as time progresses, we also see the failure threshold breached, which causes errors.
Our cassandra.yaml has tombstone_warn_threshold set to 10000 and tombstone_failure_threshold set much higher than recommended, at 250000, with no real noticeable benefit.
Any help that can point us to the correct configurations would be greatly appreciated if there is room for further optimizations. Thanks in advance for your time and help.
Sounds like the root of your problem is your data model. You've done everything you can to mitigate the TombstoneOverwhelmingException. Since your data model requires such frequent updates, which cause tombstone creation, an eventually consistent store like Cassandra may not be a good fit for your use case. When we've experienced these types of issues, we had to change our data model to fit better with Cassandra's strengths.
About deletes http://www.slideshare.net/planetcassandra/8-axel-liljencrantz-23204252 (slides 34-39)
Tombstones are not compacted away until the gc_grace_seconds configured on a table has elapsed for a given tombstone. So even with a more aggressive compaction interval, your tombstones will not be removed until gc_grace_seconds has elapsed, the default being 10 days. You could try tuning gc_grace_seconds down to a lower value and doing repairs more frequently (usually you want to schedule repairs to happen every gc_grace_seconds_in_days - 1 days).
So everyone here is right. If you repair and compact frequently, you can reduce your gc_grace_seconds number.
It may, however, also be worth considering that inserting nulls is equivalent to a delete, which will increase your number of tombstones. Instead, you'll want to insert UNSET_VALUE if you're using prepared statements. Probably too late for you, but it may help anyone else who comes across this.
The variables you've tuned are helping you expire tombstones, but it's worth noting that while tombstones cannot be purged before gc_grace_seconds, Cassandra makes no guarantee that tombstones WILL be purged at gc_grace_seconds. Indeed, tombstones are not compacted away until the SSTable containing the tombstone is compacted, and even then, a tombstone will not be eliminated if another SSTable contains a cell that it shadows.
This means tombstones can potentially persist for a very long time, especially if you're using SSTables that are infrequently compacted (say, very large STCS SSTables). To address this, tools exist such as the JMX endpoint forceUserDefinedCompaction - and if you're not adept at using JMX endpoints, there are tools that do this for you automatically, such as http://www.encql.com/purge-cassandra-tombstones/

Retrieving "tombstoned" records in Cassandra

My question is very simple: is it in any way possible to retrieve columns that have been marked with a tombstone before the GCGraceSeconds period expires (default 10 days)? If yes, what would be the exact CQL query for that?
As I understand the deletion process, tombstones are written to the memtables, while the immutable SSTables, waiting for compaction, still hold the deleted data. So before compaction occurs, is there any way to read the tombstoned data from either the memtable or the SSTables?
Using CQL 3.0 on the cqlsh command prompt with Cassandra 2.0.
You are right: when a tombstone is inserted, it usually doesn't immediately delete the underlying data (unless all your data is in a memtable). However, you can't control when it does. If you don't have much data and compaction happens quickly, the underlying data may be deleted very quickly, much sooner than 10 days.
There is no query to read deleted data, but you can inspect all your SSTables with sstable2json to see if they contain the deleted data.
Just to add on to the previous answer: keep a low value of gc_grace_seconds for the column families that have frequent deletions. It will take some time for GC, but the tombstones are expected to get cleared.
