Cassandra TTL and gc_grace_seconds

Using Cassandra 3.0:
If all of my columns have a (default or otherwise) TTL and I never delete a column, but overwrites happen maybe 2-3 times a day, can I set gc_grace_seconds = 0?
Note: The TTL of a column, even after being overwritten, always points to the same point in time, e.g. March 10, 2017.
Will I run into issues when a node goes down?
I know that if I delete a column and a node goes down and does not come back up before gc_grace_seconds, I will end up with a zombie column. "Logic" says this shouldn't be a problem unless the overwritten column has a different TTL.
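For concreteness, a minimal sketch of the kind of schema being described (keyspace, table, and TTL values are made up):
-- every write carries a TTL and columns are only ever overwritten, never deleted
CREATE TABLE IF NOT EXISTS myks.readings (sensor_id text, metric text, value double, PRIMARY KEY (sensor_id, metric))
    WITH default_time_to_live = 2592000  -- 30 days
    AND gc_grace_seconds = 0;            -- the setting in question
-- an overwrite keeps the same absolute expiry by shrinking the TTL to the seconds remaining until, e.g., March 10, 2017
INSERT INTO myks.readings (sensor_id, metric, value) VALUES ('s1', 'temp', 42.0) USING TTL 777600;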

Related

Cassandra is not deleting rows after TTL expires due to Boolean columns

I have a table where I set the TTL to 7 days, so I expect Cassandra to delete the rows after 7 days.
My table contains Boolean columns that are set to True ONLY when new rows are created. These columns are never updated afterwards, so their TTL is never refreshed with a new value (because Cassandra is a column-oriented database).
However, I noticed that after 7 days all the columns are set to NULL (as expected when they get deleted) except for the Boolean columns, which remain True; as a result, the rows are never deleted.
I checked the TTL value of all columns and they are NULL, which means the TTL has expired on ALL columns, including the Boolean ones.
When I MANUALLY set the Boolean columns to NULL (after the 7 days), the rows are removed immediately, as expected.
I cannot understand why Cassandra is not setting the Boolean columns to NULL after the TTL expires so that the rows are deleted automatically.
Does Cassandra handle Boolean columns and TTL values differently?
Working with:
Python 3.6 and
Cassandra 3.11
Solution: after running nodetool flush, the issue was fixed and the rows are now deleted when the TTL expires.
That's really weird, but it works and I can see the rows getting deleted immediately. I cannot find a reason why expired rows were not being deleted from the table even though they had been marked as expired (expires= true) in the sstables.
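As a side note, here is a hedged sketch of how to inspect per-cell TTLs while debugging something like this (table and column names are illustrative):
-- TTL(col) returns the remaining seconds for that cell, or NULL if the cell has no TTL
SELECT user_id, TTL(name), is_active, TTL(is_active) FROM myks.users WHERE user_id = 123;
-- nodetool flush writes the current memtables to sstables; once the expired cells are on
-- disk, compaction can purge them, which matches the behaviour reported above.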

Tombstone scanning in system.log

I have a Cassandra cluster with a low-delete use case. I found this in my system.log: "Read 10 live and 5645464 tombstones cells in keyspace.table". What does it mean? Please help me understand.
Thanks.
In Cassandra, all the information recorded is immutable. This means that when you have a delete operation (an explicit DELETE statement or a Time To Live [TTL] clause), the database adds another record with a special flag, called a tombstone. All these records stay in the database until the gc_grace_seconds period has passed; the default is 10 days.
In your case, the engine found that most of the records retrieved had been deleted, but they are still waiting for gc_grace_seconds to pass so that compaction can reclaim the space. One possible option to fix the issue is to decrease gc_grace_seconds for that table.
For more information, please refer to this article from the Last Pickle.
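If you do lower it, gc_grace_seconds is a per-table property; something like the following would apply it (the table name is taken from the log line, the value is illustrative):
-- reduce the tombstone grace period to 1 hour for this table only
ALTER TABLE keyspace.table WITH gc_grace_seconds = 3600;
Keep in mind that repairs then need to complete within the shorter window so deletes reach all replicas before tombstones are purged.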
One more important thing to keep in mind when working with Cassandra is that tombstone cells do not directly correlate to deletes.
When you insert a null value for an attribute, Cassandra internally marks that attribute/cell as a tombstone. So even if you don't have a lot of deletes happening, you can end up with an enormous number of tombstones. The easy and simple solution is not to insert null values for attributes.
As far as the statement "Read 10 live and 5645464 tombstones cells in keyspace.table" goes, my guess is that some query is doing a scan that reads 10 live cells and 5645464 tombstones (cells with null values) along the way. You would need to look at what type of queries are being executed to gain more insight into that.
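To illustrate the point about nulls (keyspace, table, and column names are made up):
-- explicitly writing NULL creates a tombstone cell for col_b
INSERT INTO ks.t (id, col_a, col_b) VALUES (1, 'x', null);
-- omitting the column writes no cell at all, and therefore no tombstone
INSERT INTO ks.t (id, col_a) VALUES (1, 'x');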

Too many tombstones in Cassandra

I have a table named 'holder' which has a single partition that receives 60K entries every hour.
I have another table named 'holderhistory' which has the 'date' as its partition key, so every day's records from the 'holder' table are copied to 'holderhistory'.
There is a job running in the application
i) which collects all the older entries in the holder table and copies them to the holderhistory table
ii) then deletes the older entries from the holder table
NOW the issue is that too many tombstones get created in the holder table.
By default the tombstones are cleared after gc_grace_seconds, which is 10 days (864000 seconds).
But I don't want to keep the tombstones for more than 3 hours, so:
1) Is it good to set gc_grace_seconds to 3 hours?
2) Or is it good to set default_time_to_live to 3 hours?
Which is the best solution for getting rid of the tombstones?
Also, what are the consequences of reducing gc_grace_seconds from 10 days to 3 hours? Where will it have an impact?
Any help is appreciated.
If you reduce the gc_grace_seconds parameter too low and the recovery time of any node is longer than gc_grace_seconds, then once one of these nodes comes back online, it will mistakenly think that all of the nodes that received the delete actually missed a write, and it will start repairing all of the other nodes. I would recommend using default_time_to_live and giving it a try.
To answer your particular case: as the table 'holder' contains only one partition, you can delete the whole partition with a single "delete by partition key" statement, effectively creating a single tombstone.
If you delete the partition once a day, you'll end up with 1 tombstone per day... that's quite acceptable.
1) with gc_grace_seconds equal to 3 hours, and if RF > 1, you are not guaranteed to recover consistently from a node failure lasting longer than 3 hours
2) with default_time_to_live equals 3 hours, each record will be deleted by creating a tombstone 3 hours after insertion
So you could keep default gc_grace_seconds set to 10 days, and take care to delete your daily records with something like DELETE FROM table WHERE PartitionKey = X
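A hedged sketch of that idea (the holder schema and partition key name are assumptions, since they are not shown in the question):
-- run once a day, after the rows have been copied to holderhistory:
-- one statement, one partition-level tombstone, instead of millions of row tombstones
DELETE FROM myks.holder WHERE partitionkey = 'X';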
EDIT: Answering your comment about hinted handoff...
Let's say RF = 3, gc_grace_seconds = 3h, and a node goes down. The two other replicas continue to receive mutations (insert, update, delete), but they can't replicate them to the offline node. In that case, hints are stored on disk temporarily, to be sent later if the dead node comes back.
But a hint expires after gc_grace_seconds, after which it will never be sent.
Now if you delete a row, it generates a tombstone in the sstables of the two replicas and a hint on the coordinator node. After 3 hours, the tombstones are removed from the online nodes by the compaction manager, and the hint expires.
Later, when your dead node comes back, it still has the row, and it can't know that the row has been deleted because no hint and no tombstone exist on the replicas anymore... thus it's a zombie row.
You might also find this support blog article useful:
https://academy.datastax.com/support-blog/cleaning-tombstones-datastax-dse-and-apache-cassandra

Cassandra TTL for table behaviour

Suppose I insert a column (data-1) at second 1 and another column (data-2) at second 2. The default TTL for the table is set to 10 seconds, for example.
Question 1: Are data-1 and data-2 both going to be deleted after 10 seconds, or will data-1 be deleted after 10 seconds and data-2 after 11 seconds (as it was inserted at second 2)?
Question 2: Is it possible to set a TTL at the table level in such a way that each entry in the table expires based on the TTL in a FIFO fashion (data-1 expires at second 10 and data-2 at second 11), without specifying a TTL for each data point while inserting? (It should be possible to specify this at the table level?)
Thanks for the help :)
EDIT:
the page at https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html says
Setting a TTL for a table
The CQL table definition supports the default_time_to_live property,
which applies a specific TTL to each column in the table. After the
default_time_to_live TTL value has been exceeded, Cassandra tombstones
the entire table. Apply this default TTL to a table in CQL using
CREATE TABLE or ALTER TABLE
they say "entire table" which confused me.
TTL at the table level is in no way different from TTL at the value level: it specifies the default TTL time for each row.
The TTL specifies after how many seconds the values must be considered outdated and thus deleted. The reference point is the INSERT/UPDATE timestamp, so if you insert/update a row at 09:53:01:
with a TTL of 10 seconds, it will expire at 09:53:11
with a TTL of 15 seconds, it will expire at 09:53:16
with a TTL of 0 seconds, it will never expire
You can override the default TTL by specifying a USING TTL X clause in your queries, where X is your new TTL value.
Please note that using TTL unwisely can cause tombstone problems. Note also that TTL usage has some quirks. Have a look at this recent answer for further details.
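A small sketch of both mechanisms (keyspace and table names are made up):
-- every cell written to this table gets a 10-second TTL by default
CREATE TABLE ks.events (id uuid PRIMARY KEY, payload text) WITH default_time_to_live = 10;
INSERT INTO ks.events (id, payload) VALUES (uuid(), 'data-1');               -- expires about 10 s after this insert
INSERT INTO ks.events (id, payload) VALUES (uuid(), 'data-2') USING TTL 15;  -- per-query override of the default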
Question 1 Ans: data-1 will be deleted after 10 seconds and data-2 after 11 seconds.
Question 2 Ans: Cassandra inserts every column with the table's TTL, so every column will expire at insertion time + TTL.
I read this topic and a lot of others, but I'm still confused because at https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useExpire.html
they say exactly this:
If any column exceeds TTL, the entire table is tombstoned.
What do they mean? I understand that there is no sense in tombstoning all the columns in a table when only one exceeded default_time_to_live, but they wrote exactly this!
UPD: I did several tests. default_time_to_live just means a default TTL at the column level. When this TTL expires, only the specific columns with the expired TTL are tombstoned.
They used a very strange sentence in that article.
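For what it's worth, one way to reproduce that observation (the schema is made up):
CREATE TABLE ks.t (id int PRIMARY KEY, a text, b text) WITH default_time_to_live = 60;
INSERT INTO ks.t (id, a) VALUES (1, 'first');  -- cell a expires 60 s from now
-- roughly 30 seconds later:
UPDATE ks.t SET b = 'second' WHERE id = 1;     -- cell b gets its own fresh 60 s TTL
SELECT TTL(a), TTL(b) FROM ks.t WHERE id = 1;  -- two different remaining TTLs; when a expires, only cell a is tombstoned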

Cassandra TTL on a row

I know that there are TTLs on columns in Cassandra. But is it also possible to set a TTL on a row? Setting a TTL on each column doesn't solve my problem as can be seen in the following usecase:
At some point a process wants to delete a complete row with a TTL (let's say row "A" with a TTL of 1 week). It could do this by replacing all existing columns with the same content but with a TTL of 1 week.
But there may be another process running concurrently on that row "A" which inserts new columns or replaces existing ones without a TTL, because that process can't know that the row is going to be deleted (it runs concurrently!). So after 1 week all columns of row "A" will be deleted because of the TTL, except for these newly inserted ones. And I also want them to be deleted.
So is there or will there be Cassandra support for this use case or do I have to implement something on my own?
Kind Regards
Stefan
There is no way of setting a TTL on a row in Cassandra currently. TTLs are designed for deleting individual columns when their lifetime is known when they are written.
You could achieve what you want by delaying your process: instead of inserting with a TTL of 1 week, run it a week later and delete the row. Row deletes have the following semantics: any column inserted just before the delete will get deleted, but columns inserted just after won't be.
If columns that are inserted in the future still need to be deleted, you could insert a row delete with a timestamp in the future to ensure this. But be very careful: if you later wanted to insert into that row, you couldn't; columns would just disappear when written to that row (until the tombstone is garbage collected).
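As a concrete illustration of the future-timestamp idea (key value and timestamp are made up; timestamps are in microseconds since the epoch, and the caveat above applies):
-- row-level delete whose write timestamp lies one week in the future
DELETE FROM ks.t USING TIMESTAMP 1489104000000000 WHERE key = 'A';
-- any write to row 'A' with a lower timestamp than this will be shadowed by the tombstone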
You can set a TTL for a row in Cassandra 3 using
INSERT INTO Counter(key,eventTime,value) VALUES ('1001',dateof(now()),100) USING ttl 10;
Although I do not recommend it, there is a Cassandra way to fix the problem:
SELECT TTL(value) FROM table WHERE ...;
Get the current TTL of a value first, then use the result to set the TTL in an INSERT or UPDATE:
INSERT ... USING TTL ttl-of-value;
So... I think that SELECT TTL() is slow (from experience with TTL() and WRITETIME() in some of my CQL commands). Not only that, the TTL is correct at the time the SELECT results are generated on the Cassandra node, but by the time the insert happens, it will be off. Cassandra should have offered a time to delete rather than a time to live...
So, as mentioned by Richard, having your own process delete the data after 1 week is probably safer. You should have one column that stores the date of creation or the date when the data becomes obsolete. Then a background process can read that date and, if the data is viewed as obsolete, drop the entire row.
Other processes can also use that date to know whether the row is still considered valid! (So even if it has not yet been deleted, you can treat the row as invalid once that date has passed.)
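A minimal sketch of that pattern, assuming a hypothetical schema:
-- keep an explicit expiry marker alongside the data
CREATE TABLE ks.items (key text PRIMARY KEY, value text, obsolete_after timestamp);
-- readers treat rows whose obsolete_after lies in the past as invalid;
-- a background job scans periodically and removes them for real:
DELETE FROM ks.items WHERE key = 'A';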
