Update TTL for entire row when doing CQL update statement - cassandra

Assume you have a row with 4 columns that was created with a TTL of 1 hour.
I need to occasionally update the date column of the row, and at the same time update the TTL of the entire row.
Assuming this doesn't work, what's the correct way to achieve this?
update mytable using ttl 3600
set accessed_on=?

Cassandra supports TTL per column only, which is a nice, flexible feature, but the ability to TTL a whole row has been requested many times.
Your only option is to update all columns on the row, thereby updating the TTL on all the columns.
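As a minimal sketch of the "update all columns" approach, the helper below builds a CQL UPDATE string that rewrites every non-key column under one USING TTL clause. The function name and the column names `id`, `name` and `email` are hypothetical, chosen only for illustration. The helper does not talk to Cassandra; it only produces the statement text.

```python
def build_ttl_refresh_update(table, key_columns, value_columns, ttl_seconds):
    """Return a CQL UPDATE string that rewrites every listed non-key column
    with a fresh TTL. All names passed in are supplied by the caller."""
    if not key_columns or not value_columns:
        raise ValueError("need at least one key column and one value column")
    assignments = ", ".join(f"{col} = ?" for col in value_columns)
    conditions = " AND ".join(f"{col} = ?" for col in key_columns)
    return (
        f"UPDATE {table} USING TTL {int(ttl_seconds)} "
        f"SET {assignments} WHERE {conditions}"
    )


# Hypothetical schema: primary key `id`, plus three regular columns.
statement = build_ttl_refresh_update(
    "mytable", ["id"], ["accessed_on", "name", "email"], 3600
)
print(statement)
```

The caller still has to bind the current values of the unchanged columns, which usually means reading the row first.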

Related

Cassandra TTL data not working

I have old data (from the last year) in Cassandra. I then altered the table to add a TTL of 30 days. Will the TTL (default_time_to_live = 2592000) delete my year-old data or not?
From documentation:
If the value is greater than zero, TTL is enabled for the entire table and an expiration timestamp is added to each column. A new TTL timestamp is calculated each time the data is updated and the row is removed after all the data expires.
So the TTL is applied only to data written or updated after the change; existing data is left untouched.
This description of how data is deleted would also be helpful.

cassandra TTL for table behaviour

Suppose I inserted a column at second-1 and another column at second-2. Default TTL for table is set to 10 seconds for example:
Question 1: Are data1 and data2 both going to be deleted after 10 seconds, or will data1 be deleted after 10 seconds and data2 after 11 seconds (as it was inserted at second 2)?
Question 2: Is it possible to set a TTL at a table level in such a way that each entry in the table will expire based on the TTL in a FIFO fashion ? (data-1 will expire at second-10 and data-2 at second-11), without specifying TTL while inserting for each data point? (Should be able to specify at a table level ?)
Thanks for the help :)
EDIT:
the page at https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html says
Setting a TTL for a table
The CQL table definition supports the default_time_to_live property,
which applies a specific TTL to each column in the table. After the
default_time_to_live TTL value has been exceed, Cassandra tombstones
the entire table. Apply this default TTL to a table in CQL using
CREATE TABLE or ALTER TABLE
they say "entire table" which confused me.
A table-level TTL is no different from a value-level TTL: it just specifies the default TTL for each write to the table.
The TTL specifies after how many seconds the values must be considered outdated and thus deleted. The reference point is the INSERT/UPDATE timestamp, so if you insert/update a row at 09:53:01:
with a TTL of 10 seconds, it will expire at 09:53:11
with a TTL of 15 seconds, it will expire at 09:53:16
with a TTL of 0 seconds, it will never expire
You can override the default TTL by adding a USING TTL X clause to your queries, where X is your new TTL value.
Please note that careless use of TTL can cause tombstone problems. Note also that TTL usage has some quirks. Have a look at this recent answer for further details.
Question 1 answer: data1 is deleted 10 seconds after it was written, and data2 one second later (10 seconds after its own write).
Question 2 answer: Cassandra inserts every column with the table's TTL, so every column expires at its insertion time + TTL.
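The arithmetic behind these answers can be sketched in a few lines of Python. This is a toy calculation, not Cassandra code: `expires_at` is a hypothetical helper, and the date is arbitrary (only the seconds matter).

```python
from datetime import datetime, timedelta


def expires_at(write_time, ttl_seconds):
    """Expiry is the write time plus the TTL; a TTL of 0 means never expires."""
    if ttl_seconds == 0:
        return None
    return write_time + timedelta(seconds=ttl_seconds)


DEFAULT_TTL = 10  # table-level default_time_to_live, in seconds

base = datetime(2024, 1, 1, 9, 53, 0)        # arbitrary reference time
data1_written = base + timedelta(seconds=1)  # inserted at "second 1"
data2_written = base + timedelta(seconds=2)  # inserted at "second 2"

data1_expiry = expires_at(data1_written, DEFAULT_TTL)
data2_expiry = expires_at(data2_written, DEFAULT_TTL)

# The two values expire one second apart, in insertion order.
print(data1_expiry.time(), data2_expiry.time())
```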
I read this topic and many others, but I'm still confused because at https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useExpire.html
they say exactly this:
If any column exceeds TTL, the entire table is tombstoned.
What do they mean? I understand that it makes no sense to tombstone all columns in a table when only one has exceeded default_time_to_live, but that is exactly what they wrote!
UPD: I ran several tests. default_time_to_live is just a default TTL at the column level. When this TTL expires, only the specific columns whose TTL has expired are tombstoned.
The wording in that article is very strange.

Cassandra add TTL to existing entries

How can I update an entire table and set a TTL for every entry?
Current Scenario (Cassandra 2.0.11):
table:
CREATE TABLE external_users (
external_id text,
type int,
user_id text,
PRIMARY KEY (external_id, type)
)
Currently there are ~40 million entries in this table and I want to add a TTL of, let's say, 86400 seconds (1 day).
It's no problem for new entries with USING TTL 86400, or to UPDATE current entries, but how do I apply a TTL to every already existing entry?
My idea was to select all data and update every single row with a little script. I was just wondering if there is an easier way to achieve this (because even with batch updates this is going to take a while and is a big effort).
Thanks in advance
There is no way to alter TTL of existing data in C*. TTL is just an internal column attribute which is written together with all other column data into immutable SSTable. A quote from the docs:
If you want to change the TTL of expiring data, you have to re-insert the data with a new TTL. In Cassandra, the insertion of data is actually an insertion or update operation, depending on whether or not a previous version of the data exists.

TTL field for a set of columns in CQL3 - Cassandra

Consider the following Insert statement.
INSERT INTO NerdMovies (movie, director, main_actor, year)
VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion', 2005)
USING TTL 86400;
Does the TTL field specify the time to live for the whole set of columns for a particular primary key, or just for one particular column? I want to specify a TTL for a whole set of columns that should get deleted after the TTL expires.
OK, I figured it out myself. It sets the TTL for the whole set of columns, so all the columns for a particular primary key will be deleted once the TTL expires.
@sayed-jalil
To be more precise, it will set TTL for the columns that you mentioned in the INSERT/UPDATE statement.
So for instance, if at time t you do
INSERT INTO NerdMovies (movie, director, main_actor, year)
VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion', 2005)
USING TTL 86400;
if you then do the following at time t + 10
UPDATE NerdMovies USING TTL 86400 SET year = 2004 WHERE movie = 'Serenity';
then columns movie, director, main_actor will expire at t + 86400 and column year will expire at t + 10 + 86400
Hope that makes sense.
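To make the per-column behaviour concrete, here is a toy Python model (not Cassandra code). The `Row` class is hypothetical and deliberately simplified: it tracks only regular columns and ignores how Cassandra handles the primary key itself.

```python
class Row:
    """Toy model of one CQL row: each column carries its own expiry time."""

    def __init__(self):
        self.expiry = {}  # column name -> absolute expiry time in seconds

    def write(self, columns, now, ttl):
        # Only the columns named in the statement get a new expiry.
        for col in columns:
            self.expiry[col] = now + ttl

    def live_columns(self, now):
        # A column is live until its expiry time is reached.
        return sorted(c for c, exp in self.expiry.items() if exp > now)


t = 0
row = Row()
# INSERT ... USING TTL 86400 at time t
row.write(["director", "main_actor", "year"], now=t, ttl=86400)
# UPDATE ... SET year = 2004 USING TTL 86400 at time t + 10
row.write(["year"], now=t + 10, ttl=86400)

print(row.expiry)
print(row.live_columns(t + 86405))  # only the refreshed column is still live
```

In this model, between t + 86400 and t + 86410 only `year` survives, which is the partial-row state the answer above describes.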

Cassandra ttl on a row

I know that there are TTLs on columns in Cassandra. But is it also possible to set a TTL on a row? Setting a TTL on each column doesn't solve my problem, as can be seen in the following use case:
At some point a process wants to delete a complete row with a TTL (let's say row "A" with TTL 1 week). It could do this by replacing all existing columns with the same content but with a TTL of 1 week.
But there may be another process running concurrently on that row "A" which inserts new columns or replaces existing ones without a TTL because that process can't know that the row is to be deleted (it runs concurrently!). So after 1 week all columns of row "A" will be deleted because of the TTL except for these newly inserted ones. And I also want them to be deleted.
So is there or will there be Cassandra support for this use case or do I have to implement something on my own?
Kind Regards
Stefan
There is no way of setting a TTL on a row in Cassandra currently. TTLs are designed for deleting individual columns when their lifetime is known when they are written.
You could achieve what you want by delaying your process - instead of wanting to insert a TTL of 1 week, run it a week later and delete the row. Row deletes have the following semantics: any column inserted just before will get deleted but columns inserted just after won't be.
If columns that are inserted in the future still need to be deleted you could insert a row delete with a timestamp in the future to ensure this but be very careful: if you later wanted to insert into that row you couldn't, columns would just disappear when written to that row (until the tombstone is garbage collected).
You can set ttl for a row in Cassandra 3 using
INSERT INTO Counter(key,eventTime,value) VALUES ('1001',dateof(now()),100) USING ttl 10;
Although I do not recommend it, there is a Cassandra way to fix the problem:
SELECT TTL(value) FROM table WHERE ...;
Get the current TTL of a value first, then use the result to set the TTL in an INSERT or UPDATE:
INSERT ... USING TTL ttl-of-value;
So... I think that the SELECT TTL() is slow (from experience with TTL() and WRITETIME() in some of my CQL commands). Not only that, the TTL is correct at the time the select results are generated on the Cassandra node, but by the time the insert happens, it will be off. Cassandra should have offered a time to delete rather than a time to live...
So as mentioned by Richard, having your own process to delete data after 1 week is probably safer. You should have one column to save the date of creation or the date when the data becomes obsolete. Then a background process can read that date and if the data is viewed as obsolete, drop the entire row.
Other processes can also use that date to know whether that row is considered valid or not! (So even if it has not yet been deleted, you can still treat the row as invalid once the date has passed.)
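A minimal sketch of that idea in Python follows, assuming each row is loaded as a dict with a hypothetical `obsolete_at` column holding the date after which the row is no longer valid. The names `key`, `obsolete_at`, `is_valid` and `keys_to_delete` are illustrative only, and the actual DELETE against Cassandra is left out.

```python
from datetime import datetime, timedelta, timezone


def is_valid(row, now):
    """A row counts as valid only until its obsolete_at date has passed."""
    return now < row["obsolete_at"]


def keys_to_delete(rows, now):
    """Keys of rows the background process should delete as whole rows."""
    return [row["key"] for row in rows if not is_valid(row, now)]


now = datetime(2024, 1, 8, tzinfo=timezone.utc)  # arbitrary "current" time
rows = [
    {"key": "A", "obsolete_at": now - timedelta(days=1)},  # already obsolete
    {"key": "B", "obsolete_at": now + timedelta(days=6)},  # still valid
]

print(keys_to_delete(rows, now))
```

Readers can call `is_valid` before trusting a row, so an obsolete row is ignored even when the background process has not deleted it yet.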

Resources