Moving a record after TTL expiry - Cassandra

I have two tables, a normal table and its archived version. The rows in the normal table need to be moved to the archived version after the TTL expires on the row. How can I accomplish this?
Is there a native trigger feature in Cassandra that I can use to move the record over to the archive table?
I know how to do this in code, but a batch process, or even an event-driven process to move the data, feels unnecessarily complex.

Short answer: no, there is no way to achieve this without writing code for it.
Once the TTL expires and the record is read after that point, the record is marked as a tombstone, and once the GC grace period has passed it is removed from disk during compaction. You have no control over these operations/events, so there is no way, triggers included, to instruct Cassandra to insert the row into some other table.
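If you control the write path, one workaround worth considering is to not move anything at all: write each record to both tables at insert time, with the TTL only on the live copy. A minimal CQL sketch of the idea; the keyspace, table names, and TTL value are hypothetical, not from the question:

```
-- Write each record to both tables up front: the TTL expires the live copy,
-- while the archive copy (written without a TTL) stays behind.
CREATE TABLE ks.events (id uuid PRIMARY KEY, payload text);
CREATE TABLE ks.events_archive (id uuid PRIMARY KEY, payload text);

-- A logged batch keeps the two inserts atomic; only the live row gets a TTL.
BEGIN BATCH
    INSERT INTO ks.events (id, payload)
        VALUES (5f2b3c1a-0d4e-4b6a-9c8d-1e2f3a4b5c6d, 'data') USING TTL 604800;
    INSERT INTO ks.events_archive (id, payload)
        VALUES (5f2b3c1a-0d4e-4b6a-9c8d-1e2f3a4b5c6d, 'data');
APPLY BATCH;
```

The trade-off is double the storage and write amplification, but it avoids needing a batch or event-driven mover entirely.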

Related

Capturing data expired through Time to Live - Cassandra

I have a specific use case where I want to perform some operations on data that expires because of a TTL provided while inserting a row in Cassandra. Currently I am not able to find any provision for fetching the data that expired because of its TTL.
You can change your data model so the expiration date is part of something you can query on. You can then run a bulk job that processes the expiring rows at intervals (see the sketch below).
Or you can write a custom compaction strategy that triggers something when it reads an expired column. It won't be "as it happens", but more as the columns are cleaned up/turned into tombstones.
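A minimal sketch of the first approach, with hypothetical names and TTL values: bucket rows by the day their TTL will fire, so a daily job can read everything about to expire before Cassandra removes it.

```
-- Partition by expiry day so expiring rows are cheap to find.
CREATE TABLE ks.sessions_by_expiry (
    expiry_day date,   -- the day this row's TTL will fire
    id uuid,
    payload text,
    PRIMARY KEY (expiry_day, id)
);

-- Insert with a 7-day TTL and record the matching bucket.
INSERT INTO ks.sessions_by_expiry (expiry_day, id, payload)
    VALUES ('2024-01-08', 5f2b3c1a-0d4e-4b6a-9c8d-1e2f3a4b5c6d, 'data')
    USING TTL 604800;

-- The bulk job runs once a day and processes that day's expiring rows.
SELECT id, payload FROM ks.sessions_by_expiry WHERE expiry_day = '2024-01-08';
```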

Does Cassandra preserve writetime after a restore?

I would like to know if a query like
select id,field,writetime(field) from mykeyspace.table
will return exactly the same values after a backup/restore operation. I'm not sure if the restore operation will change the internal timestamp handled by Cassandra's "writetime" function.
The "writetime" is preserved across Cassandra backup/restore. It can be easily tested if you had TTL on your original data. While you restore, the TTL gets carried from original written time and not the restore time.
Say for testing, you had a short TTL of 5min and you did a backup/restore, the record would get wiped out within 5min of original writetime.
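A quick way to run that test, using a hypothetical table (the question's mykeyspace.table renamed to mytable, since TABLE is a reserved word in CQL):

```
CREATE TABLE mykeyspace.mytable (id int PRIMARY KEY, field text);

-- Write a row with a 5-minute TTL before taking the backup.
INSERT INTO mykeyspace.mytable (id, field) VALUES (1, 'value') USING TTL 300;

-- Run this before the backup and again after the restore: writetime(field)
-- should be identical, and TTL(field) should keep counting down from the
-- original write rather than resetting.
SELECT id, field, writetime(field), TTL(field)
    FROM mykeyspace.mytable WHERE id = 1;
```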

Performance - TTL vs Deleting a row in Cassandra

We have a massive set of data that is written into millions of rows in Cassandra. We also have a scheduler that needs to process these records and remove them after processing them successfully.
I was wondering whether to delete each row after processing it, or to mark it with a TTL (essentially delaying its deletion).
Are there any pros/cons of deletion vs. TTL with respect to Cassandra performance?
When using TTL the record is not removed from storage immediately; it is marked as a tombstone and only gets physically removed when compaction occurs. Until then the data still impacts the node, since it keeps consuming resources until compaction happens. When you do a range query, even the deleted (tombstoned) records are scanned by Cassandra, so using TTL to delete a large number of entries is considered an anti-pattern. The recommendation is to use temporary tables so that individual rows need not be removed: just drop the entire table (see the sketch below).
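A minimal sketch of that drop-table pattern, with hypothetical names: write each day's data to its own table, so cleanup becomes a metadata operation instead of millions of row tombstones.

```
-- One table per day of data.
CREATE TABLE ks.queue_2024_01_08 (id uuid PRIMARY KEY, payload text);

-- ... the scheduler processes that day's rows ...

-- Once the day's rows are fully processed, drop the whole table:
-- no per-row deletes, no tombstones.
DROP TABLE ks.queue_2024_01_08;
```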
From what little information you have given here, it sounds to me like you are using Cassandra as a queue, which is a well-known anti-pattern. You can read more about that here:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
However, to answer your basic question: there is little difference in performance between using TTLs and deletes. TTLs in C* are handled as tombstones, which is the same as a delete. The major difference is that a tombstone is not written for a record whose TTL has expired until that record is read again, whereas when a delete is issued a tombstone is created immediately. Tombstones in general cause significant performance problems within C*, and while there are some methods to mitigate the issues they create, having large numbers of them usually points to a poor data model or a poor use case for C*. If you are really looking at using C* as a queue, why not look at something more fit for that purpose, such as Redis?
Based on what I've read, TTL will probably be as fast as your fastest delete process could be. The reason is that TTL doesn't have to seek the data in order to mark it with a tombstone. The TTL lives on the record, and when the record is read after the TTL has expired, it is then treated as a tombstone.
http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html
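The contrast in write behavior is easy to see side by side. A minimal sketch with a hypothetical jobs table; both paths end in a tombstone, the difference is when it gets written:

```
CREATE TABLE ks.jobs (id uuid PRIMARY KEY, payload text);

-- Option 1: explicit delete after processing; a tombstone is written
-- immediately, as a separate mutation.
DELETE FROM ks.jobs WHERE id = 5f2b3c1a-0d4e-4b6a-9c8d-1e2f3a4b5c6d;

-- Option 2: TTL set at insert time; no extra write later, the cell is
-- simply treated as expired (and tombstoned at read/compaction) once
-- the TTL elapses.
INSERT INTO ks.jobs (id, payload)
    VALUES (5f2b3c1a-0d4e-4b6a-9c8d-1e2f3a4b5c6d, 'data') USING TTL 86400;
```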

MemSQL for Last n Days Data

I plan to use MemSQL to store my last 7 days of data for real-time analytics using SQL.
I checked the documentation and found that there is no TTL/expiration feature in MemSQL.
Is there any such feature (in case I missed it)?
Does MemSQL fit the use case if I run a daily delete on data older than 7 days? I am quite curious about fragmentation.
We tried this on PostgreSQL, where we had to execute the VACUUM command, and it takes a long time to run.
There is no TTL/expiration feature; you can do it by running delete queries. Many customer use cases do this type of thing, so yes, MemSQL does fit the use case. Fragmentation generally shouldn't be too much of a problem here; what kind of fragmentation are you concerned about?
There is no out-of-the-box TTL feature in MemSQL.
We achieved TTL by adding an additional TS column to our MemSQL rowstore table with the TIMESTAMP(6) datatype.
This gives you automatic current-timestamp insertion when you add a new row to the table.
When querying data from this table, you can apply a simple filter on this TIMESTAMP column to exclude records older than your TTL value (see the sketch below).
https://docs.memsql.com/sql-reference/v6.7/datatypes/#time-and-date
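A minimal sketch of this approach; the table name, column names, and the 7-day window are hypothetical:

```
-- Per the answer above, the TIMESTAMP(6) column is populated with the
-- current timestamp automatically when a row is inserted.
CREATE TABLE events (
    id BIGINT PRIMARY KEY,
    payload VARCHAR(255),
    ts TIMESTAMP(6) NOT NULL
);

-- Reads apply the "TTL" as a filter, here keeping only the last 7 days.
SELECT id, payload FROM events WHERE ts >= NOW() - INTERVAL 7 DAY;

-- A scheduled job physically removes the expired rows.
DELETE FROM events WHERE ts < NOW() - INTERVAL 7 DAY;
```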
You can always have a batch job, run once a month, that deletes the older data.
We have not seen any issues due to fragmentation, but if fragmentation is a concern for you, you can do the following once in a while (sketched after the reference below):
MemSQL’s memory allocators can become fragmented over time (especially if a large table is shrunk dramatically by deleting data randomly). There is no command currently available that will compact them, but running ALTER TABLE ADD INDEX followed by ALTER TABLE DROP INDEX will do it.
Warning
Caution should be taken with this workaround. Plans will be rebuilt, and the two ALTER queries will move all rows in the table twice, so it should not be used often.
Reference:
https://docs.memsql.com/troubleshooting/latest/troubleshooting/
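The workaround quoted above, sketched against the hypothetical events table; the throwaway index name and indexed column are assumptions:

```
-- Adding and then dropping a disposable index rewrites every row,
-- compacting the memory allocators as a side effect.
ALTER TABLE events ADD INDEX tmp_compact_idx (ts);
ALTER TABLE events DROP INDEX tmp_compact_idx;
```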

Remove a specified TTL in Cassandra

I have read about updating a TTL, and that it is only possible by re-inserting the row.
But I want to remove the TTL. I fear it is the same process, but I did not find any information about it. Is there a way to remove a TTL without updating all rows?
What I do is save user information with a TTL when the user registers, so that if the user does not validate his/her mail address, the entry is automatically deleted.
Here's an excerpt from the official docs here.
If you want to change the TTL of expiring data, you have to re-insert the data with a new TTL. In Cassandra, the insertion of data is actually an insertion or update operation, depending on whether or not a previous version of the data exists.
TTL data has a precision of one second, as calculated on the server. Therefore, a very small TTL probably does not make much sense. Moreover, the clocks on the servers should be synchronized; otherwise reduced precision could be observed because the expiration time is computed on the primary host that receives the initial insertion but is then interpreted by other hosts on the cluster.
This is slightly unpleasant in practice, but it's relatively easy to build a very simple migration tool: you would simply iterate through the entire table and re-insert all the records with a new TTL into another table.
If you can afford it computationally and storage-wise, it's probably more compelling to store the records twice, once with a TTL and once without, simply to get around the limitation: you cannot cancel or change a TTL in Cassandra without re-inserting the data.
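For the registration use case in the question, the re-insert route can be quite targeted: once a user validates their mail address, re-insert just that row with TTL 0, which means "no TTL". A minimal sketch with a hypothetical users table:

```
-- Hypothetical users table written with a TTL at registration time.
CREATE TABLE myks.users (username text PRIMARY KEY, email text);

-- Registration: the row expires in 24 hours unless re-written.
INSERT INTO myks.users (username, email)
    VALUES ('alice', 'alice@example.com') USING TTL 86400;

-- Validation succeeded: re-insert the same values with TTL 0 (no TTL);
-- the new non-expiring cells supersede the old expiring ones.
INSERT INTO myks.users (username, email)
    VALUES ('alice', 'alice@example.com') USING TTL 0;

-- TTL(email) now returns null, confirming the column no longer expires.
SELECT username, TTL(email) FROM myks.users WHERE username = 'alice';
```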
