How to recover deleted data in Cassandra?

If I delete a value in a cell of a table but later want to get it back, how can I do that?
I know that Cassandra doesn't really delete data, it just marks it as deleted, so how can I recover it?
Example use case: I want to delete all of a user's information. I first delete it from the Cassandra database, then try to delete it somewhere else, but that second step fails. I then have to stop the deletion process and recover the data already deleted from Cassandra.
How can I do that?

Unfortunately, there is no built-in way to undo a delete. You could, however, use sstabledump (Cassandra >= 3.0) to inspect SSTable contents, but there are some drawbacks:
If the data was never flushed to disk (it was still only in the memtable), the delete is applied before the data ever reaches an SSTable.
You need to find the SSTable that the data belongs to.
There are probably other drawbacks that I'm missing right now.
Some workarounds:
First copy the data to another table and then perform the delete. Once the information has been deleted from the other location, you can safely delete it from your backup table.
Add a new column ("pending_delete") where you record the state. Your queries would then return only the "live" data.
Add a new table where you store the primary key of the data to be deleted, and delete from both tables once the operation on the other location succeeds.
Which solution is right depends on your use case and the size of your data.
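The first workaround (back up, delete, then clean up or restore) can be sketched roughly as follows. This is only an illustration of the control flow: the two dicts stand in for the main table and the backup table, and `delete_user_everywhere` and `delete_elsewhere` are hypothetical names, not part of any Cassandra driver.

```python
from typing import Callable, Dict

Row = Dict[str, str]


def delete_user_everywhere(
    user_id: str,
    live: Dict[str, Row],      # stand-in for the main table
    backup: Dict[str, Row],    # stand-in for the backup table
    delete_elsewhere: Callable[[str], None],  # hypothetical external delete
) -> bool:
    """Delete a user, keeping a backup copy until the external delete succeeds.

    Returns True if the user was deleted everywhere, False if the external
    delete failed and the row was restored.
    """
    row = live[user_id]               # raises KeyError if the user is unknown
    backup[user_id] = dict(row)       # 1. copy to the backup table first
    del live[user_id]                 # 2. delete from the main table
    try:
        delete_elsewhere(user_id)     # 3. delete in the other system
    except Exception:
        live[user_id] = backup.pop(user_id)  # 4a. failure: restore the copy
        return False
    del backup[user_id]               # 4b. success: drop the backup copy
    return True
```

With a real cluster, each dict operation would be an INSERT, SELECT or DELETE against the corresponding table, and you would also have to decide what happens if the process crashes between steps.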

Related

Cassandra Delete when row not exists

Is there a performance impact when running delete statements on cassandra when row doesn't exist? I am not passing the IF EXISTS clause in my delete statement as it adds an overhead of checking. I haven't found anything online about this unique use-case.
A delete in Cassandra just adds a marker called a "tombstone": it is appended to the data files to "hide" the previously existing data, whether or not that data actually exists. This can affect read performance if you have a lot of deletes inside partitions, because tombstones are usually kept in the data files for 10 days (gc_grace_seconds, configurable per table).
There is a very interesting blog post on deletes and tombstones; I recommend reading it.
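To make the 10-day figure concrete, here is a small sketch that computes when a tombstone becomes eligible for removal. It assumes the default gc_grace_seconds of 864000; `earliest_purge_time` is a made-up helper name. The tombstone is only actually dropped when a later compaction processes it.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, the table default


def earliest_purge_time(
    deleted_at: datetime,
    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
) -> datetime:
    """Return the earliest moment a compaction may drop the tombstone."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


deleted_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(earliest_purge_time(deleted_at))  # 2024-01-11 00:00:00+00:00
```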

cassandra: restoring partially lost data

Theoretical question:
Let's say I have a Cassandra cluster with some data in it.
Backups are created on a daily basis.
Now a subset of the data is lost, either through an application error or a manual deletion.
What is the best way to restore the data from an existing backup?
I can think of starting a separate node with the backup disk attached, then exporting the data manually through selects and reimporting it into the prod database.
That would work but sounds complicated. Is there a more straightforward solution for such problems?
If it's a single partition, your best bet is probably to use sstabledump or something like sstable-tools to read it from the backup and manually reinsert it. If you are OK with restoring everything deleted since the time of the snapshot: reduce gc_grace_seconds and force a compaction to purge the tombstones (otherwise they will continue to shadow the restored data), then use sstableloader or, if the token ranges are the same, copy the backed-up SSTables back into the data directory.

Is it possible to recover deleted column data in cassandra?

Suppose we have deleted some data in Cassandra (about 20 queries) using a delete query like the one below.
DELETE lastname FROM cycling.cyclist_name WHERE id = c7fceba0-c141-4207-9494-a29f9809de6f;
How can we find and restore the deleted data in Cassandra? Please help.
If no compaction has happened yet, you may be able to recover the data from the SSTables via sstabledump and extract it from the generated JSON files.
But the correct answer is to use some kind of backup solution: via OpsCenter, or a manual backup via nodetool snapshot, etc. You can find more information in the following article from the DataStax support team.
Cassandra doesn't delete data immediately. As Alex hinted, it will still be in the sstables (data files) until compaction, and only marked with a deletion flag (tombstoned).
You can dump the contents of the sstables into text files and then search for your id.
Do something like this for each sstable:
sstabledump mc-3-big-Data.db > dump2019a
These text files will have your data, with deleted entries marked by a "deletion_info" field. You can then search for your id and retrieve the data.
You should act quickly, though, before compaction removes it.
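Searching the dumps by hand gets tedious, so here is a rough sketch that scans one dump for a given partition key. It assumes the default sstabledump output, a JSON array of partitions with "partition"/"key", "rows" and "cells" fields; check a real dump from your version first, since the layout may differ. `find_cells` is a hypothetical helper, and it only looks at cell-level entries, not row or partition deletions.

```python
import json
from typing import Any, Iterator, Tuple


def find_cells(dump_text: str, partition_key: str) -> Iterator[Tuple[str, str, Any]]:
    """Yield (column, state, detail) for each cell stored under partition_key.

    state is "deleted" for a tombstoned cell (detail is its deletion_info)
    and "live" for a cell that still carries a value (detail is the value).
    """
    for partition in json.loads(dump_text):
        keys = partition.get("partition", {}).get("key", [])
        if partition_key not in keys:
            continue
        for row in partition.get("rows", []):
            # Entries without "cells" (e.g. range tombstones) are skipped.
            for cell in row.get("cells", []):
                if "deletion_info" in cell:
                    yield cell["name"], "deleted", cell["deletion_info"]
                else:
                    yield cell["name"], "live", cell.get("value")
```

You would call it once per dump file, for example `list(find_cells(open("dump2019a").read(), "c7fceba0-c141-4207-9494-a29f9809de6f"))`. The tombstone and the old value are separate entries and may sit in different SSTables, so look for a "live" entry for the same column in an older file.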

Cassandra - recovery of data after accidental delete

Since Cassandra only physically removes data during compaction, is it possible to access recently deleted data in any way? I'm looking for something similar to the Oracle Flashback feature (AS OF TIMESTAMP).
Also, I can see pieces of the deleted data in the relevant commit log file, but it's obviously unreadable. Is it possible to convert this file to a more readable format?
You will want to execute a restore from your commitlog.
The safest approach is to copy the commitlog to a new cluster (with the same schema) and restore by following the instructions (comments) in the commitlog_archiving.properties file. In your case, you will want to set restore_point_in_time to a time between your insert and your delete.
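Picking a value for restore_point_in_time can be sketched like this. The helper name `restore_point` is made up, and the "yyyy:MM:dd HH:mm:ss" GMT format is what I recall from the comments in commitlog_archiving.properties, so treat it as an assumption and check the file shipped with your version.

```python
from datetime import datetime, timezone


def restore_point(inserted_at: datetime, deleted_at: datetime) -> str:
    """Return a timestamp halfway between the insert and the delete.

    Both arguments must be timezone-aware. The result is formatted in GMT as
    yyyy:MM:dd HH:mm:ss (assumed format, see the properties file comments).
    """
    if deleted_at <= inserted_at:
        raise ValueError("the delete must come after the insert")
    midpoint = inserted_at + (deleted_at - inserted_at) / 2
    return midpoint.astimezone(timezone.utc).strftime("%Y:%m:%d %H:%M:%S")


inserted = datetime(2024, 3, 1, 10, 0, 0, tzinfo=timezone.utc)
deleted = datetime(2024, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
print(restore_point(inserted, deleted))  # 2024:03:01 11:00:00
```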

cannot delete row key

I'm having an issue deleting a row key in Cassandra. Whenever I delete a row key, all the columns contained in that row are deleted, but the row key itself is not. Can anybody tell me how to remove a row key once it has been inserted into a column family?
I'd like to do that via the Thrift client.
This is a side effect of how distributed deletes work in Cassandra. From the Cassandra wiki page on distributed deletes:
[A] delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.
Also take a look at this question on the FAQ: Why do deleted keys show up during range scans?
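The quoted reasoning can be illustrated with a toy model. This is not how Cassandra is implemented; it only models two replicas as dicts of (timestamp, value) cells with a last-write-wins repair, where a value of None plays the role of a tombstone. All names here are made up for the sketch.

```python
from typing import Dict, Optional, Tuple

# A cell is (write timestamp, value); a value of None stands for a tombstone.
Cell = Tuple[int, Optional[str]]
Replica = Dict[str, Cell]


def repair(a: Replica, b: Replica) -> None:
    """Last-write-wins merge: both replicas end up with the newest cell per key."""
    for key in set(a) | set(b):
        candidates = [replica[key] for replica in (a, b) if key in replica]
        newest = max(candidates, key=lambda cell: cell[0])
        a[key] = b[key] = newest


def read(replica: Replica, key: str) -> Optional[str]:
    """Return the visible value, or None if the key is absent or tombstoned."""
    cell = replica.get(key)
    return None if cell is None else cell[1]


# Naive delete: replica a wiped the row, replica b missed the delete.
a, b = {}, {"k": (1, "v")}
repair(a, b)
print(read(a, "k"))  # v  (the deleted data came back)

# Tombstone delete: a keeps a newer marker, which repair propagates to b.
a, b = {"k": (2, None)}, {"k": (1, "v")}
repair(a, b)
print(read(b, "k"))  # None
```

In the second case the key "k" is still physically present on both replicas after the repair, which is the same effect the question describes: the columns are gone but the row key is still there.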

Resources