cannot delete row key - cassandra

I'm having an issue while deleting a row key in Cassandra. Whenever I delete a row key, all the columns contained under that row key are deleted, but the row key itself is not. Can anybody tell me how to remove a row key once it has been inserted into a column family?
I'm looking to do this via the Thrift client.

This is a side effect of how distributed deletes work in Cassandra. From the Cassandra wiki page on distributed deletes:
[A] delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.
Also take a look at this question on the FAQ: Why do deleted keys show up during range scans?
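In practice this means a range scan over the Thrift API can return keys whose columns have all been tombstoned ("range ghosts"); the usual client-side answer is to skip rows that come back with no columns. A minimal sketch, assuming an already-connected Thrift Cassandra.Client with the keyspace set and a hypothetical column family named MyCF (classes are from org.apache.cassandra.thrift):
// Hedged sketch: skip "range ghosts" (keys whose columns were all deleted).
List<KeySlice> listLiveRows(Cassandra.Client client) throws Exception {
    // Up to 100 columns per row, over an unbounded column range.
    SlicePredicate predicate = new SlicePredicate().setSlice_range(
            new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 100));
    // One page of up to 100 row keys; real code would page through the ring.
    KeyRange range = new KeyRange().setCount(100)
            .setStart_key(ByteBuffer.allocate(0))
            .setEnd_key(ByteBuffer.allocate(0));

    List<KeySlice> live = new ArrayList<KeySlice>();
    for (KeySlice slice : client.get_range_slices(
            new ColumnParent("MyCF"), predicate, range, ConsistencyLevel.QUORUM)) {
        if (!slice.getColumns().isEmpty()) { // keys with no columns are deleted rows
            live.add(slice);
        }
    }
    return live;
}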

Related

Cassandra Delete when row not exists

Is there a performance impact when running delete statements in Cassandra when the row doesn't exist? I am not passing the IF EXISTS clause in my delete statement as it adds the overhead of a check. I haven't found anything online about this particular use case.
A delete operation in Cassandra just adds a marker called a "tombstone" - it is appended to the data files to "hide" the previously existing data. It can have some performance impact on read operations if you have a lot of deletes inside partitions, as tombstones are usually kept for 10 days in the data files (configurable per table).
There is a very interesting blog post on deletes and tombstones - I recommend reading it.
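For reference, that retention window is the table's gc_grace_seconds (default 864000 seconds = 10 days). A minimal sketch of changing it with the Java driver; the keyspace and table names are hypothetical, and you should only shrink the window if repairs run more frequently than the new value:
// Hedged sketch: adjust how long tombstones are kept for one table.
// "my_ks" / "my_table" are hypothetical names; 259200 s = 3 days.
try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
     Session session = cluster.connect()) {
    session.execute("ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 259200");
}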

Is there a way to view data in 2 replicas in Cassandra?

I am a newbie to Cassandra. I have created a keyspace in Cassandra with NetworkTopologyStrategy and 2 replicas in one datacenter. Is there a CQL command or some other way to view my data on the two replicas?
Something like SELECT * FROM tablename in replica1 / replica2.
Or is there another way that lets me visually see the data on the two replicas?
Thanks in advance.
Your question, "see the data in 2 replicas", is not entirely clear. If you want to validate your data, you can run some commands to visually inspect things.
The first thing you'd want to do is log onto the node you want to investigate. Go to the data directory of the table you're interested in -> DataDir/keyspace/table. In there you'll see one or more files that look like *Data.db. Those are your sstables. Data in memory is flushed to sstables in certain scenarios. You want to be sure your data is flushed from memory to disk before validating (as you may not find what you're looking for otherwise). To do that, issue a "nodetool flush" command (you can pass the keyspace and table as parameters if you only want to flush that specific table).
Like I said, after that, everything in memory will have been flushed to disk, so you'll be able to see your sstables (again, the *Data.db files). Once you have those sstables, you can run the "sstabledump" command on each sstable to see the data that resides in them, thus validating your data.
If you have only a few rows to validate and a lot of nodes, you can find which nodes the rows reside on by running "nodetool getendpoints" with the keyspace, table, and partition key. That will tell you every node that holds the data, so you're not guessing which node the row(s) should be on. Unfortunately, there is no way to know which sstable the rows are in (and it could be more than one if updates/deletes, etc. occurred). You'll have to go through each sstable on the specific node(s).
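If you would rather do the "nodetool getendpoints" step from code, the Java driver's cluster metadata can report the replicas for a partition key. A hedged sketch, assuming driver 3.x, a hypothetical keyspace my_keyspace, and a table whose partition key is a single text column:
// Hedged sketch: programmatic equivalent of "nodetool getendpoints".
try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
    cluster.init(); // populates metadata without opening a session
    // Serialized partition key; for a single text column this is just its UTF-8 bytes.
    ByteBuffer key = ByteBuffer.wrap("some-partition-key".getBytes(StandardCharsets.UTF_8));
    Set<Host> replicas = cluster.getMetadata().getReplicas("my_keyspace", key);
    for (Host host : replicas) {
        System.out.println("replica: " + host.getAddress());
    }
}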
Hope that helps answer your question?
Good luck.
-Jim
You can for a specific partition. If you are sure host1 is a replica (from nodetool getendpoints or from a query trace), and you make your query with CL.ONE explicitly against that host, the coordinator will always pick the local replica first. So:
Statement q = new SimpleStatement("SELECT * FROM tablename WHERE key = X");
q.setHost("host1");
Where host1 owns X.
For SELECT * FROM tablename it's a bit harder, because you are scanning the entire data set and the coordinator will send out multiple queries, one for each part of the ring. If you run the queries with CL.ONE, each part of that range will still only go to one node, so if you set q.enableTracing() you can see which node answered for each range. You have no control over which replica the coordinator picks, so it may take a few queries.
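For example, with the Java driver the answering host is visible on the result's execution info (a hedged sketch; tablename is from the question and session is an already-open driver Session):
// Hedged sketch: trace a CL.ONE query and print which host answered it.
Statement q = new SimpleStatement("SELECT * FROM tablename WHERE key = 'X'")
        .setConsistencyLevel(ConsistencyLevel.ONE)
        .enableTracing();
ResultSet rs = session.execute(q);
System.out.println("answered by: " + rs.getExecutionInfo().getQueriedHost());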
If you just want to see whether there are differences between the replicas, you can use a preview repair: nodetool repair --preview --full.

How to recover deleted data in cassandra?

If I deleted a value in a cell of a table, but later want to recover it, how can I do it?
I know that Cassandra doesn't really delete data, it just marks it as deleted, so how can I recover the data?
Example use case: I want to delete all of a user's information, so I first delete the information in the Cassandra database, then I try to delete their information somewhere else, but that fails with an error, so I have to stop the deletion process and recover the deleted data from the Cassandra database.
How can I do that?
Unfortunately not. You could however use sstabledump (Cassandra >= 3.0) to inspect sstable contents, but there are some drawbacks:
if the data was not flushed to disk (i.e., it was still in the memtable), it will be deleted before ever reaching an sstable
you need to find the sstable(s) that the data belongs to
There are probably other drawbacks that I'm missing right now.
Some workarounds:
First copy the data to another table and then perform the delete. After you delete the information from the other location, you can safely delete it from your backup table (sketched below).
Add a new column ("pending_delete") where you record the state. You would then only query for your "live" data.
Add a new table where you store the PK of the data to be deleted, and delete it from both tables after the operation on the other location has succeeded.
Choosing the right solution I guess depends on your use case and the size of your data.
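A minimal sketch of the first workaround with the Java driver; the keyspace, tables, columns, and the userId variable are all hypothetical and only illustrate the ordering:
// Hedged sketch: keep a copy in a backup table until the external deletion succeeds.
Row user = session.execute(
        "SELECT user_id, email, name FROM my_ks.users WHERE user_id = ?", userId).one();
if (user != null) {
    session.execute(
            "INSERT INTO my_ks.users_backup (user_id, email, name) VALUES (?, ?, ?)",
            user.getUUID("user_id"), user.getString("email"), user.getString("name"));
    session.execute("DELETE FROM my_ks.users WHERE user_id = ?", userId);
}
// ... once the deletion in the other system succeeds:
session.execute("DELETE FROM my_ks.users_backup WHERE user_id = ?", userId);
// ... if it fails instead, copy the row back from users_backup into users.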

Partition DELETE/INSERT concurrency issue in Cassandra

I have a table in Cassandra which stores versions of CSV files. It uses a primary key with a unique id for the version (the partition key) and a row number (the clustering key). When I insert a new version, I first execute a delete statement on the partition key I am about to insert, to clean up any incomplete data. Then the data is inserted.
Now here is the issue. Even though the delete and subsequent insert are executed synchronously, one after the other in the application, it seems that some level of concurrency still exists in Cassandra, because when I read afterwards, rows from my insert are occasionally missing - something like 1 in 3 times. Here are some facts:
Cassandra 3.0
Consistency ALL (R+W)
Delete using the Java Driver
Insert using the Spark-Cassandra connector
Number of nodes: 2
Replication factor: 2
The delete statement I execute looks like this:
"DELETE FROM myTable WHERE version = 'id'"
If I omit it, the problem goes away. If I insert a delay between the delete and the insert, the problem is reduced (fewer rows missing). Initially I used a less restrictive consistency level, and I was sure this was the issue, but it didn't affect the problem. My hypothesis is that for some reason the delete statement is being sent to the replica asynchronously despite the consistency level of ALL, but I can't see why this would be the case or how to avoid it.
All mutations will, by default, get a write timestamp from the coordinator for that write. From the docs:
TIMESTAMP: sets the timestamp for the operation. If not specified, the coordinator will use the current time (in microseconds) at the start of statement execution as the timestamp. This is usually a suitable default.
http://cassandra.apache.org/doc/cql3/CQL.html
Since the coordinator for different mutations can be different, clock skew between coordinators can cause the mutations sent to one machine to be timestamped out of order relative to another.
Since the write timestamp controls C* history, this means you can have a driver which synchronously inserts and deletes, but, depending on the coordinator, the delete can happen "before" the insert.
Example
Imagine two nodes, A and B, where B's clock is running 5 seconds behind A's.
At time 0: You insert data to the cluster and A is chosen as the coordinator. The mutation arrives at A and A assigns a timestamp (0)
There is now a record in the cluster
INSERT VALUE AT TIME 0
Both nodes contain this message and the request returns confirming the write was successful.
At time 2: You issue a delete for the data previously inserted and B is chosen as the coordinator. B assigns a timestamp of (-3) because its clock is skewed 5 seconds behind A's. This means that we end up with a statement like
DELETE VALUE AT TIME -3
We acknowledge that all nodes have received this record.
Now the global consistent timeline is
DELETE VALUE AT TIME -3
INSERT VALUE AT TIME 0
Since, on this timeline, the insertion occurs after the delete, the value still exists.
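One way to take coordinator clocks out of the picture is to assign the write timestamps on the client (CQL's USING TIMESTAMP, or the driver's default timestamp), so the delete is guaranteed to sort before the subsequent insert. A hedged sketch with the Java driver; the table and columns follow the question's description but are assumptions, and the Spark connector side would need its own output-timestamp setting:
// Hedged sketch: both timestamps come from the same client clock, so the
// DELETE always sorts before the INSERT no matter which coordinator handles it.
// versionId and the inserted values are hypothetical.
long deleteTs = System.currentTimeMillis() * 1000; // microseconds
long insertTs = deleteTs + 1;                      // strictly after the delete

session.execute(new SimpleStatement("DELETE FROM myTable WHERE version = ?", versionId)
        .setDefaultTimestamp(deleteTs));
session.execute(new SimpleStatement(
        "INSERT INTO myTable (version, row_number, line) VALUES (?, ?, ?)",
        versionId, 1, "first,line,of,the,csv")
        .setDefaultTimestamp(insertTs));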
I had a similar problem, and I fixed it by enabling lightweight transactions (LWT) for both INSERT and DELETE requests (for all queries actually, including UPDATE). They make sure all queries against the partition are serialized through one "thread", so the DELETE won't overwrite the INSERT. For example (assuming instance_id is the primary key):
INSERT INTO myTable (instance_id, instance_version, data) VALUES ('myinstance', 0, 'some-data') IF NOT EXISTS;
UPDATE myTable SET instance_version=1, data='some-updated-data' WHERE instance_id='myinstance' IF instance_version=0;
UPDATE myTable SET instance_version=2, data='again-some-updated-data' WHERE instance_id='myinstance' IF instance_version=1;
DELETE FROM myTable WHERE instance_id='myinstance' IF instance_version=2;
//or:
DELETE FROM myTable WHERE instance_id='myinstance' IF EXISTS;
The IF clauses enable lightweight transactions for each row, so all of the operations are serialized. Warning: LWT is more expensive than normal calls, but sometimes it is needed, as in this concurrency case.
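One driver-side note: conditional (IF ...) statements do not fail loudly when the condition is not met; you have to check the result. A hedged sketch with the Java driver, reusing the myTable example above:
// Hedged sketch: LWT success is reported via the [applied] column,
// exposed as ResultSet.wasApplied() in the Java driver.
ResultSet rs = session.execute(
        "UPDATE myTable SET instance_version = 1, data = 'some-updated-data' "
        + "WHERE instance_id = 'myinstance' IF instance_version = 0");
if (!rs.wasApplied()) {
    // The condition failed - someone else changed the row first; re-read and retry or give up.
    Row current = session.execute(
            "SELECT instance_version FROM myTable WHERE instance_id = 'myinstance'").one();
    System.out.println("current version: "
            + (current == null ? "row deleted" : current.getInt("instance_version")));
}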

Side effects of Cassandra hinted handoff lead to inconsistency

I have a 3-node cluster and the replication factor is also 3. The consistency level is QUORUM for both writes and reads.
The traffic has three major steps:
Create:
Rowkey: xxxx
Column: status=new, requests="xxxxx"
Update:
Rowkey: xxxx
Column: status=executing, requests="xxxxx"
Delete:
Rowkey: xxxx
When one node is down, everything works according to the consistency configuration, and the final status is that all requests are finished and deleted.
So if I run the Cassandra client to list the results (also at consistency QUORUM), it shows empty (only the row key left), which is correct.
But when we start the dead node, hinted handoff writes the data back to this node, so there are lots of creates, updates, and deletes.
I don't know whether it is due to GC or compaction, but the delete records on the other two nodes seem not to take effect, and when using the Cassandra client to list the data (also at consistency QUORUM), the deleted rows show up again with column values, because the recovering node replays the history again.
And if you use the client to check the data several times, you can see the data changing: as hinted handoff replays the operations, the deleted data shows up and then disappears.
Is there a way to make this procedure invisible externally until the hinted handoff has finished?
What I want is synchronization of the final status; the temporary status is out of date and incorrect, and should never be seen externally.
Is it because I delete the row instead of the columns? Or because of compaction?
After checking the logs and configuration, I found it was caused by two things.
GC grace seconds
I use the Hector client to connect to Cassandra, and the default value of GC grace seconds for each column family it creates is zero! So by the time hinted handoff replays the temporary values, the tombstones on the other two nodes have already been removed by compaction, and the client then sees the temporary values.
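For anyone hitting the same thing: the fix is to set GC grace seconds explicitly when creating (or updating) the column family through Hector, rather than trusting the definition's default. A rough sketch from memory of Hector's DDL API, so treat the exact class and method names as assumptions:
// Hedged sketch (Hector DDL API, names from memory): set gc_grace_seconds explicitly.
Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "127.0.0.1:9160");
ColumnFamilyDefinition cfDef =
        HFactory.createColumnFamilyDefinition("my_keyspace", "my_cf");
cfDef.setGcGraceSeconds(864000); // 10 days, Cassandra's usual default
cluster.addColumnFamily(cfDef);  // or updateColumnFamily(cfDef) for an existing CF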
Secondary index
Even after fixing the first problem, I could still get temporary results from the Cassandra client. When I used a command like "get my_cf where column_one='value'" to query the data, the temporary values showed up again. But when I queried the record by its raw key, they were gone.
From the client we always use the row key to get the data, and that way I never saw the temporary values.
So it seems the secondary index query is not restricted by the consistency configuration.
When I changed GC grace seconds to 10 days, our problem was solved, but it is still strange behavior when using an index query.
