I have a question regarding inserting data into Cassandra.
I deleted a row key from a Column Family (CF), and after some time I am trying to insert data with the same row key.
The program executes, but when I try to access the data by that row key from the command line, I get zero results.
Why is this happening? I know there is something called a "tombstone" associated with each deleted key.
But I am trying to insert the data after compaction.
I have set <GCGraceSeconds> 0 </GCGraceSeconds>.
Thanks in advance.
Probably your delete is happening with a larger timestamp than the insert you did later.
How many nodes do you have? Are their clocks synchronized accurately? If not, this could cause inconsistencies.
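The original question uses the old Thrift-era terminology, but the same check can be sketched in CQL against a hypothetical table (the keyspace, table, and column names below are made up, not from the question). WRITETIME() shows the timestamp Cassandra stored for a column, and an insert with an explicitly higher timestamp will win over a stale tombstone:

-- Check the stored write timestamp for a column (hypothetical table):
SELECT name, WRITETIME(name) FROM my_keyspace.users WHERE user_id = 'row1';

-- Re-insert with a timestamp (microseconds since epoch) known to be higher than the delete's:
INSERT INTO my_keyspace.users (user_id, name) VALUES ('row1', 'Alice') USING TIMESTAMP 1700000000000000;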
I have seen this same problem happening but I haven't been able to debug it. Currently I'm checking http://wiki.apache.org/cassandra/FAQ#range_ghosts to see if that is causing the problem. Maybe it will help you too.
I would like to know if it is safe to delete an entire partition in Cassandra in a single DeleteQuery. How is the performance in this case? Any insights?
Partition deletes are the best you can do from a performance standpoint, because a partition delete generates only a single tombstone of a special type. There are good blog posts covering the different types of deletes and tombstones in more detail.
It is a matter of necessity, not a question of whether it is safe or not.
If you need to delete a partition then delete the partition. If you need to delete a row then delete the row. If you need to delete a column, delete the column.
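As a rough CQL sketch (hypothetical table, not from the question), the three granularities look like this; the partition delete is the one that produces a single partition-level tombstone:

CREATE TABLE IF NOT EXISTS demo.events (
    device_id text,
    ts timestamp,
    reading double,
    note text,
    PRIMARY KEY (device_id, ts)
);

-- Partition delete: one partition tombstone covers everything under the key.
DELETE FROM demo.events WHERE device_id = 'sensor-1';

-- Row delete: a row tombstone for a single clustering key.
DELETE FROM demo.events WHERE device_id = 'sensor-1' AND ts = '2020-01-01 00:00:00+0000';

-- Column delete: a cell tombstone for one column of one row.
DELETE note FROM demo.events WHERE device_id = 'sensor-1' AND ts = '2020-01-01 00:00:00+0000';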
I'm guessing that you've read somewhere that tombstones are an issue in Cassandra. The problem with tombstones isn't with the tombstones themselves -- it's whether you are using Cassandra to process queues or queue-like datasets.
As a friendly note, a better question is "What problem are you trying to solve?" instead of asking an open-ended question without providing background information or context. Cheers!
We have a requirement where we would like our application (which might be deployed on multiple hosts) to create a row in Cassandra. The only host which succeeds in creating the row executes the work. Would it be enough to write an insert statement like the one below, so that if two servers try to insert the row, only one succeeds and the other one gets an exception/does not succeed?
INSERT INTO keyspace1.claim (claim_id, status) VALUES (1, false) IF NOT EXISTS;
I would like to understand whether using IF NOT EXISTS will avoid the upsert.
Thanks,
Shilpa
Yes. IF NOT EXISTS involves a Paxos round and a read-before-write, though, so it is much, much slower. Check the result set of the insert with wasApplied() to tell whether it took or not.
https://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
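As a sketch of what this looks like from cqlsh (values are made up; it assumes the claim table from the question exists), the result set of a conditional insert always carries an [applied] column, which is what wasApplied() reads on the driver side:

INSERT INTO keyspace1.claim (claim_id, status) VALUES (1, false) IF NOT EXISTS;
--  [applied]
-- -----------
--       True

-- A second writer trying the same claim_id is rejected and gets the existing row back:
INSERT INTO keyspace1.claim (claim_id, status) VALUES (1, false) IF NOT EXISTS;
--  [applied] | claim_id | status
-- -----------+----------+--------
--      False |        1 |  False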
I'm running a Cassandra 3.9 cluster, and today I noticed some NULL values in some generated reports.
I opened up cqlsh and after some queries I noticed that null values are appearing all over the data, apparently in random columns.
Replication factor is 3.
I've started a nodetool repair on the cluster but it hasn't finished yet.
I searched for this behavior and could not find it described anywhere, so apparently the random appearance of NULL values in columns is not a common problem.
Does anyone know what's going on? This kind of data corruption seems pretty serious. Thanks in advance for any ideas.
ADDED Details:
Happens on columns that are frequently updated with toTimestamp(now()) which never returns NULL, so it's not about null data going in.
Happens on immutable columns that are only inserted once and never changed. (But other columns on the table are frequently updated.)
Do updates cause this like deletions do? Seems kinda serious to me, to wake up to a bunch of NULL values.
I also know specifically some of the data that has been lost; three important entries I've already identified are missing. These have definitely not been deleted - there are no deletions at all on one specific table, which is full of NULLs everywhere.
I am the sole admin and nobody ran any nodetool commands overnight, 100% sure.
UPDATE
nodetool repair has been running for 6+ hours now and it fully recovered the data on one varchar column "item description".
It's a Cassandra issue and no, there were no deletions at all. And like I said, columns written with functions that never return null (toTimestamp(now())) had null in them.
UPDATE 2
So nodetool repair finished overnight but the NULLs were still there in the morning.
So I went node by node stopping and restarting them and voilà, the NULLs are gone and there was no data loss.
This is a major league bug if you ask me. I don't have the resources now to go after it, but if anyone else faces this here's the simple "fix":
Run nodetool repair -dcpar to fix all nodes in the datacenter.
Restart node by node.
I faced a similar issue some months ago. It's explained quite well in the following blog post (not written by me).
In that case, the null values were actually caused by updates.
http://datanerds.io/post/cassandra-no-row-consistency/
Mmmh... I think that if this were a Cassandra bug it would already have been reported. So I smell a code bug in your application, but you didn't post any code, so this will remain only a (wild) guess until you provide some (I'd like to have a look at the update code).
You don't delete data, nor use TTL. It may seem there are no other ways to create NULL values, but there's one more tricky one: failing at binding, that is, explicitly binding to NULL. It may seem strange, but it happens...
Since
...null values are appearing all over the data...
I'd expect to catch this very quickly by enabling some debugging or assert code on the values before issuing any updates.
Check whether the update query updates only the necessary columns, or whether it goes through Java beans that include the full list of columns in the table. That would explain NULL being written to columns that were never meant to be updated.
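A minimal CQL sketch of that pitfall (hypothetical table, not the poster's schema): explicitly writing NULL to a column is not a no-op, it deletes the stored value, which is exactly what a bean-style write that always sends every column can do to fields it never meant to touch:

CREATE TABLE IF NOT EXISTS demo.items (
    item_id int PRIMARY KEY,
    description text,
    price double
);

INSERT INTO demo.items (item_id, description, price) VALUES (1, 'widget', 9.99);

-- A "write everything" style update where price was never set on the application
-- object ends up binding NULL, which erases the previously stored value:
INSERT INTO demo.items (item_id, description, price) VALUES (1, 'widget', null);

SELECT * FROM demo.items WHERE item_id = 1;   -- price now comes back as null

-- Writing only the columns that actually changed leaves the others alone:
UPDATE demo.items SET description = 'widget v2' WHERE item_id = 1;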
The Select all query works fine in a Cassandra KeySpace-Column Family.
But when the Select by Key query is issued for a specific key, the FluentCassandra client throws a timeout exception.
This happens for some of the keys in the same column family, while others succeed.
Could this be due to an index issue?
Okay, you provided no further information about your issue, but since I had the same issue and stumbled upon this question, here is what happened to me.
I followed @Richard's suggestion and took a look at the Cassandra logs while the query was executed. It turned out that a non-existent data file was being referenced.
Restarting Cassandra forced the system to check its logs and rebuild the data files, solving the issue.
I have a table in which the PK column Id is of type bigint, and it is populated automatically in increasing order: 1, 2, 3, and so on.
I notice that sometimes, all of a sudden, the generated ids have very big values. For example, the ids are like 1, 2, 3, 4, 5, 500000000000001, 500000000000002.
There is a huge jump after 5; ids 6 and 7 were not used at all.
I do perform delete operations on this table, but I am absolutely sure that the missing ids were not used before.
Why does this occur and how can I fix it?
Many thanks for looking into this.
My environment:
Sybase ASE 15.0.3, Linux
You get this with Sybase when the system is restarted after an improper shutdown: ASE pre-allocates a block of identity values in memory for performance, and when the server goes down uncleanly the unused portion of that block is burned, so the next generated id jumps by the identity gap/burn factor. See the full description, and what to do about it, here.