Cassandra data corruption: NULL values appearing on certain columns

I'm running a Cassandra 3.9 cluster, and today I noticed some NULL values in some generated reports.
I opened up cqlsh and after some queries I noticed that null values are appearing all over the data, apparently in random columns.
Replication factor is 3.
I've started a nodetool repair on the cluster but it hasn't finished yet.
I searched for this behavior and could not find it described anywhere; the random appearance of NULL values in columns does not seem to be a common problem.
Does anyone know what's going on? This kind of data corruption seems pretty serious. Thanks in advance for any ideas.
ADDED Details:
It happens on columns that are frequently updated with toTimestamp(now()), which never returns NULL, so it's not a matter of null data going in.
It happens on immutable columns that are only inserted once and never changed. (But other columns in the table are frequently updated.)
Do updates cause this the way deletions do? It seems pretty serious to wake up to a bunch of NULL values.
I also know specifically some of the data that has been lost: three entries I've already identified are important records that are now missing. These were definitely not deleted - there are no deletions at all on one specific table that is full of NULLs everywhere.
I am the sole admin and nobody ran any nodetool commands overnight, 100% sure.
UPDATE
nodetool repair has been running for 6+ hours now, and it fully recovered the data in one varchar column, "item description".
It's a Cassandra issue, and no, there were no deletions at all. And as I said, columns written by functions that never return null (toTimestamp(now())) had NULLs in them.
UPDATE 2
So nodetool repair finished overnight but the NULLs were still there in the morning.
So I went node by node, stopping and restarting them, and voilà: the NULLs are gone and there was no data loss.
This is a major-league bug if you ask me. I don't have the resources to go after it now, but if anyone else faces this, here's the simple "fix":
Run nodetool repair -dcpar to repair all nodes in the datacenter.
Restart the nodes one by one.

I faced a similar issue some months ago. It's explained quite well in the following blog post (not written by me).
In this case the null values were actually caused by updates.
http://datanerds.io/post/cassandra-no-row-consistency/
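The mechanism the post describes can be sketched without any driver code: Cassandra reconciles each column (cell) independently by write timestamp, and an UPDATE is an upsert. If a replica missed the original INSERT but received a later UPDATE of one column, a low-consistency read against that replica sees a row with NULLs in the columns the UPDATE didn't touch. A minimal plain-Java model of this (all names illustrative, not driver API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Cassandra's per-cell reconciliation: each column value
// carries its own write timestamp, and replicas merge cell by cell.
public class PerCellLww {

    // column name -> (value, write timestamp)
    static class Cell {
        final String value;
        final long ts;
        Cell(String value, long ts) { this.value = value; this.ts = ts; }
    }

    // Apply a write to a replica: the newer timestamp wins (last-write-wins).
    static void apply(Map<String, Cell> replica, String col, String val, long ts) {
        Cell cur = replica.get(col);
        if (cur == null || ts > cur.ts) replica.put(col, new Cell(val, ts));
    }

    public static void main(String[] args) {
        Map<String, Cell> replica1 = new HashMap<>();
        Map<String, Cell> replica2 = new HashMap<>();

        // The original INSERT reaches replica1 only (replica2 dropped it).
        apply(replica1, "id", "42", 100);
        apply(replica1, "description", "widget", 100);

        // A later UPDATE (an upsert in Cassandra) touches only "description"
        // and reaches both replicas.
        apply(replica1, "description", "better widget", 200);
        apply(replica2, "description", "better widget", 200);

        // A CL=ONE read served by replica2 sees a row that "exists" but has
        // a null "id" -- a NULL produced purely by an update.
        System.out.println(replica2.get("description").value); // better widget
        System.out.println(replica2.get("id"));                // null
    }
}
```

Repair (or read-repair at a higher consistency level) eventually converges the replicas, which matches the asker's observation that the NULLs were transient.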

Hmm... I think that if this were a Cassandra bug it would already have been reported. So I smell a code bug in your application, but you didn't post any code, so this will remain only a (wild) guess until you provide some (I'd like to have a look at the update code).
You don't delete data, nor use TTLs. It may seem there is no other way to create NULL values, but there's one more tricky one: failing at binding, that is, explicitly binding a column to NULL. It may seem strange, but it happens...
Since
...null values are appearing all over the data...
I'd expect to catch this very quickly by enabling some debugging or assertion code on the values before issuing any updates.
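The "failing at binding" case can be sketched without any driver: in a last-write-wins store, writing NULL to a column is not "leave it alone" -- it is a write (a tombstone) that deletes the existing cell. A minimal plain-Java model of the failure mode (the upstream lookup and all names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why binding NULL is dangerous: writing null to a bound column
// deletes the existing cell instead of leaving it untouched.
// (Illustrative model, not DataStax driver code.)
public class NullBinding {
    static Map<String, String> row = new HashMap<>();

    // Models executing "UPDATE t SET col = ?" with the given bound value.
    static void boundUpdate(String col, String boundValue) {
        if (boundValue == null) {
            row.remove(col);       // null bind => tombstone => cell deleted
        } else {
            row.put(col, boundValue);
        }
    }

    // Hypothetical upstream lookup that fails and returns null.
    static String lookupPrice() { return null; }

    public static void main(String[] args) {
        row.put("price", "9.99");

        String fromUpstream = lookupPrice();   // a bug upstream yields null
        boundUpdate("price", fromUpstream);    // silently deletes the cell

        System.out.println(row.get("price")); // null -- the "corruption"
    }
}
```

Note that since native protocol v4 (Cassandra 2.2+, driver 3.x), a prepared-statement parameter can be left unset rather than bound to null; with older protocol versions, every null bind produces a tombstone like the sketch above.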

Check whether the update query updates only the necessary columns, or whether it goes through Java beans that include the full list of columns in the table. That would explain NULL updates to columns that were never meant to be changed.
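The bean pitfall can be sketched in plain Java (the Item bean and both mappers are hypothetical; no driver involved): a mapper that emits every bean field writes null into columns the caller never populated, while a mapper that skips unset fields leaves them alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the bean-mapping pitfall: building an UPDATE from every bean
// field writes NULL into columns the caller never populated.
public class BeanUpdate {
    // Hypothetical bean with only some fields populated.
    static class Item {
        String id = "42";
        String description;          // caller never set this
        String category = "tools";
    }

    // Naive mapper: includes every column, nulls and all.
    static Map<String, String> allColumns(Item it) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("description", it.description);   // null goes in!
        cols.put("category", it.category);
        return cols;
    }

    // Safer mapper: emit only columns that were actually set.
    static Map<String, String> setColumnsOnly(Item it) {
        Map<String, String> cols = new LinkedHashMap<>();
        if (it.description != null) cols.put("description", it.description);
        if (it.category != null) cols.put("category", it.category);
        return cols;
    }

    public static void main(String[] args) {
        Item it = new Item();
        System.out.println(allColumns(it));     // {description=null, category=tools}
        System.out.println(setColumnsOnly(it)); // {category=tools}
    }
}
```

Some object mappers offer an option to skip null fields on save for exactly this reason; filtering unset fields before building the statement achieves the same thing by hand.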

Related

Data loss after repair

Our repair jobs had been failing for a long period (> 14 days).
Today I manually started a repair job with nodetool repair -pr. Afterwards, it looks like we lost some data from a table.
Question:
Is it theoretically possible to lose data after a repair job?
If yes what can be done to avoid this?
You should not lose data with repair. If anything, you could gain back records that were deleted (resurrected zombie records).
One scenario where data might appear to be "lost" is when a missing tombstone is copied over from another node during repair. The resulting value is correct, not lost. If your client CL is something small, say ONE, and you happen to read from the node that had the data (but was missing the tombstone), you might think the cell disappeared all of a sudden, but again, that's the correct value.
Another scenario where things might appear to be "lost" is if the nodes' clocks ever got out of sync, so that certain cells carry incorrect timestamps and things potentially get messed up when repair tries to sync them up.
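The missing-tombstone scenario can be sketched in a few lines of plain Java (an illustrative model of cell reconciliation, not Cassandra internals): a tombstone is just a cell whose value is null, and it wins the merge if its timestamp is newer.

```java
// Sketch of the "missing tombstone" scenario: a DELETE (tombstone) landed
// on one replica only; repair copies it over and the value correctly
// disappears. Illustrative model, not Cassandra internals.
public class MissingTombstone {

    static class Cell {
        final String value;   // null models a tombstone
        final long ts;
        Cell(String value, long ts) { this.value = value; this.ts = ts; }
    }

    // Last-write-wins merge of two versions of the same cell.
    static Cell merge(Cell a, Cell b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.ts >= b.ts ? a : b;
    }

    public static void main(String[] args) {
        // replica1 got the write but missed the later DELETE.
        Cell onReplica1 = new Cell("flagged", 100);
        // replica2 has the tombstone from the DELETE at ts=200.
        Cell onReplica2 = new Cell(null, 200);

        // A CL=ONE read hitting replica1: the cell looks alive.
        System.out.println(onReplica1.value);   // flagged

        // Repair reconciles the replicas; the newer tombstone wins.
        Cell repaired = merge(onReplica1, onReplica2);
        System.out.println(repaired.value);     // null -- correct, not "lost"
    }
}
```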
That's all I can think of off the top of my head.
-Jim

Does cassandra guarantee row level consistency during write?

As I understand it, a row in a Cassandra table is a set of key-value pairs (one per column).
I noticed a strange issue during inserts: values are not persisted in a couple of columns, though I'm fairly confident they had values before the insert.
It happens sporadically and succeeds if we retry later. We suspect some kind of race condition, dropped DB connection, etc.
Is it possible that only a subset of the keys gets saved in a row of a Cassandra table? Does Cassandra guarantee all-or-nothing during a save (row-level consistency)?
Cassandra Version : 2.1.8
Datastax cassandra-driver-core : 3.1.0
At the row level, the concurrency guarantees are described pretty well in this answer:
Cassandra row level isolation
As far as your problem goes: first check whether it's really Cassandra dropping mutations:
nodetool tpstats
If you see dropped mutations, you are likely running an underpowered setup and simply need to throw more hardware at the problem.
There isn't much more I can tell from your question. Just as a precaution, go into your code and check that you are actually creating a new bound statement every time and not reusing a bound statement instance. A client once had inserts lost under mysterious circumstances, and that turned out to be the cause. Hope this helps; if not, please post some of your code.
There are consistency levels for reads and writes in Cassandra.
It looks like you are using consistency level ONE, so your reads/writes are not consistent. Try using QUORUM for both reads and writes and see if the problem resolves.
If this doesn't help, please provide an example query, the cluster size, and the replication factor.
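The reason QUORUM for both reads and writes helps is arithmetic: whenever the number of replicas written plus the number read exceeds the replication factor, the two sets must overlap in at least one node, so every read sees at least one up-to-date copy. A small sketch of that check (illustrative, not driver code):

```java
// Why QUORUM/QUORUM works: any write set of W replicas and any read set of
// R replicas out of RF total must share at least one node when W + R > RF.
public class QuorumMath {

    // Cassandra's quorum for a replication factor: floor(rf/2) + 1.
    static int quorum(int rf) { return rf / 2 + 1; }

    static boolean overlapGuaranteed(int w, int r, int rf) { return w + r > rf; }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);                               // 2
        System.out.println(overlapGuaranteed(q, q, rf));  // true: 2 + 2 > 3
        System.out.println(overlapGuaranteed(1, 1, rf));  // false: ONE/ONE
    }
}
```

With CL=ONE on both sides (1 + 1 = 2, not > 3), a read can land entirely on a replica the write never reached, which is exactly the sporadic "missing values" symptom described above.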

Where can I observe writes to the Cassandra database, aka where are they logged?

I'm trying to track down a problem with one of our developers, mainly a program he wrote that modifies (adds some flags to) existing entries in the various tables in our Cassandra keyspace.
The issue is that it seems to work just fine for many of the tables, but for at least 3 so far I've discovered that it isn't writing anything to them. The only thing his logs can tell me is that x number of rows were committed to the database, but of course when I query a specific row I see that is not the case.
I was just wondering if Cassandra logs each INSERT somewhere, so I can look at the log and figure out what was going on when it was supposedly inserting that data into the table. I know that when a write command is issued it is written to the commit log, but I believe that is not human-readable, so I need to be able to check somewhere that is.
The only thing his logs can tell me is that x number of rows were committed to the database, but of course when I query a specific row I see that is not the case.
This sounds like it might be a consistency issue; can you query using CL ALL?
I was just wondering if there is somewhere that Cassandra logs each INSERT so I can look at the log and figure out what was going on when it was supposedly inserting that data into the table?
Bad news:
Cassandra does not have audit logging
Good news:
DSE does have audit logging -- http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/sec/secAuditingCassandraTable.html
Remember there is a performance penalty for audit logging. You may just want to turn it on temporarily.

Index Corruption in Cassandra

The select-all query works fine in a Cassandra keyspace / column family.
But when a select-by-key query is issued for a specific key, the FluentCassandra client throws a timeout exception.
This happens for some of the keys in the same column family, while others succeed.
Could this be due to an index issue?
Okay,
you provided no further information about your issue, but since I had the same issue and stumbled upon this question, here is what happened to me.
I followed @Richard's suggestion and took a look at the Cassandra logs while the query was executed. It turned out that a non-existent data file was being referenced.
Restarting Cassandra forced the system to check its logs and rebuild the data files, solving the issue.

Not able to insert data into Cassandra

I have a question regarding inserting data to Cassandra.
I deleted a row key from a column family (CF); some time later I tried to insert data with the same row key.
The program executes, but when I try to access the data with that row key from the command line I get zero results.
Why is this happening? I know there is something called a "tombstone" associated with each deleted key.
But I am trying to insert the data after compaction.
I have set <GCGraceSeconds>0</GCGraceSeconds>.
Thanks in advance.
Probably your delete is happening with a larger timestamp than the insert you did later.
How many nodes do you have? Are their clocks synchronized accurately? If not, this could cause inconsistencies.
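The timestamp explanation can be sketched in plain Java (an illustrative model of cell reconciliation, not Cassandra internals): Cassandra picks the winner by write timestamp, not by arrival order, so a DELETE stamped by a fast clock shadows an INSERT issued later in real time.

```java
// Sketch of why a re-insert after a delete can stay invisible: reconciliation
// goes by timestamp, not arrival order. A DELETE carrying a larger timestamp
// (e.g. issued from a node whose clock runs fast) beats a later INSERT.
public class DeleteShadowsInsert {

    static class Cell {
        final String value;   // null models a tombstone
        final long ts;
        Cell(String value, long ts) { this.value = value; this.ts = ts; }
    }

    // Last-write-wins reconciliation between two versions of a cell.
    static Cell reconcile(Cell a, Cell b) { return a.ts >= b.ts ? a : b; }

    public static void main(String[] args) {
        Cell delete = new Cell(null, 2_000);   // stamped by a clock 1s fast
        Cell insert = new Cell("v2", 1_500);   // issued later, smaller ts

        Cell winner = reconcile(delete, insert);
        System.out.println(winner.value);      // null: the insert is invisible
    }
}
```

Until the tombstone is purged (after gc_grace_seconds plus compaction), every re-insert with a smaller timestamp loses this reconciliation, which matches the "zero results" symptom above.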
I have seen this same problem happening but I haven't been able to debug it. Currently I'm checking http://wiki.apache.org/cassandra/FAQ#range_ghosts to see if that is causing the problem. Maybe it will help you too.
