Why do I see so many tombstones in Cassandra?

When I do a table repair, I see a lot of warnings like the following:
WARN [CompactionExecutor:112958] 2016-04-07 15:39:33,160 SliceQueryFilter.java:236 - Read 10002 live and 2857 tombstoned cells
But I do not delete anything and do not set TTLs, so nothing should have been deleted. Why are there so many tombstones? The data size is about 200 GB, but I have inserted some cells with NULL.

I had the same issue recently, and the reason was that I was inserting NULL values without realizing it.
If you use a prepared statement and you do not set some of the parameters, or you set them to NULL, or if you insert a JSON object that does not contain a key for every column of the table, then you end up with a tombstone for each of those columns.
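The same effect is easy to see with plain CQL statements. A minimal sketch, assuming a hypothetical table users(id uuid PRIMARY KEY, name text, email text):

-- Writing NULL explicitly creates a tombstone cell for email:
INSERT INTO users (id, name, email) VALUES (uuid(), 'alice', null);

-- Omitting the column writes nothing for email, so no tombstone is created:
INSERT INTO users (id, name) VALUES (uuid(), 'alice');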
In the case of a prepared statement, you can avoid the tombstones by explicitly marking unused parameters as unset. See CASSANDRA-7304.
Unfortunately there is no such syntax/API for JSON inserts yet. Have a look at CASSANDRA-11424 to check the progress.

Related

Tombstone scanning in system.log

I have a Cassandra cluster with a low-delete use case. I found this in my system.log: "Read 10 live and 5645464 tombstones cells in keyspace.table". What does it mean? Please help me understand.
Thanks.
In Cassandra, all the information recorded is immutable. This means that when you have a delete operation (either an explicit delete statement or a Time To Live [TTL] expiry), the database adds another record with a special flag, called a tombstone. All these records stay in the database until the gc_grace_seconds period has passed; the default is 10 days.
In your case, the engine found that most of the records read had been deleted, but they are still waiting for gc_grace_seconds to pass so that compaction can reclaim the space. One possible way to fix the issue is to decrease gc_grace_seconds for that table.
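If you decide to lower it, gc_grace_seconds is a per-table setting. A minimal sketch (the keyspace, table name, and value are placeholders; the default is 864000 seconds, i.e. 10 days):

-- Keep tombstones for 1 hour instead of 10 days before compaction may drop them:
ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 3600;

Keep in mind that gc_grace_seconds also bounds how much time you have to repair a node that missed the delete; if tombstones are collected before the repair runs, deleted data can come back.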
For more information, please refer to this article from the Last Pickle.
One more important thing to keep in mind when working with Cassandra is that tombstone cells do not directly correlate with deletes.
When you insert a null value for an attribute, Cassandra internally marks that attribute/cell as a tombstone. So, even if you don't have a lot of deletes happening, you can end up with an enormous number of tombstones. The easy and simple solution is to not insert null values for attributes in your inserts.
As for the message Read 10 live and 5645464 tombstones cells in keyspace.table, my guess is that some query is doing a scan that reads 10 live cells and 5645464 tombstones (cells with null values) along the way. You would need to understand what type of queries are being executed to gain more insight into that.
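If you can reproduce the read from cqlsh, one low-effort way to see which queries hit the tombstones is request tracing; the trace output reports how many live and tombstone cells each read touched. A sketch (table and key are made up):

TRACING ON;
SELECT * FROM my_keyspace.my_table WHERE id = 123e4567-e89b-12d3-a456-426614174000;
TRACING OFF;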

Will Cassandra return tombstone rows as valid rows?

My idea is to add rows to Cassandra with TTL = 15 minutes so I can load realtime data for the window (now - 15 minutes, now) without storing timestamps, etc. My concern is that rows with an expired TTL are only marked as tombstones (not actually deleted). I.e., will they be counted when I run select count(*) from realtime_table?
No, tombstoned rows won't be returned as a result - they will be skipped when reading the data.
But if you are actively expiring data, you may need to tune gc_grace_seconds; otherwise you can accumulate too many unremoved tombstones, and in some cases you will start to get warnings or errors during reads when a read operation has to skip over tombstones (controlled by the tombstone_warn_threshold and tombstone_failure_threshold options in cassandra.yaml).
Here is a very good blog post that describes how data is deleted and cleaned up.
But select count(*) from table is a real antipattern in Cassandra - you need to model your data correctly around partitions, etc.
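As a sketch of what that modelling might look like for this use case (all names are made up): set the expiry once on the table and read a bounded slice of a known partition instead of counting the whole table.

-- Every row expires automatically 15 minutes after it is written:
CREATE TABLE realtime_table (
    source text,        -- partition key, e.g. one partition per sensor or per minute bucket
    ts timestamp,
    value double,
    PRIMARY KEY (source, ts)
) WITH default_time_to_live = 900;

-- Read the recent slice of one partition rather than count(*) over everything:
SELECT ts, value FROM realtime_table
WHERE source = 'sensor-42' AND ts > '2016-04-07 15:24:00';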

Why doesn't an upsert create tombstones in Cassandra?

As per the question regarding tombstones, why don't upserts create tombstones?
According to the DataStax documentation How is data updated?, every upsert is treated by Cassandra as a delete followed by an insert, since the new timestamp of the insert overwrites the old timestamp. The data with the old timestamp has to be marked as deleted, which suggests a tombstone.
Why do we have contradicting statements? Or am I missing something here?
Use case:
Data is inserted with a unique key (uuid) in Cassandra, and some of the columns in this data keep getting updated frequently. Which approach do you recommend?
Inserting the same data with new column values in the INSERT query.
Updating the existing record, based on the given uuid, with new column values in the UPDATE query.
Which approach does or doesn't create tombstones, and how does Cassandra handle both queries?
As Russ pointed out, you may want to read other similar questions on this topic. However,
An upsert/overwrite is just-another-cell, with a name, a timestamp and a value.
A tombstone is just like an overwrite, except it gets one extra field indicating that it's been deleted, so that it isn't returned as valid output. The reason tombstones are often harmful is that they can accumulate in bad data models, even when people think the data is gone - and skipping them to get to live data actually requires memory.
When you update/upsert as you describe, the cell you create SHADOWS (obsoletes) the previous cell, which will be removed upon compaction. That previous cell is NOT a tombstone, even though it's no longer live/active - it will be compacted away and completely replaced by the new, live, highest-timestamp value as soon as compaction allows.
The biggest thing to keep in mind is this: tombstones aren't necessarily removed by compaction - they're kept around (persisted/rewritten) for at least gc_grace_seconds, and potentially even longer if they need to shadow/cover other cells in sstables that haven't been compacted yet. Because of this, tombstones stay around for a long time, but shadowed/overwritten cells are gc'd as soon as the sstable they're in is compacted.
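A small CQL sketch of the distinction, assuming a hypothetical table users(id uuid PRIMARY KEY, email text):

-- Overwrite/upsert: the newer cell simply shadows the older one. No tombstone is
-- written, and the old cell is dropped as soon as the two are compacted together.
INSERT INTO users (id, email) VALUES (123e4567-e89b-12d3-a456-426614174000, 'old@example.com');
UPDATE users SET email = 'new@example.com' WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- Delete (or writing NULL): this does create a tombstone cell, which is kept for at
-- least gc_grace_seconds before compaction is allowed to purge it.
DELETE email FROM users WHERE id = 123e4567-e89b-12d3-a456-426614174000;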

Reinserting data after row deletion in Cassandra using Pelops

I am trying to re-insert data for the same row key after deleting the row, but it does not get inserted, and no exception is thrown.
I am using the Pelops RowDeletor to delete the row data (note that the row key is still shown, with no columns, after deleting). If I truncate the table and re-insert, the columns do get inserted.
I have tried changing consistency levels from ANY to ONE to ALL. Any ideas as to what the problem is, or should I switch to the Hector client?
This can be an issue with tombstones (keys without columns) if the timestamp on your column is in the past. Make sure this is not the case, and you should be able to insert. Note that this is not an issue with Pelops but is related to Cassandra's conflict resolution: if you have a tombstone that is newer than the insert, you will have this issue, because Cassandra sees the delete as having happened after the insert.
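The rule is easiest to see with write timestamps made explicit. A CQL sketch (Pelops, like any Thrift client, attaches a client-side timestamp to mutations, so the same rule applies there; names and values below are made up):

-- If the row deletion was written with timestamp T, a re-insert is only visible when its
-- timestamp is greater than T; on a tie the tombstone wins. Timestamps are microseconds
-- since the epoch.
INSERT INTO my_keyspace.users (id, name)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'alice')
USING TIMESTAMP 1460043573000000;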

When I remove rows in Cassandra, I delete only columns, not row keys

If I delete every key in a ColumnFamily in a Cassandra db using remove(key), and then use get_range_slices, the rows are still there but without columns. How can I remove entire rows?
Why do deleted keys show up during range scans?
Because get_range_slice says, "apply this predicate to the range of rows given," meaning, if the predicate result is empty, we have to include an empty result for that row key. It is perfectly valid to perform such a query returning empty column lists for some or all keys, even if no deletions have been performed.
Cassandra uses Distributed Deletes as expected.
Thus, a delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.
http://wiki.apache.org/cassandra/DistributedDeletes
I've just been having the same issue, and I found that this has been fixed in 0.7 (https://issues.apache.org/jira/browse/CASSANDRA-1027) and backported to 0.6.3.
This is also relevant: https://issues.apache.org/jira/browse/CASSANDRA-494
