Why doesn't an upsert create tombstones in Cassandra? - cassandra

As per the question regarding tombstones, why don't upserts create tombstones?
As per the DataStax documentation (How is data updated?), Cassandra treats every upsert as a delete followed by an insert, because the new timestamp of the insert overwrites the old timestamp. The data with the old timestamp has to be marked as deleted, which relates to a tombstone.
Why do we have contradicting statements? Or am I missing something here?
Use case:
Data is inserted with a unique key (uuid) in Cassandra, and some of the columns in this data are updated frequently. Which approach do you recommend?
Inserting the same data with new column values in the INSERT query.
Updating the existing record for the given uuid with new column values in the UPDATE query.
Which approach does or doesn't create tombstones, and how does Cassandra handle both queries?

As Russ pointed out, you may want to read other similar questions on this topic. In short:
An upsert/overwrite is just another cell, with a name, a timestamp and a value.
A tombstone is just like an overwrite, except it gets one extra field indicating that it's been deleted, so that it isn't returned as valid output. The reason tombstones are often harmful is that they can accumulate in bad data models, even when people think the data is gone, and skipping them to get to live data actually requires memory.
When you update/upsert as you describe, the cell you create SHADOWS (obsoletes) the previous cell, which will be removed upon compaction. That previous cell is NOT a tombstone, even though it's no longer live/active. It will be compacted away and completely replaced by the new, live, highest-timestamp value as soon as compaction allows.
The biggest thing to keep in mind is this: tombstones aren't necessarily removed by compaction. They're kept around (persisted/rewritten) for at least gc_grace_seconds, and potentially even longer if they need to shadow/cover other cells in sstables not yet compacted. Because of this, tombstones stay around for a long time, but shadowed/overwritten cells are gc'd as soon as the sstable they're in is compacted.
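The difference between a shadowed cell and a tombstone can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Cassandra's implementation: the Cell class and the compact function are hypothetical names, timestamps are plain seconds, and all cells are assumed to live in the sstables being compacted together.

```python
from dataclasses import dataclass
from typing import Optional

GC_GRACE_SECONDS = 864000  # 10 days, the default mentioned for gc_grace_seconds


@dataclass(frozen=True)
class Cell:
    name: str
    timestamp: int           # write time in seconds (simplified)
    value: Optional[str]
    deleted: bool = False    # the one extra field that makes a cell a tombstone


def compact(cells, now):
    """Keep the newest cell per name; drop tombstones only after the grace period."""
    newest = {}
    for cell in cells:
        current = newest.get(cell.name)
        if current is None or cell.timestamp > current.timestamp:
            newest[cell.name] = cell  # older cells are shadowed and dropped
    survivors = []
    for cell in newest.values():
        if cell.deleted and now - cell.timestamp >= GC_GRACE_SECONDS:
            continue  # tombstone is old enough to be purged
        survivors.append(cell)
    return survivors


cells = [
    Cell("email", 100, "old@example.com"),
    Cell("email", 200, "new@example.com"),   # upsert: shadows the cell above
    Cell("phone", 100, "555-0100"),
    Cell("phone", 200, None, deleted=True),  # delete: writes a tombstone
]
print(compact(cells, now=300))
print(compact(cells, now=200 + GC_GRACE_SECONDS))
```

In this model the overwritten email cell disappears at the first compaction, while the phone tombstone is rewritten until the grace period has elapsed.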

Related

Does Cassandra store only the affected columns when updating a record or does it store all columns every time it is updated?

If the answer is yes,
Does that mean that, unlike Mongo or an RDBMS, retrieving every column versus only some columns has a big performance impact in Cassandra? (I am not talking about transfer time over the network, as that affects all of the above.)
Does that mean that compaction cannot just stop when it finds the latest row for a primary key, and has to go through the full set of SSTables? (I understand there will be optimisations, as a previously compacted SSTable will have at most one occurrence of a row.)
Please ask only one question per question.
That is entirely up to you. If you write one column value, it'll persist just that one. If you write them all, they will all persist, even if they are the same as the current value.
whether we retrieve every column or some column will have big performance impact
This is definitely the case. Queries for column values that are small or haven't been written to or deleted will be much faster than the opposite.
during compaction, it cannot just stop when it finds the latest row for a primary key, it has to go through the full set in SSTables?
Yes. And not just during compaction, but read queries will also check multiple SSTable files.
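A rough sketch of why a read has to consult several SSTables: each write persists only the columns it names, so one row can be spread over many files and must be merged column by column. The read_row function and the dictionary layout below are hypothetical, chosen only for illustration.

```python
def read_row(sstables, key, columns=None):
    """Merge one row from every SSTable; the newest timestamp wins per column."""
    merged = {}
    for sstable in sstables:                 # any SSTable may hold a fragment
        fragment = sstable.get(key, {})
        for column, (timestamp, value) in fragment.items():
            if columns is not None and column not in columns:
                continue                     # skip columns the query did not ask for
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    return {column: value for column, (_, value) in merged.items()}


# The first write stored two columns; the later update stored only "city".
older = {"k1": {"name": (1, "Ann"), "city": (1, "Oslo")}}
newer = {"k1": {"city": (2, "Bergen")}}

print(read_row([older, newer], "k1"))
print(read_row([older, newer], "k1", columns={"name"}))
```

Asking for fewer columns lets the merge skip work, which is in line with the answer's point that the set of columns retrieved affects read cost.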

Tombstone scanning in system.log

I have a Cassandra cluster whose use case involves few deletes. I found this in my system.log: "Read 10 live and 5645464 tombstones cells in keyspace.table". What does it mean? Please help me understand.
Thanks.
In Cassandra, all recorded information is immutable. This means that when you have a delete operation (explicit with a DELETE statement or implicit through a Time To Live [TTL] clause), the database adds another record with a special flag, called a tombstone. All these records stay in the database until the gc_grace_seconds period has passed; the default is 10 days.
In your case, the engine found out that most of the records retrieved were deleted, but they are still waiting for the gc_grace_seconds to pass, to let compaction reclaim the space. One possible option to fix the issue is to decrease gc_grace_seconds for that table.
For more information, please refer to this article from the Last Pickle.
One more important thing to keep in mind when working with Cassandra is that tombstone cells do not directly correlate to deletes.
When you insert a null value for an attribute in an insert, Cassandra internally marks that attribute/cell as a tombstone. So, even if you don't have a lot of deletes happening, you could end up with an enormous number of tombstones. The easy and simple solution is to not insert null values for an attribute while inserting.
As for the statement "Read 10 live and 5645464 tombstones cells in keyspace.table", my guess is that a query is scanning the table and, while doing so, reading 10 live cells and 5645464 tombstones (cells with a null value). You need to understand what types of queries are being executed to gain more insight into that.
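One way to avoid writing nulls is to build the INSERT from only the columns that actually have a value. The helper below is a hypothetical sketch: build_insert, the table name and the column names are made up for illustration, the %s placeholder style is an assumption about the client driver in use, and the table and column names are formatted straight into the statement, so they must come from trusted code rather than user input.

```python
def build_insert(table, row):
    """Return (cql, params) for an INSERT that skips columns whose value is None."""
    present = {column: value for column, value in row.items() if value is not None}
    if not present:
        raise ValueError("row has no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))  # placeholder style is an assumption
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, tuple(present.values())


# "phone" is None, so it is left out instead of being written as a null.
cql, params = build_insert("ks.users", {"id": 1, "name": "Ann", "phone": None})
print(cql)
print(params)
```

The resulting statement and parameter tuple can then be handed to whatever execute call the driver provides.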

Overwrite row in cassandra with INSERT, will it cause tombstone?

Writing data to Cassandra without causing it to create tombstones is vital in our case, due to the amount of data and the speed required. So far we have only written each row once and never needed to update it again, only to fetch the data.
Now there is a case where we actually need to write data and then complete it with more data that becomes available after a while.
It can be done by either:
overwriting all of the data in a row again using INSERT (all data is available), or
performing an UPDATE only on the new data.
What is the best way to do it, bearing in mind that speed and not creating tombstones are important?
Tombstones will only be created when deleting data or using TTL values.
Cassandra aligns very well with your described use case. Incrementally adding data will work for both INSERT and UPDATE statements. Cassandra will store data in different locations when data is added over time for the same partition key. Periodic compactions will merge the data for a single key again to optimize access and free disk space. This happens based on the timestamps of the written values and does not create any new tombstones.
You can learn more about how Cassandra stores data e.g. here.
It would be more efficient to do an update to add new or changed data. There is no need to rewrite the old data that isn't changing and it would be inefficient to make Cassandra rewrite it.
When you do an insert or update, Cassandra keeps a timestamp for the modify time for each column. When you do a read, Cassandra collects all the writes for that key from in memory, from on disk, and from other replicas depending on the consistency setting. It will then merge the column data so that the newest value is used for each column.
When data is compacted on disk, if there are separate updates for different columns of a row, those will be combined into a single row in the compacted data.
You don't need to worry about creating tombstones by doing an update unless you are using an update to set a TTL (Time To Live) value. In your application it sounds like you never delete data, so you will never have any tombstones.

How does Cassandra manage insertion, update and deletion of columns and column data internally?

I am getting confused by some concepts regarding Cassandra.
What do we actually mean by updating a Cassandra row? Does it mean adding more columns, updating the value of a column, or both?
When we add more columns to a row, is the previous row in the SSTable invalidated and a new row entry inserted in the SSTable with the newly added columns?
Since an SSTable is immutable, will each update of column data, addition of a column or deletion of column data result in invalidating the previous row and inserting a new row with all the previous columns plus the new column?
Please help.
What do we actually mean by updating a Cassandra row? Does it mean adding more columns, updating the value of a column, or both?
In Cassandra, updating a row and inserting a row are the same operation. Both lead to adding data to a memtable (an in-memory sstable), which is later flushed to disk and becomes an sstable (a log line is also written to the commit log if persistent writes are enabled). If you insert a column (by the way, in Cassandra terms a column is the same as a cell, and a row is known as a partition, which you might find useful if you do any further reading) which already exists, e.g.:
INSERT INTO db.tbl (id, value) VALUES ('text_id1', 'some text as a value');
INSERT INTO db.tbl (id, value) VALUES ('text_id1', 'some text as a value');
You'll end up with 1 partition, since the first one is overwritten by the second insert. This means that inserting partitions with duplicate keys leads to the previous one being overwritten (and the overwrite is based on the timestamp at the time of insert, last write wins).
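The "insert and update are the same operation" idea can be pictured with a tiny in-memory model. This is a loose sketch and not how Cassandra's memtable is implemented: the Memtable class and its methods are hypothetical, and a simple counter stands in for write timestamps.

```python
import itertools


class Memtable:
    """Toy memtable: every write is an upsert keyed by the primary key."""

    def __init__(self):
        self._rows = {}
        self._clock = itertools.count(1)  # stand-in for write timestamps

    def write(self, key, columns):
        """Handle INSERT and UPDATE identically: store the given columns."""
        timestamp = next(self._clock)
        row = self._rows.setdefault(key, {})
        for column, value in columns.items():
            row[column] = (timestamp, value)  # later write replaces the earlier one

    def read(self, key):
        return {column: value for column, (_, value) in self._rows.get(key, {}).items()}

    def __len__(self):
        return len(self._rows)


memtable = Memtable()
memtable.write("text_id1", {"value": "some text as a value"})
memtable.write("text_id1", {"value": "some text as a value"})
print(len(memtable), memtable.read("text_id1"))
```

Writing the same key twice leaves a single entry, which mirrors the two INSERT statements above.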
When we add more columns (cells) to a row (partition), is the previous row in the SSTable invalidated and a new row entry inserted in the SSTable with the newly added columns?
For CQL, the previous columns will just contain a null value. No invalidation will happen; you can alter schemas as you please. If you delete a column, its data will be removed during the next compaction, with the aim of reclaiming disk space.
Since an SSTable is immutable, will each update of column data, addition of a column or deletion of column data result in invalidating the previous row and inserting a new row with all the previous columns plus the new column?
Kind of. SSTables are merged into larger SSTables when necessary, and how this is done depends on the compaction strategy being used. There are two flavours, size-tiered and levelled compaction. Covering how they work is a whole separate question that has been answered by people who are smarter than me, so have a read here.
Updating is covered here:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_write_update_c.html
As you note, SSTables are immutable, so you're probably wondering what happens when a later write supersedes data already in an SSTable. The storage engine reads from all tables that might have data for a requested row (as determined by bloom filters for each table). Understanding the read path might clarify this for you:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_reads_c.html
Specifically:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_read_path_c.html

Deleting a row of supercolumns and immediately replacing it with new data

Say I have a row of super-columns in Cassandra. I delete the entire row (it is now marked with a tombstone). I then immediately (before any compaction / nodetool repair) add different data with the exact same row key. My question is: does Cassandra handle this properly and delete the old data, or is there a risk of orphaned sstables that should have been deleted?
It all depends on the timestamps. The later timestamp wins, so if the delete's timestamp is before the modification's timestamp, the modification wins and the new data is kept.
Dean
PlayOrm for Cassandra Developer
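The "later timestamp wins" rule for a row delete followed by new writes can be sketched as a simple filter. This is a simplified model with hypothetical names (live_cells, row_deleted_at); treating a cell as live only when it is strictly newer than the delete is an assumption made for the sketch.

```python
def live_cells(row_deleted_at, cells):
    """Return the cells that survive a row-level delete.

    cells maps column name -> (timestamp, value); row_deleted_at is None
    when the row was never deleted.
    """
    if row_deleted_at is None:
        return {name: value for name, (_, value) in cells.items()}
    return {
        name: value
        for name, (timestamp, value) in cells.items()
        if timestamp > row_deleted_at  # only writes after the delete stay visible
    }


cells = {
    "old_col": (10, "written before the delete"),
    "new_col": (30, "written after the delete"),
}
print(live_cells(20, cells))
```

Data written before the delete stays hidden, while data written afterwards under the same row key is returned.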

Resources