Why do Tombstones affect read performance but not updates? - cassandra

From the articles I have read, tombstones affect read performance in Cassandra. I'm reading about how data is updated in Cassandra, and it looks like new data is written with a timestamp without modifying or reading the current data.
So when a read is performed before compaction runs, filtering has to be done to pick the latest value, right? If that's the case, aren't tombstones the same thing, and why do they hurt performance while updates to a row don't?

In Cassandra, an update is a mutation, just like an insert or a delete. Except for LWTs and some of the list operations, all mutations are simply appended to the memtable/commit log without reading the data on disk, so they are very fast: no checks are performed.
A read operation, in contrast, needs to fetch all versions of the data from disk and the memtable and then reconcile the actual version of the data based on the timestamps. Tombstones have to be kept in memory during this merge, because data read from disk may carry an older timestamp, and we need to detect that it has been deleted.
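To make the merge concrete, here is a minimal CQL sketch (the demo.readings table and all values are hypothetical). Every statement before the final SELECT is a blind append; only the read reconciles versions by timestamp.

CREATE TABLE IF NOT EXISTS demo.readings (
    sensor_id text,
    reading_time timestamp,
    value double,
    PRIMARY KEY (sensor_id, reading_time)
);

-- Both writes below are blind appends; neither reads the existing row first.
INSERT INTO demo.readings (sensor_id, reading_time, value)
    VALUES ('s1', '2020-01-01 00:00:00', 1.0);
UPDATE demo.readings SET value = 2.0
    WHERE sensor_id = 's1' AND reading_time = '2020-01-01 00:00:00';

-- A delete is also just an append: it writes a tombstone cell.
DELETE FROM demo.readings
    WHERE sensor_id = 's1' AND reading_time = '2020-01-01 00:00:00';

-- Only the read merges all versions by write timestamp; here the tombstone
-- carries the newest timestamp, so the row no longer appears in the result.
SELECT value, WRITETIME(value) FROM demo.readings WHERE sensor_id = 's1';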

Related

Performance while writing events into Cassandra table

Query 1: Event data from a device is stored in a Cassandra table; obviously this is time-series data. If we need to store older-dated events (for example, events that were cached on the device due to some issue) at the current time, are we going to run into performance issues? If yes, what is the solution to avoid that?
Query 2: Is it good practice to write each event into the Cassandra table as soon as it comes in? Or shall we queue events for some time and write multiple events in one go, if that improves Cassandra write performance significantly?
Q1: This all depends on the table design. Usually this shouldn't be an issue, but it may depend on your access patterns & compaction strategy. If you have the table structure, please share it.
Q2: Individual writes shouldn't be a problem, but it really depends on your throughput requirements. If you write several data points that belong to the same partition key, you could potentially use unlogged batches; in that case Cassandra performs only one write for all the inserts in the batch. Please read this document.
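A minimal sketch of such an unlogged batch, assuming a hypothetical demo.events table in which all the batched rows share the same partition key (device_id):

BEGIN UNLOGGED BATCH
    INSERT INTO demo.events (device_id, event_time, payload)
        VALUES ('dev-1', '2020-01-01 00:00:01', 'a');
    INSERT INTO demo.events (device_id, event_time, payload)
        VALUES ('dev-1', '2020-01-01 00:00:02', 'b');
APPLY BATCH;

Because both rows target the same partition, the coordinator applies them as a single mutation; batching rows that span many partitions would have the opposite effect on performance.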

Performance - TTL vs Deleting a row in Cassandra

We have a massive data set that is written into millions of rows in Cassandra. We also have a scheduler that needs to process these records and remove them after processing them successfully.
We were wondering about deleting the row after processing vs. marking the row with a TTL (essentially delaying its deletion).
Are there any pros/cons of deletion vs. TTL w.r.t. Cassandra performance?
Thanks much
_DD
When using TTL the record is not removed from storage immediately; it is marked as a tombstone. It gets physically removed only when compaction occurs. Until that time the data impacts the node's processing, because it consumes resources until the compaction happens. When you do a range query, even the deleted (tombstoned) records are scanned by Cassandra. So using TTL to delete too many entries is considered an anti-pattern. The recommendation is to use temporary tables so that individual rows need not be removed; just drop the entire table, as sketched below.
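A minimal sketch of that temporary-table approach, using hypothetical per-batch table names managed by the application:

CREATE TABLE IF NOT EXISTS demo.pending_2020_01_01 (
    record_id uuid PRIMARY KEY,
    payload text
);

-- Once the scheduler has processed the whole batch, drop the table instead of
-- deleting rows one by one, so no row-level tombstones are created at all.
DROP TABLE IF EXISTS demo.pending_2020_01_01;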
From what little information you have given here, it sounds to me like you are using Cassandra as a queue, which is a well-known anti-pattern. You can read more about that here:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
However, to answer your basic question: there is little difference in performance between using TTLs and deletes. TTLs in C* are handled as tombstones, which is the same as a delete. The major difference is that a tombstone is not written for a record whose TTL has expired until that record is read again, whereas when a delete is issued a tombstone is created immediately. Tombstones in general cause significant performance problems within C*, and while there are some methods to mitigate the issues they create, having large numbers of them usually points to a poor data model or a poor use case for C*. If you are really looking at using C* as a queue, why not look at something more fit for that purpose, such as Redis?
Based on what I've read, TTL will probably be as fast as your fastest delete process could be. The reason is that a TTL doesn't have to seek the data in order to mark it with a tombstone. The TTL lives on the record, and when the record is read and the TTL has expired, it is marked with a tombstone.
http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html
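To make the two options concrete, here is a minimal sketch against a hypothetical demo.jobs table:

-- 1) Explicit delete after processing: this write creates a tombstone immediately.
DELETE FROM demo.jobs WHERE job_id = 123;

-- 2) TTL set at write time: the cell expires on its own and is treated as a
--    tombstone once the TTL has passed, with no separate delete write needed.
INSERT INTO demo.jobs (job_id, payload)
    VALUES (124, 'work item') USING TTL 86400;  -- expire after 24 hours

-- The remaining TTL of a cell can be inspected with the TTL() function.
SELECT payload, TTL(payload) FROM demo.jobs WHERE job_id = 124;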

How to convert data in Cassandra commit logs to a readable format

Is it possible to see the data in the commit log, and if so, how can we convert it into a readable form that we can interpret?
Commit log files are binary files maintained internally by Cassandra, so you won't be able to read them directly.
Uses of the commit log:
If Cassandra wrote SSTables to disk on every update, it would be completely IO-bound and very slow.
So Cassandra uses a few tricks to get better performance. Instead of writing SSTables to disk on every column update, it keeps the updates in memory (the memtable) and flushes those changes to disk periodically to keep the IO at a reasonable level. The commit log records every mutation as it arrives, so that anything still in memory can be replayed if the node crashes before a flush.
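As a small illustration of where the commit log surfaces at the CQL level, the keyspace-level durable_writes option (shown here with a hypothetical demo keyspace) controls whether mutations for that keyspace go through the commit log at all:

CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    AND durable_writes = true;

-- Setting durable_writes = false skips the commit log for this keyspace,
-- trading crash durability for slightly cheaper writes.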

Cassandra: Storing and retrieving large values (50 MB to 100 MB)

I want to store and retrieve values in Cassandra that range from 50 MB to 100 MB.
As per the documentation, Cassandra works well when the column value size is less than 10 MB. Refer here
My table is as below. Is there a different approach to this?
CREATE TABLE analysis (
    prod_id text,
    analyzed_time timestamp,
    analysis text,
    PRIMARY KEY (prod_id, analyzed_time)
) WITH CLUSTERING ORDER BY (analyzed_time DESC);
In my own experience, although in theory Cassandra can handle large blobs, in practice it can be really painful. In one of my past projects we stored protobuf blobs in C* ranging from 3 KB to 100 KB, but a few of them (~0.001%) were up to 150 MB in size. This caused problems:
Write timeouts. By default C* has a 10s write timeout, which is really not enough for large blobs.
Read timeouts. The same issue applies to the read timeout, read repair, hinted handoff timeouts and so on. You have to debug all these possible failures and raise all these timeouts, and C* has to read the whole heavy row from disk into RAM, which is slow.
I personally suggest not using C* for large blobs, as it's not very effective. There are alternatives:
Distributed filesystems like HDFS. Store a URL of the file in C* and the file contents in HDFS.
DSE (the commercial C* distro) has its own distributed FS called CFS on top of C*, which can handle large files well.
Rethink your schema so that rows are much lighter. But this really depends on your current task (and there's not enough information in the original question about it).
Large values can be problematic, as the coordinator needs to buffer each row on the heap before returning it to the client to answer a query. There's no way to stream the analysis text value.
Internally Cassandra is also not optimized to handle such a use case very well, and you'll have to tweak a lot of settings to avoid problems such as those described by shutty.
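One way to act on the "much lighter rows" suggestion above is to split each large value into fixed-size chunks on the client side; a minimal sketch, with a hypothetical analysis_chunks table and an arbitrary chunk size:

CREATE TABLE IF NOT EXISTS analysis_chunks (
    prod_id text,
    analyzed_time timestamp,
    chunk_id int,
    chunk blob,
    PRIMARY KEY ((prod_id, analyzed_time), chunk_id)
) WITH CLUSTERING ORDER BY (chunk_id ASC);

-- The client splits each value into, say, 1 MB pieces, writes one row per piece,
-- and reassembles them in chunk_id order on read, so no single cell is huge.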

Storing binary blobs in Cassandra

I am building a simple HTTP service that stores arbitrary binary objects. The service is backed by Cassandra and is a simplified version of Amazon's S3. The system must withstand a heavy write load and should be highly available on both the write and the read path.
The stored data is essentially immutable: it can be deleted, but it cannot be updated. Therefore, data inconsistency is not an issue. The datastore must be able to expire old data efficiently.
The service uses Netflix's Astyanax library, which provides a recipe for storing (large) binary objects in Cassandra.
I see two solutions to tackle the problem, both of which have pros and cons. For me it is hard to estimate which way fits Cassandra better.
Single table with TTL
Astyanax automatically chunks large objects into small pieces and stores them in a single table. A TTL is assigned to each blob to expire it after a certain period of time, and a compaction run removes the blobs once the TTL has expired.
This solution works and is pretty straightforward to implement. I started with the SizeTieredCompactionStrategy, but I think DateTieredCompactionStrategy might be the better choice when dealing with TTL'd data.
My main concern is: can Cassandra's compaction keep up? Does anyone have experience with a similar use case?
Sharding data by time
Another approach would be to shard the data by time. I could create a table for each day and store the chunks in that table. In this case I can drop the complete table to get rid of the expired data.
This solution requires a little more effort in the implementation, but simplifies and probably speeds up the deletion of expired data.
How performant is Cassandra in dropping a table?
The correct option for your scenario is DateTieredCompactionStrategy, with a TTL assigned to each blob.
Refer:
http://www.datastax.com/dev/blog/datetieredcompactionstrategy
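A minimal sketch of that combination, with a hypothetical blobs table and an arbitrary 7-day expiry:

CREATE TABLE IF NOT EXISTS blobs (
    object_id text,
    chunk_id int,
    chunk blob,
    PRIMARY KEY (object_id, chunk_id)
) WITH default_time_to_live = 604800  -- every row expires after 7 days
  AND compaction = {'class': 'DateTieredCompactionStrategy'};

-- DateTieredCompactionStrategy groups SSTables by write time, so whole
-- SSTables full of expired data can be dropped without rewriting them.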
