Clarifying the Cassandra update process

When we execute a read query, it retrieves data from an SSTable, if I understand correctly.
What will happen if I retrieve the last-updated data before compaction happens?
In other words, what does Cassandra do to retrieve data that is in the memtable but not yet in an SSTable?

Here is a diagram of the Cassandra read path.
There are processes which check both RAM and disk for the requested data. When data is found in multiple places, it is reconciled (by most-recent timestamp) and returned.
So to answer your question: when the requested data is not present in an SSTable, the results from a memtable can be returned.
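As a rough sketch of that reconciliation step (plain Python, illustrative only, not the actual storage-engine code): each source, whether memtable or SSTable, contributes timestamped column values, and the newest timestamp wins per column.

```python
# Sketch of Cassandra-style read reconciliation: each source (memtable or
# SSTable) maps column -> (value, timestamp); the newest timestamp wins.

def reconcile(sources):
    """Merge column maps from several sources, keeping the latest write."""
    merged = {}
    for source in sources:
        for column, (value, ts) in source.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged

# An SSTable flushed earlier, and a memtable holding a newer update.
sstable = {"name": ("alice", 100), "age": ("30", 100)}
memtable = {"age": ("31", 200)}  # updated after the flush

result = reconcile([sstable, memtable])
# The memtable's newer "age" shadows the SSTable's copy.
```

The same merge applies whether the newer cell lives in the memtable or in a more recently flushed SSTable.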

Related

How does a Cassandra read work with a single-column data model when a partition spans multiple SSTables?

We use a very simple key-value data model in Cassandra, and our partition key is spread across 17 SSTables. I would like to understand how reads work in our concrete case.
If I understand correctly, a general Cassandra read needs to search for the newest version of each column in the memtable and in different SSTables, until it has retrieved all columns and merged them.
Since SSTables are sorted by time and our data model is single-column, ideally our read operations should just hit the newest SSTable containing our partition key, since it will contain the whole row.
Will our read operations hit all 17 SSTables, or just the newest one containing the searched partition key?
Cassandra will search all of them, as it isn't sure which columns exist where (DML occurs at the cell level, so different versions of a cell can exist in different SSTables and must be reconciled). Reads are done at the partition level. However, Cassandra can filter out SSTables if it knows the partition key doesn't exist in certain ones. That's why compaction is important for optimal reads: it removes the unnecessary cells.
Will our read operations hit the 17 SSTables? or just the newest one containing the searched partition key?
To add to Jim's answer, Cassandra has something called a bloom filter for this. Essentially, it's a probabilistic structure that can tell you one of two things:
The SSTable might contain the data requested.
OR
The SSTable definitely does not contain the data requested.
This should prevent Cassandra from having to scan all 17 SSTables. My advice would be to run a query with TRACING ON in cqlsh, and it'll tell you just how many SSTables it needed to look through.
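A bloom filter's two answers can be sketched in a few lines of plain Python (the sizes and hash scheme here are hypothetical; Cassandra's real filters are sized per SSTable and live off-heap):

```python
import hashlib

# Minimal bloom-filter sketch: k hash functions set k bits per key.
# "Might contain" can be a false positive; "does not contain" is certain.
class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive k independent bit positions from the key.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False is definitive, so the SSTable can be skipped entirely.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("partition-key-1")
# might_contain("partition-key-1") -> True (the key was added)
# A key that was never added almost always returns False.
```

Because "no" is never wrong, an SSTable whose filter answers False can be skipped without touching disk, which is exactly what keeps a 17-SSTable partition read from actually opening all 17 files.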

Cassandra Load status does not update (nodetool status)

Using nodetool status, I can read the Load of each node. Adding or removing data from a table should have a direct impact on that value. However, the value remains the same, no matter how many times the nodetool status command is executed.
Cassandra documentation states that the Load value takes 90 seconds to update. Even allowing several minutes between running the command, the result is always wrong. The only way I was able to make this value update, was to restart the node.
I don't believe it is relevant, but I should add that I am using docker containers to create the cluster.
In the documentation that you linked, under Load, it also says:
Because all SSTable data files are included, any data that is not cleaned up, such as TTL-expired cell or tombstoned data is counted.
It's important to note that when Cassandra deletes data, the data is marked with a tombstone and doesn't actually get removed until compaction. Thus, the load doesn't decrease immediately. You can force a major compaction with nodetool compact.
You can also try flushing the memtable if data is being added. The Apache documentation notes that:
Cassandra writes are first written to the CommitLog, and then to a per-ColumnFamily structure called a Memtable. When a Memtable is full, it is written to disk as an SSTable.
So you either need to add more data until the memtable is full, or you can run a nodetool flush (documented here) to force it.
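A toy model of that behavior (illustrative names only, not Cassandra's API): writes accumulate in an in-memory memtable and only show up in the on-disk "load" once a threshold is crossed or a flush is forced, which is what nodetool flush does.

```python
# Toy model: writes land in a memtable; when it reaches a threshold
# (or a flush is forced), it becomes a new immutable SSTable on "disk".
class ToyNode:
    def __init__(self, memtable_limit=3):
        self.memtable = {}
        self.sstables = []            # list of immutable dicts on "disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()              # automatic flush when "full"

    def flush(self):                  # what `nodetool flush` forces
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def load(self):                   # only on-disk SSTable data counts
        return sum(len(t) for t in self.sstables)

node = ToyNode(memtable_limit=3)
node.write("a", 1)
node.write("b", 2)
# Two writes: still below the threshold, so load() is still 0.
node.flush()
# After the forced flush, the on-disk "load" reflects both writes.
```

This is why the reported Load appears frozen until either enough data arrives to fill the memtable or a flush is triggered manually.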

Is update in place possible in Cassandra?

I have a table in Cassandra where I populate some rows with thousands of entries (each row has 10,000+ columns). The entries in the rows are updated very frequently; basically just one field (an integer) is updated with different values. All other column values remain unmodified. My question is: will the updates be done in place? How good is Cassandra for frequent updates of entries?
First of all, every update is also a sequential write for Cassandra, so as far as Cassandra is concerned there is no difference between an update and a write.
The real question is how soon you need those writes to be available for reading. As @john suggested, all writes first go to a mutable memtable which resides in memory. So every update is essentially appended as a new sequential entry to the memtable for a particular CQL table. It is also concurrently written to the commitlog for durability (synced periodically, every 10 seconds by default).
When the memtable is full, or the total commitlog size threshold is reached, Cassandra flushes all the data to an immutable Sorted String Table (SSTable). After the flush, compaction is the procedure in which, for each partition key, the entries with the new column values are kept and all the previous (pre-update) values are removed.
Frequent flushing brings the overhead of frequent sequential writes to disk, plus compaction, which can take a lot of I/O and have a serious impact on Cassandra's performance.
As far as reads go, Cassandra will first try the row cache (if it's enabled) or the memtable. If the data isn't there, it will go to the bloom filter, key cache, partition summary, partition index, and finally the SSTable, in that order. Once the data for all the column values has been collected, it is merged in memory; the column values with the latest timestamps are returned to the client, and an entry is made in the row cache for that partition key.
So yes, when you query a partition key, Cassandra will scan all the SSTables for that particular CQL table, plus the memtable for any column values that have not yet been flushed to disk.
Initially these updates are stored in an in-memory data structure called a memtable. Memtables are flushed to immutable SSTables at regular intervals.
So a single wide row will be read from various SSTables. It is during a process called 'compaction' that the different SSTables are merged into a bigger SSTable on disk.
Increasing the thresholds for flushing memtables is one way of optimizing. If updates come in very fast, before the memtable is flushed to disk, I think the update may happen in place in memory, though I'm not sure.
Also, each read operation checks the memtables first; if the data is still there, it is simply returned – this is the fastest possible access.
Cassandra read path:
When a read request for a row comes in to a node, the row must be combined from all SSTables on that node that contain columns from the row in question
Cassandra write path:
No, in-place updates are not possible.
As @john suggested, if you have frequent writes then you should delay the flush process. During the flush, the multiple writes to the same partition that are stored in the memtable will be written as a single partition in the newly created SSTable.
C* is fine for heavy writes. However, you'll need to monitor the number of SSTables accessed per read. If the number is too high, you'll need to review your compaction strategy.
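The append-then-compact behavior described in the answers above can be sketched in plain Python (illustrative only): every update adds a new timestamped cell rather than overwriting anything, and compaction later keeps only the newest cell per column.

```python
# Every update is an append: a new (column, value, timestamp) cell.
# Compaction later drops the shadowed (older) cells.
log = []  # stand-in for cells accumulated across memtable/SSTables

def update(column, value, timestamp):
    log.append((column, value, timestamp))   # append, never overwrite

def compact(cells):
    """Keep only the newest cell per column, as compaction does."""
    latest = {}
    for column, value, ts in cells:
        if column not in latest or ts > latest[column][1]:
            latest[column] = (value, ts)
    return latest

update("counter", 1, 100)
update("counter", 2, 200)   # frequent update of the same field
update("counter", 3, 300)

compacted = compact(log)
# Three cells were appended, but only the newest value survives.
```

This is why frequent updates of a single field are cheap to write but accumulate shadowed cells until compaction runs.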

Retrieving "tombstoned" records in Cassandra

My question is very simple: is it in any way possible to retrieve columns that have been marked with a tombstone before the GCGraceSeconds period expires (default 10 days)? If yes, what would be the exact CQL query for that?
If I understand the deletion process correctly, the tombstones are written to the memtables, while the immutable SSTables still hold the deleted data until compaction runs. So before compaction occurs, is there any way to read the tombstoned data from either the memtable or an SSTable?
I'm using CQL 3 at the cqlsh command prompt, with Cassandra 2.0.
You are right, when a tombstone is inserted it usually doesn't immediately delete the underlying data (unless all your data is in a memtable). However, you can't control when it does. If you don't have much data and compaction happens quickly, the underlying data may be deleted very quickly, much sooner than 10 days.
There is no query to read deleted data, but you can inspect all your SSTables with sstable2json to see if they contain the deleted data.
Just to add to the previous answer: use a low value of gc_grace_seconds for the column families that have frequent deletions. It will take some time for GC, but the tombstones are expected to get cleared.
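The role of gc_grace_seconds can be sketched like this (illustrative Python; the function name is hypothetical, not Cassandra's API): a tombstone hides the data from reads immediately, but compaction is only allowed to purge it once the grace period has elapsed.

```python
GC_GRACE_SECONDS = 864_000   # default: 10 days, in seconds

def is_purgeable(tombstone_ts, now, gc_grace=GC_GRACE_SECONDS):
    """Compaction may drop a tombstone only after gc_grace has passed."""
    return now - tombstone_ts >= gc_grace

deleted_at = 1_000_000
# One day later: the data is already invisible to reads, but the
# tombstone (and possibly shadowed data in other SSTables) remains.
one_day_later = is_purgeable(deleted_at, deleted_at + 86_400)       # False
# Eleven days later: compaction is allowed to purge it for good.
eleven_days_later = is_purgeable(deleted_at, deleted_at + 11 * 86_400)  # True
```

Lowering gc_grace_seconds shrinks that window, at the cost of a greater risk of deleted data "resurrecting" if a node misses the delete and rejoins after the tombstone is purged.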

Cassandra's write operations are sequential. Why is this not an issue?

This is as per the official documentation.
All writes are sequential, which is the primary reason that writes perform so well in Cassandra. No reads or seeks of any kind are required for writing a value to Cassandra because all writes are append operations.
I am confused because, in the case of an insert with a duplicate primary key, won't Cassandra need to search first in the memtable (or in the SSTable, if the data has already been flushed)?
So if a row with user id 123 is already present and we insert another row with 123, does it fail because internally it does a read based on that key? This is the doubt I have; can someone please clarify?
There is no notion of duplicate keys in Cassandra. Every change written to Cassandra has a timestamp, and Cassandra performs timestamp resolution, meaning the data with the latest timestamp always wins and is returned. On the read path, the content for a key from the SSTables is merged with the content for the same key in the memtable, if it exists, and the data with the latest timestamp is returned. It is worth noting that each column has its own timestamp.
For example:
Let's assume that at time 139106495223456 you write the following:
123 => {column1:foo column1_timestamp:139106495223456}
Then, a few microseconds later (139106495223470), you write to the same key:
123 => {column1:bar column1_timestamp:139106495223470}
Both operations will succeed. When you try to read the key, the one with column1:bar is returned because it has the latest timestamp.
Now you may wonder how this works with deletes. Deletes are written the same way, except the column/key being deleted is marked with a tombstone. If the tombstone has the latest timestamp, the row or column is considered deleted.
You may also wonder how this plays with sequential writes to disk, as these tombstones and old columns consume space. That is true, and it is why compaction exists: it takes care of merging SSTables and removing expired tombstones.
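The foo/bar example above, extended with a delete, can be sketched in plain Python (illustrative only): resolution picks the cell with the newest timestamp, and a tombstone is just a cell whose value marks deletion.

```python
TOMBSTONE = object()   # sentinel standing in for a deletion marker

def resolve(cells):
    """Last-write-wins: return the value with the newest timestamp,
    or None if the winning cell is a tombstone (data is deleted)."""
    value, ts = None, -1
    for v, t in cells:
        if t > ts:
            value, ts = v, t
    return None if value is TOMBSTONE else value

cells = [
    ("foo", 139106495223456),          # original write
    ("bar", 139106495223470),          # later update wins on read
]
first_read = resolve(cells)            # "bar"

cells.append((TOMBSTONE, 139106495223999))  # delete: newest cell of all
second_read = resolve(cells)           # None: the data reads as deleted
```

Note that no existing cell is ever modified; both the update and the delete simply append a newer cell that shadows the older ones until compaction removes them.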
You can read more about Cassandra write/read path here:
http://www.planetcassandra.org/blog/category/Cassandra%20read%20path
http://www.datastax.com/docs/1.1/dml/about_writes
