How to convert data in Cassandra commit logs to a readable format

Is it possible to see the data in the commit log? If so, how can we convert it to a readable form that we can interpret?

Commit log files are binary files in an internal format maintained by Cassandra (they are not encrypted by default, just not meant to be read directly), so you won't be able to read them as plain text.
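Uses of the commit log:
If Cassandra wrote SSTables to disk on every update, it would be completely I/O bound and very slow.
So Cassandra uses a few tricks to get better performance. Instead of writing SSTables to disk on every column update, it keeps the updates in memory (the memtable) and flushes those changes to disk periodically to keep the I/O at a reasonable level. The commit log is what makes this safe: every write is also appended to it sequentially, so updates that exist only in memory can be replayed if the node crashes before a flush.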
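To make the buffering idea concrete, here is a minimal Python sketch under a toy model. The hypothetical `ToyNode` class appends each write to an in-memory list standing in for the commit log, stores it in a dict memtable, and flushes the memtable to an immutable, sorted "SSTable" once it holds `flush_threshold` entries. Real Cassandra flushes based on memtable memory size and other triggers, not an entry count.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy write path: commit log append + memtable, flushed in batches (hypothetical)."""
    flush_threshold: int = 4
    commit_log: list = field(default_factory=list)  # stands in for the on-disk log
    memtable: dict = field(default_factory=dict)     # in-memory updates
    sstables: list = field(default_factory=list)     # immutable, sorted flushes

    def write(self, key, value):
        # 1. Durability: sequential append to the commit log.
        self.commit_log.append((key, value))
        # 2. Speed: update the in-memory memtable (no SSTable I/O here).
        self.memtable[key] = value
        # 3. Only when the memtable is "full" is an SSTable written.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified again.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


node = ToyNode(flush_threshold=4)
for i in range(10):
    node.write(f"k{i}", i)
print(len(node.sstables), len(node.memtable))  # 2 2
```

With ten writes and a threshold of four, the toy node performs two SSTable writes instead of ten, while every write still reaches the commit log.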

Related

Will the Write-Ahead-Log become the bottleneck of Cassandra?

In a Cassandra database, a write needs to be logged in the Write Ahead Log first and then added to the memtable in memory. Since the Write Ahead Log is on disk, even though it performs sequential writes (i.e., append only), will it still be much slower than memory access and thus become the performance bottleneck for writes?
If I understand correctly, Cassandra supports a mechanism to keep the Write Ahead Log in the OS cache and then flush it to disk at a pre-configured interval (say, 10 seconds). However, does that mean all data changes made within those 10 seconds could be lost if the machine crashes?
You can control how the commit log is synced using the commitlog_sync setting. By default it's periodic, and the log is synced to disk every 10 seconds (controlled by the commitlog_sync_period_in_ms setting).
And yes, if you lose power, there is a risk that data in the commit log is lost. But Cassandra relies on the fact that you have multiple replicas, and if you set things up correctly, each replica should be in a separate rack at minimum (better still, in additional data centers) with separate power, etc.
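A rough sketch of the loss window on a single node, assuming an idealized periodic mode in which the commit log is synced exactly at multiples of `sync_period_ms` (in reality a background thread schedules the syncs and they drift). The hypothetical `lost_writes` function returns the writes that were acknowledged but not yet synced when the node crashed. With replication, such a write is only truly gone if every replica loses it.

```python
def last_sync_before(crash_ms: int, sync_period_ms: int) -> int:
    """Time of the most recent periodic sync at or before the crash (idealized)."""
    return (crash_ms // sync_period_ms) * sync_period_ms


def lost_writes(write_times_ms, crash_ms, sync_period_ms=10_000):
    """Writes acknowledged before the crash but after the last sync.

    The default of 10_000 ms mirrors the default commitlog_sync_period_in_ms.
    """
    synced_up_to = last_sync_before(crash_ms, sync_period_ms)
    return [t for t in write_times_ms if synced_up_to < t <= crash_ms]


writes = [1_000, 9_500, 12_000, 18_000, 25_000]
print(lost_writes(writes, crash_ms=19_000))  # [12000, 18000]
```

A shorter sync period shrinks the window at the cost of more frequent fsyncs.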

Why do Tombstones affect read performance but not updates?

The articles I've read say that tombstones affect read performance in Cassandra. I'm reading about how data is updated in Cassandra, and it looks like data is written with a timestamp without modifying or reading the current data.
So when a read is performed before compaction has run, filtering is needed to pick the latest value, right? If that's the case, aren't tombstones the same thing? Why do they hurt performance when updates to a row don't?
In Cassandra, an update is a mutation, just like an insert or a delete, and, except for LWTs and some list operations, all mutations are simply appended to the commit log and memtable without reading the data on disk. So they are very fast: no checks are performed.
A read, in contrast, needs to get all versions of the data from disk and the memtable, and then build the current version of the data based on the timestamps. Tombstones have to be kept in memory during the read, because we may later read data from disk with an older timestamp and we need to detect that it was deleted. So a read that crosses many tombstones has to load and track all of them, even though none of them are returned to the client.
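The reconciliation step can be sketched in Python as a last-write-wins merge, assuming a simplified model where each source (memtable or SSTable) maps a key to a `(timestamp, value)` pair and a deletion is represented by the hypothetical sentinel `TOMBSTONE`. Real Cassandra reconciles per cell and has extra tie-breaking rules; this only illustrates why a tombstone must be carried through the whole read.

```python
TOMBSTONE = object()  # hypothetical marker for a deletion


def read(key, sources):
    """Merge all versions of `key` across sources; the newest timestamp wins."""
    newest = None
    for source in sources:  # memtable first, then each SSTable
        version = source.get(key)
        if version is None:
            continue
        # A tombstone is kept if it is the newest version seen so far: an
        # older live value may still turn up in a later SSTable and must
        # be suppressed.
        if newest is None or version[0] > newest[0]:
            newest = version
    if newest is None or newest[1] is TOMBSTONE:
        return None  # missing or deleted
    return newest[1]


memtable = {"a": (30, TOMBSTONE)}
sstable_1 = {"a": (20, "v2"), "b": (5, "x")}
sstable_2 = {"a": (10, "v1")}
print(read("a", [memtable, sstable_1, sstable_2]))  # None (deleted)
print(read("b", [memtable, sstable_1, sstable_2]))  # x
```

An update simply adds one more version to this merge, whereas a range read has to hold every tombstone in the range while it scans, which is where the cost comes from.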

Getting database for Cassandra or building one from scratch?

So, I'm new to Cassandra and I was wondering what the best approach would be to learn Cassandra.
Should I first focus on the design of a database and build one from scratch?
I also read that Cassandra is great for writes. How can one observe that? Is there open source data that one can use? (I didn't really know where to look.)
A good starting point for Cassandra is the free online courses from DataStax (an enterprise-grade Cassandra distribution): https://academy.datastax.com/courses
As for Cassandra being good at writing data, have a look here: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html
The write path comes down to these points:
Write the data to the commit log (sequential appends only, no random I/O; on spinning disks it should therefore be on its own disk to avoid head movement, which is not an issue with SSDs)
Write the data to the memtable (kept in memory, so very fast)
So in terms of disk, a write is at first just a simple append to the commit log. No data is written directly to the SSTables: it goes to the commit log and the memtable, and the memtable is flushed to disk as an SSTable from time to time. Updates do not change an SSTable on disk (SSTables are immutable; an update is written separately with a new timestamp), and a delete does not remove data from SSTables (again, they are immutable; a tombstone is written instead).
All updates and deletes produce new entries in the memtable and SSTables. To remove deleted data and the old versions left behind by updates, SSTables on disk are compacted from time to time into new ones.
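Compaction can be sketched the same way, assuming SSTables modeled as dicts of `key -> (timestamp, value)`, with `None` standing in for a tombstone. The hypothetical `compact` function keeps only the newest version of each key and drops tombstones older than a `gc_before` cutoff, loosely mirroring Cassandra's `gc_grace_seconds`. Real compaction strategies also decide *which* SSTables to merge and are far more involved.

```python
def compact(sstables, gc_before):
    """Merge SSTables into one, keeping the newest version of each key."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Purge tombstones (value None) that are old enough to be forgotten,
    # and emit the result sorted by key like a new SSTable.
    return {
        key: (ts, value)
        for key, (ts, value) in sorted(merged.items())
        if not (value is None and ts < gc_before)
    }


old = {"a": (10, "v1"), "b": (10, "x")}
new = {"a": (20, "v2"), "b": (15, None)}  # update "a", delete "b"
print(compact([old, new], gc_before=100))  # {'a': (20, 'v2')}
print(compact([old, new], gc_before=12))   # {'a': (20, 'v2'), 'b': (15, None)}
```

The second call shows why the grace period matters: a tombstone that is still too recent must survive compaction so it can keep shadowing older copies elsewhere.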
Also read about the different compaction strategies (choosing the right one helps performance), the replication factor (how many copies of your data the cluster keeps) and consistency levels (how Cassandra determines whether a write or read is successful; hint: ALL is almost always the wrong choice, look at QUORUM).
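As a quick worked example of why QUORUM is usually preferred: a quorum is a majority of replicas, floor(RF / 2) + 1, and when the number of replicas acknowledging a write (W) plus those answering a read (R) exceeds the replication factor (RF), every read overlaps at least one replica holding the latest write. A small Python sketch of that arithmetic, for a single data center (the function names are illustrative):

```python
def quorum(rf: int) -> int:
    """Majority of replicas, as used by consistency level QUORUM."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, w: int, r: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return r + w > rf


for rf in (3, 5):
    q = quorum(rf)
    # RF, quorum size, overlap guaranteed, replicas that may be down
    print(rf, q, reads_see_latest_write(rf, q, q), rf - q)
# 3 2 True 1
# 5 3 True 2
```

With ALL, a single unavailable replica fails the request, whereas QUORUM at RF=3 still gives read-your-writes overlap while tolerating one replica being down.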

Cassandra commit log clarification

I have read several documents about the Cassandra commit log and, to me, they contain conflicting information about this "structure(s)". The diagram shows that when a write occurs, Cassandra writes to the memtable and the commit log. The confusing part is where this commit log resides.
The diagram that I've seen over and over shows the commit log on disk. However, if you do some more reading, they also talk about a commit log buffer in memory, and that piece of memory is flushed to disk every 10 seconds.
DataStax Documentation states:
"When a write occurs, Cassandra stores the data in a memory structure called memtable, and to provide configurable durability, it also appends writes to the commit log buffer in memory. This buffer is flushed to disk every 10 seconds".
Nowhere in their diagram do they show a memory structure called a commit log buffer. They only show the commit log residing on disk.
It also states:
"When a write occurs, Cassandra stores the data in a structure in memory, the memtable, and also appends writes to the commit log on disk."
So I'm confused by the above. Is it written to the commit log memory buffer, which is eventually flushed to disk (which I would assume is also called the "commit log"), or is it written to the memtable and commit log on disk?
Apache's documentation states this:
"Instead, like other modern systems, Cassandra provides durability by appending writes to a commitlog first. This means that only the commitlog needs to be fsync'd, which, if the commitlog is on its own volume, obviates the need for seeking since the commitlog is append-only. Implementation details are in ArchitectureCommitLog.
Cassandra's default configuration sets the commitlog_sync mode to periodic, causing the commitlog to be synced every commitlog_sync_period_in_ms milliseconds, so you can potentially lose up to that much data if all replicas crash within that window of time."
What I have inferred from the Apache statement is that ONLY because of the asynchronous nature of writes (acknowledgement of a cache write) could you lose data (it even states you can lose data if all replicas crash before it is flushed/sync'd).
I'm not sure what I can infer from the DataStax documentation and diagram as they've mentioned two different statements regarding the commit log - one in memory, one on disk.
Can anyone clarify what I consider a poorly worded and conflicting set of documentation?
I'll assume there is a commit log buffer, as they both reference it (yet DataStax doesn't show it in the diagram). How and when it is managed is, I think, key to understanding this.
Generally when explaining the write path, the commit log is characterized as a file, and it's true that the commit log is the on-disk storage mechanism that provides durability. The confusion arises when going deeper, where the OS buffer cache and the need to issue fsyncs come in. The reference to "commit log buffer in memory" is talking about the OS buffer cache, not a memory structure in Cassandra. You can see in the code that there's no separate in-memory structure for the commit log; rather, the mutation is serialized and written to a file-backed buffer.
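To illustrate the "file-backed buffer" idea, here is a Python sketch using the standard `mmap` module (this is not Cassandra's code; the `ToySegment` class, segment size, and length-prefixed record format are hypothetical). The serialized mutation is copied into a memory-mapped segment file, so the bytes sit in the OS page cache, which is the "commit log buffer in memory", until `flush()` (an `msync`) forces them to disk. That forced sync is roughly the step the commitlog_sync setting schedules.

```python
import mmap
import os
import struct
import tempfile

SEGMENT_SIZE = 4096  # hypothetical, tiny segment size for illustration


class ToySegment:
    """Append-only, memory-mapped commit log segment (illustrative only)."""

    def __init__(self, path):
        with open(path, "wb") as f:
            f.truncate(SEGMENT_SIZE)  # pre-allocate the segment file
        self._file = open(path, "r+b")
        self._buf = mmap.mmap(self._file.fileno(), SEGMENT_SIZE)
        self.position = 0

    def append(self, mutation: bytes) -> int:
        record = struct.pack(">I", len(mutation)) + mutation  # length prefix
        end = self.position + len(record)
        if end > SEGMENT_SIZE:
            raise ValueError("segment full; a real log would open a new segment")
        self._buf[self.position:end] = record  # lands in the OS page cache
        self.position = end
        return end

    def sync(self):
        self._buf.flush()  # msync: force the dirty pages to disk

    def close(self):
        self._buf.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "CommitLog-toy.log")
seg = ToySegment(path)
seg.append(b"INSERT k1=v1")
seg.sync()
seg.close()
```

Between `append` and `sync`, the write is visible to the OS but not guaranteed to survive a power loss, which is exactly the window the periodic mode accepts.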
Cassandra comes with two strategies for managing fsync on the commit log.
commitlog_sync
(Default: periodic) The method that Cassandra uses to acknowledge writes:
periodic: (Default: 10000 milliseconds [10 seconds])
Used with commitlog_sync_period_in_ms to control how often the commit log is synchronized to disk. In periodic mode, writes are acknowledged immediately, without waiting for the sync.
batch: (Default: disabled)
Used with commitlog_sync_batch_window_in_ms (Default: 2 ms) to control how long Cassandra waits for other writes before performing a sync. When using this method, writes are not acknowledged until fsynced to disk.
The periodic setting offers better performance at the cost of a small increase in the chance that data can be lost. The batch setting guarantees durability at the cost of latency.
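The difference between the two modes can be sketched as a toy timing model in Python, assuming an fsync that takes a fixed `fsync_ms`, idealized timers, and a batch window measured from the write's arrival (worst case). The function and its parameters are illustrative, not Cassandra APIs. In periodic mode the write is acknowledged as soon as it arrives and becomes durable at the next sync tick; in batch mode the acknowledgement waits for the window to close and the fsync to complete.

```python
import math


def ack_and_durable_ms(arrival_ms, mode, period_ms=10_000,
                       batch_window_ms=2, fsync_ms=1):
    """Return (ack_time, durable_time) for one write under a toy model."""
    if mode == "periodic":
        # Acknowledged immediately; durable only after the next periodic sync.
        next_sync = math.ceil(arrival_ms / period_ms) * period_ms
        return arrival_ms, next_sync + fsync_ms
    if mode == "batch":
        # Wait for the batch window to close, fsync, then acknowledge.
        durable = arrival_ms + batch_window_ms + fsync_ms
        return durable, durable
    raise ValueError(f"unknown mode: {mode}")


print(ack_and_durable_ms(3_000, "periodic"))  # (3000, 10001)
print(ack_and_durable_ms(3_000, "batch"))     # (3003, 3003)
```

Periodic mode gives the lowest write latency but leaves a write exposed for up to the sync period; batch mode closes that gap by adding roughly the window plus the fsync time to every write.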

What makes the commit log faster than writing to SSTables in Cassandra?

I am currently exploring Cassandra in depth, as I want to specialize in it. I came across the Cassandra "write path" and am now trying to understand the commit log. As I understand it, a write is acknowledged once it has been written first to the commit log and then to the memtable (an in-memory table). But commit logs are written to the file system, just like SSTables. So what is the magic that makes writing to the commit log faster? As many posts and documentation state:
A write is said to be successful once it is written to the commit log and memory, so there is very minimal disk I/O at the time of write.
Why isn't a write instead made to an SSTable and the memtable before being considered successful?
SSTables are immutable, so appending to them would be impossible. Therefore writes are sent to both a memtable and the commit log (for durability). Under normal operations the memtable is periodically flushed to disk as an SSTable, after which it is compacted with existing SSTables to make reads more efficient. The commit log is only replayed on node restart to recover writes that had not been flushed to SSTables.
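The replay step can be sketched in Python, assuming a simplified model where each commit log entry carries a monotonically increasing position and the node knows the highest position already persisted in an SSTable (Cassandra tracks similar replay positions, but the names here are hypothetical). On restart, only entries after that position are reapplied to a fresh memtable.

```python
def replay(commit_log, flushed_up_to):
    """Rebuild the memtable from commit log entries not yet in SSTables.

    commit_log:    list of (position, key, value), in append order
    flushed_up_to: highest position already persisted in an SSTable
    """
    memtable = {}
    for position, key, value in commit_log:
        if position > flushed_up_to:
            memtable[key] = value  # reapply the mutation in log order
    return memtable


log = [(1, "a", "v1"), (2, "b", "x"), (3, "a", "v2"), (4, "c", "y")]
print(replay(log, flushed_up_to=2))  # {'a': 'v2', 'c': 'y'}
```

Once a memtable has been flushed, commit log segments that only contain flushed data can be discarded or recycled, so the log does not grow without bound.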
SSTables are created from flushed memtables. While commit log syncs do happen periodically, memtable flushes do not: a memtable first needs to hit a certain threshold (i.e. size) before being written to disk. This makes sure the resulting SSTable is large enough to be handled efficiently. If memtables were flushed periodically, say a couple of times a minute, we could end up with lots of tiny SSTables that would then have to be compacted again.
Writing to Cassandra is fast because appending to a log is already very fast, and the only other work is inserting into a sorted in-memory data structure (such as a tree or skip list) called the memtable. Memtables are sorted, so when they are written to disk the SSTables are sorted too, which makes reads efficient, though not as fast as writes.
The point to note is that clients never interact with the commit log directly. Its only purpose is durability: it is a backup of writes that have not yet been flushed. If your machine dies, all the data in the memtable is lost, so on restart the node replays the commit log to rebuild the memtable.
You want your reads to be fast, and that is only possible if the data is stored sequentially (sorted), which also makes it easier to cache. If you wrote to an SSTable on disk for every write, you would either end up with data scattered across the disk, making reads random and slow, or have to wait for the disk to rotate into position so that the writes stay sequential.
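Because a flushed SSTable is sorted by key, a lookup can use binary search instead of scanning everything. A minimal Python sketch using the standard `bisect` module, assuming an SSTable modeled as a sorted key list plus a list of `(key, value)` pairs (real SSTables rely on an on-disk index and a bloom filter; `flush` and `get` here are hypothetical helpers):

```python
import bisect


def flush(memtable):
    """Turn a memtable into an immutable, key-sorted SSTable."""
    items = sorted(memtable.items())
    keys = [k for k, _ in items]
    return keys, items


def get(sstable, key):
    """Binary-search a sorted SSTable: O(log n) comparisons."""
    keys, items = sstable
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return items[i][1]
    return None


table = flush({"m": 3, "c": 1, "x": 4, "f": 2})
print(get(table, "f"), get(table, "z"))  # 2 None
```

Sorting once at flush time is what lets every later read of that SSTable avoid a full scan.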

Resources