Cassandra SSTable and memory-mapped files

In the article Reading and Writing from SSTable Perspective (yes, quite an old article) the author says that Index.db and SSTable files are warmed up using memory-mapped files.
Row keys for each SSTable are stored in separate file called index.db,
during start Cassandra “goes over those files”, in order to warm up.
Cassandra uses memory mapped files, so there is hope, that when
reading files during startup, then first access on those files will be
served from memory.
I see the usage of MappedByteBuffer in CommitLogSegment, but not in the SSTable loader/reader. Also, just mapping a MappedByteBuffer onto the file channel doesn't load the file into memory; I think load() needs to be called explicitly.
So my question is: when Cassandra starts up, how does it warm up? And am I missing something in this article's statement?
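For reference, a minimal standalone Java sketch (my own example, not Cassandra code) of the map()/load() distinction raised above: FileChannel.map() only establishes the mapping, while MappedByteBuffer.load() makes a best-effort request to bring the mapped pages into physical memory.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWarmup {
    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]); // e.g. some *-Index.db file
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() alone does not read the file; pages are faulted in lazily on first access.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // load() asks the OS to bring the mapped region into physical memory,
            // but per its JavaDoc it is only a best-effort hint, not a guarantee.
            buffer.load();
            System.out.println("isLoaded() = " + buffer.isLoaded());
        }
    }
}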

'Going over index files' most probably refers to index sampling. At some point Cassandra read the index files on startup for sampling purposes.
Since Cassandra 1.2, the results of that process are persisted in the partition summary file (Summary.db).

Related

Getting database for Cassandra or building one from scratch?

So, I'm new to Cassandra and I was wondering what the best approach would be to learn Cassandra.
Should I first focus on the design of a database and build one from scratch?
Also, I read that Cassandra is great at writes. How can one observe that? Is there open-source data one can use? (I didn't really know where to look.)
A good starting point for Cassandra is the free online courses from DataStax (an enterprise-grade Cassandra distribution): https://academy.datastax.com/courses
And for Cassandra being good at writing data, have a look here: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html
The write path comes down to these points:
write the data to the commit log (append-only and sequential, no random IO; it should therefore be on its own disk to prevent head movements, though with SSDs this is not an issue)
write the data into the memtable (kept in memory, very fast)
So in terms of disk, a write is a simple append to the commit log in the first place. No data is written directly to the SSTables: it lives in the commit log and the memtable, which is flushed to disk from time to time as SSTables. Updates do not change an SSTable on disk (SSTables are immutable; an update is written separately with a newer timestamp), and a delete does not remove data from SSTables either (instead, a tombstone is written).
All updates and deletes therefore produce new entries in the memtable and SSTables. To remove deleted data and to get rid of old versions of updated data, SSTables on disk are compacted from time to time into a new one.
Also read about the different compaction strategies (they can help you get good performance), the replication factor (how many copies of your data the cluster should keep), and consistency levels (how Cassandra determines when a write or read is successful; hint: ALL is almost always the wrong choice, look at QUORUM).
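As a small illustration of that last point, here is a hedged sketch using the DataStax Java driver (4.x assumed; keyspace, table and column names are placeholders) that issues a write at QUORUM instead of ALL.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class QuorumWrite {
    public static void main(String[] args) {
        // With no explicit contact points, the driver connects to 127.0.0.1:9042 by default.
        try (CqlSession session = CqlSession.builder().build()) {
            SimpleStatement insert = SimpleStatement
                    .newInstance("INSERT INTO my_ks.my_table (id, val) VALUES (?, ?)", 42, "hello")
                    .setConsistencyLevel(DefaultConsistencyLevel.QUORUM);
            // Acknowledged once a quorum of replicas has accepted the write.
            session.execute(insert);
        }
    }
}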

How to convert data in cassandra commitlogs to readable format

Is it possible to see the data in the commit log? If so, how can we convert it to a readable form that we can interpret?
Commit log files are binary files maintained internally by Cassandra, so you won't be able to read them directly.
Uses of the commit log:
The commit log provides durability: if a node goes down before its memtables are flushed, the writes can be replayed from the commit log on restart. Beyond that, if Cassandra wrote SSTables to disk on every update, it would be completely IO-bound and very slow.
So Cassandra uses a few tricks to get better performance. Instead of writing SSTables to disk on every column update, it keeps the updates in memory and flushes those changes to disk periodically to keep the IO at a reasonable level.
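As a rough illustration of that pattern (a toy Java sketch under my own assumptions, not Cassandra's actual implementation): every write is appended to a sequential log file for durability and also put into an in-memory sorted map, and only when the map grows large enough is it flushed to an immutable on-disk file.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.TreeMap;

// Toy write path: append-only "commit log" plus an in-memory "memtable"
// that is flushed to an immutable "sstable" file once it gets big enough.
final class ToyWritePath {
    private final Path commitLog;
    private final TreeMap<String, String> memtable = new TreeMap<>();
    private int flushedSegments = 0;

    ToyWritePath(Path commitLog) {
        this.commitLog = commitLog;
    }

    void write(String key, String value) throws IOException {
        // 1. Durability: sequential append to the log, no random IO.
        Files.writeString(commitLog, key + "=" + value + System.lineSeparator(),
                StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // 2. Speed: keep the latest value in memory.
        memtable.put(key, value);
        if (memtable.size() >= 1000) {
            flush();
        }
    }

    private void flush() throws IOException {
        Path sstable = commitLog.resolveSibling("sstable-" + (++flushedSegments) + ".txt");
        StringBuilder sb = new StringBuilder();
        memtable.forEach((k, v) -> sb.append(k).append('=').append(v).append('\n'));
        Files.writeString(sstable, sb.toString(), StandardCharsets.UTF_8);
        memtable.clear(); // the commit log could now be truncated up to this point
    }
}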

Freeing disk space of overwritten data?

I have a table whose rows get overwritten frequently using regular INSERT statements. This table holds ~50 GB of data, and the majority of it is overwritten daily.
However, according to OpsCenter, disk usage keeps going up and is not freed.
I have validated that rows are being overwritten and not simply being appended to the table. But they're apparently still taking up space on disk.
How can I free disk space?
Under the covers, what Cassandra does during these writes is append a new version of the row to an SSTable with a newer timestamp. When you perform a read, the newest version (based on timestamp) is returned to you as the row. However, this also means that you are using twice the disk space. It is not until Cassandra runs a compaction operation that the older versions are removed and the disk space is recovered. Here is some information on how Cassandra writes to disk, which explains the process:
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_write_path_c.html?scroll=concept_ds_wt3_32w_zj__dml-compaction
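As a rough illustration of that reconciliation (a toy Java sketch of my own, not Cassandra internals): every overwrite just adds another timestamped version, a read returns only the newest one, and the older versions keep occupying space until a "compaction" throws them away.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of last-write-wins: every "insert" for the same key appends
// a new (timestamp, value) version, the way overwrites land in new SSTables.
final class VersionedCell {
    record Version(long writeTimestampMicros, String value) {}

    private final List<Version> versions = new ArrayList<>();

    void write(long timestampMicros, String value) {
        versions.add(new Version(timestampMicros, value)); // older versions stay around
    }

    // A read returns only the newest version; older ones still occupy space.
    String read() {
        return versions.stream()
                .max(Comparator.comparingLong(Version::writeTimestampMicros))
                .map(Version::value)
                .orElse(null);
    }

    // Toy compaction: keep only the newest version, reclaiming the rest.
    void compact() {
        if (versions.isEmpty()) return;
        Version newest = versions.stream()
                .max(Comparator.comparingLong(Version::writeTimestampMicros))
                .get();
        versions.clear();
        versions.add(newest);
    }
}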
A compaction is done on a node-by-node basis and is a very disk-intensive operation which may affect the performance of your cluster while it is running. You can run a manual compaction using the nodetool compact command:
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCompact.html
As Aaron mentioned in his comment above, overwriting all the data in your cluster daily is not really the best use case for Cassandra, because of issues such as this one.

Cassandra: Storing and retrieving large sized values (50MB to 100 MB)

I want to store and retrieve values from Cassandra which range from 50 MB to 100 MB.
As per the documentation, Cassandra works well when the column value size is less than 10 MB. Refer here
My table is as below. Is there a different approach to this?
CREATE TABLE analysis (
    prod_id text,
    analyzed_time timestamp,
    analysis text,
    PRIMARY KEY (prod_id, analyzed_time)
) WITH CLUSTERING ORDER BY (analyzed_time DESC);
From my own experience: although in theory Cassandra can handle large blobs, in practice it may be really painful. In one of my past projects we stored protobuf blobs in C* ranging from 3 KB to 100 KB, but some of them (~0.001%) were up to 150 MB in size. This caused problems:
Write timeouts. By default C* has a 10s write timeout, which is really not enough for large blobs.
Read timeouts. The same issue with the read timeout, read repair, hinted handoff timeouts and so on. You have to debug all these possible failures and raise all these timeouts. C* has to read the whole heavy row from disk into RAM, which is slow.
I personally suggest not using C* for large blobs, as it's not very effective. There are alternatives:
Distributed filesystems like HDFS: store a URL for the file in C* and the file contents in HDFS (see the sketch after this list).
DSE (the commercial C* distro) has its own distributed FS called CFS, built on top of C*, which can handle large files well.
Rethink your schema so it has much lighter rows. But that really depends on your current task (and there's not enough information about it in the original question).
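A sketch of the first alternative, under my own assumptions (the external path, keyspace and the analysis_url column are made up): the heavy payload goes to an external store and only a small reference row goes into Cassandra.

import com.datastax.oss.driver.api.core.CqlSession;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

public class StoreAnalysisReference {
    public static void main(String[] args) throws IOException {
        byte[] analysisBlob = Files.readAllBytes(Path.of(args[0]));

        // 1. Put the heavy payload into an external store (a shared filesystem
        //    here as a stand-in for HDFS or an object store).
        Path external = Path.of("/mnt/blobstore", "analysis-" + System.nanoTime() + ".bin");
        Files.write(external, analysisBlob);

        // 2. Store only a lightweight reference in Cassandra.
        try (CqlSession session = CqlSession.builder().build()) {
            session.execute(
                "INSERT INTO my_ks.analysis (prod_id, analyzed_time, analysis_url) VALUES (?, ?, ?)",
                "prod-123", Instant.now(), external.toUri().toString());
        }
    }
}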
Large values can be problematic, as the coordinator needs to buffer each row on the heap before returning it to the client to answer a query. There is no way to stream the analysis value.
Internally, Cassandra is also not optimized to handle such a use case very well, and you'll have to tweak a lot of settings to avoid problems such as those described by shutty.

Cassandra creates tens of thousands hd files for a column family

I have a column family with a lot of data: tens of millions of keys with small data items, and it's growing.
I've noticed Cassandra has created about 170k files named like this:
my_col_family-hd-702036-Data.db
my_col_family-hd-702036-Index.db
my_col_family-hd-702036-Digest.db
my_col_family-hd-702036-Statistics.db
my_col_family-hd-702036-Filter.db
They only differ by the number in the file name.
When I restart Cassandra it needs about an hour to come up; the log says:
INFO 09:26:34,649 Opening /var/lib/cassandra/data/foo/my_col_family-hd-805240 (5243383 bytes)
INFO 09:26:34,649 Opening /var/lib/cassandra/data/foo/my_col_family-hd-731915 (5242896 bytes)
INFO 09:26:34,714 Opening /var/lib/cassandra/data/foo/my_col_family-hd-797692 (5243454 bytes)
INFO 09:26:34,753 Opening /var/lib/cassandra/data/foo/my_col_family-hd-688013 (5243541 bytes)
It goes on like this for about an hour until it has gone through all 170k files.
Is this normal? Why does it create so many small files, 5 MB each, and then read all of them on startup?
You have a lot of files because you are using an old version of Cassandra, which uses a default file size of 5 MB for leveled compaction. Further testing has shown that ~160 MB is a better file size for this particular compaction strategy. I would recommend switching to the larger size as soon as possible.
https://issues.apache.org/jira/browse/CASSANDRA-5727
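On newer Cassandra versions that expose compaction options through CQL, raising the target size looks roughly like this (keyspace and table names taken from the log lines above; on a cluster as old as the one in the question, the equivalent change would go through that release's schema tools instead).

import com.datastax.oss.driver.api.core.CqlSession;

public class RaiseLcsSstableSize {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Bump the leveled compaction target file size from the old 5 MB
            // default to ~160 MB (option name applies to newer Cassandra versions).
            session.execute(
                "ALTER TABLE foo.my_col_family WITH compaction = "
              + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");
        }
    }
}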
As for going over all of them on startup: it isn't actually reading them all. Cassandra is just opening file handles so that it can access data from the files during reads from the database. This is necessary and normal.
