How is data sorted in the Cassandra memtable in the absence of a clustering key?

I am new to Cassandra and was looking into how its internals work. I read an article which stated that the memtable is stored in sorted order.
But if there is no clustering key, or there are multiple clustering keys, how does Cassandra store the data in the memtable? What are the sorting criteria?

Data is sorted in a few different ways in Cassandra.
The term "SSTable" stands for "sorted string table", meaning that the contents of a Cassandra data file are sorted. Data in memtables is sorted by the token (hash) of the partition key, so partitions are already ordered when they are flushed to disk.
This also makes it easy for Cassandra to determine whether a partition exists in an SSTable, since it keeps metadata about the first and last partition keys contained in the SSTable.
If the table has clustering columns, the rows within a partition are sorted by the clustering order defined in the table schema. This is the only case in which clustering keys are relevant to sorting. Cheers!
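To illustrate both cases, here is a minimal sketch (the keyspace, table, and column names are hypothetical):

-- No clustering columns: each partition is a single row; on flush,
-- partitions are ordered by the token (hash) of the partition key.
CREATE TABLE my_ks.users (
    user_id UUID PRIMARY KEY,
    name    text
);

-- With clustering columns: rows inside each partition are kept
-- sorted by the clustering order declared in the schema.
CREATE TABLE my_ks.user_events (
    user_id    UUID,
    event_time timestamp,
    payload    text,
    PRIMARY KEY ((user_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);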

Related

How does Cassandra retrieve data from an SSTable and merge it into the memtable? Will this data be flushed again?

When I request a row by its whole primary key, will Cassandra fetch all rows of that partition from the SSTable and merge them into the memtable, then filter out the requested row? Or can it find that row by its clustering keys and retrieve only one row into the memtable?
How does an SSTable store data (row by row or column by column, and why can some SSTables contain only one column)? If I request only one column, can Cassandra find the location of that particular column and return just that column?
How does Cassandra deal with data retrieved from an SSTable when flushing the memtable to an SSTable? Will that data be written to a new SSTable again?
Thanks a lot for any answers.
You should take a look at datastaxacademy.com, specifically the course "DS201: DataStax Enterprise Foundations of Apache Cassandra™". The topics you are asking about are the "read path" and the "write path".

Cassandra simple primary key queries

We would like to create a Cassandra table with a simple primary key consisting of a UUID column.
The table will look like:
CREATE TABLE simple_table (
    id   UUID PRIMARY KEY,
    col1 text,
    col2 text,
    col3 UUID
);
This table will potentially store a few billion rows, and the rows should expire after some time (a few months) using the TTL feature.
I have a few questions regarding the efficiency of this table:
What is the efficiency of a query against this table using the primary key? That is, how does Cassandra find a specific row after resolving which partition it resides in?
Considering that the rows will expire and create many tombstones, how will this affect reads and writes to this table? Say we expire the data after 180 days; if I am not mistaken, the ratio of tombstones would be 10/180 ≈ 0.056 (where 10 is gc_grace_seconds expressed in days).
In your case, the primary key is equal to the partition key, so you have so-called "skinny" partitions consisting of one row each. If you remove data, then instead of the data inside a partition you'll have only a tombstone, and that's not a problem. If the data is expired, it is simply removed during compaction; gc_grace_seconds isn't applied here, as it's required only when you explicitly remove data, because other nodes may need to "catch up" with changes if they weren't able to receive the delete operation. You can find more details about data deletion in the following document.
Problems with tombstones arise when you have many (thousands of) rows inside the same partition, for example if you use several clustering keys, as sketched below. When such data is deleted, a tombstone is generated and must be skipped when reading data inside the partition.
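To make that concrete, a hedged sketch (keyspace, table, and values are hypothetical): deleting a single row inside a wide partition leaves a tombstone that every later read of that partition has to skip.

CREATE TABLE my_ks.sensor_readings (
    sensor_id UUID,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
);

-- Deleting one row inside the (sensor_id, day) partition writes a
-- row tombstone; reads of that partition must skip it until it is
-- compacted away after gc_grace_seconds.
DELETE FROM my_ks.sensor_readings
 WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
   AND day = '2023-01-15'
   AND ts = '2023-01-15 10:00:00+0000';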
P.S. Have you seen this blog post that explains how deletions happen?
After reading the blog (and the comments) that @Alex referred me to, I concluded that tombstones are created for expired rows due to the table's default_time_to_live.
Those tombstones will be cleaned up only after gc_grace_seconds has passed. See this Stack Overflow question.
Regarding my first question, this DataStax page describes it pretty well.
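For reference, a minimal sketch of those settings in CQL (the numbers are illustrative; 15552000 seconds ≈ 180 days):

-- Expire rows automatically after ~180 days (TTL is in seconds).
ALTER TABLE simple_table WITH default_time_to_live = 15552000;

-- gc_grace_seconds (default 864000 = 10 days) controls how long
-- tombstones are kept before compaction is allowed to purge them.
ALTER TABLE simple_table WITH gc_grace_seconds = 864000;

-- A per-write TTL can also be set explicitly:
INSERT INTO simple_table (id, col1) VALUES (uuid(), 'v') USING TTL 15552000;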

cassandra: `sstabledump` output questions

I'm inspecting the output of sstabledump to gain a better understanding of the Cassandra data model, and I have some questions.
From the output of sstabledump it seems that:
a table is a list of partitions (split by partition key)
a partition is a list of rows (split according to clustering key)
a row is a map of key-value pairs, where the keys belong in a predefined list
Question 1: For each partition, as well as for each row inside a partition, there is a position key. What does this value correspond to? Physical storage details? And how, exactly?
Question 2: Each row inside each partition has a type: row key-value pair. Could this type be anything else? If yes, what? If not:
why have a value that is always the same?
why is Cassandra classified as wide-column (and similar terms)? It looks more like two-level row storage.
The partition token is the Murmur3 hash of whatever you assigned as the partition key (the first component of the primary key). Consistent hashing is used with that hash to determine which node in the cluster a partition and its replicas belong to. Within each partition, data is sorted by clustering key, and then by cell name within each row. This structure means redundant values, such as timestamps written for a whole row at once, are stored only once, encoded as a vint delta sequence relative to the partition, to save space.
On disk, partitions are sorted in order of this hashed key. The position value simply refers to where the element is located in the SSTable's data file (a decompressed byte offset). The type in that spot can also identify a static block, which is located at the beginning of a partition and holds any static cells, or a range tombstone marker (start or end bound). Note that sstabledump sometimes repeats values in the JSON for readability even when they are not physically written on disk (i.e., repeated timestamps).
You can have many of these rows inside a partition; a common data model for time series, for example, uses a timestamp as the clustering key, which makes for very wide partitions with millions of rows (sketched below). Before 3.0, the data storage was closer to Bigtable's design: it was essentially a Map<byte[], SortedMap<byte[], Cell>> where the Comparator of the sorted map was changed based on the schema. It did not differentiate rows and columns within a partition, which led to massive amounts of redundant data, so it was redesigned to fit the query language better.
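As a sketch of that time-series pattern (names are hypothetical):

-- One partition per metric; rows inside it are sorted by the
-- timestamp clustering key, and partitions can grow very wide.
CREATE TABLE my_ks.metrics (
    metric_name text,
    ts          timestamp,
    value       double,
    PRIMARY KEY ((metric_name), ts)
) WITH CLUSTERING ORDER BY (ts DESC);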
Some more references:
Explanation of the motivation for the 3.0 change, by DataStax, here
A blog post by TLP with a good, detailed explanation of the new disk format
CASSANDRA-8099

SparkSQL restrict queries by Cassandra partition key ranges

Imagine that my primary key is a timestamp.
I would like to restrict the query by timestamp ranges.
I can't seem to make it work, even when using token(). I also can't create a secondary index on the partition key.
How should this be done?
Cassandra doesn't allow range queries on the partition key.
One way of dealing with this is to change your schema so that the timestamp becomes a clustering column. For this to work, you need to introduce a sentinel column as the partition key, as sketched below. See this question for more detailed answers: Range Queries in Cassandra (CQL 3.0)
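A hedged sketch of that schema change (the bucket column and all names are hypothetical):

-- A sentinel/bucket column becomes the partition key, and the
-- timestamp becomes a clustering column, so range queries work:
CREATE TABLE my_ks.events_by_day (
    day     date,       -- sentinel partition key, e.g. one bucket per day
    ts      timestamp,  -- clustering column: range restrictions allowed
    payload text,
    PRIMARY KEY ((day), ts)
);

SELECT * FROM my_ks.events_by_day
 WHERE day = '2023-01-15'
   AND ts >= '2023-01-15 00:00:00+0000'
   AND ts <  '2023-01-15 12:00:00+0000';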
Another way is simply to let Spark do the filtering. Range queries on the primary key should work in Spark SQL; they just won't be pushed down to Cassandra, so Spark will fetch all the data and filter it on the Spark side.

How to add multiple columns as the primary key in Cassandra?

I have an existing table with millions of records. Initially we had two columns, one as the partition key and one as the clustering key, and now I want to add two more columns to the table's partition key.
How?
If you make a change to the partition key, you will need to create a new table and import the existing data. This is due, in part, to the fact that a partition key is not the same as a primary key in a relational database. The partition key is hashed by Cassandra, and that hash is used to find partitions on disk. If you change the partition key, you change the hash value and can no longer look up the partition!
CREATE TABLE KEYSPACE_NAME.AMAR_EXAMPLE (
    COLUMN_1 TYPE,
    COLUMN_2 TYPE,
    COLUMN_3 TYPE,
    ...
    COLUMN_N TYPE,
    // Here we declare the partition key columns and clustering columns
    PRIMARY KEY ((COLUMN_1, COLUMN_2, COLUMN_3, COLUMN_4), CLUSTERING_COLUMN)
)
// If you need to change the default clustering order, declare it here
// (only clustering columns may appear in CLUSTERING ORDER BY)
WITH CLUSTERING ORDER BY (CLUSTERING_COLUMN DESC);
You could export the data to CSV using COPY and then import it into the new table via COPY (see the sketch below), or use sstableloader. There is plenty of documentation, and there are walkthroughs on how to use those tools; for example, this DataStax blog post talks about the changes made to the updated sstableloader. If you create a new table and import the existing data, you will create new partitions and new hashes. Cassandra will not let you simply add additional columns to the partition key after the table has been created.
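For example, a minimal migration sketch with cqlsh's COPY (OLD_TABLE and the file path are hypothetical; the target follows the pseudo-schema above):

-- Export from the existing table, then import into the new one:
COPY KEYSPACE_NAME.OLD_TABLE TO '/tmp/amar_example.csv' WITH HEADER = TRUE;
COPY KEYSPACE_NAME.AMAR_EXAMPLE (COLUMN_1, COLUMN_2, ..., COLUMN_N)
    FROM '/tmp/amar_example.csv' WITH HEADER = TRUE;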
Understanding your data and Cassandra data modeling techniques will help mitigate the amount of work you may find yourself doing when changing partition keys. Check out the self-paced courses provided by DataStax; DS220: Data Modeling could really help.
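If you want to see the hashing described above in action, token() can be queried directly (a sketch against the hypothetical AMAR_EXAMPLE schema):

-- token() exposes the Murmur3 hash Cassandra uses to place the
-- partition; adding a column to the key changes this value.
SELECT token(COLUMN_1, COLUMN_2, COLUMN_3, COLUMN_4), COLUMN_1
FROM KEYSPACE_NAME.AMAR_EXAMPLE
LIMIT 5;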
