Are sorted columns in Cassandra using just one set of nodes? (one set = replication factor)

Using older versions of Cassandra, we were expected to create our own sorted rows using a special row of columns, because columns are saved sorted in Cassandra.
Is Cassandra 3.0 with CQL using the same concept when you create a PRIMARY KEY?
Say, for example, that I create a table like so:
CREATE TABLE my_table (
created_on timestamp,
...,
PRIMARY KEY (created_on)
);
Then I add various entries like so:
INSERT INTO my_table (created_on, ...) VALUES (1, ...);
...
INSERT INTO my_table (created_on, ...) VALUES (9, ...);
How does Cassandra manage the sort on the PRIMARY KEY? Will that happen on all nodes, or only on one set? (What I call a set is the number of replicas: if you have a cluster of 100 nodes with a replication factor of 4, would the primary key appear on 100 nodes, 25, or just 4? With older versions, it would only be on 4 nodes.)

In your case the primary key is the partition key, which used to be called the row key. This means the data you are inserting will be present on 4 out of 100 nodes if the replication factor is set to 4.
In CQL you can add more columns to the primary key; these are called clustering keys. When querying C* with CQL, the result set might contain more than one row for a partition key. Those rows are logical rows: they are stored in the partition whose partition key they share, and differ only in their clustering key values. The data in those logical rows is replicated as the partition is.
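To make the "4 out of 100 nodes" point concrete, here is a minimal Python sketch of a token ring. It is only an illustration under stated assumptions: the `token` function is an MD5-based stand-in (Cassandra's default partitioner is Murmur3), each node owns a single token (no vnodes), and replicas are picked by walking the ring clockwise, similar in spirit to SimpleStrategy. The node names and the key string are made up.

```python
import bisect
import hashlib


def token(partition_key):
    """Stand-in hash (assumption): NOT Murmur3, just a deterministic signed 64-bit value."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_ring(node_count):
    """One token per hypothetical node, sorted by token (no vnodes)."""
    return sorted((token(f"node-{i}"), f"node-{i}") for i in range(node_count))


def replicas(ring, partition_key, replication_factor):
    """Walk clockwise from the key's token and take the next RF nodes."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key))
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


ring = build_ring(100)
owners = replicas(ring, "created_on=1", 4)
print(len(owners), "of", len(ring), "nodes hold this partition:", owners)
```

Every logical row that shares the partition key lands on the same four owners, because placement is computed from the partition key alone.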
Have a look at the example for possible primary keys in the official documentation of the CREATE TABLE statement.
EDIT (row sorting):
C* keeps the partitions of a table in the order of their partition key values' hash code. The ordering is therefore not straightforward, and results for range queries by partition key values are not what you would expect them to be. But as partitions are in fact ordered, you can still do server-side pagination with the help of the token function.
That said, you could employ the ByteOrderedPartitioner to achieve lexical ordering of your partitions. But it is very easy to create hotspots with that partitioner and it is generally discouraged to use it.
The rows of a given partition are ordered by the actual values of their clustering keys. Range queries on those behave as you'd expect them to.
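The two orderings can be sketched with a toy in-memory table in Python. This is only a model, not how the storage engine is implemented: `ToyTable`, its methods and the sample data are hypothetical, and the hash is an MD5-based stand-in for the real partitioner.

```python
import bisect
import hashlib


def token(partition_key):
    """Stand-in hash (assumption): NOT Murmur3, just a deterministic signed 64-bit value."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class ToyTable:
    def __init__(self):
        # partition key -> list of (clustering value, data), kept sorted by clustering value
        self.partitions = {}

    def insert(self, partition_key, clustering, value):
        rows = self.partitions.setdefault(partition_key, [])
        bisect.insort(rows, (clustering, value))

    def scan_partition_keys(self):
        """Full-scan order: by token of the partition key, not by its value."""
        return sorted(self.partitions, key=token)

    def slice(self, partition_key, low, high):
        """Rows with low <= clustering < high inside a single partition."""
        rows = self.partitions.get(partition_key, [])
        keys = [c for c, _ in rows]
        return rows[bisect.bisect_left(keys, low):bisect.bisect_left(keys, high)]


table = ToyTable()
for day in ("2016-01-01", "2016-01-02"):
    for hour in (9, 3, 7, 1):  # inserted out of order on purpose
        table.insert(day, hour, f"event at {hour}h")

print(table.scan_partition_keys())      # order depends on the hash, not on the dates
print(table.slice("2016-01-01", 3, 8))  # [(3, 'event at 3h'), (7, 'event at 7h')]
```

The range slice behaves intuitively because it only ever looks inside one partition, where rows are kept sorted by their real clustering values.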

Related

How does Cassandra Partitioning actually work?

I understand that two tables with the same partition columns and values have the same token generated. Does that mean that all the cells of this partition in both tables are actually in the same partition? How does Cassandra store data internally?
Eg:
Create table table1 (emp_id int PRIMARY KEY, name text, role text);
Create table table2 (emp_id int PRIMARY KEY, name text, role text);

INSERT INTO table1(emp_id, name, role) VALUES (1, 'sahil', 'MTS');
INSERT INTO table2(emp_id, name, role) VALUES (1, 'sahil', 'MTS');
SELECT token(emp_id) from table1 where token(emp_id) = token(11596);
system.token(emp_id)
----------------------
7447223576279188802
SELECT token(emp_id) from table2 where token(emp_id) = token(1);
system.token(emp_id)
----------------------
7447223576279188802
​​
For your example, because both tables have the same partition key, identical values inserted into them will be mapped to the same token. It is on insert that the hash function is applied to the partition key to determine which replicas will get the data. If you use the Murmur3 partitioner (which is the default), you get a consistent token value, i.e. the same partition key value always produces the same result. You can reference this page for understanding:
https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archDataDistributeHashing.html
Rows (items of data) that have the same table and the same partition key are said to be in the same partition. The most important consequence of being in the same partition is that the data is guaranteed to be co-located: handled by the same replica nodes and, in ScyllaDB, even by the same CPU. This allows a partition to be scanned efficiently: all of the partition's data can be read from a single node, and Cassandra doesn't need to go back and forth between replicas to read the various pieces of the partition and combine them. This is also what allows a node that handles the partition's full data to keep it sorted by the clustering key: a process called compaction merges different pieces of a sorted partition (stored in sstables, or sorted string tables) into a bigger sorted partition.
When you have two different tables in the same keyspace, and use the same partition key in both, they are not stored physically on disk together - because each table has its own set of sstables (files on disk), so in that sense they are not "in the same partition". However, the co-location property which I mentioned earlier still holds (if the two tables are in the same keyspace): Two identically-keyed partitions in the two tables will be stored on exactly the same node. Why is this important/useful? Usually it isn't. One place where this knowledge can become useful is that it can be used in some situations to achieve atomic batch write to both tables at once, utilizing the fact that all replicas will see both writes together, whereas usually two writes to two tables go to different nodes at different times.
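A rough Python sketch of that distinction, under simplifying assumptions: placement here is a plain modulo over a hypothetical five-node list with an MD5-based stand-in hash (real clusters use Murmur3 tokens on a ring), and a per-table dict stands in for each table's own sstables.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical cluster
REPLICATION_FACTOR = 3


def token(partition_key):
    """Stand-in hash (assumption): NOT Murmur3, just a deterministic signed 64-bit value."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def replicas(partition_key):
    """Placement depends only on the token, never on the table name."""
    start = token(partition_key) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


# Each node keeps separate storage per table (a stand-in for per-table sstables).
storage = {node: {"table1": {}, "table2": {}} for node in NODES}


def write(table, emp_id, row):
    for node in replicas(emp_id):
        storage[node][table][emp_id] = row


write("table1", 1, {"name": "sahil", "role": "MTS"})
write("table2", 1, {"name": "sahil", "role": "MTS"})

holders1 = [n for n in NODES if 1 in storage[n]["table1"]]
holders2 = [n for n in NODES if 1 in storage[n]["table2"]]
print(holders1 == holders2)  # True: same nodes, but two separate per-table stores
```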

Primary Key in Cassandra

I have the following scenario:
My data has an id field, and this field is constantly increasing.
When an event is created, it is automatically assigned id = 1.
Then 2, 3, 4 and so on.
Once data with id = 1 has been generated, that id will never be generated again.
I want to store this data in Cassandra. I can set the primary key to the id field, but I don't know how Cassandra will create partitions for each record.
Will it create one partition for each record?
Or will it create range partition by primary key. For example; id from 1 to 100 is the first partition, 100-200 is the second partition etc.
In Cassandra, the partition key uniquely identifies a single partition in the table (here, a single record). For clarification, the primary key consists of:
a partition key (required, made of one or more columns)
zero or more clustering columns
So the primary key doesn't equate to a range of partitions.
Compared to traditional RDBMS which have two-dimensional tables, Cassandra tables have the traditional 2D tables but can also be 3D or more. The power of Cassandra is that tables can be multi-dimensional meaning each partition can have one or more rows (it can have thousands).
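As a rough illustration in Python (a sketch, not Cassandra code): partitions are formed purely by grouping on the partition key value, so with PRIMARY KEY (id) every id gets its own single-row partition. The `bucket` variant below is a hypothetical alternative schema, shown only to make the point that range-like partitions exist only if the application computes such a key itself.

```python
from collections import defaultdict


def partitions_for(records, partition_key_of):
    """Group records into partitions by whatever the partition key function returns."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_key_of(record)].append(record)
    return dict(partitions)


events = [{"id": i, "payload": f"event {i}"} for i in range(1, 301)]

# PRIMARY KEY (id): every id is its own single-row partition.
by_id = partitions_for(events, lambda e: e["id"])

# Hypothetical alternative, PRIMARY KEY ((bucket), id) with bucket = id // 100:
# the ranges exist only because the application computes the bucket explicitly.
by_bucket = partitions_for(events, lambda e: e["id"] // 100)

print(len(by_id))      # 300 partitions, one row each
print(len(by_bucket))  # 4 partitions (buckets 0, 1, 2 and 3)
```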
If you're interested, I've explained this in a bit more detail with examples in this post -- https://community.datastax.com/questions/6171/. Cheers!

Cassandra: Is partition key also used in clustering?

Let's say I have a primary key like this: primary key (PK, CK).
Based on what I read (see refs), I think I can loosely describe the way Cassandra uses PK and CK as follows - PK will be used to decide which node(s) the data should go to and CK will be used for clustering (aka ordering) of data within that node.
Then, it seems PK is not used in clustering data within the node and that sounds wrong. What if I have a simple primary key with just PK? Will Cassandra only distribute data across nodes and not order data within each node since there is no clustering column?
refs:
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
Difference between partition key, composite key and clustering key in Cassandra?
"Then, it seems PK is not used in clustering data within the node and that sounds wrong. What if I have a simple primary key with just PK? Will Cassandra only distribute data across nodes and not order data within each node since there is no clustering column?"
Good question. Let's try this out. I'll create a simple table and INSERT some data:
aploetz#cqlsh:stackoverflow> CREATE TABLE programs
(name text PRIMARY KEY, data text);
aploetz#cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Tron');
aploetz#cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Yori');
aploetz#cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Quorra');
aploetz#cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Clu');
aploetz#cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Flynn');
aploetz#cqlsh:stackoverflow> INSERT INTO programs (name) VALUES ('Zuze');
Now, let's run a query that should answer your question:
aploetz#cqlsh:stackoverflow> SELECT name, token(name) FROM programs;
 name   | system.token(name)
--------+----------------------
  Flynn | -1059892732813900311
   Zuze |  1815531347795840810
   Yori |  2854211700591734382
 Quorra |  3079126743186967718
   Tron |  6359222509420865788
    Clu |  8304850648940574176

(6 rows)
As you can see, they are definitely not in order by name, which is the partition key and lone PRIMARY KEY. But, my query runs the token() function on name, which shows the hashed value of the partition key (name in this case). The results are ordered by that.
So to answer your question, Cassandra orders its partitions by the hashed value of the partition key. Note that this order is maintained throughout the cluster, not just on a single node. Therefore, results for an unbound query (not recommended to be run in a multi-node configuration) will be ordered by the hashed value of the partition key, regardless of the number of nodes in the cluster.
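The same ordering can be reproduced outside of cqlsh with a few lines of Python, using the token values from the query output above. The two-node split is a hypothetical boundary chosen only for illustration.

```python
# name -> token, copied from the cqlsh output above
rows = {
    "Tron": 6359222509420865788,
    "Yori": 2854211700591734382,
    "Quorra": 3079126743186967718,
    "Clu": 8304850648940574176,
    "Flynn": -1059892732813900311,
    "Zuze": 1815531347795840810,
}

by_token = sorted(rows, key=rows.get)
by_name = sorted(rows)
print(by_token)  # ['Flynn', 'Zuze', 'Yori', 'Quorra', 'Tron', 'Clu']
print(by_name)   # ['Clu', 'Flynn', 'Quorra', 'Tron', 'Yori', 'Zuze']

# Hypothetical two-node cluster splitting the token range at BOUNDARY:
# reading the nodes in ring order still yields the global token order.
BOUNDARY = 3_000_000_000_000_000_000
node1 = [name for name in by_token if rows[name] < BOUNDARY]
node2 = [name for name in by_token if rows[name] >= BOUNDARY]
print(node1 + node2 == by_token)  # True
```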
All data for a table is written to that table's SSTables, ordered by partition key (by its token). So yes, the partitions are sorted.
I think what you're asking is why you can't use a partition key the same way you use a clustering key. For example, you can't do less than (<) or greater than (>) on a partition key. Since no single node has all the partition keys, this type of query would have to check with every node in your cluster to see whether it holds any partition key that matches your query.

Does the same partition key in different cassandra tables add up to cell theoretical limit?

It is known that a Cassandra partition has a theoretical limit of 2 billion cells. But how does that work in a situation like this one below:
create table table1 (
some_id int PRIMARY KEY,
some_name text
);
create table table2 (
other_id int PRIMARY KEY,
other_name text
);
Assume we have 1 billion cells in partition (some_id = 1) on table1.
If we had another 1 billion cells in partition (other_id = 1) on table2, would those add up to the 2 billion theoretical limit?
In other words, are equal partition keys in different tables stored together?
Different tables have different partitions. This makes the structure of any particular partition homogeneous (it will always follow the prescribed schema of a single table), which allows for optimizations.
If you look at the storage engine under the hood, you'll see that every table even has its own directory structure, making it clear that a partition from one table will never interact with a partition of another (see /var/lib/cassandra/).
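Put as a tiny Python model (purely illustrative, with a hypothetical `write_cells` helper): a partition is identified by the table together with the partition key, so the cell counts from the question are tracked separately and never add up.

```python
from collections import Counter

# (table, partition key) -> number of cells; the table is part of the identity
cells_per_partition = Counter()


def write_cells(table, partition_key, cell_count):
    """Hypothetical helper: record cells written to one partition of one table."""
    cells_per_partition[(table, partition_key)] += cell_count


write_cells("table1", 1, 1_000_000_000)
write_cells("table2", 1, 1_000_000_000)

for (table, key), cells in sorted(cells_per_partition.items()):
    print(table, key, cells)

print(max(cells_per_partition.values()))  # 1000000000: neither partition holds 2 billion
```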

What is the difference between a clustering column and secondary index in cassandra

I'm trying to understand the difference between these two and the scenarios in which you would prefer to use one over the other.
My specific use case is using cassandra as an event ingestion system backed by an analytics engine that interprets the event.
My model includes
event id (the partition key)
event time (a clustering column)
event type (I'm not sure whether to use a clustering column or a secondary index)
I figure the most common read scenario will be to get the events over a time range hence event time is the clustering column. A less frequent read scenario might involve further filtering the event query by event type.
A secondary index is pretty similar to what we know from regular relational databases. If you have a query with a where clause that uses column values that are not part of the primary key, lookup would be slow because a full row search has to be performed. Secondary indexes make it possible to service such queries efficiently. Secondary indexes are stored as extra tables, and just store extra data to make it easy to find your way in the main table.
So that's a good ol' index, which we already know about. So far, there's nothing new to cassandra and its distributed nature.
Partitioning and clustering are all about deciding how rows from the main table are spread among the nodes. This is unique to Cassandra since it determines the distribution of data. So, the primary key consists of at least one column. The first column in the primary key is used as the partition key. The partition key is used to decide which node stores a row. If the primary key has additional columns, those columns are used to cluster the data on a given node: the data is stored in lexicographic order on a node by clustering columns.
This question has more specifics on clustering columns: Clustering Keys in Cassandra
So an index on a given column X makes the lookup X --> primary key efficient. The partition key (first column in the primary key) determines which node a row is stored on. Clustering columns (additional columns in the primary key) determine which order rows are stored in on their assigned node.
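Those three roles can be sketched with a toy Python model. It is only a conceptual illustration, not how Cassandra implements indexes: the `ToyEvents` class, its method names and the sample events are all hypothetical, and overwrites of an existing primary key are not handled.

```python
import bisect
from collections import defaultdict


class ToyEvents:
    def __init__(self):
        self.rows = {}                          # (event_id, event_time) -> row
        self.times = defaultdict(list)          # event_id -> sorted event_time values
        self.type_index = defaultdict(list)     # event_type -> list of primary keys

    def insert(self, event_id, event_time, event_type):
        key = (event_id, event_time)
        self.rows[key] = {"id": event_id, "time": event_time, "type": event_type}
        bisect.insort(self.times[event_id], event_time)   # clustering order
        self.type_index[event_type].append(key)           # index: X -> primary key

    def time_range(self, event_id, start, end):
        """Clustering-column range scan inside one partition (start <= time <= end)."""
        times = self.times.get(event_id, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        return [self.rows[(event_id, t)] for t in times[lo:hi]]

    def by_type(self, event_type):
        """Index lookup: event_type -> primary keys -> rows in the main table."""
        return [self.rows[key] for key in self.type_index.get(event_type, [])]


events = ToyEvents()
events.insert("e1", 30, "Warning")
events.insert("e1", 10, "Info")
events.insert("e1", 20, "Warning")
events.insert("e2", 15, "Warning")

print([row["time"] for row in events.time_range("e1", 10, 20)])  # [10, 20]
print([(row["id"], row["time"]) for row in events.by_type("Warning")])
```

The time range query only touches one partition's sorted rows, while the by-type lookup has to go through the extra mapping and can return rows from many partitions.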
So your intuition sounds about right - the event ID is presumably guaranteed unique, so is great for building a primary key. Event time is a great way to order rows on disk on a given node.
If you never needed to lookup data by event type, eg, never had a query like SELECT * FROM Events WHERE Type = Warning, then you have no need for your additional indexes, but your demands for partitioning don't change. Indexes make it easy to serve queries with different predicates. Since you mentioned that you indeed were planning on performing queries like that, you do in fact likely want an index on your EventType column.
Check out the cassandra documentation: http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_compound_keys_c.html
Cassandra uses the first column name in the primary key definition as the partition key.
...
In the case of the playlists table, the song_order is the clustering column. The data for each partition is clustered by the remaining column or columns of the primary key definition. On a physical node, when rows for a partition key are stored in order based on the clustering columns
