Partition key only in Cassandra

In Cassandra, I understand that by default, given PRIMARY KEY(id1, id2), id1 will be partition key and id2 will be clustering key.
I want to know if I can define two partition keys without any clustering key as follows:
PRIMARY KEY ((id1, id2));

Your understanding is correct.
Your PRIMARY KEY ((id1, id2)) is also valid: it specifies one partition key consisting of two columns, with no clustering key.
In this second case, you can query the data only by specifying the values of both columns, e.g.:
SELECT * FROM mytable WHERE id1=1 AND id2=3;
and queries like:
SELECT * FROM mytable WHERE id1=1;
will fail because id2 is part of your partition key, and Cassandra needs the full partition key to locate the partition.
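To see why the second query is rejected, here is a toy Python model of such a table. It is only a sketch of the idea, not the Cassandra driver or its storage engine; `ToyTable` and its methods are hypothetical names made up for illustration.

```python
class ToyTable:
    """Toy model of a table declared with PRIMARY KEY ((id1, id2))."""

    def __init__(self):
        # One row per partition, keyed by the full (id1, id2) pair.
        self.partitions = {}

    def insert(self, id1, id2, value):
        self.partitions[(id1, id2)] = value

    def select(self, id1=None, id2=None):
        # Both components are needed to work out where the partition lives.
        if id1 is None or id2 is None:
            raise ValueError("full partition key (id1, id2) required")
        return self.partitions.get((id1, id2))


table = ToyTable()
table.insert(1, 3, "row A")
print(table.select(id1=1, id2=3))  # row A
try:
    table.select(id1=1)
except ValueError as err:
    print("rejected:", err)
```

The lookup with both values finds the row, while the lookup with only id1 has no way to identify a single partition.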

Related

Cassandra DB misunderstanding partition key and primary key

Good evening,
My current understanding of partition and primary keys is that the partition key distributes the data between the nodes, and the primary key ALWAYS contains the partition key. I want a partition key that groups rows sharing the same partition key values, and within each group I want a primary key that makes rows unique. In my first understanding of Cassandra, this would be possible if I could separate the partition key from the primary key. Is this possible?
An example to illustrate my idea:
country | state | unique_id
USA     | TEXAS | 123
USA     | TEXAS | 114
country and state would be the partition key and unique_id the primary key.
If I create the primary key like this: PRIMARY KEY ((country, state, unique_id)), I can't filter without using unique_id, but I want e.g. a query like SELECT unique_id FROM table WHERE state = 'Texas' and country = 'USA'.
If I create the primary key this way: PRIMARY KEY ((country, state)), it obviously overwrites the data every time an entry is inserted with the same country and state; that's why I need the unique primary key.
The primary key always includes the partition key, which is always the first item in the primary key. The partition key can consist of multiple columns; that's why you have brackets around the first item in your example. I believe that in your case, the primary key should be as follows:
PRIMARY KEY ((country, state),unique_id)
In this case, the partition key is the combination of country + state, and inside that partition you will have unique IDs that are used to select specific rows. The general syntax for a primary key is:
partition key, clustering column1, clustering column2, ...
where the partition key can be either:
column - single column
(column1, column2, ...) - multiple columns
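The effect of PRIMARY KEY ((country, state), unique_id) can be sketched with a toy Python model: an outer dictionary keyed by the partition key, and an inner dictionary keyed by the clustering column. This is an illustration of the semantics only, not how Cassandra stores data; the function names and payload strings are hypothetical.

```python
from collections import defaultdict

# partition key (country, state) -> {clustering column unique_id: row payload}
partitions = defaultdict(dict)


def insert(country, state, unique_id, payload):
    # The same full primary key overwrites; a new unique_id adds a row.
    partitions[(country, state)][unique_id] = payload


def select_ids(country, state):
    # Models: SELECT unique_id FROM table WHERE country = ? AND state = ?
    return sorted(partitions[(country, state)])


insert("USA", "TEXAS", 123, "first")
insert("USA", "TEXAS", 114, "second")
insert("USA", "TEXAS", 123, "first, updated")  # upsert, not a third row

print(select_ids("USA", "TEXAS"))  # [114, 123]
```

Both rows live in the same partition and can be fetched with country and state alone, while unique_id keeps them from overwriting each other.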

How to ORDER BY with only one primary key column in Cassandra?

I'm trying to use Cassandra's ORDER BY feature with only one primary key column. This is the table I am trying to create:
CREATE TABLE user_classement
(
user_name set<text>,
score float,
PRIMARY KEY (score)
) WITH CLUSTERING ORDER BY (score DESC);
But cassandra throws this error:
Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive
When the primary key has two columns (after I add a new column), it works, but with only one primary key column I get this error.
Do you know if it is possible to order by with only one primary key column?
A primary key in Cassandra consists of a partition key and, optionally, clustering keys. The first part of the primary key is the partition key. So in your example score is the partition key, and ordering can be applied only to clustering keys. If you had a primary key like PRIMARY KEY (score, rank), you could apply ordering on rank. For ordering across partitions you might try ByteOrderedPartitioner, but I have not tried it, so I cannot comment further.
Edit 1: As Aaron added in the comments, only the Murmur3 partitioner should be used. ByteOrderedPartitioner exists only for backward compatibility when upgrading from old versions.
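The point that ordering only exists among rows inside one partition can be sketched with a toy Python model. It assumes a table shaped like PRIMARY KEY (board, score) WITH CLUSTERING ORDER BY (score DESC), where `board` is a hypothetical partition key column that is not part of the question; the user names and scores are made-up sample data.

```python
from collections import defaultdict

# partition key board -> {clustering column score: user_name}
partitions = defaultdict(dict)


def insert(board, score, user_name):
    # A row with the same (board, score) primary key is overwritten.
    partitions[board][score] = user_name


def top(board):
    # Rows of one partition, read back by score in descending order.
    rows = sorted(partitions[board].items(), reverse=True)
    return [user_name for _score, user_name in rows]


insert("global", 12.5, "alice")
insert("global", 40.0, "bob")
insert("global", 27.0, "carol")

print(top("global"))  # ['bob', 'carol', 'alice']
```

With score as the only primary key column, every score would be its own partition and there would be nothing left to order within it.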

Understanding the relationship between primary key and partitioning in Cassandra

I am new to Cassandra and have a few novice-level questions about the primary key.
Is the Primary key supposed to be unique per record? (My guess would be not.)
To elaborate. Suppose my table looks like this
CREATE TABLE user_action (
user_id int,
action text,
date_of_action date,
PRIMARY KEY (user_id)
)
I am guessing I can have multiple rows with the same user_id
If primary key is not one per record, can a primary key be split across many partitions?
Can a partition have multiple primary keys?
Is the primary key itself used to pick the partition, or is the hashCode of the primary key used to pick a partition?
Is it fair to think of a partition as a file?
The primary key and the partition key are the same in some cases but not always; it depends on the number of columns in the primary key. Data is distributed based on the partition key, and each partition key value identifies exactly one partition in the cluster. I am not explaining every scenario and concept here, but you should go through the documentation; I am sure you will understand things quickly after reading the links below.
https://www.datastax.com/blog/2016/02/most-important-thing-know-cassandra-data-modeling-primary-key
https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useCompoundPrimaryKeyConcept.html
1>Is the Primary key supposed to be unique per record? (My guess would be not.) To elaborate. Suppose my table looks like this
CREATE TABLE user_action ( user_id int, action text, date_of_action date, PRIMARY KEY (user_id) )
The primary key is supposed to be unique per record/row. In the example you mentioned, you can have only one record per user_id. To allow multiple rows with the same user_id, you have to introduce a differentiating column. This column is called a clustering key in Cassandra, and it forms part of the primary key.
The primary key is the combination of the partition key and the clustering key(s). The partition key is used by Cassandra to find a partition. If a clustering key is defined in the data model, it is used to differentiate rows within the partition. If no clustering key is defined, as in your case, only one record per partition key is kept in the database.
In the example below, you can have several records with the same user_id as long as they have different states. Here the primary key is the combination (user_id, state): user_id is the partition key and state is the clustering key.
CREATE TABLE user_action (
user_id int,
state text,
action text,
date_of_action date,
PRIMARY KEY (user_id,state)
)
I am guessing I can have multiple rows with the same user_id
As explained above, you can have multiple rows with the same user_id if you define a clustering key; with the example you quoted, it is not possible.
2>If primary key is not one per record, can a primary key be split across many partitions?
A primary key cannot be split across many partitions. As explained above, the partition key part of the primary key always points to a single partition.
3>Can a partition have multiple primary keys?
In the example I quoted, (1, RJ) and (1, GJ) are possible primary keys pointing to the single partition identified by partition key value 1. So a partition can have multiple primary keys in that sense.
4>Is the primary key itself decided to pick the partition or is the hashCode of the primary key used to pick a partition?
A hash of the partition key (part of the primary key) is used to pick the partition.
5>Is it fair to think of a partition as a file?
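Not exactly: a partition is a logical unit. On disk, its data can be spread across several SSTable files, and one SSTable usually holds many partitions.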
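For question 4, the idea that only the partition key part of the primary key is hashed can be sketched in Python. This is a deliberately simplified model: it uses MD5 from the standard library as a stand-in hash (Cassandra's default partitioner is Murmur3), it maps tokens to nodes with a modulo instead of token ranges, and the node names are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def token(partition_key):
    # Stand-in hash: Cassandra's default partitioner uses Murmur3, not MD5.
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for_row(user_id, state):
    # Only the partition key (user_id) is hashed; the clustering key
    # (state) plays no part in choosing the node.
    # Simplification: real clusters assign token ranges, not a modulo.
    return NODES[token(user_id) % len(NODES)]


# Primary keys (1, 'RJ') and (1, 'GJ') share partition key 1.
print(node_for_row(1, "RJ") == node_for_row(1, "GJ"))  # True
```

Rows that differ only in their clustering key therefore always end up in the same partition.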

Retrieve rows based on column of type "time" in cassandra db

How to retrieve rows based on column of type "time" in cassandra db.
We tried with query
select *
from payment_transactions_by_transactiondate
where transaction_time>='00:00:00'
and transaction_time<='23:59:59'
and transaction_date='2018-03-21'
allow filtering;
but it's not fetching the rows (where transaction_time is a primary key).
You cannot do a range query on the partition key, because Cassandra distributes data across nodes based on a hash of the partition key. What you can do instead is make transaction_time a clustering key; see the difference between a partition key and a clustering key. From the above query, it seems you need the transactions on a particular date (transaction_date). So, for this query, make transaction_date the partition key and transaction_time the clustering key.
For example:
create table payment_transactions_by_transactiondate(
....
....
....
primary key (transaction_date, transaction_time)
);
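Why a range on a clustering column is cheap once the partition is known can be sketched with a toy Python model: inside one partition the clustering values are kept sorted, so a range is a contiguous slice. The dates and times below are made-up sample data, and `select_times` is a hypothetical helper, not a driver call.

```python
from bisect import bisect_left, bisect_right

# partition key transaction_date -> sorted clustering values transaction_time
partitions = {
    "2018-03-21": ["08:15:00", "12:30:45", "23:59:59"],
    "2018-03-22": ["00:00:01"],
}


def select_times(date, start, end):
    # Models: WHERE transaction_date = ? AND transaction_time >= ?
    #         AND transaction_time <= ?
    times = partitions.get(date, [])
    lo = bisect_left(times, start)   # first value >= start
    hi = bisect_right(times, end)    # one past the last value <= end
    return times[lo:hi]


print(select_times("2018-03-21", "09:00:00", "23:59:59"))
# ['12:30:45', '23:59:59']
```

The equality on transaction_date picks one partition, and the range on transaction_time then selects a slice of its sorted rows.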

How Cassandra choose clustering key when we only specify Primary key in a table

We can specify a compound primary key in Cassandra. When we specify only one column as the primary key, how will Cassandra generate the clustering key?
In Cassandra, a single-column primary key automatically becomes the partition key.
Example:
CASE 1 - K1: the primary key has only a partition key and no clustering key.
CASE 2 - (K1, K2): column K1 is the partition key and column K2 is the clustering key.
CASE 3 - (K1, K2, K3, ...): column K1 is the partition key and columns K2, K3 and so on make up the clustering key.
In Case 1, how will Cassandra choose the clustering key for a table when only the primary key is specified?
Thank you
If no clustering key is provided (CASE 1), Cassandra will not choose any clustering key; the table simply has none.
How CQL3 maps to cassandra internal data structure
If you describe a table with such a primary key, you will not see a WITH CLUSTERING ORDER BY option.
