How can I set Cassandra primary key?

I specified two columns as unique, but when only one of them differs, Cassandra keeps adding new records.

The table schema has a compound primary key, i.e. it is composed of a partition key (username) and clustering key (email). This means that each partition has one or more rows of emails.
It is a completely different schema to a table with just a simple primary key (only has a partition key, no clustering key) like this:
CREATE TABLE users_by_username (
username text,
...
PRIMARY KEY (username)
)
This table would only ever have one row in each partition. Cheers!
[UPDATE] If you want your table to be partitioned by BOTH username + email, you need to create a new table which has a composite partition key (partition key has two or more columns):
CREATE TABLE users_by_username_email (
username text,
email text,
...
PRIMARY KEY ( (username, email) )
)
Note the difference: BOTH columns are enclosed in an extra pair of parentheses, so together they are treated as one partition key.
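To make the original symptom concrete, here is a minimal Python sketch, not Cassandra code. It assumes a table can be modelled as a dict keyed by the full primary key, so a write with an existing key overwrites the row and a write with a new key adds one. The `insert` helper and the sample email addresses are hypothetical.

```python
# Toy model: a table is a dict keyed by the full primary key, so an insert
# with an existing key overwrites the row and a new key adds a row.
def insert(table, key_columns, row):
    key = tuple(row[c] for c in key_columns)
    table[key] = row
    return table

rows = [
    {"username": "alice", "email": "alice@example.com"},
    {"username": "alice", "email": "alice@work.example.com"},
]

# PRIMARY KEY (username, email): both columns together identify a row.
compound = {}
for r in rows:
    insert(compound, ("username", "email"), r)

# PRIMARY KEY (username): username alone identifies a row.
simple = {}
for r in rows:
    insert(simple, ("username",), r)

print(len(compound))  # 2
print(len(simple))    # 1
```

With the compound key, a different email is a different row, which is why records keep being added. With username as the only key column, the second write replaces the first.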

Related

Not able to run multiple where clause without Cassandra allow filtering

Hi, I am new to Cassandra.
We are working on an IoT project where car sensor data will be stored in Cassandra.
Here is an example of one table, in which I am going to store the data from one of the sensors.
This is some sample data.
I want to partition the data by organization_id, so that each organization's data is in its own partition.
Here is the create table command:
CREATE TABLE IF NOT EXISTS engine_speed (
id UUID,
engine_speed_rpm text,
position int,
vin_number text,
last_updated timestamp,
organization_id int,
odometer int,
PRIMARY KEY ((id, organization_id), vin_number)
);
This works fine. However, all my queries will be as below:
select * from engine_speed
where vin_number='xyz'
and organization_id = 1
and last_updated >='from time stamp' and last_updated <='to timestamp'
Almost all queries on all the tables will have a similar or identical WHERE clause.
I am getting an error that asks me to add "ALLOW FILTERING".
Kindly let me know how to partition the table and define the right primary key and indexes so that I don't have to add "ALLOW FILTERING" to the query.
Apologies for this basic question, but I'm just starting to use Cassandra (Apache Cassandra 3.11.12).
You cannot skip any part of the primary key in the WHERE clause: every partition key column must be restricted, and clustering columns can only be restricted in the order in which they are defined in your DDL, without skipping one before using the next. Based on the query pattern you have described, you can try the DDL below:
CREATE TABLE IF NOT EXISTS autonostix360.engine_speed (
vin_number text,
organization_id int,
last_updated timestamp,
id UUID,
engine_speed_rpm text,
position int,
odometer int,
PRIMARY KEY ((vin_number, organization_id), last_updated)
);
But remember,
PRIMARY KEY ((vin_number, organization_id), last_updated)
PRIMARY KEY ((vin_number), organization_id, last_updated)
These two are different in Cassandra. In case 1, your data is partitioned by the combination of vin_number and organization_id, while last_updated acts as the clustering (ordering) key. In case 2, your data is partitioned only by vin_number, while organization_id and last_updated act as clustering keys. You need to work out which case suits your use case.
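The rule above can be sketched as a small Python check. This is a simplified model, not Cassandra's actual logic: it ignores secondary indexes, IN restrictions, and other special cases, and the function name `needs_allow_filtering` is hypothetical.

```python
def needs_allow_filtering(partition_key, clustering_key, restrictions):
    """restrictions maps a column name to "eq" or "range"."""
    # Every partition key column needs an equality restriction.
    if any(restrictions.get(col) != "eq" for col in partition_key):
        return True
    # A restriction on a column outside the primary key needs filtering.
    key_columns = set(partition_key) | set(clustering_key)
    if any(col not in key_columns for col in restrictions):
        return True
    # Clustering columns: no gaps, and nothing after a range restriction.
    prefix_closed = False
    for col in clustering_key:
        kind = restrictions.get(col)
        if kind is None:
            prefix_closed = True
        elif prefix_closed:
            return True
        elif kind == "range":
            prefix_closed = True
    return False

# The WHERE clause from the question.
query = {"vin_number": "eq", "organization_id": "eq", "last_updated": "range"}

original = needs_allow_filtering(("id", "organization_id"), ("vin_number",), query)
case_1 = needs_allow_filtering(("vin_number", "organization_id"), ("last_updated",), query)
case_2 = needs_allow_filtering(("vin_number",), ("organization_id", "last_updated"), query)
print(original, case_1, case_2)  # True False False
```

Under this model the original key fails because id is part of the partition key but is never restricted, while both alternative keys serve the query directly.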

Ordering by username in Cassandra

Let's say I have this table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (username, created_at)
)
I want to order users by username, which is not possible as ordering is only possible via the clustering column. How can I order my users by username?
I need to query users by username, which is why username is the indexing (partition key) column.
What is the right approach here?
If you absolutely must have the usernames sorted and returned in a single query, then you will need to create another table for this purpose:
CREATE TABLE "users" (
field text,
value text,
PRIMARY KEY (field, value)
)
Unfortunately, this will put all the usernames in just one partition, but it's the only way of keeping them sorted. On the other hand, you could extend the table to store other values that you need to retrieve in the same way. For instance, the partition field="username" would hold all the usernames, and you could create another partition field="Surname" to hold all the surnames, sorted.
Cassandra is NoSQL, so duplication of data is to be expected.
Cassandra places each partition according to a hash of the partition key value.
So when the data is returned, it is ordered by the hash values and not by the key values themselves. Thus, you can't order on the partition key.
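A small Python sketch of why hash order is unrelated to key order. It uses MD5 from the standard library purely as a stand-in hash; Cassandra's default partitioner uses Murmur3, so the actual tokens and resulting order will differ. The `token` function and the sample usernames are hypothetical.

```python
import hashlib

def token(partition_key):
    # Stand-in hash for illustration only; not Cassandra's Murmur3 partitioner.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

usernames = ["alice", "bob", "carol", "dave"]

# Partitions come back in token order, not in alphabetical order of the key.
by_token = sorted(usernames, key=token)
print(by_token)
```

The printed order depends only on the hash values, which is the same reason a full scan of a Cassandra table does not return partitions sorted by username.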
Coming back to your question, I'm not sure about what kind of data it is and what kind of query you would want to run. Assuming multiple users per email I'd create the following table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (email, username)
)

Cassandra order by on combination of composite keys

I originally wrote a table that tracks feeds that have been assigned to a user for review.
create table user_feed
(
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, languageid, topicid, dateinserted)
);
Soon after creating this table I realized that I couldn't sort it by dateinserted (order by DESC), because it seemed that in Cassandra I can only order by the second (and last) column of a composite key (as in, the table has to have a two-column key and order by can only apply to the second column of that key). So I changed my table to this:
create table user_feed
(
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, dateinserted)
);
and now I was able to run a query to get the latest feeds for the user, using order by.
However, I have a new requirement that requires me to sort the feeds by a combination of (languageid + userid) or (topicid + userid) or (languageid + topicid + userid).
I had an idea to create three new tables and have the keys combined into one key column. For example, for userid + topic query, I would use:
create table user_feed_by_topic
(
usertopicidkey text,
dateinserted timeuuid,
primary key (usertopicidkey, dateinserted)
);
where usertopicidkey = userid.toString() + topicid.toString().
Of course, this solution requires 4 separate inserts whenever I need to insert a new feed row, since I have 4 tables tracking identical data but partitioned differently to allow sorting.
My question is, is there a better way to do this? Is there any way to achieve what I want (query by a combination of columns and order by another column) or am I stuck with my 4 table design approach?
Many thanks,
Cassandra will order all rows based on the PK's clustering columns. If your PK is primary key (userid, languageid, topicid, dateinserted), all rows within a userid partition will be sorted by languageid, then topicid, then dateinserted, in ascending order. This implies that rows are only sorted by date within a specific language and topic. You'd have to use the date as the first clustering column to change this behaviour.
It's common practice to denormalize your data across multiple tables to implement different ordering strategies.
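The effect of clustering column order can be sketched in Python by sorting rows on a tuple of columns. This is an illustration only: the rows are assumed to belong to one userid partition, short strings stand in for the uuid columns, small integers stand in for timeuuid values, and `stored_order` is a hypothetical helper.

```python
# Rows of a single userid partition (stand-in values, not real uuids).
rows = [
    {"languageid": "en", "topicid": "sports", "dateinserted": 3},
    {"languageid": "de", "topicid": "news", "dateinserted": 4},
    {"languageid": "en", "topicid": "news", "dateinserted": 1},
    {"languageid": "de", "topicid": "news", "dateinserted": 2},
]

def stored_order(partition_rows, clustering_columns):
    # Rows are compared column by column, in clustering column order.
    return sorted(partition_rows, key=lambda r: tuple(r[c] for c in clustering_columns))

by_all = stored_order(rows, ("languageid", "topicid", "dateinserted"))
by_date = stored_order(rows, ("dateinserted", "languageid", "topicid"))

print([r["dateinserted"] for r in by_all])   # [2, 4, 1, 3]
print([r["dateinserted"] for r in by_date])  # [1, 2, 3, 4]
```

With dateinserted last, dates are only ordered inside each (languageid, topicid) group. Putting it first gives a date ordering across the whole partition.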

ORDER BY reloaded, cassandra

I would like to sort a given column family, and to do this I am trying to create a table with the CLUSTERING ORDER BY option. I always encounter the following errors:
1.) Variant A resulting in
Bad Request: Missing CLUSTERING ORDER for column userid
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
2.) Variant B resulting in
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
As far as I can see in the manual this is the correct syntax for creating a table for which I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname". How could I achieve this? (The column 'lastname' I would like to keep as the first part of the primary key, so that I could use it in delete statements with the WHERE-clause.)
Thanks a lot, Tamas
You can only specify clustering order on your clustering keys.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
In your first example, your only clustering key is userID. Thus, it is the only valid entry for CLUSTERING ORDER BY.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
The second example fails because you are specifying your partition key in CLUSTERING ORDER BY, and that's not going to work either.
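Both failures can be reproduced with a short Python sketch. The helper `check_clustering_order` is hypothetical and written only to match the two messages quoted in the question; it is not Cassandra's real validation code. It assumes unquoted identifiers are compared in lowercase.

```python
def check_clustering_order(clustering_columns, ordering):
    """Return an error message, or None if the directive is accepted."""
    clustering = [c.lower() for c in clustering_columns]
    ordered = [c.lower() for c, _direction in ordering]
    # More entries than clustering columns means a non-clustering column was listed.
    if len(ordered) > len(clustering):
        return "Only clustering key columns can be defined in CLUSTERING ORDER directive"
    # Every clustering column must appear, in its declared position.
    for i, col in enumerate(clustering):
        if i >= len(ordered) or ordered[i] != col:
            return "Missing CLUSTERING ORDER for column " + col
    return None

# PRIMARY KEY (lastname, userID): lastname is the partition key,
# userID is the only clustering column.
variant_a = check_clustering_order(["userID"], [("lastname", "desc")])
variant_b = check_clustering_order(["userID"], [("lastname", "desc"), ("userID", "asc")])
accepted = check_clustering_order(["userID"], [("userID", "desc")])

print(variant_a)  # Missing CLUSTERING ORDER for column userid
print(variant_b)  # Only clustering key columns can be defined in CLUSTERING ORDER directive
print(accepted)   # None
```

Only the last call passes, because userID is the sole clustering column of this table.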
Cassandra works by ordering CQL rows according to clustering keys, but only when a partition key is specified. This is because the whole idea of Cassandra wide-row modeling is to query by partition key, and read a series of ordered rows in one query operation.
I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname".
Given this statement, I am going to suggest that you need another column in this model before it will work the way you want. What you need is an appropriate partition key for your users table. Say...like group. With your users partitioned by group, and clustered by lastname, your definition would look something like this:
CREATE TABLE test.usersbygroup (
userID timeuuid,
firstname varchar,
lastname varchar,
group text,
PRIMARY KEY (group,lastname)
)WITH CLUSTERING ORDER BY (lastname desc);
Then, this query will work, returning users (in this case) who are fans of the show "Firefly," ordered by lastname (descending):
SELECT * FROM usersbygroup WHERE group='Firefly Fans';
Read through this DataStax doc on Compound Keys and Clustering to get a better understanding.
NOTE: You don't need to specify ORDER BY in your SELECT. The rows will come back ordered by their clustering key(s), and ORDER BY cannot change that. All ORDER BY can really do, is alter the sort direction (DESCending vs. ASCending).
Clustering order only applies within a partition. With PRIMARY KEY (lastname, userID), rows are sorted by userID within each lastname partition, not across the whole table. This schema is still not useful if you want to sort all the data in the table by last name: userID is unique (a timeuuid), so the clustering key only orders rows that share a lastname.
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY (bucket)
)WITH CLUSTERING ORDER BY (lastname desc);
Here, if you provide the same bucket value, say 1, for all user records, then all users go into the same bucket and hence all rows are retrieved in sorted order of last name. (By no means is this a good design; it is just to give you an idea.)
Revised:
CREATE TABLE user1 (
userID uuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY ((bucket), lastname,userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);

How is Cassandra sorting static column families

As far as I know, a comparator is specified at the column family level. So far I have used it with dynamic columns (wide rows). Which type of comparator does Cassandra use when you create a static column family using CQL?
CREATE TABLE songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob
);
And what happens if you throw a composite key into the mix?
CREATE TABLE songs (
id uuid,
title text,
album text,
artist text,
data blob,
PRIMARY KEY ((id, title), album)
);
http://cassandra.apache.org/doc/cql3/CQL.html#createTablepartitionClustering
http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_compound_keys_c.html
On a given physical node, rows for a given partition key are stored in the order induced by the clustering columns.
So in the second case your partition key is (id, title) and your clustering key is album, meaning all the rows for a given partition key will be stored ordered by album.

Resources