How is Cassandra sorting static column families - cassandra

As far as I know, a comparator is specified on the column family level. So far I have use it with dynamic columns (wide-rows). Which type of comparator is Cassandra using when you create a static column family using CQL?
CREATE TABLE songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob
);
and what happens if you throw a composite key into the mix.
CREATE TABLE songs (
id uuid,
title text,
album text,
artist text,
data blob,
PRIMARY KEY ((id, title), album)
);

http://cassandra.apache.org/doc/cql3/CQL.html#createTablepartitionClustering
http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_compound_keys_c.html
On a given physical node, rows for a given partition key are stored in the order induced by the clustering columns.
So in the 2nd case your partition key is (id, title), and clustering key is album, meaning all the rows for a given partition key will be stored ordered by album

Related

How can I set Cassandra primary key?

I specify 2 unique data types, but when one of them is different, it keeps adding records.
The table schema has a compound primary key, i.e. it is composed of a partition key (username) and clustering key (email). This means that each partition has one or more rows of emails.
It is a completely different schema to a table with just a simple primary key (only has a partition key, no clustering key) like this:
CREATE TABLE users_by_username (
username text,
...
PRIMARY KEY (username)
)
This table would only ever have one row in each partition. Cheers!
[UPDATE] If you want your table to be partitioned by BOTH username + email, you need to create a new table which has a composite partition key (partition key has two or more columns):
CREATE TABLE users_by_username_email (
username text,
email text,
...
PRIMARY KEY ( (username, email) )
)
Note the difference: BOTH columns are enclosed in a bracket so they are treated as one key.

Not able to run multiple where clause without Cassandra allow filtering

Hi I am new to Cassandra.
We are working on IOT project where car sensor data will be stored in cassandra.
Here is the example of one table where I am going to store one of the sensor data.
This is some sample data.
The way I want to partition the data is based on the organization_id so that different organization data is partitioned.
Here is the create table command:
CREATE TABLE IF NOT EXISTS engine_speed (
id UUID,
engine_speed_rpm text,
position int,
vin_number text,
last_updated timestamp,
organization_id int,
odometer int,
PRIMARY KEY ((id, organization_id), vin_number)
);
This works fine. However all my queries will be as bellow:
select * from engine_speed
where vin_number='xyz'
and organization_id = 1
and last_updated >='from time stamp' and last_updated <='to timestamp'
Almost all queries in all the table will have similar / same where clause.
I am getting error and it is asking to add "Allow filtering".
Kindly let me know how do I partition the table and define right primary key and indexs so that I don't have to add "allow filtering" in the query.
Apologies for this basic question but I'm just starting using cassandra.(using apache cassandra:3.11.12 )
The order of where clause should match with the order of partition and clustering keys you have defined in your DDL and you cannot skip any part of primary key while applying the WHERE clause before using the next key. So as per the query pattern u have defined, you can try the below DDL:
CREATE TABLE IF NOT EXISTS autonostix360.engine_speed (
vin_number text,
organization_id int,
last_updated timestamp,
id UUID,
engine_speed_rpm text,
position int,
odometer int,
PRIMARY KEY ((vin_number, organization_id), last_updated)
);
But remember,
PRIMARY KEY ((vin_number, organization_id), last_updated)
PRIMARY KEY ((vin_number), organization_id, last_updated)
above two are different in Cassandra, In case 1 your data will be partitioned by combination of vin_number and organization_id while last_updated will act as ordering key. In case 2, your data will be partitioned only by vin_number while organization_id and last_updated will act as ordering key. So you need to figure out which case suits your use case.

Apache Cassandra table not sorting by name or title correctly

I have the following Apache Cassandra Table working.
CREATE TABLE user_songs (
member_id int,
song_id int,
title text,
timestamp timeuuid,
album_id int,
album_title text,
artist_names set<text>,
PRIMARY KEY ((member_id, song_id), title)
) WITH CLUSTERING ORDER BY (title ASC)
CREATE INDEX user_songs_member_id_idx ON music.user_songs (member_id);
When I try to do a select * FROM user_songs WHERE member_id = 1; I thought the Clustering Order by title would have given me a sorted ASC of the return - but it doesn't
Two questions:
Is there something with the table in terms of ordering or PK?
Do I need more tables for my needs in order to have a sorted title by member_id.
Note - my Cassandra queries for this table are:
Find all songs with member_id
Remove a song from memeber_id given song_id
Hence why the PK is composite
UPDATE
It is simialr to: Query results not ordered despite WITH CLUSTERING ORDER BY
However one of the suggestion in the comments is to put member_id,song_id,title as primary instead of the composite that I currently have. When I do that It seems that I cannot delete with only song_id and member_id which is the data that I get for deleting (hence title is missing when deleting)

Cassandra Composite Column Family

I have a simple requirement in sql world i want to create
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
entityId bigint,
entityType text
userid bigint
PRIMARY KEY (key, trackingid)
)
I need a cli create command which is I am not able to do it. I need to create column family through cli as pig cannot read column family created through cqlsh (duh)
Here what I tried and didnt worked
create column family event_tracking
... WITH comparator='CompositeType(TimeUUIDType)'
... AND key_validation_class=UTF8Type
... AND default_validation_class = UTF8Type;
1) I dont know why it add the value column to it when I see it in cqlsh
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
value text,
PRIMARY KEY (key, trackingid)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
2) I am using asynatax to insert the row.
OperationResult<CqlResult<Integer, String>> result = keyspace.prepareQuery(CQL3_CF)
.withCql("INSERT INTO event_tracking (key, column1, value) VALUES ("+System.currentTimeMillis()+","+TimeUUIDUtils.getTimeUUID(System.currentTimeMillis())+",'23232323');").execute();
but as soon as i try to add dynamic columns, it is not able to recognize
OperationResult<CqlResult<Integer, String>> result = keyspace.prepareQuery(CQL3_CF)
.withCql("INSERT INTO event_tracking (key, column1, value, userId, event) VALUES ("+System.currentTimeMillis()+","+TimeUUIDUtils.getTimeUUID(System.currentTimeMillis())+",'23232323', 123455, 'view');").execute();
looks like I cannot add dynamic columns through cql3
3) If I try to add new column through cql3
alter table event_tracking add eventid bigint;
it gives me
Bad Request: Cannot add new column to a compact CF
0) If you create the table with COMPACT STORAGE Pig should be able to see it, even if you create it from CQL3. But you would need to put entityId and entityType into the primary key too for that to work (compact storage basically means that the first column in the primary key becomes the row key and the following become a composite type used as the column key, and then there is only room for one more column which will be the value).
1) When you create tables the old way there will always be a value, it's the value of the column, and in CQL3 that is represented as a column called value. This is just how CQL3 maps the underlying storage model onto tables.
2) You have created a table whose columns are of the type CompositeType(TimeUUIDType), so you can only add columns that are TimeUUIDs. You can't tell C* to save a string as a TimeUUID column key.
3) Looping back to 0 use this table:
CREATE TABLE event_tracking (
key text,
trackingid timeuuid,
entityId bigint,
entityType text,
userid bigint,
PRIMARY KEY (key, trackingid, entityId, entityType)
) WITH COMPACT STORAGE
this one assumes that there can only be one trackingId/entityId/entityType combination for each userid (what's up with your inconsistent capitalization, btw?). It that's not the case you need to go the full dynamic columns route, but then you can't have different data types for entityId and entityType (but this would have been the case before CQL3 too), see this question for an example of how to do dynamic columns: Inserting arbitrary columns in Cassandra using CQL3

How does a CQL3 composite index with 3 fields map in the thrift column family world?

After reading this blog at planetcassandra, I'm wondering how does a CQL3 composite index with 3 fields map in the thrift column family word, For e.g.:
CREATE TABLE comments (
article_id uuid,
posted_at timestamp,
author text,
karma int,
content text,
PRIMARY KEY (article_id, posted_at)
)
Here the column article_id will be mapped to the internal row key and posted_at will be mapped to (the first part of) the cell name.
What if the table design will be
CREATE TABLE comments (
author_id varchar,
posted_at timestamp,
article_id uuid,
author text,
karma int,
content text,
PRIMARY KEY (author_id, posted_at, article_id)
)
And will the internal row key mapped to 1st 2 fields of the composite index with article_id mapped to cell name, essentially slicing for as many articles upto 2 billion entries and any query on author_id and posted_at combination is one seek on the disk?
Is the behavior same for any number of fields in a composite key?
Your answers much appreciated.
The above observation is incorrect and the correct one is here
I've personally verified:
In the first case:
article_id = partition key, posted_at = cluster key
In the second case:
author_id = partition key, posted_at:article_id = cluster key
First part of composite key (author_id) is called "Partition Key",
rest (posted_at,article_id) are remaining keys.
Cassandra stores columns differently when composite keys are used. Partition key
becomes row key. Remaining keys are concatenated with each column
name (":" as separator) to form column names. Column values remain
unchanged.
Remaining keys (other than partition keys) are ordered,
and it's not allowed to search on any random column, you have to
start with the first one and then you can move to the second one and
so on. This is evident from "Bad Request" error.
There's an excellent explanation by Aaron Morton # his site thelastpickle.
In the first case:
article_id = partition key, posted_at = cluster key
In the second case:
author_id + posted_at = partition key, article_id = cluster key
hence be mindful of the disk seeks as you go by second method and see the row is not getting too wide and gives real benefit compared to the first case.
If you aren't crossing the 2 billion and well within the limits, don't overdo by adopting the 2nd method, as the dispersion of records happens on the combo key.

Resources