viewing as list in cassandra - cassandra

Table
CREATE TABLE vehicle_details (
owner_name text,
vehicle list<text>,
price float,
vehicle_type text,
PRIMARY KEY(price , vehicle_type)
)
I have two issues over here
I am trying to view the list of the vehicle per user. If owner1 has 2 cars then it should show as owner_name1 vehicle1 & owner_name1 vehicle2. is it possible to do with a select query?
The output I am expecting
owner_name_1 | vehicle_1
owner_name_1 | vehicle_2
owner_name_2 | vehicle_1
owner_name_2 | vehicle_2
owner_name_2 | vehicle_3
I am trying to use owner_name in the primary key but whenever I use WHERE or DISTINCT or ORDER BY it does not work properly. I am going to query price, vehicle_type most of the time. but Owner_name would be unique hence I am trying to use it. I tried several combinations.
Below are three combinations I tried.
PRIMARY KEY(owner_name, price, vehicle_type) WITH CLUSTERING ORDER BY (price)
PRIMARY KEY((owner_name, price), vehicle_type)
PRIMARY KEY((owner_name, vehicle_type), price) WITH CLUSTERING ORDER BY (price)
Queries I am running
SELECT owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV';
SELECT Owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV' ORDER BY price desc;

Since your table has:
PRIMARY KEY(price , vehicle_type)
you can only run queries with filters on the partition key (price) or the partition key + clustering column (price + vehicle_type):
SELECT ... FROM ... WHERE price = ?
SELECT ... FROM ... WHERE price = ? AND vehicle_type = ?
If you want to be able to query by owner name, you need to create a new table which is partitioned by owner_name. I also recommend not storing the vehicle in a collection:
CREATE TABLE vehicles_by_owner
owner_name text,
vehicle text,
...
PRIMARY KEY (owner_name, vehicle)
)
By using vehicle as a clustering column, each owner will have rows of vehicles in the table. Cheers!

Related

How to index high cardinality column in cassandra

I have a column of high cardinality and i need to index that column, because i have to perform range queries on that column. I know that secondary indexes are not fit for high cardinality column in cassandra, so i tried to create materialized view on that table with that column as partition key, but range queries are not working without allow filtering on that view. Always perform allow filtering queries is not a good practice for large amount of data. What schema should I use?
CREATE TABLE testing_status.input_log_profile_1 (
cid text,
ctdon bigint,
ctdat bigint,
email text,
addrs set<frozen<udt_addrs>>,
asset set<frozen<udt_asset>>,
cntno set<frozen<udt_cntno>>,
dob frozen<udt_date>,
dvc set<frozen<udt_dvc>>,
eaka set<text>,
edmn text,
educ set<frozen<udt_educ>>,
error_status text,
gen tinyint,
hobby set<text>,
income set<frozen<udt_income>>,
interest set<text>,
lang set<frozen<udt_lang>>,
levnt set<frozen<udt_levnt>>,
like map<text, frozen<set<text>>>,
loc set<frozen<udt_loc>>,
mapp set<text>,
name frozen<udt_name>,
params map<text, frozen<set<text>>>,
prfsn set<frozen<udt_prfsn>>,
rel set<frozen<udt_rel>>,
rel_s tinyint,
skills_prfsn set<frozen<udt_skill_prfsn>>,
snw set<frozen<udt_snw>>,
sport set<text>,
status_id tinyint,
PRIMARY KEY (cid, ctdon, ctdat, email)
) WITH CLUSTERING ORDER BY (ctdon ASC, ctdat ASC, email ASC)
CREATE CUSTOM INDEX status_idx ON testing_status.input_log_profile_1 (status_id) USING 'org.apache.cassandra.index.sasi.SASIIndex';
CREATE CUSTOM INDEX err_idx ON testing_status.input_log_profile_1 (error_status) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
where status_id contains sequence of ids which is high cardinality field.
And my query is like
select * from input_log_profile_1 where cid='1_1' and status_id >= 1 and status_id <= 100 ;

How should I design the schema to get the last 2 records of each clustering key in Cassandra?

Each row in my table has 4 values product_id, user_id, updated_at, rating.
I'd like to create a table to find out how many users changed rating during a given period.
Currently my schema looks like:
CREATE TABLE IF NOT EXISTS ratings_by_product (
product_id int,
updated_at timestamp,
user_id int,
rating int,
PRIMARY KEY ((product_id ), updated_at , user_id ))
WITH CLUSTERING ORDER BY (updated_at DESC, user_id ASC);
but I couldn't figure out the way to only get the last 2 rows of each user in a given time window.
Any advice on query or changing the schema would be appreciated.
Cassandra requires a query-based approach to table design. Which means that typically one table will serve one query. So to serve the query you are talking about (last two updated rows per user) you should build a table specifically designed to serve it:
CREATE TABLE ratings_by_user_by_time (
product_id int,
updated_at timestamp,
user_id int,
rating int,
PRIMARY KEY ((user_id ), updated_at, product_id ))
WITH CLUSTERING ORDER BY (updated_at DESC, product_id ASC );
Then you will be able to get the last two updated ratings for a user by doing the following:
SELECT * FROM ratings_by_user_by_time
WHERE user_id = 'Bob' LIMIT 2;
Note that you'll need to keep the two ratings tables in-sync yourself, and using a batch statement is a good way to accomplish that.

CQL: Select all rows, distinct partition key

I have a table
CREATE TABLE userssbyprofit (
userid text,
profit double,
dateupdated timeuuid,
PRIMARY KEY (userid, profit, dateupdated)
) WITH CLUSTERING ORDER BY (profit DESC, dateupdated DESC)
Userid can be used to lookup the full user details in another table. This table will provide a history of the users profits. Taking the most recent one to find their current profit amount.
How do I retrieve the 10 most profitable users with their profit amount. I want it to be distinct based on the userID
Thanks.
You need to create one more table or view which have only user id and profit . New table or view will have user id order by profit with desc order .

Apache Cassandra table not sorting by name or title correctly

I have the following Apache Cassandra Table working.
CREATE TABLE user_songs (
member_id int,
song_id int,
title text,
timestamp timeuuid,
album_id int,
album_title text,
artist_names set<text>,
PRIMARY KEY ((member_id, song_id), title)
) WITH CLUSTERING ORDER BY (title ASC)
CREATE INDEX user_songs_member_id_idx ON music.user_songs (member_id);
When I try to do a select * FROM user_songs WHERE member_id = 1; I thought the Clustering Order by title would have given me a sorted ASC of the return - but it doesn't
Two questions:
Is there something with the table in terms of ordering or PK?
Do I need more tables for my needs in order to have a sorted title by member_id.
Note - my Cassandra queries for this table are:
Find all songs with member_id
Remove a song from memeber_id given song_id
Hence why the PK is composite
UPDATE
It is simialr to: Query results not ordered despite WITH CLUSTERING ORDER BY
However one of the suggestion in the comments is to put member_id,song_id,title as primary instead of the composite that I currently have. When I do that It seems that I cannot delete with only song_id and member_id which is the data that I get for deleting (hence title is missing when deleting)

how to query to get an ordered result from cassandra table

i need to query like:
select * from items where group="a" order by update_time desc;
however,the column "update_time" of each row will not be fixed,it will change as we need.
so,how can i design the cassandra tables to achieve the goal : querying to get an ordered result?
i need to query like:
select * from items where group="a" order by update_time desc;
however,the column "update_time" of each row will not be fixed,it will change as we need.
so,how can i design the cassandra tables to achieve the goal : querying to get an ordered result?
Sample Table
CREATE TABLE items (
ItemA text,
update_time timeuuid,
ItemB int,
PRIMARY KEY ( ItemA, update_time)
) WITH CLUSTERING ORDER BY (update_time DESC);
Ordering Field should be part of clustering key.
Please refer the above table, where we ordering the rows update_time as Desending order.

Resources