CQL: Select all rows, distinct partition key - cassandra

I have a table
CREATE TABLE userssbyprofit (
userid text,
profit double,
dateupdated timeuuid,
PRIMARY KEY (userid, profit, dateupdated)
) WITH CLUSTERING ORDER BY (profit DESC, dateupdated DESC)
Userid can be used to lookup the full user details in another table. This table will provide a history of the users profits. Taking the most recent one to find their current profit amount.
How do I retrieve the 10 most profitable users with their profit amount. I want it to be distinct based on the userID
Thanks.

You need to create one more table or view which have only user id and profit . New table or view will have user id order by profit with desc order .

Related

viewing as list in cassandra

Table
CREATE TABLE vehicle_details (
owner_name text,
vehicle list<text>,
price float,
vehicle_type text,
PRIMARY KEY(price , vehicle_type)
)
I have two issues over here
I am trying to view the list of the vehicle per user. If owner1 has 2 cars then it should show as owner_name1 vehicle1 & owner_name1 vehicle2. is it possible to do with a select query?
The output I am expecting
owner_name_1 | vehicle_1
owner_name_1 | vehicle_2
owner_name_2 | vehicle_1
owner_name_2 | vehicle_2
owner_name_2 | vehicle_3
I am trying to use owner_name in the primary key but whenever I use WHERE or DISTINCT or ORDER BY it does not work properly. I am going to query price, vehicle_type most of the time. but Owner_name would be unique hence I am trying to use it. I tried several combinations.
Below are three combinations I tried.
PRIMARY KEY(owner_name, price, vehicle_type) WITH CLUSTERING ORDER BY (price)
PRIMARY KEY((owner_name, price), vehicle_type)
PRIMARY KEY((owner_name, vehicle_type), price) WITH CLUSTERING ORDER BY (price)
Queries I am running
SELECT owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV';
SELECT Owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV' ORDER BY price desc;
Since your table has:
PRIMARY KEY(price , vehicle_type)
you can only run queries with filters on the partition key (price) or the partition key + clustering column (price + vehicle_type):
SELECT ... FROM ... WHERE price = ?
SELECT ... FROM ... WHERE price = ? AND vehicle_type = ?
If you want to be able to query by owner name, you need to create a new table which is partitioned by owner_name. I also recommend not storing the vehicle in a collection:
CREATE TABLE vehicles_by_owner
owner_name text,
vehicle text,
...
PRIMARY KEY (owner_name, vehicle)
)
By using vehicle as a clustering column, each owner will have rows of vehicles in the table. Cheers!

How should I design the schema to get the last 2 records of each clustering key in Cassandra?

Each row in my table has 4 values product_id, user_id, updated_at, rating.
I'd like to create a table to find out how many users changed rating during a given period.
Currently my schema looks like:
CREATE TABLE IF NOT EXISTS ratings_by_product (
product_id int,
updated_at timestamp,
user_id int,
rating int,
PRIMARY KEY ((product_id ), updated_at , user_id ))
WITH CLUSTERING ORDER BY (updated_at DESC, user_id ASC);
but I couldn't figure out the way to only get the last 2 rows of each user in a given time window.
Any advice on query or changing the schema would be appreciated.
Cassandra requires a query-based approach to table design. Which means that typically one table will serve one query. So to serve the query you are talking about (last two updated rows per user) you should build a table specifically designed to serve it:
CREATE TABLE ratings_by_user_by_time (
product_id int,
updated_at timestamp,
user_id int,
rating int,
PRIMARY KEY ((user_id ), updated_at, product_id ))
WITH CLUSTERING ORDER BY (updated_at DESC, product_id ASC );
Then you will be able to get the last two updated ratings for a user by doing the following:
SELECT * FROM ratings_by_user_by_time
WHERE user_id = 'Bob' LIMIT 2;
Note that you'll need to keep the two ratings tables in-sync yourself, and using a batch statement is a good way to accomplish that.

Cassandra - Is there a way to update column value for entire table

I have Cassandra table:
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
user_name text,
PRIMARY KEY ((network_id, date), score, id))
WITH CLUSTERING ORDER BY (score DESC);
Query which I need to satisfy is:
"Give me all users which belongs to specific network for specific day sorted by score."
The problem is when user change his name (today) and when I have to execute query for some day in past my report will show old version of the name.
Changing column user_name to STATIC doesn't work because my table should be partitioned by day.
Any ideas how to solve this?
Thank You.
Since you have denormalized user_name for faster access, If the user_name updated you have to update all the copy of that user_name.
You need to maintain another table
CREATE TABLE network_by_user_id (
user_id int,
network_id int,
date date,
score float,
id uuid,
PRIMARY KEY (user_id, network_id, date, score, id)
);
So now whenever any user update their name you have to select all the record of that user from network_by_user_id table and for each record update user_name of base table
update test set user_name = 'New Name' where network_id = ? and date = ? and score = ? and id = ?
If the number of record for a user fastly increase over time, then the cost of update user_name will also fastly increase over time.
Another approach is to normalize the base table like below :
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
PRIMARY KEY ((network_id, date), score, id)
);
CREATE TABLE users (
user_id int,
user_name text,
PRIMARY KEY (user_id)
);
For each user_id found in the base table you can query into users with execute async to get the user_name
Learn More about executeAsync
you can use SELECT command if you want to get any data from your Table

One to many mapping in Cassandra

I am new to Cassandra and would like to do One to many mapping of User and its vehicle. One user may have multiple Vehicles. My User table will contain User details like name, surname, etc. And Vehicle table will have Vehicle details.
My select query will fetch all Vehicle details for particular User.
How should I design this in Cassandra?
You can easily model this in a single table:
CREATE TABLE userVehicles (
userid text,
vehicleid text,
name text static,
surname text static,
vehicleMake text,
vehicleModel text,
vehicleYear text,
PRIMARY KEY (userid,vehicleid)
);
This way you can query vehicles for a single user in one shot, and your user data can be static so that it is stored at the partition key level. As long as the cardinality of user to vehicle isn't too big (as-in, like a user has 1000 vehicles) this should work just fine.
The case I have considered above is very simple. But what if my User has lot of details around 20 to 30 fields and same for Vehicle. Still you would suggest to have a single table and copying User data for all vehicle?
It depends. Would your use case require returning all of them? If so, then "yes" I would still recommend this approach. The way to get the best query performance out of Cassandra, is to model your tables to fit your queries. Cassandra works best when it can read a single row by a specific key, or a range of rows (stored sequentially). You want to avoid performing multiple queries or writing queries that force Cassandra to perform random reads.
What are the consequences of having 2 different tables like User and Vehicle and Vehicle table will have primary key as User_Id and Vehicle_Id?
In a distributed system network time is the enemy. By having two tables, you are now making two queries...assuming a 1 to 1 ratio of users to vehicles. But if your user has 8 vehicles, you now need 9 queries to achieve your result. With the design above you can build your result set in 1 query (minimizing network time). Also with userid as a partition key, that query is guaranteed to be served by one node, as opposed to additional queries for vehicle data which will most likely require contacting multiple nodes.
This seems as simple as having two tables, one holding all of your vehicles data and another one for satisfying your query:
CREATE TABLE vehicles (
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (vehicle_type)
)
CREATE TABLE vehicles_to_users (
user_id bigint,
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (user_id, vehicle_type)
)
Then you would query by:
SELECT * FROM vehicles_to_users WHERE user_id = 9;
or something like that to get all specific vehicle type belonging to a particular user:
SELECT * FROM vehicles_to_users WHERE user_id = 9 AND vehicle_type = 1;
This is a solution with denormalized data, and you should always consider that approach instead of having something like:
CREATE TABLE vehicles (
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (vehicle_type)
)
CREATE TABLE vehicles_to_users (
user_id bigint,
vehicle_id bigint,
PRIMARY KEY (user_id)
)
because it belongs to the relational databases world and you'd have to run N+1 queries to satisfy your requirements: one to get all the ids belonging to a particular user, and then N queries to get all the information for each vehicle:
SELECT * FROM vehicles_to_users WHERE user_id = 9;
SELECT * FROM vehicles WHERE vehicle_id = 115;
SELECT * FROM vehicles WHERE vehicle_id = 116;
SELECT * FROM vehicles WHERE vehicle_id = ...;
And don't be tempted to use the IN clausole like this:
SELECT * FROM vehicles WHERE vehicle_id IN (115,116,....);
because it would perform even worse due to extra work that a coordinator node have to do.

how to query to get an ordered result from cassandra table

i need to query like:
select * from items where group="a" order by update_time desc;
however,the column "update_time" of each row will not be fixed,it will change as we need.
so,how can i design the cassandra tables to achieve the goal : querying to get an ordered result?
i need to query like:
select * from items where group="a" order by update_time desc;
however,the column "update_time" of each row will not be fixed,it will change as we need.
so,how can i design the cassandra tables to achieve the goal : querying to get an ordered result?
Sample Table
CREATE TABLE items (
ItemA text,
update_time timeuuid,
ItemB int,
PRIMARY KEY ( ItemA, update_time)
) WITH CLUSTERING ORDER BY (update_time DESC);
Ordering Field should be part of clustering key.
Please refer the above table, where we ordering the rows update_time as Desending order.

Resources