how to query to get an ordered result from cassandra table - cassandra

i need to query like:
select * from items where group="a" order by update_time desc;
however,the column "update_time" of each row will not be fixed,it will change as we need.
so,how can i design the cassandra tables to achieve the goal : querying to get an ordered result?
i need to query like:
select * from items where group="a" order by update_time desc;
however,the column "update_time" of each row will not be fixed,it will change as we need.
so,how can i design the cassandra tables to achieve the goal : querying to get an ordered result?

Sample Table
CREATE TABLE items (
ItemA text,
update_time timeuuid,
ItemB int,
PRIMARY KEY ( ItemA, update_time)
) WITH CLUSTERING ORDER BY (update_time DESC);
Ordering Field should be part of clustering key.
Please refer the above table, where we ordering the rows update_time as Desending order.

Related

viewing as list in cassandra

Table
CREATE TABLE vehicle_details (
owner_name text,
vehicle list<text>,
price float,
vehicle_type text,
PRIMARY KEY(price , vehicle_type)
)
I have two issues over here
I am trying to view the list of the vehicle per user. If owner1 has 2 cars then it should show as owner_name1 vehicle1 & owner_name1 vehicle2. is it possible to do with a select query?
The output I am expecting
owner_name_1 | vehicle_1
owner_name_1 | vehicle_2
owner_name_2 | vehicle_1
owner_name_2 | vehicle_2
owner_name_2 | vehicle_3
I am trying to use owner_name in the primary key but whenever I use WHERE or DISTINCT or ORDER BY it does not work properly. I am going to query price, vehicle_type most of the time. but Owner_name would be unique hence I am trying to use it. I tried several combinations.
Below are three combinations I tried.
PRIMARY KEY(owner_name, price, vehicle_type) WITH CLUSTERING ORDER BY (price)
PRIMARY KEY((owner_name, price), vehicle_type)
PRIMARY KEY((owner_name, vehicle_type), price) WITH CLUSTERING ORDER BY (price)
Queries I am running
SELECT owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV';
SELECT Owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV' ORDER BY price desc;
Since your table has:
PRIMARY KEY(price , vehicle_type)
you can only run queries with filters on the partition key (price) or the partition key + clustering column (price + vehicle_type):
SELECT ... FROM ... WHERE price = ?
SELECT ... FROM ... WHERE price = ? AND vehicle_type = ?
If you want to be able to query by owner name, you need to create a new table which is partitioned by owner_name. I also recommend not storing the vehicle in a collection:
CREATE TABLE vehicles_by_owner
owner_name text,
vehicle text,
...
PRIMARY KEY (owner_name, vehicle)
)
By using vehicle as a clustering column, each owner will have rows of vehicles in the table. Cheers!

How to sum up cassandra counter grouping by only one column in the primary key set?

I am trying to keep track of the amount of events of each type that occured in one-hour buckets of time, and then sum the counts per category in arbitrary time ranges. So, I create a table like this:
CREATE TABLE IF NOT EXISTS sensor_activity_stats(
sensor_id text,
datetime_hour_bucket timestamp,
activity_type text,
activity_count counter,
PRIMARY KEY ((sensor_id), datetime_hour_bucket, activity_type)
)
WITH CLUSTERING ORDER BY(datetime_hour_bucket DESC, activity_type ASC);
I would like to be able to achieve this kind of query:
SELECT datetime_hour_bucket, activity_type, SUM(activity_count) as count
FROM sensor_activity_stats
WHERE sensor_id=:sensorId
AND datetime_hour_bucket >= :fromDate AND datetime_hour_bucket < :untilDate
GROUP BY activity_type
Cassandra complains about because grouping must be done in the order of the primary key columns. And, if I change the order I won't be able to query by a range over any activity_type.
Some notes:
I am grouping by hours because some users could ask me to show the data in different timezones and I want to be able to perform a decent conversion.
The activity_type has low cardinality, however I can not be sure I'll always be able to predict it's possible values.
Right now my solution was to query the whole data in the range and perform the aggregation myself in code. Have you have faced similar situation and what was your solution? Would you suggest a different way of querying or arranging the data?
I hope you've found the solution of your problem, however I have a way to you try.
First, you can chage the create table to change the order of fields:
CREATE TABLE IF NOT EXISTS sensor_activity_stats(
sensor_id text,
datetime_hour_bucket timestamp,
activity_type text,
activity_count counter,
PRIMARY KEY (activity_type, sensor_id, datetime_hour_bucket, activity_count)
)
WITH CLUSTERING ORDER BY(activity_type ASC, datetime_hour_bucket DESC);
Then, the query you can add the field "datetime_hour_bucket" in the Group By clause:
SELECT datetime_hour_bucket, activity_type, SUM(activity_count) as count
FROM sensor_activity_stats
WHERE sensor_id=:sensorId
AND datetime_hour_bucket >= :fromDate AND datetime_hour_bucket < :untilDate
GROUP BY activity_type, datetime_hour_bucket;

Order Column Family with different id by date

I use the following CQL queries to create a table and write data, the problem is that the data in my table are not organized by date order.
I would like to have them organized by date without having to put the same id.
To create table :
CREATE TABLE IF NOT EXISTS sk1_000.data(id varchar, date_serveur timestamp ,nom_objet varchar, temperature double, etat boolean , PRIMARY KEY (id, date_serveur)) with clustering order by (date_serveur DESC);
To insert :
INSERT INTO sk1_000.data(id, date_serveur,nom_objet, temperature, etat) VALUES ('"+ uuid.v4() +"', '1501488930499','Raspberry_pi', 22.5, true) if not exists ;
Here is the output :
In Cassandra, the clustering key guarantees sort order for a given partition key and not across different partitioning key(s).
To achieve what you are looking for "sort by date across all keys", you will have to redesign the table to have date_serveur as partitioning key and id as clustering column. But guess what you can't directly query based on an id with this table design.

Apache Cassandra table not sorting by name or title correctly

I have the following Apache Cassandra Table working.
CREATE TABLE user_songs (
member_id int,
song_id int,
title text,
timestamp timeuuid,
album_id int,
album_title text,
artist_names set<text>,
PRIMARY KEY ((member_id, song_id), title)
) WITH CLUSTERING ORDER BY (title ASC)
CREATE INDEX user_songs_member_id_idx ON music.user_songs (member_id);
When I try to do a select * FROM user_songs WHERE member_id = 1; I thought the Clustering Order by title would have given me a sorted ASC of the return - but it doesn't
Two questions:
Is there something with the table in terms of ordering or PK?
Do I need more tables for my needs in order to have a sorted title by member_id.
Note - my Cassandra queries for this table are:
Find all songs with member_id
Remove a song from memeber_id given song_id
Hence why the PK is composite
UPDATE
It is simialr to: Query results not ordered despite WITH CLUSTERING ORDER BY
However one of the suggestion in the comments is to put member_id,song_id,title as primary instead of the composite that I currently have. When I do that It seems that I cannot delete with only song_id and member_id which is the data that I get for deleting (hence title is missing when deleting)

Order in Limited query with composite keys on cassandra

In the following scenario:
CREATE TABLE temperature_by_day (
weatherstation_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weatherstation_id,date),event_time)
)WITH CLUSTERING ORDER BY (event_time DESC);
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 08:01:00','74F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 07:01:00','73F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 08:01:00','76F');
If I do the following query:
SELECT *
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date in ('2013-04-04', '2013-04-03') limit 2;
I realized that the result of cassandra is ordered by the same sequence of patkeys in clausa IN. In this case, I'd like to know if the expected result is ALWAYS the two records of the day '2013-04-04'? Ie Cassadra respects the order of the IN clause in the ordering of the result even in a scenario with multiple nodes?

Resources