Order Column Family with different id by date - cassandra

I use the following CQL queries to create a table and write data. The problem is that the rows in my table are not ordered by date.
I would like them ordered by date without having to use the same id for every row.
To create the table:
CREATE TABLE IF NOT EXISTS sk1_000.data(id varchar, date_serveur timestamp ,nom_objet varchar, temperature double, etat boolean , PRIMARY KEY (id, date_serveur)) with clustering order by (date_serveur DESC);
To insert:
INSERT INTO sk1_000.data(id, date_serveur,nom_objet, temperature, etat) VALUES ('"+ uuid.v4() +"', '1501488930499','Raspberry_pi', 22.5, true) if not exists ;
Here is the output :

In Cassandra, the clustering key guarantees sort order within a given partition key, not across different partition keys.
To get what you are looking for (sorting by date across all keys), you would have to redesign the table with date_serveur as the partition key and id as a clustering column. Note, however, that you can't directly query by id alone with that table design.
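To make the first point concrete, here is a minimal Python sketch of the storage model, not of Cassandra itself. It assumes partitions are visited in token order and that rows inside a partition are kept in clustering order (DESC here). The `token` function uses MD5 purely as a stand-in for the real partitioner, and the sample ids and timestamps are made up.

```python
import hashlib
from collections import defaultdict

def token(partition_key: str) -> int:
    # Stand-in for the partitioner's token function (illustration only).
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

def full_scan(rows, partition_col, clustering_col):
    """Return rows in the order this simplified model would scan them:
    partitions by token, rows inside a partition by clustering column DESC."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row)
    ordered = []
    for key in sorted(partitions, key=token):
        ordered.extend(
            sorted(partitions[key], key=lambda r: r[clustering_col], reverse=True)
        )
    return ordered

# One row per id: every partition holds a single row, so the clustering
# order never gets a chance to sort anything across rows.
unique_ids = [
    {"id": "a", "date_serveur": 1501488930499},
    {"id": "b", "date_serveur": 1501488990499},
    {"id": "c", "date_serveur": 1501488960499},
]
# Same data with a shared id: all rows land in one partition and come back
# sorted by date_serveur DESC.
shared_id = [dict(row, id="sensor-1") for row in unique_ids]

print([r["date_serveur"] for r in full_scan(unique_ids, "id", "date_serveur")])
print([r["date_serveur"] for r in full_scan(shared_id, "id", "date_serveur")])
```

With unique ids the scan order follows the tokens rather than the dates, which matches what the question describes.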

Related

How to sum up cassandra counter grouping by only one column in the primary key set?

I am trying to keep track of the number of events of each type that occurred in one-hour buckets of time, and then sum the counts per category over arbitrary time ranges. So, I created a table like this:
CREATE TABLE IF NOT EXISTS sensor_activity_stats(
sensor_id text,
datetime_hour_bucket timestamp,
activity_type text,
activity_count counter,
PRIMARY KEY ((sensor_id), datetime_hour_bucket, activity_type)
)
WITH CLUSTERING ORDER BY(datetime_hour_bucket DESC, activity_type ASC);
I would like to be able to achieve this kind of query:
SELECT datetime_hour_bucket, activity_type, SUM(activity_count) as count
FROM sensor_activity_stats
WHERE sensor_id=:sensorId
AND datetime_hour_bucket >= :fromDate AND datetime_hour_bucket < :untilDate
GROUP BY activity_type
Cassandra complains because grouping must be done in the order of the primary key columns. And if I change the order, I won't be able to query a time range across any activity_type.
Some notes:
I am grouping by hours because some users could ask me to show the data in different timezones and I want to be able to perform a decent conversion.
The activity_type has low cardinality; however, I cannot be sure I'll always be able to predict its possible values.
Right now my solution is to query all the data in the range and perform the aggregation myself in code. Have you faced a similar situation, and what was your solution? Would you suggest a different way of querying or arranging the data?
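For reference, a minimal sketch of what that client-side aggregation could look like in Python. It assumes the range query (without GROUP BY) returns rows as (datetime_hour_bucket, activity_type, activity_count) tuples. The function name and the sample values are hypothetical, not taken from any driver.

```python
from collections import Counter
from datetime import datetime

def sum_by_activity_type(rows, from_date, until_date):
    """Sum activity_count per activity_type for buckets in [from_date, until_date)."""
    totals = Counter()
    for bucket, activity_type, count in rows:
        if from_date <= bucket < until_date:
            totals[activity_type] += count
    return dict(totals)

# Hypothetical rows for one sensor_id, as the range query might return them.
rows = [
    (datetime(2020, 1, 1, 10), "open", 3),
    (datetime(2020, 1, 1, 10), "close", 1),
    (datetime(2020, 1, 1, 11), "open", 2),
    (datetime(2020, 1, 1, 12), "open", 7),  # outside the requested range
]

totals = sum_by_activity_type(
    rows, datetime(2020, 1, 1, 10), datetime(2020, 1, 1, 12)
)
print(totals)
```

Since activity_type has low cardinality, the result dictionary stays small even when the time range covers many hourly buckets.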
I hope you have already found a solution to your problem, but here is an approach you can try.
First, you can change the CREATE TABLE statement to reorder the key columns:
CREATE TABLE IF NOT EXISTS sensor_activity_stats(
sensor_id text,
datetime_hour_bucket timestamp,
activity_type text,
activity_count counter,
PRIMARY KEY ((sensor_id), activity_type, datetime_hour_bucket)
)
WITH CLUSTERING ORDER BY(activity_type ASC, datetime_hour_bucket DESC);
Then, in the query, you can add the "datetime_hour_bucket" field to the GROUP BY clause:
SELECT datetime_hour_bucket, activity_type, SUM(activity_count) as count
FROM sensor_activity_stats
WHERE sensor_id=:sensorId
AND datetime_hour_bucket >= :fromDate AND datetime_hour_bucket < :untilDate
GROUP BY activity_type, datetime_hour_bucket;

Cassandra order by on combination of composite keys

I originally wrote a table that tracks feeds that have been assigned to a user for review.
create table user_feed
(
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, languageid, topicid, dateinserted)
);
Soon after creating this table I realized that I wouldn't be able to sort it by dateinserted (ORDER BY ... DESC), because for some odd reason Cassandra only lets me order by the second (and last) column of a composite key. That is, the table has to have a two-column composite key, and ORDER BY can only apply to the second column. So I changed my table to this:
create table user_feed
(
userid uuid,
languageid uuid,
topicid uuid,
dateinserted timeuuid,
primary key (userid, dateinserted)
);
and now I was able to run a query to get the latest feeds for the user, using order by.
However, I have a new requirement that requires me to sort the feeds by a combination of (languageid + userid) or (topicid + userid) or (languageid + topicid + userid).
I had an idea to create three new tables and have the keys combined into one key column. For example, for userid + topic query, I would use:
create table user_feed_by_topic
(
usertopicidkey text,
dateinserted timeuuid,
primary key (usertopicidkey, dateinserted)
);
where usertopicidkey = userid.toString() + topicid.toString().
Of course, this solution requires 4 separate inserts whenever I need to insert a new feed row, since I have 4 tables tracking identical data but partitioned differently to allow sorting.
My question is, is there a better way to do this? Is there any way to achieve what I want (query by a combination of columns and order by another column) or am I stuck with my 4 table design approach?
Many thanks,
Cassandra orders all rows based on the primary key's clustering columns. If your primary key is (userid, languageid, topicid, dateinserted), all rows are sorted by languageid, topicid and dateinserted in ascending order. This means rows are sorted by date only within a specific language and topic. You would have to make the date the first clustering column to change this behaviour.
It's common practice to denormalize your data across multiple tables to implement different ordering strategies.
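As a rough illustration of that denormalized write path, here is a Python sketch that builds the row for each of the four tables from one feed assignment. Only user_feed and user_feed_by_topic are named in the question. The other two table names, the `feed_rows` helper, and the key concatenation order for them are assumptions made for this example.

```python
import uuid

def feed_rows(userid, languageid, topicid, dateinserted):
    """Build one (partition key, dateinserted) row per query table.

    Keys are plain concatenations of the UUID strings, following the
    usertopicidkey idea from the question.
    """
    u, l, t = str(userid), str(languageid), str(topicid)
    return {
        "user_feed": (u, dateinserted),
        "user_feed_by_topic": (u + t, dateinserted),
        # Hypothetical table names for the remaining two combinations.
        "user_feed_by_language": (l + u, dateinserted),
        "user_feed_by_language_topic": (l + t + u, dateinserted),
    }

user, language, topic = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
inserted = uuid.uuid1()  # time-based UUID, standing in for a timeuuid

rows = feed_rows(user, language, topic, inserted)
for table, (key, when) in rows.items():
    print(table, key, when)
```

Each entry would become one INSERT. Whether to send the four writes as a single logged batch is a separate trade-off that this sketch does not address.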

ORDER BY reloaded, cassandra

I would like to sort a given column family, and to do this I am trying to create a table with the CLUSTERING ORDER BY option. I always run into the following errors:
1.) Variant A resulting in
Bad Request: Missing CLUSTERING ORDER for column userid
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
2.) Variant B resulting in
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
As far as I can see in the manual this is the correct syntax for creating a table for which I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname". How could I achieve this? (The column 'lastname' I would like to keep as the first part of the primary key, so that I could use it in delete statements with the WHERE-clause.)
Thanks a lot, Tamas
You can only specify clustering order on your clustering keys.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
In your first example, your only clustering key is userID. Thus, it is the only valid entry for CLUSTERING ORDER BY.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
The second example fails because you are specifying your partition key in CLUSTERING ORDER BY, and that's not going to work either.
Cassandra works by ordering CQL rows according to clustering keys, but only when a partition key is specified. This is because the whole idea of Cassandra wide-row modeling is to query by partition key, and read a series of ordered rows in one query operation.
I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname".
Given this statement, I am going to suggest that you need another column in this model before it will work the way you want. What you need is an appropriate partition key for your users table. Say...like group. With your users partitioned by group, and clustered by lastname, your definition would look something like this:
CREATE TABLE test.usersbygroup (
userID timeuuid,
firstname varchar,
lastname varchar,
group text,
PRIMARY KEY (group,lastname)
)WITH CLUSTERING ORDER BY (lastname desc);
Then, this query will work, returning users (in this case) who are fans of the show "Firefly," ordered by lastname (descending):
SELECT * FROM usersbygroup WHERE group='Firefly Fans';
Read through this DataStax doc on Compound Keys and Clustering to get a better understanding.
NOTE: You don't need to specify ORDER BY in your SELECT. The rows will come back ordered by their clustering key(s), and ORDER BY cannot change that. All ORDER BY can really do, is alter the sort direction (DESCending vs. ASCending).
Clustering is limited to what is defined in the primary key, in your case (lastname + userID). So Cassandra stores rows in sorted order of the (lastname + userID) combination, which is why you need to give both for retrieval. It is still not a useful schema if you want to sort all data in the table by last name: userID is unique (timeuuid), so the clustering key would be of no use.
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY (bucket)
)WITH CLUSTERING ORDER BY (lastname desc);
Here, if you provide the same bucket value, say 1, for all user records, then all users go into the same bucket, and hence all rows are retrieved in sorted order of last name. (By no means is this a good design; it is just to give you an idea.)
Revised :
CREATE TABLE user1 (
userID uuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY ((bucket), lastname,userID)
)WITH CLUSTERING ORDER BY (lastname desc);

Order in Limited query with composite keys on cassandra

In the following scenario:
CREATE TABLE temperature_by_day (
weatherstation_id text,
date text,
event_time timestamp,
temperature text,
PRIMARY KEY ((weatherstation_id,date),event_time)
)WITH CLUSTERING ORDER BY (event_time DESC);
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 08:01:00','74F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 07:01:00','73F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature)
VALUES ('1234ABCD','2013-04-04','2013-04-04 08:01:00','76F');
If I do the following query:
SELECT *
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date in ('2013-04-04', '2013-04-03') limit 2;
I noticed that Cassandra's result follows the same sequence as the partition keys in the IN clause. In this case, I'd like to know whether the expected result is ALWAYS the two records for the day '2013-04-04'. That is, does Cassandra respect the order of the IN clause when ordering the result, even in a scenario with multiple nodes?

Cassandra range slicing on composite key

I have a column family with a composite key like this:
CREATE TABLE sometable(
keya varchar,
keyb varchar,
keyc varchar,
keyd varchar,
value int,
date timestamp,
PRIMARY KEY (keya,keyb,keyc,keyd,date)
);
What I need to do is to
SELECT * FROM sometable
WHERE
keya = 'abc' AND
keyb = 'def' AND
date < '2014-01-01'
And that is giving me this error
Bad Request: PRIMARY KEY part date cannot be restricted (preceding part keyd is either not restricted or by a non-EQ relation)
What's the best way to solve this? Do I need to alter my columnfamily?
I also need to query this table with all of keya, keyb, keyc, and date.
You cannot do this in Cassandra. Moreover, such range slicing is costly. You are trying to apply a range on date while skipping keyc and keyd, which come before it in the key order of your schema.
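The rule behind that error message can be sketched in a few lines of Python. This is a simplified model that ignores ALLOW FILTERING and IN restrictions, and the `range_allowed` helper is hypothetical: a range on a clustering column is accepted only when every clustering column before it has an equality restriction.

```python
def range_allowed(clustering_columns, eq_columns, range_column):
    """Return True if a range restriction on range_column is acceptable,
    i.e. all clustering columns that precede it are restricted by equality."""
    position = clustering_columns.index(range_column)
    return all(col in eq_columns for col in clustering_columns[:position])

# PRIMARY KEY (keya, keyb, keyc, keyd, date): keya is the partition key,
# the remaining columns are clustering columns in this order.
clustering = ["keyb", "keyc", "keyd", "date"]

# The query from the question: keyb = ..., date < ...  (keyc and keyd skipped)
failing = range_allowed(clustering, {"keyb"}, "date")

# Restricting every preceding clustering column by equality
passing = range_allowed(clustering, {"keyb", "keyc", "keyd"}, "date")

print(failing, passing)
```

Under this model the query in the question is rejected because keyc and keyd sit between keyb and date without an equality restriction.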
I also need to query this table with all of keya, keyb, keyc, and date.
To solve this problem, consider the following schema. I would suggest keeping the keys in a separate table:
create table key_lookup ( -- hypothetical table name; the original omitted one
id timeuuid,
keyType text,
primary key (id, keyType))
Use the timeuuid to store the values and do a range scan based on that.
create table key_values ( -- hypothetical table name; the original omitted one
prevTableId timeuuid,
value int,
date timestamp,
primary key (prevTableId, date))
I guess that, this way, your tables are normalized for better scalability in your use case, and this may also save a lot of disk space if the keys are repetitive.
