Group By using date, name and amount in Cassandra

I'm new to Cassandra and can't get GROUP BY to work. Is there a way to use GROUP BY in Cassandra as in SQL? I want to group my data by date and by user name, and sum all the amounts for a specific date. I don't have any code yet because I don't know where to start, and I'm also aware that GROUP BY is not supported by Cassandra.

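Cassandra does not offer a general SQL-style GROUP BY: versions before 3.10 have no GROUP BY at all, and later versions only allow it on primary key columns.
But if you want the sum of amount for a specific date and name, you can get it easily.
Using Apache Cassandra 3.x:
1. Create a table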
CREATE TABLE data (
date bigint,
name text,
amount double,
PRIMARY KEY (date, name, amount)
);
2. Insert some dummy data
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 10);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 20);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 30);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 40);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a2', 50);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a2', 60);
3. Now you can find the sum of amount for a specific date and name
SELECT sum(amount) FROM data WHERE date = 1 AND name = 'a1' ;
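The query above returns one total per (date, name) pair. If totals for every pair are needed at once, one option is to read the rows and group them on the client. The following is only a minimal sketch under that assumption: the rows are hard-coded to mirror the dummy data, the function name sum_by_date_and_name is hypothetical, and no Cassandra driver is involved.

```python
from collections import defaultdict

# Hypothetical rows mirroring the dummy data above: (date, name, amount).
rows = [
    (1, "a1", 10.0), (1, "a1", 20.0), (1, "a1", 30.0), (1, "a1", 40.0),
    (1, "a2", 50.0), (1, "a2", 60.0),
]


def sum_by_date_and_name(rows):
    """Return {(date, name): total amount}, similar to GROUP BY date, name."""
    totals = defaultdict(float)
    for date, name, amount in rows:
        totals[(date, name)] += amount
    return dict(totals)


totals = sum_by_date_and_name(rows)
print(totals)
```

Client-side grouping like this only makes sense when the result set is small enough to pull into memory.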

Related

How do I group by date in Cassandra?

I'm trying to write a CQL query in Cassandra that groups by date. I have a column of type "date" whose values look like "mm-dd-yyyy". I just want to extract the year and then group by it. How can I achieve that?
SELECT sum(amount) FROM data WHERE date = 'yyyy'
You cannot do a partial filter with just the year on a column of type date. It is an invalid query in Cassandra.
The CQL date type is encoded as a 32-bit integer that represents the days since epoch (Jan 1, 1970).
If you need to filter based on year, then you will need to add a column to your table, as in this example:
CREATE TABLE movies (
movie_title text,
release_year int,
...
PRIMARY KEY ((movie_title, release_year))
)
Here's an example for retrieving information about a movie:
SELECT ... FROM movies WHERE movie_title = ? AND release_year = ?
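To populate a separate year column like release_year, the application has to derive the year before writing the row. A rough sketch of that step, assuming the value arrives either as a count of days since the epoch (as described above) or as an "mm-dd-yyyy" string; both helper names are hypothetical:

```python
from datetime import date, datetime, timedelta

EPOCH = date(1970, 1, 1)


def year_from_epoch_days(days):
    """Year of a date given as days since Jan 1, 1970."""
    return (EPOCH + timedelta(days=days)).year


def year_from_text(text):
    """Year of a date written as mm-dd-yyyy."""
    return datetime.strptime(text, "%m-%d-%Y").year


print(year_from_epoch_days(0))
print(year_from_text("07-04-2016"))
```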

viewing as list in cassandra

Table
CREATE TABLE vehicle_details (
owner_name text,
vehicle list<text>,
price float,
vehicle_type text,
PRIMARY KEY(price , vehicle_type)
)
I have two issues here.
I am trying to view the list of vehicles per user. If owner1 has 2 cars, it should show as owner_name1 vehicle1 and owner_name1 vehicle2. Is it possible to do this with a SELECT query?
The output I am expecting
owner_name_1 | vehicle_1
owner_name_1 | vehicle_2
owner_name_2 | vehicle_1
owner_name_2 | vehicle_2
owner_name_2 | vehicle_3
I am trying to use owner_name in the primary key, but whenever I use WHERE, DISTINCT or ORDER BY it does not work properly. I am going to query price and vehicle_type most of the time, but owner_name would be unique, hence I am trying to use it. I tried several combinations.
Below are three combinations I tried.
PRIMARY KEY(owner_name, price, vehicle_type) WITH CLUSTERING ORDER BY (price)
PRIMARY KEY((owner_name, price), vehicle_type)
PRIMARY KEY((owner_name, vehicle_type), price) WITH CLUSTERING ORDER BY (price)
Queries I am running
SELECT owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV';
SELECT Owner_name, vprice, vehicle_type from vehicle_details WHERE vehicle_type='SUV' ORDER BY price desc;
Since your table has:
PRIMARY KEY(price , vehicle_type)
you can only run queries with filters on the partition key (price) or the partition key + clustering column (price + vehicle_type):
SELECT ... FROM ... WHERE price = ?
SELECT ... FROM ... WHERE price = ? AND vehicle_type = ?
If you want to be able to query by owner name, you need to create a new table which is partitioned by owner_name. I also recommend not storing the vehicle in a collection:
CREATE TABLE vehicles_by_owner (
owner_name text,
vehicle text,
...
PRIMARY KEY (owner_name, vehicle)
)
By using vehicle as a clustering column, each owner will have rows of vehicles in the table. Cheers!
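If the data is already stored with a list column, the one-row-per-vehicle output from the question can also be produced on the client, or used to fill a table keyed by owner and vehicle. A small sketch with made-up rows shaped like the expected output; explode_vehicles is a hypothetical helper, not a driver function:

```python
# Hypothetical rows as they might come back from the original table.
rows = [
    {"owner_name": "owner_name_1", "vehicle": ["vehicle_1", "vehicle_2"]},
    {"owner_name": "owner_name_2",
     "vehicle": ["vehicle_1", "vehicle_2", "vehicle_3"]},
]


def explode_vehicles(rows):
    """Turn each (owner, [vehicles]) row into one (owner, vehicle) pair per vehicle."""
    return [
        (row["owner_name"], vehicle)
        for row in rows
        for vehicle in (row["vehicle"] or [])  # treat a null list as empty
    ]


pairs = explode_vehicles(rows)
for owner, vehicle in pairs:
    print(f"{owner} | {vehicle}")
```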

Storing time specific data in cassandra

I am looking for a good way to store time specific data in cassandra.
Each entry can look like (start_time, value). Later, I would like to retrieve the current value.
The logic for retrieving the current value is as follows.
Find all rows with start_time<=current_time.
Then find the value with maximum start_time from the rows obtained in the first step.
PS: Edited the question to make it clearer.
The exact requirement cannot be met as stated, but we can get close to it with one more column.
First, to be able to use the <= operator, your start_time column needs to be a clustering column of your table.
Then, you need a different partition key. You could choose a fixed value, but that causes problems once the partition holds too many rows. It is better to use something like the year or the month of the start_time.
CREATE TABLE time_specific_table (
year bigint,
start_time timestamp,
value text,
PRIMARY KEY((year), start_time)
) WITH CLUSTERING ORDER BY (start_time DESC);
The problem is that when you query the table, you need to know the value of the partition key:
Find all rows with start_time<=current_time
SELECT * FROM time_specific_table
WHERE year = :year AND start_time <= :time;
Select the value with the maximum start_time (rows are stored newest first, so LIMIT 1 returns it)
SELECT * FROM time_specific_table
WHERE year = :year AND start_time <= :time LIMIT 1;
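The two-step logic from the question (filter by start_time <= current_time, then take the latest) can be checked against plain in-memory data. This is only an illustration of the selection rule, with hypothetical entries and a hypothetical current_value helper; it does not query Cassandra.

```python
from datetime import datetime


def current_value(entries, now):
    """Return the value whose start_time is the latest one not after `now`.

    entries: iterable of (start_time, value) pairs. Returns None if no
    entry has started yet.
    """
    eligible = [(start, value) for start, value in entries if start <= now]
    if not eligible:
        return None
    return max(eligible, key=lambda entry: entry[0])[1]


# Hypothetical sample data.
entries = [
    (datetime(2016, 1, 1), "a"),
    (datetime(2016, 3, 1), "b"),
    (datetime(2016, 6, 1), "c"),
]

print(current_value(entries, datetime(2016, 4, 1)))
```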
Create two separate tables like below:
CREATE TABLE data (
start_time timestamp,
value int,
PRIMARY KEY(start_time, value)
);
CREATE TABLE current_value (
partition int PRIMARY KEY,
value int
);
Now you have to insert data into both tables. To insert data into the second table, use a static value like 1 for the partition:
INSERT INTO current_value(partition, value) VALUES(1, 10);
Now in the current_value table your data will be upserted, and you will get the latest value whenever you select.

SASI indexes on year and month

I am new to SASI indexes in Cassandra and I am unclear how they are used when the "where" predicate includes multiple indexed columns.
Here is one option I am looking at:
Option 1:
CREATE TABLE IF NOT EXISTS my_timeseries_data (
id text,
event_time timestamp,
value text,
year int,
month int,
PRIMARY KEY (id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
CREATE CUSTOM INDEX year_idx ON my_timeseries_data (year)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
CREATE CUSTOM INDEX month_idx ON my_timeseries_data (month)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
I expect to query like this sometimes:
select * from my_timeseries_data
where year = 2016 and month = 1 ALLOW FILTERING;
Does the SASI index on 'month' column help my performance?
Option 2:
Would it be better to index a concatenated column like 'year_and_month' below?
CREATE TABLE IF NOT EXISTS my_timeseries_data (
id text,
event_time timestamp,
value text,
year_and_month text,
PRIMARY KEY (id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
CREATE CUSTOM INDEX year_idx ON my_timeseries_data (year_and_month)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
And then query like this on a single SASI index:
select * from my_timeseries_data
where year_and_month = '2016_1';
Option 3:
No need for extra month and year columns and SASI indexes, because having 'event_time' as a CLUSTERING COLUMN already allows the scalable time-range queries that I want to do anyway?
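For Option 2, the concatenated column has to be computed by the application when the row is written. A minimal sketch, assuming the '2016_1' format used in the query above (no zero padding); the function name year_and_month is hypothetical:

```python
from datetime import datetime


def year_and_month(event_time):
    """Build the year_and_month value, e.g. '2016_1' for January 2016."""
    return f"{event_time.year}_{event_time.month}"


print(year_and_month(datetime(2016, 1, 15)))
```

Because the month is not zero padded, these strings do not sort chronologically ('2016_10' sorts before '2016_2'), which matters only if range comparisons on the column are ever needed.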

Cassandra : Select records based on "timeuuid where conditions"

I created a table in Cassandra and want to select data with a WHERE condition on a column of type timeuuid.
CREATE TABLE shahid.stock_ticks(
symbol varchar,
date int,
trade timeuuid,
trade_details text,
PRIMARY KEY ( (symbol, date), trade )
) WITH CLUSTERING ORDER BY (trade DESC) ;
INSERT INTO shahid.stock_ticks (symbol, date, trade, trade_details) VALUES ('NFLX', 1, now(), 'this is 10' );
INSERT INTO shahid.stock_ticks (symbol, date, trade, trade_details) VALUES ('NFLX', 1, now(), 'this is 2' );
INSERT INTO shahid.stock_ticks (symbol, date, trade, trade_details) VALUES ('NFLX', 1, now(), 'this is 3' );
The statements above inserted the records, and one record has the value '2045d660-9415-11e5-9742-c53da2f1a8ec' in the trade column.
I want to select like this, but it gives an error:
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < '2045d660-9415-11e5-9742-c53da2f1a8ec';
It gives the error below:
InvalidQueryException: Invalid STRING constant (2045d660-9415-11e5-9742-c53da2f1a8ec) for "trade" of type timeuuid
I also tried the queries below, with no luck:
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < maxTimeuuid('2045d660-9415-11e5-9742-c53da2f1a8ec');
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < dateOf('2045d660-9415-11e5-9742-c53da2f1a8ec');
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < unixTimestampOf('2045d660-9415-11e5-9742-c53da2f1a8ec');
Remove the quotes around your UUID. Cassandra supports UUID literals natively; they are not written as strings.
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < 2045d660-9415-11e5-9742-c53da2f1a8ec;
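To see which point in time a given timeuuid bound corresponds to, the value can be decoded on the client. The sketch below uses only Python's standard uuid module and the usual version-1 UUID layout (100-nanosecond ticks since 1582-10-15); the helper name timeuuid_to_datetime is hypothetical and nothing here talks to Cassandra.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-nanosecond ticks between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Decode the timestamp embedded in a version-1 (time-based) UUID string."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("not a version-1 (time-based) UUID")
    seconds = (parsed.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


trade_time = timeuuid_to_datetime("2045d660-9415-11e5-9742-c53da2f1a8ec")
print(trade_time.isoformat())
```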
