Cassandra : Select records based on "timeuuid where conditions"

I created a table in Cassandra and want to select data using a WHERE condition on a column of type timeuuid.
CREATE TABLE shahid.stock_ticks(
symbol varchar,
date int,
trade timeuuid,
trade_details text,
PRIMARY KEY ( (symbol, date), trade )
) WITH CLUSTERING ORDER BY (trade DESC) ;
INSERT INTO shahid.stock_ticks (symbol, date, trade, trade_details) VALUES ('NFLX', 1, now(), 'this is 10' );
INSERT INTO shahid.stock_ticks (symbol, date, trade, trade_details) VALUES ('NFLX', 1, now(), 'this is 2' );
INSERT INTO shahid.stock_ticks (symbol, date, trade, trade_details) VALUES ('NFLX', 1, now(), 'this is 3' );
The queries above inserted the records, and one record has the value 2045d660-9415-11e5-9742-c53da2f1a8ec in the trade column.
I want to select like this, but it gives an error:
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < '2045d660-9415-11e5-9742-c53da2f1a8ec';
It gives the error below:
InvalidQueryException: Invalid STRING constant (2045d660-9415-11e5-9742-c53da2f1a8ec) for "trade" of type timeuuid
I also tried the queries below, with no luck:
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < maxTimeuuid('2045d660-9415-11e5-9742-c53da2f1a8ec');
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < dateOf('2045d660-9415-11e5-9742-c53da2f1a8ec');
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < unixTimestampOf('2045d660-9415-11e5-9742-c53da2f1a8ec');

Remove the quotes around your UUID. Cassandra supports UUID literals natively; they are not written as strings.
select * from shahid.stock_ticks where symbol = 'NFLX' and date = 1 and trade < 2045d660-9415-11e5-9742-c53da2f1a8ec;
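To see that the unquoted value really is a time-based UUID, here is a minimal Python sketch (standard library only, no Cassandra driver involved) that decodes the version and the embedded timestamp of the value from the question. The helper name `timeuuid_to_datetime` is made up for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID version-1 time scale (Gregorian calendar reform).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_datetime(value: str) -> datetime:
    """Return the UTC time embedded in a version-1 (time-based) UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version-1 (time-based) UUID")
    # u.time counts 100-nanosecond intervals since GREGORIAN_EPOCH.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

trade = "2045d660-9415-11e5-9742-c53da2f1a8ec"
print(uuid.UUID(trade).version)                 # UUID version number
print(timeuuid_to_datetime(trade).isoformat())  # embedded creation time
```

Because a timeuuid carries its own timestamp, a range condition on the trade column is effectively a range condition on insertion time.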

Related

Search using Secondary Index and range on Timeuuid column changes the order of range search

My table schema is as follows:
create table test (
devicename text,
date text,
id timeuuid,
code text,
flag int,
primary key (devicename, date, id)
) with clustering order by (date desc, id desc);
CREATE CUSTOM INDEX code_idx ON test (code) USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'case_sensitive': 'false' };
Now my select query is:
select * from test where devicename = 'ABC'
and id > minTimeuuid('2017-08-25 00:00:00') and id < maxTimeuuid('2017-08-25 23:59:59') allow filtering;
This returns results. But when I also use the secondary index, the order of the range search appears to reverse:
Select * from test where devicename = 'ABC'
and id > minTimeuuid('2017-08-25 00:00:00') and id < maxTimeuuid('2017-08-25 23:59:59')
and code like '%25%' allow filtering;
The query above returns 0 rows, whereas when I swap the comparison operators on the timeuuid column, it returns the correct result:
Select * from test where devicename = 'ABC'
and id < minTimeuuid('2017-08-25 00:00:00') and id > maxTimeuuid('2017-08-25 23:59:59')
and code like '%25%' allow filtering;
My table schema is unchanged, i.e. id is still in descending order, so why does the search order change when I use the secondary index column in the select? Please suggest where I am going wrong.
Thanks,

Group By using date, name and amount in Cassandra

I'm new to Cassandra and I can't use GROUP BY. Is there a way to use GROUP BY in Cassandra like in SQL? I want to group my data by date and by the name of the user, and sum all the amounts for a specific date. I don't have any code for this yet because I don't know how to start, and I'm also aware that GROUP BY is not supported by Cassandra.

GROUP BY is not available in older Cassandra versions (it was added in 3.10, and only for partition key and clustering columns).
But if you just want the sum of amount for a specific date and name, you can get it easily.
Using Apache Cassandra 3.x
1. Create a table
CREATE TABLE data (
date bigint,
name text,
amount double,
PRIMARY KEY (date, name, amount)
);
2. Insert some dummy data
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 10);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 20);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 30);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a1', 40);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a2', 50);
INSERT INTO data (date , name , amount) VALUES ( 1, 'a2', 60);
3. Now you can find the sum of amount for a specific date and name
SELECT sum(amount) FROM data WHERE date = 1 AND name = 'a1' ;
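If you need the totals for several (date, name) groups at once, one option is to aggregate on the client. The following is a minimal Python sketch under that assumption: it uses the dummy rows above as plain tuples rather than a real driver result, and `sum_by_date_and_name` is a hypothetical helper.

```python
from collections import defaultdict

# The dummy rows from step 2, as (date, name, amount) tuples.
rows = [
    (1, "a1", 10.0),
    (1, "a1", 20.0),
    (1, "a1", 30.0),
    (1, "a1", 40.0),
    (1, "a2", 50.0),
    (1, "a2", 60.0),
]

def sum_by_date_and_name(rows):
    """Sum amount per (date, name), like GROUP BY date, name in SQL."""
    totals = defaultdict(float)
    for date, name, amount in rows:
        totals[(date, name)] += amount
    return dict(totals)

totals = sum_by_date_and_name(rows)
print(totals[(1, "a1")])  # 100.0, same as the SELECT sum(amount) above
print(totals[(1, "a2")])  # 110.0
```

This only makes sense when the rows fetched per query stay small; for a single group, the SELECT sum(amount) query does the work on the server.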

SASI indexes on year and month

I am new to SASI indexes in Cassandra, and I am unclear how they are used when the WHERE clause includes multiple indexed columns.
Here is one option I am looking at:
Option 1:
CREATE TABLE IF NOT EXISTS my_timeseries_data (
id text,
event_time timestamp,
value text,
year int,
month int,
PRIMARY KEY (id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
CREATE CUSTOM INDEX year_idx ON my_timeseries_data (year)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
CREATE CUSTOM INDEX month_idx ON my_timeseries_data (month)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };
I expect to query like this sometimes:
select * from my_timeseries_data
where year = 2016 and month = 1 ALLOW FILTERING;
Does the SASI index on 'month' column help my performance?
Option 2:
Would it be better to index a concatenated column like 'year_and_month' below?
CREATE TABLE IF NOT EXISTS my_timeseries_data (
id text,
event_time timestamp,
value text,
year_and_month text,
PRIMARY KEY (id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
CREATE CUSTOM INDEX year_idx ON my_timeseries_data (year_and_month)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
And then query like this on a single SASI index:
select * from my_timeseries_data
where year_and_month = '2016_1';
Option 3:
There is no need for extra month and year columns and SASI indexes, because having 'event_time' as a clustering column already allows the scalable time-range queries that I want to do anyway?
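For Option 2, the concatenated value would have to be computed by the application at write time. A minimal Python sketch, assuming the '2016_1' format used in the query above (year, underscore, month without zero padding); `year_and_month` is a hypothetical helper, not part of any driver:

```python
from datetime import datetime

def year_and_month(event_time: datetime) -> str:
    """Build the year_and_month column value, e.g. '2016_1' for January 2016."""
    return f"{event_time.year}_{event_time.month}"

print(year_and_month(datetime(2016, 1, 15)))  # 2016_1
```

Note that without zero padding the values do not sort chronologically as text ('2016_10' sorts before '2016_2'), which matters only if the column is ever used for range comparisons.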

Query min partition key based on date range (clustering key)

I have a table foo in Cassandra with 4 columns: foo_id bigint, date datetime, ref_id bigint, type int.
Here the partition key is foo_id; the clustering keys are date desc, ref_id and type.
I want to write a CQL query which is the equivalent of the SQL below
select min(foo_id) from foo where date >= '2016-04-01 00:00:00+0000'
I wrote the following CQL
select foo_id from foo where
foo_id IN (-9223372036854775808, 9223372036854775807)
and date >= '2016-04-01 00:00:00+0000';
but this returns empty results.
Then I tried
select foo_id from foo where
token(foo_id) > -9223372036854775808
and token(foo_id) < 9223372036854775807
and date >= '2016-04-01 00:00:00+0000';
but this results in an error:
Unable to execute CSQL Script on 'Cassandra'. Cannot execute this query
as it might involve data filtering and thus may have unpredictable
performance. If you want to execute this query despite performance
unpredictability, use ALLOW FILTERING.
I don't want to use ALLOW FILTERING, but I want the minimum foo_id at the start of the specified date.

You should probably denormalize your data and create a new table for the purpose. I propose something like:
CREATE TABLE foo_reverse (
year int,
month int,
day int,
foo_id bigint,
date timestamp,
ref_id bigint,
type int,
PRIMARY KEY ((year, month, day), foo_id)
);
To get the minimum foo_id you would query that table by something like:
SELECT * FROM foo_reverse WHERE year = 2016 AND month = 4 AND day = 1 LIMIT 1;
That table lets you query on a per-day basis. You can change the partition key to better reflect your needs. Beware of the hot spots that a poorly chosen time range could create, since all writes for the same period land on the same partition.
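With this layout the application has to derive the (year, month, day) partition key from each row's date before writing or querying. A minimal Python sketch, assuming dates are bucketed by UTC calendar day; `day_partition` is a hypothetical helper:

```python
from datetime import datetime, timezone

def day_partition(ts: datetime) -> tuple:
    """Return the (year, month, day) partition key for a timezone-aware datetime."""
    # Normalise to UTC so every client buckets the same instant identically.
    ts = ts.astimezone(timezone.utc)
    return (ts.year, ts.month, ts.day)

print(day_partition(datetime(2016, 4, 1, 0, 0, tzinfo=timezone.utc)))  # (2016, 4, 1)
```

The three values map directly onto the year, month and day conditions in the SELECT above.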

Cassandra event storage

Is there a best way to store data in a Cassandra database if I want to search the data in these 2 ways:
1) The last 20 "error" event_types for user_id "123"
2) All "login" event_types in the past day
Would this work:
CREATE TABLE events (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY (event_type, timestamp, user_id) );

You will need to create two tables for this (at least in version 2.x).
From version 3.5 onward you can use SASI.
1) The last 20 "error" event_types for user_id "123"
CREATE TABLE events (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY ((user_id, event_type), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Now you can get the data by the following query.
select * from events where user_id = '123' and event_type = 'error' limit 20;
2) All "login" event_types in the past day
CREATE TABLE events_by_type (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY (event_type, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Now you can get the data by the following query.
select * from events_by_type where event_type = 'login' and timestamp > ddmmyyyy;
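The lower bound for "the past day" has to be computed by the client. A minimal Python sketch that produces it as a timestamp string in the 'yyyy-mm-dd HH:MM:SS+0000' form; `past_day_cutoff` is a hypothetical helper, and passing the current time in as an argument is only a choice made here to keep the function easy to test:

```python
from datetime import datetime, timedelta, timezone

def past_day_cutoff(now: datetime) -> str:
    """Return the instant 24 hours before `now` as a UTC timestamp string."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=1)
    return cutoff.strftime("%Y-%m-%d %H:%M:%S+0000")

# Typical call: 24 hours before the current moment.
print(past_day_cutoff(datetime.now(timezone.utc)))
```

The resulting string would be used, quoted, as the right-hand side of the timestamp comparison in the query above.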
