How to find the first and last entry in Cassandra (date is part of partition key)

Is it possible to find the first and last entry in a Cassandra table if the partition key includes a text date, added to avoid large partitions?
CREATE TABLE trades (
stockexchange text,
symbol text,
ts timestamp,
date text,
tid text,
price decimal,
side text,
size decimal,
PRIMARY KEY ((stockexchange, symbol, date), ts, tid)
) WITH CLUSTERING ORDER BY (ts ASC, tid ASC)

One solution is to create a second table that stores only stockexchange, symbol, and timestamp (plus tid to keep rows unique).
This lets you find the first and last timestamp for your key (stockexchange, symbol).
Note that you have to write to both tables at the same time, and Cassandra is not an ACID database, so keeping the two tables in sync is the application's responsibility.
CREATE TABLE trades (
stockexchange text,
symbol text,
ts timestamp,
date text,
tid text,
price decimal,
side text,
size decimal,
PRIMARY KEY ((stockexchange, symbol, date), ts, tid)
) WITH CLUSTERING ORDER BY (ts ASC, tid ASC);
CREATE TABLE trades_timestampts (
stockexchange text,
symbol text,
tid text,
ts timestamp,
PRIMARY KEY ((stockexchange, symbol), ts, tid)) WITH CLUSTERING ORDER BY (ts asc, tid asc);
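The lookup flow described above can be sketched as follows. This is a minimal illustration, not the answerer's code: it assumes the `date` column holds a UTC day formatted as YYYY-MM-DD (the question does not state the format), and the helper names `date_bucket` and `trade_lookup_params` are hypothetical. The statements are plain CQL strings with `?` bind markers, to be prepared and executed with whatever driver you use.

```python
from datetime import datetime, timezone

# Assumption: the `date` partition-key component is a UTC day, formatted YYYY-MM-DD.
DATE_FORMAT = "%Y-%m-%d"

# Oldest and newest rows of the lookup table for one (stockexchange, symbol) pair.
FIRST_TS_CQL = (
    "SELECT ts, tid FROM trades_timestampts "
    "WHERE stockexchange = ? AND symbol = ? ORDER BY ts ASC LIMIT 1"
)
LAST_TS_CQL = (
    "SELECT ts, tid FROM trades_timestampts "
    "WHERE stockexchange = ? AND symbol = ? ORDER BY ts DESC LIMIT 1"
)

# Full trade row, addressed by its complete primary key in `trades`.
TRADE_CQL = (
    "SELECT * FROM trades WHERE stockexchange = ? AND symbol = ? "
    "AND date = ? AND ts = ? AND tid = ?"
)


def date_bucket(ts: datetime) -> str:
    """Return the `date` bucket for a timezone-aware trade timestamp."""
    return ts.astimezone(timezone.utc).strftime(DATE_FORMAT)


def trade_lookup_params(stockexchange, symbol, ts, tid):
    """Bind parameters for TRADE_CQL, derived from one lookup-table row."""
    return (stockexchange, symbol, date_bucket(ts), ts, tid)
```

Run FIRST_TS_CQL or LAST_TS_CQL first, then pass the returned ts and tid to `trade_lookup_params` to read the full row from the date-bucketed table.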

Related

Cassandra Invalid Query: Some cluster keys are missing

I'm using Cassandra 3.0.
My table was created with this query, but when I try to insert data into the table, I get the error: 'Some cluster keys are missing: created'
Table Structure:
CREATE TABLE db.feed (
action_object_id int,
owner_id int,
created timeuuid,
action_object text,
action_object_type int,
actor text,
feed_type text,
target text,
target_type int,
verb text,
PRIMARY KEY (action_object_id, owner_id, created)
) WITH CLUSTERING ORDER BY (owner_id ASC, created ASC)
You have to provide values for all primary key columns: action_object_id, owner_id, and created must all appear in your insert query.
Ex: insert into db.feed(action_object_id, owner_id, created, ...) values (?,?,?,...). You also cannot provide NULL values for primary key columns, so created cannot be null.
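As a rough client-side guard for the rule above, the check can be done before the statement is ever sent. The sketch below is an assumption-laden illustration: `build_insert` is a hypothetical helper, not part of any Cassandra driver, and it only builds the CQL string and parameter list.

```python
import uuid

# Primary key columns of db.feed, in the order declared in the question.
PRIMARY_KEY = ("action_object_id", "owner_id", "created")


def build_insert(table, row):
    """Return (cql, params) for an INSERT; reject rows with missing or null key columns."""
    missing = [col for col in PRIMARY_KEY if row.get(col) is None]
    if missing:
        raise ValueError("missing primary key values: " + ", ".join(missing))
    columns = list(row)
    placeholders = ", ".join("?" for _ in columns)
    cql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return cql, [row[col] for col in columns]


# uuid.uuid1() produces a time-based UUID, which is what a timeuuid column expects.
example_row = {
    "action_object_id": 1,
    "owner_id": 2,
    "created": uuid.uuid1(),
    "verb": "like",
}
example_cql, example_params = build_insert("db.feed", example_row)
```

Leaving created out of the row raises a ValueError locally instead of producing the "Some cluster keys are missing" error from the server.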

Cassandra event storage

Is there a best way to store data in Cassandra if I want to query it in these 2 ways:
1) The last 20 "error" event_types for user_id "123"
2) All "login" event_types in the past day
Would this work:
CREATE TABLE events (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY (event_type, timestamp, userid) );
You will need to create two tables for this, one per query (at least in version 2.x).
From version 3.4 onward you can also use a SASI index.
1) The last 20 "error" event_types for user_id "123"
CREATE TABLE events (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY ((user_id, event_type), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Now you can get the data by the following query.
select * from events where user_id = '123' and event_type = 'error' limit 20;
2) All "login" event_types in the past day
CREATE TABLE events_by_type (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY (event_type, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Now you can get the data by the following query.
select * from events_by_type where event_type = 'login' and timestamp > 'yyyy-mm-dd HH:MM:SS';
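The cutoff for "the past day" has to be computed by the client. A minimal sketch of that step, assuming timestamps are handled in UTC; the function names `past_day_cutoff` and `login_events_query` are illustrative only, and in real code the cutoff would normally be passed as a bound parameter rather than formatted into the string.

```python
from datetime import datetime, timedelta, timezone


def past_day_cutoff(now):
    """Return a CQL timestamp literal for 24 hours before `now` (timezone-aware)."""
    cutoff = now.astimezone(timezone.utc) - timedelta(days=1)
    return cutoff.strftime("%Y-%m-%d %H:%M:%S+0000")


def login_events_query(now):
    """Build the 'all login events in the past day' query against events_by_type."""
    cutoff = past_day_cutoff(now)
    return (
        "SELECT * FROM events_by_type "
        f"WHERE event_type = 'login' AND timestamp > '{cutoff}'"
    )


query = login_events_query(datetime.now(timezone.utc))
```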

Columns ordering in Cassandra

When I create a table in CQL, does the order of the columns that are NOT part of the primary key (neither partition key nor clustering columns) matter?
CREATE TABLE user (
a ascii,
b ascii,
c ascii,
PRIMARY KEY (a)
);
Is it equivalent to:
CREATE TABLE user (
a ascii,
c ascii, -- switched
b ascii, -- switched
PRIMARY KEY (a)
);
Thank you for your help
Both of those statements will fail, because of:
The extra comma.
You have not provided a primary key definition.
Assuming you had those fixed, then the answer is still "yes they are the same."
Cassandra applies its own order to your columns at table creation time. Consider this table as I have typed it:
CREATE TABLE testorder (
acolumn text,
jcolumn text,
dcolumn text,
bcolumn text,
apkey text,
bpkey text,
ackey text,
bckey text,
PRIMARY KEY ((bpkey,apkey),bckey,ackey));
After creating it, I'll describe the table so you can see the order that Cassandra has applied to the columns.
aploetz@cqlsh:stackoverflow> desc table testorder ;
CREATE TABLE stackoverflow.testorder (
bpkey text,
apkey text,
bckey text,
ackey text,
acolumn text,
bcolumn text,
dcolumn text,
jcolumn text,
PRIMARY KEY ((bpkey, apkey), bckey, ackey)
) WITH CLUSTERING ORDER BY (bckey ASC, ackey ASC)
Essentially, Cassandra lists the partition keys first, then the clustering keys (both in the order given in the PRIMARY KEY definition), and then the remaining columns in ascending alphabetical order by name.
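The ordering rule described above is easy to model outside Cassandra. The function below is only a sketch of that rule as stated in this answer, not Cassandra's actual implementation; `described_column_order` is a made-up name.

```python
def described_column_order(partition_keys, clustering_keys, columns):
    """Order columns as described: key columns in PRIMARY KEY order, the rest by name."""
    key_columns = list(partition_keys) + list(clustering_keys)
    regular = sorted(col for col in columns if col not in key_columns)
    return key_columns + regular


# Columns of the testorder table, in the order they were typed.
typed_order = [
    "acolumn", "jcolumn", "dcolumn", "bcolumn",
    "apkey", "bpkey", "ackey", "bckey",
]
ordered = described_column_order(["bpkey", "apkey"], ["bckey", "ackey"], typed_order)
```

For the testorder table this gives the same sequence as the desc output shown earlier, starting with bpkey and ending with jcolumn.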

Cassandra aggregation to second table

I have a Cassandra table like:
CREATE TABLE sensor_data (
sensor VARCHAR,
timestamp timestamp,
value float,
PRIMARY KEY ((sensor), timestamp)
)
And an aggregation table:
CREATE TABLE sensor_data_aggregated (
sensor VARCHAR,
aggregation VARCHAR, /* hour or day */
timestamp timestamp,
min_timestamp timestamp,
min_value float,
max_timestamp timestamp,
max_value float,
avg_value float,
PRIMARY KEY ((sensor, aggregation), timestamp)
)
Is there any kind of trigger that can fill the "sensor_data_aggregated" table automatically on insert, update, or delete in the "sensor_data" table?
My current solution would be to write a custom trigger that writes to a second commit log,
and an application that periodically reads and truncates this log to generate the aggregated data.
I also found information that DataStax OpsCenter can do this, but no instructions on how.
What would be the best way to do this?
You can implement your own C* trigger for that, which will execute additional queries for your aggregation table after each row insert into sensor_data.
Also, for maintaining min/max values you can use CAS and C* lightweight transactions like
update sensor_data_aggregated
set min_value=123
where
sensor='foo'
and aggregation='bar'
and ts='2015-01-01 00:00:00'
if min_value>123;
using a slightly updated schema (the column 'timestamp' is renamed to 'ts' so that it does not clash with the CQL keyword and type name timestamp):
CREATE TABLE sensor_data_aggregated (
sensor text,
aggregation text,
ts timestamp,
min_timestamp timestamp,
min_value float,
max_timestamp timestamp,
max_value float,
avg_value float,
PRIMARY KEY ((sensor, aggregation), ts)
)
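Whichever mechanism feeds the aggregation table, the per-bucket values themselves are simple to compute in the application. The sketch below shows hourly aggregation over (timestamp, value) pairs read from sensor_data; it is an illustration of the periodic-application approach from the question, and `aggregate_by_hour` is a hypothetical helper. The resulting dictionaries map directly onto the columns of sensor_data_aggregated.

```python
from datetime import datetime


def aggregate_by_hour(readings):
    """Group (timestamp, value) pairs by hour and compute min, max and average."""
    buckets = {}
    for ts, value in readings:
        # Truncate the timestamp to the start of its hour; this becomes `ts` in the table.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append((ts, value))

    result = {}
    for hour, rows in buckets.items():
        min_ts, min_value = min(rows, key=lambda row: row[1])
        max_ts, max_value = max(rows, key=lambda row: row[1])
        result[hour] = {
            "min_timestamp": min_ts,
            "min_value": min_value,
            "max_timestamp": max_ts,
            "max_value": max_value,
            "avg_value": sum(value for _, value in rows) / len(rows),
        }
    return result


sample = [
    (datetime(2015, 1, 1, 10, 5), 1.0),
    (datetime(2015, 1, 1, 10, 30), 3.0),
    (datetime(2015, 1, 1, 11, 0), 2.0),
]
hourly = aggregate_by_hour(sample)
```

Daily aggregation works the same way with the timestamp truncated to midnight instead of the hour.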

Non-EQ relation error Cassandra - how fix primary key?

I created a table named posts. When I run this SELECT request:
return $this->db->query('SELECT * FROM "posts" WHERE "id" IN(:id) LIMIT '.$this->limit_per_page, ['id' => $id]);
I get this error:
PRIMARY KEY column "id" cannot be restricted (preceding column
"post_at" is either not restricted or by a non-EQ relation)
My table dump is:
CREATE TABLE posts (
id uuid,
post_at timestamp,
user_id bigint,
name text,
category set<text>,
link varchar,
image set<varchar>,
video set<varchar>,
content map<text, text>,
private boolean,
PRIMARY KEY (user_id,post_at,id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
I read some articles about primary and clustering keys and understood that when the primary key has several columns, the preceding columns must be restricted with = before I can use IN on a later one. In my case, I cannot use a single-column primary key. What would you advise me to change in the table structure so that the error disappears?
My dummy table structure, with id as the partition key:
CREATE TABLE posts (
id timeuuid,
post_at timestamp,
user_id bigint,
PRIMARY KEY (id,post_at,user_id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
After inserting some dummy data, I ran this query, where IN restricts the partition key id:
select * from posts where id in (timeuuid1,timeuuid2,timeuuid3);
I was using Cassandra 2.0 with CQL 3.0.
