Cassandra event storage - cassandra

Is there a best way to store data in a Cassandra database if I will want to search the data in these 2 ways:
1) The last 20 "error" event_types for user_id "123"
2) All "login" event_types in the past day
Would this work:
CREATE TABLE events (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY (event_type, timestamp, user_id) );

You will need to create two tables for this (at least in version 2.x).
From version 3.4 onward you can also consider a SASI index.
1) The last 20 "error" event_types for user_id "123"
CREATE TABLE events (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY ((user_id, event_type), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Now you can get the data with the following query:
SELECT * FROM events WHERE user_id = '123' AND event_type = 'error' LIMIT 20;
2) All "login" event_types in the past day
CREATE TABLE events_by_type (
user_id text,
event_type text,
data text,
timestamp timestamp,
PRIMARY KEY (event_type, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Now you can get the data with the following query:
SELECT * FROM events_by_type WHERE event_type = 'login' AND timestamp > 'yyyy-mm-dd HH:mm:ss';
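The cutoff for "the past day" has to be computed by the client before the query is sent. A minimal sketch of that step, assuming the events_by_type table above and a timezone-aware clock; the helper names past_day_cutoff and login_query are hypothetical, not part of any driver:

```python
from datetime import datetime, timedelta, timezone


def past_day_cutoff(now: datetime) -> str:
    """Return the instant 24 hours before `now` as a CQL timestamp literal.

    `now` is assumed to be timezone-aware.
    """
    cutoff = now.astimezone(timezone.utc) - timedelta(days=1)
    # CQL accepts literals of the form 'yyyy-mm-dd HH:mm:ss+0000'.
    return cutoff.strftime("%Y-%m-%d %H:%M:%S+0000")


def login_query(now: datetime) -> str:
    """Build the 'all logins in the past day' query as text (illustration only)."""
    return (
        "SELECT * FROM events_by_type "
        f"WHERE event_type = 'login' AND timestamp > '{past_day_cutoff(now)}';"
    )


print(login_query(datetime.now(timezone.utc)))
```

In application code you would normally bind the cutoff as a parameter of a prepared statement rather than formatting it into the query string.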

Related

How to find last and first entry in cassandra (date is part of partition key)

Is it possible to find the first and last entry in a Cassandra table if my partition key includes a text date, which I added to avoid large partitions?
CREATE TABLE trades (
stockexchange text,
symbol text,
ts timestamp,
date text,
tid text,
price decimal,
side text,
size decimal,
PRIMARY KEY ((stockexchange, symbol, date), ts, tid)
) WITH CLUSTERING ORDER BY (ts ASC, tid ASC);
One solution is to create a second table that stores only stockexchange, symbol, and timestamp (with tid, as in the table below).
This lets you find the first and last timestamp for your key (stockexchange, symbol).
Note that you have to write to both tables at the same time, and Cassandra is not an ACID database, so keeping the two tables consistent is up to your application.
CREATE TABLE trades (
stockexchange text,
symbol text,
ts timestamp,
date text,
tid text,
price decimal,
side text,
size decimal,
PRIMARY KEY ((stockexchange, symbol, date), ts, tid)
) WITH CLUSTERING ORDER BY (ts ASC, tid ASC);
CREATE TABLE trades_timestampts (
stockexchange text,
symbol text,
tid text,
ts timestamp,
PRIMARY KEY ((stockexchange, symbol), ts, tid)) WITH CLUSTERING ORDER BY (ts asc, tid asc);
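To make the two-step lookup concrete, here is a rough sketch: read the first or last timestamp from trades_timestampts, then derive the date value needed to address the matching partition in trades. The format of the date column is not given in the question, so the sketch assumes it holds the UTC calendar day as YYYY-MM-DD; the query strings and the date_bucket helper are illustrative, not tested against a cluster.

```python
from datetime import datetime, timezone

# Illustrative queries against the lookup table from the answer above.
FIRST_TS_CQL = (
    "SELECT ts, tid FROM trades_timestampts "
    "WHERE stockexchange = ? AND symbol = ? ORDER BY ts ASC LIMIT 1"
)
LAST_TS_CQL = (
    "SELECT ts, tid FROM trades_timestampts "
    "WHERE stockexchange = ? AND symbol = ? ORDER BY ts DESC LIMIT 1"
)

# Once ts and tid are known, the full row can be read from the main table.
TRADE_CQL = (
    "SELECT * FROM trades "
    "WHERE stockexchange = ? AND symbol = ? AND date = ? AND ts = ? AND tid = ?"
)


def date_bucket(ts: datetime) -> str:
    """Derive the `date` partition component from a trade timestamp.

    Assumption: `date` stores the UTC day as 'YYYY-MM-DD'; `ts` is tz-aware.
    """
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


print(date_bucket(datetime(2017, 8, 4, 14, 28, 25, tzinfo=timezone.utc)))
```

Whatever format the date column really uses, the same function must be applied on the write path and the read path, otherwise the second query will address the wrong partition.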

Cassandra - Schema for chat application

I am working on a chat application schema on Cassandra and would like to get some advice on how I can improve it further.
Here are my queries:
get user's room by user id ordered by last reply time
get messages by room id order by timestamp
get participants by room id
Here are my tables:
CREATE TABLE users(
user_id bigint,
nickname text,
email text,
PRIMARY KEY(user_id)
);
CREATE TABLE messages(
message_id timeuuid,
room_id timeuuid,
author_id bigint,
time_bucket int,
content text,
PRIMARY KEY((room_id, time_bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
CREATE TABLE rooms(
room_id timeuuid,
room_name text,
status text,
creator_id bigint,
PRIMARY KEY(room_id)
);
CREATE TABLE room_users(
room_id timeuuid,
user_id bigint,
last_reply_time timestamp,
PRIMARY KEY((room_id), user_id)
);
CREATE MATERIALIZED VIEW room_users_by_user_id AS
SELECT *
FROM room_users
WHERE room_id IS NOT NULL
AND user_id IS NOT NULL
AND last_reply_time IS NOT NULL
PRIMARY KEY ((user_id), last_reply_time, room_id)
WITH CLUSTERING ORDER BY (last_reply_time DESC);
I can get user's room by user id ordered by last reply time like so:
SELECT * FROM room_users_by_user_id WHERE user_id = 1;
I can get messages by room id like so:
SELECT * FROM messages WHERE room_id = 1;
I can get participants by room id like so:
SELECT * FROM room_users WHERE room_id = 1;
One of the downsides of this design is that when there is a new message for room 1, I first have to get the list of user_id values from room_users and then update last_reply_time for each row using an IN statement.
If there are 100 users in the room, I will have to update 100 rows for each new message in the room. I understand that writes in Cassandra are exceptionally fast, but is there a more efficient way to achieve the same result?
Thanks!

Cassandra data modeling for range queries using timestamp

I need to create a table with 4 columns:
timestamp BIGINT
name VARCHAR
value VARCHAR
value2 VARCHAR
I have 3 required queries:
SELECT *
FROM table
WHERE timestamp > xxx
AND timestamp < xxx;
SELECT *
FROM table
WHERE name = 'xxx';
SELECT *
FROM table
WHERE name = 'xxx'
AND timestamp > xxx
AND timestamp < xxx;
The result needs to be sorted by timestamp.
When I use:
CREATE TABLE table (
timestamp BIGINT,
name VARCHAR,
value VARCHAR,
value2 VARCHAR,
PRIMARY KEY (timestamp)
);
the result is never sorted.
When I use:
CREATE TABLE table (
timestamp BIGINT,
name VARCHAR,
value VARCHAR,
value2 VARCHAR,
PRIMARY KEY (name, timestamp)
);
the result is sorted by name > timestamp which is wrong.
name | timestamp
------------------------
a | 20170804142825729
a | 20170804142655569
a | 20170804142650546
a | 20170804142645516
a | 20170804142640515
a | 20170804142620454
b | 20170804143446311
b | 20170804143431287
b | 20170804143421277
b | 20170804142920802
b | 20170804142910787
How do I do this using Cassandra?
Cassandra orders data by clustering key, grouped by partition key.
In your case, the first table has only a partition key (timestamp) and no clustering key, so the data will not be sorted.
For the second table, the partition key is name and the clustering key is timestamp, so your data is sorted by timestamp within each name. That is, rows are first grouped by name, and then each group is sorted separately by timestamp.
Edit: so you need to add a partition key like the one below:
CREATE TABLE table (
year BIGINT,
month BIGINT,
timestamp BIGINT,
name VARCHAR,
value VARCHAR,
value2 VARCHAR,
PRIMARY KEY ((year, month), timestamp)
);
Here (year, month) is the composite partition key. You have to derive the year and month from the timestamp when inserting. Your data will then be sorted by timestamp within each (year, month) partition.
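Deriving the partition key is simple arithmetic if the BIGINT timestamps follow the yyyyMMddHHmmssSSS layout that the sample output in the question suggests (an assumption; the question never states the format). The sketch below also lists the partitions a time range touches, since a range that crosses a month boundary needs one query per (year, month) partition. Both function names are hypothetical.

```python
from typing import List, Tuple


def partition_key(ts: int) -> Tuple[int, int]:
    """Split a yyyyMMddHHmmssSSS bigint into the (year, month) partition key."""
    year = ts // 10**13          # drop MMddHHmmssSSS (13 digits)
    month = ts // 10**11 % 100   # drop ddHHmmssSSS (11 digits), keep MM
    return year, month


def partitions_for_range(start: int, end: int) -> List[Tuple[int, int]]:
    """List every (year, month) partition between two timestamps, inclusive."""
    year, month = partition_key(start)
    last = partition_key(end)
    result = []
    while (year, month) <= last:
        result.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result


print(partition_key(20170804142825729))
print(partitions_for_range(20171104000000000, 20180204000000000))
```

Each returned pair would be used in its own query of the form WHERE year = ? AND month = ? AND timestamp > ? AND timestamp < ?, with the per-partition results concatenated in order.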

Am I using cassandra efficiently?

I have these tables:
CREATE TABLE user_info (
userId uuid PRIMARY KEY,
userName varchar,
fullName varchar,
sex varchar,
bizzCateg varchar,
userType varchar,
about text,
joined bigint,
contact text,
job set<text>,
blocked boolean,
emails set<text>,
websites set<text>,
professionTag set<text>,
location frozen<location>
);
create table publishMsg
(
rowKey uuid,
msgId timeuuid,
postedById uuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
esIndx boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
create table publishMsg_by_user
(
rowKey uuid,
msgId timeuuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
CREATE TABLE followers
(
rowKey UUID,
followedBy uuid,
time bigint,
PRIMARY KEY(rowKey, orderKey)
);
I am doing 3 INSERT statements in a BATCH to put data into the publishMsg, publishMsg_by_user, and followers tables.
To show a single message, I have to run three SELECT queries on different tables:
publishMsg - to get the details of a published message, given rowKey and msgId.
user_info - to get fullName based on postedById
followers - to know whether postedById is following a given topic or not
Is this a suitable way of using Cassandra? Will it be efficient, given that in this scenario the data can't fit in a single table?
Sorry to ask this in an answer but I don't have the rep to comment.
Ignoring the tables for now, what information does your application need to ask for? Ideally in Cassandra, you will only have to execute one query on one table to get the data you need to return to the client. You shouldn't need to have to execute 3 queries to get what you want.
Also, your followers table appears to be missing the orderkey field.

getting result in the same order it was added in the table

I have this table
CREATE TABLE tag_by_user (
userId uuid,
tagId uuid,
colId timeuuid,
tagLabel text,
PRIMARY KEY (userId, tagId,colId)
);
here is my data
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,b0b328fa-0f96-11e5-a6c0-1697f925ec7b,now(),'html');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,b0b330d4-0f96-11e5-a6c0-1697f925ec7b,now(),'java');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f22450-0f96-11e5-a6c0-1697f925ec7b,now(),'javascript');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f226b2-0f96-11e5-a6c0-1697f925ec7b,now(),'scala pro');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f22ab8-0f96-11e5-a6c0-1697f925ec7b,now(),'c++');
Now I want to get the tags of a given user in the same order they were added to the row (i.e. in ascending order of the time they were added, which here is colId):
cqlsh:ks_demo> select taglabel from tag_by_user where userid= 77c4d46c-0f96-11e5-a6c0-1697f925ec7b order by colid;
it gives this error
Bad Request: Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY
What changes will I have to make in the schema or in the query? (cqlsh 4.1.1 | Cassandra 2.0.8)
You need to leave only userId and colId in the PRIMARY KEY:
CREATE TABLE tag_by_user (
userId uuid,
colId timeuuid,
tagId uuid,
tagLabel text,
PRIMARY KEY (userId, colId)
);
And then use
SELECT * FROM tag_by_user WHERE userId={yourUserId}
to get the tags of a given user in ascending order of time.
If you need to avoid duplicate tags, you can create an index on tagId and use it to check whether a tag already exists for a given user, then handle it accordingly. Note, however, that you cannot modify colId once a row is inserted, because it is part of the primary key.
As the message suggests, to use order by, you should follow the same order as in PRIMARY KEY.
CREATE TABLE tag_by_user (
userid uuid,
colid timeuuid,
tagid uuid,
taglabel text,
PRIMARY KEY (userid, colid, tagid)
);
select taglabel from tag_by_user where userid = 4978f728-0f96-11e5-a6c0-1697f925ec7b order by colid;
taglabel
------------
html
java
javascript
scala pro
c++
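Why the order of clustering columns matters can be shown without a cluster. The sketch below is only a model: it treats a partition as a list of rows sorted by the clustering columns in their declared order, and uses made-up short strings and integers in place of the real uuid and timeuuid values.

```python
from typing import List, NamedTuple


class TagRow(NamedTuple):
    tag_id: str   # stand-in for the tagId uuid
    col_id: int   # stand-in for the colId timeuuid (larger = inserted later)
    label: str


# Rows in insertion order; tag ids are deliberately not in insertion order.
inserted = [
    TagRow("c0f2", 1, "html"),
    TagRow("b0b3", 2, "java"),
    TagRow("a111", 3, "c++"),
]


def read_partition(rows: List[TagRow], clustering: List[str]) -> List[str]:
    """Return labels in the order a partition with this clustering key stores them."""
    ordered = sorted(rows, key=lambda r: tuple(getattr(r, c) for c in clustering))
    return [r.label for r in ordered]


# PRIMARY KEY (userId, tagId, colId): rows come back ordered by tag id.
print(read_partition(inserted, ["tag_id", "col_id"]))
# PRIMARY KEY (userId, colId, tagId): rows come back in insertion order.
print(read_partition(inserted, ["col_id", "tag_id"]))
```

With colId as the first clustering column the stored order already matches insertion order, which is why both answers move it directly after userId.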

Resources