Cassandra Schema for Reddit Posts,Top posts,new posts - cassandra

I am new to Cassandra and trying to implement Reddit mock with limited functionalities. I am not considering subreddits and comments as of now. There is a single home page that displays 'Top' posts and 'New' posts. By clicking any post I can navigate into the post.
1)Is this a correct schema design?
2)If I want to show all-time top posts how can that be achieved?
Table for Post Details
CREATE TABLE main.post (
user_id text,
post_id text,
timeuuid timeuuid,
downvoted_user_id list<text>,
img_ids list<text>,
islocked boolean,
isnsfw boolean,
post_date date,
score int,
upvoted_user_id list<text>,
PRIMARY KEY ((user_id, post_id), timeuuid)
) WITH CLUSTERING ORDER BY (timeuuid DESC)
Table for Top & New Posts
CREATE TABLE main.posts_by_year (
post_year text,
timeuuid timeuuid,
score int,
img_ids list<text>,
islocked boolean,
isnsfw boolean,
post_date date,
post_id text,
user_id text,
PRIMARY KEY (post_year, timeuuid, score)
) WITH CLUSTERING ORDER BY (timeuuid DESC, score DESC)

Related

cassandra how would the query look like?

I have seen this data model:
CREATE TABLE IF NOT EXISTS social_media.posts_by_user (
user_id uuid,
post_id uuid,
message_text text,
created_on timestamp,
deleted boolean,
user_full_name text,
PRIMARY KEY ((user_id, created_on))
);
CREATE TABLE IF NOT EXISTS social_media.user_timeline (
follower_id uuid,
post_id uuid,
user_id uuid,
location_name text,
user_full_name text,
created_on timestamp,
PRIMARY KEY ((user_id, created_on))
);
CREATE TABLE IF NOT EXISTS social_media.post_counts (
likes_count counter,
view_count counter,
comments_count counter,
post_id uuid,
PRIMARY KEY (post_id)
);
My Question is now:
If I want to show a post with likes. How I query it ? I cant join the post_counts table so how I do it ? It should be in the posts_by_user query or I am wrong ?
Output as User Interface:
--username
--profilimage
--likes
--follow-user

Cassandra upsert not working

My cqlsh version is 3.3.1, and I am using datastax.cassandra java driver for connecting. Following is my Cassandra schema.
url text PRIMARY KEY,
ae_rank int,
age int,
brands list<text>,
cities list<text>,
content text,
countries list<text>,
created timestamp,
customer_id text,
datasource list<text>,
diseases list<frozen<disease>>,
drugs list<text>,
gender int,
host text,
lang text,
meddracoding list<text>,
molecules list<text>,
owners list<text>,
page_rank int,
pj_terms list<frozen<pj_term>>,
projects list<text>,
quintilesims_id int,
reviewed int,
sentiment int,
social_tags list<text>,
switchovers list<frozen<switchover>>,
taxonomies list<frozen<taxonomy>>,
therapy_areas list<text>,
title text,
total_count bigint,
ts timestamp,
type text,
updated timestamp,
viralities map<text, bigint>
When there's an update (or insert) statement issue with same primary key value, the relevant row values does not get updated. Here we use an URL as primary key value (ex. http://example.com/example/1234test)
Is this an issue with the versions of Cassandra or the java driver?
Please help get this resolved.

Am I using cassandra efficiently?

I have these table
CREATE TABLE user_info (
userId uuid PRIMARY KEY,
userName varchar,
fullName varchar,
sex varchar,
bizzCateg varchar,
userType varchar,
about text,
joined bigint,
contact text,
job set<text>,
blocked boolean,
emails set<text>,
websites set<text>,
professionTag set<text>,
location frozen<location>
);
create table publishMsg
(
rowKey uuid,
msgId timeuuid,
postedById uuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
esIndx boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
create table publishMsg_by_user
(
rowKey uuid,
msgId timeuuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
CREATE TABLE followers
(
rowKey UUID,
followedBy uuid,
time bigint,
PRIMARY KEY(rowKey, orderKey)
);
I doing 3 INSERT statement in BATCH to put data in publishMsg publishMsg_by_user followers table.
To show a single message I have to query three SELECT query on different table:
publishMsg - to get a publish message details where rowkey & msgId given.
userInfo - to get fullName based on postedById
followers - to know whether a postedById is following a given topic or not
Is this a fit way of using cassandra ? will that be efficient because the given scanerio data can't fit in single table.
Sorry to ask this in an answer but I don't have the rep to comment.
Ignoring the tables for now, what information does your application need to ask for? Ideally in Cassandra, you will only have to execute one query on one table to get the data you need to return to the client. You shouldn't need to have to execute 3 queries to get what you want.
Also, your followers table appears to be missing the orderkey field.

How to design the cassandra table for one query with a ordering and limit?

Now I created a table:
CREATE TABLE posts_by_user(
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id,post_id)
);
I want to select last 10 rows with operator IN for user_id and ordering by post_at field.
Also I read a good article:
http://planetcassandra.org/blog/the-in-operator-in-cassandra-cql/
I can nit use query: WHERE post_at = time AND user_id IN (1,2) because I need all notes, not for a concrete date.
How i can change my design schema? Thank you.
I change on:
CREATE TABLE posts_by_user (
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id, post_at)
) WITH CLUSTERING ORDER BY (post_at DESC);
Think it is a good...
How about using this approach: http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use-slice-partition.html

Non-EQ relation error Cassandra - how fix primary key?

I created a one table posts. When I make request SELECT:
return $this->db->query('SELECT * FROM "posts" WHERE "id" IN(:id) LIMIT '.$this->limit_per_page, ['id' => $id]);
I get error:
PRIMARY KEY column "id" cannot be restricted (preceding column
"post_at" is either not restricted or by a non-EQ relation)
My table dump is:
CREATE TABLE posts (
id uuid,
post_at timestamp,
user_id bigint,
name text,
category set<text>,
link varchar,
image set<varchar>,
video set<varchar>,
content map<text, text>,
private boolean,
PRIMARY KEY (user_id,post_at,id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
I read some article about PRIMARY AND CLUSTER KEYS, and understood, when there are some primary keys - I need use operator = with IN. In my case, i can not use a one PRIMARY KEY. What you advise me to change in table structure, that error will disappear?
My dummy table structure
CREATE TABLE posts (
id timeuuid,
post_at timestamp,
user_id bigint,
PRIMARY KEY (id,post_at,user_id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
And after inserting some dummy data
I ran query select * from posts where id in (timeuuid1,timeuuid2,timeuuid3);
I was using cassandra 2.0 with cql 3.0

Resources