I have seen this data model:
CREATE TABLE IF NOT EXISTS social_media.posts_by_user (
user_id uuid,
post_id uuid,
message_text text,
created_on timestamp,
deleted boolean,
user_full_name text,
PRIMARY KEY ((user_id, created_on))
);
CREATE TABLE IF NOT EXISTS social_media.user_timeline (
follower_id uuid,
post_id uuid,
user_id uuid,
location_name text,
user_full_name text,
created_on timestamp,
PRIMARY KEY ((user_id, created_on))
);
CREATE TABLE IF NOT EXISTS social_media.post_counts (
likes_count counter,
view_count counter,
comments_count counter,
post_id uuid,
PRIMARY KEY (post_id)
);
My question is now:
If I want to show a post together with its likes, how do I query it? I can't join the post_counts table, so how do I do it? Should the likes be part of the posts_by_user query, or am I wrong?
Output as User Interface:
--username
--profile image
--likes
--follow-user
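Since Cassandra has no joins, and counter columns cannot share a table with regular (non-key) columns, one possible approach is to read the posts and the counters in separate queries and merge them by post_id in the application. A minimal sketch, assuming the rows arrive as plain dicts; the function name attach_counts and the sample rows are hypothetical:

```python
from typing import Dict, Iterable, List
from uuid import UUID, uuid4


def attach_counts(posts: Iterable[dict], counts: Iterable[dict]) -> List[dict]:
    """Merge post rows with their counter rows by post_id (application-side join)."""
    counts_by_post: Dict[UUID, dict] = {row["post_id"]: row for row in counts}
    merged = []
    for post in posts:
        counter_row = counts_by_post.get(post["post_id"], {})
        merged.append({
            **post,
            # A post that was never liked has no counter row yet, so default to 0.
            "likes_count": counter_row.get("likes_count", 0),
        })
    return merged


# Hypothetical rows, shaped like the results of
#   SELECT ... FROM posts_by_user WHERE ...
#   SELECT ... FROM post_counts WHERE post_id IN (...)
post_a, post_b = uuid4(), uuid4()
posts = [
    {"post_id": post_a, "user_full_name": "Alice", "message_text": "hello"},
    {"post_id": post_b, "user_full_name": "Alice", "message_text": "again"},
]
counts = [{"post_id": post_a, "likes_count": 3}]

timeline = attach_counts(posts, counts)
```

The second read is a lookup by partition key on post_counts, so it stays cheap as long as the number of posts shown per page is small.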
Related
I am new to Cassandra and am trying to implement a Reddit mock with limited functionality. I am not considering subreddits and comments for now. There is a single home page that displays 'Top' posts and 'New' posts. By clicking on any post I can navigate into that post.
1) Is this a correct schema design?
2) If I want to show all-time top posts, how can that be achieved?
Table for Post Details
CREATE TABLE main.post (
user_id text,
post_id text,
timeuuid timeuuid,
downvoted_user_id list<text>,
img_ids list<text>,
islocked boolean,
isnsfw boolean,
post_date date,
score int,
upvoted_user_id list<text>,
PRIMARY KEY ((user_id, post_id), timeuuid)
) WITH CLUSTERING ORDER BY (timeuuid DESC)
Table for Top & New Posts
CREATE TABLE main.posts_by_year (
post_year text,
timeuuid timeuuid,
score int,
img_ids list<text>,
islocked boolean,
isnsfw boolean,
post_date date,
post_id text,
user_id text,
PRIMARY KEY (post_year, timeuuid, score)
) WITH CLUSTERING ORDER BY (timeuuid DESC, score DESC)
I have these tables:
CREATE TABLE user_info (
userId uuid PRIMARY KEY,
userName varchar,
fullName varchar,
sex varchar,
bizzCateg varchar,
userType varchar,
about text,
joined bigint,
contact text,
job set<text>,
blocked boolean,
emails set<text>,
websites set<text>,
professionTag set<text>,
location frozen<location>
);
create table publishMsg
(
rowKey uuid,
msgId timeuuid,
postedById uuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
esIndx boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
create table publishMsg_by_user
(
rowKey uuid,
msgId timeuuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
CREATE TABLE followers
(
rowKey UUID,
followedBy uuid,
time bigint,
PRIMARY KEY(rowKey, orderKey)
);
I am doing 3 INSERT statements in a BATCH to put data into the publishMsg, publishMsg_by_user and followers tables.
To show a single message I have to run three SELECT queries on different tables:
publishMsg - to get the details of a published message, given rowKey and msgId
user_info - to get fullName based on postedById
followers - to know whether postedById is following a given topic or not
Is this a suitable way of using Cassandra? Will it be efficient, given that in this scenario the data can't fit in a single table?
Sorry to ask this in an answer but I don't have the rep to comment.
Ignoring the tables for now, what information does your application need to ask for? Ideally in Cassandra, you will only have to execute one query on one table to get the data you need to return to the client. You shouldn't need to have to execute 3 queries to get what you want.
Also, your followers table appears to be missing the orderkey field.
I have this table
CREATE TABLE tag_by_user (
userId uuid,
tagId uuid,
colId timeuuid,
tagLabel text,
PRIMARY KEY (userId, tagId,colId)
);
Here is my data:
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,b0b328fa-0f96-11e5-a6c0-1697f925ec7b,now(),'html');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,b0b330d4-0f96-11e5-a6c0-1697f925ec7b,now(),'java');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f22450-0f96-11e5-a6c0-1697f925ec7b,now(),'javascript');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f226b2-0f96-11e5-a6c0-1697f925ec7b,now(),'scala pro');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f22ab8-0f96-11e5-a6c0-1697f925ec7b,now(),'c++');
Now I want to get the tags of a given user in the same order in which they were added to the row (i.e. in ascending order of the time they were added, which here is colId):
cqlsh:ks_demo> select taglabel from tag_by_user where userid= 77c4d46c-0f96-11e5-a6c0-1697f925ec7b order by colid;
It gives this error:
Bad Request: Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY
What changes do I have to make in the schema or in the query? I am using cqlsh 4.1.1 | Cassandra 2.0.8.
You need to leave only userId and colId in the PRIMARY KEY:
CREATE TABLE tag_by_user (
userId uuid,
colId timeuuid,
tagId uuid,
tagLabel text,
PRIMARY KEY (userId, colId)
);
And then use
SELECT * FROM tag_by_user WHERE userId={yourUserId}
to get the tags of a given user in ascending order of time.
If you need to avoid duplicate tags, you can create an index on tagId and use it to find out whether a tag already exists for a given user before inserting it. Note, though, that you cannot modify colId once the row is inserted, because it is part of the primary key.
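The ordering relies on colId being a timeuuid clustering column, which Cassandra sorts by the timestamp embedded in the UUID. A small client-side sketch of that ordering, assuming version-1 UUIDs generated in the same way now() would generate them for successive inserts; the node and clock_seq values are arbitrary illustrative choices:

```python
import uuid

labels = ["html", "java", "javascript", "scala pro", "c++"]

# One version-1 (time-based) UUID per insert, in insertion order.
# node and clock_seq are fixed, illustrative values.
rows = [
    {"colid": uuid.uuid1(node=0x1697F925EC7B, clock_seq=0x26C0), "taglabel": label}
    for label in labels
]

# Start from the reverse order, then sort by the embedded 60-bit timestamp.
# Sorting the UUID objects directly would compare their raw integer value,
# which is not time order because the low timestamp bits come first.
by_time = sorted(reversed(rows), key=lambda row: row["colid"].time)
ordered = [row["taglabel"] for row in by_time]
```

Here ordered ends up in the same sequence as labels, which mirrors what a clustering column of type timeuuid gives you within one partition.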
As the message suggests, to use ORDER BY you have to follow the column order declared in the PRIMARY KEY.
CREATE TABLE tag_by_user (
userid uuid,
colid timeuuid,
tagid uuid,
taglabel text,
PRIMARY KEY (userid, colid, tagid)
);
select taglabel from tag_by_user where userid = 4978f728-0f96-11e5-a6c0-1697f925ec7b order by colid;
taglabel
------------
html
java
javascript
scala pro
c++
I have created this table:
CREATE TABLE posts_by_user(
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id,post_id)
);
I want to select the last 10 rows, using the IN operator on user_id and ordering by the post_at field.
I also read a good article:
http://planetcassandra.org/blog/the-in-operator-in-cassandra-cql/
I cannot use a query such as WHERE post_at = time AND user_id IN (1,2), because I need all rows, not just those for a specific date.
How can I change my schema design? Thank you.
I changed it to:
CREATE TABLE posts_by_user (
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id, post_at)
) WITH CLUSTERING ORDER BY (post_at DESC);
I think this is good...
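With post_at as the clustering column in DESC order, each partition already returns its newest rows first. One possible way to get the last 10 rows across several user_id values, sketched here under the assumption that the application runs one query per user and merges the results itself, is a k-way merge. The function name latest_posts and the sample rows are hypothetical:

```python
import heapq
from datetime import datetime
from itertools import islice


def latest_posts(per_user_rows, limit=10):
    """Merge per-partition results (each sorted by post_at DESC) and keep the newest rows."""
    merged = heapq.merge(*per_user_rows, key=lambda row: row["post_at"], reverse=True)
    return list(islice(merged, limit))


# Hypothetical results of one query per user_id, e.g.
#   SELECT * FROM posts_by_user WHERE user_id = ? LIMIT 10
user_1 = [
    {"user_id": 1, "post_at": datetime(2015, 6, 3)},
    {"user_id": 1, "post_at": datetime(2015, 6, 1)},
]
user_2 = [
    {"user_id": 2, "post_at": datetime(2015, 6, 4)},
    {"user_id": 2, "post_at": datetime(2015, 6, 2)},
]

newest = latest_posts([user_1, user_2], limit=3)
```

Because every per-user query is itself limited to 10 rows, the merge never has to look at more than 10 rows per user to produce the overall top 10.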
How about using this approach: http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use-slice-partition.html
I created a single table, posts. When I run this SELECT request:
return $this->db->query('SELECT * FROM "posts" WHERE "id" IN(:id) LIMIT '.$this->limit_per_page, ['id' => $id]);
I get this error:
PRIMARY KEY column "id" cannot be restricted (preceding column
"post_at" is either not restricted or by a non-EQ relation)
My table dump is:
CREATE TABLE posts (
id uuid,
post_at timestamp,
user_id bigint,
name text,
category set<text>,
link varchar,
image set<varchar>,
video set<varchar>,
content map<text, text>,
private boolean,
PRIMARY KEY (user_id,post_at,id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
I read some articles about primary and clustering keys and understood that, when there are several primary key columns, I need to use the = operator together with IN. In my case, I cannot use a single-column PRIMARY KEY. What would you advise me to change in the table structure so that the error disappears?
My dummy table structure
CREATE TABLE posts (
id timeuuid,
post_at timestamp,
user_id bigint,
PRIMARY KEY (id,post_at,user_id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
After inserting some dummy data, I ran the query
select * from posts where id in (timeuuid1,timeuuid2,timeuuid3);
I was using Cassandra 2.0 with CQL 3.0.