Cassandra & Solr Join 2 Cores

I have two Cassandra tables with the same partition key:
CREATE TABLE users(
parent_id int,
user_id text,
PRIMARY KEY ((parent_id), user_id )
);
CREATE TABLE user_actions(
parent_id int,
user_id text,
type text,
created_at int,
data map<text, text>,
PRIMARY KEY((parent_id), user_id, created_at)
);
I want to find all the users who performed an action and belong to the same parent_id.
Right now I'm getting all the users, even those who did not perform an action. I'm querying it like this:
http://ADDRESS/solr/name.users/select?q=parent_id:1&fq={!join+fromIndex=name.user_actions}type:click
Thanks!

Your query has no 'from' and 'to' parameters to tell Solr which fields to join on, so your filter query should be something like:
fq={!join from=user_id fromIndex=name.user_actions to=user_id force=true}type:click
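
The filter above contains braces, '!' and spaces, so it needs to be URL-encoded when it is sent over HTTP. Below is a minimal Python sketch of building the full request URL. The helper name build_join_url is made up for illustration, and ADDRESS is the placeholder host from the question.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Join filter from the answer: match user_actions.user_id against users.user_id.
JOIN_FILTER = (
    "{!join from=user_id fromIndex=name.user_actions to=user_id force=true}"
    "type:click"
)


def build_join_url(base_url, query, join_filter):
    """Return a select URL with q and fq percent-encoded (hypothetical helper)."""
    # urlencode escapes the braces, '!' and spaces of the local-params syntax.
    return base_url + "?" + urlencode({"q": query, "fq": join_filter})


# ADDRESS is the placeholder host used in the question.
url = build_join_url(
    "http://ADDRESS/solr/name.users/select", "parent_id:1", JOIN_FILTER
)
print(url)

# Decoding the URL again gives back the exact filter string.
decoded = parse_qs(urlsplit(url).query)
print(decoded["fq"][0])
```

Building the query string with urlencode rather than by hand avoids the manual '+' escaping used in the question's URL.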

Related

Cassandra - Is there a way to update column value for entire table

I have Cassandra table:
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
user_name text,
PRIMARY KEY ((network_id, date), score, id))
WITH CLUSTERING ORDER BY (score DESC);
The query I need to satisfy is:
"Give me all users who belong to a specific network for a specific day, sorted by score."
The problem is that when a user changes their name (today) and I then run the query for some day in the past, my report shows the old version of the name.
Changing the user_name column to STATIC doesn't work because my table has to be partitioned by day.
Any ideas how to solve this?
Thank you.

Since you have denormalized user_name for faster access, whenever a user_name changes you have to update every copy of it.
To find those copies, you need to maintain another table:
CREATE TABLE network_by_user_id (
user_id int,
network_id int,
date date,
score float,
id uuid,
PRIMARY KEY (user_id, network_id, date, score, id)
);
Now, whenever a user updates their name, you select all of that user's records from the network_by_user_id table and, for each record, update user_name in the base table:
update test set user_name = 'New Name' where network_id = ? and date = ? and score = ? and id = ?
If the number of records per user grows quickly over time, the cost of updating user_name grows with it.
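
The select-then-update loop could look roughly like the Python sketch below. It assumes a session object that behaves like the DataStax Python driver's Session (execute(query, params) with %s placeholders, rows readable by attribute). FakeSession and rename_user are hypothetical names used only so the sketch runs without a cluster.

```python
from collections import namedtuple

Row = namedtuple("Row", "network_id date score id")


class FakeSession:
    """In-memory stand-in (hypothetical) so the sketch runs without a cluster."""

    def __init__(self, lookup_rows):
        self.lookup_rows = lookup_rows
        self.updates = []

    def execute(self, query, params=()):
        if query.startswith("SELECT"):
            return list(self.lookup_rows)
        self.updates.append(params)  # record the UPDATE parameters
        return []


def rename_user(session, user_id, new_name):
    """Rewrite user_name in every `test` row that belongs to user_id."""
    rows = session.execute(
        "SELECT network_id, date, score, id FROM network_by_user_id "
        "WHERE user_id = %s",
        (user_id,),
    )
    updated = 0
    for row in rows:
        # One UPDATE per denormalized copy, addressed by its full primary key.
        session.execute(
            "UPDATE test SET user_name = %s "
            "WHERE network_id = %s AND date = %s AND score = %s AND id = %s",
            (new_name, row.network_id, row.date, row.score, row.id),
        )
        updated += 1
    return updated


session = FakeSession(
    [Row(1, "2016-01-01", 9.5, "id-a"), Row(1, "2016-01-02", 7.0, "id-b")]
)
count = rename_user(session, 42, "New Name")
print(count, session.updates)
```

The number of UPDATE statements equals the number of rows the user has in network_by_user_id, which is where the growing cost comes from.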
Another approach is to normalize the base table, like this:
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
PRIMARY KEY ((network_id, date), score, id)
);
CREATE TABLE users (
user_id int,
user_name text,
PRIMARY KEY (user_id)
);
For each user_id found in the base table, you can query the users table with executeAsync to get the user_name.
Learn More about executeAsync
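
As a rough Python equivalent of the executeAsync idea: the DataStax Python driver exposes Session.execute_async, which returns a future whose result() blocks until the rows arrive. The sketch below assumes a session with that shape. FakeSession and fetch_user_names are hypothetical names, and the fake resolves its futures immediately so the code runs without a cluster.

```python
from collections import namedtuple
from concurrent.futures import Future

UserRow = namedtuple("UserRow", "user_name")


class FakeSession:
    """In-memory stand-in (hypothetical) so the sketch runs without a cluster."""

    def __init__(self, names_by_id):
        self.names_by_id = names_by_id

    def execute_async(self, query, params):
        future = Future()
        name = self.names_by_id.get(params[0])
        future.set_result([UserRow(name)] if name is not None else [])
        return future


def fetch_user_names(session, user_ids):
    """Look up user_name for each distinct user_id, issuing all reads up front."""
    unique_ids = list(dict.fromkeys(user_ids))  # de-duplicate, keep order
    # Fire every query before waiting on any of them.
    futures = {
        uid: session.execute_async(
            "SELECT user_name FROM users WHERE user_id = %s", (uid,)
        )
        for uid in unique_ids
    }
    names = {}
    for uid, future in futures.items():
        rows = list(future.result())  # blocks until this query has finished
        names[uid] = rows[0].user_name if rows else None
    return names


session = FakeSession({1: "alice", 2: "bob"})
names = fetch_user_names(session, [1, 2, 1, 3])
print(names)
```

De-duplicating the ids first keeps the number of lookups at one per user rather than one per row of the report.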

You can use the SELECT command if you want to get any data from your table.

Am I using cassandra efficiently?

I have these tables:
CREATE TABLE user_info (
userId uuid PRIMARY KEY,
userName varchar,
fullName varchar,
sex varchar,
bizzCateg varchar,
userType varchar,
about text,
joined bigint,
contact text,
job set<text>,
blocked boolean,
emails set<text>,
websites set<text>,
professionTag set<text>,
location frozen<location>
);
create table publishMsg
(
rowKey uuid,
msgId timeuuid,
postedById uuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
esIndx boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
create table publishMsg_by_user
(
rowKey uuid,
msgId timeuuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
CREATE TABLE followers
(
rowKey UUID,
followedBy uuid,
time bigint,
PRIMARY KEY(rowKey, orderKey)
);
I run 3 INSERT statements in a BATCH to put data into the publishMsg, publishMsg_by_user and followers tables.
To show a single message I have to run three SELECT queries on different tables:
publishMsg - to get the details of a published message, given rowKey and msgId
user_info - to get fullName based on postedById
followers - to know whether postedById is following a given topic or not
Is this a suitable way of using Cassandra? Will it be efficient, given that in this scenario the data can't fit in a single table?

Sorry to ask this in an answer but I don't have the rep to comment.
Ignoring the tables for now, what information does your application need to ask for? Ideally in Cassandra, you only have to execute one query on one table to get the data you need to return to the client. You shouldn't have to execute 3 queries to get what you want.
Also, your followers table appears to be missing the orderKey field.

getting result in the same order it was added in the table

I have this table
CREATE TABLE tag_by_user (
userId uuid,
tagId uuid,
colId timeuuid,
tagLabel text,
PRIMARY KEY (userId, tagId,colId)
);
here is my data
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,b0b328fa-0f96-11e5-a6c0-1697f925ec7b,now(),'html');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,b0b330d4-0f96-11e5-a6c0-1697f925ec7b,now(),'java');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f22450-0f96-11e5-a6c0-1697f925ec7b,now(),'javascript');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f226b2-0f96-11e5-a6c0-1697f925ec7b,now(),'scala pro');
insert into tag_by_user(userId,tagId,colId,tagLabel) values(4978f728-0f96-11e5-a6c0-1697f925ec7b
,c0f22ab8-0f96-11e5-a6c0-1697f925ec7b,now(),'c++');
Now I want to get the tags of a given user in the same order they were added to the row (i.e. in ascending order of the time they were added, which here is colId):
cqlsh:ks_demo> select taglabel from tag_by_user where userid= 77c4d46c-0f96-11e5-a6c0-1697f925ec7b order by colid;
It gives this error:
Bad Request: Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY
What changes do I have to make in the schema or in the query? I'm using cqlsh 4.1.1 | Cassandra 2.0.8.

You need to leave only userId and colId in the PRIMARY KEY:
CREATE TABLE tag_by_user (
userId uuid,
colId timeuuid,
tagId uuid,
tagLabel text,
PRIMARY KEY (userId, colId)
);
And then use
SELECT * FROM tag_by_user WHERE userId={yourUserId}
to get the tags of a given user in ascending order of time.
If you need to avoid duplicate tags, then you can create an index on tagId and use it to find out if a tag already exists for a given user and process it. Note, however, that you cannot modify colId once the data is inserted, because it is part of the primary key.
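
To illustrate why the key order matters: rows inside a partition are kept sorted by the clustering columns in their declared order. The Python sketch below only simulates that sorting in memory. The tag ids are made up, and an integer counter stands in for the timeuuid.

```python
# (tag_id, col_id, tag_label); col_id is an insertion counter standing in
# for the timeuuid, and the tag ids are invented for this illustration.
rows = [
    ("tag-c", 1, "html"),
    ("tag-a", 2, "java"),
    ("tag-b", 3, "javascript"),
]


def read_partition(rows, clustering_key):
    """Return labels in the order a partition sorted by clustering_key yields."""
    return [label for _, _, label in sorted(rows, key=clustering_key)]


# PRIMARY KEY (userId, tagId, colId): tagId sorts first, insertion order is lost.
by_tag_then_time = read_partition(rows, lambda r: (r[0], r[1]))

# PRIMARY KEY (userId, colId): the time-based column alone decides the order.
by_time = read_partition(rows, lambda r: r[1])

print(by_tag_then_time)
print(by_time)
```

With colId as the first clustering column, the stored order already matches the insertion order, so the SELECT needs no ORDER BY clause.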

As the message suggests, to use order by, you should follow the same order as in PRIMARY KEY.
CREATE TABLE tag_by_user (
userid uuid,
colid timeuuid,
tagid uuid,
taglabel text,
PRIMARY KEY (userid, colid, tagid)
);
select taglabel from tag_by_user where userid = 4978f728-0f96-11e5-a6c0-1697f925ec7b order by colid;
taglabel
------------
html
java
javascript
scala pro
c++

How to design the cassandra table for one query with a ordering and limit?

I created a table:
CREATE TABLE posts_by_user(
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id,post_id)
);
I want to select the last 10 rows using the IN operator on user_id, ordered by the post_at field.
I also read a good article:
http://planetcassandra.org/blog/the-in-operator-in-cassandra-cql/
I cannot use the query WHERE post_at = time AND user_id IN (1,2) because I need all posts, not only those for a specific date.
How can I change my schema design? Thank you.
I changed it to:
CREATE TABLE posts_by_user (
user_id bigint,
post_id uuid,
post_at timestamp,
PRIMARY KEY (user_id, post_at)
) WITH CLUSTERING ORDER BY (post_at DESC);
I think it is a good...

How about using this approach: http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use-slice-partition.html
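
Separately from the linked page, one possible client-side approach with the revised table (clustered by post_at DESC) is to read up to 10 rows per user and merge the already-sorted results. The Python sketch below assumes each per-user result arrives newest first. latest_posts is a hypothetical helper, and integers stand in for the post_at timestamps.

```python
import heapq
from itertools import islice


def latest_posts(per_user_rows, limit=10):
    """Merge per-user lists (each newest first) into one newest-first list."""
    # reverse=True tells merge that every input is sorted largest to smallest.
    merged = heapq.merge(*per_user_rows, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical results of one "WHERE user_id = ? LIMIT 10" query per user;
# rows are (post_at, user_id) with post_at as an integer for brevity.
user_1 = [(50, 1), (30, 1), (10, 1)]
user_2 = [(40, 2), (20, 2)]

top = latest_posts([user_1, user_2], limit=3)
print(top)
```

Because each partition is read with a LIMIT, the client never merges more than 10 rows per user.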

Non-EQ relation error Cassandra - how fix primary key?

I created a table, posts. When I make this SELECT request:
return $this->db->query('SELECT * FROM "posts" WHERE "id" IN(:id) LIMIT '.$this->limit_per_page, ['id' => $id]);
I get this error:
PRIMARY KEY column "id" cannot be restricted (preceding column
"post_at" is either not restricted or by a non-EQ relation)
My table dump is:
CREATE TABLE posts (
id uuid,
post_at timestamp,
user_id bigint,
name text,
category set<text>,
link varchar,
image set<varchar>,
video set<varchar>,
content map<text, text>,
private boolean,
PRIMARY KEY (user_id,post_at,id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
I read some articles about primary and clustering keys and understood that when the primary key has several columns, I need to use the = operator together with IN. In my case, I cannot use a single-column primary key. What would you advise me to change in the table structure so that the error disappears?

My dummy table structure:
CREATE TABLE posts (
id timeuuid,
post_at timestamp,
user_id bigint,
PRIMARY KEY (id,post_at,user_id)
)
WITH CLUSTERING ORDER BY (post_at DESC);
After inserting some dummy data, I ran the query
select * from posts where id in (timeuuid1,timeuuid2,timeuuid3);
I was using Cassandra 2.0 with CQL 3.0.
