Ordering in Cassandra - cassandra

I've been researching this for some time and found that it is not uncommon for people to have problems ordering data in Cassandra, but I still can't figure out why my SELECT results are not ordered the way I expect.
So here is my table creation query:
CREATE TABLE library.query1 (
id int,
gender text,
surname text,
email text,
addinfo text,
endid int,
name text,
phone int,
PRIMARY KEY ((id), gender, surname, email)
) WITH CLUSTERING ORDER BY (gender DESC, surname DESC, email DESC);
As the clustering order implies, I want to order my data by gender, then surname, then email.
I then import the data via CSV, as I'm importing it from PostgreSQL tables. Here's the SELECT I'm using:
SELECT id, gender, name, surname, phone, email
FROM library.query1;
Is there something I'm forgetting in the query for the ordering to be done, or is my modeling wrong?

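Clustering order only sorts rows inside a single partition. With id as the partition key, every id is its own partition, and a SELECT without a WHERE clause returns partitions in token order, so the result does not look sorted. You could instead partition by gender, for example with one partition for male users. Then your ordering should work fine.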
CREATE TABLE library.query1 (
id int,
gender text,
surname text,
email text,
addinfo text,
endid int,
name text,
phone int,
PRIMARY KEY (gender, surname, email)
) WITH CLUSTERING ORDER BY (surname DESC, email DESC);
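With gender as the partition key, a query restricted to one gender reads a single partition, so the rows should come back in the declared clustering order. A minimal sketch against the table above; the value 'male' is only an illustrative assumption:

```cql
-- Reads one partition; rows are returned by surname DESC, then email DESC.
SELECT id, gender, name, surname, phone, email
FROM library.query1
WHERE gender = 'male';

-- The reverse order can also be requested; here both clustering columns are flipped together.
SELECT id, gender, name, surname, phone, email
FROM library.query1
WHERE gender = 'male'
ORDER BY surname ASC, email ASC;
```

Keep in mind that ORDER BY is only accepted when the partition key is restricted, which is why the original unrestricted SELECT cannot be sorted this way.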

Related

Ordering by username in Cassandra

Let's say I have this table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (username, created_at)
)
I want to order users by username, which is not possible because ordering is only available on clustering columns. How can I order my users by username?
I need to query users by username, which is why username is the partition key.
What is the right approach here?
If you absolutely must have the usernames sorted and return all of them in one query, then you will need to create another table for this purpose:
CREATE TABLE "users" (
field text,
value text,
PRIMARY KEY (field, value)
)
Unfortunately, this puts all the usernames in a single partition, but it is the only way to keep them sorted. On the other hand, you could extend the table to store other values that you need to retrieve in the same way. For instance, the partition field="username" would hold all the usernames, and another partition field="surname" could hold all the surnames, sorted.
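As a rough sketch of how that lookup table could be used (the sample usernames are made up for illustration):

```cql
-- Hypothetical sample rows; each username is a clustering value in the 'username' partition.
INSERT INTO "users" (field, value) VALUES ('username', 'carol');
INSERT INTO "users" (field, value) VALUES ('username', 'alice');
INSERT INTO "users" (field, value) VALUES ('username', 'bob');

-- Rows within one partition come back in clustering order (ascending by default).
SELECT value FROM "users" WHERE field = 'username';
```

Because value is a clustering column, the SELECT should return the names in ascending order regardless of the order in which they were inserted.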
Cassandra is NoSQL, so duplication of data can be expected.
Cassandra distributes rows across the cluster by hashing the partition key.
So when rows are returned, partitions come back in the order of their hash (token) values, not in the order of the key itself. Thus, you can't order on the partition key.
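One way to see this for yourself, assuming the original table from the question where username is the partition key, is to select the token next to the key:

```cql
-- token() returns the partitioner's hash of the partition key.
-- An unrestricted scan returns rows ordered by this token, not by username.
SELECT username, token(username) FROM "users";
```

The usernames will look shuffled, while the token column increases from row to row.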
Coming back to your question, I'm not sure about what kind of data it is and what kind of query you would want to run. Assuming multiple users per email I'd create the following table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (email, username)
)
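With that layout, a query restricted to one email reads a single partition, and the users in it come back sorted by username. A small sketch; the email address is a placeholder:

```cql
-- username is the clustering column, so rows are returned in ascending username order.
SELECT username, firstname, lastname
FROM "users"
WHERE email = 'someone@example.com';
```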

Cassandra - Is there a way to update column value for entire table

I have a Cassandra table:
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
user_name text,
PRIMARY KEY ((network_id, date), score, id))
WITH CLUSTERING ORDER BY (score DESC);
Query which I need to satisfy is:
"Give me all users who belong to a specific network for a specific day, sorted by score."
The problem is that when a user changes their name today and I then run the query for some day in the past, my report shows the old version of the name.
Changing the user_name column to STATIC doesn't work, because my table has to be partitioned by day.
Any ideas how to solve this?
Thank you.
Since you have denormalized user_name for faster access, whenever a user_name changes you have to update every copy of it.
To do that, you need to maintain another table:
CREATE TABLE network_by_user_id (
user_id int,
network_id int,
date date,
score float,
id uuid,
PRIMARY KEY (user_id, network_id, date, score, id)
);
Now, whenever a user updates their name, you have to select all of that user's records from the network_by_user_id table and, for each record, update user_name in the base table:
UPDATE test SET user_name = 'New Name' WHERE network_id = ? AND date = ? AND score = ? AND id = ?;
If the number of records per user grows quickly over time, the cost of updating user_name will grow just as quickly.
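Spelled out with concrete values, the two steps might look like the sketch below. The user id, date, score and UUID are invented for illustration; in practice the second statement would be run once per row returned by the first.

```cql
-- 1. Find every base-table key that belongs to the user.
SELECT network_id, date, score, id
FROM network_by_user_id
WHERE user_id = 42;

-- 2. For each returned row, rewrite the denormalized name in the base table.
UPDATE test
SET user_name = 'New Name'
WHERE network_id = 1
  AND date = '2017-01-15'
  AND score = 87.5
  AND id = 123e4567-e89b-12d3-a456-426655440000;
```

Since an UPDATE in Cassandra is an upsert, the key values have to be copied exactly from step 1; a mismatched key would create a new row instead of changing the existing one.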
Another approach is to normalize the base table, like this:
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
PRIMARY KEY ((network_id, date), score, id)
);
CREATE TABLE users (
user_id int,
user_name text,
PRIMARY KEY (user_id)
);
For each user_id found in the base table, you can then query the users table with executeAsync to get the current user_name.
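In CQL terms, the report would then take two kinds of reads, roughly as sketched here (the network id, date and user id are example values):

```cql
-- 1. Read the report rows for one network and day, highest score first.
SELECT score, id, user_id
FROM test
WHERE network_id = 1 AND date = '2017-01-15'
ORDER BY score DESC;

-- 2. Look up the current name for each user_id returned above.
SELECT user_name FROM users WHERE user_id = 42;
```

The second query is the one that would be issued asynchronously, once per user_id, so a renamed user always shows up under the latest name.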
You can use the SELECT command if you want to get any data from your table.

Cassandra & Solr Join 2 Cores

I have two Cassandra tables with the same partition key:
CREATE TABLE users(
parent_id int,
user_id text,
PRIMARY KEY ((parent_id), user_id )
);
CREATE TABLE user_actions(
parent_id int,
user_id text,
type text,
created_at int,
data map<text, text>,
PRIMARY KEY((parent_id), user_id, created_at)
);
I want to find all the users who performed an action and belong to the same parent_id.
Right now I'm getting all the users, even those who did not perform an action. I'm using it like this:
http://ADDRESS/solr/name.users/select?q=parent_id:1&fq={!join+fromIndex=name.user_actions}type:click
Thanks!
Your join has no 'from' and 'to' parameters to tell Solr which fields to join on, so your filter query should be something like:
fq={!join from=user_id fromIndex=name.user_actions to=user_id force=true}type:click

Search in user defined type with Apache Cassandra

In this example:
CREATE TYPE address (
street text,
city text,
zip_code int,
phones set<text>
);
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
addresses map<text, frozen<address>>
);
How can I query users with city = newyork, or find a user with a specific phone number?
This is not really a problem of querying a user-defined type: imagine that address would be a single text column and that addresses would contain a single address (ie. addresses TEXT); the problem would be the same.
Your users table is not meant to be queried by anything other than the primary key. In this case that is the partition key, a UUID, which makes it almost useless for lookups.
If you want to query the users by name I would denormalize (that implies some duplication) and make a users_by_name table:
CREATE TABLE users_by_name(
name TEXT,
id UUID,
addresses whatever,
PRIMARY KEY((name), id)
)
where the users are stored by name (they should be unique) and the results will be retrieved sorted by id (id is the clustering key part of the primary key).
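A lookup against such a table would then be a single-partition read. A minimal sketch; the name 'Alice' is just an example value:

```cql
-- Restricting the partition key (name) is what makes this query possible without ALLOW FILTERING.
SELECT name, id FROM users_by_name WHERE name = 'Alice';
```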
Same goes for query by addresses:
CREATE TABLE users_by_city(
city TEXT,
street TEXT,
name TEXT,
id UUID,
PRIMARY KEY((city), street)
)
You might think that this does not really solve your problem, but it looks like you designed your data model from a relational (SQL) point of view, which is not how Cassandra is meant to be used.

Am I using cassandra efficiently?

I have these tables:
CREATE TABLE user_info (
userId uuid PRIMARY KEY,
userName varchar,
fullName varchar,
sex varchar,
bizzCateg varchar,
userType varchar,
about text,
joined bigint,
contact text,
job set<text>,
blocked boolean,
emails set<text>,
websites set<text>,
professionTag set<text>,
location frozen<location>
);
create table publishMsg
(
rowKey uuid,
msgId timeuuid,
postedById uuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
esIndx boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
create table publishMsg_by_user
(
rowKey uuid,
msgId timeuuid,
title text,
time bigint,
details text,
tags set<text>,
location frozen<location>,
blocked boolean,
anonymous boolean,
hasPhotos boolean,
PRIMARY KEY(rowKey, msgId)
) with clustering order by (msgId desc);
CREATE TABLE followers
(
rowKey UUID,
followedBy uuid,
time bigint,
PRIMARY KEY(rowKey, orderKey)
);
I am doing three INSERT statements in a BATCH to put data into the publishMsg, publishMsg_by_user and followers tables.
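For reference, a batch of that kind might look like the sketch below. It covers only the two message tables, all UUIDs and values are invented, and it assumes rowKey in publishMsg_by_user is the posting user's id, which the question does not state.

```cql
BEGIN BATCH
  -- Same msgId literal in both tables so the two copies refer to the same message.
  INSERT INTO publishMsg (rowKey, msgId, postedById, title, time)
  VALUES (123e4567-e89b-12d3-a456-426655440000,
          50554d6e-29bb-11e5-b345-feff819cdc9f,
          9f1c2d3e-4b5a-4c6d-8e7f-0a1b2c3d4e5f,
          'Hello', 1437000000000);
  INSERT INTO publishMsg_by_user (rowKey, msgId, title, time)
  VALUES (9f1c2d3e-4b5a-4c6d-8e7f-0a1b2c3d4e5f,
          50554d6e-29bb-11e5-b345-feff819cdc9f,
          'Hello', 1437000000000);
APPLY BATCH;
```

A literal timeuuid is used rather than now(), since calling now() in each statement would produce a different msgId per table.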
To show a single message, I have to run three SELECT queries on different tables:
publishMsg - to get the published message details, given rowKey and msgId.
user_info - to get fullName based on postedById.
followers - to know whether postedById is following a given topic or not.
Is this a suitable way of using Cassandra? Will it be efficient, given that in this scenario the data can't fit in a single table?
Sorry to ask this in an answer but I don't have the rep to comment.
Ignoring the tables for now, what information does your application need to ask for? Ideally in Cassandra, you will only have to execute one query on one table to get the data you need to return to the client. You shouldn't have to execute three queries to get what you want.
Also, your followers table appears to be missing the orderkey field.
