Ordering by username in Cassandra - cassandra

Let's say I have this table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (username, created_at)
)
I want to order users by username, which is not possible because ordering only applies to clustering columns. How can I order my users by username?
I need to query users by username, which is why username is the partition key.
What is the right approach here?

If you absolutely must have the usernames sorted and returned in one query, you will need to create another table for that purpose:
CREATE TABLE "users" (
field text,
value text,
PRIMARY KEY (field, value)
)
Unfortunately, this will put all the usernames in just one partition, but it's the only way of keeping them sorted. On the other hand, you could expand the table to store other values that you need to retrieve in the same way. So, for instance, the partition field="username" would hold all the usernames, and you could create another partition field="surname" to hold all the surnames, also sorted.
Cassandra is NoSQL, so duplication of data can be expected.

Cassandra distributes rows across the cluster by hashing the partition key value.
So when data is returned, rows come back in the order of those hash values (tokens), not in the order of the key values themselves. Thus, you can't order by the partition key.
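To make the point about hash ordering concrete, here is a minimal Python sketch. It is only an illustration: the token function uses MD5 as a stand-in (Cassandra's default partitioner is Murmur3, which is not in the Python standard library), and the usernames are made-up sample data.

```python
import hashlib

def token(partition_key: str) -> int:
    """Stand-in for a partitioner: map a key to a signed 64-bit token.

    Assumption: MD5 is used here only for illustration; it is not the
    hash Cassandra uses by default.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Hypothetical sample data.
usernames = ["alice", "bob", "carol", "dave", "erin"]

# A scan over the whole table walks the token ring, so partitions
# come back ordered by token, not by the key's own value.
scan_order = sorted(usernames, key=token)

print("lexical order:", sorted(usernames))
print("token order:  ", scan_order)
```

The two printed lists contain the same names, but the token order has no relationship to the alphabetical one, which is why ORDER BY on a partition key is not available.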
Coming back to your question, I'm not sure about what kind of data it is and what kind of query you would want to run. Assuming multiple users per email I'd create the following table:
CREATE TABLE "users" (
username text,
created_at timeuuid,
email text,
firstname text,
groups list<text>,
is_active boolean,
lastname text,
"password" text,
roles list<text>,
PRIMARY KEY (email, username)
)

Related

How can I set Cassandra primary key?

I declared two columns as the primary key, expecting both to be unique, but when only one of them is different, new records keep being added.
The table schema has a compound primary key, i.e. it is composed of a partition key (username) and clustering key (email). This means that each partition has one or more rows of emails.
It is a completely different schema to a table with just a simple primary key (only has a partition key, no clustering key) like this:
CREATE TABLE users_by_username (
username text,
...
PRIMARY KEY (username)
)
This table would only ever have one row in each partition. Cheers!
[UPDATE] If you want your table to be partitioned by BOTH username + email, you need to create a new table which has a composite partition key (partition key has two or more columns):
CREATE TABLE users_by_username_email (
username text,
email text,
...
PRIMARY KEY ( (username, email) )
)
Note the difference: BOTH columns are enclosed in an extra pair of parentheses, so together they are treated as one partition key.
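The three key layouts discussed above can be compared with a small in-memory model. This is a rough Python sketch, not how Cassandra is implemented: a table is a dict of partitions, each partition is a dict of rows keyed by clustering key, and the helper names (upsert, row_count) and sample values are made up for illustration.

```python
def upsert(table, partition_key, clustering_key, row):
    """Write a row; an existing row with the same full primary key is replaced."""
    table.setdefault(partition_key, {})[clustering_key] = row

def row_count(table):
    """Total number of rows across all partitions."""
    return sum(len(rows) for rows in table.values())

# PRIMARY KEY (username, email): partition key = username, clustering key = email
compound = {}
upsert(compound, ("alice",), ("a@example.com",), {"v": 1})
upsert(compound, ("alice",), ("a@example.com",), {"v": 2})   # same key: overwrite
upsert(compound, ("alice",), ("a@example.org",), {"v": 3})   # new email: new row

# PRIMARY KEY (username): no clustering key, one row per partition
simple = {}
upsert(simple, ("alice",), (), {"email": "a@example.com"})
upsert(simple, ("alice",), (), {"email": "a@example.org"})   # overwrites the row

# PRIMARY KEY ((username, email)): both columns form the partition key
composite = {}
upsert(composite, ("alice", "a@example.com"), (), {"v": 1})
upsert(composite, ("alice", "a@example.org"), (), {"v": 2})  # separate partition

print(len(compound), row_count(compound))    # partitions, rows
print(len(simple), row_count(simple))
print(len(composite), row_count(composite))
```

In the compound case a different email under the same username adds a row to the same partition, which matches the "it keeps adding records" behaviour described in the question.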

Cassandra - Is there a way to update column value for entire table

I have Cassandra table:
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
user_name text,
PRIMARY KEY ((network_id, date), score, id))
WITH CLUSTERING ORDER BY (score DESC);
Query which I need to satisfy is:
"Give me all users which belongs to specific network for specific day sorted by score."
The problem is that when a user changes their name (today) and I then run the query for some day in the past, my report shows the old version of the name.
Changing column user_name to STATIC doesn't work because my table should be partitioned by day.
Any ideas how to solve this?
Thank You.
Since you have denormalized user_name for faster access, whenever a user_name changes you have to update every copy of it.
To find those copies, you need to maintain another table:
CREATE TABLE network_by_user_id (
user_id int,
network_id int,
date date,
score float,
id uuid,
PRIMARY KEY (user_id, network_id, date, score, id)
);
So now, whenever a user updates their name, you select all of that user's records from the network_by_user_id table and, for each record, update user_name in the base table:
update test set user_name = 'New Name' where network_id = ? and date = ? and score = ? and id = ?
If the number of records per user grows quickly over time, the cost of updating user_name grows just as quickly.
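The lookup-then-update flow described above can be sketched in Python with plain dicts standing in for the two tables. This is an illustration only: no driver is involved, the string ids stand in for uuid values, and the sample rows and the rename_user helper are made up.

```python
from datetime import date

# Base table "test": full primary key (network_id, date, score, id) -> row.
test = {
    (1, date(2024, 1, 1), 9.5, "id-a"): {"user_id": 7, "user_name": "Old Name"},
    (1, date(2024, 1, 2), 4.0, "id-b"): {"user_id": 7, "user_name": "Old Name"},
    (2, date(2024, 1, 1), 3.0, "id-c"): {"user_id": 8, "user_name": "Someone Else"},
}

# Lookup table "network_by_user_id": user_id -> primary keys in the base table.
network_by_user_id = {}
for key, row in test.items():
    network_by_user_id.setdefault(row["user_id"], []).append(key)

def rename_user(user_id, new_name):
    """Rewrite every denormalized copy of the name; return the number of writes."""
    keys = network_by_user_id.get(user_id, [])
    for key in keys:
        # Corresponds to one UPDATE ... WHERE network_id = ? AND date = ?
        # AND score = ? AND id = ? against the base table.
        test[key]["user_name"] = new_name
    return len(keys)

updated = rename_user(7, "New Name")
print("rows updated:", updated)
```

The number of writes equals the number of records the user has, which is the cost that grows over time.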
Another approach is to normalize the base table like below :
CREATE TABLE test (
network_id int,
date date,
score float,
id uuid,
user_id int,
PRIMARY KEY ((network_id, date), score, id)
);
CREATE TABLE users (
user_id int,
user_name text,
PRIMARY KEY (user_id)
);
For each user_id found in the base table, you can query the users table with executeAsync to get the user_name.
Learn more about executeAsync.
You can use a SELECT statement whenever you want to read data from your table.

Cassandra & Solr Join 2 Cores

I've 2 models for Cassandra with the same partition key:
CREATE TABLE users(
parent_id int,
user_id text,
PRIMARY KEY ((parent_id), user_id )
);
CREATE TABLE user_actions(
parent_id int,
user_id text,
type text,
created_at int,
data map<text, text>,
PRIMARY KEY((parent_id), user_id, created_at)
);
I want to find all the users who made an action and belong to the same parent_id.
Right now I'm getting all the users, even those who did not make an action. I'm querying it like this:
http://ADDRESS/solr/name.users/select?q=parent_id:1&fq={!join+fromIndex=name.user_actions}type:click
Thanks!
Your join has no 'from' and 'to' parameters to tell Solr which fields to join on, so your filter query should be something like:
fq={!join from=user_id fromIndex=name.user_actions to=user_id force=true}type:click

Search in user defined type with Apache Cassandra

In this example:
CREATE TYPE address (
street text,
city text,
zip_code int,
phones set<text>
);
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
-- map keys use the CQL type text; a UDT inside a collection must be frozen
addresses map<text, frozen<address>>
);
How can I query users with city = newyork, or find a user with a specific phone number?
This is not really a problem of querying a user-defined type: imagine that address would be a single text column and that addresses would contain a single address (ie. addresses TEXT); the problem would be the same.
Your user table is not meant to be query-able by anything else than the primary key, which in this case is the partition key, which is a UUID which makes it quasi useless.
If you want to query the users by name I would denormalize (that implies some duplication) and make a users_by_name table:
CREATE TABLE users_by_name(
name TEXT,
id UUID,
addresses whatever,
PRIMARY KEY((name), id)
)
where the users are stored by name (they should be unique) and the results will be retrieved sorted by id (id is the clustering key part of the primary key).
Same goes for query by addresses:
CREATE TABLE users_by_city(
city TEXT,
street TEXT,
name TEXT,
id UUID,
PRIMARY KEY((city), street)
)
You might think that this does not really solve your problem, but it looks like you designed your data model from a relational (SQL) point of view, and that is not how Cassandra is meant to be used.

ORDER BY reloaded, cassandra

I would like to sort a given column family, and to do this I am trying to create a table with the CLUSTERING ORDER BY option. I always encounter the following errors:
1.) Variant A resulting in
Bad Request: Missing CLUSTERING ORDER for column userid
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
2.) Variant B resulting in
Bad Request: Only clustering key columns can be defined in CLUSTERING ORDER directive
Statement:
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
As far as I can see in the manual this is the correct syntax for creating a table for which I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname". How could I achieve this? (The column 'lastname' I would like to keep as the first part of the primary key, so that I could use it in delete statements with the WHERE-clause.)
Thanks a lot, Tamas
You can only specify clustering order on your clustering keys.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc);
In your first example, your only clustering key is userID. Thus, it is the only valid entry for CLUSTERING ORDER BY.
PRIMARY KEY (lastname, userID)
)WITH CLUSTERING ORDER BY (lastname desc, userID asc);
The second example fails because you are specifying your partition key in CLUSTERING ORDER BY, and that's not going to work either.
Cassandra works by ordering CQL rows according to clustering keys, but only when a partition key is specified. This is because the whole idea of Cassandra wide-row modeling is to query by partition key, and read a series of ordered rows in one query operation.
I would like to run queries as "SELECT .... FROM user WHERE ... ORDER BY lastname".
Given this statement, I am going to suggest that you need another column in this model before it will work the way you want. What you need is an appropriate partition key for your users table. Say...like group. With your users partitioned by group, and clustered by lastname, your definition would look something like this:
CREATE TABLE test.usersbygroup (
userID timeuuid,
firstname varchar,
lastname varchar,
group text,
PRIMARY KEY (group,lastname)
)WITH CLUSTERING ORDER BY (lastname desc);
Then, this query will work, returning users (in this case) who are fans of the show "Firefly," ordered by lastname (descending):
SELECT * FROM usersbygroup WHERE group='Firefly Fans';
Read through this DataStax doc on Compound Keys and Clustering to get a better understanding.
NOTE: You don't need to specify ORDER BY in your SELECT. The rows will come back ordered by their clustering key(s), and ORDER BY cannot change that. All ORDER BY can really do, is alter the sort direction (DESCending vs. ASCending).
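The behaviour described in this answer (pick one partition, get its rows back in clustering order) can be mimicked with a short Python sketch. It is a simplified model, not Cassandra's storage engine: Cassandra keeps rows sorted on disk, while this sketch sorts at read time for brevity, and the sample names and helper functions are made up.

```python
# partition key (group) -> {clustering key (lastname) -> row}
usersbygroup = {}

def insert(group, lastname, firstname):
    """Upsert a row; the primary key is (group, lastname)."""
    usersbygroup.setdefault(group, {})[lastname] = {
        "lastname": lastname,
        "firstname": firstname,
    }

def select_by_group(group):
    """Rows of one partition, ordered as CLUSTERING ORDER BY (lastname DESC)."""
    rows = usersbygroup.get(group, {})
    return [rows[name] for name in sorted(rows, reverse=True)]

# Hypothetical sample data, inserted in no particular order.
insert("Firefly Fans", "Reynolds", "Malcolm")
insert("Firefly Fans", "Washburne", "Hoban")
insert("Firefly Fans", "Cobb", "Jayne")
insert("Other Group", "Adams", "Pat")

lastnames = [row["lastname"] for row in select_by_group("Firefly Fans")]
print(lastnames)
```

Only the rows of the requested partition are returned, and their order is fixed by the clustering key rather than by the order of insertion.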
Sorting is limited to the clustering columns within a partition. In your case the primary key is (lastname, userID), so Cassandra stores the rows of each lastname partition sorted by userID, which is why you need to supply the partition key for retrieval. It is still not a useful schema if you want the whole table sorted by last name: lastname is the partition key, and userID is unique (timeuuid), so the clustering key is of no use for that.
CREATE TABLE test.user (
userID timeuuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY (bucket)
)WITH CLUSTERING ORDER BY (lastname desc);
Here, if you give bucket the same value, say 1, for all user records, then all users go into the same bucket and all rows are retrieved sorted by last name. (This is by no means a good design; it is just to give you an idea.)
Revised:
CREATE TABLE user1 (
userID uuid,
firstname varchar,
lastname varchar,
bucket int,
PRIMARY KEY ((bucket), lastname,userID)
)WITH CLUSTERING ORDER BY (lastname desc);

Resources