Is it possible to query the most recent additions to a table/column family in cassandra? - cassandra

Assuming the following table, is it possible to easily query the most recent items added to the table?
create table messages(
person_uuid uuid,
uuid timeuuid,
message text);
The primary purpose of this table is to hold a list of messages sent to a particular user, but there is also a need to show an RSS feed of the most recent messages across all users, i.e., something like:
select person_uuid, message from messages
order by uuid
limit 30;

You need a compound primary key to be able to order by date. Note that ORDER BY only works within a single partition, so the query has to be restricted to one person_uuid.
CREATE TABLE messages(
person_uuid uuid,
date timeuuid,
message text,
PRIMARY KEY(person_uuid,date)
);
Then you can do
SELECT * FROM messages WHERE person_uuid=xxx ORDER BY date DESC LIMIT 20;

Related

Cassandra Order By Updated At

I'm trying to build a Cassandra schema to represent chat.
The one thing I can't seem to figure out is how to query the most recently updated rooms (similar to the list view in most chat apps).
Fields desired in list view ordered by updated_at desc
*room id
room title
room image
*user
*updated_at
*message entry
*message type
*metadata
Current Tables
Create TYPE user(
id uuid,
name text,
avatar text
);
CREATE TABLE rooms(
id uuid,
"name" text,
image text,
users set<frozen<user>>,
archived boolean,
created_at timestamp,
updated_at timestamp,
PRIMARY KEY(id)
);
CREATE TABLE messages(
room_id uuid,
message_id timeuuid,
user user,
message_type int,
entry text,
metadata map<text, text>,
PRIMARY KEY(room_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
CREATE TABLE rooms_by_user(
user_id uuid,
room_id uuid,
PRIMARY KEY(user_id, room_id)
);
Possible solutions that I can come up with:
1. Duplicate all room details to each message.
This allows an easy query with SELECT * FROM messages PER PARTITION LIMIT 1,
but it would be a lot of duplicate data per message...
2. Query the latest messages in the rooms the user belongs to in order to get the room ids, then query rooms.
This doesn't seem to be the Cassandra way?
Is there a better way to model my data?
Looking at the schema, it seems you need a relational database.
In Cassandra you usually use one table per query, which means you should design each table around the query it will serve.
Also, you can only query by partition key, or by partition key plus clustering column.
So in order to query by updated_at, you need to make that column a clustering column. Keep in mind that in Cassandra you cannot alter the primary key of an existing table, and you cannot update the value of a key column in place.
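To make the consequence of that last point concrete, here is a minimal in-memory sketch in Python. It does not talk to Cassandra. It only imitates a hypothetical table (the name rooms_by_user_updated and its layout are assumptions, not part of the question) with PRIMARY KEY (user_id, updated_at, room_id). Because updated_at would be part of the key, "updating" a room means deleting the old row and inserting a new one.

```python
from datetime import datetime

# In-memory stand-in for a hypothetical table with
#   PRIMARY KEY (user_id, updated_at, room_id)
# Layout: user_id -> {(updated_at, room_id): row}
rooms_by_user_updated = {}


def touch_room(user_id, room_id, old_updated_at, new_updated_at, title):
    """Move a room to its new position in one user's room list."""
    partition = rooms_by_user_updated.setdefault(user_id, {})
    # updated_at is a key column, so the old row is deleted, not updated.
    if old_updated_at is not None:
        partition.pop((old_updated_at, room_id), None)
    partition[(new_updated_at, room_id)] = {
        "room_id": room_id,
        "title": title,
        "updated_at": new_updated_at,
    }


def latest_rooms(user_id, limit=20):
    """Return the user's rooms, most recently updated first."""
    partition = rooms_by_user_updated.get(user_id, {})
    keys = sorted(partition, key=lambda k: k[0], reverse=True)
    return [partition[k] for k in keys[:limit]]
```

The caller has to know the previous updated_at value to delete the old row, which is one reason this pattern usually goes together with keeping the current value in the rooms table.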

Cassandra update previous rows after insert

I have this schema in cassandra:
create table if not exists
converstation_events(
timestamp timestamp,
sender_id bigint,
conversation_id bigint,
message_type varchar,
message text,
primary key ((conversation_id), sender_id, message_type, timestamp));
One of the message_type values is conversation_ended. Is there a way to denormalise the data so I can query the conversations that have already ended?
I've thought about having an extra field that gets updated by a trigger when a conversation_ended message hits the system. Does this make sense?
In Cassandra you need to model your data in a way that answers your questions. It's not like an RDBMS, where you create your model first and then write your queries. So think backwards...
When you do a query in Cassandra (for the most part...) you need to query by the partition key, and you can use your clustering key(s) to filter or select ranges.
Your converstation_events table will give you answers about a conversation, filtering by sender, type and time. Note: if you want to filter by time, you must include sender_id and message_type in the query.
But you want all conversations of a given type, so you'll need another table to answer this query. If you want all the conversations that are conversation_ended, you could create a second table to map message type to conversation, like:
create table conversation_by_message_type (
message_type varchar,
conversation_id bigint,
timestamp timestamp,
primary key ((message_type), timestamp, conversation_id));
On the client side you'll have to add a record to conversation_by_message_type any time you insert a converstation_events row with a message_type that you might want to look up. I have timestamp in this table so you can sort or filter by time, or by time and conversation_id.
To find all the ended conversations you could do queries like
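-- step 1: collect the ids of the ended conversations
select conversation_id from conversation_by_message_type where message_type = 'conversation_ended';
-- step 2: fetch their events, substituting the ids returned by step 1
select * from converstation_events where conversation_id IN (<ids>);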
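The client-side double write and the two-step lookup described above could be sketched like this. This is plain Python over in-memory dictionaries that stand in for the two tables, not driver code. The INDEXED_TYPES set and the function names are assumptions made for the illustration.

```python
from collections import defaultdict

# In-memory stand-ins for the two tables, keyed by partition key.
converstation_events = defaultdict(list)          # conversation_id -> rows
conversation_by_message_type = defaultdict(list)  # message_type -> (timestamp, conversation_id)

# Assumption: only these message types need to be looked up later.
INDEXED_TYPES = {"conversation_ended"}


def record_event(conversation_id, sender_id, message_type, timestamp, message):
    """Write the event, plus a lookup row when the type is indexed."""
    converstation_events[conversation_id].append({
        "sender_id": sender_id,
        "message_type": message_type,
        "timestamp": timestamp,
        "message": message,
    })
    if message_type in INDEXED_TYPES:
        conversation_by_message_type[message_type].append((timestamp, conversation_id))


def ended_conversations():
    """Step 1: find the ended ids. Step 2: fetch their events."""
    rows = conversation_by_message_type.get("conversation_ended", [])
    ids = sorted({conversation_id for _, conversation_id in rows})
    return {cid: converstation_events[cid] for cid in ids}
```

Against a real cluster the two writes are separate statements, so the client has to decide what to do if the second one fails (retry, or group both in a batch).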

Cassandra bookmark table data modeling

I want to model a table that supports the following:
I can fetch the bookmarked items in descending timestamp order
I can delete individual bookmarked item.
My table looks like this-
CREATE TABLE bookmarked_content(
user_id uuid,
type varchar,
timestamp timestamp,
item_id uuid,
primary key(user_id, type, timestamp)
) WITH CLUSTERING ORDER BY (type ASC, timestamp DESC);
Now this is fine for fetching all the bookmarks of a specific type in descending timestamp order, but the problem is that I can't delete a specific item from the table, and I don't want to depend on secondary indexes for this.
Thanks in advance

How to model inbox

How would I go about modelling the data if I have a web app for messaging and I expect the user to either see all the messages ordered by date, or see the messages exchanged with a specific contact, again ordered by date?
Should I have two tables, called "global_inbox" and "contacts_inbox" where I would add each message to both?
For example:
CREATE TABLE global_inbox(user_id int, timestamp timestamp,
message text, PRIMARY KEY(user_id, timestamp));
CREATE TABLE inbox(user_id int, contact_id int,
timestamp timestamp, message text,
PRIMARY KEY(user_id, contact_id, timestamp));
This means that every message should be copied 4 times, 2 for sender and 2 for receiver. Does it sound reasonable?
Yes, it's reasonable.
You need some modifications, though.
Inbox table: if a user has many contacts and every contact sends messages, then a huge amount of data will be inserted into a single partition (user_id). So add contact_id to the partition key.
Updated Schema :
CREATE TABLE inbox (
user_id int,
contact_id int,
timestamp timestamp,
message text,
PRIMARY KEY((user_id, contact_id), timestamp)
);
global_inbox: since it's the global inbox, a huge amount of data can be inserted into a single partition (user_id). So add more columns to the partition key to spread the data out.
Updated Schema :
CREATE TABLE global_inbox (
user_id int,
year int,
month int,
timestamp timestamp,
message text,
PRIMARY KEY((user_id,year,month), timestamp)
);
Here you can also add week to the partition key if a single partition still holds a huge amount of data. Or remove month from the partition key if you don't expect much data to be inserted in a year.
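With a (user_id, year, month) partition key, the client has to compute the bucket for every write and work out which buckets to read for a date range. A small sketch of that arithmetic in Python follows. The function names are made up for the illustration and nothing here touches Cassandra.

```python
from datetime import datetime


def inbox_partition(user_id, ts):
    """Partition key values for a message with timestamp ts."""
    return (user_id, ts.year, ts.month)


def partitions_between(user_id, start, end):
    """Buckets covering start..end, newest first, one per month."""
    buckets = []
    year, month = end.year, end.month
    while (year, month) >= (start.year, start.month):
        buckets.append((user_id, year, month))
        month -= 1
        if month == 0:
            # Step back from January to December of the previous year.
            year, month = year - 1, 12
    return buckets
```

To show the newest messages first, the client would query the buckets in the returned order, one partition at a time, and stop once it has enough rows.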
In terms of query performance, yes, it sounds good to me. Apache Cassandra is really built for this kind of data modeling. We build tables to satisfy queries. This process is called denormalization. You end up with duplicated data, but the main goal is to have fast queries.

Cassandra modeling with a read/unread status for a message inbox, CQL

I'm trying to find the best data model for a message box application. Messages should appear with the 'unread' ones first, and then, as the user scrolls, the 'read' ones. Within both categories I want to sort the messages by arrival time. Something like the priority inbox in Gmail.
The first schema I thought to use is :
CREATE TABLE inbox
(userId uuid,
messageId timeuuid,
data blob,
isRead boolean,
PRIMARY KEY(userId, isRead, messageId))
WITH CLUSTERING ORDER BY (isRead ASC, messageId DESC);
So my data is sorted first by the boolean field and then by time. Now I can easily go over my 'unread' messages first, and once they run out I start reading the 'read' messages.
The problem is that I can't update a message's status, since it's part of the primary key. I can do a delete and then an insert in a batch operation, and both hit the same row.
Another solution will be :
CREATE TABLE inbox
(userId uuid,
messageId timeuuid,
data blob,
isRead boolean,
PRIMARY KEY((userId, isRead), messageId))
WITH CLUSTERING ORDER BY (messageId DESC)
This gives a row for every status. I gain very easy access, but does that mean I have to deal with transactions? When a message is read I have to delete it from the 'unread' row and insert it into the 'read' row, and they might be in different partitions.
Another version of the primary key could be:
PRIMARY KEY(userId, messageId)
and then I would add a secondary index on isRead. My queries will always be for a specific user and not for a group of users.
Any ideas on what is better? Or any other modeling ideas?
You can create a table that references your messages by id, for example:
CREATE TABLE inbox
(inbox_id uuid,
userId uuid,
messageId timeuuid,
data blob,
isRead boolean,
PRIMARY KEY(inbox_id));
This table stores your data and is the one you run updates against.
Create other tables for searching, for example (the name inbox_by_status is only illustrative, since two tables cannot both be called inbox):
CREATE TABLE inbox_by_status
(inbox_id uuid,
userId uuid,
messageId timeuuid,
isRead boolean,
PRIMARY KEY((userId, isRead), messageId))
WITH CLUSTERING ORDER BY (messageId DESC);
Search for the desired records in this table, and apply updates to both tables.
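As a rough illustration of what "update in both tables" means for the search table, here is an in-memory Python sketch of a layout partitioned by (userId, isRead). It is only a model of the bookkeeping, with integers standing in for timeuuid values and function names invented for the example. In Cassandra itself, the delete from the unread partition and the insert into the read partition could be grouped in a logged batch, which ensures both are eventually applied but does not isolate readers from seeing an intermediate state.

```python
from collections import defaultdict

# Stand-in for the search table: (userId, isRead) -> {messageId: data}
inbox = defaultdict(dict)


def deliver(user_id, message_id, data):
    """New messages always land in the unread partition."""
    inbox[(user_id, False)][message_id] = data


def mark_read(user_id, message_id):
    """Move a message from the unread partition to the read one."""
    data = inbox[(user_id, False)].pop(message_id, None)
    if data is None:
        return False  # unknown message, or already marked as read
    inbox[(user_id, True)][message_id] = data
    return True


def priority_view(user_id):
    """Unread messages first, then read ones, each newest first."""
    unread = sorted(inbox[(user_id, False)], reverse=True)
    read = sorted(inbox[(user_id, True)], reverse=True)
    return [(m, False) for m in unread] + [(m, True) for m in read]
```

Reading the priority view takes two partition reads per user, one for each isRead value.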
