Cassandra - How to make table for friend accept or not - cassandra

I want to make table like this,
CREATE TABLE friendInvite (
    user text,
    invitee text,
    accepted boolean,
    PRIMARY KEY (invitee, user)
);
and expected queries are
1. SELECT * FROM friendInvite WHERE invitee = 'me' AND accepted = false;
2. UPDATE friendInvite SET accepted = true WHERE invitee = 'me' AND user = 'you';
I think query #1 will perform poorly because of the accepted condition. How can I handle this in Cassandra?
I can't see using a secondary index for accepted, because the value will be updated to true once the invitee accepts the offer. Is it OK if I use a secondary index on the accepted column?

I would create 2 tables.
CREATE TABLE friendInvites (
    user text,
    invitee text,
    PRIMARY KEY (invitee, user)
);
This table holds open friend requests and serves your query #1:
1. SELECT * FROM friendInvites WHERE invitee = 'me';
Then I would create a second table, where you store the accepted friend requests:
CREATE TABLE acceptedRequests (
    user text,
    invitee text,
    PRIMARY KEY (user, invitee)
);
When you accept a request, the entry has to be removed from friendInvites and inserted into acceptedRequests.
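The accept step could be sketched as a logged batch, which keeps the two mutations together (it guarantees eventual delivery of both writes, not isolation; the 'me'/'you' values are placeholders):

```cql
BEGIN BATCH
    DELETE FROM friendInvites WHERE invitee = 'me' AND user = 'you';
    INSERT INTO acceptedRequests (user, invitee) VALUES ('you', 'me');
APPLY BATCH;
```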

Maybe you can use a materialized view, which allows your first query to be executed without a performance penalty.
Your table:
CREATE TABLE friendInvite (
    user text,
    invitee text,
    accepted boolean,
    PRIMARY KEY (invitee, user)
);
The materialized view:
CREATE MATERIALIZED VIEW friendInviteByInviteeAndAccepted
AS SELECT *
FROM friendInvite
WHERE invitee IS NOT NULL
    AND user IS NOT NULL
    AND accepted IS NOT NULL
PRIMARY KEY ((invitee, accepted), user)
WITH CLUSTERING ORDER BY (user ASC);
You can perform updates on the base table friendInvite and read your data from the materialized view. When you update data in the table, Cassandra will automatically update the materialized view.
This feature is available in Cassandra >= 3.0.
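With the view in place, the original query #1 becomes a straight partition read. A sketch (note that both components of the view's composite partition key must be restricted):

```cql
SELECT * FROM friendInviteByInviteeAndAccepted
WHERE invitee = 'me' AND accepted = false;
```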

Related

In Cassandra does creating a tables with multiple columns take more space compared to multiple tables?

I have 6 tables in my database, each consisting of approximately 12-15 columns, and they are related by id to a main_table. I have to migrate my database to Cassandra, so my question is: should I create one main_table with many columns, or separate tables as in my MySQL database?
Will creating multiple columns take more space, or will multiple tables take more space?
Your line of questioning is flawed. It is a common mistake for DBAs who only have a background in traditional relational databases to view data as normalised tables.
When you switch to NoSQL, you are doing it because you are trying to solve a problem that traditional RDBMS can't. A paradigm shift is required since you can't just migrate relational tables the way they are, otherwise you're back to where you started.
The principal philosophy of data modelling in Cassandra is that you need to design a CQL table for each application query. It is a one-to-one mapping between app queries and CQL tables. The crucial point is you need to start with the app queries, not the tables.
Let us say that you have an application that stores information about users which include usernames, email addresses, first/last name, phone numbers, etc. If you have an app query like "get the email address for username X", it means that you need a table of email addresses and the schema would look something like:
CREATE TABLE emails_by_username (
    username text,
    email text,
    firstname text,
    lastname text,
    ...
    PRIMARY KEY (username)
);
You would then query this table with:
SELECT email FROM emails_by_username WHERE username = ?
Another example is where you have an app query like "get the first and last names for a user where email address is Y". You need a table of users partitioned by email instead:
CREATE TABLE users_by_email (
    email text,
    firstname text,
    lastname text,
    ...
    PRIMARY KEY (email)
);
You would query the table with:
SELECT firstname, lastname FROM users_by_email WHERE email = ?
Hopefully with these examples you can see that the disk space consumption is completely irrelevant. What is important is that you design your tables so they are optimised for the application queries. Cheers!

Delete from denormalized table by id?

I'm trying to delete records from Cassandra using only part of the primary key. This is because I'm trying to delete all tag entries for a given video, but I no longer have the original tags.
Schema
Original Killr Video Table
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name varchar,
    description varchar,
    location text,
    location_type int,
    preview_thumbnails map<text, text>,
    tags set<varchar>,
    metadata set<frozen<video_metadata>>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
Denormalized Video by tag
CREATE TABLE videos_by_tag (
    tag text,
    videoid uuid,
    added_date timestamp,
    name text,
    preview_image_location text,
    tagged_date timestamp,
    PRIMARY KEY (tag, videoid)
);
I tried
DELETE FROM videos_by_tag WHERE videoid='SOMEUUID';
but Cassandra complains that I'm missing the tag part of the PK. I know this, but how could I ever delete records from this sort of denormalized table if I no longer know the full PK?
Maybe you can use a materialized view.
In this case, you have a master table like:
CREATE TABLE videos (
    tag text,
    videoid uuid,
    added_date timestamp,
    name text,
    preview_image_location text,
    tagged_date timestamp,
    PRIMARY KEY (videoid)
);
You are able to delete with your delete statement (note that uuid literals are written without quotes; SOMEUUID is a placeholder):
DELETE FROM videos WHERE videoid = SOMEUUID;
If you want to read data with other criteria, create a materialized view:
CREATE MATERIALIZED VIEW videos_by_tag
AS SELECT *
FROM videos
WHERE videoid IS NOT NULL
    AND tag IS NOT NULL
PRIMARY KEY (tag, videoid);
In this case, if you insert, update, or delete data in the master table (videos), the associated materialized view will be updated too. So write operations are performed on one table, and reads can be served from many views.
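With such a view in place, the by-tag read from the original denormalized schema is still available. A sketch (view name assumed to be videos_by_tag):

```cql
SELECT videoid, name, added_date
FROM videos_by_tag
WHERE tag = 'cassandra';
```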
When I first created that schema, I was thinking in terms of managing records via application logic. By that, using application workflows to manage the data stored in the database. When you upload a new video, the videos table will get a record and so will videos_by_user and videos_by_tag. When updating, again all three may need to be updated.
In the case of delete, I would expect the application to present the user with a "Delete video?" option. Once that action is taken, the video_id is known and you can use it to delete from any index table. That's following an application-workflow data model. Same with tags: if you delete a tag, update videos and videos_by_tag, preferably with a batch statement to ensure both updates happen.
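Against the asker's original schema, that batch might look like this (a sketch; the tag and videoid values are placeholders):

```cql
BEGIN BATCH
    -- remove the tag from the set column on the main table
    UPDATE videos SET tags = tags - {'cats'}
        WHERE videoid = 245e8024-14bd-4699-a6a9-82a287deb979;
    -- remove the matching row from the denormalized index table
    DELETE FROM videos_by_tag
        WHERE tag = 'cats' AND videoid = 245e8024-14bd-4699-a6a9-82a287deb979;
APPLY BATCH;
```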
If you have orphan records of previously deleted videos, then my advice would be to use something like a Spark job to clean things up. Even with a relational DB that would take a fairly interesting sub-query to find all records in tags that have video_ids that don't have a parent record.

How could I create relationships between tables in JetBrains' DataGrip?

I am using DataGrip by JetBrains in my work. It is ok, but I don't know how to create relationships between tables like in this picture:
It's a two step procedure. In the first step, you must modify your table to add foreign key constraint definitions. In the second step, you can display the table diagram.
First, right click on the name of your table in DataGrip, then choose Modify Table. You will see four tabs: Columns, Keys, Indices, and Foreign Keys. Choose the Columns tab. Right click on the column name you want to become a foreign key, and choose New Foreign Key. The window will switch to its Foreign Keys tab with some information filled in. Fill in your "target table". You may also have to write in the target column name in the SQL statement's REFERENCES phrase. Review all the information now in the Modify Table window, and, when satisfied, click "Execute".
Second, right click again on the name of your table in DataGrip, and this time choose Diagrams > Show Visualisation. You should now see a diagram displaying the relations between your original table and the referenced tables.
In DataGrip Help, you can look at the Working with the Database Tool Window page for its Modifying the definition of a table, column, index, or a primary or foreign key section. There is a very short procedure description there.
Wikipedia has an example in its Defining foreign keys article section that may be useful to you while working in DataGrip's Modify Table window.
I did this procedure in DataGrip 2017.1.3, and I don't know whether other versions vary.
Generally: from the context menu or by pressing Ctrl+Alt+U.
If you found this picture, one more step deeper into the website would have taken you to this page:
https://www.jetbrains.com/datagrip/features/other.html
And there is an explanation how to do it.
Try this small SQL script, which creates 3 tables. I think you will find this works well.
CREATE TABLE product (
    category INT NOT NULL,
    id INT NOT NULL,
    price DECIMAL,
    PRIMARY KEY (category, id)
);

CREATE TABLE customer (
    id INT NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE product_order (
    no INT NOT NULL AUTO_INCREMENT,
    product_category INT NOT NULL,
    product_id INT NOT NULL,
    customer_id INT NOT NULL,
    PRIMARY KEY (no),
    INDEX (product_category, product_id),
    INDEX (customer_id),
    FOREIGN KEY (product_category, product_id)
        REFERENCES product (category, id)
        ON UPDATE CASCADE ON DELETE RESTRICT,
    FOREIGN KEY (customer_id)
        REFERENCES customer (id)
);

Inverted index: Delete then Insert pattern?

I have an inverted index table on a table of Users. The table allows querying users by last name. It is called "users_by_lastname".
Primary key of this table has "lastname" in it, so it cannot be updated.
If a user changes their last name in the main "Users" table, should I be deleting and re-inserting into the inverted index table, "users_by_lastname"?
I cannot update a primary key column in Cassandra... Are there other patterns that handle this better?
In Cassandra 3.0 you can deal with this problem by creating the inverted index table as a materialized view of the Users table. Then Cassandra will take care of maintaining the view automatically whenever you update the base table.
On earlier versions of Cassandra, your only option would be to do a delete and then an insert with the new last name in the application-maintained inverted index table.
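On that pre-3.0 path, the delete and insert can at least be grouped in a logged batch so both index mutations are delivered together. A sketch, with an assumed (lastname, userid) primary key and placeholder values:

```cql
BEGIN BATCH
    DELETE FROM users_by_lastname
        WHERE lastname = 'Smith' AND userid = 42;
    INSERT INTO users_by_lastname (lastname, userid, firstname)
        VALUES ('Jones', 42, 'Ann');
APPLY BATCH;
```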

CQL Select on column value

I am creating account data in Cassandra. Accounts are most commonly queried by account id, but often an account is queried by login name instead. I have created a user table with primary key (account_id, login_name). Because of this, I have to use ALLOW FILTERING on the table to query by login_name.
Is there a better way to create the table that does not have the impact of a filterable table?
A possible approach is to define a new table that has the exact same elements in the primary key but in the reverse order: (login_name, account_id). You can still keep the table you present as the reference table, which stores all the relevant account data, but this new table would allow more efficient queries based on login_name. I am not sure how much you would gain compared with the ALLOW FILTERING query... but this kind of "data duplication" for query optimization is normal procedure in NoSQL DBs.
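That lookup table might be sketched as follows (column names and types are assumptions based on the question):

```cql
CREATE TABLE accounts_by_login (
    login_name text,
    account_id uuid,
    PRIMARY KEY (login_name, account_id)
);

-- resolve a login name to its account id(s) without ALLOW FILTERING
SELECT account_id FROM accounts_by_login WHERE login_name = 'alice';
```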
HTH.
