follower/following in cassandra - cassandra

We are designing a twitter like follower/following in Cassandra, and found something similar
from here https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376/13-Data_Model_simplified_13
so I think ItemLike is a table?
itemid1=>(userid1, userid2...) is a row in the table?
what do you think is the create table of this ItemLike table?

Yes, ItemLike is a table
Schema of the ItemLike table will be Like :
CREATE TABLE itemlike(
itemid bigint,
userid bigint,
timeuuid timeuuid,
PRIMARY KEY(itemid, userid)
);
The picture of the slide is the internal structure of the above table.
Let's insert some data :
itemid | userid | timeuuid
--------+--------+--------------------------------------
2 | 100 | f172e3c0-67a6-11e7-8e08-371a840aa4bb
2 | 103 | eaf31240-67a6-11e7-8e08-371a840aa4bb
1 | 100 | d92f7e90-67a6-11e7-8e08-371a840aa4bb
Internally cassandra will store the data like below :
--------------------------------------------------------------------------------------|
| | 100:timeuuid | 103:timeuuid |
| +---------------------------------------+----------------------------------------|
|2 | f172e3c0-67a6-11e7-8e08-371a840aa4bb | eaf31240-67a6-11e7-8e08-371a840aa4bb |
--------------------------------------------------------------------------------------|
---------------------------------------------|
| | 100:timeuuid |
| +---------------------------------------|
|1 | d92f7e90-67a6-11e7-8e08-371a840aa4bb |
---------------------------------------------|

Related

Renaming a table & keeping connections to existing partitions in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
Does renaming the table, existing partitions attached to that table remain as it is after renaming?
Yes.
yugabyte=# \dt
List of relations
Schema | Name | Type | Owner
--------+-----------------------+-------+----------
public | order_changes | table | yugabyte
public | order_changes_2019_02 | table | yugabyte
public | order_changes_2019_03 | table | yugabyte
public | order_changes_2020_11 | table | yugabyte
public | order_changes_2020_12 | table | yugabyte
public | order_changes_2021_01 | table | yugabyte
public | people | table | yugabyte
public | people1 | table | yugabyte
public | user_audit | table | yugabyte
public | user_credentials | table | yugabyte
public | user_profile | table | yugabyte
public | user_svc_account | table | yugabyte
(12 rows)
yugabyte=# alter table order_changes RENAME TO oc;
ALTER TABLE
yugabyte=# \dS+ oc
Table "public.oc"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+------+-----------+----------+---------+----------+--------------+-------------
change_date | date | | | | plain | |
type | text | | | | extended | |
description | text | | | | extended | |
Partition key: RANGE (change_date)
Partitions: order_changes_2019_02 FOR VALUES FROM ('2019-02-01') TO ('2019-03-01'),
order_changes_2019_03 FOR VALUES FROM ('2019-03-01') TO ('2019-04-01'),
order_changes_2020_11 FOR VALUES FROM ('2020-11-01') TO ('2020-12-01'),
order_changes_2020_12 FOR VALUES FROM ('2020-12-01') TO ('2021-01-01'),
order_changes_2021_01 FOR VALUES FROM ('2021-01-01') TO ('2021-02-01')
Postgres and therefore YugabyteDB doesn’t actually use the names of an object, it uses the OID (object ID) of an object.
That means that you can rename it, without actually causing any harm, because it’s simply a name in the catalog with the object identified by its OID.
This has other side effects as well: if you create a table, and perform a certain SQL like ‘select count(*) from table’, drop it, and then create a table with the same name, and perform the exact same SQL, you will get two records in pg_stat_statements with identical SQL text. This seems weird from the perspective of databases where the SQL area is shared. In postgres, only pg_stat_statements is shared, there is no SQL cache.
pg_stat_statements does not store the SQL text, it stores the query tree (an internal representation of the SQL), and symbolizes the tree, which makes to appear like SQL again. The query tree uses the OID, and therefore for pg_stat_statements the above two identical SQL texts are different query trees, because the OIDs of the tables are different.

Sequelize - create recond with value that matches another table foreign key's value

I have the following structure for Users_Role table:
| role_id(PK) | role_name |
| ---------------| --------------|
| 1 | Admin |
| 2 | View |
And the following structure for Users table:
| user_id (PK) | user_role_id(FK) |
| -------------| --------------------|
| 12345 | 1 |
| 22434 | 1 |
The tables are connected by user_role_id and role_id
I want to create new records in Users Table, the thing is I only have the user_role value.
Is there a way to shortcut the way to get the value of the user_role_id, or the only way to do it is a seperate query before creating new record in Users?
const userRole = "Admin";
Users.create({
user_id:"1234",
user_role_id: ???
})

Order by in materialized view doesn't sort the results

I have a table with a structure like this:
CREATE TABLE kaefko.se_vi_f55dfeebae00d2b3 (
value text PRIMARY KEY,
id text,
popularity bigint);
With data that looks like this:
value | id | popularity
--------+------------------+------------
rally | 4eff16cb91f96cd6 | 2
reddit | 11aa39686ed66ba5 | 3
red | 552d7e95af481415 | 1
really | 756bfa499965863c | 1
right | c5850c6b08f7966b | 1
redis | 7f1d251f399442d7 | 1
And I've created a materialized view that should sort these values by the popularity from the biggest to the smallest ones:
CREATE MATERIALIZED VIEW kaefko.se_vi_f55dfeebae00d2b3_by_popularity AS
SELECT *
FROM kaefko.se_vi_f55dfeebae00d2b3
WHERE popularity IS NOT null
PRIMARY KEY (value, popularity)
WITH CLUSTERING ORDER BY (popularity DESC);
But the data in the materialized view looks like this:
value | popularity | id
--------+------------+------------------
rally | 2 | 4eff16cb91f96cd6
reddit | 3 | 11aa39686ed66ba5
really | 1 | 756bfa499965863c
right | 1 | c5850c6b08f7966b
redis | 1 | 7f1d251f399442d7
As you can see there are two main issues:
Data is not sorted as defined in the materialized view
There is just a part of all data in the materialized view
I'm not very experienced in Cassandra and I've already spent hours trying to find the reason why this happens with no avail. Could somebody please help me? Thank you <3
__
I'm using ScyllaDB 4.1.9-0 and cqlsh shows this:
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
Alex's comment is 100% correct, the order is within the partition.
PRIMARY KEY (value, popularity)
WITH CLUSTERING ORDER BY (popularity DESC);
This means that the ordering of popularity is descending only for values where the 'value' field is the same - if I was to alter the data you used to show what this would look like as an example, you would get the following:
value | popularity | id
--------+------------+------------------
rally | 3 | 4eff16cb91f96cd6
rally | 2 | 11aa39686ed66ba5
really | 3 | 756bfa499965863c
really | 2 | c5850c6b08f7966b
really | 1 | 7f1d251f399442d7
The order is on a per partition key basis, not globally ordered.

Caasandra PRIMARY KEY column "user" cannot be restricted as preceding column "eventtype" is not restricted

The table i developed is below one
create table userevent(id uuid,eventtype text,sourceip text,user text,sessionid text,roleid int,menu text,action text,log text,date timestamp,PRIMARY KEY (id,eventtype,user));
id | eventtype | user | action | date | log | menu | roleid | sessionid | sourceip
--------------------------------------+-----------+---------+--------+--------------------------+----------+-----------+--------+-----------+--------------
b15c6780-d69e-11e8-bb9a-59dfa00365c6 | DemoType | Aqib | Login | 2018-10-01 04:05:00+0000 | demolog | demomenu | 1 | Demo_1 | 121.11.11.12
95df3410-d69e-11e8-bb9a-59dfa00365c6 | DemoType | Aqib | Login | 2018-09-30 22:35:00+0000 | demolog | demomenu | 1 | Demo_1 | 121.11.11.12
575b05c0-d69e-11e8-bb9a-59dfa00365c6 | DemoType | Aqib | Login | 2018-10-01 04:05:00+0000 | demolog | demomenu | 1 | Demo_1 | 121.11.11.12
e6cbc190-d69e-11e8-bb9a-59dfa00365c6 | DemoType3 | Jasim | Login | 2018-05-31 22:35:00+0000 | demolog3 | demomenu3 | 3 | Demo_3 | 121.11.11.12
d66992a0-d69e-11e8-bb9a-59dfa00365c6 | DemoType | Shafeer | Login | 2018-07-31 22:35:00+0000 | demolog | demomenu | 2 | Demo_2 | 121.11.11.12
But when i queried as below,
select * from userevent where user='Aqib';
Its showing some thing like this : InvalidRequest: Error from server: code=2200 [Invalid query] message="PRIMARY KEY column "user" cannot be restricted as preceding column "eventtype" is not restricted"
What is the error...........
You need to read about data modelling for Cassandra, or for example take DS220 course on the DataStax Academy. Every row has primary key consisting of the partition key that defines on which node the data is located, and clustering keys that define placement inside partition. In your case, your primary key consists at least from id, eventtype, user. To put condition on user you need to specify both id and eventtype.
You can add the index, or materialized view to access only by user, but I recommend to get more into data modelling first - define your queries, and then build table structures about queries that you need to perform.

Create Cassandra CQL with IN and ORDER BY

I need a CQL to get all rows from the table based on set of current user friends (I'm using IN for that) and sort them based on created date.
I'm trying to play with key and clustering key, but got no ideas.
Here is my Cassandra table:
CREATE TABLE chat.news_feed(
id_news_feed uuid,
id_user_sent uuid,
first_name text,
last_name text,
security int,
news_feed text,
image blob,
image_preview text,
image_name text,
image_length int,
image_resolution text,
is_image int,
created_date timestamp,
PRIMARY KEY ((id_news_feed, id_user_sent), created_date))
WITH CLUSTERING ORDER BY (created_date DESC) AND comment = 'List of all news feed by link id';
and here is my CQL (formed in Java):
SELECT JSON id_news_feed, first_name, last_name, id_user_sent, news_feed, image_name, image_preview, image_length, created_date, is_image, image_resolution FROM chat.news_feed WHERE id_user_sent in (b3306e3f-1f1d-4a87-8a64-e22d46148316,b3306e3f-1f1d-4a87-8a64-e22d46148316) ALLOW FILTERING;
I coul not run it cause there is no key in my WHERE part of CQL.
Is there any way how I could get all rows created by set of users with Order By (I tried to create table different ways, but no results yet)?
Thank you!
Unlike the relational databases here you will probably need denormalization of the tables. First of all, you cannot effectively query everything from a single table. Also Cassandra does not support joins natively. I suggest to split up your table into several.
Let's start with the friends: the current user id should be part of the primary key and the friends should go as a clustering column.
CREATE TABLE chat.user_friends (
user_id uuid,
friend_id uuid,
first_name text,
last_name text,
security int,
PRIMARY KEY ((user_id), friend_id));
Now you can find the friend for each particular user by querying as follows:
SELECT * FROM chat.user_friends WHERE user_id = 'a001-...';
or
SELECT * FROM chat.user_friends WHERE user_id = 'a001-...' and friend_id in ('a121-...', 'a156-...', 'a344-...');
Next let's take care of news feed: before putting remaining columns into this table I'd think about the desired query against this table. The news feeds needs to be filtered by the user ids with IN listing and at the same time be sortable by time. So we put the created_date timestamp as a clustering key and friends' user_id as a partitioning key. Note that the timestamps will be sorted per user_id not globally (you can re-sort those on the client side). What's really important is to keep news_feed_id out of the primary key. This column still may contain uuid which is unique, but as long as we don't want to query this table to get a particular news feed by id. For this purpose We'd anyway require separate table (denormalization of the data) or materialized view (which I will not cover in this answer but is quite nice solution for some types of denormalization introduced in Cassandra 3.0).
Here is the updated table:
CREATE TABLE chat.news_feed(
id_user_sent uuid,
first_name text,
last_name text,
security int,
id_news_feed uuid,
news_feed text,
image blob,
image_preview text,
image_name text,
image_length int,
image_resolution text,
is_image int,
created_date timestamp,
PRIMARY KEY ((id_user_sent), created_date))
WITH CLUSTERING ORDER BY (created_date DESC) AND comment = 'List of all news feed by link id';
Some example dataset:
cqlsh:ks_test> select * from news_feed ;
id_user_sent | created_date | first_name | id_news_feed | image | image_length | image_name | image_preview | image_resolution | is_image | last_name | news_feed | security
--------------------------------------+---------------------------------+------------+--------------------------------------+-------+--------------+------------+---------------+------------------+----------+-----------+-----------+----------
01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-23 00:00:00.000000+0000 | null | fd25699c-78f1-4aee-913a-00263912fe18 | null | null | null | null | null | null | null | null | null
9bd23d16-3be3-4e27-9a47-075b92203006 | 2017-02-21 00:00:00.000000+0000 | null | e5d394d3-b67f-4def-8f1e-df781130ea22 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-25 00:00:00.000000+0000 | null | ec34c655-7251-4af8-9718-3475cad18b29 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-22 00:00:00.000000+0000 | null | 5342bbad-0b55-4f44-a2e9-9f285d16868f | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-20 00:00:00.000000+0000 | null | beea0c24-f9d6-487c-a968-c9e088180e73 | null | null | null | null | null | null | null | null | null
63003200-91c0-47ba-9096-6ec1e35dc7a0 | 2017-02-21 00:00:00.000000+0000 | null | a0fba627-d6a7-463c-a00c-dd0472ad10c5 | null | null | null | null | null | null | null | null | null
And the filtered one:
cqlsh:ks_test> select * from news_feed where id_user_sent in (01b9b9e8-519c-4578-b747-77c8d9c4636b, 6e05257d-9278-4353-b580-711e62ade8d4) and created_date >= '2017-02-22';
id_user_sent | created_date | first_name | id_news_feed | image | image_length | image_name | image_preview | image_resolution | is_image | last_name | news_feed | security
--------------------------------------+---------------------------------+------------+--------------------------------------+-------+--------------+------------+---------------+------------------+----------+-----------+-----------+----------
01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-25 00:00:00.000000+0000 | null | 26dc0952-0636-438f-8a26-6a3fef4fb808 | null | null | null | null | null | null | null | null | null
01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-23 00:00:00.000000+0000 | null | fd25699c-78f1-4aee-913a-00263912fe18 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-25 00:00:00.000000+0000 | null | ec34c655-7251-4af8-9718-3475cad18b29 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-22 00:00:00.000000+0000 | null | 5342bbad-0b55-4f44-a2e9-9f285d16868f | null | null | null | null | null | null | null | null | null
P.S. As you might notice we got rid of the ALLOW FILTERING clause. Don't use ALLOW FILTERING in any application as it has significant performance penalty. This is only usable to look up some small chunk of data scattered around in different partitions.

Resources