I need a CQL query to get all rows from the table for a set of the current user's friends (I'm using IN for that) and sort them by created date.
I've tried playing with the partition key and clustering key, but have no ideas.
Here is my Cassandra table:
CREATE TABLE chat.news_feed(
id_news_feed uuid,
id_user_sent uuid,
first_name text,
last_name text,
security int,
news_feed text,
image blob,
image_preview text,
image_name text,
image_length int,
image_resolution text,
is_image int,
created_date timestamp,
PRIMARY KEY ((id_news_feed, id_user_sent), created_date))
WITH CLUSTERING ORDER BY (created_date DESC) AND comment = 'List of all news feed by link id';
and here is my CQL (formed in Java):
SELECT JSON id_news_feed, first_name, last_name, id_user_sent, news_feed, image_name, image_preview, image_length, created_date, is_image, image_resolution FROM chat.news_feed WHERE id_user_sent in (b3306e3f-1f1d-4a87-8a64-e22d46148316,b3306e3f-1f1d-4a87-8a64-e22d46148316) ALLOW FILTERING;
I could not run it because there is no partition key in the WHERE part of my CQL.
Is there any way to get all rows created by a set of users with ORDER BY (I tried creating the table in different ways, but no results yet)?
Thank you!
Unlike in relational databases, here you will probably need to denormalize your tables. First of all, you cannot effectively query everything from a single table, and Cassandra does not support joins natively. I suggest splitting your table into several.
Let's start with the friends: the current user's id should be part of the primary key, and the friends should go in a clustering column.
CREATE TABLE chat.user_friends (
user_id uuid,
friend_id uuid,
first_name text,
last_name text,
security int,
PRIMARY KEY ((user_id), friend_id));
Now you can find the friends of each particular user by querying as follows:
SELECT * FROM chat.user_friends WHERE user_id = 'a001-...';
or
SELECT * FROM chat.user_friends WHERE user_id = 'a001-...' and friend_id in ('a121-...', 'a156-...', 'a344-...');
Next, let's take care of the news feed. Before putting the remaining columns into this table, I'd think about the desired query against it: the news feed needs to be filtered by user ids with an IN list and at the same time be sortable by time. So we put the created_date timestamp as the clustering key and the friend's user_id as the partition key. Note that the timestamps will be sorted per user_id, not globally (you can re-sort them on the client side).
What's really important is to keep news_feed_id out of the primary key. This column may still contain a uuid, which is unique, but we don't want to query this table for a particular news feed by id. For that purpose we'd need a separate table anyway (denormalizing the data) or a materialized view (which I won't cover in this answer, but which is quite a nice solution for some kinds of denormalization, introduced in Cassandra 3.0).
Here is the updated table:
CREATE TABLE chat.news_feed(
id_user_sent uuid,
first_name text,
last_name text,
security int,
id_news_feed uuid,
news_feed text,
image blob,
image_preview text,
image_name text,
image_length int,
image_resolution text,
is_image int,
created_date timestamp,
PRIMARY KEY ((id_user_sent), created_date))
WITH CLUSTERING ORDER BY (created_date DESC) AND comment = 'List of all news feed by link id';
Some example dataset:
cqlsh:ks_test> select * from news_feed ;
id_user_sent | created_date | first_name | id_news_feed | image | image_length | image_name | image_preview | image_resolution | is_image | last_name | news_feed | security
--------------------------------------+---------------------------------+------------+--------------------------------------+-------+--------------+------------+---------------+------------------+----------+-----------+-----------+----------
01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-23 00:00:00.000000+0000 | null | fd25699c-78f1-4aee-913a-00263912fe18 | null | null | null | null | null | null | null | null | null
9bd23d16-3be3-4e27-9a47-075b92203006 | 2017-02-21 00:00:00.000000+0000 | null | e5d394d3-b67f-4def-8f1e-df781130ea22 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-25 00:00:00.000000+0000 | null | ec34c655-7251-4af8-9718-3475cad18b29 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-22 00:00:00.000000+0000 | null | 5342bbad-0b55-4f44-a2e9-9f285d16868f | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-20 00:00:00.000000+0000 | null | beea0c24-f9d6-487c-a968-c9e088180e73 | null | null | null | null | null | null | null | null | null
63003200-91c0-47ba-9096-6ec1e35dc7a0 | 2017-02-21 00:00:00.000000+0000 | null | a0fba627-d6a7-463c-a00c-dd0472ad10c5 | null | null | null | null | null | null | null | null | null
And the filtered one:
cqlsh:ks_test> select * from news_feed where id_user_sent in (01b9b9e8-519c-4578-b747-77c8d9c4636b, 6e05257d-9278-4353-b580-711e62ade8d4) and created_date >= '2017-02-22';
id_user_sent | created_date | first_name | id_news_feed | image | image_length | image_name | image_preview | image_resolution | is_image | last_name | news_feed | security
--------------------------------------+---------------------------------+------------+--------------------------------------+-------+--------------+------------+---------------+------------------+----------+-----------+-----------+----------
01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-25 00:00:00.000000+0000 | null | 26dc0952-0636-438f-8a26-6a3fef4fb808 | null | null | null | null | null | null | null | null | null
01b9b9e8-519c-4578-b747-77c8d9c4636b | 2017-02-23 00:00:00.000000+0000 | null | fd25699c-78f1-4aee-913a-00263912fe18 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-25 00:00:00.000000+0000 | null | ec34c655-7251-4af8-9718-3475cad18b29 | null | null | null | null | null | null | null | null | null
6e05257d-9278-4353-b580-711e62ade8d4 | 2017-02-22 00:00:00.000000+0000 | null | 5342bbad-0b55-4f44-a2e9-9f285d16868f | null | null | null | null | null | null | null | null | null
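Since each partition comes back already sorted newest-first, a client-side merge is enough to produce one globally ordered feed. A minimal Python sketch (the row tuples and user ids are made up for illustration; a real client would use the driver's result sets):

```python
import heapq
from datetime import datetime

# Hypothetical per-partition result sets: each list is already sorted
# newest-first by Cassandra (CLUSTERING ORDER BY created_date DESC).
feed_user_a = [
    ("01b9b9e8", datetime(2017, 2, 25)),
    ("01b9b9e8", datetime(2017, 2, 23)),
]
feed_user_b = [
    ("6e05257d", datetime(2017, 2, 25)),
    ("6e05257d", datetime(2017, 2, 22)),
]

# heapq.merge preserves the overall order without re-sorting everything;
# reverse=True matches the newest-first ordering of the inputs.
merged = list(heapq.merge(feed_user_a, feed_user_b,
                          key=lambda row: row[1], reverse=True))
```

This is linear in the total number of rows, which matters when each friend's partition can be large.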
P.S. As you might notice, we got rid of the ALLOW FILTERING clause. Don't use ALLOW FILTERING in any application, as it carries a significant performance penalty. It is only usable for looking up a small chunk of data scattered across different partitions.
I have the below table structure, which houses failed records.
CREATE TABLE if not exists dummy_plan (
id uuid,
payload varchar,
status varchar,
bucket text,
create_date timestamp,
modified_date timestamp,
primary key ((bucket), create_date, id))
WITH CLUSTERING ORDER BY (create_date ASC)
AND COMPACTION = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 1};
My table looks like below:
| id  | payload | status | bucket     | create_date              | modified_date |
| abc | text1   | Start  | 2021-02-15 | 2021-02-15 08:07:50+0000 |               |
The table and records are created and inserted successfully. However, after processing we want to update the record (if it failed) or delete it (if it succeeded), based on id.
But I'm facing a problem with the timestamp: I tried giving the same value, but it still doesn't delete/update.
It seems Cassandra doesn't work with EQ on timestamps.
Please guide.
Thank you in advance.
Cassandra works just fine with timestamp columns - you can use an equality operation on them. But you need to make sure that you include the milliseconds in the value, otherwise it won't match:
cqlsh> insert into test.dummy_service_plan_contract (id, create_date, bucket)
values (1, '2021-02-15T11:00:00.123Z', '123');
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+---------------------------------+----+---------------+---------+--------
123 | 2021-02-15 11:00:00.123000+0000 | 1 | null | null | null
(1 rows)
cqlsh> delete from test.dummy_service_plan_contract where bucket = '123' and
id = 1 and create_date = '2021-02-15T11:00:00Z';
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+---------------------------------+----+---------------+---------+--------
123 | 2021-02-15 11:00:00.123000+0000 | 1 | null | null | null
(1 rows)
cqlsh> delete from test.dummy_service_plan_contract where bucket = '123' and
id = 1 and create_date = '2021-02-15T11:00:00.123Z';
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+-------------+----+---------------+---------+--------
(0 rows)
If you don't see the milliseconds in your cqlsh output, you need to configure the datetimeformat setting in .cqlshrc.
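To see why the seconds-only value misses the stored row, here is a small Python sketch of the comparison (values taken from the cqlsh session above):

```python
from datetime import datetime, timezone

# The value Cassandra stored (milliseconds included).
stored = datetime(2021, 2, 15, 11, 0, 0, 123000, tzinfo=timezone.utc)

# The value the first DELETE supplied (seconds precision only).
supplied = datetime(2021, 2, 15, 11, 0, 0, tzinfo=timezone.utc)

# Equality is exact, so the seconds-only value matches nothing.
assert stored != supplied

# The second DELETE carried the milliseconds, so it matched.
exact = datetime.fromisoformat("2021-02-15T11:00:00.123+00:00")
assert exact == stored
```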
I'm trying to make an entity using TypeORM in my NestJS app, and it's not working as I expected.
I have the following entity:
@Entity('TableOne')
export class TableOneModel {
  @PrimaryGeneratedColumn()
  id: number

  @PrimaryColumn()
  tableTwoID: number

  @PrimaryColumn()
  tableThreeID: number

  @CreateDateColumn()
  createdAt?: Date
}
This code generates a migration that creates a table like the example below:
+--------------+-------------+------+-----+----------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+----------------------+-------+
| id | int(11) | NO | | NULL | |
| tableTwoID | int(11) | NO | | NULL | |
| tableThreeID | int(11) | NO | | NULL | |
| createdAt | datetime(6) | NO | | CURRENT_TIMESTAMP(6) | |
+--------------+-------------+------+-----+----------------------+-------+
That's OK; the problem is that I want the table to allow only one row per (tableTwoID, tableThreeID) combination. What should I use in the entity to generate the table as I expect it to be?
Expected to not allow rows like the example below
+----+------------+--------------+----------------------------+
| id | tableTwoID | tableThreeID | createdAt |
+----+------------+--------------+----------------------------+
| 1 | 1 | 1 | 2019-10-30 19:27:43.054844 |
| 2 | 1 | 1 | 2019-10-30 19:27:43.819174 | <- should not allow the insert of this row
+----+------------+--------------+----------------------------+
Try marking the column combination as unique with the class-level @Unique decorator, which takes the column names:
@Unique(['tableTwoID', 'tableThreeID'])
This is currently expected behavior from TypeORM. According to the documentation, if you have multiple @PrimaryColumn() decorators you create a composite key. The combination of the composite key columns must be unique (in your example above, '1' + '1' + '1' = '111' vs '2' + '1' + '1' = '211'). If you are looking to make each column unique along with being part of the composite primary key, you should be able to do something like @PrimaryColumn({ unique: true }).
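A composite key behaves like a tuple key in a map: the whole combination must be unique, not each column on its own. A rough Python sketch of that behavior (the insert helper and row values are hypothetical, just for illustration):

```python
# Rows keyed by the composite (id, tableTwoID, tableThreeID).
rows = {}

def insert(pk, value):
    # Reject duplicates of the full composite key, like a primary key would.
    if pk in rows:
        raise ValueError(f"duplicate composite key {pk}")
    rows[pk] = value

insert((1, 1, 1), "row one")  # id=1, tableTwoID=1, tableThreeID=1
insert((2, 1, 1), "row two")  # accepted: the generated id differs
```

With id auto-generated, (tableTwoID, tableThreeID) alone never has to be unique, which is exactly why the second row above is accepted.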
We are designing a Twitter-like follower/following feature in Cassandra, and found something similar
here: https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376/13-Data_Model_simplified_13
So I think ItemLike is a table?
Is itemid1=>(userid1, userid2...) a row in the table?
What do you think the CREATE TABLE of this ItemLike table would be?
Yes, ItemLike is a table.
The schema of the ItemLike table will be like:
CREATE TABLE itemlike(
itemid bigint,
userid bigint,
timeuuid timeuuid,
PRIMARY KEY(itemid, userid)
);
The picture in the slide is the internal structure of the above table.
Let's insert some data:
itemid | userid | timeuuid
--------+--------+--------------------------------------
2 | 100 | f172e3c0-67a6-11e7-8e08-371a840aa4bb
2 | 103 | eaf31240-67a6-11e7-8e08-371a840aa4bb
1 | 100 | d92f7e90-67a6-11e7-8e08-371a840aa4bb
Internally, Cassandra will store the data like below:
--------------------------------------------------------------------------------------|
| | 100:timeuuid | 103:timeuuid |
| +---------------------------------------+----------------------------------------|
|2 | f172e3c0-67a6-11e7-8e08-371a840aa4bb | eaf31240-67a6-11e7-8e08-371a840aa4bb |
--------------------------------------------------------------------------------------|
---------------------------------------------|
| | 100:timeuuid |
| +---------------------------------------|
|1 | d92f7e90-67a6-11e7-8e08-371a840aa4bb |
---------------------------------------------|
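The wide-row layout above can be sketched as a map of partitions to cells, which also shows why fetching all likes for one item is a single-partition read (a Python sketch using the example data):

```python
# One partition per itemid; inside it, each userid becomes a cell whose
# value is the timeuuid (values copied from the example rows above).
partitions = {
    2: {100: "f172e3c0-67a6-11e7-8e08-371a840aa4bb",
        103: "eaf31240-67a6-11e7-8e08-371a840aa4bb"},
    1: {100: "d92f7e90-67a6-11e7-8e08-371a840aa4bb"},
}

# All likers of item 2 come from a single partition, already clustered
# by userid.
likers_of_item_2 = sorted(partitions[2])
```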
I have the following table in db
+----------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| VERSION | bigint(20) | NO | | NULL | |
| user_id | bigint(20) | NO | MUL | NULL | |
| measurement_id | bigint(20) | NO | MUL | NULL | |
| day | timestamp | NO | | NULL | |
| hour | tinyint(4) | NO | | NULL | |
| hour_timestamp | timestamp | NO | | NULL | |
| value | bigint(20) | NO | | NULL | |
+----------------+------------+------+-----+---------+----------------+
I'm trying to save a Spark dataframe that holds multiple rows with the following case class structure:
case class Record(val id : Int,
val VERSION : Int,
val user_id : Int,
val measurement_id : Int,
val day : Timestamp,
val hour : Int,
val hour_timestamp : Timestamp,
val value : Long )
When I try to save the dataframe to MySQL through the JDBC driver using:
dataFrame.insertIntoJDBC(...)
I get a primary key violation error:
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '1' for key 'PRIMARY'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
I tried setting id=0 as the default value for all rows and also tried removing the id field from the case class; neither worked.
Can anyone help?
Thanks,
Tomer
Found it.
I had an SQL <-> Java column type issue.
According to https://www.cis.upenn.edu/~bcpierce/courses/629/jdkdocs/guide/jdbc/getstart/mapping.doc.html
BIGINT SQL columns should be represented as Long in Java.
After I've changed my case class to:
case class Record(val id: Long,
val VERSION : Long,
val user_id : Long,
val measurement_id : Long,
val day : Timestamp,
val hour : Int,
val hour_timestamp : Timestamp,
val value : Long )
And after setting id=0 for all the records in the dataframe, it worked.
Thanks
I have a table MACRecord in Cassandra as follows :
CREATE TABLE has.macrecord (
macadd text PRIMARY KEY,
position int,
record int,
rssi1 float,
rssi2 float,
rssi3 float,
rssi4 float,
rssi5 float,
timestamp timestamp
)
I have 5 different nodes, each updating a row based on its title, i.e. node 1 only updates rssi1, node 2 only updates rssi2, etc. This evidently creates null values in the other columns.
I cannot seem to find a query which will give me only those rows that are not null. Specifically, I have referred to this post.
I want to be able to query, for example, SELECT * FROM MACRecord WHERE RSSI1 != NULL as in MySQL. However, it seems both null comparisons and operators such as != are not supported in CQL.
Is there an alternative to putting NULL values, or a special flag? I am inserting floats, so unlike strings I cannot insert something like ''. What is a possible workaround for this problem?
Edit :
My data model in MYSQL was like this :
+-----------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+-------------------+-----------------------------+
| MACAdd | varchar(17) | YES | UNI | NULL | |
| Timestamp | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| Record | smallint(6) | YES | | NULL | |
| RSSI1 | decimal(5,2) | YES | | NULL | |
| RSSI2 | decimal(5,2) | YES | | NULL | |
| RSSI3 | decimal(5,2) | YES | | NULL | |
| RSSI4 | decimal(5,2) | YES | | NULL | |
| RSSI5 | decimal(5,2) | YES | | NULL | |
| Position | smallint(6) | YES | | NULL | |
+-----------+--------------+------+-----+-------------------+-----------------------------+
Each node (1-5) was querying from MySQL based on its number, for example node 1: SELECT * FROM MACRecord WHERE RSSI1 IS NOT NULL
I updated my data model in Cassandra as follows, so that rssi1-rssi5 are now text types.
CREATE TABLE has.macrecord (
macadd text PRIMARY KEY,
position int,
record int,
rssi1 text,
rssi2 text,
rssi3 text,
rssi4 text,
rssi5 text,
timestamp timestamp
)
I was thinking that each node would initially insert the string 'NULL' for a record, and when actual rssi data arrives it would just replace the 'NULL' string. This would avoid tombstones, and the values would clearly appear to the user as not being valid data, since they are flagged 'NULL'.
However, I am still puzzled as to how to retrieve results like I did in MySQL. There is no != operator in Cassandra. How can I write a query which will give me a result set like SELECT * FROM has.macrecord WHERE rssi1 != 'NULL'?
You can only select rows in CQL based on the PRIMARY KEY fields, which by definition cannot be null. This also applies to secondary indexes. So I don't think Cassandra will be able to do the filtering you want on the data fields. You could select on some other criteria and then write your client to ignore rows that had null values.
Or you could create a different table for each rssiX value, so that none of them would be null.
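The client-side filtering mentioned above could look like this minimal Python sketch (the row dicts and values are hypothetical; a real driver returns rows with None for unset columns):

```python
# Hypothetical rows as returned by the driver (None where a column is unset).
rows = [
    {"macadd": "aa:bb", "rssi1": -42.5, "rssi2": None},
    {"macadd": "cc:dd", "rssi1": None,  "rssi2": -60.0},
]

# Node 1 keeps only rows where rssi1 is present -- the client-side
# equivalent of MySQL's "WHERE RSSI1 IS NOT NULL".
with_rssi1 = [r for r in rows if r["rssi1"] is not None]
```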
If you are only interested in some kind of aggregation, then the null values are treated as zero. So you could do something like this:
SELECT sum(rssi1) FROM macrecord WHERE macadd='someadd';
The sum() function is available in Cassandra 2.2.
You might also be able to do some kind of trick with a user defined function/aggregate, but I think it would be simpler to have multiple tables.
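The null-as-zero behavior of sum() mentioned above can be mimicked client-side; a small Python sketch with made-up values:

```python
# Unset cells come back as None; skipping them is equivalent to
# treating them as zero in the sum.
values = [-42.5, None, -60.0]
total = sum(v for v in values if v is not None)
```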