CQL: get records by id and latest timestamp

I am very new to Cassandra, so this might be a newbie question.
I am running cqlsh 5.0.1 | Cassandra 2.1.4 locally.
I have a table like the one below:
CREATE TABLE master (
id uuid,
creation timestamp,
event_type text,
name text,
PRIMARY KEY(id,creation)
);
...and the records are:
id | creation | event_type | name
--------------------------------------+--------------------------+------------+------------------
305abd6d-34b8-4f36-96c6-9ea0c11be952 | 2015-04-15 14:01:54-0400 | create | test2
305abd6d-34b8-4f36-96c6-9ea0c11be952 | 2015-04-15 14:03:03-0400 | update | test2 update
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:01:54-0400 | create | test3
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:03:44-0400 | update | test3 update
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:04:34-0400 | update | test3 2nd update
bf42a120-dec1-47d8-bde2-c0d76f1c93a5 | 2015-04-15 14:01:54-0400 | create | test1
How can I select, for each distinct id, the record with the latest creation timestamp?
The result should look like this:
305abd6d-34b8-4f36-96c6-9ea0c11be952 | 2015-04-15 14:03:03-0400 | update | test2 update
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:04:34-0400 | update | test3 2nd update
bf42a120-dec1-47d8-bde2-c0d76f1c93a5 | 2015-04-15 14:01:54-0400 | create | test1

Given your current structure, a DISTINCT query can only return the partition key (id), so you won't be able to select any other columns with it. Instead, you can create a second query table with just id as the primary key and run a basic SELECT on that. Because each write for an existing id overwrites the previous row, it always holds the most recently written values.
CREATE TABLE querytable (
id uuid,
creation timestamp,
event_type text,
name text,
PRIMARY KEY(id)
);
SELECT * FROM querytable; -- contains one row per id, holding the last values written
You'll also have to write to this table every time you write to master.
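The dual-write idea can be modeled in plain Python, without a Cassandra driver. This is only a sketch of the behavior, not driver code: `master` and `querytable` are in-memory stand-ins for the two tables, `record_event` is a hypothetical helper, the ids are shortened for readability, and it assumes events are written in creation order.

```python
from datetime import datetime

master = []      # stand-in for master: one row per (id, creation)
querytable = {}  # stand-in for querytable: one row per id, last write wins


def record_event(id_, creation, event_type, name):
    """Write the same event to both tables, as the application would have to."""
    row = {"id": id_, "creation": creation, "event_type": event_type, "name": name}
    master.append(row)
    # Same primary key (id), so the earlier row is simply overwritten.
    querytable[id_] = row


events = [
    ("305abd6d", datetime(2015, 4, 15, 14, 1, 54), "create", "test2"),
    ("305abd6d", datetime(2015, 4, 15, 14, 3, 3), "update", "test2 update"),
    ("7440c51c", datetime(2015, 4, 15, 14, 1, 54), "create", "test3"),
    ("7440c51c", datetime(2015, 4, 15, 14, 3, 44), "update", "test3 update"),
    ("7440c51c", datetime(2015, 4, 15, 14, 4, 34), "update", "test3 2nd update"),
    ("bf42a120", datetime(2015, 4, 15, 14, 1, 54), "create", "test1"),
]
for event in events:
    record_event(*event)

for row in querytable.values():
    print(row["id"], row["creation"], row["event_type"], row["name"])
```

The full history stays in `master`, while `querytable` ends up with exactly one row per id.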

Related

Last record each group in cassandra

I have a table with this schema:
create table last_message_by_group
(
date date,
created_at timestamp,
message text,
group_id bigint,
primary key (date, created_at, message_id)
)
with clustering order by (created_at desc)
and data should be:
| date | created_at | message | group_id |
| 2021-05-11 | 7:23:54 | ddd | 1 |
| 2021-05-11 | 6:21:43 | ccc | 1 |
| 2021-05-11 | 5:35:16 | bbb | 2 |
| 2021-05-11 | 4:38:23 | aaa | 2 |
It shows messages ordered by created_at descending, partitioned by date.
The problem is that it cannot return only the last message of each group, like this:
| date | created_at | message | group_id |
| 2021-05-11 | 7:23:54 | ddd | 1 |
| 2021-05-11 | 5:35:16 | bbb | 2 |
created_at is a clustering key, so it can't be updated. Instead, I delete the old row and insert a new one for every new message in a group, which performs poorly.
Is there a better way to do this?
I was able to get this to work by making one change to your primary key definition. I added group_id as the first clustering key:
PRIMARY KEY (date, group_id, created_at, message_id)
After inserting the same data, this works:
> SELECT date, group_id, max(created_at), message
FROM last_message_by_group
WHERE date='2021-05-11'
GROUP BY date,group_id;
date | group_id | system.max(created_at) | message
------------+----------+---------------------------------+---------
2021-05-11 | 1 | 2021-05-11 12:23:54.000000+0000 | ddd
2021-05-11 | 2 | 2021-05-11 10:35:16.000000+0000 | bbb
(2 rows)
There's more detail on using CQL's GROUP BY clause in the official docs.
There is one problem: because you changed the clustering key, messages will be ordered by group_id first. Any idea how to keep the ordering by created_at and still get one message per group?
From the document linked above:
the GROUP BY option only accept as arguments primary key column names in the primary key order.
Unfortunately, if we were to adjust the primary key definition to put created_at before group_id, we would also have to group by created_at. That would create a "group" for each unique created_at, which negates the idea behind group_id.
In this case, you may have to decide between having the grouped results in a particular order vs. having them grouped at all. It might also be possible to group the results, but then re-order them appropriately on the application side.
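As a rough illustration of the application-side option, the grouped rows can be re-sorted after the query returns. The sketch below is plain Python with no driver involved; the `grouped` list hard-codes the two rows from the query output above, and `newest_first` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# The rows as the GROUP BY query returns them: one per group_id, in group_id order.
grouped = [
    {"date": "2021-05-11", "group_id": 1,
     "created_at": datetime(2021, 5, 11, 12, 23, 54, tzinfo=timezone.utc),
     "message": "ddd"},
    {"date": "2021-05-11", "group_id": 2,
     "created_at": datetime(2021, 5, 11, 10, 35, 16, tzinfo=timezone.utc),
     "message": "bbb"},
]


def newest_first(rows):
    """Return the per-group rows ordered by created_at, most recent first."""
    return sorted(rows, key=lambda row: row["created_at"], reverse=True)


for row in newest_first(grouped):
    print(row["group_id"], row["created_at"].isoformat(), row["message"])
```

Since there is at most one row per group in a partition, the sort runs over a small list and should be cheap compared with the query itself.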

Cassandra - Update/Delete based on timestamp datatype

I have the table structure below, which stores failed records.
CREATE TABLE if not exists dummy_plan (
id uuid,
payload varchar,
status varchar,
bucket text,
create_date timestamp,
modified_date timestamp,
primary key ((bucket), create_date, id))
WITH CLUSTERING ORDER BY (create_date ASC)
AND COMPACTION = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 1};
My table looks like this:
| id | payload | status | bucket | create_date | modified_date |
| abc| text1 | Start | 2021-02-15 | 2021-02-15 08:07:50+0000 | |
The table is created and the records are inserted successfully. However, after processing, we want to update the record (if processing failed) or delete it (if it succeeded) based on id.
But I am facing a problem with the timestamp: I tried giving the same value, but the row still isn't deleted or updated.
It seems Cassandra doesn't work with equality on timestamp columns.
Please guide.
Thank you in advance.
Cassandra works just fine with timestamp columns, and you can use the equality operator on them. But you need to include the milliseconds in the value, otherwise it won't match:
cqlsh> insert into test.dummy_service_plan_contract (id, create_date, bucket)
values (1, '2021-02-15T11:00:00.123Z', '123');
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+---------------------------------+----+---------------+---------+--------
123 | 2021-02-15 11:00:00.123000+0000 | 1 | null | null | null
(1 rows)
cqlsh> delete from test.dummy_service_plan_contract where bucket = '123' and
id = 1 and create_date = '2021-02-15T11:00:00Z';
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+---------------------------------+----+---------------+---------+--------
123 | 2021-02-15 11:00:00.123000+0000 | 1 | null | null | null
(1 rows)
cqlsh> delete from test.dummy_service_plan_contract where bucket = '123' and
id = 1 and create_date = '2021-02-15T11:00:00.123Z';
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+-------------+----+---------------+---------+--------
(0 rows)
If you don't see the milliseconds in the cqlsh output, you need to configure the datetimeformat setting in .cqlshrc.
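The reason the first DELETE matches nothing is that a timestamp value is a count of milliseconds since the epoch, so two literals that differ only in the fractional part are different values. A small Python sketch (standard library only; `to_epoch_millis` is a hypothetical helper, not a driver function) makes the comparison explicit:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(text):
    """Parse an ISO-8601 UTC literal such as '2021-02-15T11:00:00.123Z'
    into whole milliseconds since the Unix epoch."""
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    # Integer division by a timedelta avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


stored = to_epoch_millis("2021-02-15T11:00:00.123Z")      # value that was inserted
without_ms = to_epoch_millis("2021-02-15T11:00:00Z")      # first DELETE attempt
with_ms = to_epoch_millis("2021-02-15T11:00:00.123Z")     # second DELETE attempt

print(stored == without_ms)  # False: the two values differ by 123 ms
print(stored == with_ms)     # True
```

If the application generates the timestamp itself, keeping the exact value it wrote (rather than re-typing what cqlsh displays) avoids this mismatch.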

Cassandra maintain insert and update timestamp

I have a table which maintains data for products.
I update this table after making an API call every few minutes.
Say I have this table.
create table products(
id int,
name text,
created timestamp,
updated timestamp,
primary key (id)
);
So I make an API call at time 00:00:00 and I get list of products
[
{id:1, name:'product1'},
{id:2, name:'product2'}
]
I insert it to my cassandra table
id | name | created | updated
1 | Product1 | 1/1/2018 00:00:00| 1/1/2018 00:00:00
2 | Product2 | 1/1/2018 00:00:00| 1/1/2018 00:00:00
Now in the second API call after 10 mins the user has deleted Product1, but created Product3 so I get the API call output as
[
{id:2, name:'product2'},
{id:3, name:'product3'}
]
I will upsert this into my Cassandra table (since I don't want to check whether the record exists before deciding whether to insert or update), but I want to preserve the created time of product2 in my upsert query so that I know when the product was created and what its lifetime was.
Is it possible to set the created timestamp to now() once, without specifying it in later queries, while I keep updating the updated value?
My table should look like this
id | name | created | updated
1 | Product1 | 1/1/2018 00:00:00 | 1/1/2018 00:00:00
2 | Product2 | 1/1/2018 00:00:00 | 1/1/2018 00:10:00
3 | Product3 | 1/1/2018 00:10:00 | 1/1/2018 00:10:00
TL;DR; Yes.
You'll have to know which columns to include in each upsert. That is the only logic pushed to the application side; otherwise the data model works exactly as you expected.
CREATE TABLE KS.products (
id int PRIMARY KEY,
created timestamp,
name text,
updated timestamp
);
INSERT INTO products (id, name, created, updated)
VALUES (1, 'TTzV2', toTimeStamp(now()), toTimeStamp(now()));
INSERT INTO products (id, name, created, updated)
VALUES (2, 'smLL301', toTimeStamp(now()), toTimeStamp(now()));
SELECT * FROM KS.products;
id | created | name | updated
----+---------------------------------+---------+---------------------------------
1 | 2018-09-28 17:22:38.502000+0000 | TTzV2 | 2018-09-28 17:22:38.502000+0000
2 | 2018-09-28 17:22:39.180000+0000 | smLL301 | 2018-09-28 17:22:39.180000+0000
Once a product already exists, your API code needs to issue a different statement that leaves out created, but with the following CQL you get what you expected:
INSERT INTO products (id, updated) VALUES (2, toTimeStamp(now()));
INSERT INTO products (id, name, created, updated)
VALUES (3, 'Gt3X', toTimeStamp(now()), toTimeStamp(now()));
SELECT * FROM KS.products;
id | created | name | updated
----+---------------------------------+---------+---------------------------------
1 | 2018-09-28 17:22:38.502000+0000 | TTzV2 | 2018-09-28 17:22:38.502000+0000
2 | 2018-09-28 17:22:39.180000+0000 | smLL301 | 2018-09-28 17:25:38.208000+0000
3 | 2018-09-28 17:25:38.774000+0000 | Gt3X | 2018-09-28 17:25:38.774000+0000
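Why `created` survives for product 2 can be seen in a toy model: an INSERT writes only the columns it names and leaves the other columns of an existing row untouched. The Python below is an in-memory illustration of that behavior, not Cassandra or driver code; `table` and `upsert` are hypothetical names, and the timestamps are shortened to the time of day from the output above.

```python
table = {}  # stand-in for products: id -> {column: value}


def upsert(id_, **columns):
    """Model of a CQL INSERT: write only the named columns,
    keeping whatever the row already holds in the others."""
    table.setdefault(id_, {}).update(columns)


# First API call: both products are new, so created and updated are written.
upsert(1, name="TTzV2", created="17:22:38", updated="17:22:38")
upsert(2, name="smLL301", created="17:22:39", updated="17:22:39")

# Second API call: product 2 already exists, so only updated is written.
upsert(2, updated="17:25:38")
upsert(3, name="Gt3X", created="17:25:38", updated="17:25:38")

for id_, row in sorted(table.items()):
    print(id_, row["created"], row["name"], row["updated"])
```

The application still has to decide which of the two statement shapes to send for a given product; the model only shows that omitting `created` does not clear it.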

follower/following in cassandra

We are designing a Twitter-like follower/following feature in Cassandra, and found something similar
here: https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376/13-Data_Model_simplified_13
So I think ItemLike is a table?
Is itemid1=>(userid1, userid2...) a row in the table?
What would the CREATE TABLE statement for this ItemLike table look like?
Yes, ItemLike is a table.
The schema of the ItemLike table would look like this:
CREATE TABLE itemlike(
itemid bigint,
userid bigint,
timeuuid timeuuid,
PRIMARY KEY(itemid, userid)
);
The picture in the slide shows the internal structure of the above table.
Let's insert some data:
itemid | userid | timeuuid
--------+--------+--------------------------------------
2 | 100 | f172e3c0-67a6-11e7-8e08-371a840aa4bb
2 | 103 | eaf31240-67a6-11e7-8e08-371a840aa4bb
1 | 100 | d92f7e90-67a6-11e7-8e08-371a840aa4bb
Internally, Cassandra will store the data like this:
--------------------------------------------------------------------------------------|
| | 100:timeuuid | 103:timeuuid |
| +---------------------------------------+----------------------------------------|
|2 | f172e3c0-67a6-11e7-8e08-371a840aa4bb | eaf31240-67a6-11e7-8e08-371a840aa4bb |
--------------------------------------------------------------------------------------|
---------------------------------------------|
| | 100:timeuuid |
| +---------------------------------------|
|1 | d92f7e90-67a6-11e7-8e08-371a840aa4bb |
---------------------------------------------|

Range query - Data modeling for time series in CQL Cassandra

I have a table like this:
CREATE TABLE test ( partitionkey text, rowkey text, date
timestamp, policyid text, policyname text, primary key
(partitionkey, rowkey));
with some data:
partitionkey | rowkey | policyid | policyname | date
p1 | r1 | pl1 | plicy1 | 2007-01-02 00:00:00+0000
p1 | r2 | pl2 | plicy2 | 2007-01-03 00:00:00+0000
p2 | r3 | pl3 | plicy3 | 2008-01-03 00:00:00+0000
I want to be able to find:
1/ data from a particular partition key
2/ data from a particular partition key & rowkey
3/ Range query on date given a partitionkey
1/ and 2/ are trivial:
select * from test where partitionkey='p1';
partitionkey | rowkey | policyid | policyname | date
p1 | r1 | pl1 | plicy1 | 2007-01-02 00:00:00+0000
p1 | r2 | pl2 | plicy2 | 2007-01-03 00:00:00+0000
but what about 3/?
Even with an index it doesn't work:
create index i1 on test (date);
select * from test where partitionkey='p1' and date =
'2007-01-02';
partitionkey | rowkey | policyid | policyname | date
p1 | r1 | pl1 | plicy1 | 2007-01-02 00:00:00+0000
but
select * from test where partitionkey='p1' and
date > '2007-01-02';
Bad Request: No indexed columns present in
by-columns clause with Equal operator
Any idea?
thanks,
Matt
CREATE TABLE test ( partitionkey text, rowkey text, date timestamp,
policyid text, policyname text, primary key (partitionkey, rowkey));
First of all, you really should use more descriptive column names instead of partitionkey and rowkey (and even date, for that matter). By looking at those column names, I really can't tell what kind of data this table is supposed to be indexed by.
select * from test where partitionkey='p1' and date > '2007-01-02';
Bad Request: No indexed columns present in by-columns clause with Equal operator
As for this issue, try making your "date" column a part of your primary key.
primary key (partitionkey, rowkey, date)
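Once you do that, I think your date range queries will work, as long as rowkey is also restricted by equality in the WHERE clause. To run a range on date with only the partition key, date has to come before rowkey in the clustering order: primary key (partitionkey, date, rowkey).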
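To see why a clustering column supports range queries, it can help to picture a partition as a list of rows kept sorted by the clustering key. The Python sketch below is a model only, not driver code, and it assumes date is the first clustering column, i.e. a key of (partitionkey, date, rowkey), which is a different column order from the one shown above. With that ordering, a range on date is a single contiguous slice; `rows_after` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import datetime

# Rows of partition 'p1', kept sorted by the assumed clustering key (date, rowkey).
partition_p1 = [
    (datetime(2007, 1, 2), "r1", "pl1", "plicy1"),
    (datetime(2007, 1, 3), "r2", "pl2", "plicy2"),
]


def rows_after(partition, lower_bound):
    """Return the rows whose date is strictly greater than lower_bound.

    Because the rows are sorted by date, the result is one contiguous
    slice that a binary search can locate.
    """
    dates = [row[0] for row in partition]
    return partition[bisect_right(dates, lower_bound):]


for row in rows_after(partition_p1, datetime(2007, 1, 2)):
    print(row[1], row[2], row[3], row[0].date())
```

If rowkey came first in the sort order instead, rows matching a date range would be scattered through the partition, which is why the order of the clustering columns matters for this query.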
For more information on this, check out DataStax Academy's (free) course called Java Development With Apache Cassandra. Session 5, Module 104 discusses how to model time series data and that should help you out.

Resources