I want to know how to apply TTL at the column level.
The query below sets the TTL at the record level:
INSERT INTO excelsior.clicks (
userid, url, date, name)
VALUES
(
3715e600-2eb0-11e2-81c1-0800200c9a66,
'http://apache.org',
'2013-10-09', 'Mary'
)
USING TTL 86400;
whereas my requirement is to set a TTL for a particular column. Is there any way to achieve this?
You can do an INSERT with partial data:
cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use test;
cqlsh:test> create table test(userid uuid, url text, date text, name text, primary key(userid));
cqlsh:test>
cqlsh:test> insert into test(userid, url, date, name) VALUES
... (
... 3715e600-2eb0-11e2-81c1-0800200c9a66,
... 'http://apache.org',
... '2013-10-09', 'Mary'
... )
... USING TTL 86400;
cqlsh:test>
cqlsh:test> select userid, url, TTL(url), date, TTL(date), name, TTL(name) from test;
userid | url | ttl(url) | date | ttl(date) | name | ttl(name)
--------------------------------------+-------------------+----------+------------+-----------+------+-----------
3715e600-2eb0-11e2-81c1-0800200c9a66 | http://apache.org | 86342 | 2013-10-09 | 86342 | Mary | 86342
(1 rows)
cqlsh:test> insert into test(userid, url ) VALUES (3715e600-2eb0-11e2-81c1-0800200c9a66, 'http://apache.org' ) USING TTL 864000;
cqlsh:test>
cqlsh:test> select userid, url, TTL(url), date, TTL(date), name, TTL(name) from test;
userid | url | ttl(url) | date | ttl(date) | name | ttl(name)
--------------------------------------+-------------------+----------+------------+-----------+------+-----------
3715e600-2eb0-11e2-81c1-0800200c9a66 | http://apache.org | 863992 | 2013-10-09 | 86109 | Mary | 86109
(1 rows)
cqlsh:test>
If you do an insert statement per column, you can set a TTL on each column individually.
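The same partial-write idea works with UPDATE as well: USING TTL on an UPDATE applies only to the columns named in the SET clause. A minimal sketch against the test table above (note this also rewrites the cell's value):

```
-- Set a 1-hour TTL on the name column only;
-- url and date keep whatever TTL they already have.
UPDATE test USING TTL 3600
SET name = 'Mary'
WHERE userid = 3715e600-2eb0-11e2-81c1-0800200c9a66;
```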
Related
I have the below table structure, which houses failed records.
CREATE TABLE if not exists dummy_plan (
id uuid,
payload varchar,
status varchar,
bucket text,
create_date timestamp,
modified_date timestamp,
primary key ((bucket), create_date, id))
WITH CLUSTERING ORDER BY (create_date ASC)
AND COMPACTION = {'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 1};
My table looks like below
| id | payload | status | bucket | create_date | modified_date |
| abc| text1 | Start | 2021-02-15 | 2021-02-15 08:07:50+0000 | |
The table and records are created and inserted successfully. However, after processing, we want to update the record (if it failed) or delete it (if it succeeded) based on its id.
But I am facing a problem with the timestamp: I tried giving the same value, but it still doesn't delete/update.
It seems Cassandra doesn't work with EQ on timestamps.
Please guide.
Thank you in advance.
Cassandra works just fine with timestamp columns - you can use the equality operator on them. But you need to make sure that you include the milliseconds in the value, otherwise it won't match:
cqlsh> insert into test.dummy_service_plan_contract (id, create_date, bucket)
values (1, '2021-02-15T11:00:00.123Z', '123');
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+---------------------------------+----+---------------+---------+--------
123 | 2021-02-15 11:00:00.123000+0000 | 1 | null | null | null
(1 rows)
cqlsh> delete from test.dummy_service_plan_contract where bucket = '123' and
id = 1 and create_date = '2021-02-15T11:00:00Z';
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+---------------------------------+----+---------------+---------+--------
123 | 2021-02-15 11:00:00.123000+0000 | 1 | null | null | null
(1 rows)
cqlsh> delete from test.dummy_service_plan_contract where bucket = '123' and
id = 1 and create_date = '2021-02-15T11:00:00.123Z';
cqlsh> select * from test.dummy_service_plan_contract;
bucket | create_date | id | modified_date | payload | status
--------+-------------+----+---------------+---------+--------
(0 rows)
If you don't see the milliseconds in your output in cqlsh, then you need to configure the datetimeformat setting in .cqlshrc.
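For reference, a minimal .cqlshrc fragment that makes cqlsh print sub-second precision (the exact format string is an assumption; adjust to taste):

```
[ui]
; strftime-style format; %f adds the microseconds
datetimeformat = %Y-%m-%d %H:%M:%S.%f%z
```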
I developed a table, as shown below, with primary key id, which is of uuid type:
id | date | eventtype | log | password | priority | sessionid | sourceip | user | useragent
--------------------------------------+--------------------------+--------------+----------+----------+----------+-----------+--------------+------------+------------
6b47e9b0-d11a-11e8-883c-5153f134200b | null | LoginSuccess | demolog | 1234 | 10 | Demo_1 | 123.12.11.11 | Aqib | demoagent
819a58d0-cd3f-11e8-883c-5153f134200b | null | LoginSuccess | demolog | 1234 | 10 | Demo_1 | 123.12.11.11 | Aqib | demoagent
f4fae220-d133-11e8-883c-5153f134200b | 2018-10-01 04:01:00+0000 | LoginSuccess | demolog | 1234 | 10 | Demo_1 | 123.12.11.11 | Aqib | demoagent
But when I try to query something like below:
select * from loginevents where eventtype='LoginSuccess';
I get an error like below:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Predicates on non-primary-key columns (eventtype) are not yet supported for non secondary index queries"
This is my table
cqlsh:events> describe loginevents;
CREATE TABLE events.loginevents (
id uuid PRIMARY KEY,
date timestamp,
eventtype text,
log text,
password text,
priority int,
sessionid text,
sourceip text,
user text,
useragent text
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
How can I solve this?
An immediate answer to your question would be to create a secondary index on the column eventtype like this:
CREATE INDEX my_index ON events.loginevents (eventtype);
Then you can filter on this particular column:
SELECT * FROM loginevents WHERE eventtype='LoginSuccess';
However, this solution can badly impact the performance of your cluster.
If you come from the SQL world and are new to Cassandra, go read an introduction to Cassandra data modeling, like this one.
The first step is to identify the queries, then create the tables accordingly.
In Cassandra, data is distributed across the cluster according to the partition key, so reading records that belong to the same partition is very fast.
In your case, a good start would be to group your records by eventtype:
CREATE TABLE events.loginevents (
id uuid,
date timestamp,
eventtype text,
log text,
password text,
priority int,
sessionid text,
sourceip text,
user text,
useragent text,
PRIMARY KEY (eventtype, id)
);
Then you can do a select like this:
SELECT * FROM loginevents WHERE eventtype='LoginSuccess';
or even:
SELECT * FROM loginevents WHERE eventtype in ('LoginSuccess', 'LoginFailure');
(It's not a perfect model, it definitely needs to be improved before production.)
In Cassandra, you can only query on the primary key and some of the clustering columns; it's not possible to query on arbitrary fields.
If you want to query on eventtype, you should either define a secondary index on the table, or index the table with Apache Solr and query using Solr. Something like below:
CREATE INDEX loginevents_type
ON events.loginevents (eventtype);
I am learning Cassandra through its documentation. Now I'm learning about batches and static fields.
In their example at the end of the page, they somehow managed to make balance have two different values (-200, -208) even though it's a static field.
Could someone explain to me how this is possible? I've read the whole page but I did not catch on.
In Cassandra, a static field is static within a partition.
Example: Let's define a table:
CREATE TABLE static_test (
pk int,
ck int,
d int,
s int static,
PRIMARY KEY (pk, ck)
);
Here pk is the partition key and ck is the clustering key.
Let's insert some data:
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 1, 10, 100, 1000);
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 2, 20, 200, 2000);
If we select the data:
pk | ck | s | d
----+----+------+-----
1 | 10 | 1000 | 100
2 | 20 | 2000 | 200
Here, for partition key pk = 1 the static field s value is 1000, and for partition key pk = 2 the static field s value is 2000.
If we insert/update the static field s value for partition key pk = 1:
INSERT INTO static_test (pk , ck , d , s ) VALUES ( 1, 11, 101, 1001);
then the static field s value changes for all rows of partition key pk = 1:
pk | ck | s | d
----+----+------+-----
1 | 10 | 1001 | 100
1 | 11 | 1001 | 101
2 | 20 | 2000 | 200
In a table that uses clustering columns, non-clustering columns can be declared static in the table definition. Static columns are only static within a given partition.
Example:
CREATE TABLE test (
partition_column text,
static_column text STATIC,
clustering_column int,
PRIMARY KEY (partition_column , clustering_column)
);
INSERT INTO test (partition_column, static_column, clustering_column) VALUES ('key1', 'A', 0);
INSERT INTO test (partition_column, clustering_column) VALUES ('key1', 1);
SELECT * FROM test;
Results:
partition_column | clustering_column | static_column
----------------+-------------------+--------------
key1 | 0 | A
key1 | 1 | A
Observation:
Once declared static, the column shares its value across all rows of the given partition.
Now, let's insert another record:
INSERT INTO test (partition_column, static_column, clustering_column) VALUES ('key1', 'C', 2);
SELECT * FROM test;
Results:
partition_column | clustering_column | static_column
----------------+-------------------+--------------
key1 | 0 | C
key1 | 1 | C
key1 | 2 | C
Observation:
If you update the static column, or insert another record with an updated static column value, the value is reflected across all rows of the partition ==> static column values are static (constant) across a given partition.
Restriction (from the DataStax reference documentation below):
A table that does not define any clustering columns cannot have a static column. The table having no clustering columns has a one-row partition in which every column is inherently static.
A table defined with the COMPACT STORAGE directive cannot have a static column.
A column designated to be the partition key cannot be static.
Reference : DataStax Reference
In the example on the page you've linked they don't have different values at the same point in time.
They first have the static balance field set to -208 for the whole user1 partition:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | -208 | 8 | burrito | False
user1 | 2 | -208 | 200 | hotel room | False
Then they apply a batch update statement that sets the balance value to -200:
BEGIN BATCH
UPDATE purchases SET balance=-200 WHERE user='user1' IF balance=-208;
UPDATE purchases SET paid=true WHERE user='user1' AND expense_id=1 IF paid=false;
APPLY BATCH;
This updates the balance field for the whole user1 partition to -200:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | -200 | 8 | burrito | True
user1 | 2 | -200 | 200 | hotel room | False
The point of a static field is that you can update/change its value for the whole partition at once. So if I were to execute the following statement:
UPDATE purchases SET balance=42 WHERE user='user1'
I would get the following result:
user | expense_id | balance | amount | description | paid
-------+------------+---------+--------+-------------+-------
user1 | 1 | 42 | 8 | burrito | True
user1 | 2 | 42 | 200 | hotel room | False
I am very new to Cassandra, so this might sound like a newbie question.
I am running cqlsh 5.0.1 | Cassandra 2.1.4 locally.
I have a table like below:
CREATE TABLE master (
id uuid,
creation timestamp,
event_type text,
name text,
PRIMARY KEY(id,creation)
);
...and the records are:
id | creation | event_type | name
--------------------------------------+--------------------------+------------+------------------
305abd6d-34b8-4f36-96c6-9ea0c11be952 | 2015-04-15 14:01:54-0400 | create | test2
305abd6d-34b8-4f36-96c6-9ea0c11be952 | 2015-04-15 14:03:03-0400 | update | test2 update
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:01:54-0400 | create | test3
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:03:44-0400 | update | test3 update
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:04:34-0400 | update | test3 2nd update
bf42a120-dec1-47d8-bde2-c0d76f1c93a5 | 2015-04-15 14:01:54-0400 | create | test1
How can I select all the records with distinct ids and the last modified timestamp?
The result should look like:
305abd6d-34b8-4f36-96c6-9ea0c11be952 | 2015-04-15 14:03:03-0400 | update | test2 update
7440c51c-6441-44fb-833b-6140fbe822eb | 2015-04-15 14:04:34-0400 | update | test3 2nd update
bf42a120-dec1-47d8-bde2-c0d76f1c93a5 | 2015-04-15 14:01:54-0400 | create | test1
Given your current structure, you won't be able to select any other columns aside from id with a DISTINCT query. You can create another query table with just id as the PK, then run a basic SELECT on that (since each insert for a given id overwrites the previous row, it will always hold the last modified date):
CREATE TABLE querytable (
id uuid,
creation timestamp,
event_type text,
name text,
PRIMARY KEY(id)
);
SELECT * FROM querytable; -- should only contain unique ids and the last updated creation date
You'll have to update this table as you update the master as well.
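One way to keep the two tables in sync (a sketch, assuming both tables exist as defined above) is to write every change to both of them in a single logged batch:

```
BEGIN BATCH
    INSERT INTO master (id, creation, event_type, name)
    VALUES (305abd6d-34b8-4f36-96c6-9ea0c11be952, '2015-04-15 14:05:00-0400', 'update', 'test2 2nd update');
    -- querytable's primary key is just id, so this insert overwrites
    -- the previous row and always holds the latest creation date
    INSERT INTO querytable (id, creation, event_type, name)
    VALUES (305abd6d-34b8-4f36-96c6-9ea0c11be952, '2015-04-15 14:05:00-0400', 'update', 'test2 2nd update');
APPLY BATCH;
```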
I have a table like this:
CREATE TABLE test (
    partitionkey text,
    rowkey text,
    date timestamp,
    policyid text,
    policyname text,
    PRIMARY KEY (partitionkey, rowkey)
);
with some data:
partitionkey | rowkey | policyid | policyname | date
p1 | r1 | pl1 | plicy1 | 2007-01-02 00:00:00+0000
p1 | r2 | pl2 | plicy2 | 2007-01-03 00:00:00+0000
p2 | r3 | pl3 | plicy3 | 2008-01-03 00:00:00+0000
I want to be able to find:
1/ data from a particular partition key
2/ data from a particular partition key & rowkey
3/ Range query on date given a partitionkey
1/ and 2/ are trivial:
select * from test where partitionkey='p1';
partitionkey | rowkey | policyid | policyname | date
p1 | r1 | pl1 | plicy1 | 2007-01-02 00:00:00+0000
p1 | r2 | pl2 | plicy2 | 2007-01-03 00:00:00+0000
but what about 3/?
Even with an index, it doesn't work:
create index i1 on test (date);
select * from test where partitionkey='p1' and date = '2007-01-02';
partitionkey | rowkey | policyid | policyname | date
p1 | r1 | pl1 | plicy1 | 2007-01-02 00:00:00+0000
but
select * from test where partitionkey='p1' and date > '2007-01-02';
Bad Request: No indexed columns present in by-columns clause with Equal operator
Any idea?
thanks,
Matt
CREATE TABLE test ( partitionkey text, rowkey text, date timestamp,
policyid text, policyname text, primary key (partitionkey, rowkey));
First of all, you really should use more descriptive column names instead of partitionkey and rowkey (and even date, for that matter). By looking at those column names, I really can't tell what kind of data this table is supposed to be indexed by.
select * from test where partitionkey='p1' and date > '2007-01-02';
Bad Request: No indexed columns present in by-columns clause with Equal operator
As for this issue, try making your "date" column a part of your primary key. Note that for a range query with only partitionkey restricted, date must come before rowkey in the clustering order:
primary key (partitionkey, date, rowkey)
Once you do that, I think your date range queries will function appropriately.
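A sketch of the adjusted table (the name test2 is made up, to avoid clashing with the existing table), with date placed before rowkey in the clustering order so the range query in 3/ only needs the partition key:

```
CREATE TABLE test2 (
    partitionkey text,
    rowkey text,
    date timestamp,
    policyid text,
    policyname text,
    PRIMARY KEY (partitionkey, date, rowkey)
);

-- valid now: date is the first clustering column
SELECT * FROM test2 WHERE partitionkey = 'p1' AND date > '2007-01-02';
```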
For more information on this, check out DataStax Academy's (free) course called Java Development With Apache Cassandra. Session 5, Module 104 discusses how to model time series data and that should help you out.