Cassandra cannot override column to old value

I tried to update a record in Cassandra using CQL and noticed that, for some reason, I cannot change a column back to its old value. Here are the steps I performed.
Step 1: insert a brand new record with column token set to value1
insert into instrucment(instrument_id, account_id, token) values('CDX-IT-359512FD43D3', 'CDX-IT-970A44E2DAF4','value1') USING TIMESTAMP 1605546853130000
Step 2: update the record to set column token to value2
insert into instrucment(instrument_id, token) values('CDX-IT-359512FD43D3', 'value2') USING TIMESTAMP 1605546853130000
Step 3: update the record to set column token back to value1
insert into instrucment(instrument_id, token) values('CDX-IT-359512FD43D3', 'value1') USING TIMESTAMP 1605546853130000
Steps 1 and 2 worked fine, but step 3 failed: the DB record shows that column token is still value2. Why is that? Is it because Cassandra thinks value1 + timestamp 1605546853130000 is an old record and thus won't update it?

You are updating the same row (same partition key) with different values.
Cassandra normally determines the valid record for a row by timestamp. The record with the most recent timestamp 'wins'.
See here for more information on how updates work:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlWriteUpdate.html
Since you are inserting with the same explicit timestamp, you are simulating concurrent writes to the same row, writes that carry the exact same write timestamp. If you do not set the timestamp explicitly for your inserts, such a collision is very unlikely.
In such truly concurrent cases Cassandra needs another method to determine the 'winner'. Cassandra breaks the timestamp tie by comparing the cell values byte-wise in a deterministic manner, and the lexically greater value wins. In your case, that is the record with value2.
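A minimal sketch of one way around this, assuming you simply re-issue the write with a strictly greater timestamp (the value below is just the original timestamp plus one microsecond), so that value1 wins on timestamp instead of losing the tie-break:
-- re-write value1 with a later timestamp so it wins by timestamp, not by value comparison
insert into instrucment(instrument_id, token) values('CDX-IT-359512FD43D3', 'value1') USING TIMESTAMP 1605546853130001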

Related

YCQL Secondary indexes on tables with TTL in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
I have a table with TTL and a secondary index, using YugabyteDB 2.9.0 and I’m getting the following error when I try to insert a row:
SyntaxException: Feature Not Supported
Below is my schema:
CREATE TABLE lists.list_table (
item_value text,
list_id uuid,
created_at timestamp,
updated_at timestamp,
is_deleted boolean,
valid_from timestamp,
valid_till timestamp,
metadata jsonb,
PRIMARY KEY ((item_value, list_id))
) WITH default_time_to_live = 0
AND transactions = {'enabled': 'true'};
CREATE INDEX list_created_at_idx ON lists.list_table (list_id, created_at)
WITH transactions = {'enabled': 'true'};
We have two types of queries (80% & 20% distribution):
select * from list_table where list_id= <id> and item_value = <value>
select * from list_table where list_id= <id> and created_at>= <created_at>
We expect around 1,000-10,000 entries per list_id.
The TTL would be around 1 month.
This is a current restriction: it is not yet supported to transactionally expire rows using TTL out of a table that is indexed (i.e., atomic expiry of the TTL'd entries in both the table and the index). There are several workarounds:
a) In YCQL, we also support an index with a weaker consistency. This is not well documented today, but you can see the details here: https://github.com/YugaByte/yugabyte-db/issues/1696
The main caveat to call out when using this variant of index is error handling: on INSERT failure, it is the application's responsibility to retry the INSERT. As noted in the above issue: "If an insert/update or batch of such operations fails, it is the app's responsibility to retry the operation so that the index is consistent. Much like in a 2-table case, it would have been the app's responsibility to retry (in case of a failure between the updates to the two tables) to make sure both tables are in sync again."
This type of index supports a TTL at both the table and index level (it is recommended to keep the two the same): https://github.com/yugabyte/yugabyte-db/issues/2481#issuecomment-537177471
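For reference, a rough YCQL sketch of what such a weaker-consistency setup could look like; the column list is trimmed for brevity, the one-month TTL (2592000 seconds) is illustrative, and the property names follow the user-enforced consistency mode discussed in the issue linked above, so double-check the exact syntax there since this is not well documented:
-- both the table and its index opt out of distributed transactions and declare
-- user-enforced consistency; the application must retry failed writes itself
CREATE TABLE lists.list_table (
    item_value text,
    list_id uuid,
    created_at timestamp,
    metadata jsonb,
    PRIMARY KEY ((item_value, list_id))
) WITH default_time_to_live = 2592000
  AND transactions = {'enabled': 'false', 'consistency_level': 'user_enforced'};
CREATE INDEX list_created_at_idx ON lists.list_table (list_id, created_at)
    WITH transactions = {'enabled': 'false', 'consistency_level': 'user_enforced'};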
b) Another workaround is to use a background cleanup job that periodically deletes stale records (instead of using TTL).
c) Avoid indexes and store the data in two tables: one organized by the original primary key and one organized by the columns you wanted to index (as its primary key). Both tables can have a TTL, but it is the application's responsibility to INSERT into both tables when data is added to the database (see the clarification and sketch further below).
d) Another workaround is to avoid the index and pick the PK to be ((list_id, item_value), created_at).
This would not affect the performance of Q1, because with both list_id and item_value provided it can use the PK to find the rows. But it would be slower for Q2, where only list_id and created_at are provided: it can still narrow the search by list_id, but it then has to filter the rows by created_at without the help of an index. So if Q2 is really 20% of your queries, you probably do not want to scan 1,000-10,000 items to find your matching rows.
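A sketch of what the workaround (d) table could look like, reusing the columns from the original schema; the one-month TTL (2592000 seconds) is illustrative and other table properties are omitted:
-- same columns as today, but created_at becomes a clustering column and rows expire via TTL
CREATE TABLE lists.list_table (
    item_value text,
    list_id uuid,
    created_at timestamp,
    updated_at timestamp,
    is_deleted boolean,
    valid_from timestamp,
    valid_till timestamp,
    metadata jsonb,
    PRIMARY KEY ((list_id, item_value), created_at)
) WITH default_time_to_live = 2592000;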
To clarify option (c), with the example in mind:
The first table's PK would be ((list_id, item_value)); it is the same as your current main table. Instead of an index you'll have a second table whose PK would be ((list_id), created_at). Both tables would have a TTL, and the application would have to insert entries into both tables (a sketch of both tables follows after the two options below). In the 2nd table you have a choice:
(option 1) Duplicate all the columns from the main table, including your JSON columns etc. This makes the Q2 lookup fast, since the row has everything it needs, but it increases your storage requirements.
(option 2) In addition to the primary key, store just the item_value column in the second table. For Q2, you must first look up the 2nd table to get the item_value, and then use list_id and item_value to retrieve the data from the main table (much like an index would do under the covers).
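A sketch of the two-table layout described above, using option 2 for the second table; the second table's name is illustrative, the column list is trimmed for brevity, and the one-month TTL (2592000 seconds) is only an example:
-- main table, same partition key as today, now with a TTL and no index
CREATE TABLE lists.list_table (
    item_value text,
    list_id uuid,
    created_at timestamp,
    metadata jsonb,
    PRIMARY KEY ((list_id, item_value))
) WITH default_time_to_live = 2592000;
-- second table replacing the index; stores just item_value alongside its PK (option 2)
CREATE TABLE lists.list_by_created_at (
    list_id uuid,
    created_at timestamp,
    item_value text,
    PRIMARY KEY ((list_id), created_at)
) WITH default_time_to_live = 2592000;
The application inserts into both tables on every write; for Q2 it reads list_by_created_at first and then fetches the full row from list_table by (list_id, item_value).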

Fetch more than 2147483647 records from Cassandra

I inherited a Cassandra database with years of data in it. I was tasked to delete all records older than 2 years. I don't know how many rows the table contains, but it is a lot.
The table structure is this:
CREATE TABLE IF NOT EXISTS my_table (
key1 bigint,
key2 text,
"timestamp" timestamp,
some more columns,
PRIMARY KEY ((key1, key2), "timestamp")
) WITH CLUSTERING ORDER BY ("timestamp" DESC);
Since key1 and key2 form the partition key, I cannot simply delete everything with a timestamp older than 2 years; this has to be done per partition key.
So I went ahead and created a small tool in Java based on the async paging pattern described in the manual: https://docs.datastax.com/en/developer/java-driver/4.11/manual/core/paging/
I do a SELECT DISTINCT key1, key2 FROM my_table;, iterate over the keys, delete the rows older than 2 years for each key, fetch the next page, and repeat.
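For reference, the per-partition delete issued for each (key1, key2) pair looks something like this (bind markers stand for the key values from the current page, and the cutoff date is purely illustrative):
-- range delete on the clustering column removes all rows in one partition older than the cutoff
DELETE FROM my_table WHERE key1 = ? AND key2 = ? AND "timestamp" < '2019-06-01 00:00:00+0000';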
After a few hours, the tool completes and reports it has modified the rows of 2147483647 partition keys. That is exactly 2^31-1, the maximum value of a signed 32-bit integer. This is probably some limit in Cassandra, because having exactly that number of keys is improbable.
My questions:
How can I fetch ALL of the table?
Is 2147483647 some (configurable) limit and why?
The other strategy would be to start a new table, use a TTL and write to both tables until two years have passed. But I would like to avoid that if I can.
I work at ScyllaDB - Scylla is a Cassandra compatible database.
There is indeed a limitation in Cassandra paging - https://issues.apache.org/jira/browse/CASSANDRA-14683 and it is not yet fixed.
What you can do is use the last token returned and continue paging from that state:
select distinct token(key1, key2), key1, key2 from my_table;
and then, when the paging ends, change the query to continue from the last returned token, for example:
select distinct token(key1, key2), key1, key2 from my_table where token(key1, key2) >= -3748018335291956378;
(you need to restart with >= since multiple key pairs may be mapped to the same token)
PS: Scylla has lifted this limitation (https://github.com/scylladb/scylla/issues/5101), so it is bound only by 2^64-1.

How to find the delta difference for a table in Cassandra using uuid column type

I have the following table in my Cassandra DB, and I want to find the delta (difference) using a Cassandra query. For example, if I perform any insert, update, or delete operation on the table, I should be able to show which row/rows were impacted as my final result.
Let's say in the first run I perform some 10 row insertions; if I then take the delta, the output should show only those 10 inserted rows. Likewise, if we modify or delete any rows, those changes should be captured.
The next time we run the query it should ideally give 0, as we have not inserted/modified/deleted any rows.
Here is the table:
CREATE TABLE datainv (
datainv_account_id uuid,
datainv_run_id uuid,
id uuid,
datainv_summary text,
json text,
number text,
PRIMARY KEY (datainv_account_id, datainv_run_id));
I have searched a lot on the internet, but most of the solutions are based on timeuuid; in this case I have uuid columns only, so I'm not finding a way to achieve the same use case with uuid.
It's not so easy to generate a diff between two table states in Cassandra, because you can't easily detect whether you have inserted new partitions or not. You can implement something based on a timeuuid or a timestamp as a clustering column (a sketch of this idea follows below); in that case you'll be able to filter the data since the latest change, because you get an ordering of values that you don't have with uuid, which is completely random. But it still requires a full scan of the table, and it won't detect deletions...
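A minimal sketch of that idea, assuming a hypothetical companion change-log table (the table and column names below are illustrative, not part of the original schema) that the application writes to alongside every mutation:
-- change-log table ordered by a timeuuid clustering column (timeuuids sort by time, unlike plain uuids)
CREATE TABLE datainv_changes (
    datainv_account_id uuid,
    change_id timeuuid,
    datainv_run_id uuid,
    id uuid,
    change_type text,
    PRIMARY KEY (datainv_account_id, change_id)
) WITH CLUSTERING ORDER BY (change_id DESC);
-- fetch everything that changed in a partition since the last check
SELECT * FROM datainv_changes WHERE datainv_account_id = ? AND change_id > maxTimeuuid('2021-01-01 00:00:00+0000');
Note this only captures changes the application explicitly records; the main datainv table itself stays unchanged.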
Theoretically you can implement this with Spark as follows:
read all primary key values & store this data in some other table/on disk;
next time, read all primary key values & find the difference between the original set of primary keys & the new set - for example, do a full outer join & treat the presence of None on the left as an addition, and the presence of None on the right as a deletion;
store the new set of primary keys in a separate table/on disk; the previous version should be truncated.
But this will consume quite a lot of resources.

Data loss in Cassandra because of frequent delete and insert of the same column in a row

I have a column family posts which is used to store post details for my Facebook account. I am using Cassandra 2.0.9 and DataStax Java driver 3.0.
CREATE TABLE posts (
key blob,
column1 text,
value blob,
PRIMARY KEY ((key), column1)
) WITH COMPACT STORAGE;
Here the row key is my userid, the column key is the postid, and the value is the post JSON. Whenever I refresh my application in the browser, it fetches data from Facebook and removes and re-adds data for the existing postids. Sometimes I miss some posts in Cassandra. Can frequent delete and insert of the same column in a row cause data loss? How can I manage this?
It's not really data loss; if you're updating the same column at a very high frequency (like thousands of updates/sec) you may get unpredictable results.
Why? Because Cassandra uses the insert timestamp to determine, at read time, which value is the right one, by comparing the timestamps of the same column from different replicas.
Currently, the resolution of the timestamp is on the order of milliseconds, so if your update rate is very high, for example 2 updates of the same column in the same millisecond, the bigger post JSON will win.
By bigger, I mean by using postJson1.compareTo(postJson2). The ordering is determined by the type of your column; in your case it's a String, so Cassandra breaks the tie by comparing the post JSON data lexicographically. Also note that when a delete and an insert end up with the same timestamp, the delete (tombstone) wins, which is why posts can appear to go missing when you remove and re-add them quickly.
To avoid this, you can provide the write timestamp on the client side, for example by generating a unique timeuuid yourself.
There are many alternatives for generating such a TimeUUID, for example the Java driver class com.datastax.driver.core.utils.UUIDs.timeBased()
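As a hedged illustration of the client-side write-timestamp route in plain CQL (the blob literals and the timestamp value are purely illustrative; the application would supply a strictly increasing microsecond value):
-- the client supplies a strictly increasing write timestamp (microseconds), so two writes never tie
INSERT INTO posts (key, column1, value) VALUES (0x0011, 'post-123', 0xCAFE) USING TIMESTAMP 1605546853130001;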

Is this a correct choice for partition and row key in Azure table?

I have a table that stores the online status of a user.
Columns: (userid, IsOnline, date)
If the user is online, the IsOnline bool flag is true; if they go offline, it is false. This way I can see between which times the user was online.
Would it be fine choosing PartitionKey: userId and RowKey: the date and time of the event?
The user cannot go on and off at the same time, so the RowKey should be unique. What I like about this is that it keeps all data for a user in the same partition. Also, does choosing the date as the RowKey make sorting more efficient?
UserId is a good choice for the PartitionKey.
With respect to the RowKey, I would suggest using "DateTime.MaxValue.Ticks - dateOfEvent.Ticks" formatted to the max number of digits.
This will keep your RowKeys in descending order and thus allow you to pick the latest status of the user without reading the whole partition.
Sounds reasonable to me. This groups all of a given user's actions together in a single partition. Each action is then delineated by an individual row with the Timestamp for the key.
You might want to keep in mind that every row in Azure Table Storage has a Timestamp column that is populated automatically on create/update. You could consider using this column for your Timestamp but searching/sorting will be slow since it is part of the tertiary data set associated with a Table Storage row.

Resources