Cassandra UPDATE not working after deletion


I'm using a wide row schema in Cassandra. My table definition is as follows:
CREATE TABLE usertopics (
key text,
topic text,
score counter,
PRIMARY KEY (key, topic)
)
I'm inserting entries using:
UPDATE usertopics SET score = score + ? WHERE key=? AND topic=?
so that the row is inserted if the key does not exist and updated if it does.
I'm deleting entries using:
DELETE FROM usertopics WHERE key IN ?
But after the deletion, when I try to update again, nothing happens. There is no error, but the change is not reflected in the database either.
It inserts correctly again only after I truncate the table. I'm using the DataStax Java driver to access Cassandra. Any suggestions?

From the Cassandra documentation:
Counter removal is intrinsically limited. For instance, if you issue
very quickly the sequence "increment, remove, increment" it is
possible for the removal to be lost (if for some reason the remove
happens to be the last received messages). Hence, removal of counters
is provided for definitive removal only, that is when the deleted
counter is not increment afterwards. This holds for row deletion too:
if you delete a row of counters, incrementing any counter in that row
(that existed before the deletion) will result in an undetermined
behavior. Note that if you need to reset a counter, one option (that
is unfortunately not concurrent safe) could be to read its value and
add -value.
Once deleted, a counter with the same key cannot (or at least should not) be reused. See the links below for more information:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
https://wiki.apache.org/cassandra/Counters
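The reset workaround mentioned in the quote (read the value, then add -value) keeps the counter alive instead of deleting it. Below is a minimal in-memory sketch of that idea; `CounterTable`, `increment`, `read` and `reset_by_negation` are hypothetical names, not DataStax driver APIs. Against a real cluster the two steps would be a SELECT followed by `UPDATE usertopics SET score = score + ? ...` with the negated value.

```python
from collections import defaultdict


class CounterTable:
    """Hypothetical in-memory stand-in for a counter column keyed by (key, topic)."""

    def __init__(self):
        # Missing counters read as 0, mirroring how UPDATE ... score = score + ? upserts.
        self._scores = defaultdict(int)

    def increment(self, key, topic, delta):
        # Stands in for: UPDATE usertopics SET score = score + ? WHERE key=? AND topic=?
        self._scores[(key, topic)] += delta

    def read(self, key, topic):
        return self._scores[(key, topic)]


def reset_by_negation(table, key, topic):
    """Reset a counter to zero without deleting it: read, then add -value.

    Not concurrency safe: an increment that lands between read() and
    increment() survives the reset instead of being cleared.
    """
    current = table.read(key, topic)
    table.increment(key, topic, -current)
    return current


table = CounterTable()
table.increment("user1", "cassandra", 5)
table.increment("user1", "cassandra", 3)
reset_by_negation(table, "user1", "cassandra")
table.increment("user1", "cassandra", 2)  # the counter keeps working after the "reset"
print(table.read("user1", "cassandra"))  # 2
```

Because the counter is never deleted, later increments behave normally, which is the guarantee the delete-based approach in the question cannot give.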

Related

Does adding a new value or updating an existing value in a Cassandra map create tombstones?

I was following this DataStax page: https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertMap.html to learn how to update a map in Cassandra. But I suspect this may create unwanted tombstones in the following scenarios:
UPDATE cycling.cyclist_teams SET teams = teams + {2009 : 'DSB Bank - Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e
Will adding a new value to the map (if key 2009 did not exist in the map) create any tombstone?
UPDATE cycling.cyclist_teams SET teams = teams + {2009 : 'DSB Bank - Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
Will updating an existing value in the map (if key 2009 already existed in the map) create a tombstone for the old value, or any other kind of tombstone?

It won't create a tombstone (no delete or deliberate write of null), but it will "obsolete" the previous value.
This means that both the old and new values for 2009 will be retrieved at read time, and Cassandra will filter out all but the most recent. Also, depending on how much time has elapsed since the first write to teams, it is entirely possible that the old and new values were written to separate SSTable files, meaning that the read/reconciliation process will take longer.
So while this won't create a tombstone, it has a similar effect: a large amount of obsoleted data (from repeated in-place writes/updates to the same value) will cause read performance to degrade over time.
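A simplified model of the read path described above: each SSTable holds its own version of the `2009` map element, every version is read, and only the one with the highest write timestamp is returned. The timestamps and the `'Old Team'` value are made-up illustration data, and real Cassandra reconciliation has extra rules (tombstones, timestamp ties) that this sketch leaves out.

```python
from typing import Dict, List, Tuple

# Each map element is its own cell: map key -> (value, write timestamp in microseconds).
Cells = Dict[int, Tuple[str, int]]


def reconcile(sstables: List[Cells]) -> Dict[int, str]:
    """Merge map cells from several SSTables, keeping the newest value per map key."""
    newest: Dict[int, Tuple[str, int]] = {}
    for sstable in sstables:
        for map_key, (value, ts) in sstable.items():
            # Every version is read; older ones are only discarded during the merge.
            if map_key not in newest or ts > newest[map_key][1]:
                newest[map_key] = (value, ts)
    return {k: v for k, (v, _) in newest.items()}


older = {2009: ("Old Team", 1_000), 2010: ("Team B", 1_000)}
newer = {2009: ("DSB Bank - Nederland bloeit", 2_000)}
print(reconcile([older, newer]))  # {2009: 'DSB Bank - Nederland bloeit', 2010: 'Team B'}
```

The more SSTables hold versions of the same element, the more work each read does before the merge settles on one value, which is the slowdown the answer describes.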

It won't create a tombstone, because you are updating the collection with +. A tombstone would be created if you replaced the whole collection (the map in this case) instead, like this:
UPDATE cycling.cyclist_teams SET teams = {2009 : 'DSB Bank - Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
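A toy model of the difference between the two statements: appending with `+` writes only element cells, while assigning a whole map first deletes the previous contents, which is recorded as a tombstone. The `Mutation` representation is purely illustrative, not Cassandra's storage format; it only tracks whether a collection-level tombstone is produced.

```python
from typing import Dict, List, Tuple

# A write is modelled as a list of (kind, payload) parts.
Mutation = List[Tuple[str, object]]


def append_to_map(elements: Dict[int, str]) -> Mutation:
    """teams = teams + {...}: only the new or updated element cells are written."""
    return [("cell", item) for item in elements.items()]


def overwrite_map(elements: Dict[int, str]) -> Mutation:
    """teams = {...}: the old collection is deleted first (a tombstone),
    then the new element cells are written."""
    return [("collection_tombstone", None)] + append_to_map(elements)


def count_tombstones(mutation: Mutation) -> int:
    return sum(1 for kind, _ in mutation if kind.endswith("tombstone"))


new_entry = {2009: "DSB Bank - Nederland bloeit"}
print(count_tombstones(append_to_map(new_entry)))  # 0
print(count_tombstones(overwrite_map(new_entry)))  # 1
```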
Cassandra always writes data in append-only mode; the only difference is that in the commit log data is appended to the end of the log, while in the memtable it is written in the order of the partition key and clustering column(s). Memtable data is periodically flushed to SSTables, so your conflicting data may end up duplicated (with the conflicting values) across SSTables. In fact, all inserts are upserts unless you add conditions with lightweight transactions.
Both values will be read from a) the row cache (RAM), b) the memtable (RAM), or c) SSTables (HDD/SSD), and on conflict the value with the latest timestamp is returned to the driver. Depending on your read consistency level (always for ALL, and depending on read_repair_chance for other consistency levels), replicas holding the old values will be updated through read repair. The old (outdated) values are eventually removed from the SSTables (HDD/SSD) by compaction.
You can experiment and then retrieve table statistics to see if there are any tombstones by executing:
$CASSANDRA_HOME/bin/nodetool cfstats keyspace.table

Does a secondary index lock anything when it is being created?

Given the following table schema:
CREATE TABLE Record (
-- uuidv4
recordId STRING(36) NOT NULL,
-- uuidv4
userId STRING(36),
isActive BOOL,
lastUpdate TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true)
...
) PRIMARY KEY (recordId)
CREATE NULL_FILTERED INDEX RecordByUser
ON Record (userId, isActive)
For every record created, we make a record in the index so that we can get all of a user's records by their userId. Depending on what may be needed, there could be an extra STORING clause with additional information columns.
My understanding is that as I add records to the Record table, Spanner will trigger a write to the index. Since the index is non-interleaved the data itself may have a different locality to the original record.
Under that assumption, will that write to the secondary index lock the Record table until it is completed or does one not affect the other?
I'm going to guess they are totally independent since an index can be created after the fact and Spanner will trigger a backfill operation that does not affect the operational status of the Record table.
Writing to the index still has to consume some resources on the node(s), though, so I would imagine that is the real limitation. Under a high-write scenario for the Record table, we would effectively also be invoking a second write to the index table RecordByUser, consuming a bit more of the node(s)' write throughput capacity.
So the act of adding to a Secondary Index doesn't require any locking on the source table (Record in this case). The primary concern would be the write throughput and any hotspots from those writes. For example, if we indexed on a timestamp as the first part of the index, the writes to the index would bunch up. Is my understanding here correct?
During the act of creating the index on an existing table, does the backfill process hold an exclusive lock on the index, like Postgres for example:
https://www.postgresql.org/docs/current/index-locking.html
Or can new writes land in the index during the secondary index creation while backfill is taking place?
I can imagine a backfill process on Spanner's end that takes a read snapshot and starts writing. Given Spanner's fancy clocks, if it encounters a row in the index that is newer than the row it is attempting to write, it just drops the old row and carries on.

Thanks for the question. Google engineer here to help.
+1 to chainicko's answer for the general locking mechanism. The table is not "locked" in the sense that you can still read from and write to the original table while the backfill is running.
Reads/queries against the index itself are not allowed during the backfill, but writes to the original table are. New writes are added to the index concurrently, and after the backfill Spanner makes sure only the latest data is presented when queried.
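The backfill the question imagines (take a read snapshot, write it into the index, and drop any entry older than what concurrent writes already put there) can be modelled in a few lines. This is a conceptual sketch under that assumption only, not how Spanner actually implements index backfill; `apply_if_newer`, `backfill` and the simplified index layout (record id mapped to the indexed userId) are hypothetical.

```python
from typing import Dict, Tuple

# Simplified index: record id -> (indexed value, commit timestamp).
Index = Dict[str, Tuple[str, int]]


def apply_if_newer(index: Index, row_key: str, value: str, ts: int) -> None:
    """Write an index entry unless a newer version for that row is already present."""
    current = index.get(row_key)
    if current is None or ts > current[1]:
        index[row_key] = (value, ts)


def backfill(index: Index, snapshot: Dict[str, Tuple[str, int]]) -> None:
    """Copy a read snapshot of the base table into the index.

    Concurrent writes may already have stored newer entries; for those rows
    the older snapshot version is simply dropped.
    """
    for row_key, (value, ts) in snapshot.items():
        apply_if_newer(index, row_key, value, ts)


snapshot = {"rec-1": ("user-a", 10), "rec-2": ("user-b", 10)}  # snapshot read at ts=10
index: Index = {}
apply_if_newer(index, "rec-2", "user-c", 15)  # a concurrent write during the backfill
backfill(index, snapshot)
print(index)  # {'rec-2': ('user-c', 15), 'rec-1': ('user-a', 10)}
```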
As for the example of "indexed on a timestamp as the first part of the index": since that creates a hotspot on the index, it would still have a negative impact on the system as a whole, even though it does not lock the original table.
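To illustrate the hotspot point: in a range-partitioned index, keys that keep increasing (such as a leading timestamp) all sort into the last key range, while random UUIDv4 keys spread across all ranges. The fixed split boundaries below are a hypothetical simplification; Spanner splits and rebalances data dynamically, which this sketch ignores.

```python
import bisect
import uuid
from collections import Counter


def split_for(key: int, boundaries: list) -> int:
    """Index of the contiguous key range (split) that owns `key`."""
    return bisect.bisect_right(boundaries, key)


def load_per_split(keys, boundaries) -> Counter:
    return Counter(split_for(k, boundaries) for k in keys)


# Hypothetical split points derived from historical data (milliseconds since epoch).
ts_boundaries = [1_600_000_000_000, 1_650_000_000_000, 1_690_000_000_000]
# New writes carry current timestamps, so they all sort after the last boundary.
ts_keys = [1_700_000_000_000 + i for i in range(1000)]

# Random UUIDv4 keys (as integers) spread over the whole 128-bit space.
uuid_boundaries = [2**126, 2**127, 3 * 2**126]
uuid_keys = [uuid.uuid4().int for _ in range(1000)]

print(load_per_split(ts_keys, ts_boundaries))      # Counter({3: 1000}): one hot split
print(load_per_split(uuid_keys, uuid_boundaries))  # roughly 250 writes per split
```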

Partition DELETE/INSERT concurrency issue in Cassandra

I have a table in Cassandra which stores versions of csv-files. It uses a primary key with a unique id for the version (the partition key) and a row number (the clustering key). When I insert a new version I first execute a delete statement on the partition key I am about to insert, to clean up any incomplete data. Then the data is inserted.
Now here is the issue. Even though the delete and the subsequent insert are executed synchronously one after the other in the application, it seems that some level of concurrency still exists in Cassandra, because when I read afterwards, rows from my insert are occasionally missing (something like 1 in 3 times). Here are some facts:
Cassandra 3.0
Consistency ALL (R+W)
Delete using the Java Driver
Insert using the Spark-Cassandra connector
Number of nodes: 2
Replication factor: 2
The delete statement I execute looks like this:
"DELETE FROM myTable WHERE version = 'id'"
If I omit it, the problem goes away. If I insert a delay between the delete and the insert the problem is reduced (less rows missing). Initially I used a less restrictive consistency level, and I was sure this was the issue, but it didn't affect the problem. My hypothesis is that for some reason the delete statement is being sent to the replica asynchronously despite the consistency level of ALL, but I can't see why this would be the case or how to avoid it.

By default, every mutation gets its write timestamp from the coordinator that handles it. From the docs:
TIMESTAMP: sets the timestamp for the operation. If not specified,
the coordinator will use the current time (in microseconds) at the
start of statement execution as the timestamp. This is usually a
suitable default.
http://cassandra.apache.org/doc/cql3/CQL.html
Since different mutations can have different coordinators, clock skew between coordinators can cause mutations sent through one machine to be timestamped out of order relative to mutations sent through another.
Since write timestamps determine history in C*, a driver can insert and delete synchronously, yet depending on the coordinators the delete can end up "before" the insert.
Example
Imagine two nodes A and B, B is operating with a 5 second clock skew behind A.
At time 0: You insert data to the cluster and A is chosen as the coordinator. The mutation arrives at A and A assigns a timestamp (0)
There is now a record in the cluster
INSERT VALUE AT TIME 0
Both nodes contain this message and the request returns confirming the write was successful.
At time 2: You issue a delete for the data previously inserted and B is chosen as the coordinator. B assigns a timestamp of (-3) because it is clock skewed 5 seconds behind the time in A. This means that we end up with a statement like
DELETE VALUE AT TIME -3
We acknowledge that all nodes have received this record.
Now the global consistent timeline is
DELETE VALUE AT TIME -3
INSERT VALUE AT TIME 0
Since the insertion occurs after the delete the value still exists.
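The same effect applies to the order in the question (DELETE, then INSERT): if the insert's coordinator is behind, the insert gets an older timestamp and is shadowed by the delete. The sketch below models last-write-wins with Cassandra's rule that a tombstone beats a live value on equal timestamps, and contrasts coordinator-assigned timestamps with timestamps supplied from a single client clock (for example via the TIMESTAMP option quoted above). The 5-second skew and the `Mutation`/`resolve` names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mutation:
    kind: str               # "insert" or "delete"
    value: Optional[str]
    timestamp: int          # microseconds, assigned by whoever issued the write


def resolve(mutations: List[Mutation]) -> Optional[str]:
    """Last-write-wins: the highest timestamp wins; on a tie the delete wins."""
    winner = max(mutations, key=lambda m: (m.timestamp, m.kind == "delete"))
    return None if winner.kind == "delete" else winner.value


SKEW_US = 5_000_000  # coordinator B runs 5 seconds behind coordinator A

# The question's order: DELETE at t=10s via A, then INSERT at t=11s via B.
coordinator_ts = [
    Mutation("delete", None, 10_000_000),
    Mutation("insert", "row data", 11_000_000 - SKEW_US),
]
# Same operations, but both timestamps come from one client clock.
client_ts = [
    Mutation("delete", None, 10_000_000),
    Mutation("insert", "row data", 11_000_000),
]

print(resolve(coordinator_ts))  # None: the insert is shadowed by the delete
print(resolve(client_ts))       # row data
```

Client-side timestamps only help if the writes share one clock; in the question the delete and the insert come from two different clients (the Java driver and the Spark connector), so their clocks would also need to agree.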

I had a similar problem and fixed it by using lightweight transactions (LWT) for both INSERT and DELETE requests (actually for all queries, including UPDATE). This makes sure all queries to this partition are serialized, as if through one "thread", so a DELETE won't overwrite an INSERT. For example (assuming instance_id is the primary key):
INSERT INTO myTable (instance_id, instance_version, data) VALUES ('myinstance', 0, 'some-data') IF NOT EXISTS;
UPDATE myTable SET instance_version=1, data='some-updated-data' WHERE instance_id='myinstance' IF instance_version=0;
UPDATE myTable SET instance_version=2, data='again-some-updated-data' WHERE instance_id='myinstance' IF instance_version=1;
DELETE FROM myTable WHERE instance_id='myinstance' IF instance_version=2;
//or:
DELETE FROM myTable WHERE instance_id='myinstance' IF EXISTS;
The IF clauses turn each statement into a lightweight transaction, so all of them are serialized. Warning: LWTs are more expensive than normal queries, but sometimes they are needed, as in the case of this concurrency problem.
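The version-column pattern above is a compare-and-set: each statement applies only if the row still looks the way the client expects. Here is a hypothetical single-partition model of those conditional writes; `LwtTable` and its methods are illustrative names, and each call simply checks and applies in one step, standing in for the atomicity Cassandra provides for LWTs through Paxos.

```python
from typing import Dict

Row = Dict[str, object]


class LwtTable:
    """Hypothetical model of conditional (IF ...) writes on one partition."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}

    def insert_if_not_exists(self, key: str, row: Row) -> bool:
        if key in self.rows:
            return False  # like [applied] = false
        self.rows[key] = dict(row)
        return True

    def update_if_version(self, key: str, expected: int, changes: Row) -> bool:
        row = self.rows.get(key)
        if row is None or row["instance_version"] != expected:
            return False  # someone else changed (or deleted) the row first
        row.update(changes)
        return True

    def delete_if_version(self, key: str, expected: int) -> bool:
        row = self.rows.get(key)
        if row is None or row["instance_version"] != expected:
            return False
        del self.rows[key]
        return True


t = LwtTable()
print(t.insert_if_not_exists("myinstance", {"instance_version": 0, "data": "some-data"}))  # True
print(t.update_if_version("myinstance", 0,
                          {"instance_version": 1, "data": "some-updated-data"}))  # True
print(t.update_if_version("myinstance", 0, {"data": "stale-write"}))  # False
print(t.delete_if_version("myinstance", 1))  # True
print(t.rows)  # {}
```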

Supporting logical delete for an existing feed table

I would like to implement logical delete for a news-feed record to support a later undo.
The system is in production, so any solution should support existing data.
Inserting records into the feed is idempotent, so inserting an already deleted record (with the same primary key) should not undelete it.
Any solution should support the queries to retrieve a page of existing or deleted records.
The feed table:
CREATE TABLE my_feed (
tenant_id int,
item_id int,
created_at timestamp,
feed_data text,
PRIMARY KEY (tenant_id, created_at, item_id) )
WITH compression = { 'sstable_compression' : 'LZ4Compressor' }
AND CLUSTERING ORDER BY (created_at DESC);
There are two approaches I have thought of but both have serious disadvantages:
1. Move deleted records to a different table. Queries are trivial and no migration is required, but idempotent inserts seem to be difficult (only read before insert?).
2. Add an is_deleted column. Create a secondary index for that column to support the queries. Idempotent inserts seem to be easier to support (lightweight transactions or an update trick).
The main disadvantage is that older records have null value, thus it requires data migration.
Is there a third more elegant approach? Do you support one of the above suggestions?

If you maintain a separate table for deleted records, you can use CQL's BATCH construct to perform your "move" operation, but since the only record of deletion is in that table, you must check it first if you want the behavior you've described around not re-animating deleted records. Reading before writing is usually an anti-pattern, etc.
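A sketch of the separate-table approach, showing exactly where the read-before-write sits. It uses plain dictionaries instead of CQL; `FeedStore`, the `my_feed_deleted` table it stands for, and the key layout are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[int, str, int]  # hypothetical (tenant_id, created_at, item_id) primary key


class FeedStore:
    """In-memory stand-in for my_feed plus a hypothetical my_feed_deleted table."""

    def __init__(self) -> None:
        self.feed: Dict[Key, str] = {}
        self.deleted: Dict[Key, str] = {}

    def delete(self, key: Key) -> None:
        # In CQL this "move" would be one BATCH: INSERT into the deleted table + DELETE from my_feed.
        if key in self.feed:
            self.deleted[key] = self.feed.pop(key)

    def undo(self, key: Key) -> None:
        if key in self.deleted:
            self.feed[key] = self.deleted.pop(key)

    def insert(self, key: Key, data: str) -> bool:
        # The read-before-write: without this check, a replayed insert
        # would silently bring a deleted record back to life.
        if key in self.deleted:
            return False
        self.feed[key] = data
        return True


store = FeedStore()
key = (1, "2024-01-01T00:00:00Z", 42)
store.insert(key, "hello")
store.delete(key)
print(store.insert(key, "hello"))  # False: the replayed insert is refused
store.undo(key)
print(store.feed[key])  # hello
```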
Using an is_deleted column might require some migration work, as you mention, but the potentially more serious problem you may have is that creating an index on a very low-cardinality column is usually extremely inefficient. With a boolean field, I think your index would contain only two rows. If you don't delete too frequently, that means your "false" row will be very wide and therefore almost useless.
If you avoid creating a secondary index for the is_deleted column and you allow both null and false to indicate active records, while only explicit true indicates deleted ones, you may not need to migrate anything. (Do you actually know which existing records to delete during migration?) You would then leave filtering deleted records to the client, who is probably already going to be in charge of some of your paging behavior. The drawback of this design is that you may have to ask for > N records to get N that aren't deleted!
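The client-side filtering described above, including the need to fetch more than N rows, could look like the sketch below. `fetch_page` stands in for whatever paged query the application already runs against my_feed; it, the `pos` field and the in-memory rows are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Record = dict
FetchPage = Callable[[Optional[int], int], List[Record]]


def is_active(record: Record) -> bool:
    # Only an explicit True marks a deleted record; None (pre-existing rows) and False are active.
    return record.get("is_deleted") is not True


def next_active_page(fetch_page: FetchPage, n: int, fetch_size: int,
                     cursor: Optional[int] = None) -> Tuple[List[Record], Optional[int]]:
    """Collect up to n active records, fetching extra pages when deleted ones are skipped.

    Returns the records plus the position of the last record consumed, to resume from.
    """
    result: List[Record] = []
    while len(result) < n:
        page = fetch_page(cursor, fetch_size)
        if not page:
            break  # no more rows in the partition
        for record in page:
            cursor = record["pos"]
            if is_active(record):
                result.append(record)
                if len(result) == n:
                    break
    return result, cursor


# Hypothetical in-memory partition; "pos" stands in for the clustering position.
rows = [{"pos": 1, "is_deleted": None}, {"pos": 2, "is_deleted": True},
        {"pos": 3, "is_deleted": None}, {"pos": 4, "is_deleted": True},
        {"pos": 5, "is_deleted": False}, {"pos": 6, "is_deleted": False}]


def fetch_page(cursor: Optional[int], size: int) -> List[Record]:
    return [r for r in rows if cursor is None or r["pos"] > cursor][:size]


page, cursor = next_active_page(fetch_page, n=3, fetch_size=3)
print([r["pos"] for r in page], cursor)  # [1, 3, 5] 5
```

Here two fetches of three rows were needed to return three active records, which is the "> N records to get N" drawback in practice.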
I hope that helps and addresses the question as you've stated it. I would be curious to know why you would need to guard against already deleted records being brought back to life, but I can imagine a situation where you have multiple actors working on a particular feed (and the CAS problems that could arise).
On a somewhat unrelated note, you may want to consider using timeuuid instead of timestamp for your created_at field. CQL supports a dateOf() function to retrieve that date if that's a stumbling block. (It may also be impossible to get collisions within your tenant_id partitions, in which case you can safely ignore me.)
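CQL's timeuuid is a version-1 UUID, so the creation time can always be recovered from it. A small Python sketch of that decoding, roughly the information dateOf() returns (which works at millisecond precision); `timeuuid_to_datetime` is a hypothetical helper, while `uuid.uuid1()` and the `.time` attribute are standard library features.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUIDs count 100-nanosecond intervals since 1582-10-15 (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the timestamp embedded in a time-based (version 1) UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


created_at = uuid.uuid1()  # unique even when two values are created in the same millisecond
print(timeuuid_to_datetime(created_at))
```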

Why am I reading many tombstones in Cassandra table although my access pattern should avoid them

I know this is not the best way to use Cassandra, but the type of my data requires reading all data from the last week. However when using Collection-types in CQL3, I ran into certain limitations which prevent me from doing normal date-range queries.
So I have set up Cassandra (currently single node, probably more in the future) with the following table
CREATE TABLE cache (tag text, id int, tags map<text,text>,
PRIMARY KEY (tag, id) );
ALTER TABLE cache WITH GC_GRACE_SECONDS = 0;
I am inserting with a TTL of one week to automatically remove the items from the Cache.
I tried to follow the suggestions mentioned in this article to avoid reading many tombstones by selecting by "minimum id", which I persist elsewhere to avoid reading old data:
SELECT * FROM cache WHERE tag = ? AND id >= ?
The id is basically some sort of timestamp which is constantly increasing, i.e. I only insert higher values over time and constantly remove older ids from the table.
But I still get warnings about thresholds being reached:
WARN 08:59:06,286 Read 5001 live and 5702 tombstoned cells in cache (see tombstone_warn_threshold)
And if I do not run manual compaction/scrubbing regularly I get exceptions and queries fail.
However based on my understanding from the articles and documentation, I should be avoiding most if not all tombstones here as I query on equality for the tag, which allows Cassandra to only look for those areas and I use a minimum id which allows Cassandra to start reading only after most of the tombstones, so why are there still tombstone warnings/exceptions reported?

Each map key/value pair is actually stored as a separate cell (name, value and timestamp). So if you issue a lot of deletions of map elements (expiration by TTL counts as well), that is the source of this warning: you are still reading full maps, with lots of tombstones in them. Also, a TTL set on a map is applied per element.
Second, this is multiplied by the >= predicate in your SELECT query.
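A small model of why the `id >= ?` slice does not avoid these tombstones: it skips rows below the minimum id, but every map element in the rows it does read is a separate cell, and deleted or expired elements are still read until compaction purges them. The data layout, expiry values and counts are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MapCell:
    expires_at: float       # each map element carries its own TTL-based expiry
    deleted: bool = False   # e.g. after DELETE tags['x'] FROM cache WHERE ...


@dataclass
class Row:
    row_id: int
    tags: Dict[str, MapCell] = field(default_factory=dict)


def scan(rows: List[Row], min_id: int, now: float) -> Tuple[int, int]:
    """Return (live, tombstoned) cells read by SELECT * ... WHERE tag = ? AND id >= ?."""
    live = tombstoned = 0
    for row in rows:
        if row.row_id < min_id:
            continue  # skipped by the clustering-key slice
        for cell in row.tags.values():
            if cell.deleted or cell.expires_at <= now:
                tombstoned += 1  # still read until compaction removes it
            else:
                live += 1
    return live, tombstoned


# Ten rows, each map holding five elements of which three were later deleted.
rows = [Row(i, {f"k{j}": MapCell(expires_at=1_000.0, deleted=(j < 3)) for j in range(5)})
        for i in range(10)]

print(scan(rows, min_id=5, now=0.0))      # (10, 15)
print(scan(rows, min_id=0, now=0.0))      # (20, 30)
print(scan(rows, min_id=5, now=2_000.0))  # (0, 25): every element has expired
```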
If this is the case, you should remodel your data access pattern to use only EQ relations in SELECT query and bump id more often. Also, this access pattern will allow you to get rid of clustering part of your PRIMARY KEY.
So, if you do not issue lots of deletions on that map, you can try to use tag text, time timeuuid, name text, data text model and slice it precisely by time.
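The suggested remodel (tag, time, name, data, one row per former map entry) turns the read into a contiguous slice on the clustering key. The sketch below mimics that with a sorted list and binary search; integer millisecond times stand in for timeuuid values, and the layout is an assumption based on the column list above.

```python
import bisect
from typing import List, Tuple

# One entry per (time, name): the former map elements become clustering rows,
# kept sorted by time the way a clustering column orders them on disk.
Entry = Tuple[int, str, str]  # (time in ms, name, data)


def time_slice(entries: List[Entry], start: int, end: int) -> List[Entry]:
    """Return entries with start <= time < end, found by binary search."""
    # A 1-tuple (t,) sorts before every (t, name, data), so it marks the first entry at time t.
    lo = bisect.bisect_left(entries, (start,))
    hi = bisect.bisect_left(entries, (end,))
    return entries[lo:hi]


entries = sorted([(1000, "a", "x"), (2000, "b", "y"), (2000, "c", "z"), (3000, "d", "w")])
print(time_slice(entries, 2000, 3000))  # [(2000, 'b', 'y'), (2000, 'c', 'z')]
```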
