Updating a cell value to null in Cassandra and avoiding tombstones

We are running DSE 6.8 with Cassandra version 3.11 and using spring-data-cassandra 2.2.6, which uses cassandra-driver-core 3.7.2.
We have use cases where we need to UPDATE a field (cell) value to null (or whatever represents "no value") when it was not null before.
The trick here is that we wanted to avoid tombstones when applying this operation.
Thanks.

It's impossible - in Cassandra, setting a value to null is equivalent to a delete operation, which generates a tombstone. If you want to avoid tombstones, agree on some artificial value that represents "no value", such as an empty string, 0, etc.
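For example, with the Java driver stack mentioned in the question (cassandra-driver-core 3.x), a minimal sketch of this sentinel approach; the keyspace, table and column names (my_keyspace, user_events, note) are made up for illustration:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SentinelUpdate {
    // Agreed-upon value that stands in for "no value" so no tombstone is written
    private static final String NO_VALUE = "";

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            PreparedStatement ps = session.prepare(
                    "UPDATE user_events SET note = ? WHERE id = ?");
            // Instead of binding null (which would create a tombstone), bind the sentinel.
            session.execute(ps.bind(NO_VALUE, 42));
        }
    }
}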

Related

Is it possible to express inequality in the WHERE clause of a CQL statement?

I want to SELECT stuff WHERE value is not NAN. How to do it? I tried different options:
WHERE value != NAN
WHERE value is not NAN
WHERE value == value
None of these attempts succeeded.
I see that it is possible to write WHERE value = NAN, but is there a way to express inequality?
As you noted, none of the alternatives you tried work today:
although the != operator is recognized by the parser, it is unfortunately not supported in the WHERE clause. This is true for both Cassandra and Scylla. I opened https://github.com/scylladb/scylladb/issues/12736 as a feature request in Scylla to add support for !=.
The IS NOT ... syntax is not relevant - it is only supported in the specific form IS NOT NULL, and even that is not supported in WHERE (see https://github.com/scylladb/scylladb/issues/8517).
WHERE value = value (note that a single equals sign is the SQL and CQL syntax, not '==' as in C) is currently not supported: you can only check equality of a column against a constant, not the equality of two columns. Again, this is true for both Cassandra and Scylla. Scylla is in the process of improving the power of WHERE expressions, and at the end of this process this sort of expression will be supported.
I think your best solution today is just to read all the data and filter out NaN yourself, in the client. The performance loss should be minimal - just the network overhead - because even if Scylla did this filtering for you, it would still need to read the data from disk and do the filtering; it's not as if it can get this inequality check "for free". This is unlike the equality check (WHERE value = 3), where Scylla can jump directly to the position of value = 3 (if "value" is the partition key or clustering key) and read only that. This efficiency concern is the reason why, historically, Scylla and Cassandra supported the equality operator but not the inequality operator.
Cassandra is designed for OLTP workloads so reads are optimised for retrieving specific partitions such that the filter is of the form:
SELECT ... FROM ... WHERE partition_key = ?
A query that has an inequality filter is retrieving "everything except partition X" and is not really OLTP, because Cassandra has to perform a full table scan to check every record against the filter. This kind of query does not scale, so it is not supported.
As far as I'm aware, the inequality operator (!=) only works in the conditional section of lightweight transactions, which applies only to UPDATE and DELETE, not SELECT statements. For example:
UPDATE ... SET ... WHERE ... IF condition
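A minimal sketch of such a conditional update with the DataStax Java driver (the users table and status column are made up for illustration):
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

class ConditionalUpdate {
    // != is accepted in the IF clause of a lightweight transaction.
    static boolean deactivateUnlessBanned(Session session, int id) {
        ResultSet rs = session.execute(
                "UPDATE users SET status = 'inactive' WHERE id = ? IF status != 'banned'", id);
        // wasApplied() reflects the [applied] column returned for conditional statements.
        return rs.wasApplied();
    }
}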
If you have a complex search use case, you should look at using Elasticsearch or Apache Solr on top of Cassandra. If you have an analytics use case, consider using Apache Spark to query the data in Cassandra. Cheers!

Cassandra read consistency LOCAL_QUORUM

I have a doubt about how Cassandra returns a value in the case of LOCAL_QUORUM. If there is no consensus between the values returned by the individual nodes, will Cassandra return no value at all, or will it return the latest value based on the timestamp?
Cassandra does not use consensus of the values for quorum reads to determine which value to return to the client; it always uses the timestamp to determine the most recent value.
This most recent value is then used to overwrite the values in the other replicas using read repair, if the values do not match.
Cassandra always works based on timestamps and returns the latest value to the client. After the checksum comparison, read repair updates the replicas for that partition.
https://academy.datastax.com/support-blog/read-repair
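For reference, a minimal Java driver 3.x sketch of issuing such a read at LOCAL_QUORUM (keyspace, table and column names are hypothetical); the timestamp-based reconciliation described above happens on the coordinator, not in the client:
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

class QuorumRead {
    static Row readAtLocalQuorum(Session session, int id) {
        SimpleStatement stmt = new SimpleStatement(
                "SELECT * FROM my_keyspace.my_table WHERE id = ?", id);
        // The coordinator waits for a local quorum of replicas and returns
        // the cell values with the highest write timestamps.
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        return session.execute(stmt).one();
    }
}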

How to avoid Cassandra tombstones when inserting NULL values

My problem is that Cassandra creates tombstones when inserting NULL values.
From what I understand, Cassandra doesn't support NULLs; when NULL is inserted it just deletes the respective column. On one hand this is very space efficient, but on the other hand it creates tombstones, which degrade read performance.
This goes against the NoSQL philosophy, because Cassandra is saving space but degrading read performance. In the NoSQL world space is cheap, but performance matters. I believe this is the philosophy behind storing tables in denormalized form.
I would like Cassandra to use the same technique for inserting NULL as for any other value - use timestamps and, during compaction, preserve the latest entry - even if that entry is NULL (or call it "unset").
Is there any tweak in the Cassandra config, or any approach, that would let me achieve upserts with nulls without creating tombstones?
I came across this issue, however it only allows NULL values to be ignored.
My use case:
I have a stream of events, each event identified by a causeID. I receive many events with the same causeID and I want to store only the latest event for a given causeID (using upsert). The properties of an event may change from NULL to a specific value, but also from a specific value to NULL. Unfortunately the latter case generates tombstones and degrades read performance.
Update
It seems there is no way I can avoid tombstones. Could you advise me on techniques to minimize them (e.g. setting gc_grace_seconds to a very low value)? What are the risks, and what should I do when a node goes down for a longer period than gc_grace_seconds?
You can't insert NULL into Cassandra - it has a special meaning there, and leads to the creation of the tombstones that you observe. If you want to treat NULL as a special value, why not solve this problem on the application side: when you get a null, just insert some special value that couldn't otherwise occur in your table, and when you read the data back, check for that special value and output null to the requester...
When we insert or update rows using null for values that are not specified, even though our intention is just to leave the value empty, Cassandra represents it as a tombstone, causing unnecessary overhead that degrades performance.
To avoid such tombstones for save operations, Cassandra has the concept of an unset parameter value.
So you can do the following to unset a field value while saving, and avoid the tombstone overhead, in a few different cases:
1). If you are using express-cassandra:
const user = new models.instance.User({
    user_id: 1235,
    user_name: models.datatypes.unset // this will not create a tombstone when we want an empty or null user_name
});
user.save(function(err){
    // user_name value is not set and does not create any unnecessary tombstone overhead
});
2). If you are writing a raw Cassandra query, then for an empty or null field - say you know colC will be null - simply don't include it in the query:
insert into my_table(id,colA,colB) values(idVal,valA,valB) // Avoid colC
3). If you are using the Node.js driver, you can even pass undefined on insert or update, which will avoid the tombstone overhead. For example:
const query = 'INSERT INTO my_table (id, colC) VALUES (?, ?)';
client.execute(query, [ id, undefined ]);
4). If you are using the C# driver:
// Prepare once in your application lifetime
var ps = session.Prepare("INSERT INTO my_table (id, colC) VALUES (?, ?)");
// Bind the unset value in a prepared statement
session.Execute(ps.Bind(id, Unset.Value));
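Similarly, if you are using the DataStax Java driver (cassandra-driver-core 3.x, as in the first question above), you can leave a bound variable unset; a minimal sketch, assuming the same hypothetical my_table(id, colC) and that BoundStatement.unset(...) is available (driver 3.0+):
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

class UnsetExample {
    static void insertWithoutColC(Session session, int id) {
        // In real code, prepare once and reuse the PreparedStatement.
        PreparedStatement ps = session.prepare(
                "INSERT INTO my_table (id, colC) VALUES (?, ?)");
        BoundStatement bound = ps.bind().setInt("id", id);
        // colC is deliberately left unbound; with native protocol v4 (Cassandra 2.2+)
        // the driver sends it as "unset", so no tombstone is written for that column.
        // Driver 3.x also exposes an explicit call for the same effect:
        bound.unset("colC");
        session.execute(bound);
    }
}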
For more detail on express-cassandra, read the sub-topic "Null and unset values" at
https://express-cassandra.readthedocs.io/en/latest/datatypes/#cassandra-to-javascript-datatypes
For more detail on the Node.js driver's unset feature, refer to the DataStax docs: https://docs.datastax.com/en/developer/nodejs-driver/4.6/features/datatypes/nulls/
For more detail on the C# driver's unset feature, refer to the DataStax docs: https://docs.datastax.com/en/developer/csharp-driver/3.16/features/datatypes/nulls-unset/
NOTE: I tested this with the Node.js cassandra driver 4.0, but the unset feature has been available since Cassandra 2.2 (native protocol v4).
Hope this will help you or somebody else.
Thanks!
You cannot avoid tombstones if you explicitly put NULL in your INSERT. C* does not do a lookup before inserting or writing data, which is what makes writes very fast. Instead, C* just inserts a tombstone so that the old value is shadowed later (the latest update wins when comparing timestamps). If you want to avoid tombstones (which is recommended), you have to prepare different combinations of queries, checking each value for NULL before adding it to the INSERT. If you have very few fields to check, it is easy to add some IF-ELSE statements; but if there are lots of them, the code gets bigger and less readable. In short, you cannot insert NULL without it impacting read performance later.
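A rough sketch of what those "different combinations of queries" can look like in practice - building the column list dynamically and skipping nulls (Java driver; the events table and its columns are made up, and a real application would prepare and cache each distinct statement):
import com.datastax.driver.core.Session;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NullSkippingInsert {
    // Builds an INSERT that only mentions non-null columns, so no tombstones are written.
    static void insertEvent(Session session, String causeId, String name, String payload) {
        List<String> columns = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        columns.add("cause_id");
        values.add(causeId);
        if (name != null)    { columns.add("name");    values.add(name); }
        if (payload != null) { columns.add("payload"); values.add(payload); }

        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        String cql = "INSERT INTO events (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
        session.execute(cql, values.toArray());
    }
}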
Inserting null values into cassandra
I don't think the other answers address the original question, which is how to overwrite a non-null value in Cassandra with null without creating a tombstone. The nearest is Alex Ott's suggestion to use some special value other than null.
However, with a little bit of trickery you can insert an explicit null into Cassandra by exploiting a FROZEN tuple or user-defined type. The FROZEN keyword effectively serialises the user defined type and stores the serialised representation in the column. Crucially, the serialised representation of a UDT containing null values is not itself null.
> CREATE TYPE test_type(value INT);
> CREATE TABLE test(pk INT, cl INT, data FROZEN<test_type>, PRIMARY KEY (pk, cl));
> INSERT INTO test (pk, cl, data) VALUES (0, 0, {value: 15});
> INSERT INTO test (pk, cl, data) VALUES (0, 0, {value: null});
> INSERT INTO test (pk, cl) VALUES (0, 1);
> SELECT * FROM test;
 pk | cl | data
----+----+---------------
  0 |  0 | {value: null}
  0 |  1 |          null
(2 rows)
Here we wrote 15, then overwrote it with null, and finally added a second row to demonstrate that there is a difference between an unset cell and a cell containing a frozen UDT that itself contains null.
Of course the downside of this approach is that in your application you have to delve into the UDT for the actual value.
On the other hand, if you combine several columns into the UDT you do save a little overhead in Cassandra. (But you can't then read or write them individually. You also can't remove fields, though you can add new ones.)
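For completeness, a sketch of what "delving into the UDT" looks like with the DataStax Java driver 3.x, reading from the test table defined above:
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.UDTValue;

class FrozenUdtRead {
    // Returns the inner value, or null if either the UDT cell or its field is null.
    static Integer readValue(Session session, int pk, int cl) {
        Row row = session.execute(
                "SELECT data FROM test WHERE pk = ? AND cl = ?", pk, cl).one();
        UDTValue data = row.getUDTValue("data");   // the whole frozen cell
        if (data == null || data.isNull("value")) {
            return null;                           // unset cell vs. explicit null inside the UDT
        }
        return data.getInt("value");
    }
}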

How can I get the return value of an UPDATE/INSERT statement from the Node.js driver?

I am using the DataStax Node.js driver for Cassandra (https://github.com/datastax/nodejs-driver).
I am wondering if I can get the number of rows updated after executing an update statement.
Thanks,
Updates in Cassandra are treated as just another insert. In other words, there is no read before write (update in this case) in Cassandra. Updates are simply writes to another immutable SSTable, and during a read Cassandra stitches them together. It is at read time that the latest value for that column is chosen.
So in short, there isn't a way in Cassandra to deduce the number of rows updated.

Error in Cassandra documentation regarding Size Tiered Compaction?

In the Cassandra documentation here it says:
While STCS works well to compact a write-intensive workload, it makes reads slower because the merge-by-size process does not group data by rows. This makes it more likely that versions of a particular row may be spread over many SSTables.
1) What does 'group data by rows' mean? Aren't all rows for a partition already grouped?
2) How is it possible for a row to have multiple versions on a single node? Doesn't the upsert behavior ensure that only the latest version of a row is accessible via the memtable and partition indices? Isn't it true that when a row is updated and the memtable flushed, the partition indices are updated to point to the latest version? Then, on compaction, this latest version (because of the row timestamp) is the one that ends up in the compacted SSTable?
Note that I'm talking about a single node here - NOT the issue of replicas being out of sync.
Either this is incorrect or I am misunderstanding what that paragraph says.
Thanks!
OK, I think I found the answer myself - I would be grateful for any confirmation that this is correct.
A row may have many versions because updates/upserts can write only part of a row. Thus, the latest version of a complete row is made up of all the latest updates for all the columns in that row - which can be spread out across multiple SSTables.
My misunderstanding seemed to stem from the idea that the partition indices can only point to one location in one SSTable. If I relax this constraint, the statement in the doc makes sense. I must therefore assume that an index in the partition indices for a primary key can hold multiple locations for that key. Can someone confirm that all this is true?
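If it helps to picture the scenario, a small illustrative sketch (Java driver, hypothetical table t(pk, a, b)): two partial updates to the same primary key, each flushed into its own SSTable, so the complete row only exists once the read path (or compaction) merges them:
import com.datastax.driver.core.Session;

class PartialRowVersions {
    static void demo(Session session) {
        // Both statements target the same row (pk = 1) but write different columns.
        session.execute("UPDATE t SET a = 10 WHERE pk = 1");
        // ... memtable flushed here (e.g. nodetool flush) -> SSTable #1 holds only column a
        session.execute("UPDATE t SET b = 20 WHERE pk = 1");
        // ... flushed again -> SSTable #2 holds only column b
        // A read of pk = 1 must merge both SSTables (by cell timestamp) to return {a: 10, b: 20};
        // until compaction merges them, "versions" of this row are spread over two SSTables.
    }
}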
Thanks.
