Does an UPDATE become an implied INSERT? - Cassandra

For Cassandra, do UPDATEs become an implied INSERT if the selected row does not exist? That is, if I say
UPDATE users SET name = 'Raedwald' WHERE id = 545127
and id is the PRIMARY KEY of the users table, and the table has no row with a key of 545127, will that be equivalent to
INSERT INTO users (id, name) VALUES (545127, 'Raedwald')
I know that the opposite is true: an INSERT for an id that already exists becomes an UPDATE of the row with that id. Older Cassandra documentation talked about inserts actually being "upserts" for that reason.
I'm interested in the case for CQL3, Cassandra version 1.2+.

Yes, for Cassandra UPDATE is synonymous with INSERT, as explained in the CQL documentation where it says the following about UPDATE:
Note that unlike in SQL, UPDATE does not check the prior existence of the row: the row is created if none existed before, and updated otherwise. Furthermore, there is no means to know which of creation or update happened. In fact, the semantics of INSERT and UPDATE are identical.
For the semantics to be different, Cassandra would need to do a read to know whether the row already exists. Cassandra is write-optimized, so you can always assume it doesn't do a read before a write on any write operation. The only exception is counters (unless replicate_on_write = false), in which case replicating an increment involves a read.
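For example, a quick cqlsh sketch (assuming the question's users table has id int PRIMARY KEY and name text; expected output, not a verified transcript) shows an UPDATE materializing a row that was never inserted:
cqlsh> UPDATE users SET name = 'Raedwald' WHERE id = 545127;
cqlsh> SELECT * FROM users WHERE id = 545127;
 id     | name
--------+----------
 545127 | Raedwald
(1 rows)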

Unfortunately, the accepted answer is not 100% accurate. Inserts are different from updates:
cqlsh> create table ks.t (pk int, ck int, v int, primary key (pk, ck));
cqlsh> update ks.t set v = null where pk = 0 and ck = 0;
cqlsh> select * from ks.t where pk = 0 and ck = 0;
pk | ck | v
----+----+---
(0 rows)
cqlsh> insert into ks.t (pk,ck,v) values (0,0,null);
cqlsh> select * from ks.t where pk = 0 and ck = 0;
pk | ck | v
----+----+------
0 | 0 | null
(1 rows)
Scylla does the same thing.
In Scylla and Cassandra rows are sequences of cells. Each column gets a corresponding cell (or a set of cells in the case of non-frozen collections or UDTs). But there is one additional, invisible cell - the row marker (in Scylla at least; I suspect Cassandra has something similar).
The row marker makes a difference for rows in which all other cells are dead: a row shows up in a query if and only if there is at least one live cell. Thus, if the row marker is live, the row will show up even if all other columns were previously set to null (e.g. by updates).
Inserts create a live row marker, while updates don't touch the row marker at all, so clearly they are different. The example above illustrates that.
One could argue that row markers are "internal" to Cassandra/Scylla, but as you can see, their effects are visible. Row markers affect your life whether you like it or not, so it may be useful to remember them.
It's a pity that almost no documentation mentions row markers (the one mention I found is https://docs.scylladb.com/architecture/sstable/sstable2/sstable-data-file/#cql-row-marker, but it's in the context of explaining SSTable internals, which is probably aimed more at Scylla developers than at users).
Bonus: a cell delete:
delete v from ks.t where pk = 0 and ck = 0
is the same as a null update:
update ks.t set v = null where pk = 0 and ck = 0
Indeed, a cell delete also doesn't touch the row marker; it only writes a tombstone for the specified cell.
This is different from a row delete:
delete from ks.t where pk = 0 and ck = 0
because row deletes insert a row tombstone, which kills all cells in the row (including the row marker). You could say that row deletes are the opposite of an insert, while updates and cell deletes are somewhere in between.
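To see both flavors of delete side by side, here is a sketch continuing the ks.t session from above (expected output, not a verified transcript):
cqlsh> insert into ks.t (pk,ck,v) values (1,1,1);
cqlsh> delete v from ks.t where pk = 1 and ck = 1;
cqlsh> select * from ks.t where pk = 1 and ck = 1;
 pk | ck | v
----+----+------
  1 |  1 | null
(1 rows)
cqlsh> delete from ks.t where pk = 1 and ck = 1;
cqlsh> select * from ks.t where pk = 1 and ck = 1;
 pk | ck | v
----+----+---
(0 rows)
The cell delete leaves the insert's row marker alive, so the row still shows up with v = null; the row delete kills the row marker too, and the row vanishes.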

What one can do, however, is this:
UPDATE table_name SET field = false WHERE key = 55 IF EXISTS;
This will ensure that your update is a true update and not an upsert. (Note that IF EXISTS turns the statement into a lightweight transaction, which involves a Paxos round and is considerably more expensive than a plain write.)
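For example, a sketch against the users table from the question; when the row does not exist, nothing is written and cqlsh shows the standard conditional-update result with [applied] = False:
cqlsh> UPDATE users SET name = 'Raedwald' WHERE id = 999999 IF EXISTS;
 [applied]
-----------
     False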

Related

Cassandra map and column update regarding tombstones

I have the following table:
CREATE TABLE example
(
id text,
users map<text,text>,
lastvisit int,
...
PRIMARY KEY (id)
);
Sometimes I update a column or a map entry like:
1) update example set users = users - {'JOE'} where id = 'id';
2) update example set users = users + {'JOE':'meta'} where id = 'id';
3) update example set lastvisit = 100 where id = 'id';
I need to know how each query handles the old data in manner of tombstones and compaction.
The following is what I have researched or been advised, but I lack information specifically on maps:
1) Deletes the map entry at key = 'JOE' by writing a tombstone for just that entry in the map; on compaction the value is dropped.
2) Inserts the key-value pair into the map; the old entry is dropped at compaction since there is a newer entry.
3) The column entry is updated and, as in (2), the old value is dropped at compaction.
The question in each case is: will the whole row be written again, or only the updated value with a newer timestamp?
A tombstone for the map item where key = 'JOE' will be inserted.
The row doesn't get overwritten; a new map item is simply added.
Strictly speaking, it's not an UPDATE -- a new column will be inserted. All mutations in C* are inserts under the hood, even deletes.
Here are some additional points:
You had a typo in your schema. It should be -- users map<text,text>.
For (1) you need to enclose the item in curly brackets otherwise the CQL statement is invalid -- {'JOE'}.
For (2) you need a colon (:) to delimit the key and value -- {'JOE':'meta'}.
For (3), there's no evidence that lastvisit was previously set, so a new cell lastvisit = 100 will be inserted and there's no old value to be deleted. Cheers!
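If you want to check for yourself that only the touched cell gets a newer timestamp, WRITETIME is a handy probe (a sketch; note that WRITETIME cannot be applied to non-frozen collections such as the users map):
cqlsh> update example set lastvisit = 100 where id = 'id';
cqlsh> select writetime(lastvisit) from example where id = 'id';
-- the returned timestamp reflects only this update; other cells keep their original write times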

Why does an (upserted) row disappear after updating the column to null? (But not when it was inserted)

My understanding of inserts and updates in Cassandra was that they are basically the same thing. That is also what the documentation says (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlUpdate.html?hl=upsert):
Note: Unlike the INSERT command, the UPDATE command supports counters. Otherwise, the UPDATE and INSERT operations are identical.
So aside from support for counters they should be the same.
But then I ran across a problem where rows that were created via UPDATE would disappear if I set the columns to null, whereas this doesn't happen if the rows are created with INSERT.
cqlsh:test> CREATE TABLE IF NOT EXISTS address_table (
... name text PRIMARY KEY,
... addresses text
... );
cqlsh:test> insert into address_table (name, addresses) values ('Alice', 'applelane 1');
cqlsh:test> update address_table set addresses = 'broadway 2' where name = 'Bob' ;
cqlsh:test> select * from address_table;
name | addresses
-------+-------------
Bob | broadway 2
Alice | applelane 1
(2 rows)
cqlsh:test> update address_table set addresses = null where name = 'Alice' ;
cqlsh:test> update address_table set addresses = null where name = 'Bob' ;
cqlsh:test> select * from address_table;
name | addresses
-------+-----------
Alice | null
(1 rows)
The same thing happens if I skip the separate step of first creating a row. With insert I can create a row with a null value, but if I use update the row is nowhere to be found.
cqlsh:test> insert into address_table (name, addresses) values ('Caroline', null);
cqlsh:test> update address_table set addresses = null where name = 'Dexter' ;
cqlsh:test> select * from address_table;
name | addresses
----------+-----------
Caroline | null
Alice | null
(2 rows)
Can someone explain what's going on?
We're using Cassandra 3.11.3
This is expected behavior. See details in https://issues.apache.org/jira/browse/CASSANDRA-14478
INSERT adds a row marker, while UPDATE does not. What does this mean? Basically, an UPDATE requests that individual cells of the row be added, but not that the row itself be added; so if one later deletes those same individual cells with DELETE, the entire row goes away. An INSERT, however, not only adds the cells, it also requests that the row itself be added (this is implemented via a "row marker"). So if all of the row's individual cells are later deleted, an empty row remains behind (i.e., the primary key of the row, which now has no content, is still remembered in the table).
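You can also watch the row marker being re-created: re-INSERTing nothing but the primary key brings Bob back (a sketch using the address_table from the question; expected output):
cqlsh:test> insert into address_table (name) values ('Bob');
cqlsh:test> select * from address_table where name = 'Bob';
 name | addresses
------+-----------
  Bob |      null
(1 rows)
The INSERT wrote a live row marker, so the row is visible again even though none of its regular columns have values.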

Is there any way we could limit an update?

Generally, I see we can limit a SELECT with select * from table where predicate = value LIMIT N. I'm currently in a situation where I have 200 records falling under a predicate, but I want to update the first 100, like update table set column = 1 where predicate = value limit ...?, and the second half with update table set column = 2 where predicate = value. I think it could be done by having ranges (<=, >=) in the predicate section; unfortunately, I have none of them.
Currently, I don't think we have this feature, as the WHERE clause must identify the row or rows to be updated by primary key, as per the CQL documentation. However, you could further restrict which rows are actually updated by using an IF EXISTS condition.
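Since UPDATE accepts no LIMIT, the usual workaround is to SELECT the keys you want (with LIMIT) and then issue updates that name those keys explicitly; IN is accepted on the partition key. A sketch with illustrative names (mytable, pk, mycol, and pred are placeholders, not from the question):
-- read the first 100 matching primary keys in the client
SELECT pk FROM mytable WHERE pred = 'value' LIMIT 100 ALLOW FILTERING;
-- then update exactly those keys
UPDATE mytable SET mycol = 1 WHERE pk IN (1, 2, 3);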

Just set the TTL on a row

Using Java, can I scan a Cassandra table and just update the TTL of a row? I don't want to change any data; I just want to scan the table and set the TTL of a few rows.
Also, using Java, can I set a TTL that is absolute, for example 2016-11-22 00:00:00? That is, I don't want to specify the TTL in seconds, but as an absolute point in time.
Cassandra doesn't allow setting a TTL for a row; it only allows setting TTLs on column values.
In case you're wondering why you're seeing rows expire anyway: if all the values of all the columns of a record are TTLed, then the row disappears when you SELECT it.
However, this is only true if you perform an INSERT with USING TTL. If you INSERT without a TTL and then do an UPDATE with a TTL, you'll still see the row, but with null values. Here are a few examples and some gotchas:
Example with a TTLed INSERT only:
CREATE TABLE test (
k text PRIMARY KEY,
v int
);
INSERT INTO test (k,v) VALUES ('test', 1) USING TTL 10;
... 10 seconds after...
SELECT * FROM test ;
k | v
---------------+---------------
(0 rows)
Example with a TTLed INSERT and a TTLed UPDATE:
INSERT INTO test (k,v) VALUES ('test', 1) USING TTL 10;
UPDATE test USING TTL 10 SET v=0 WHERE k='test';
... 10 seconds after...
SELECT * FROM test;
k | v
---------------+---------------
(0 rows)
Example with a non-TTLed INSERT and a TTLed UPDATE:
INSERT INTO test (k,v) VALUES ('test', 1);
UPDATE test USING TTL 10 SET v=0 WHERE k='test';
... 10 seconds after...
SELECT * FROM test;
k | v
---------------+---------------
test | null
(1 rows)
Now you can see that the only way to solve your problem is to rewrite all the values of all the columns of your row with a new TTL.
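A sketch of that rewrite, reusing the test table above: note that a plain UPDATE ... USING TTL would leave a non-TTLed row marker from the original INSERT alive (which is exactly the third gotcha above), so re-INSERTing the row is the safer way to refresh its TTL:
INSERT INTO test (k,v) VALUES ('test', 1) USING TTL 604800;
-- rewrites every cell and the row marker with a fresh 7-day TTL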
In addition, there's no way to specify an explicit expiration date, but you can compute a TTL value in seconds with simple math (as others suggested).
Have a look at the official documentation about data expiration, and don't forget the DELETE section for updating TTLs.
HTH.
You can't update only the TTL of a row; you have to update or re-insert all the columns.
You can select all the regular columns plus the primary key columns, then update the regular columns by primary key (or re-insert the row) USING TTL in seconds.
In Java, you can calculate the TTL in seconds from a date using the method below.
import java.util.Date;

// Returns the number of seconds from now until ttlDate, suitable for USING TTL.
public static long ttlFromDate(Date ttlDate) throws Exception {
    long ttl = (ttlDate.getTime() - System.currentTimeMillis()) / 1000;
    if (ttl < 1) {
        throw new Exception("Invalid ttl date");
    }
    return ttl;
}
Alternatively, you can set a TTL value on the entire table while creating it.
CREATE TABLE test (
k text PRIMARY KEY,
v int
) WITH default_time_to_live = 63113904;
The above example creates a table whose rows will expire two years (63,113,904 seconds) after they are written.
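You can verify what the table default gives you with the TTL() function (a sketch; the key 'x' is illustrative):
INSERT INTO test (k,v) VALUES ('x', 1);
SELECT TTL(v) FROM test WHERE k = 'x';
-- returns the remaining time to live of that cell in seconds, just under 63113904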

Cassandra OperationTimedOut

select count(*) from my_table gives me OperationTimedOut: errors={}, last_host=127.0.0.1
I have already tried changing the value of request_timeout_in_ms in cassandra.yaml and request_timeout in cqlshrc.sample (both are in C:\Programs\DataStax-DDC\apache-cassandra\conf), but without success.
How can I increase the timeout?
select count(*) is not doing what you think. It is actually expensive, as it counts the rows one by one. You can track the number of records using a separate column family with a counter, which you will need to increment for every insert you do into your table. For example:
CREATE TABLE IF NOT EXISTS my_table_counter (
mykey text,
count counter,
PRIMARY KEY (mykey)
);
Then for every insert into your table, do a counter update:
INSERT into my_table (mykey, mydata) VALUES (?, ?);
UPDATE my_table_counter SET count = count + 1 WHERE mykey = ?;
To get the count:
SELECT count FROM my_table_counter WHERE mykey = ?;
Note that counters are not idempotent, so in the rare event of a failure your data might be under- or over-counted. Also, the code above assumes that you only insert with new keys.
If you need precise counting, Cassandra may not be a good fit. Also, if you are not inserting with unique keys, you may need to consider using a lightweight transaction for the insert (IF NOT EXISTS) and updating the counter only if the transaction was applied.
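A sketch of that lightweight-transaction variant (the values are illustrative; your client code must check the [applied] flag that Cassandra returns for conditional statements):
INSERT INTO my_table (mykey, mydata) VALUES ('k1', 'd1') IF NOT EXISTS;
-- in application code: only when [applied] is true, bump the counter
UPDATE my_table_counter SET count = count + 1 WHERE mykey = 'k1';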
