I have a table - for simplicity, let's say this is its definition:
CREATE TABLE t (pk1 varchar, pk2 varchar, c1 varchar, c2 varchar, PRIMARY KEY(pk1, pk2));
I do multiple actions on it in parallel using the full PK:
INSERT INTO t (pk1, pk2, c1, c2) values (?, ?, ?, ?) IF NOT EXISTS;
DELETE FROM t where pk1 = ? AND pk2 = ?;
UPDATE t set c1 = ? where pk1 = ? AND pk2 = ? IF EXISTS;
Note:
- in the INSERT command, c2 is never null
- in the UPDATE command, c2 is not populated
Using these commands I should never have rows with c2 = null. The problem is that every now and then I do see such rows. I can't easily reproduce it but it always happens when I stress the system (multiple parallel clients running: insert, update, delete with the same PK).
Edit: my cluster size is 4 with RF=2 (NetworkTopologyStrategy with 1 DC) and I use CL=QUORUM for all queries.
Am I missing something or is there a bug in LWT?
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlLtwtTransactions.html
If lightweight transactions are used to write to a row within a partition, only lightweight transactions for both read and write operations should be used. This caution applies to all operations, whether individual or batched. For example, the following series of operations can fail:
DELETE ...
INSERT .... IF NOT EXISTS
SELECT ....
The following series of operations will work:
DELETE ... IF EXISTS
INSERT .... IF NOT EXISTS
SELECT .....
Note - The same is true for the INSERT and UPDATE combination as well, from a bug we recently encountered. If you use lightweight transactions, use them for all the related statements. The reason could be related to the slightly different timestamps, explained better here:
https://jira.apache.org/jira/browse/CASSANDRA-14304
DOAN DuyHai added a comment - 24/Mar/18 15:12
Hints:
1) LWT operations use a ballot based on an agreement on the timestamp value among a QUORUM of replicas. It can happen that the timestamp is slightly incremented (by some microseconds) in case of conflict/contention on the cluster. The consequence is that the timestamp used for the LWT can be slightly (again, by microseconds) in the future. Read the source code to check the part responsible for ballot agreement with Paxos.
2) For the DELETE request:
a) it can use the <current> timestamp, which can belong to the "past" with respect to the one used by the LWT, thus the SELECT does return a value
In general, one possibility is that the DELETE is executed in parallel with the UPDATE yet provides no guarantee of being applied on all replicas.
Example (simplified - similar interleavings are possible in other scenarios as well).
Assume a cluster of 3 nodes with RF=3, where node1 has a temporary connection issue with the other nodes (node2, node3).
The DELETE with CL=ONE is executed against node1 with timestamp T1 (and is not applied on node2, node3).
The UPDATE is executed against node2, node3 with timestamp T2 (T2 > T1).
The connection between node1, node2, node3 is restored, and now the tombstone that the DELETE introduced removes all the data (including c2), while the UPDATE only set pk1, pk2, c1 - leaving c2 as null.
If you applied the DELETE using LWT, this should not happen - as long as TTL is not used.
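Concretely, for the table t from the question, a minimal sketch of the all-LWT statement set would be:
INSERT INTO t (pk1, pk2, c1, c2) VALUES (?, ?, ?, ?) IF NOT EXISTS;
UPDATE t SET c1 = ? WHERE pk1 = ? AND pk2 = ? IF EXISTS;
-- the DELETE is now conditional too, so it goes through the same Paxos round as the other statements
DELETE FROM t WHERE pk1 = ? AND pk2 = ? IF EXISTS;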
As for TTL: it can be set either directly in the INSERT statements or by default via a table property. To check this you can:
DESCRIBE TABLE will show the default_time_to_live that is set for this table.
A SELECT ttl(c2) ... will return a value if a TTL was set on that cell.
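As a quick sketch against the table t from the question (the key values are placeholders):
-- check the table-level default: look at default_time_to_live in the output
DESCRIBE TABLE t;
-- check whether an individual cell carries a TTL;
-- a non-null ttl(c2) means a TTL was set on that cell
SELECT ttl(c2) FROM t WHERE pk1 = 'a' AND pk2 = 'b';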
Related
I have a table in Cassandra with 2 columns, id and date_proc, and I plan to do a lot of inserts. Is it possible to use something like ON CONFLICT in Postgres to get the previous value when inserting?
Could you tell me another way to avoid 2 requests to Cassandra (select and insert)? Maybe some solution in DataStax?
ddl:
create table test.date_dict (
id text,
date_proc text,
PRIMARY KEY (id));
example of inserting:
INSERT INTO test.date_dict (id, date_proc) VALUES ('1', '2020-01-01'); // return '2020-01-01'
INSERT INTO test.date_dict (id, date_proc) VALUES ('1', '2020-01-05'); // return '2020-01-01'
"Normal" inserts and updates in Cassandra are just appends into the memtable (and then flushed into SSTables) - no read happens during these operations. And it will just overwrite previous data if it has lower timestamp.
Potentially you can use lightweight transactions (LWTs) to achieve what you need - they return the previous value if there is a conflict (the row already exists when you use IF NOT EXISTS, or the value is different from what you specify in the IF condition). But LWTs are very bad for performance, so they should be used carefully.
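For example, a rough sketch with the date_dict table above (result details shown as comments; the exact output shape depends on the driver/cqlsh):
INSERT INTO test.date_dict (id, date_proc) VALUES ('1', '2020-01-01') IF NOT EXISTS;
-- [applied] = True, the row is inserted
INSERT INTO test.date_dict (id, date_proc) VALUES ('1', '2020-01-05') IF NOT EXISTS;
-- [applied] = False, and the result row also carries the existing values:
-- id = '1', date_proc = '2020-01-01'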
I would try to reformulate your task in such a way that it fits the "normal" insert/update behavior.
How can I delete a row from Cassandra and get the value it had just before the deletion?
I could execute a SELECT and DELETE query in series, but how can I be sure that the data was not altered concurrently between the execution of those two queries?
I've tried to execute the SELECT and DELETE queries in a batch but that seems to be not allowed.
cqlsh:foo> BEGIN BATCH
... SELECT * FROM data_by_user WHERE user = 'foo';
... DELETE FROM data_by_user WHERE user = 'foo';
... APPLY BATCH;
SyntaxException: line 2:4 mismatched input 'SELECT' expecting K_APPLY (BEGIN BATCH [SELECT]...)
In my use case I have one main table that stores data for items, and I've built several tables that allow me to look up items based on that information.
If I delete an item from the main table, I must also remove it from the other tables.
CREATE TABLE items (id text PRIMARY KEY, owner text, liking_users set<text>, ...);
CREATE TABLE owned_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
CREATE TABLE liked_items_by_user (user text, item_id text, PRIMARY KEY ((user), item_id));
...
I'm afraid the tables might contain wrong data if I delete an item and at the same time someone e.g. hits the like button of that same item.
The deleteItem method executes a SELECT query to fetch the current row of the item from the main table.
The likeItem method that gets executed at the same time runs an UPDATE query and inserts the item into the owned_items_by_user, liked_items_by_user, ... tables. This happens after the SELECT statement was executed, and the UPDATE query runs before the DELETE query.
The deleteItem method then deletes the item from the owned_items_by_user, liked_items_by_user, ... tables based on the data just retrieved via the SELECT statement. That data does not yet contain the just-added like. The item is therefore deleted, but the just-added like remains in the liked_items_by_user table.
You can do a select beforehand, then do a lightweight transaction on the delete to ensure that the data still looks exactly like it did when you selected. If it does, you know the latest state before you deleted. If it does not, keep retrying the whole procedure until it sticks.
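A minimal sketch of that pattern against the items table above (the literal values are hypothetical; the retry loop lives in the application code):
SELECT owner, liking_users FROM items WHERE id = 'item-1';
-- suppose this returned owner = 'alice'; include every column you want to guard in the IF clause
DELETE FROM items WHERE id = 'item-1' IF owner = 'alice';
-- [applied] = False means the row changed (or is already gone) since the SELECT;
-- re-run the SELECT and retry the conditional DELETE until it applies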
Unfortunately you cannot do a SELECT query inside a batch statement. If you read the docs here, only insert, update, and delete statements can be used.
What you're looking for is atomicity on the execution, but batch statements are not going to be the way forward. If the data has been altered, your worst case situation is zombies, or data that could reappear.
Cassandra uses a grace period mechanism to deal with this; you can find the details here. If, for whatever reason, this is critical to your business logic, the "best" thing you can do in this situation is to increase the consistency level, or restructure the read pattern at the application level to not rely on perfect atomicity - whichever is the right trade-off for you. So either you give up some of the performance, or you tune down the requirement.
In practice, QUORUM should be more than enough to satisfy most situations most of the time. Alternatively, you can use ALL and pay the performance penalty, but that means all replicas for the given foo partition key will have to acknowledge the write in both the commitlog and the memtable. Note, this still means a flush from the commitlog will need to happen before the delete is complete, but you can tune the consistency to the level you require.
You don't have atomicity in the SQL sense, but depending on throughput it's unlikely that you will need it (touch wood).
TLDR:
CONSISTENCY ALL;
DELETE FROM data_by_user WHERE user = 'foo';
That should do the trick. The error you're seeing now comes from the ANTLR3 grammar parser for CQL 3, which is not designed to accept SELECT queries inside batches simply because they are not supported; you can see that here.
So, I have a Cassandra CQL statement that looks like this:
SELECT * FROM DATA WHERE APPLICATION_ID = ? AND PARTNER_ID = ? AND LOCATION_ID = ? AND DEVICE_ID = ? AND DATA_SCHEMA = ?
This table is sorted by a timestamp column.
The functionality is fronted by a REST API, and one of the filter parameters they can specify is to get only the most recent row, in which case I append "LIMIT 1" to the end of the CQL statement since it's ordered by the timestamp column in descending order. What I would like to do is allow them to specify multiple device ids and get back the latest entry for each. So, my question is, is there any way to do something like this in Cassandra:
SELECT * FROM DATA WHERE APPLICATION_ID = ? AND PARTNER_ID = ? AND LOCATION_ID = ? AND DEVICE_ID IN ? AND DATA_SCHEMA = ?
and still use something like "LIMIT 1" to only get back the latest row for each device id? Or, will I simply have to execute a separate CQL statement for each device to get the latest row for each of them?
FWIW, the table's composite key looks like this:
PRIMARY KEY ((application_id, partner_id, location_id, device_id, data_schema), activity_timestamp)
) WITH CLUSTERING ORDER BY (activity_timestamp DESC);
IN is not recommended when it has a lot of parameters: under the hood it makes requests to multiple partitions anyway, and it puts pressure on the coordinator node.
Not that you can't do it - it is perfectly legal, but most of the time it's not performant and is not suggested. If you specify LIMIT, it applies to the whole statement; you can't pick just the first item out of each partition. The simplest option would be to issue multiple queries to the cluster (every element in IN becomes one query) and put a LIMIT 1 on every one of them.
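A sketch of the per-device variant against the DATA table from the question (the literal values are placeholders):
SELECT * FROM DATA
WHERE APPLICATION_ID = 'app' AND PARTNER_ID = 'p' AND LOCATION_ID = 'loc'
  AND DEVICE_ID = 'dev-1' AND DATA_SCHEMA = 'schema'
LIMIT 1;
-- repeat with DEVICE_ID = 'dev-2', 'dev-3', ... - one query per device,
-- ideally issued asynchronously/concurrently from the client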
To be honest this was my solution in a lot of projects and it works pretty much fine. The coordinator would go to multiple nodes under the hood anyway, but with IN it would also have to do more work to gather all the results for you, might run into timeouts, etc.
In short, it's far better for the cluster and more performant if the client asks multiple times (using multiple coordinators with smaller requests) than to make a single coordinator do all the work.
This is all in case you can't afford more disk space for your cluster
Usual Cassandra solution
Data in Cassandra should be modeled query-first (ready for the query). So basically you would have one additional table with the same partition key you have now, but without the clustering column activity_timestamp, i.e.
PRIMARY KEY ((application_id, partner_id, location_id, device_id, data_schema))
double (()) is intentional.
Every time you write to your table you would also write the data to latest_entry (the table without activity_timestamp). Then you can run the query you need with IN against this table, and since it contains only the latest entry you don't need LIMIT 1 - there is only one row per partition key. That would be the usual solution in Cassandra.
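A rough sketch of that extra table and the dual write (column types and literal values are assumptions; device_id is placed last in the partition key so IN can target it, see the next answer about where IN is allowed):
CREATE TABLE latest_entry (
    application_id text,
    partner_id text,
    location_id text,
    data_schema text,
    device_id text,
    activity_timestamp timestamp,
    -- ...plus the payload columns of DATA
    PRIMARY KEY ((application_id, partner_id, location_id, data_schema, device_id))
);
-- on every write to DATA, also upsert into latest_entry;
-- the newer write simply overwrites the single row per partition key
INSERT INTO latest_entry (application_id, partner_id, location_id, data_schema, device_id, activity_timestamp)
VALUES ('app', 'p', 'loc', 'schema', 'dev-1', toTimestamp(now()));
-- the multi-device lookup then needs no LIMIT 1:
SELECT * FROM latest_entry
WHERE application_id = 'app' AND partner_id = 'p' AND location_id = 'loc'
  AND data_schema = 'schema' AND device_id IN ('dev-1', 'dev-2');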
If you are afraid of the additional writes, don't worry, they are inexpensive and CPU-bound. With Cassandra it's always "bring on the writes", I guess :)
Basically it's up to you:
multiple queries - a bit of refactoring, no additional space cost
new schema - additional inserts when writing, additional space cost
Your table definition is not suitable for such use of the IN clause. It is supported only on the last column of the partition key or the last clustering column. So you can:
swap the last two columns of the partition key so device_id comes last (see the sketch after this list)
use one query for each device id
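A sketch of the first option (payload columns omitted, table name is illustrative) - reordering the partition key so device_id becomes its last column and is therefore a legal target for IN:
CREATE TABLE data_by_device (
    application_id text,
    partner_id text,
    location_id text,
    data_schema text,
    device_id text,
    activity_timestamp timestamp,
    PRIMARY KEY ((application_id, partner_id, location_id, data_schema, device_id), activity_timestamp)
) WITH CLUSTERING ORDER BY (activity_timestamp DESC);
Note that LIMIT would still apply to the whole statement, so getting the latest row per device still needs either one query per device or the extra table from the previous answer.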
I have a table in Cassandra which stores versions of csv-files. It uses a primary key with a unique id for the version (the partition key) and a row number (the clustering key). When I insert a new version I first execute a delete statement on the partition key I am about to insert, to clean up any incomplete data. Then the data is inserted.
Now here is the issue. Even though the delete and subsequent insert are executed synchronously after one another in the application it seems that some level of concurrency still exist in Cassandra, because when I read afterwards, rows from my insert will be missing occasionally - something like 1 in 3 times. Here are some facts:
Cassandra 3.0
Consistency ALL (R+W)
Delete using the Java Driver
Insert using the Spark-Cassandra connector
Number of nodes: 2
Replication factor: 2
The delete statement I execute looks like this:
"DELETE FROM myTable WHERE version = 'id'"
If I omit it, the problem goes away. If I insert a delay between the delete and the insert, the problem is reduced (fewer rows missing). Initially I used a less restrictive consistency level, and I was sure this was the issue, but it didn't affect the problem. My hypothesis is that for some reason the delete statement is being sent to the replicas asynchronously despite the consistency level of ALL, but I can't see why this would be the case or how to avoid it.
By default, all mutations get the write time of the coordinator for that write. From the docs:
TIMESTAMP: sets the timestamp for the operation. If not specified,
the coordinator will use the current time (in microseconds) at the
start of statement execution as the timestamp. This is usually a
suitable default.
http://cassandra.apache.org/doc/cql3/CQL.html
Since the coordinator for different mutations can be different, clock skew between coordinators can cause the mutations sent through one machine to be timestamped out of order relative to another.
Since write time controls C* history this means you can have a driver which synchronously inserts and deletes but depending on the coordinator the delete can happen "before" the insert.
Example
Imagine two nodes A and B, where B is operating with a 5-second clock skew behind A.
At time 0: You insert data to the cluster and A is chosen as the coordinator. The mutation arrives at A and A assigns a timestamp (0)
There is now a record in the cluster
INSERT VALUE AT TIME 0
Both nodes contain this message and the request returns confirming the write was successful.
At time 2: You issue a delete for the data previously inserted and B is chosen as the coordinator. B assigns a timestamp of (-3) because its clock is skewed 5 seconds behind A's. This means that we end up with a statement like
DELETE VALUE AT TIME -3
We acknowledge that all nodes have received this record.
Now the global consistent timeline is
DELETE VALUE AT TIME -3
INSERT VALUE AT TIME 0
Since the insertion occurs after the delete the value still exists.
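Building on the TIMESTAMP option quoted above, one possible mitigation when a single client issues both statements is to supply the write timestamps explicitly, so the coordinators' clocks no longer decide the order (column names other than version are assumptions):
-- timestamps below are illustrative; in practice use e.g. microseconds since epoch
-- taken from the client's own clock, strictly increasing per statement
DELETE FROM myTable USING TIMESTAMP 1000 WHERE version = 'id';
INSERT INTO myTable (version, row_num, data) VALUES ('id', 1, 'payload') USING TIMESTAMP 1001;
-- the INSERT's timestamp (1001) is greater than the DELETE's (1000),
-- so the inserted row survives regardless of which coordinator handles each statement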
I had a similar problem, and I fixed it by enabling lightweight transactions for both INSERT and DELETE requests (for all queries actually, including UPDATE). It makes sure all queries to this partition are serialized through one "thread", so a DELETE won't overwrite an INSERT. For example (assuming instance_id is the primary key):
INSERT INTO myTable (instance_id, instance_version, data) VALUES ('myinstance', 0, 'some-data') IF NOT EXISTS;
UPDATE myTable SET instance_version=1, data='some-updated-data' WHERE instance_id='myinstance' IF instance_version=0;
UPDATE myTable SET instance_version=2, data='again-some-updated-data' WHERE instance_id='myinstance' IF instance_version=1;
DELETE FROM myTable WHERE instance_id='myinstance' IF instance_version=2;
// or:
DELETE FROM myTable WHERE instance_id='myinstance' IF EXISTS;
IF clauses enable lightweight transactions for each row, so all of them are serialized. Warning: LWTs are more expensive than normal calls, but sometimes they are needed, as in the case of this concurrency problem.
I have a list of unordered events and my task is to store first and last occurrences for them.
I have following column family in Cassandra:
CREATE TABLE events (
event_name TEXT,
first_occurrence BIGINT,
last_occurrence BIGINT,
PRIMARY KEY (event_name)
);
So if I have an event with the name "some_event" and an occurrence at 123456, what I want to do is something which in MySQL terms would look like this:
INSERT INTO events (event_name, first_occurrence, last_occurrence)
VALUES ('some_event', 123456, 123456)
ON DUPLICATE KEY UPDATE
first_occurrence = LEAST(first_occurrence, 123456),
last_occurrence = GREATEST(last_occurrence, 123456)
I was going to use lightweight transactions in Cassandra to accomplish it, like this:
INSERT INTO events(event_name, first_occurrence, last_occurrence) VALUES ('some_event', 123456, 123456) IF NOT EXISTS;
UPDATE events SET first_occurrence = 123456 WHERE event_name='some_event' IF first_occurrence > 123456;
UPDATE events SET last_occurrence = 123456 WHERE event_name='some_event' IF last_occurrence < 123456;
But as it turns out, CQL3 does not allow < and > operators in lightweight transactions IF clause.
So my question is, what is the pattern for doing such conditional updates?
What version of Cassandra are you running? Support for non-equal conditions with LWTs was added in 2.1.1 via CASSANDRA-6839:
cqlsh:test> UPDATE events SET first_occurrence = 123456 WHERE event_name='some_event' IF first_occurrence > 1;
[applied]
-----------
True
Cassandra does not do read-before-write, with only two exceptions: counters and "lightweight transactions". As a result, you will not be able to reliably implement your scenario directly in Cassandra. Even if you read the values out and then make an update based on these values, you may overwrite someone else's changes, since there is no locking and isolation in Cassandra, and eventual consistency makes it even worse.
So if you need to implement something like this, you will need to do it outside of Cassandra. Create a synchronization layer that provides a central point for Cassandra writes, and make this layer responsible for the logic. Just make sure that no writes bypass this layer.