I have a question about how Cassandra returns a value in the case of LOCAL_QUORUM. If for some reason there is no quorum consensus among the values returned by the individual nodes, will Cassandra return no value at all, or the latest value based on the timestamp?
Cassandra does not use a consensus of the values, for quorum reads, to determine which value to return to the client; it always uses the timestamp to determine the most recent value.
This most recent value is then used to overwrite the values on the other replicas, via read repair, if the values do not match.
Cassandra always works based on timestamps and returns the latest value to the client. If the digest check finds a mismatch, read repair then updates the replicas for that partition.
https://academy.datastax.com/support-blog/read-repair
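As a minimal illustration (the table and values here are hypothetical, not from the question), you can watch the timestamp that drives this resolution with the WRITETIME function in cqlsh:

// cqlsh command: read/write at LOCAL_QUORUM, as in the question
CONSISTENCY LOCAL_QUORUM;

CREATE TABLE IF NOT EXISTS users (user_id text PRIMARY KEY, email text);

INSERT INTO users (user_id, email) VALUES ('u1', 'old@example.com');
INSERT INTO users (user_id, email) VALUES ('u1', 'new@example.com');

// WRITETIME exposes the microsecond timestamp Cassandra compares;
// the value from the second insert wins ("last write wins").
SELECT email, WRITETIME(email) FROM users WHERE user_id = 'u1';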
The DataStax documentation says:
During a write, Cassandra adds each new row to the database without
checking on whether a duplicate record exists. This policy makes it
possible that many versions of the same row may exist in the database.
As far as I understand, that means there may be more than one non-compacted SSTable containing different versions of the same row. How does Cassandra handle the duplicated data when it reads from these SSTables?
@quangh: As already stated in the documentation:
This is why Cassandra performs another round of comparisons during a read process. When a client requests data with a particular primary key, Cassandra retrieves many versions of the row from one or more replicas. The version with the most recent timestamp is the only one returned to the client ("last-write-wins").
Every write operation has an associated timestamp. In this case different nodes (or SSTables) will hold different versions of the same row, but during a read Cassandra picks the row with the latest timestamp. I hope this answers your question.
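To make that concrete, here is a sketch (hypothetical table; nodetool flush is used only to force each version into its own SSTable):

CREATE TABLE IF NOT EXISTS rows_demo (id int PRIMARY KEY, val text);

INSERT INTO rows_demo (id, val) VALUES (1, 'version-A');
// run `nodetool flush` here so this write lands in its own SSTable
INSERT INTO rows_demo (id, val) VALUES (1, 'version-B');
// flush again: two SSTables now hold different versions of row 1

// The read merges the SSTables (and memtable) and returns only the
// cell with the most recent timestamp: 'version-B'.
SELECT val FROM rows_demo WHERE id = 1;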
I am using the DataStax Node.js driver for Cassandra (https://github.com/datastax/nodejs-driver).
I am wondering if I can get the number of rows updated after executing an update statement.
Thanks,
An update in Cassandra is treated as just another insert. In other words, there is no read before write (update, in this case) in Cassandra. Updates are simply writes to another immutable SSTable, and during a read Cassandra stitches them together; it is at read time that the latest value for each column is chosen.
So in short, there isn't a way in Cassandra to deduce the number of rows updated.
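One partial workaround, if you only need to know whether a single row was actually touched rather than a count: make the update conditional. A conditional (lightweight-transaction) statement returns an [applied] column, which the Node.js driver exposes via ResultSet#wasApplied(). A sketch, with hypothetical table and column names:

// IF EXISTS turns this into a lightweight transaction; the result
// set then contains an [applied] boolean instead of being empty.
UPDATE users SET email = 'new@example.com'
WHERE user_id = 'u1'
IF EXISTS;

Note that LWTs are significantly more expensive than plain writes, so this only makes sense when the applied/not-applied signal is worth the cost.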
I have a table in Cassandra which stores versions of csv-files. It uses a primary key with a unique id for the version (the partition key) and a row number (the clustering key). When I insert a new version I first execute a delete statement on the partition key I am about to insert, to clean up any incomplete data. Then the data is inserted.
Now here is the issue. Even though the delete and subsequent insert are executed synchronously, one after the other in the application, it seems that some level of concurrency still exists in Cassandra, because when I read afterwards, rows from my insert are occasionally missing - something like 1 in 3 times. Here are some facts:
Cassandra 3.0
Consistency ALL (R+W)
Delete using the Java Driver
Insert using the Spark-Cassandra connector
Number of nodes: 2
Replication factor: 2
The delete statement I execute looks like this:
"DELETE FROM myTable WHERE version = 'id'"
If I omit it, the problem goes away. If I insert a delay between the delete and the insert, the problem is reduced (fewer rows missing). Initially I used a less restrictive consistency level, and I was sure this was the issue, but it didn't affect the problem. My hypothesis is that for some reason the delete statement is being sent to the replica asynchronously despite the consistency level of ALL, but I can't see why this would be the case or how to avoid it.
By default, all mutations get the write time of the coordinator handling that write. From the docs:
TIMESTAMP: sets the timestamp for the operation. If not specified,
the coordinator will use the current time (in microseconds) at the
start of statement execution as the timestamp. This is usually a
suitable default.
http://cassandra.apache.org/doc/cql3/CQL.html
Since the coordinator for different mutations can be different, clock skew between coordinators can cause the mutations handled by one machine to be timestamped out of order relative to another.
Since write time controls C* history, this means you can have a driver which synchronously inserts and deletes, but depending on the coordinator, the delete can happen "before" the insert.
Example
Imagine two nodes A and B; B is operating with a clock 5 seconds behind A's.
At time 0: You insert data to the cluster and A is chosen as the coordinator. The mutation arrives at A and A assigns a timestamp (0)
There is now a record in the cluster
INSERT VALUE AT TIME 0
Both nodes contain this message and the request returns confirming the write was successful.
At time 2: You issue a delete for the data previously inserted, and B is chosen as the coordinator. B assigns a timestamp of (-3) because its clock is skewed 5 seconds behind the time on A. This means that we end up with a statement like
DELETE VALUE AT TIME -3
We acknowledge that all nodes have received this record.
Now the global consistent timeline is
DELETE VALUE AT TIME -3
INSERT VALUE AT TIME 0
Since, in timestamp order, the insertion occurs after the delete, the value still exists.
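One way to take the coordinators' clocks out of the picture (a sketch, not the only fix; row_number and data are column names assumed from the question) is to supply the timestamps yourself with USING TIMESTAMP, so the client's clock orders the delete and the insert no matter which node coordinates them:

// Timestamps are in microseconds; the values here are illustrative.
DELETE FROM myTable USING TIMESTAMP 1470000000000000
WHERE version = 'id';

INSERT INTO myTable (version, row_number, data)
VALUES ('id', 1, 'some-data')
USING TIMESTAMP 1470000000000001;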
I had a similar problem, and I fixed it by enabling lightweight transactions (LWT) for both INSERT and DELETE requests (for all queries actually, including UPDATE). This makes sure all queries to a given partition are serialized through one "thread", so the DELETE won't overwrite the INSERT. For example (assuming instance_id is the primary key):
INSERT INTO myTable (instance_id, instance_version, data) VALUES ('myinstance', 0, 'some-data') IF NOT EXISTS;
UPDATE myTable SET instance_version=1, data='some-updated-data' WHERE instance_id='myinstance' IF instance_version=0;
UPDATE myTable SET instance_version=2, data='again-some-updated-data' WHERE instance_id='myinstance' IF instance_version=1;
DELETE FROM myTable WHERE instance_id='myinstance' IF instance_version=2;
//or:
DELETE FROM myTable WHERE instance_id='myinstance' IF EXISTS
IF clauses enable lightweight transactions for each row, so all of these statements are serialized. Warning: LWT is more expensive than normal calls, but sometimes it is needed, as in the case of this concurrency problem.
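For reference, when a condition fails the statement is simply not applied, and the result row reports the current state, roughly like this in cqlsh (the exact output format may vary by version):

UPDATE myTable SET instance_version=1, data='some-updated-data'
WHERE instance_id='myinstance' IF instance_version=99;

//  [applied] | instance_version
// -----------+------------------
//      False |                2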
This is as per the official documentation.
All writes are sequential, which is the primary reason that writes perform so well in
Cassandra. No reads or seeks of any kind are required for writing a value to Cassandra
because all writes are append operations.
I am confused because, in the case of an insert with a duplicate primary key, won't Cassandra first have to search the memtable (or the SSTables, if the data has been flushed)?
So if a user id with value 123 is already present and we insert another row with 123, does it fail because it internally does a read based on that key? This is the doubt I have; can someone please clarify?
There is no notion of duplicate keys in Cassandra. Every change written to Cassandra has a timestamp, and Cassandra does timestamp resolution, meaning the data with the latest timestamp always wins and is returned. On the read path, the content for the key from the SSTables is merged with the content for the same key in the memtable, if it exists, and the data with the latest timestamp is returned. It is worth noting that each column has its own timestamp.
For example:
Let's assume that at time 139106495223456 you write the following:
123 => {column1:foo column1_timestamp:139106495223456}
Then, a few microseconds later (139106495223470), you write to the same key:
123 => {column1:bar column1_timestamp:139106495223470}
Both operations will succeed. When you read the key, the version with column1:bar is returned because it has the latest timestamp.
Now you may wonder how this works with deletes. Deletes are written the same way, except the column/key being deleted is marked with a tombstone. If the tombstone has the later timestamp, the row or column is considered deleted.
You may also wonder how this plays with sequential writes to disk, since these tombstones and old columns still consume space. That is true, and it is why compaction exists: it takes care of merging SSTables and removing expired tombstones.
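A small sketch of the tombstone rule (hypothetical table; explicit USING TIMESTAMP is used only to make the ordering visible):

CREATE TABLE IF NOT EXISTS kv (id int PRIMARY KEY, val text);

INSERT INTO kv (id, val) VALUES (2, 'foo') USING TIMESTAMP 100;
DELETE FROM kv USING TIMESTAMP 200 WHERE id = 2;

// The tombstone (t=200) shadows the cell (t=100): no row returned.
SELECT val FROM kv WHERE id = 2;

// A later write (t=300) outranks the tombstone and the row reappears.
INSERT INTO kv (id, val) VALUES (2, 'bar') USING TIMESTAMP 300;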
You can read more about Cassandra write/read path here:
http://www.planetcassandra.org/blog/category/Cassandra%20read%20path
http://www.datastax.com/docs/1.1/dml/about_writes
Is there some timestamp/counter that can be used to validate that in a read-modify-write cycle, the data in the row did not change between reading and modifying?
In other words, can I read some kind of ID while reading the row, and when I write it back tell Cassandra what that ID was, and the write then fails if the ID changed since then? (Which amounts to saying that some other write took place after I read the data)
Each column in Cassandra is a tuple (a triplet) containing a name, a value and a timestamp. The timestamp of the column represents the last time it was modified. If you have hundreds of nodes, whichever node has an update with the most recent timestamp will win. This is how eventual consistency is achieved.
zznate has a good presentation, Introduction to Apache Cassandra for Java Developers, where this topic is referenced (slide 37)
Accessing timestamp of a Cassandra column
In summary, you don't need "some kind of ID" when you have the ability to retrieve the timestamp for a given column, representing the last time it was modified. However, at scale, with hundreds of nodes, how can you be sure that the node you are connecting to has the most up-to-date column? (Refer back to the zznate presentation.)
The point is, you can't, without enabling transactions:
Cassandra - transaction support
Cassandra Transaction with ZooKeeper - Does this work?
how to integrate cassandra with zookeeper to support transactions
And many more: cassandra & transactions
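Note that Cassandra 2.0+ lightweight transactions (the IF-conditioned statements shown earlier on this page) provide exactly this compare-and-set behavior without ZooKeeper. A sketch with a hypothetical table and version column:

// Read the row together with its version counter.
SELECT data, version FROM doc_store WHERE id = 'row1';

// Write back only if nobody modified the row in between; an
// [applied] = False result means the version changed under you.
UPDATE doc_store SET data = 'modified', version = 8
WHERE id = 'row1' IF version = 7;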