How does the DataStax Java API handle consistency level ALL in Cassandra?

Setup: I have a 4-node Cassandra cluster (same datacenter). The replication factor is 3, and write consistency is set to ALL.
As I understand it, Cassandra doesn't have a master node, so I can write data to any node I want. Let's say I have 3 nodes A, B and C, and I write record 123 with a value of 4 to node A.
Question 1: Will the execute() method of the Session object block until the data has been replicated to all replicas?
Another situation: let's say record 123 with a value of 5 is also written to node B, 100 milliseconds after the request inserting record 123 with a value of 4 arrived at node A.
Question 2: When B is a replica of A, how does Cassandra handle this situation in its architecture? Will each node use its internal clock to decide which record arrived first? Or will all replicas share the same lock for writing data?
Question 3: When B is not a replica of A, and my read consistency is set to ALL, if I query the value of record 123 on node A or B at random, how does Cassandra handle this situation?
I'm new to Cassandra, so any answer or help is highly appreciated. Thank you very much.

Will the execute() method of the Session object block until the data has been replicated to all replicas?
The session object will be blocked until N acknowledgements of your mutation(s) are received, where N depends on the chosen consistency level. In your case, since you're using ALL, the client will block until acknowledgements are received from all replicas.
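For reference, here is a minimal sketch of what that looks like with the 3.x DataStax Java driver; the keyspace and table names (my_ks, records) are hypothetical placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class WriteWithClAll {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) { // hypothetical keyspace

            Statement insert = new SimpleStatement(
                    "INSERT INTO records (id, value) VALUES (123, 4)") // hypothetical table
                    .setConsistencyLevel(ConsistencyLevel.ALL);

            // Blocks until every replica acknowledges the write. If a replica
            // is known to be down, the coordinator fails fast with an
            // UnavailableException instead of waiting.
            session.execute(insert);
        }
    }
}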
When B is a replica of A, how does Cassandra handle this situation in its architecture? Will each node use its internal clock to decide which record arrived first? Or will all replicas share the same lock for writing data?
The coordinator node (the one which receives the request) will dispatch the write, in parallel, to all replicas. With modern drivers like the Java driver, most of the time the coordinator node is chosen so that it is a replica for the partition being inserted, to avoid one extra network hop.
The role of the coordinator is also to set a timestamp on each column of your write. The same timestamp is sent to all replicas, and when two writes conflict on the same column, the value with the higher timestamp wins (last write wins); no locking between replicas is involved.
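As a sketch (3.x Java driver), the client can even supply that timestamp itself; whichever write carries the higher timestamp wins for the same cell. The records table is again a hypothetical placeholder:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

class TimestampedWrite {
    static void write(Session session, long value, long microsSinceEpoch) {
        Statement stmt = new SimpleStatement(
                "INSERT INTO records (id, value) VALUES (123, ?)", value) // hypothetical table
                .setConsistencyLevel(ConsistencyLevel.ALL)
                .setDefaultTimestamp(microsSinceEpoch); // overrides the coordinator-assigned timestamp
        session.execute(stmt);
    }
}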
When B is not a replica of A, and my read consistency is set to ALL, if I query the value of record 123 on node A or B at random, how does Cassandra handle this situation?
In this case, the node which receives the request, called the coordinator, will act as a proxy, forwarding the request to the appropriate replica(s) and forwarding the response(s) it receives back to the client.
Each node knows the topology of the whole cluster (token ranges, IP addresses), so any node can play the role of coordinator at any time.
More details about how the data distribution is handled in Cassandra here: http://www.slideshare.net/doanduyhai/cassandra-introduction-apache-con-2014-budapest/18

Related

How does Cassandra handle inconsistencies between two replicas?

I have a simple question about the strategy Cassandra uses when the following scenario happens.
Scenario
At T1, replica 1 receives the write mutation like name = amit, language = english
At T1 + 1, replica 2 receives the update like language = japanese where name = amit
Assume the write has not yet been replicated to replica 2 when the update for the record arrives. How does Cassandra handle this scenario?
My guess: maybe replica 2 will check the Lamport timestamp of the update message (say it is 102) and ask replica 1 for any record with a timestamp less than 102, so that it (replica 2) can apply those first and then apply the update statement.
Any help would be appreciated.
Under the hood (for normal operations, not LWTs), both INSERTs and UPDATEs are upserts: they don't depend on the previous state of the data. When you perform an UPDATE, Cassandra just writes the corresponding value without checking whether the primary key already exists, and that's all. And even if an earlier operation arrives later, Cassandra compares the write timestamps to resolve the conflict.
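A small sketch of that upsert behaviour against the scenario's data (assuming a hypothetical users table keyed by name and an already-connected 3.x driver session):

import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

class UpsertDemo {
    static void demo(Session session) {
        // An UPDATE on a row that doesn't exist still writes the value:
        // INSERT and UPDATE are both upserts, with no read-before-write.
        session.execute("UPDATE users SET language = 'japanese' WHERE name = 'amit'");

        // Every cell carries a write timestamp, visible via WRITETIME();
        // conflicts between replicas are resolved in favour of the highest one.
        Row row = session.execute(
                "SELECT language, WRITETIME(language) FROM users WHERE name = 'amit'").one();
        System.out.println(row.getString(0) + " written at " + row.getLong(1));
    }
}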
For your case, it will go as follows:
replica 1 receives the write and retransmits it to the other replicas in the cluster, including replica 2. If replica 2 isn't available at that moment, the mutation will be stored as a hint and replayed when replica 2 comes back up.
replica 2 may receive new updates and will also retransmit them to the other replicas.
The coordinator deals with inconsistencies depending on the consistency level (CL) used. There are also other nuanced behaviours which, again, are tied to the consistency level of read and write requests.
CASE A - Failed writes
If your application uses a weak consistency of ONE or LOCAL_ONE for writes, the coordinator will (1) return a successful write response to the client/driver even if just one replica acknowledges the write, and (2) will store a hint for the replica(s) which did not respond.
When the replica(s) is back online, the coordinator will (3) replay the hint (resend the write/mutation) to the replica to keep it in sync with other replicas.
If your application uses a strong consistency of LOCAL_QUORUM or QUORUM for writes, the coordinator will (4) return a successful write response to the client/driver when the required number of replicas have acknowledged the write. If any replicas did not respond, the same hint storage in (2) and hint replay in (3) applies.
CASE B - Read with weak CL
If your application issues a read request with a CL of ONE or LOCAL_ONE, the coordinator will only ever query one replica and will return the result from that one replica.
Since the app requested the data from just one replica, the data does NOT get compared to any other replicas. This is the reason we recommend using a strong consistency level like LOCAL_QUORUM.
CASE C - Read with strong CL
For a read request with a CL of LOCAL_QUORUM against a keyspace with a local replication factor of 3, the coordinator will (5) request the data from 2 replicas. If the replicas don't match, the (6) data with the latest timestamp wins, and (7) a read-repair is triggered to repair the inconsistent replica.
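For illustration, a quorum read might look like this with the 3.x Java driver (the records table is a hypothetical placeholder); steps (5)-(7) all happen server-side:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

class QuorumRead {
    static Row read(Session session) {
        // With RF = 3, the coordinator consults 2 replicas; if they disagree,
        // the value with the newest timestamp is returned and a read repair
        // is sent to the stale replica behind the scenes.
        return session.execute(new SimpleStatement(
                "SELECT value FROM records WHERE id = 123") // hypothetical table
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)).one();
    }
}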
For more info, see the following documents:
How read requests are accomplished in Cassandra
How writes are accomplished in Cassandra

Does Cassandra read have inconsistency?

I am new to Cassandra and am trying to understand how it works. Say I write to a number of nodes. My understanding is that, depending on the hash value of the key, it is decided which node owns the data, and then replication happens. While reading the data, the hash of the key determines which node has the data, and that node responds. Now my question is: if reading and writing happen on the same set of nodes, which always have the data, how does read inconsistency occur and how can Cassandra return stale data?
For tuning consistency, Cassandra allows you to set the consistency level on a per-query basis.
Now for your question, let's assume the consistency level is ONE and the replication factor is 3.
During a WRITE request, the coordinator sends the write to all replicas that own the row being written. As long as all replica nodes are up and available, they will receive the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond with a success acknowledgment for the write to be considered successful. Success means that the data was written to the commit log and the memtable.
For example, in a single data center 10 node cluster with a replication factor of 3, an incoming write will go to all 3 nodes that own the requested row. If the write consistency level specified by the client is ONE, the first node to complete the write responds back to the coordinator, which then proxies the success message back to the client. A consistency level of ONE means that it is possible that 2 of the 3 replicas could miss the write if they happened to be down at the time the request was made. If a replica misses a write, Cassandra will make the row consistent later using one of its built-in repair mechanisms: hinted handoff, read repair, or anti-entropy node repair.
By default, hints are saved for three hours after a replica fails because if the replica is down longer than that, it is likely permanently dead. You can configure this interval of time using the max_hint_window_in_ms property in the cassandra.yaml file. If the node recovers after the save time has elapsed, run a repair to re-replicate the data written during the down time.
Now when a READ request is performed, the coordinator node sends the request to the replica that can currently respond the fastest (hence it might go to any one of the 3 replicas).
Now imagine a situation where the data has not yet been replicated to the third replica, and during the READ that replica is selected (the chances are very small): then you get inconsistent data.
This scenario assumes all nodes are up. If one of the nodes is down and read repair has not happened once the node is back up, it can compound the issue.
READ With Different CONSISTENCY LEVEL
READ Request in Cassandra
Consider a scenario where the CL is QUORUM, in which case 2 out of 3 replicas must respond. The write request will go to all 3 replicas as usual; if the write fails on 2 replicas and succeeds on 1, Cassandra returns a failure to the client. Since Cassandra does not roll back, the record continues to exist on the successful replica. Now, when a read comes in with CL=QUORUM, the read request is forwarded to 2 replica nodes, and if one of them is the previously successful one, Cassandra returns the new record, as it has the latest timestamp. But from the client's perspective this record was never written, since Cassandra returned a failure during the write.
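From the client side, that failure surfaces as a driver exception; here is a sketch (3.x Java driver, hypothetical records table) of what the application sees:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.UnavailableException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

class QuorumWrite {
    static void write(Session session) {
        try {
            session.execute(new SimpleStatement(
                    "INSERT INTO records (id, value) VALUES (123, 4)") // hypothetical table
                    .setConsistencyLevel(ConsistencyLevel.QUORUM));
        } catch (WriteTimeoutException e) {
            // QUORUM wasn't reached in time, but any replica that did apply
            // the mutation keeps it -- Cassandra never rolls a write back.
        } catch (UnavailableException e) {
            // Fewer than 2 replicas were alive, so the write wasn't attempted.
        }
    }
}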

Cassandra - data loss on a dead node with CL = ONE

I'm a newbie to Cassandra and have a question on the commit log which is configured to use periodic mode (10 seconds).
Suppose we have a node that processes a request with CL = ONE and RF = 3. If the node is in a state in which the commit log has not been flushed to disk and replication of the data is also pending, would we lose data if the node crashes in this state?
Another follow-up question: which node is responsible for replicating the data to the other nodes when RF = 3? Is it the coordinator node, or some other node that processes the request, depending on the consistency level?
I think following link might be of use to you:
https://www.ecyrd.com/cassandracalculator/
Yes, data loss is possible in this scenario, because the data would not have reached the other nodes, so no copies exist; it is as if the data was never there. That said, this window is actually quite small, because with RF 3 the other nodes will receive the insert within milliseconds (unless there is some really heavy load on the node).
All of the RF copies (per single client request) are handled by the coordinator. Also, if a replica is not available when the coordinator needs to write to it, the coordinator stores the data as a hint.
So to sum it up: yes, data loss is possible, but the probability is really small.
With CL=ONE, when a coordinator crashes and goes down uncleanly, there is a window in which data loss is possible, before the mutation is sent to replicas and the commit log is flushed. It's a pretty small window and unlikely, but if it's a concern, use LOCAL_QUORUM or the batch commit log mode.
The coordinator will send the data to all replicas and store hints for whichever replicas haven't acknowledged.

Read repair for cassandra with quorum

I need to understand read repair for Cassandra 3.0. For example, I have three nodes A, B and C, and my replication factor is 3. Now, I wrote with QUORUM and the write succeeded on nodes A and B, so the client receives success, but somehow the data was not written to node C (it was down, and the hint window elapsed).
I have not run a manual repair, and my read_repair_chance is 0.1.
A few days later, node A goes down, leaving me with nodes B and C. So if I issue a read query with QUORUM, will read repair always write the data to node C and return success to the client, or is there a scenario where the client can receive an "unable to achieve consistency level" error?
If 2 out of 3 replicas are up, then QUORUM consistency can be achieved, so the client will be able to read the data. Since one of the nodes doesn't have the data, a read repair will happen.
As per my understanding (I'm new to Cassandra), whenever a query is executed, the coordinator node checks whether the desired number of replicas (the requested consistency) can respond to the query. If so, the client receives the most recent version of the data (the timestamps of the data returned by each node are compared), and that recent version is then written to the remaining replicas via read repair in case of a mismatch.

Cassandra not working when one of the nodes is down

I have a development Cassandra cluster of two nodes [let's call them NodeA and NodeB]. I also have a script that is continuously sending data to NodeA. I created the keyspace with the following parameters:
CREATE KEYSPACE test_database WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
Now, for some reason, NodeB stops after some time. But the issue is, as soon as NodeB stops, the script that is sending data to NodeA starts getting data insertion errors.
Can anyone point out a probable reason for this?
Update: Both the nodes are seed nodes.
How Cassandra handles data partitioning
Each key in Cassandra can be converted to a token. When you install your cluster, the nodes calculate which range of tokens they will accept.
Let's take a simple example:
You have two nodes, and tokens that go from 0 to 9. A simple partitioning would be: node A stores every token in 0-4 and node B stores every token in 5-9.
How Cassandra works for writes
You choose a coordinator (in your case node A), which receives the data. This node then computes a token for the key. As seen in the first example, every node has a range of tokens assigned to it. So imagine the key is converted to token 4: then the data goes to node A (here the coordinator). If the token is 8, the data will be sent to node B.
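You can see this token-to-node mapping from the client: the 3.x Java driver exposes the cluster metadata, so a sketch like the one below (assuming a bigint partition key, which is a hypothetical choice here) prints the replicas that own a given key:

import java.nio.ByteBuffer;
import java.util.Set;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.ProtocolVersion;
import com.datastax.driver.core.TypeCodec;

class ReplicaLookup {
    static void printReplicas(Cluster cluster) {
        // Serialize the partition key the same way the server hashes it,
        // then ask the driver which nodes own the resulting token.
        ByteBuffer key = TypeCodec.bigint()
                .serialize(123L, ProtocolVersion.NEWEST_SUPPORTED);
        Set<Host> replicas = cluster.getMetadata().getReplicas("test_database", key);
        replicas.forEach(h -> System.out.println(h.getAddress()));
    }
}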
What is Cassandra's data replication factor
The replication factor is how many times your data will be stored in the cluster. For a single datacenter with no racks (your case), the data is first sent to the node that owns the token associated with the key, and the replicas are sent to the next nodes in the ring.
In case of failure of one node, the replicas help the node restore its data.
In your case, there are no extra replicas, so if a node is down, Cassandra can't store the data and throws an error. With a replication factor of 2, Cassandra would be able to store a replica on node A and would not fail.
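A sketch of that fix, assuming a connected 3.x driver session: raise the replication factor of the keyspace from the question, then repair so existing rows are copied to their new replica.

import com.datastax.driver.core.Session;

class RaiseReplicationFactor {
    static void raise(Session session) {
        // With RF = 2, every partition has a second copy, so losing one of
        // the two nodes no longer makes any partition unwritable.
        session.execute("ALTER KEYSPACE test_database WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': '2'}");
        // Afterwards run `nodetool repair test_database` on each node so the
        // data written before the change is replicated too.
    }
}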
Cassandra's Replication Factor:
Let's say we have 'n' as the replication factor, which means the given input data will be stored on/retrieved from 'n' nodes.
If you set the replication factor to '1', only one node will have the data.
Partitioning:
Let's say we have 2 nodes. Whenever you insert data, both of these nodes will hold some of it, based on the configured partitioning algorithm.
For example:
You insert 10 records; based on the hashing and partitioning algorithm, Cassandra chooses which node each record needs to be written to. Of course, the identification of the node is done by the coordinator :)
Durable Writes:
By default, Cassandra always writes to the commit log before applying the write to the memtable (which is later flushed to disk as an SSTable). If you set durable_writes to false, writes bypass the commit log and go only to the memtable, so data that hasn't been flushed yet can be lost if the node crashes.
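For completeness, durable_writes is set per keyspace; a sketch with a connected session (the keyspace name scratch_ks is hypothetical):

import com.datastax.driver.core.Session;

class NonDurableKeyspace {
    static void create(Session session) {
        // durable_writes = false skips the commit log: writes live only in
        // the memtable until it is flushed to an SSTable, so a crash can
        // lose the most recent data. Use only for data you can afford to lose.
        session.execute("CREATE KEYSPACE IF NOT EXISTS scratch_ks WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': '1'} "
                + "AND durable_writes = false");
    }
}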
For the problem you have mentioned: for example, let's say you are inserting 10 rows.
For simplicity, we can make the partitioning/hashing calculation n/2.
So Cassandra's coordinator node splits your data into two pieces (for a simple calculation, 10/2), puts the first half on the first node, which succeeds, and tries to put the second half on the second node (writing to its commit log); since that node is unavailable, it throws an error.
So how do we fix this issue? Let's say I want to batch-insert multiple insert queries while 1 node in the cluster is down. It returns:
Connection to Cassandra cluster associated with connection cs1 not available due to Host not available. Host Address: cassandra1
If your table is not a counter table, you can use a consistency level of ANY, which gives high availability for writes.
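As a sketch with the 3.x Java driver (hypothetical records table), the consistency level is simply set on the statement:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

class AnyWrite {
    static void write(Session session) {
        // CL = ANY succeeds even when no replica for the key is reachable:
        // the coordinator stores a hint and replays it once a replica is up.
        session.execute(new SimpleStatement(
                "INSERT INTO records (id, value) VALUES (123, 4)") // hypothetical table
                .setConsistencyLevel(ConsistencyLevel.ANY));
    }
}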
Refer to this to learn more about it: https://www.datastax.com/blog/2011/05/understanding-hinted-handoff-cassandra-08
