Cassandra tunable consistency exercise

I need some help solving an exercise for school. It's about tunable consistency in Cassandra.
Given a cluster of 15 nodes, complete the following table. In case of multiple possibilities, give all of them. The CL values are: ANY, ONE, QUORUM, ALL.
Thank you very much for your help!
P.S. I'm sure that we need the following rule to solve this: nodes read + nodes written > replication factor for the result to be consistent.
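A minimal sketch of how that rule can be applied, assuming a single data center; treating ANY as 0 guaranteed replicas is my own assumption, since an ANY write can succeed on a hint alone:

    # Sketch (not part of the original exercise): check the R + W > RF rule.
    # ANY is mapped to 0 acknowledged replicas by assumption, because an ANY
    # write can succeed via a hint without any replica holding the data.
    def nodes_required(level, rf):
        """Replica nodes that must acknowledge for a given consistency level."""
        return {
            "ANY": 0,               # hint only; no replica guaranteed
            "ONE": 1,
            "QUORUM": rf // 2 + 1,  # floor(RF / 2) + 1
            "ALL": rf,
        }[level]

    def is_strongly_consistent(write_cl, read_cl, rf):
        """True when every read is guaranteed to overlap an up-to-date replica."""
        return nodes_required(write_cl, rf) + nodes_required(read_cl, rf) > rf

    # Example for the table: RF = 3 on the 15-node cluster.
    for w in ("ANY", "ONE", "QUORUM", "ALL"):
        for r in ("ONE", "QUORUM", "ALL"):  # ANY is not a read level
            print(w, r, is_strongly_consistent(w, r, rf=3))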

This document here should outline the consistency levels and how they function:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
I've copied some of the content here for clarity if the link becomes broken in the future
Write consistency levels
ALL
A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
EACH_QUORUM
Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in each datacenter.
QUORUM
A write must be written to the commit log and memtable on a quorum of replica nodes across all datacenters.
LOCAL_QUORUM
Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same datacenter as the coordinator. Avoids latency of inter-datacenter communication.
ONE
A write must be written to the commit log and memtable of at least one replica node.
TWO
A write must be written to the commit log and memtable of at least two replica nodes.
THREE
A write must be written to the commit log and memtable of at least three replica nodes.
LOCAL_ONE
A write must be sent to, and successfully acknowledged by, at least one replica node in the local datacenter.
ANY
A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
Read consistency levels
ALL
Returns the record after all replicas have responded. The read operation will fail if a replica does not respond.
EACH_QUORUM
Not supported for reads.
QUORUM
Returns the record after a quorum of replicas from all datacenters has responded.
LOCAL_QUORUM
Returns the record after a quorum of replicas in the same datacenter as the coordinator has responded. Avoids latency of inter-datacenter communication.
ONE
Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
TWO
Returns the most recent data from two of the closest replicas.
THREE
Returns the most recent data from three of the closest replicas.
LOCAL_ONE
Returns a response from the closest replica in the local datacenter.
SERIAL
Allows reading the current (and possibly uncommitted) state of data without proposing a new addition or update. If a SERIAL read finds an uncommitted transaction in progress, it will commit the transaction as part of the read. Similar to QUORUM.
LOCAL_SERIAL
Same as SERIAL, but confined to the datacenter.
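As a rough illustration of how these levels are chosen per request, here is a sketch using the DataStax Python driver; the contact point, keyspace and table names are placeholders, not from the linked documentation:

    # Sketch: setting the consistency level per statement with the
    # DataStax Python driver. Keyspace/table names are placeholders.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("demo_ks")

    # Write at QUORUM: a majority of replicas must acknowledge.
    insert = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    session.execute(insert, (42, "alice"))

    # Read at ONE: the closest replica answers; read repair may run behind.
    select = SimpleStatement(
        "SELECT name FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE,
    )
    row = session.execute(select, (42,)).one()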

I think this should be the correct answer. Please correct me if I'm wrong. Ignore the Dutch sentences in the table; I don't think they will pose any problems for English readers.

Related

Why is Cassandra considered partition tolerant under the CAP theorem even though we can isolate the coordinator?

Here is the definition of partition tolerance by Gilbert and Lynch
When a network is partitioned, all messages sent from nodes in one
component of the partition to nodes in another component are lost.
Let's divide the cluster into two partitions: the first one contains only the coordinator, the second one contains all the other nodes. This way the coordinator will not be able to contact any replicas and will respond with an error. Is that allowed for partition-tolerant systems?
More specifically I think the question is which of the other two CAP attributes does Cassandra retain in the face of such a Partition.
The answer depends on the configured consistency level. For writes there is the ANY consistency level. At this consistency level, as long as hinted handoff is enabled, the coordinator will record the write and maintain Availability. Clients connected to other coordinators will not be able to see the updated value until the partition is resolved, so reads will not be Consistent. If a stronger consistency level is chosen, then the client is explicitly configuring Consistency over Availability.
So can Cassandra (given that it does not necessarily replicate all data to all nodes) be considered AP when a read coordinator is alone in a partition? If it responds with an error, that sounds like Consistency to me; if it responds with an empty result set because the data is not in its partition, that would be Availability. Since the weakest read consistency level is ONE, requiring at least one replica to respond, Cassandra opts for the former: if the coordinator is not itself one of the replicas owning the requested data, the read will time out and not be Available. As with writes, any stronger read consistency level explicitly configures Cassandra to behave more Consistently at the expense of Availability.
So the "coordinator" node isn't a long-lasting or "leader"-like definition. It changes with practically every query. If there was a non-token-aware operation which needed a coordinator node, and that coordinator was suddenly partitioned-off from the rest, then that one query would fail.
The next query (or a retry) would pick a new node as a coordinator. The only issue, would be that some data rows will be short by one replica (data stored on the partitioned node). But as long as you're querying by ONE and have a RF >= 2, the cluster will continue on like nothing happened.
So "yes," Cassandra is definitely partition-tolerant.
Note: This is why it's important to use a token-aware load balancing policy. That way the driver picks one of the nodes containing the required data as the "coordinator." At consistency ONE, the operation is completed locally, and a network hop is taken out of the equation.
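For illustration, a token-aware policy in the Python driver might be wired up roughly like this (the data center name is a placeholder, and newer driver versions prefer configuring this through execution profiles):

    # Sketch: token-aware routing with the DataStax Python driver.
    # 'DC1' is a placeholder for the local data center name.
    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    policy = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="DC1"))
    cluster = Cluster(["127.0.0.1"], load_balancing_policy=policy)
    session = cluster.connect()
    # With this policy the driver picks a replica that owns the partition as
    # the coordinator, so a CL=ONE read or write avoids an extra network hop.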

How does Cassandra guarantee eventual consistency in cross region replication?

I cannot find much documentation about it. The only thing I can find is that when the consistency level is not set to EACH_QUORUM, cross region replication is done asynchronously.
But in asynchronous style, is it possible to lose messages? How does Cassandra handle losing messages?
If you don't use EACH_QUORUM and a destination node which would accept a write is down, then the coordinator node saves the writes as "hinted handoffs".
When the destination node becomes available again, the coordinator replays the hinted handoffs on that node.
For any occasion when hinted handoffs are lost, you have to run a repair on your cluster.
Also, be aware that hints are only stored for a maximum of 3 hours by default.
For further info see documentation at:
http://www.datastax.com/dev/blog/modern-hinted-handoff
http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesHintedHandoff.html
Hope this helps.
When you issue a write in Cassandra, the coordinator sends the write to all online replicas, and then blocks. The duration of the block corresponds to the consistency level - if you say "ALL", it blocks until all nodes ack the write. If you use "EACH_QUORUM", it blocks until a quorum of nodes in each datacenter acks the write.
For any replica that didn't ack the write, the coordinator will write a hint, and attempt to deliver that hint later (minutes, hours, no guarantee).
Note, though, that the writes were all sent at the same time - what you don't have is a guarantee as to which were delivered. Your guarantee is in the consistency level.
When you read, you'll do something similar - you'll block until you have an appropriate number of replicas answering. If you write with EACH_QUORUM, you can read with LOCAL_QUORUM and guarantee strong consistency. If you write with QUORUM, you could read with QUORUM. If you write with ONE, you could still guarantee strong consistency if you read with ALL.
To guarantee eventual consistency, you don't have to do anything - it'll eventually get there, as long as you wrote with CL >= ONE (CL ANY isn't really a guarantee).
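To make those pairings concrete, here is a small single-datacenter sketch (my own simplification; ANY is again treated as 0 guaranteed replicas) that derives the weakest read level still guaranteed to overlap a given write level:

    # Sketch: weakest read CL that still satisfies R + W > RF for a write CL.
    def acks(level, rf):
        return {"ANY": 0, "ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

    def weakest_consistent_read(write_cl, rf):
        for read_cl in ("ONE", "QUORUM", "ALL"):  # weakest first
            if acks(write_cl, rf) + acks(read_cl, rf) > rf:
                return read_cl
        return None  # e.g. a write at ANY: no read level guarantees overlap

    print(weakest_consistent_read("ONE", 3))     # -> ALL
    print(weakest_consistent_read("QUORUM", 3))  # -> QUORUM
    print(weakest_consistent_read("ALL", 3))     # -> ONE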

Does Cassandra read have inconsistency?

I am new to Cassandra and am trying to understand how it works. Say a write goes to a number of nodes. My understanding is that, depending on the hash value of the key, it is decided which node owns the data, and then replication happens. While reading the data, the hash of the key determines which node has the data, and that node responds. Now my question is: if reading and writing happen on the same set of nodes, which always have the data, then how does read inconsistency occur and how can Cassandra return stale data?
For tuning consistency, Cassandra allows setting the consistency level on a per-query basis.
Now for your question, let's assume CONSISTENCY is set to ONE and the replication factor is 3.
During a WRITE request, the coordinator sends the write to all replicas that own the row being written. As long as all replica nodes are up and available, they will get the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond with a success acknowledgment in order for the write to be considered successful. Success means that the data was written to the commit log and the memtable.
For example, in a single data center 10 node cluster with a replication factor of 3, an incoming write will go to all 3 nodes that own the requested row. If the write consistency level specified by the client is ONE, the first node to complete the write responds back to the coordinator, which then proxies the success message back to the client. A consistency level of ONE means that it is possible that 2 of the 3 replicas could miss the write if they happened to be down at the time the request was made. If a replica misses a write, Cassandra will make the row consistent later using one of its built-in repair mechanisms: hinted handoff, read repair, or anti-entropy node repair.
By default, hints are saved for three hours after a replica fails because if the replica is down longer than that, it is likely permanently dead. You can configure this interval of time using the max_hint_window_in_ms property in the cassandra.yaml file. If the node recovers after the save time has elapsed, run a repair to re-replicate the data written during the down time.
Now when a READ request is performed, the coordinator node sends the request to the replicas that can currently respond the fastest (hence it might go to any 1 of the 3 replicas).
Now imagine a situation where the data has not yet been replicated to the third replica and during the READ that replica is selected (the chances are negligible); then you get inconsistent data.
This scenario assumes all nodes are up. If one of the nodes is down and read repair has not happened once the node is back up, then that adds to the issue.
READ With Different CONSISTENCY LEVEL
READ Request in Cassandra
Consider a scenario where the CL is QUORUM, in which case 2 out of 3 replicas must respond. The write request will go to all 3 replicas as usual; if the write fails on 2 replicas and succeeds on 1, Cassandra will return a failure. Since Cassandra does not roll back, the record will continue to exist on the successful replica. Now, when a read comes with CL=QUORUM, the read request will be forwarded to 2 replica nodes, and if one of them is the previously successful one, Cassandra will return the new record as it will have the latest timestamp. But from the client's perspective this record was never written, as Cassandra had returned a failure during the write.

Cassandra LOCAL_QUORUM

I'm having trouble understanding / finding information about how various quorums are calculated in cassandra.
Let's say I have a 16 node cluster using Network Topology Strategy across 2 data centers. The replication factor is 2 in each datacenter (DC1: 2, DC2: 2).
In this example, if I write using a LOCAL_QUORUM, I will write the data to 4 nodes (2 in each data center) but when will the acknowledgement happen? After 2 nodes in 1 data center are written?
In addition, to maintain strong read consistency, I need Write nodes + read nodes > replication factor. In the above example, if both reads and writes were LOCAL_QUORUM, I would have 2 + 2 which would not guarantee strong read consistency. Am I understanding this correctly? What level would I need then to ensure strong read consistency?
The goal here is to ensure that if a data center fails, reads/writes can continue while minimizing latency.
The write will be successful after the coordinator has received acknowledgements from 2 nodes in the same DC as the coordinator.
Using LOCAL_QUORUM for both reads and writes will get you strong consistency, provided the same DC is used for both reads and writes, and only within that DC.
The previous answer is correct: "The write will be successful after the coordinator has received acknowledgements from 2 nodes in the same DC as the coordinator." It is the same for reads.
The quorum is always calculated as (N / 2) + 1, rounded down (N being the replication factor); using a local quorum avoids the latency of the other data center.
As far as I understand, with an RF of 2 and LOCAL_QUORUM you get better local consistency but no availability in case of a partition: if a single node drops, all writes and reads will fail for the token ranges owned by that node and its replica.
Therefore I recommend an RF of 3 if you intend to use QUORUM. With 2 replicas you are better off using ONE.
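A quick worked example of that formula for the RF values discussed here:

    # Quorum size and single-node fault tolerance per datacenter.
    def quorum(rf):
        return rf // 2 + 1  # (RF / 2) + 1, rounded down

    for rf in (2, 3):
        q = quorum(rf)
        print(f"RF={rf}: LOCAL_QUORUM needs {q} replicas, "
              f"tolerates {rf - q} replica(s) down")
    # RF=2: LOCAL_QUORUM needs 2 replicas, tolerates 0 replica(s) down
    # RF=3: LOCAL_QUORUM needs 2 replicas, tolerates 1 replica(s) down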
The client will get the WRITE or READ acknowledgement from the coordinator node once LOCAL_QUORUM has been satisfied in the coordinator's own data center.
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html
If the write consistency level is LOCAL_ONE or LOCAL_QUORUM, only the nodes in the same datacenter as the coordinator node must respond to the client request in order for the request to succeed.
Use either LOCAL_ONE or LOCAL_QUORUM to reduce geographical latency and lessen the impact on client write request response times.

Does Cassandra write to a node (which is up) even if consistency cannot be met?

The below statement from Cassandra documentation is the reason for my doubt.
For example, if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will replicate the write to all nodes in the cluster and wait for acknowledgement from two nodes. If the write fails on one of the nodes but succeeds on the other, Cassandra reports a failure to replicate the write on that node. However, the replicated write that succeeds on the other node is not automatically rolled back.
Ref : http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_atomicity_c.html
So does Cassandra write to a node (which is up) even if consistency cannot be met?
I got it. Cassandra will not even attempt the write if it knows that consistency cannot be met. If consistency CAN be met but there are not enough live replicas to satisfy the replication factor, then Cassandra writes to the currently available replicas and returns a success message. Later, when the missing replica is up again, the data will be written to it as well.
For example, if the replication factor is 3 and 1 of the 3 nodes is down, then a write with a consistency of 2 will succeed. But if the replication factor is 2 and 1 of the 2 nodes is down, then a write with a consistency of 2 will not even be attempted on the single node which is available.
What is mentioned in the documentation is a case where the write was initiated while consistency could still be met, but in the meantime one node went down and couldn't complete the write, whereas the write succeeded on the other node. Since consistency cannot be met, the client gets a failure message. The record that was written to the single node is not automatically removed; it stays on that replica and is later reconciled with the other replicas by Cassandra's repair mechanisms.
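That distinction also shows up at the driver level. A hedged sketch with the Python driver (keyspace and table names are placeholders): an Unavailable error means the coordinator knew up front that not enough replicas were alive and never attempted the write, while a WriteTimeout means the write was attempted but too few replicas acknowledged in time, so it may still exist on some replicas without being rolled back.

    # Sketch: how the two failure modes surface in the DataStax Python driver.
    from cassandra import ConsistencyLevel, Unavailable, WriteTimeout
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("demo_ks")
    stmt = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    try:
        session.execute(stmt, (42, "alice"))
    except Unavailable:
        # Not enough live replicas: the coordinator never attempted the write.
        pass
    except WriteTimeout:
        # The write was attempted; some replicas may have applied it, and
        # Cassandra will not roll those copies back.
        pass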
Consistency in Cassandra can be (and usually is) defined at the statement level. That means you specify, for a particular query, what level of consistency you need.
This implies that if not enough replicas respond, the statement has not met its consistency requirement.
There is no rollback in Cassandra. What you have in Cassandra is eventual consistency. That means your statement might succeed in the future, if not immediately. When a replica node comes back online, the cluster (i.e. Cassandra's fault tolerance) will take care of writing to that replica node.
So, if your statement failed, it might still be completed in the future. This is in contrast to the RDBMS world, where an uncommitted transaction is rolled back as if nothing had happened.
Update:
I stand corrected. Thanks Arun.
From:
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html
During a write operation, when hinted handoff is enabled and consistency can be met, the coordinator stores a hint about dead replicas in the local system.hints table under either of these conditions:
So it's still not a rollback. Nodes know the current cluster state and don't initiate the write if consistency cannot be met.
At the driver level, you get an exception.
On the nodes where the write succeeded, the data is actually written, and it is not going to be automatically rolled back.
From the client's perspective, though, you can treat the data as not having been written to any of the nodes, since the write as a whole was reported as failed.
From the documentation:
If the write fails on one of the nodes but succeeds on the other,
Cassandra reports a failure to replicate the write on that node.
However, the replicated write that succeeds on the other node is not
automatically rolled back.

Resources