Cassandra LOCAL_QUORUM

I'm having trouble understanding (and finding information about) how the various quorums are calculated in Cassandra.
Let's say I have a 16-node cluster using NetworkTopologyStrategy across 2 data centers. The replication factor is 2 in each datacenter (DC1: 2, DC2: 2).
In this example, if I write using LOCAL_QUORUM, the data will be written to 4 nodes (2 in each data center), but when will the acknowledgement happen? After 2 nodes in 1 data center have been written?
In addition, to maintain strong read consistency, I need nodes written + nodes read > replication factor. In the above example, if both reads and writes were LOCAL_QUORUM, I would have 2 + 2, which would not guarantee strong read consistency. Am I understanding this correctly? What level would I need then to ensure strong read consistency?
The goal here is to ensure that if a data center fails, reads/writes can continue while minimizing latency.
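For reference, the replication described above (NetworkTopologyStrategy with 2 replicas in each DC) corresponds to a keyspace definition along the lines of the sketch below, here issued through the DataStax Python driver; the contact point, keyspace name and DC names are placeholders, not taken from the question.

from cassandra.cluster import Cluster

cluster = Cluster(['10.0.0.1'])      # placeholder contact point
session = cluster.connect()

# DC names must match what the snitch reports for your cluster
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
    "{'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2}"
)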

The write will be successful after the coordinator has received acknowledgements from 2 nodes in the same DC as the coordinator.
Using LOCAL_QUORUM for both reads and writes will give you strong consistency, provided the same DC is used for both reads and writes, and only within that DC.

The previous answer is correct: "The write will be successful after the coordinator has received acknowledgements from 2 nodes in the same DC as the coordinator." It is the same for reads.
The quorum is always calculated as floor(N/2) + 1 (N being the replication factor); using LOCAL_QUORUM avoids the latency of the other data center.
As far as I understand, with an RF of 2 and LOCAL_QUORUM you get better local consistency but no availability in case of a partition: if a single node goes down, all writes and reads will fail for the token ranges owned by that node and its replica.
Therefore I recommend an RF of 3 if you intend to use a quorum. With 2 replicas you are better off using ONE.
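To make the arithmetic behind this recommendation concrete: a quorum is floor(RF/2) + 1, so the number of replicas a quorum can afford to lose is RF minus that. A minimal sketch (the RF values are only examples):

def quorum(rf):
    # quorum = floor(rf / 2) + 1
    return rf // 2 + 1

for rf in (2, 3, 5):
    q = quorum(rf)
    print("RF=%d: quorum=%d, tolerates %d down replica(s)" % (rf, q, rf - q))
# RF=2: quorum=2, tolerates 0 down replica(s)  -> no availability if a replica fails
# RF=3: quorum=2, tolerates 1 down replica(s)
# RF=5: quorum=3, tolerates 2 down replica(s)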

The client will get the WRITE or READ acknowledgement from the coordinator node once LOCAL_QUORUM has been satisfied in the coordinator's own data center.
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html
If the write consistency level is LOCAL_ONE or LOCAL_QUORUM, only the nodes in the same datacenter as the coordinator node must respond to the client request in order for the request to succeed.
Use either LOCAL_ONE or LOCAL_QUORUM to reduce geographical latency and lessen the impact on client write request response times.
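As an illustration of the above, here is a minimal sketch with the DataStax Python driver that pins coordinators to one DC and writes at LOCAL_QUORUM; the contact point, DC name, keyspace and table are hypothetical.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
from cassandra.query import SimpleStatement

cluster = Cluster(
    ['10.0.0.1'],                                        # placeholder node in DC1
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='DC1')),        # coordinators chosen from DC1 only
)
session = cluster.connect('demo')                        # placeholder keyspace

insert = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",      # placeholder table
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)     # ack from 2 of the 2 DC1 replicas (RF=2)
session.execute(insert, (1, 'alice'))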


Cassandra Spark Connector: requirement failed, contact points contain multiple data centers

I have two Cassandra datacenters, with all servers in the same building, connected with a 10 Gbps network. The RF is 2 in each datacenter.
I need to ensure strong consistency inside my app, so I first planned to use QUORUM consistency (3 replicas of 4 must respond) on both reads and writes. With that configuration, I can also be fault tolerant if a node crashes in a particular datacenter.
So I set multiple contact points from multiple datacenters for my Spark connector, but the following error is immediately returned: requirement failed, contact points contain multiple data centers
So I looked at the documentation. It says:
Connections are never made to data centers other than the data center of spark.cassandra.connection.host [...]. This technique guarantees proper workload isolation so that a huge analytics job won't disturb the realtime part of the system.
Okay. So after reading that, I planned to switch to LOCAL_QUORUM (2 replicas of 2 must respond) on writes and LOCAL_ONE on reads, to still get strong consistency, and connect by default to datacenter1.
The problem is still consistency, because Spark apps working on the second datacenter (datacenter2) don't have strong consistency on writes, because data are just asynchronously synchronized from datacenter1.
To avoid that, I can set the write consistency to EACH_QUORUM (= ALL here, since RF is 2). But the problem in that case is that if a single node is unresponsive or down, no writes can be processed at all.
So my only option, to have both some fault tolerance AND strong consistency, is to switch my replication factor from 2 to 3 in each datacenter, and then use EACH_QUORUM on writes and LOCAL_QUORUM on reads? Is that correct?
Thank you
This comment indicates there is some misunderstanding on your part:
... because data are just asynchronously synchronized from datacenter1.
so allow me to clarify.
The coordinator of a write request sends each mutation (INSERT, UPDATE, DELETE) to ALL replicas in ALL data centers in real time. It doesn't happen at some later point in time (e.g. 2 seconds, 10 seconds or 1 minute later) -- it gets sent to all DCs at the same time, without delay, regardless of whether you have a 1 Mbps or 10 Gbps link between DCs.
We also recommend a minimum of 3 replicas in each DC in production, as well as using LOCAL_QUORUM for both reads and writes. There are very limited edge cases where these recommendations do not apply.
The spark-cassandra-connector requires all contact points to belong to the same DC (see the configuration sketch after this list) so that:
analytics workloads do not impact the performance of OLTP DCs (as you already pointed out), and
it can achieve data-locality for optimal performance where possible.
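For what it's worth, here is a rough sketch of the corresponding Spark-side configuration (PySpark here): contact points from one DC only, the connector restricted to that DC, and LOCAL_QUORUM for reads and writes. The host and DC name are placeholders, and the localDC property is spelled local_dc in older connector releases.

from pyspark import SparkConf

conf = (
    SparkConf()
    .set("spark.cassandra.connection.host", "10.0.0.1")        # nodes in datacenter1 only
    .set("spark.cassandra.connection.localDC", "datacenter1")  # keep the connector in this DC
    .set("spark.cassandra.input.consistency.level", "LOCAL_QUORUM")
    .set("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")
)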

What is the correct understanding of Cassandra's tunable consistency?

It is said that a consistency level N defines the number of replicas needed to acknowledge every read and write operation. The bigger that number, the more consistent results we have.
If we define that parameter as N (N < M/2), where M is the cluster size, does it mean the following situation is possible:
One data center: two concurrent writes happen successfully (they update the same key with different values)?
And consequently, two subsequent concurrent reads return different values for the same key? Am I correct?
Yes, we can tune consistency based on the requirements for reads and writes. QUORUM is the recommended consistency level for Cassandra with a single DC. It is calculated as quorum = floor(N/2) + 1, where N is the number of replicas. The consistency level can be set in cqlsh with the command below:
CONSISTENCY [level]
For more details on tunable consistency, please refer to the links below.
https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-1-7aee6b472fb4
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDataConsistency.html
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshConsistency.html
In Cassandra it is entirely possible that different client applications update the value of the same key via different nodes. You can always control this by tuning your consistency level.
The consistency level always works relative to the replication factor you have chosen.
If RF=3 in a 5-node DC, then a consistency level of QUORUM or LOCAL_QUORUM means 2 out of the 3 nodes holding a replica.
Any of the combinations below should give you correct data:
1. WRITE=ALL, READ=ONE
2. WRITE=ONE, READ=ALL
3. WRITE=LOCAL_QUORUM, READ=LOCAL_QUORUM
You can tune the consistency level in your application according to its load.
In my opinion, number 3 (LOCAL_QUORUM) should work best, as sometimes a node can be under high load or down, and your application will not be affected.
If you have more writes than reads, a write CL of ALL will make your application slow.
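As a quick sanity check of those combinations against the usual rule (replicas written + replicas read > RF), assuming RF = 3 as in the example above (for LOCAL_QUORUM the guarantee holds within a single DC):

rf = 3

def replicas(cl):
    # number of replicas that must acknowledge for each consistency level
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "LOCAL_QUORUM": rf // 2 + 1, "ALL": rf}[cl]

for write_cl, read_cl in [("ALL", "ONE"), ("ONE", "ALL"), ("LOCAL_QUORUM", "LOCAL_QUORUM")]:
    strong = replicas(write_cl) + replicas(read_cl) > rf
    print(write_cl, "/", read_cl, "->", "strong" if strong else "not strong")
# all three combinations satisfy the rule (3+1, 1+3 and 2+2 are all > 3)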

Cassandra read, read repair

Scenario: single data center with replication factor 7 and read consistency level QUORUM.
During a read request, the fastest replica gets a data request. But how many of the remaining replicas send a digest?
Q1: Do all remaining replicas (excluding the fastest one) send a digest to the coordinator, with the fastest 3 considered to satisfy the consistency level, or are only 3 ((7 / 2 + 1) - 1 (fastest) = 3) replicas chosen to send a digest?
Q2: In both cases, how does read repair work? How many, and which, nodes will get in sync after read repair runs?
This is taken from this excellent blog post which you should absolutely read: https://academy.datastax.com/support-blog/read-repair
There are broadly two types of read repair: foreground and background. Foreground here means blocking -- we complete all operations before returning to the client. Background means non-blocking -- we begin the background repair operation and then return to the client before it has completed.
In your case, you'll be doing a foreground read-repair as it is performed on queries which use a consistency level greater than ONE/LOCAL_ONE. The coordinator asks one replica for data and the others for digests of their data (currently MD5). If there's a mismatch in the data returned to the coordinator from the replicas, Cassandra resolves the situation by doing a data read from all replicas and then merging the results.
This is one of the reasons why it's important to make sure you continually have anti-entropy repair running and completing. This way, the chances of digest mismatches on reads are lower.
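A toy model of that foreground flow, just to make the digest comparison tangible; this is not Cassandra's internal code, and the replica contents and timestamps are invented:

import hashlib

def digest(value):
    # Cassandra compares MD5 digests of the replicas' results
    return hashlib.md5(value.encode()).hexdigest()

# (value, write timestamp) held by each replica; r3 has a newer write
replicas = {"r1": ("alice", 100), "r2": ("alice", 100), "r3": ("carol", 200)}

data_from = "r1"                               # fastest replica returns the full data
value, ts = replicas[data_from]
mismatch = any(digest(v) != digest(value)
               for r, (v, _) in replicas.items() if r != data_from)

if mismatch:
    # blocking (foreground) repair: read full data from all replicas,
    # keep the newest version, and push it back to the stale replicas
    value, ts = max(replicas.values(), key=lambda pair: pair[1])
    replicas = {r: (value, ts) for r in replicas}

print(value)                                   # 'carol' is returned to the client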

Cassandra tunable consistency exercise

I need some help solving an exercise for school. It's about tunable consistency in Cassandra.
Given a cluster of 15 nodes, complete the following table. In case of multiple possibilities, give all of them. The CL values are: ANY, ONE, QUORUM, ALL.
Thank you very much for your help!
P.S. I'm sure we need the following rule to solve this: nodes read + nodes written > replication factor for consistency.
This document here should outline the consistency levels and how they function:
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
I've copied some of the content here for clarity, in case the link becomes broken in the future.
Write Consistency Levels
ALL
A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.
EACH_QUORUM
Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in each datacenter.
QUORUM
A write must be written to the commit log and memtable on a quorum of replica nodes across all datacenters.
LOCAL_QUORUM
Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same datacenter as the coordinator. Avoids latency of inter-datacenter communication.
ONE
A write must be written to the commit log and memtable of at least one replica node.
TWO
A write must be written to the commit log and memtable of at least two replica nodes.
THREE
A write must be written to the commit log and memtable of at least three replica nodes.
LOCAL_ONE
A write must be sent to, and successfully acknowledged by, at least one replica node in the local datacenter.
ANY
A write must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written. If all replica nodes are down at write time, an ANY write is not readable until the replica nodes for that partition have recovered.
Read Consistency Levels
ALL
Returns the record after all replicas have responded. The read operation will fail if a replica does not respond.
EACH_QUORUM
Not supported for reads.
QUORUM
Returns the record after a quorum of replicas from all datacenters has responded.
LOCAL_QUORUM
Returns the record after a quorum of replicas in the same datacenter as the coordinator has responded. Avoids latency of inter-datacenter communication.
ONE
Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
TWO
Returns the most recent data from two of the closest replicas.
THREE
Returns the most recent data from three of the closest replicas.
LOCAL_ONE
Returns a response from the closest replica in the local datacenter.
SERIAL
Allows reading the current (and possibly uncommitted) state of data without proposing a new addition or update. If a SERIAL read finds an uncommitted transaction in progress, it will commit the transaction as part of the read. Similar to QUORUM.
LOCAL_SERIAL
Same as SERIAL, but confined to the datacenter.
I think this should be the correct answer. Please correct me if I'm wrong. Ignore the Dutch sentences in the table; I don't think they will pose any problems for English readers.
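Since the exercise table itself isn't reproduced above, here is a small sketch of how entries can be checked against the rule from the question (nodes read + nodes written > replication factor); the RF values are only examples:

def nodes(cl, rf):
    # ANY can be satisfied by a hinted handoff alone, so no replica is guaranteed
    return {"ANY": 0, "ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

for rf in (1, 3, 5):
    ok = [(w, r)
          for w in ("ANY", "ONE", "QUORUM", "ALL")
          for r in ("ONE", "QUORUM", "ALL")          # ANY is not valid for reads
          if nodes(w, rf) + nodes(r, rf) > rf]
    print("RF=%d:" % rf, ok)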

Consistency and timeout issues with Cassandra 2.2

I'm using Cassandra 2.2 and I have an application that requires a high level of consistency.
I've configured one datacenter cluster with 3 nodes.
My keyspace is created with a replication factor of 2.
In each node's configuration yaml I've set 2 seed nodes (for example NODE_1 and NODE_3).
The important thing is that my app should be fully functional even if one node is down.
Currently I have some consistency and timeout issues when my app contacts the cluster.
I've read the whole Cassandra 2.2 documentation and I concluded that the best CONSISTENCY LEVEL for my write operations should be QUORUM and for my read operations ONE, but I still have some consistency issues.
First of all, is this the right choice to get a strong level of consistency?
Also, are UPDATE and DELETE operations considered write or read operations, since for example an update operation with a WHERE clause still has to 'read' data? I'm not sure, especially in the context of Cassandra's write workflow.
My second issue is the timeout during write operations. A simple and lightweight INSERT sometimes gets "Cassandra timeout during write query at consistency QUORUM (2 replicas were required but only 1 acknowledged the write)"
or sometimes even "... 0 acknowledged", even though all of my 3 nodes are UP.
Are there some other parameters that I should check, like for example write_request_timeout_in_ms, with default value of 2000 ms (which is already a high value)?
You will have strong consistency with replication factor = 2 and consistency level QUORUM for write operations and ONE for read operations. But write operations will fail if one node is down: consistency level QUORUM is the same as ALL when the replication factor is 2.
You should use replication factor = 3 and consistency level QUORUM for both write and read operations to have strong consistency and a fully functional app even if one node is down.
DELETE and UPDATE operations are write operations.
For the timeout issue, please provide the table model and the queries that fail.
Updated
Consistency level applies to replicas, not nodes.
Replication factor = 2 means that each piece of data is stored on 2 of the 3 nodes. Those nodes are its replicas.
QUORUM means that a write operation must be acknowledged by 2 replicas (when the replication factor is 2), not by any 2 nodes.
Cassandra places data on nodes according to the partition key. Each node is responsible for a range of partition keys, so not just any node can store any piece of data; the replicas for that partition (not just any nodes) need to be alive to perform operations. Here is an article about data replication and distribution.
When you perform a QUORUM write request against a cluster with 2 of its 3 nodes alive, there is a chance that only 1 replica for the partition key is alive; in that case the write request will fail.
In addition: here is a simple calculator for Cassandra parameters.
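A toy illustration of the "replicas, not nodes" point: 3 nodes, RF = 2, one node down. The token values and the choice of downed node are made up, and real placement also depends on the partitioner and snitch.

from bisect import bisect_right

ring = [(0, "node1"), (100, "node2"), (200, "node3")]   # (token, owner)
rf = 2
down = {"node2"}

def replicas_for(token):
    # take RF nodes walking clockwise from the token's position on the ring
    start = bisect_right([t for t, _ in ring], token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

def quorum_write_ok(token):
    alive = [r for r in replicas_for(token) if r not in down]
    return len(alive) >= rf // 2 + 1                    # QUORUM = 2 when RF = 2

print(replicas_for(150), quorum_write_ok(150))   # ['node3', 'node1'] True
print(replicas_for(50), quorum_write_ok(50))     # ['node2', 'node3'] False -- a replica is down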
