I'm using Cassandra 2.2 and I've an application that requires a high level of consistency.
I've configured one datacenter cluster with 3 nodes.
My keyspace is created with replication_factor of 2.
In each node's configuration.yaml file I've set 2 seeds in seed_provider (for example NODE_1 and NODE_3).
The important thing is that my app should be fully functional even if one node is down.
Currently I've some issues with the consistency and timeout when my app contacts the cluster.
I've read the whole Cassandra 2.2 documentation and I concluded that the best CONSISTENCY LEVEL for my write operations should be QUORUM and for my read operations ONE, but I still have some consistency issues.
First of all, is it the right choice to have a strong level of consistency?
And also, are UPDATE and DELETE operations considered write or read operations, since for example an update operation with a WHERE clause still has to 'read' data? I'm not sure, especially in the context of Cassandra's write workflow.
My second issue is the timeout during write operations. A simple and lightweight INSERT sometimes gets "Cassandra timeout during write query at consistency QUORUM (2 replicas were required but only 1 acknowledged the write)"
or sometimes even "... 0 acknowledged", even though all of my 3 nodes are UP.
Are there some other parameters that I should check, like for example write_request_timeout_in_ms, with default value of 2000 ms (which is already a high value)?
You will have strong consistency with Replication Factor = 2 and Consistency Level = QUORUM for write operations and ONE for read operations. But write operations will fail if one node is down. Consistency Level = QUORUM is the same as ALL in case Replication Factor = 2.
You should use Replication Factor = 3 and Consistency Level = QUORUM for both write and read operations, to have strong consistency and a fully functional app even if one node is down.
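The arithmetic behind this recommendation can be checked with a short sketch. The helper names below (`quorum`, `is_strongly_consistent`, `tolerated_failures`) are made up for illustration; the rule applied is the usual one that a read sees the latest write when write replicas + read replicas > replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_acks: int, read_acks: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set in at least one replica."""
    return write_acks + read_acks > replication_factor


def tolerated_failures(required_acks: int, replication_factor: int) -> int:
    """Replicas of a partition that may be down while requests still succeed."""
    return replication_factor - required_acks


# RF = 2: QUORUM needs both replicas, so it behaves like ALL.
rf2_write = quorum(2)
rf2 = (is_strongly_consistent(rf2_write, 1, 2), tolerated_failures(rf2_write, 2))

# RF = 3: QUORUM needs 2 of 3 replicas for both reads and writes.
rf3_q = quorum(3)
rf3 = (is_strongly_consistent(rf3_q, rf3_q, 3), tolerated_failures(rf3_q, 3))

print("RF=2, write QUORUM / read ONE    (consistent, tolerated failures):", rf2)
print("RF=3, write QUORUM / read QUORUM (consistent, tolerated failures):", rf3)
```

Both setups are strongly consistent, but only the RF = 3 one keeps accepting writes with a replica down.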
DELETE and UPDATE operations are write operations.
For the timeout issue, please provide the table model and the queries that fail.
Updated
Consistency level applies to replicas, not nodes.
Replication factor = 2 means that 2 of 3 nodes will contain data. These nodes will be replicas.
QUORUM means that a write operation must be acknowledged by 2 replicas (when replication factor=2), not nodes.
Cassandra places the data on each node according to the partition key. Each node is responsible for a range of partition keys. Not every node can store any given piece of data, so you need live replicas (not just live nodes) to perform operations. Here is an article about data replication and distribution.
When you perform a QUORUM write request against a cluster with 2 of 3 nodes alive, there is a chance that the cluster has only 1 live replica for the partition key; in this case the write request will fail.
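To see how often this happens, here is a small enumeration. It assumes a simplified placement in which each token range is stored on its owner node and on the next node of the ring; the node names and the `replicas_for` and `writable_ranges` helpers are illustrative only.

```python
NODES = ["NODE_1", "NODE_2", "NODE_3"]
REPLICATION_FACTOR = 2
QUORUM = REPLICATION_FACTOR // 2 + 1  # 2 when RF = 2


def replicas_for(owner_index: int) -> list:
    """Owner of the range plus the following nodes on the ring (assumed placement)."""
    return [NODES[(owner_index + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def writable_ranges(down_node: str) -> list:
    """For each primary range, report whether a QUORUM write can still succeed."""
    result = []
    for owner_index in range(len(NODES)):
        alive = [n for n in replicas_for(owner_index) if n != down_node]
        result.append(len(alive) >= QUORUM)
    return result


status = writable_ranges(down_node="NODE_2")
for owner_index, ok in enumerate(status):
    print(replicas_for(owner_index), "QUORUM write possible:", ok)
```

In this model, with one node down only one of the three ranges still has both replicas alive, so QUORUM writes to the other two ranges are rejected.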
In addition: here is a simple calculator for Cassandra parameters
Related
I have two Cassandra datacenters, with all servers in the same building, connected with 10 gbps network. The RF is 2 in each datacenter.
I need to ensure strong consistency inside my app, so I first planned to use QUORUM consistency (3 replicas of 4 must respond) on both reads and writes. With that configuration, I can also be fault tolerant if a node crashes in a particular datacenter.
So I set multiple contact points from multiple datacenters in my Spark connector, but the following error is immediately returned: requirement failed, contact points contain multiple data centers
So I looked at the documentation. It says:
Connections are never made to data centers other than the data center of spark.cassandra.connection.host [...]. This technique guarantees proper workload isolation so that a huge analytics job won't disturb the realtime part of the system.
Okay. So after reading that, I plan to switch to LOCAL_QUORUM (2 replicas of 2 must respond) on write and LOCAL_ONE on read, to still get strong consistency, and to connect by default to datacenter1.
The problem is still consistency, because Spark apps working on the second datacenter, datacenter2, don't have strong consistency on write, because data are just asynchronously synchronized from datacenter1.
To avoid that, I can set write consistency to EACH_QUORUM (= ALL). But the problem in that case is that if a single node is unresponsive or down, all writes fail.
So is my only option, to have both some fault tolerance AND strong consistency, to switch my replication factor from 2 to 3 in each datacenter, and then use EACH_QUORUM on write and LOCAL_QUORUM on read? Is that correct?
Thank you
This comment indicates there is some misunderstanding on your part:
... because data are just asynchronously synchronized from datacenter1.
so allow me to clarify.
The coordinator of a write request sends each mutation (INSERT, UPDATE, DELETE) to ALL replicas in ALL data centres in real time. It doesn't happen at some later point in time (i.e. 2 seconds later, 10s later or 1 minute later) -- it gets sent to all DCs at the same time without delay regardless of whether you have a 1Mbps or 10Gbps link between DCs.
We also recommend a minimum of 3 replicas in each DC in production, as well as using LOCAL_QUORUM for both reads and writes. There are very limited edge cases where these recommendations do not apply.
The spark-cassandra-connector requires all contact points to belong to the same DC so that:
analytics workloads do not impact the performance of OLTP DCs (as you already pointed out), and
it can achieve data-locality for optimal performance where possible.
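The replica counts discussed in this thread can be tabulated with a short sketch. It assumes the standard definitions (QUORUM counts replicas across all data centres, LOCAL_QUORUM only those in the coordinator's data centre, EACH_QUORUM a quorum in every data centre); the function name `required_acks` and the key "any" are made up for illustration.

```python
def quorum(replicas: int) -> int:
    """floor(replicas / 2) + 1"""
    return replicas // 2 + 1


def required_acks(level: str, rf_per_dc: dict, local_dc: str) -> dict:
    """Acknowledgements needed for a given consistency level.

    For QUORUM the count is cluster-wide, so it is reported under the key "any".
    """
    if level == "QUORUM":
        return {"any": quorum(sum(rf_per_dc.values()))}
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    raise ValueError(f"unsupported level: {level}")


for rf in (2, 3):
    topology = {"datacenter1": rf, "datacenter2": rf}
    for level in ("QUORUM", "LOCAL_QUORUM", "EACH_QUORUM"):
        print(f"RF={rf} per DC, {level}:", required_acks(level, topology, "datacenter1"))
```

With RF = 2 per data centre, LOCAL_QUORUM and EACH_QUORUM both need every local replica, so a single node failure blocks requests for its token ranges. With RF = 3, each needs 2 of 3, which leaves room for one replica per data centre to be down.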
It is said that consistency level N defines the number of replicas needed to acknowledge every read and write operation. The bigger that number, the more consistent the result.
If we define that parameter as N (N < M/2), where M is the cluster size, does it mean the following situation is possible:
With 1 data center, two concurrent writes succeed (they update the same key with different values)?
And consequently, two subsequent concurrent reads return different values for the same key? Am I correct?
Yes, consistency can be tuned per read and per write according to your requirements. QUORUM is the recommended consistency level for Cassandra in a single DC. It is calculated as Quorum = floor(N/2) + 1, where N is the number of replicas (the replication factor). In cqlsh, the consistency level is set with the command below:
CONSISTENCY [level]
For more details on tunable consistency please refer below.
https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-1-7aee6b472fb4
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDataConsistency.html
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshConsistency.html
In Cassandra it is quite possible that different client applications update the value of the same key on different nodes. You can control this by tuning your consistency level.
The consistency level always depends on the replication factor you have chosen.
If RF=3 in a 5-node DC, then consistency level QUORUM or LOCAL_QUORUM means 2 of the 3 nodes that hold a replica.
Any of the below combination should give you correct data, after tuning:
WRITE=ALL READ=ONE
WRITE=ONE READ=ALL
WRITE=LOCAL_QUORUM READ=LOCAL_QUORUM
You can tune the consistency level in your application according to its load.
In my opinion, the third option, LOCAL_QUORUM, should work best, because sometimes a node is under high load or down, and your application will not be affected.
If you have more writes than reads, WRITE CL=ALL will make your application slow.
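The situation asked about (two reads returning different values) can be reproduced with a toy model. This is a sketch, not Cassandra code: `Replica`, `write` and `read` are hypothetical names, and the `targets` list stands for the replicas that have applied a write at the moment of the read. Real Cassandra sends every write to all replicas; the consistency level only controls how many acknowledgements are awaited.

```python
from itertools import combinations


class Replica:
    def __init__(self):
        self.value = None
        self.timestamp = -1

    def apply(self, value, timestamp):
        # Last-write-wins: keep the value with the highest timestamp.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


def write(replicas, targets, value, timestamp):
    """Apply the write only on the replicas listed in targets (indices)."""
    for i in targets:
        replicas[i].apply(value, timestamp)


def read(replicas, targets):
    """Return the newest value among the contacted replicas."""
    newest = max((replicas[i] for i in targets), key=lambda r: r.timestamp)
    return newest.value


replicas = [Replica() for _ in range(3)]          # RF = 3
write(replicas, [0, 1, 2], "v1", timestamp=1)     # old value, fully replicated
write(replicas, [0], "v2", timestamp=2)           # acknowledged at ONE, others lag

# READ=ONE may hit any single replica, so the answer depends on which one.
one_reads = {read(replicas, [i]) for i in range(3)}

write(replicas, [0, 1], "v3", timestamp=3)        # acknowledged at QUORUM
# READ=QUORUM contacts any 2 of 3 replicas and always overlaps the write set.
quorum_reads = {read(replicas, list(pair)) for pair in combinations(range(3), 2)}

print("values seen at ONE:", sorted(one_reads))
print("values seen at QUORUM:", sorted(quorum_reads))
```

At ONE/ONE the reads disagree, while every possible QUORUM read returns the value of the QUORUM write.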
I know that Cassandra has different read consistency levels, but I haven't seen a consistency level that allows us to read data by key from only one node. I mean if we have a cluster with a replication factor of 3 then we will always ask all nodes when we read. Even if we choose a consistency level of ONE, we will ask all nodes but wait only for the first response from any of them. That is why a read loads not just one node but 3 (4 with the coordinator node). I think we can't really improve read performance even if we set a bigger replication factor.
Is it possible to read really only from a single node?
Are you using a Token-Aware Load Balancing Policy?
If you are, and you are querying with a consistency of LOCAL_ONE/ONE, a read query should only contact a single node.
Give the article Ideology and Testing of a Resilient Driver a read. In it, you'll notice that using the TokenAwarePolicy has this effect:
"For cases with a single datacenter, the TokenAwarePolicy chooses the primary replica to be the chosen coordinator in hopes of cutting down latency by avoiding the typical coordinator-replica hop."
So here's what happens. Let's say that I have a table for keeping track of Kerbalnauts, and I want to get all data for "Bill." I would use a query like this:
SELECT * FROM kerbalnauts WHERE name='Bill';
The driver hashes my partition key value (name) to the token of 4639906948852899531 (SELECT token(name) FROM kerbalnauts WHERE name='Bill'; returns that value). If I am working with a 6-node cluster, then my primary token ranges will look like this:
node   start range              end range
1)      9223372036854775808 to -9223372036854775808
2)     -9223372036854775807 to -5534023222112865485
3)     -5534023222112865484 to -1844674407370955162
4)     -1844674407370955161 to  1844674407370955161
5)      1844674407370955162 to  5534023222112865484
6)      5534023222112865485 to  9223372036854775807
As node 5 is responsible for the token range containing the partition key "Bill," my query will be sent to node 5. As I am reading at a consistency of LOCAL_ONE, there will be no need for another node to be contacted, and the result will be returned to the client...having only hit a single node.
Note: Token ranges computed with:
python3 -c 'print([str((2**64 // 5) * i - 2**63) for i in range(6)])'
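The lookup that the driver performs against the token ranges above can be sketched with a binary search. This is an illustration only: the `owner` function and the `RANGE_ENDS` list are hypothetical names, and the boundaries are the ones from the table, not values read from a real cluster.

```python
from bisect import bisect_left

# Inclusive end tokens of the primary ranges of nodes 2..6, as in the table above.
RANGE_ENDS = [(2**64 // 5) * i - 2**63 for i in range(1, 6)]


def owner(token: int) -> int:
    """Node number whose primary range contains the token."""
    if token == -2**63:
        return 1  # the minimum token falls in node 1's wrap-around range
    # Index of the first range end that is >= token; node numbering starts at 2.
    return bisect_left(RANGE_ENDS, token) + 2


bill_token = 4639906948852899531  # token(name) for 'Bill', taken from the text
print("token", bill_token, "is owned by node", owner(bill_token))
```

For the token of "Bill" this yields node 5, matching the explanation above.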
I mean if we have a cluster with a replication factor of 3 then we will always ask all nodes when we read
Wrong. With consistency level ONE, the coordinator picks the fastest replica (the one with the lowest latency) to ask for data.
How does it know which replica is the fastest? By keeping internal latency stats for each node.
With consistency level >= QUORUM, the coordinator asks the fastest replica for the data and asks the other replicas it needs for a digest.
From the client side, if you choose the appropriate load balancing policy (e.g. TokenAwarePolicy), the client will always contact the primary replica when using consistency level ONE.
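The selection described above can be modelled in a few lines. This is a simplified sketch, not Cassandra's implementation: `plan_read` is a hypothetical helper, the latency figures are invented, and extra requests caused by read repair or speculative retries are ignored.

```python
def plan_read(replica_latencies_ms: dict, required_responses: int) -> dict:
    """Pick the fastest replicas: one full data read, digest reads for the rest."""
    by_speed = sorted(replica_latencies_ms, key=replica_latencies_ms.get)
    contacted = by_speed[:required_responses]
    return {"data": contacted[0], "digest": contacted[1:]}


# Invented latency statistics for the three replicas of one partition (RF = 3).
latencies = {"replica_a": 4.0, "replica_b": 1.5, "replica_c": 9.0}

plan_one = plan_read(latencies, required_responses=1)     # consistency level ONE
plan_quorum = plan_read(latencies, required_responses=2)  # QUORUM with RF = 3

print("ONE:   ", plan_one)
print("QUORUM:", plan_quorum)
```

At ONE only the fastest replica is contacted; at QUORUM the second fastest is additionally asked for a digest.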
I'm having trouble understanding / finding information about how various quorums are calculated in cassandra.
Let's say I have a 16 node cluster using Network Topology Strategy across 2 data centers. The replication factor is 2 in each datacenter (DC1: 2, DC2: 2).
In this example, if I write using a LOCAL_QUORUM, I will write the data to 4 nodes (2 in each data center) but when will the acknowledgement happen? After 2 nodes in 1 data center are written?
In addition, to maintain strong read consistency, I need Write nodes + read nodes > replication factor. In the above example, if both reads and writes were LOCAL_QUORUM, I would have 2 + 2 which would not guarantee strong read consistency. Am I understanding this correctly? What level would I need then to ensure strong read consistency?
The goal here is to ensure that if a data center fails, reads/writes can continue while minimizing latency.
The write will be successful after the coordinator received acknowledgement from 2 nodes from the same DC of the coordinator.
Using LOCAL_QUORUM for both reads and write will get you strong consistency, provided the same DC will be used for both reads and write, and just for this DC.
The previous answer is correct: "The write will be successful after the coordinator received acknowledgement from 2 nodes from the same DC of the coordinator." It is the same for reads.
The quorum is always calculated as floor(N/2) + 1 (N being the replication factor; for LOCAL_QUORUM, the replication factor of the local data center). Using LOCAL_QUORUM avoids the latency of the other data center.
As far as I understand, with an RF of 2 and LOCAL_QUORUM you have better local consistency but no availability in case of a partition: if one single node drops, all writes and reads will fail for the token ranges of that node and its replica.
Therefore I recommend an RF of 3 if you intend to use a quorum. With 2 replicas you are better off using ONE.
The client gets the WRITE or READ acknowledgement from the coordinator node once LOCAL_QUORUM is satisfied in the coordinator's data center.
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html
If the write consistency level is LOCAL_ONE or LOCAL_QUORUM, only the nodes in the same datacenter as the coordinator node must respond to the client request in order for the request to succeed.
Use either LOCAL_ONE or LOCAL_QUORUM to reduce geographical latency and lessen the impact on client write request response times.
The below statement from Cassandra documentation is the reason for my doubt.
For example, if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will replicate the write to all nodes in the cluster and wait for acknowledgement from two nodes. If the write fails on one of the nodes but succeeds on the other, Cassandra reports a failure to replicate the write on that node. However, the replicated write that succeeds on the other node is not automatically rolled back.
Ref : http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_atomicity_c.html
So does Cassandra write to a node (which is up) even if the consistency level cannot be met?
I got it. Cassandra will not even attempt the write if it knows that the consistency level cannot be met. If the consistency level CAN be met but there are not enough live replicas to satisfy the replication factor, Cassandra writes to the currently available replicas and returns a success message. Later, when the missing replica is up again, the write is delivered to it as well.
For example, if the replication factor is 3 and 1 of 3 nodes is down, a write with a consistency of 2 will succeed. But if the replication factor is 2 and 1 of 2 nodes is down, a write with a consistency of 2 is rejected, and Cassandra will not even write to the single node that is available.
What the documentation describes is the case where the write was initiated while the consistency level could be met, but one node went down in between and could not complete the write, whereas the write succeeded on another node. Since the consistency level was not met, the client gets a failure message. The record that was written to the single node is not rolled back: it stays on that node and can later reach the other replicas through read repair or anti-entropy repair.
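The two failure modes discussed here can be told apart in a small model. This is a sketch under stated assumptions, not Cassandra's code: the exception classes are defined locally and only borrow the names of the errors a driver typically reports, and `coordinate_write` is a hypothetical function.

```python
class Unavailable(Exception):
    """Not enough live replicas: the coordinator rejects the write up front."""


class WriteTimeout(Exception):
    """The write was sent, but too few replicas acknowledged it in time."""


def coordinate_write(replica_alive: list, replica_acks: list, required: int) -> list:
    """Return the indices of the replicas that stored the write.

    replica_alive: liveness as seen by the coordinator before sending.
    replica_acks:  whether each replica actually applied and acknowledged the write.
    """
    if sum(replica_alive) < required:
        raise Unavailable("write not attempted on any replica")
    stored = [i for i, (alive, ack) in enumerate(zip(replica_alive, replica_acks))
              if alive and ack]
    if len(stored) < required:
        # Writes that were already applied stay on their replicas: no rollback.
        error = WriteTimeout(f"{required} required, {len(stored)} acknowledged")
        error.stored_on = stored
        raise error
    return stored


# RF=3, one replica down, consistency of 2: succeeds on the two live replicas.
ok = coordinate_write([True, True, False], [True, True, False], required=2)

# RF=2, one replica down, consistency of 2: rejected before anything is written.
try:
    coordinate_write([True, False], [True, False], required=2)
    rejected = False
except Unavailable:
    rejected = True

# RF=3, all replicas look alive, but only one applies the write in time.
try:
    coordinate_write([True, True, True], [True, False, False], required=2)
    partial = None
except WriteTimeout as exc:
    partial = exc.stored_on

print(ok, rejected, partial)
```

The third case is the one from the documentation quote: the client sees a failure, yet one replica keeps the write.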
Consistency in Cassandra is defined at the statement level. That means you specify, for a particular query, what level of consistency you need.
This implies that if the consistency level is not met, that statement has not met its consistency requirement and the client receives an error.
There is no rollback in Cassandra. What you have in Cassandra is eventual consistency. That means your statement might succeed in the future, if not immediately. When a replica node comes alive, the cluster (i.e. Cassandra's fault tolerance) will take care of writing to the replica node.
So if your statement failed, it might still succeed in the future. This is contrary to the RDBMS world, where an uncommitted transaction is rolled back as if nothing had happened.
Update:
I stand corrected. Thanks Arun.
From:
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html
During a write operation, when hinted handoff is enabled and consistency can be met, the coordinator stores a hint about dead replicas in the local system.hints table under either of these conditions:
So it's still not a rollback. Nodes know the current cluster state and don't initiate the write if consistency cannot be met.
At driver level, you get an exception.
On the nodes where the write succeeded, the data is actually written, and it is not automatically rolled back.
From the client's point of view, you should still treat the write as failed, even though some replicas may hold the data.
From the documentation:
If the write fails on one of the nodes but succeeds on the other,
Cassandra reports a failure to replicate the write on that node.
However, the replicated write that succeeds on the other node is not
automatically rolled back.