What is the correct understanding of Cassandra tunable consistency?

It is said that consistency level N defines the number of replicas that must acknowledge every read and write operation. The bigger that number, the more consistent the results we get.
If we set that parameter to N (N < M/2), where M is the cluster size, does it mean the following situation is possible:
one data center, and two concurrent writes both succeed (updating the same key with different values)?
And consequently, two subsequent concurrent reads return different values for the same key? Am I correct?

Yes, we can tune consistency based on the requirements for reads and writes. QUORUM is the recommended consistency level for Cassandra in a single DC. A quorum is calculated as quorum = (N / 2) + 1, using integer division, where N is the replication factor (the number of replicas). In cqlsh, the consistency level is set with the command below; for example, CONSISTENCY QUORUM sets the session's consistency level to QUORUM.
CONSISTENCY [level]
For more details on tunable consistency, see:
https://medium.com/dugglabs/data-consistency-in-apache-cassandra-part-1-7aee6b472fb4
https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDataConsistency.html
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshConsistency.html
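As a quick illustration of the quorum arithmetic, here is a minimal sketch in Python (the helper name is my own):
# A quorum is a majority of the replicas: more than half must respond.
def quorum(replication_factor):
    return replication_factor // 2 + 1

print(quorum(3))  # 2 -- with RF=3, two replicas must acknowledge
print(quorum(2))  # 2 -- with RF=2, quorum is the same as ALL
print(quorum(5))  # 3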

In Cassandra it is entirely possible that different client applications update the value of the same key on different nodes. You can always restrict this by tuning your consistency level.
The consistency level always depends on the replication factor you have chosen.
If RF=3 in a 5-node DC, then a consistency level of QUORUM or LOCAL_QUORUM means 2 out of the 3 nodes holding a replica.
Any of the combinations below should give you correct data, because the write set and the read set overlap in at least one replica (see the sketch after this answer):
WRITE=ALL READ=ONE
WRITE=ONE READ=ALL
WRITE=LOCAL_QUORUM READ=LOCAL_QUORUM
You can tune the consistency level in your application according to its load.
In my opinion, number 3 (LOCAL_QUORUM) should work best: a node can sometimes be under high load or down, and your application will not be affected.
If you have more writes than reads, a write CL of ALL will make your application slow.
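Here is a minimal sketch of that overlap rule in Python, checking the three combinations above for a single DC with RF=3 (the helper names are my own):
def replicas_required(level, rf):
    # Number of replicas a consistency level waits for (single DC,
    # so LOCAL_QUORUM and QUORUM coincide).
    return {'ONE': 1, 'QUORUM': rf // 2 + 1,
            'LOCAL_QUORUM': rf // 2 + 1, 'ALL': rf}[level]

def strongly_consistent(write_cl, read_cl, rf):
    # A read is guaranteed to see the latest write when the write and
    # read replica sets must overlap: W + R > RF.
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf

for w, r in [('ALL', 'ONE'), ('ONE', 'ALL'), ('LOCAL_QUORUM', 'LOCAL_QUORUM')]:
    print(w, r, strongly_consistent(w, r, 3))  # all True
print(strongly_consistent('ONE', 'ONE', 3))    # False -- not consistent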

Related

Consistency and timeout issues with Cassandra 2.2

I'm using Cassandra 2.2 and I've an application that requires a high level of consistency.
I've configured one datacenter cluster with 3 nodes.
My keyspace is created with replication_factor of 2.
In each node's cassandra.yaml I've listed 2 seeds (for example NODE_1 and NODE_3).
The important thing is that my app should be fully functional even if one node is down.
Currently I have some consistency and timeout issues when my app contacts the cluster.
I've read the whole Cassandra 2.2 documentation and I concluded that the best CONSISTENCY LEVEL for my write operations should be QUORUM and for my read operations ONE, but I still have some consistency issues.
First of all, is it the right choice to have a strong level of consistency?
Also, are UPDATE and DELETE operations considered write or read operations? For example, an UPDATE with a WHERE clause still has to 'read' data. I'm not sure, especially in the context of Cassandra's write workflow.
My second issue is timeouts during write operations. A simple and lightweight INSERT sometimes gets "Cassandra timeout during write query at consistency QUORUM (2 replicas were required but only 1 acknowledged the write)",
or sometimes even "... 0 acknowledged", even though all 3 of my nodes are UP.
Are there some other parameters I should check, like write_request_timeout_in_ms, whose default value of 2000 ms is already high?
You will have strong consistency with Replication Factor = 2 and Consistency Level = QUORUM for write operations and ONE for read operations. But write operations will fail if one node is down, because with Replication Factor = 2, QUORUM is the same as ALL.
You should use Replication Factor = 3 and Consistency Level = QUORUM for both write and read operations to have strong consistency and a fully functional app even if one node is down.
DELETE and UPDATE operations are write operations.
For the timeout issue, please provide the table model and the queries that fail.
Updated
Consistency level applies to replicas, not nodes.
Replication factor = 2 means that 2 of 3 nodes will contain data. These nodes will be replicas.
QUORUM means that a write operation must be acknowledged by 2 replicas (when replication factor=2), not nodes.
Cassandra places the data on each node according to the partition key. Each node is responsible for a range of partition keys, so not just any node can store any given piece of data; you need live replicas (not just live nodes) to perform operations. Here is an article about data replication and distribution.
When you perform a QUORUM write request against a cluster with 2 of 3 nodes alive, there is a chance that the cluster has only 1 live replica for that partition key, in which case the write request will fail.
In addition, here is a simple calculator for Cassandra parameters.
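As a sketch of the RF=3 change recommended above, using the DataStax Python driver (the keyspace name my_ks and data center name DC1 are placeholders for your own):
from cassandra.cluster import Cluster

cluster = Cluster(['NODE_1'])  # one of your contact points
session = cluster.connect()
# Raise the replication factor to 3 so QUORUM (2 of 3) survives one node down.
session.execute("""
    ALTER KEYSPACE my_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3}
""")
# Afterwards, run 'nodetool repair my_ks' on each node so the new
# replicas receive the existing data.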

Cassandra LOCAL_QUORUM

I'm having trouble understanding / finding information about how various quorums are calculated in Cassandra.
Let's say I have a 16 node cluster using Network Topology Strategy across 2 data centers. The replication factor is 2 in each datacenter (DC1: 2, DC2: 2).
In this example, if I write using a LOCAL_QUORUM, I will write the data to 4 nodes (2 in each data center) but when will the acknowledgement happen? After 2 nodes in 1 data center are written?
In addition, to maintain strong read consistency, I need Write nodes + read nodes > replication factor. In the above example, if both reads and writes were LOCAL_QUORUM, I would have 2 + 2 which would not guarantee strong read consistency. Am I understanding this correctly? What level would I need then to ensure strong read consistency?
The goal here is to ensure that if a data center fails, reads/writes can continue while minimizing latency.
The write will be successful after the coordinator receives acknowledgement from 2 nodes in the same DC as the coordinator.
Using LOCAL_QUORUM for both reads and writes will get you strong consistency, provided the same DC is used for both reads and writes, and only within that DC.
The previous answer is correct: "The write will be successful after the coordinator received acknowledgement from 2 nodes from the same DC of the coordinator." It is the same for reads.
The Quorum is always calculated by N/2+1 (N being the replication factor), having a local_quorum avoids the latency of the other data center.
As far as I understand, with an RF of 2 and LOCAL_QUORUM you get better local consistency but no availability in case of a partition: if a single node goes down, all writes and reads will fail for the token ranges owned by that node and its replica.
Therefore I recommend an RF of 3 if you intend to use a quorum. With 2 replicas you are better off using ONE.
The client will get the WRITE or READ acknowledgement from the coordinator node once LOCAL_QUORUM completes its work in the coordinator's data center.
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html
If the write consistency level is LOCAL_ONE or LOCAL_QUORUM, only the nodes in the same datacenter as the coordinator node must respond to the client request in order for the request to succeed.
Use either LOCAL_ONE or LOCAL_QUORUM to avoid cross-datacenter latency and lessen the impact on client write request response times.
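For illustration, a minimal sketch with the DataStax Python driver that pins the driver to one data center and uses LOCAL_QUORUM for everything (the contact points, DC name, and keyspace are placeholders):
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

# Route requests to coordinators in DC1 and wait for a quorum of the
# replicas in DC1 only (2 of 2 with RF=2, 2 of 3 with RF=3).
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='DC1'),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
cluster = Cluster(['10.0.1.1', '10.0.1.2'],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect('my_keyspace')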

How does Cassandra partitioning work when replication factor == cluster size?

Background:
I'm new to Cassandra and still trying to wrap my mind around the internal workings.
I'm thinking of using Cassandra in an application that will only ever have a limited number of nodes (fewer than 10, most commonly 3). Ideally each node in my cluster would have a complete copy of all of the application data. So, I'm considering setting the replication factor to the cluster size. When additional nodes are added, I would alter the keyspace to increment the replication factor setting (running nodetool repair to ensure each node gets the necessary data).
I would be using the NetworkTopologyStrategy for replication to take advantage of knowledge about datacenters.
In this situation, how does partitioning actually work? I've read about a combination of nodes and partition keys forming a ring in Cassandra. If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?
Are there tremendous downfalls to this type of Cassandra deployment? I'm guessing there would be lots of asynchronous replication going on in the background as data was propagated to every node, but this is one of the design goals so I'm okay with it.
The consistency level on reads would probably generally be "one" or "local_one".
The consistency level on writes would generally be "two".
Actual questions to answer:
Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?
Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?
Is each node considered "responsible" for every row of data?
If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?
Are there other downfalls to this strategy that I don't know about?
Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?
Is each node considered "responsible" for every row of data?
If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?
Not exactly. C* nodes still have token ranges, and C* still assigns a primary replica to the "responsible" node. But all nodes will also hold a replica with RF = N (where N is the number of nodes), so in essence the implication is the same as what you described.
Are there tremendous downfalls to this type of Cassandra deployment?
Are there other downfalls to this strategy that I don't know about?
Not that I can think of. You might be more susceptible than average to inconsistent data, so use C*'s anti-entropy mechanisms (repair, read repair, hinted handoff) to counter this.
Consistency level QUORUM or ALL would start to get expensive, but I see you don't intend to use them.
Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?
It's not common; I guess you are looking for super high availability and all your data fits on one box. I don't think I've ever seen a C* deployment with RF > 5. Far and away the most common is RF = 3.
If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?
This depends on the load balancing policy configured in your driver. Often we select a token-aware policy (assuming you're using one of the DataStax drivers), in which case requests are routed to the primary replica automatically. In your case you could use round robin and have the same effect, since every node holds every row.
The primary downfall will be increased write costs at the coordinator level as you add nodes. The maximum number of replicas written to that I've seen is around 8 (5 in other data centers and 3 local replicas).
In practice this will mean reduced stability while performing large or batched writes (greater than 1 MB) or a lower per-node write TPS.
The primary advantage is that you can do a lot of things that would normally be awful or impossible. Want to use secondary indexes? They will probably work reasonably well (assuming cardinality and partition size don't become your bottleneck). Want to add a custom UDF that does a GroupBy, or use very large IN queries? It'll probably work.
As @Phact mentions, it is not a common usage pattern; I primarily saw it used with DSE Search in low-write-throughput use cases that required 'single node' features from Solr. For those same use cases with pure Cassandra you'd get some benefits on the read side and be able to run expensive queries that are normally impossible in a more distributed cluster.
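A sketch of the token-aware routing mentioned above, with the DataStax Python driver (contact point, DC, and keyspace names are placeholders):
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# TokenAwarePolicy sends each request straight to a replica for its
# partition key; with RF == cluster size every node is a replica, so
# plain round robin would behave the same here.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='DC1')))
cluster = Cluster(['10.0.1.1'],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect('my_keyspace')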

How does increasing the Cassandra Replication Factor give more consistency

I was reading "Cassandra The Definitive Guide" and page 46 has this to say about Replication Factor:
"The replication factor essentially allows you to decide how much you
want to pay in performance to gain more consistency. That is, your
consistency level for reading and writing data is based on the
replication factor"
Now to me that's news. If replication is increased, it is intuitive that availability improves, and depending on the topology of the cluster, partition tolerance as well. But why does the author say that it increases consistency? I would think it's quite the opposite: you have to take extra effort to keep your persistent data in a consistent state by propagating updates to every replica on different nodes. So the more replicas there are, the harder it is to maintain consistency. Why does the author say the exact opposite?
All inputs appreciated.
The consistency level specifies how many replicas must respond before a result is returned. See the documentation.
So, if you're using a consistency level of Quorum or higher, the higher the replication factor, the more nodes need to respond before a result can be returned.
Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. Perhaps there's a better way to categorize these.
As an example, you can have a replication factor of 2. When you write, two copies will always be stored, assuming enough nodes are up. When a node is down, writes for that node are stashed away (hinted handoff) and written when it comes back up, unless it's down long enough that Cassandra decides it's gone for good.
For example with 2 nodes, a replication factor of 1, read consistency = 1, and write consistency = 1:
Your reads are consistent
You can survive the loss of no nodes.
You are really reading from 1 node every time.
You are really writing to 1 node every time.
Each node holds 50% of your data.
For More Info: Configuring data consistency
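To see the interplay concretely, here is a minimal sketch with the DataStax Python driver that sets the consistency level per statement (keyspace, table, and column names are made up):
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('my_ks')

# With RF=3, a QUORUM read waits for 2 of the 3 replicas to respond.
query = SimpleStatement("SELECT name FROM users WHERE id = %s",
                        consistency_level=ConsistencyLevel.QUORUM)
rows = session.execute(query, (42,))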

Consistency strategy for two datacenters

What is the best write/read strategy that is fault tolerant and fast for reads when all nodes are up?
I have 2 replicas in each datacenter and at first I was considering using QUORUM for writes and LOCAL_QUORUM for reads but reads would fail if one node crashes.
The other strategy that I came up with is to use QUORUM for writes and TWO for reads. It should work fast in normal conditions (because we will get results from the nearest nodes first) and it will work slower when any node crashes.
Is this a situation where it is recommended to use consistency level TWO or it is for some other purpose?
When would you use CL THREE?
Do you have a better strategy for consistent and fault tolerant writes/reads?
You first have to choose whether you want consistency or availability. If you choose consistency, then you need R + W > N, where R is how many replicas you read from, W is how many replicas you write to, and N is the number of replicas.
Then you have to choose whether you want reads/writes to always span multiple data centers.
Once you make those choices, you can then choose your consistency level (or it will be dictated to you).
If, for example, you decide you need consistency, and you don't want writes/reads to span multiple data centers, then you can read at LOCAL_QUORUM (which is 2 in your case) and write at ONE, or vice versa.
Two copies per DC is an odd choice. Typically you want to do LOCAL_QUORUM with 3 replicas in each data center. That lets you read and write using only nodes within a single data center, while still allowing 1 node to go down.
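A sketch of that recommended layout (keyspace and data center names are placeholders), created through the Python driver:
from cassandra.cluster import Cluster

session = Cluster(['10.0.1.1']).connect()
# Three replicas per DC: LOCAL_QUORUM needs 2 acks from the local DC,
# so each DC tolerates one node failure for both reads and writes.
session.execute("""
    CREATE KEYSPACE my_ks WITH replication = {
        'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}
""")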
