Which part of the CAP theorem does Cassandra sacrifice and why? - cassandra

There is a great talk here about simulating partition issues in Cassandra with Kingsbury's Jepsen library.
My question is: with Cassandra, are you mainly concerned with the partition-tolerance part of the CAP theorem, or is consistency a factor you need to manage as well?

Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered to be more important than consistency. However, real world systems rarely fall neatly into these categories, so it's more helpful to view CAP as a continuum. Most systems will make some effort to be consistent, available, and partition tolerant, and many (including Cassandra) can be tuned depending on what's most important. Turning knobs like replication factor and consistency level can have a dramatic impact on C, A, and P.
Even defining what the terms mean can be challenging, as various use cases have different requirements for each. So rather than classify a system as CP, AP, or whatever, it's more helpful to think in terms of the options it provides for tuning these properties as appropriate for the use case.
Here's an interesting discussion on how things have changed in the years since the CAP theorem was first introduced.

CAP stands for Consistency, Availability and Partition Tolerance.
In general, it's impossible for a distributed system to guarantee all three of the above at a given point in time.
Apache Cassandra falls under the AP category, meaning Cassandra favours Availability and Partition Tolerance over Consistency, but this can be further tuned via the replication factor (how many copies of the data) and the consistency level (for reads and writes).
For more info: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
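To make those two knobs concrete, here is a minimal sketch using the DataStax Java driver 3.x. The contact point, keyspace, and table names are hypothetical and only illustrate the API; they are not prescribed by the answer above.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TunableConsistencyExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // Knob 1: replication factor -- how many copies of each row the cluster keeps.
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)");

            // Knob 2: consistency level -- how many replicas must acknowledge each operation.
            session.execute(new SimpleStatement(
                    "INSERT INTO demo.users (id, name) VALUES (1, 'alice')")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM)); // 2 of 3 replicas must ack

            ResultSet rs = session.execute(new SimpleStatement(
                    "SELECT name FROM demo.users WHERE id = 1")
                    .setConsistencyLevel(ConsistencyLevel.ONE));    // any single replica may answer
            System.out.println(rs.one().getString("name"));
        }
    }
}
```

Lowering the consistency level favours availability; raising it (together with the replication factor) favours consistency, which is the tuning described above.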

Interestingly, it depends on your Cassandra configuration. Cassandra can at most be an AP system. But if you configure it to read or write based on quorum, then it no longer remains CAP-available (available as per the definition of the CAP theorem) and is only a P system.
Just to explain things in more detail, the CAP theorem means:
C: (Linearizability or strong consistency) roughly means
If operation B started after operation A successfully completed, then
operation B must see the system in the same state as it was on
completion of operation A, or a newer state (but never older state).
A:
“every request received by a non-failing [database] node in the system must result in a [non-error] response”. It’s not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called “highly available” (i.e. low downtime) systems actually do not meet this definition of availability.
P:
Partition Tolerance (terribly misnamed) basically means that you’re communicating over an asynchronous network that may delay or drop messages. The internet and all our data centres have this property, so you don’t really have any choice in this matter.
Source: Martin Kleppmann's excellent work

The CAP theorem states that a database can't simultaneously guarantee consistency, availability, and partition tolerance.
Since network partitions are a fact of life, distributed databases tend to be either CP or AP.
Cassandra was designed to be AP, but you can fine-tune consistency at the cost of availability.
Availability: this is ensured with replicas. Cassandra typically writes multiple copies to different cluster nodes (generally 3). If one node is unavailable, data won't be lost.
Writing data to multiple nodes takes time because the nodes are scattered in different locations. Until all replicas have been updated, the data is only eventually consistent.
So when high availability is preferred, consistency is compromised.
Tunable consistency:
For a read or write operation, you can specify a consistency level. The consistency level refers to the number of replicas that need to respond for a read or write operation to be considered complete.
For non-critical features, you can use a lower consistency level: say ONE.
If you think consistency is important, you can increase the level to TWO, THREE or QUORUM (a majority of replicas).
Assume that you set the consistency level to a high value (QUORUM) for your critical features and a majority of the nodes are down. In this case, the write operation will fail.
Here Cassandra sacrifices availability for consistency.
Have a look at this article for more details.
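As a hedged illustration of that failure mode (DataStax Java driver 3.x; the keyspace, table, and column names are made up for the example): a QUORUM write is rejected up front with an UnavailableException when the coordinator already knows too few replicas are alive.

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.UnavailableException;

public class CriticalWrite {
    // Writes a payment status at QUORUM; with RF = 3 this needs 2 live replicas.
    static void recordPayment(Session session, int orderId) {
        try {
            session.execute(new SimpleStatement(
                    "INSERT INTO shop.payments (order_id, status) VALUES (?, 'PAID')", orderId)
                    .setConsistencyLevel(ConsistencyLevel.QUORUM));
        } catch (UnavailableException e) {
            // 2 of the 3 replicas are down: Cassandra refuses the write rather than
            // accepting it inconsistently -- availability is sacrificed for consistency.
            System.err.println("Write rejected: only " + e.getAliveReplicas()
                    + " of " + e.getRequiredReplicas() + " required replicas are up");
        }
    }
}
```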

Related

What is the main difference between the new PNCounter and IAtomicLong in Hazelcast?

I have issues understanding the new features of Hazelcast 5.0. What is the main difference between those data structures? As I understand it, a PNCounter is a counter that replicates data, and when there are no more updates the replicas converge. I want to understand how Hazelcast's PNCounter controls concurrency.
Is the maximum replica count about the maximum number of nodes you have running in Hazelcast?
How does it work internally? I need to understand how this works: I'm working with a counter that counts the activity of several clients, and we create 1000 or even more PNCounters for different activities because I don't know if one PNCounter would work.
Does the client know which counter it needs to connect to, or does the counter follow a certain logic flow? I don't understand this feature, and I really want to know the difference between PNCounter and IAtomicLong.
To me it looks like an IAtomicLong with the added feature that it can replicate.
It all boils down to the CAP Theorem.
In summary, out of Consistency, Availability and Partition Tolerance, you can pick two out of three. And since Hazelcast is distributed by nature, your choice is between Consistency and Availability.
IAtomicLong is a member of CP Subsystem API
-- https://docs.hazelcast.com/imdg/4.2/data-structures/iatomiclong
A Conflict-free Replicated Data Type (CRDT) is a distributed data structure that achieves high availability by relaxing consistency constraints. There may be several replicas for the same data and these replicas can be modified concurrently without coordination. This means that you may achieve high throughput and low latency when updating a CRDT data structure. On the other hand, all of the updates are replicated asynchronously.
-- https://docs.hazelcast.com/imdg/4.2/data-structures/pn-counter
In summary, IAtomicLong sacrifices Availability for Consistency. The result will always be correct, but it might not always be available.
PNCounter makes the opposite trade-off. It's always available (depending on the number of nodes of the cluster, of course) but it's eventually consistent as it's asynchronously replicated.
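A minimal sketch of the difference, assuming Hazelcast 5.x with a member started in-process (the counter names are made up):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.IAtomicLong;
import com.hazelcast.crdt.pncounter.PNCounter;

public class CounterComparison {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // CRDT counter: updates are applied locally and replicated asynchronously,
        // so it stays writable during a partition but is only eventually consistent.
        PNCounter activity = hz.getPNCounter("client-activity");
        activity.addAndGet(5);
        System.out.println("PNCounter (may be stale on another member): " + activity.get());

        // CP Subsystem counter: linearizable; when the CP Subsystem is enabled it needs
        // a majority of CP members, so it is unavailable on the minority side of a partition.
        IAtomicLong strict = hz.getCPSubsystem().getAtomicLong("strict-counter");
        strict.incrementAndGet();
        System.out.println("IAtomicLong (always the latest committed value): " + strict.get());

        hz.shutdown();
    }
}
```

Both counters are used almost identically from the client's point of view; the difference is entirely in the consistency/availability trade-off described above.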

How can Cassandra be tuned to be CA in the CAP theorem?

I know the CAP theorem:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Cassandra is typically classified as an AP system. I heard it can be tuned to CA, but I didn't find the documentation.
How can I use Cassandra as CA?
Thanks.
Generally speaking, the 'P' in CAP is what NoSQL technologies were built to solve for. This is usually accomplished by spreading data horizontally across multiple instances.
Therefore, if you wanted Cassandra to run in a "CA" CAP configuration, running it as a single node cluster would be a good first step.
I heard yes it can turned to CA, but I didn't find the documentation.
After re-reading this, it's possible that you may have confused "CA" with "CP."
It is possible to run Cassandra as a "CP" database, or at least tune it to behave more in that regard. The way to go about this, would be to set queries on the application side to use the higher levels of consistency, like [LOCAL_]QUORUM, EACH_QUORUM, or even ALL. Consistency could be tuned even higher, by increasing the replication factor (RF) in each keyspace definition. Setting RF equal to number of nodes and querying at ALL consistency would be about as high as it could be tuned to be consistent.
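As a rough sketch of that tuning (DataStax Java driver 3.x; the keyspace, table, and a 3-node cluster are assumptions for illustration): set the replication factor equal to the number of nodes and query at ALL, which is about as close to "CP" as Cassandra can be pushed.

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TowardCp {
    // Replicate every row to all 3 nodes of a (hypothetical) 3-node cluster.
    static void setReplicationToAllNodes(Session session) {
        session.execute("ALTER KEYSPACE demo WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
    }

    // ALL: every replica must acknowledge, so a single down node makes the write fail.
    static void strictWrite(Session session, int id, String name) {
        session.execute(new SimpleStatement(
                "INSERT INTO demo.users (id, name) VALUES (?, ?)", id, name)
                .setConsistencyLevel(ConsistencyLevel.ALL));
    }
}
```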
However, I feel compelled to mention at what a terrible, terrible idea this all is. Cassandra was engineered to be "AP." Fighting that intrinsic design is a fool's errand. I've always said, nobody wins when you try to out-Cassandra Cassandra.
If you're employing engineering time to make a datastore function in ways that are contrary to its design, then a different datastore (one you don't have to work against) might be the better choice.

Maintaining a dynamic consistency level in DataStax

I have a 5-node cluster and a keyspace with a replication factor of 3. The nature of the operations is such that writes are much more important than reads, but the frequency of read operations is about 10 times higher than that of writes. To achieve consistency while improving overall performance, I chose to set the consistency level for writes to ALL and for reads to ONE. But this causes operations to fail if even one node is down.
Is there a method by which I can simultaneously change the consistency levels for (write, read) from (ALL, ONE) to (QUORUM, QUORUM) if one node is detected down, or if there is a query execution exception, and can this be done in a manner such that no operation passes through a temporary phase where it sees a (QUORUM, ONE) setting?
We also plan to expand to twice the capacity: 3 datacenters with 4 nodes each. Is it possible to define custom consistency levels, like a level of ALL in any one datacenter and ONE in the others? I'm thinking that a level of EACH_ONE for reads, coupled with the above level for writes, will ensure consistency while allowing the cluster to remain available even if a node goes down.
The flexibility is there, since you can set your consistency level on a per-request basis. Depending on the client you are using, there are some nice capabilities. For example, the Java driver has something called a DowngradingConsistencyRetryPolicy such that if a request fails, it will be retried with the next-lowest consistency level until the request succeeds. This pushes the complexity of retrying into the client so you don't have to write a bunch of code for it; it's really nice!
The Java driver also allows you to configure the consistency level per request with Statement#setConsistencyLevel().
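Here is a short sketch of both of those driver features, assuming the 3.x Java driver (DowngradingConsistencyRetryPolicy was deprecated in later driver versions); the contact point and query are hypothetical.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

public class DowngradingExample {
    public static void main(String[] args) {
        // Cluster-wide retry policy: when a request cannot be satisfied at its consistency
        // level, the driver retries it at a lower level instead of failing outright.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .build();
        Session session = cluster.connect();

        // Per-request consistency level via Statement#setConsistencyLevel().
        Statement read = new SimpleStatement("SELECT * FROM demo.users WHERE id = 1")
                .setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(read); // may be retried at a lower level if a quorum isn't reachable

        cluster.close();
    }
}
```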
As for custom consistency levels, this is not an option available to you (without changing the Cassandra source code); however, I think what is made available should be sufficient.
For reads, I don't find much value in ensuring consistency between datacenters. I think LOCAL_QUORUM is more than sufficient, but if you really care, you can use something like EACH_QUORUM to ensure all datacenters agree, although that will severely impact your response time and availability. For example, if one of your datacenters goes down completely, you won't be able to do reads at all (unless downgrading).
For writes, I'd strongly recommend not using ALL in a multi datacenter set up if you care about response time and availability. Depending on your requirements, LOCAL_QUORUM should likely be more than sufficient.
One of the benefits of Cassandra is that consistency is tunable, so you can have as much strong consistency as you like, but keep in mind that Cassandra is at its best as a highly available, partition-tolerant system.
A really good presentation on consistency that I think nails a lot of these points is Christos Kalantzis' talk 'Eventual Consistency != Hopeful Consistency', which suggests that a consistency level of ONE is sufficient for a lot of use cases.

When would Cassandra not provide C, A, and P with W/R set to QUORUM?

When both read and write are set to quorum, I can be guaranteed the client will always get the latest value when reading.
I realize this may be a novice question, but I'm not understanding how this setup doesn't provide consistency, availability, and partition tolerance.
With a quorum, you are unavailable (i.e. won't accept reads or writes) if there aren't enough replicas available. You can choose to relax and read / write on lower consistency levels granting you availability, but then you won't be consistent.
There's also the case where a quorum on reads and writes guarantees you that the latest "written" data is retrieved. However, if a coordinator doesn't know about required partitions being down (i.e. gossip hasn't propagated after 2 of 3 nodes fail), it will issue a write to 3 replicas (assuming quorum consistency on a replication factor of 3). The one live node will write, and the other 2 won't (they're down). The write times out (it doesn't fail). A write timeout where even one node has written IS NOT a write failure. It's a write "in progress". Let's say the down nodes come up now. If a client next requests that data with quorum consistency, one of two things happens:
Request goes to one of the two downed nodes, and to the "was live" node. Client gets latest data, read repair triggers, all is good.
Request goes to the two nodes that were down. OLD data is returned (assuming repair hasn't happened). The coordinator gets a digest from the third node, and read repair kicks in. This is when the original write is considered "complete" and subsequent reads will get the fresh data. All is good, but one client will have received the old data, as the write was "in progress" but not "complete". This is a very small, rare scenario. One thing to note is that writes to Cassandra are upserts on keys, so usually retries are OK to get around this problem; however, if nodes genuinely go down, the initial read may be a problem.
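Because of that upsert property, a common pattern is simply to re-issue a timed-out write. A hedged sketch (Java driver 3.x; the table and names are hypothetical):

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class UpsertRetry {
    static void upsertWithRetry(Session session, int id, String name, int maxAttempts) {
        Statement upsert = new SimpleStatement(
                "INSERT INTO demo.users (id, name) VALUES (?, ?)", id, name)
                .setConsistencyLevel(ConsistencyLevel.QUORUM);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                session.execute(upsert);
                return; // a quorum acknowledged: the write is now "complete"
            } catch (WriteTimeoutException e) {
                // The write may already be applied on some replicas ("in progress");
                // re-issuing the same upsert is safe and converges on the same row.
            }
        }
        throw new RuntimeException("Write did not reach a quorum after " + maxAttempts + " attempts");
    }
}
```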
Typically you balance your consistency and availability requirements. That's where the term tunable consistency comes from.
That said, the web is full of links that disprove (or at least try to disprove) Brewer's CAP theorem ... from the theorem's point of view, the C says that
all nodes see the same data at the same time
which is quite different from the guarantee that a client will always retrieve fresh information. Strictly following the theorem, in your situation the C is not respected.
The DataStax documentation contains a section on Configuring Data Consistency. In looking through all of the available consistency configurations, For QUORUM it states:
Returns the record with the most recent timestamp after a quorum of replicas has responded regardless of data center. Ensures strong consistency if you can tolerate some level of failure.
Note that last part "tolerate some level of failure." Right there it's indicating that by using QUORUM consistency you are sacrificing availability (A).
The document referenced above also further defines the QUORUM level, stating that your replication factor comes into play as well:
If consistency is top priority, you can ensure that a read always
reflects the most recent write by using the following formula:
(nodes_written + nodes_read) > replication_factor
For example, if your application is using the QUORUM consistency level
for both write and read operations and you are using a replication
factor of 3, then this ensures that 2 nodes are always written and 2
nodes are always read. The combination of nodes written and read (4)
being greater than the replication factor (3) ensures strong read
consistency.
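A tiny sketch of the formula quoted above (the helper name is made up; it just restates the documented inequality):

```java
public class ConsistencyCheck {
    // Strong read consistency holds when nodes_written + nodes_read > replication_factor.
    static boolean readsSeeLatestWrite(int nodesWritten, int nodesRead, int replicationFactor) {
        return nodesWritten + nodesRead > replicationFactor;
    }

    public static void main(String[] args) {
        // QUORUM writes and reads with RF = 3: 2 + 2 = 4 > 3, so reads see the latest write.
        System.out.println(readsSeeLatestWrite(2, 2, 3)); // true
        // ONE writes and reads with RF = 3: 1 + 1 = 2, not > 3, so a read may miss the latest write.
        System.out.println(readsSeeLatestWrite(1, 1, 3)); // false
    }
}
```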
In the end, it all depends on your application requirements. If your application needs to be highly-available, ONE is probably your best choice. On the other hand, if you need strong-consistency, then QUORUM (or even ALL) would be the better option.

How does increasing the Cassandra replication factor give more consistency?

I was reading "Cassandra The Definitive Guide" and page 46 has this to say about Replication Factor:
"The replication factor essentially allows you to decide how much you
want to pay in performance to gain more consistency. That is, your
consistency level for reading and writing data is based on the
replication factor"
Now, to me that's news. If replication is increased, it is intuitive that it improves availability and, depending on the topology of the cluster, its partition tolerance as well. However, why does the author say that it increases consistency? I would think it's quite the opposite: you have to take extra effort to ensure a consistent state of your persistent data by propagating updates to every replica on different nodes. So the more replicas there are, the harder it is to maintain consistency. Why does the author say the exact opposite?
All inputs appreciated.
The consistency level specifies how many replicas must respond before a result is returned. See the documentation.
So, if you're using a consistency level of Quorum or higher, the higher the replication factor, the more nodes need to respond before a result can be returned.
Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. Perhaps there's a better way to categorize these.
As an example, you can have a replication factor of 2. When you write, two copies will always be stored, assuming enough nodes are up. When a node is down, writes for that node are stashed away and written when it comes back up, unless it's down long enough that Cassandra decides it's gone for good.
For example with 2 nodes, a replication factor of 1, read consistency = 1, and write consistency = 1:
Your reads are consistent
You can survive the loss of no nodes.
You are really reading from 1 node every time.
You are really writing to 1 node every time.
Each node holds 50% of your data.
For More Info: Configuring data consistency
