How is Cassandra not consistent but HBase is?

While going through the reading materials on Cassandra and HBase, I found that Cassandra is not consistent but HBase is, and I didn't find any proper reading material explaining why.
Could anybody provide any blogs/articles on this topic?

Cassandra is consistent, eventually. Based on Brewer's theorem (also known as the CAP theorem), distributed data systems can only guarantee 2 of the following 3 characteristics:
Consistency.
Availability.
Partition tolerance.
What this means is that Cassandra, in its default configuration, guarantees availability and partition tolerance, and there may be a delay before consistency is achieved. But this is configurable: you can increase the consistency level for any query, sacrificing availability.
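As an illustration, here is a minimal sketch of per-query tunable consistency using the DataStax Java driver 3.x (the contact point, keyspace, and table are hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class TunableConsistencyExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo_ks")) {
                // Read at ONE: fast and highly available, but may be stale.
                SimpleStatement fastRead =
                        new SimpleStatement("SELECT * FROM users WHERE id = 42");
                fastRead.setConsistencyLevel(ConsistencyLevel.ONE);
                session.execute(fastRead);

                // Read at QUORUM: a majority of replicas must respond,
                // trading availability for stronger consistency.
                SimpleStatement strongRead =
                        new SimpleStatement("SELECT * FROM users WHERE id = 42");
                strongRead.setConsistencyLevel(ConsistencyLevel.QUORUM);
                session.execute(strongRead);
            }
        }
    }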
There are multiple resources on the web; search for "eventual consistency in Cassandra". You can start with Ed Capriolo's talk, or this post on Quora.

Actually, since version 1.1 HBase has two consistency models:

Consistency.STRONG is the default consistency model provided by HBase. In case the table has region replication = 1, or in a table with region replicas but the reads are done with this consistency, the read is always performed by the primary regions, so that there will not be any change from the previous behaviour, and the client always observes the latest data.

In case a read is performed with Consistency.TIMELINE, then the read RPC will be sent to the primary region server first. After a short interval (hbase.client.primaryCallTimeout.get, 10ms by default), parallel RPC for secondary region replicas will also be sent if the primary does not respond back...
In other words, strong consistency is achieved by allowing reads only against the replica that does the writing, while the timeline-consistent behavior (the Reference Guide makes a point of differentiating timeline vs. eventual consistency) provides highly available, low-latency reads at the expense of a small chance of reading stale data.
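As a minimal sketch (assuming HBase 1.1+, the standard Java client, and a hypothetical table "t1"), a timeline read looks like this:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Consistency;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TimelineReadExample {
        public static void main(String[] args) throws IOException {
            try (Connection connection =
                         ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = connection.getTable(TableName.valueOf("t1"))) {
                Get get = new Get(Bytes.toBytes("row1"));
                // Allow the read to be served by a secondary region replica.
                get.setConsistency(Consistency.TIMELINE);
                Result result = table.get(get);
                // isStale() reports whether the result came from a secondary
                // replica and may therefore lag behind the primary.
                System.out.println("possibly stale: " + result.isStale());
            }
        }
    }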

Related

Why is Cassandra considered partition tolerant per the CAP theorem, given that we can isolate the coordinator?

Here is the definition of partition tolerance by Gilbert and Lynch:

When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost.
Let's divide the cluster into two partitions: the first contains only the coordinator, the second contains all other nodes. This way the coordinator will not be able to contact any replicas and will respond with an error. Is that allowed for partition-tolerant systems?
More specifically I think the question is which of the other two CAP attributes does Cassandra retain in the face of such a Partition.
The answer depends on the configured consistency level. For writes there is the ANY consistency level. At this consistency level, so long as hinted handoff is enabled, the coordinator will record the write and maintain Availability. Clients connected to other coordinators will not be able to see the updated value until the partition is resolved, so reads will not be Consistent. If a stronger consistency level is chosen, then the client is explicitly configuring Consistency over Availability.
So can Cassandra (given that it does not necessarily replicate all data to all nodes) be considered AP when a read coordinator is alone in a partition? If it responds with an error, that sounds like Consistency to me; if it responds with an empty result set because the data is not in its partition, that would be Availability. Since the weakest read consistency level is ONE, requiring at least one replica to respond, Cassandra opts for the former: if the coordinator is not itself one of the replicas owning the requested data, the read will time out and not be Available. As with writes, any stronger read consistency level explicitly configures Cassandra to behave more Consistently at the expense of Availability.
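As a minimal sketch of the write side (the contact point, keyspace, and table are hypothetical), a write at ANY with the DataStax Java driver looks like this:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class AnyWriteExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo_ks")) {
                SimpleStatement write = new SimpleStatement(
                        "INSERT INTO users (id, name) VALUES (42, 'alice')");
                // ANY: the write succeeds as soon as the coordinator records
                // it, even as just a hint -- maximum availability, weakest
                // consistency guarantee.
                write.setConsistencyLevel(ConsistencyLevel.ANY);
                session.execute(write);
            }
        }
    }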
So the "coordinator" node isn't a long-lasting or "leader"-like definition. It changes with practically every query. If there was a non-token-aware operation which needed a coordinator node, and that coordinator was suddenly partitioned-off from the rest, then that one query would fail.
The next query (or a retry) would pick a new node as a coordinator. The only issue, would be that some data rows will be short by one replica (data stored on the partitioned node). But as long as you're querying by ONE and have a RF >= 2, the cluster will continue on like nothing happened.
So "yes," Cassandra is definitely partition-tolerant.
Note: This is why it's important to use a token-aware load balancing policy. That way the driver picks one of the nodes containing the required data as the "coordinator." At consistency ONE, the operation is completed locally, and a network hop is taken out of the equation.
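For reference, a minimal sketch of configuring a token-aware policy with the DataStax Java driver 3.x (the contact point is hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class TokenAwareExample {
        public static void main(String[] args) {
            // TokenAwarePolicy wraps a child policy and routes each request
            // to a replica owning the statement's partition key when possible.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withLoadBalancingPolicy(
                            new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
                    .build();
            cluster.close();
        }
    }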

How does Cassandra partitioning work when replication factor == cluster size?

Background:
I'm new to Cassandra and still trying to wrap my mind around the internal workings.
I'm thinking of using Cassandra in an application that will only ever have a limited number of nodes (fewer than 10, most commonly 3). Ideally each node in my cluster would have a complete copy of all of the application data. So, I'm considering setting the replication factor to the cluster size. When additional nodes are added, I would alter the keyspace to increment the replication factor setting (and run nodetool repair to ensure the new node gets the data it needs).
I would be using the NetworkTopologyStrategy for replication to take advantage of knowledge about datacenters.
In this situation, how does partitioning actually work? I've read about a combination of nodes and partition keys forming a ring in Cassandra. If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?
Are there tremendous downfalls to this type of Cassandra deployment? I'm guessing there would be lots of asynchronous replication going on in the background as data was propagated to every node, but this is one of the design goals so I'm okay with it.
The consistency level on reads would probably generally be "one" or "local_one".
The consistency level on writes would generally be "two".
Actual questions to answer:
Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?
Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?
Is each node considered "responsible" for every row of data?
If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?
Are there other downfalls to this strategy that I don't know about?
Do I actually have a ring of one partition where all possible values generated by the partitioner go to the one partition?

Is each node considered "responsible" for every row of data?

If all of my nodes are "responsible" for each piece of data regardless of the hash value calculated by the partitioner, do I just have a ring of one partition key?
Not exactly: C* nodes still have token ranges, and C* still assigns a primary replica to the "responsible" node. But all nodes will also hold a replica of every row with RF = N (where N is the number of nodes). So in essence the implication is the same as what you described.
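As a minimal sketch (the keyspace name, datacenter name, and contact point are hypothetical), creating such a keyspace and raising the RF as nodes are added could look like this:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class FullReplicationExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // RF equal to the current cluster size (here: 3 nodes in "dc1").
                session.execute("CREATE KEYSPACE IF NOT EXISTS demo_ks WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");
                // After adding a fourth node, bump the RF, then run
                // `nodetool repair` so the new replica gets populated.
                session.execute("ALTER KEYSPACE demo_ks WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'dc1': 4}");
            }
        }
    }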
Are there tremendous downfalls to this type of Cassandra deployment?
Are there other downfalls to this strategy that I don't know about?
Not that I can think of. I guess you might be more susceptible than average to inconsistent data, so use C*'s anti-entropy mechanisms to counter this (repair, read repair, hinted handoff).
Consistency level quorum or all would start to get expensive but I see you don't intend to use them.
Is replication factor == cluster size a common (or even a reasonable) deployment strategy aside from the obvious case of a cluster of one?
It's not common; I guess you are looking for super high availability and all your data fits on one box. I don't think I've ever seen a C* deployment with RF > 5. Far and away the most common is RF = 3.
If I were to use a write consistency of "one" does Cassandra always write the data to the node contacted by the client?
This depends on your load balancing policy at the driver. Often we select token-aware policies (assuming you're using one of the DataStax drivers), in which case requests are routed to the primary replica automatically. You could use round robin in your case and have the same effect, since every node is a replica.
The primary downfall will be increased write costs at the coordinator level as you add nodes. The maximum number of replicas written to I've seen is around 8 (5 for other data centers and 3 for local replicas).
In practice this will mean reduced stability when performing large or batched writes (greater than 1 MB), or a lower per-node write TPS.
The primary advantage is you can do a lot of things that would normally be awful or impossible. Want to use secondary indexes? They'll probably work reasonably well (assuming cardinality and partition size don't become your bottleneck there). Want to add a custom UDF that does GroupBy, or use very large IN queries? They'll probably work too.
As @Phact mentions, it is not a common usage pattern; I primarily saw it used with DSE Search on low-write-throughput use cases that had requirements for 'single node' features from Solr. But for those same use cases with pure Cassandra you'd get some benefits on the read side and be able to do expensive queries that are normally impossible in a more distributed cluster.

Maintaining a dynamic consistency level in DataStax

I have a 5 node cluster and a keyspace with a replication factor of 3. The nature of the operations is such that writes are much more important than reads, but the frequency of read operations is about 10 times higher than writes. To achieve consistency while improving overall performance, I chose to set the consistency level for writes to ALL and for reads to ONE. But this causes operations to fail if even one node is down.
Is there a method by which I can change the consistency level for (write, read) from (ALL, ONE) to (QUORUM, QUORUM) if one node is detected down, or if there is a query execution exception; and can this be done in a manner such that no operations pass through a temporary phase where they see a (QUORUM, ONE) setting?
We also plan to expand to twice the capacity: 3 datacenters with 4 nodes each. Is it possible to define custom consistency levels, like a level of ALL in any one datacenter and ONE in the others? I'm thinking that a level of EACH_ONE for reads, coupled with the above level for writes, will ensure consistency but allow the cluster to remain available even if a node goes down.
The flexibility is there, since you can set your consistency level on a per-request basis. Depending on the client you are using, there are some nice capabilities. For example, the Java driver has something called a DowngradingConsistencyRetryPolicy such that if a request fails, it will be retried with the next lowest consistency level until the request succeeds. This pushes the complexity of retrying into the client so you don't have to write a bunch of code for it; it's really nice!
The Java driver also allows you to configure the consistency level per request with Statement#setConsistencyLevel().
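A minimal sketch combining the two with the 3.x Java driver (the contact point, keyspace, and table are hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

    public class DowngradingExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                        .addContactPoint("127.0.0.1")
                        // Retry failed requests at the next lower consistency level.
                        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                        .build();
                 Session session = cluster.connect("demo_ks")) {
                // Start at ALL; if a replica is down, the policy retries the
                // write at a lower level instead of failing outright.
                SimpleStatement write = new SimpleStatement(
                        "INSERT INTO events (id, payload) VALUES (1, 'x')");
                write.setConsistencyLevel(ConsistencyLevel.ALL);
                session.execute(write);
            }
        }
    }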
As far as custom consistency levels, this is not an option available to you (without changing the cassandra source code), however I think what is made available should be sufficient.
For reads, I don't find much value in ensuring consistency between datacenters on read. I think LOCAL_QUORUM is more than sufficient, but if you really care, you can use something like EACH_QUORUM to ensure all datacenters agree, though that will severely impact your response time and availability. For example, if one of your datacenters goes down completely, you won't be able to do reads at all (unless downgrading).
For writes, I'd strongly recommend not using ALL in a multi datacenter set up if you care about response time and availability. Depending on your requirements, LOCAL_QUORUM should likely be more than sufficient.
While one of the benefits of Cassandra is that consistency is tunable, you can have as much strong consistency as you like, but keep in mind that Cassandra is at its best as a Highly Available, Partition Tolerant system.
A really good presentation on consistency that I think really nails a lot of these points is Christos Kalantzis' talk 'Eventual Consistency != Hopeful Consistency', which suggests that a consistency level of ONE is sufficient for a lot of use cases.

Which part of the CAP theorem does Cassandra sacrifice and why?

There is a great talk here about simulating partition issues in Cassandra with Kyle Kingsbury's Jepsen library.
My question is: with Cassandra are you mainly concerned with the partition-tolerance part of the CAP theorem, or is consistency a factor you need to manage as well?
Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered to be more important than consistency. However, real world systems rarely fall neatly into these categories, so it's more helpful to view CAP as a continuum. Most systems will make some effort to be consistent, available, and partition tolerant, and many (including Cassandra) can be tuned depending on what's most important. Turning knobs like replication factor and consistency level can have a dramatic impact on C, A, and P.
Even defining what the terms mean can be challenging, as various use cases have different requirements for each. So rather than classify a system as CP, AP, or whatever, it's more helpful to think in terms of the options it provides for tuning these properties as appropriate for the use case.
Here's an interesting discussion on how things have changed in the years since the CAP theorem was first introduced.
CAP stands for Consistency, Availability and Partition tolerance.
In general, it's impossible for a distributed system to guarantee all three at any given point in time.
Apache Cassandra is classified as an AP system, meaning Cassandra guarantees Availability and Partition tolerance but not Consistency; this can be further tuned via the replication factor (how many copies of the data) and the consistency level (for reads and writes).
For more info: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
Interestingly, it depends on your Cassandra configuration. Cassandra can at most be an AP system. But if you configure it to read or write based on QUORUM, then it is no longer CAP-available (available as per the definition of the CAP theorem) and is only a P system.
Just to explain things in more detail, the CAP theorem's terms mean:

C (linearizability or strong consistency) roughly means: if operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never an older state).

A: "every request received by a non-failing [database] node in the system must result in a [non-error] response". It's not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called "highly available" (i.e. low-downtime) systems actually do not meet this definition of availability.

P: Partition tolerance (terribly misnamed) basically means that you're communicating over an asynchronous network that may delay or drop messages. The internet and all our data centres have this property, so you don't really have any choice in this matter.

Source: Martin Kleppmann's excellent work
The CAP theorem states that a database can't simultaneously guarantee consistency, availability, and partition tolerance.
Since network partitions are a fact of life, distributed databases tend to be either CP or AP.
Cassandra was designed to be AP, but you can fine-tune consistency at the cost of availability.
Availability: this is ensured with replicas. Cassandra typically writes multiple copies of the data to different cluster nodes (generally 3). If one node is unavailable, the data won't be lost.
Writing data to multiple nodes takes time because the nodes are scattered across different locations. At some point the data becomes consistent, i.e. eventually consistent.
So when high availability is preferred, consistency is compromised.
Tunable consistency:
For a read or write operation, you can specify a consistency level. The consistency level refers to the number of replicas that need to respond for a read or write operation to be considered complete.
For non-critical features, you can use a low consistency level, say ONE.
If you think consistency is important, you can increase the level to TWO, THREE or QUORUM (a majority of replicas).
Assume you set the consistency level to QUORUM for your critical features and a majority of the replica nodes are down. In this case, the write operation will fail.
Here Cassandra sacrifices availability for consistency.
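To make the QUORUM arithmetic concrete (using the standard formula quorum = floor(RF / 2) + 1): with RF = 3, QUORUM needs 2 replicas to respond, so an operation survives one downed replica but fails as soon as two of the three are down.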
Have a look at this article for more details.

How does increasing the Cassandra Replication Factor give more consistency

I was reading "Cassandra: The Definitive Guide", and page 46 has this to say about the replication factor:

"The replication factor essentially allows you to decide how much you want to pay in performance to gain more consistency. That is, your consistency level for reading and writing data is based on the replication factor."
Now to me that's news. If replication is increased, it is intuitive that it improves availability and, depending on the topology of the cluster, its partition tolerance as well. But why does the author say that it increases consistency? I would think it's quite the opposite: you have to take extra effort to ensure a consistent state of your persistent data by propagating updates to every replica on different nodes. So the more replicas there are, the harder it is to maintain consistency. Why does the author say the exact opposite?
All inputs appreciated.
The consistency level specifies how many replicas must respond before a result is returned. See the documentation.
So, if you're using a consistency level of QUORUM or higher, the higher the replication factor, the more nodes need to respond before a result can be returned.
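As a worked example (quorum = floor(RF / 2) + 1): with RF = 3, a QUORUM read needs 2 replicas to respond; with RF = 5, it needs 3. More replicas must agree before a result is returned, which is the sense in which a higher replication factor buys more consistency at QUORUM.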
Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. Perhaps there's a better way to categorize these.
As an example, you can have a replication factor of 2. When you write, two copies will always be stored, assuming enough nodes are up. When a node is down, writes for that node are stashed away and written when it comes back up, unless it's down long enough that Cassandra decides it's gone for good.
For example with 2 nodes, a replication factor of 1, read consistency = 1, and write consistency = 1:
Your reads are consistent
You can survive the loss of no nodes.
You are really reading from 1 node every time.
You are really writing to 1 node every time.
Each node holds 50% of your data.
For More Info: Configuring data consistency
