Where does Cassandra reside in the CAP theorem?

The DataStax course says that Cassandra favors availability and partition tolerance (AP). However, according to this document it can be tuned for strong consistency (i.e. CP) by setting W + R > RF, where W is the write consistency level, R is the read consistency level, and RF is the replication factor.

Tunable to strong consistency for a single partition
It can be tuned to strong consistency for data in a single partition. So if your statements belong to different partitions (note that the same partition key in a different table is still a different partition), you cannot tune it for strong consistency. So Cassandra has an upper bound on its strong consistency, unlike an RDBMS where you can atomically update multiple records in different tables, or different rows in the same table.
Tuning for higher consistency makes you lose some of the availability and partition tolerance
When you use hinted handoff, Cassandra sits almost on the AP axis, as it is always available for writes even with network partitions. But as soon as you start tuning for higher consistency, clients have to wait for writes or reads until the data has been written to enough replicas or read from enough replicas to satisfy the requested consistency level. So you lose a bit of availability and partition tolerance.
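For concreteness, here is a minimal sketch of the W + R > RF rule using the classic DataStax Java driver API (com.datastax.driver.core); the contact point, the keyspace "demo" (assumed to have RF = 3) and the table "users" are illustrative, not from the original question:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class QuorumReadWrite {
        public static void main(String[] args) {
            // Assumes a reachable cluster and a keyspace "demo" with RF = 3
            // containing a table users(id int PRIMARY KEY, name text).
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {

                // Write at QUORUM: W = 2 of the 3 replicas must acknowledge.
                SimpleStatement write = new SimpleStatement(
                        "INSERT INTO users (id, name) VALUES (1, 'alice')");
                write.setConsistencyLevel(ConsistencyLevel.QUORUM);
                session.execute(write);

                // Read at QUORUM: R = 2 of the 3 replicas must respond.
                SimpleStatement read = new SimpleStatement(
                        "SELECT name FROM users WHERE id = 1");
                read.setConsistencyLevel(ConsistencyLevel.QUORUM);
                ResultSet rs = session.execute(read);
                System.out.println(rs.one().getString("name"));

                // W + R = 2 + 2 > RF = 3, so every quorum read overlaps every quorum
                // write on at least one replica: strong consistency for this partition.
            }
        }
    }

The price, as described above, is that these requests fail whenever a quorum of replicas is unreachable, which is exactly the availability you give up.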
Summary
You can configure it for maximum availability and partition tolerance, but you cannot configure it for arbitrarily strong consistency; the single-partition bound above is as far as it goes. So Cassandra lies on the AP side of CAP.

Related

default consistency level and quorum setting in Cassandra and what is the best practice for tuning them

I just started learning Cassandra and am wondering if there is a default consistency level and quorum setting. It seems to me quite a few parameters (like the replication factor and the quorum size) are tunable to balance consistency with performance. Is there a best practice for these settings? What are the defaults?
Thank you very much.
The default READ and WRITE consistency is ONE in Cassandra.
Consistency can be specified per query. The CONSISTENCY command can be used from cqlsh to check the current consistency level or to set a new one.
The replication factor is the number of copies of the data that are kept.
Choosing a consistency level depends on factors such as whether the workload is write-heavy or read-heavy and how many node failures need to be tolerated at a time.
Typically, LOCAL_QUORUM for both READ and WRITE will give you strong consistency.
quorum = (sum_of_replication_factors / 2) + 1
For example, using a replication factor of 3, a quorum is 2 nodes ((3 / 2) + 1 = 2, rounding down), and the cluster can tolerate one replica being down. Similar to QUORUM, the LOCAL_QUORUM level is calculated based on the replication factor of the same datacenter as the coordinator node. Even if the cluster has more than one datacenter, the quorum is calculated with only the local replica nodes.
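As a quick sanity check of the formula, here is a toy calculation (plain Java, no driver involved):

    public class QuorumMath {
        // quorum = (sum_of_replication_factors / 2) + 1, using integer division
        static int quorum(int sumOfReplicationFactors) {
            return sumOfReplicationFactors / 2 + 1;
        }

        public static void main(String[] args) {
            System.out.println(quorum(3)); // RF 3 -> quorum 2, tolerates 1 replica down
            System.out.println(quorum(5)); // RF 5 -> quorum 3, tolerates 2 replicas down
            System.out.println(quorum(6)); // e.g. two DCs with RF 3 each -> QUORUM needs 4 replicas
        }
    }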
Consistency in Cassandra
The following are excellent links that should help you understand consistency levels and their configuration in Cassandra. The second link contains many pictorial diagrams.
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlClientRequestsReadExp.html

Best practice to set Consistency level and Replication factor for Cassandra

If the replication factor and consistency level are set to QUORUM, then we can achieve availability and consistency, but performance degradation will increase as the number of nodes increases.
Is this statement correct? If yes, what is the best practice to get better results, considering availability and consistency as high priorities, while not degrading performance as the number of nodes increases?
Not necessarily. If you increase the number of nodes in your cluster but do not alter your replication factor, the number of replicas required for single-partition queries does not increase, so you should not expect performance to degrade.
With a 10-node cluster, replication factor 3 and CL QUORUM, only 2 replicas are required to meet quorum; the same is true for a 20-node cluster.
Things change if your query requires some kind of fan-out that touches all replica sets. Since you have more replica sets, your client or the coordinating C* node needs to make more requests to retrieve all of your data, which will impact performance.

Difference between consistency level and replication factor in cassandra?

I am new to Cassandra and I want to understand the precise difference between consistency level and replication factor.
Scenario: if I have a replication factor of 2 and a consistency level of 3, how would the write operation be performed? When the consistency level is set to 3, it means the results will be acknowledged to the client after writing to 3 nodes. If data is written to 3 nodes, then doesn't that give me a replication factor of 3 and not 2? Are we sacrificing the replication factor in this case?
Can someone please explain where my understanding is wrong?
Thanks!
Replication factor: How many nodes should hold the data for this keyspace.
Consistency level: How many nodes need to respond to the coordinator node in order for the request to be considered successful.
So you can't have a consistency level higher than the replication factor, simply because you can't expect more nodes to answer a request than the number of nodes holding the data.
Here are some references:
Understand cassandra replication factor versus consistency level
http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureDataDistributeReplication_c.html
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
You will get an error: Cannot achieve consistency level THREE.
You can do some further reading here
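To see that error for yourself, here is a minimal sketch using the DataStax Java driver (the contact point and the keyspace/table names are placeholders): a keyspace with RF 2 is created and a write is attempted at consistency level THREE.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class ClAboveRf {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                // Keyspace with RF = 2: every partition lives on exactly 2 nodes.
                session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 2}");
                session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)");

                // Ask for 3 replica acknowledgements when only 2 replicas exist.
                SimpleStatement write = new SimpleStatement(
                        "INSERT INTO demo.users (id, name) VALUES (1, 'alice')");
                write.setConsistencyLevel(ConsistencyLevel.THREE);
                try {
                    session.execute(write);
                } catch (Exception e) {
                    // Expected: the coordinator can never reach 3 replicas for this partition,
                    // so Cassandra rejects the request ("Cannot achieve consistency level THREE").
                    System.out.println(e.getMessage());
                }
            }
        }
    }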
There are two kinds of consistency levels, write consistency and read consistency. Consistency levels can be ONE, TWO, THREE, QUORUM and so on. With QUORUM, a majority of the replicas must respond for the operation to succeed; for ONE, TWO and THREE, the name itself gives you the definition.
The replication factor is the number of copies of the data that you plan to maintain in the cluster. With SimpleStrategy you set a single replication factor for the keyspace. With NetworkTopologyStrategy on a multi-datacenter cluster, you set a replication factor for each data center.
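For illustration, the two strategies correspond to keyspace definitions like the following (the keyspace names and the data-center names 'dc1'/'dc2' are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class KeyspaceReplication {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                // SimpleStrategy: a single replication factor for the whole cluster.
                session.execute("CREATE KEYSPACE IF NOT EXISTS demo_simple WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 2}");

                // NetworkTopologyStrategy: one replication factor per data center.
                session.execute("CREATE KEYSPACE IF NOT EXISTS demo_multi WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}");
            }
        }
    }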
In your scenario, with an RF of 2 and a CL of 3, the request will not succeed even if there are more than 3 nodes in the cluster: only two nodes hold replicas of any given partition, so the coordinator can never collect three replica acknowledgements and rejects the request with the "Cannot achieve consistency level THREE" error.
For your second question
When the consistency level is set to 3, it means the results will be acknowledged to the client after writing to 3 nodes. If data is written to 3 nodes, then doesn't that give me a replication factor of 3 and not 2?
As far as I understand Cassandra, the consistency level only controls how many replica acknowledgements the coordinator waits for before answering the client, while the number of nodes that ultimately store the data is determined by the RF.
So there is no question of sacrificing the RF.

Cassandra consistency Issue

We have our Cassandra cluster running on AWS EC2 with 4 nodes in the ring. We faced a data inconsistency issue. We changed the consistency level to TWO while using the cqlsh shell, and the data inconsistency issue was solved.
But we don't know how to set the consistency level on the Cassandra cluster.
The consistency level can be set on a per-session or per-statement basis. You will need to check the consistency levels of your writes and reads: to get strong consistency, your R + W (read consistency + write consistency) should be greater than your replication factor.
If you are using the Java driver, you can set the default consistency at the cluster level using the Cluster.Builder.withQueryOptions() method.
http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Cluster.Builder.html#withQueryOptions-com.datastax.driver.core.QueryOptions-
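A minimal sketch of that cluster-level default (the contact point and the keyspace/table names are placeholders; withQueryOptions() is the method from the link above):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class DefaultConsistency {
        public static void main(String[] args) {
            // Every statement run through this cluster defaults to QUORUM
            // unless the statement overrides it.
            QueryOptions options = new QueryOptions().setConsistencyLevel(ConsistencyLevel.QUORUM);

            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withQueryOptions(options)
                    .build();
                 Session session = cluster.connect("demo")) {

                session.execute("INSERT INTO users (id, name) VALUES (1, 'alice')"); // runs at QUORUM

                // Per-statement override for a read that can tolerate weaker consistency.
                SimpleStatement read = new SimpleStatement("SELECT name FROM users WHERE id = 1");
                read.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
                session.execute(read);
            }
        }
    }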

nodetool repair across replicas of data center

Just want to understand the performance of 'nodetool repair' in a multi data center setup with Cassandra 2.
We are planning to have keyspaces with 2-4 replicas in each data center. We may have several tens of data centers. Writes are done with LOCAL_QUORUM/EACH_QUORUM consistency depending on the situation and reads are usually done with LOCAL_QUORUM consistency. Questions:
Does nodetool repair complexity grow linearly with number of replicas across all data centers?
Or does nodetool repair complexity grow linearly with a combination of the number of replicas in the current data center and the number of data centers? Vaguely, this model could possibly sync data with each of the individual nodes in the current data center, but perform an EACH_QUORUM-like operation against replicas in other data centers.
To scale the cluster, is it better to add more nodes in an existing data center or add a new data center assuming constant number of replicas as a whole? I ask this question in the context of nodetool repair performance.
To understand how nodetool repair affects the cluster or how the cluster size affects repair, we need to understand what happens during repair. There are two phases to repair, the first of which is building a Merkle tree of the data. The second is having the replicas actually compare the differences between their trees and then streaming them to each other as needed.
This first phase can be intensive on disk I/O, since it touches almost all of the data on the disk of the node on which you run the repair. One simple way to avoid repair touching the full disk is to use the -pr flag. With -pr, repair only has to touch roughly disksize/RF worth of data instead of the full disk. Running repair on a node also sends a message to all nodes that store replicas of any of these ranges to build Merkle trees as well. This can be a problem, since all the replicas will be doing it at the same time, possibly making them all slow to respond for that portion of your data.
The factor that determines how the repair operation affects other data centers is the replica placement strategy. Since you need consistency across data centers (the EACH_QUORUM cases), it is imperative that you use a cross-DC replication strategy, in your case NetworkTopologyStrategy. For repair this means that you cannot limit yourself to the local DC while running the repair, since you have some EACH_QUORUM consistency cases. To avoid a repair affecting all replicas in all data centers, you should a) wrap your replication strategy with the dynamic snitch and configure the badness threshold properly, and b) use the -snapshot option while running the repair.
What this will do is take a snapshot of your data (snapshots are just hardlinks to existing sstables, exploiting the fact that sstables are immutable, thus making snapshots extremely cheap) and sequentially repair from the snapshot. This means that for any given replica set, only one replica at a time will be performing the validation compaction, allowing the dynamic snitch to maintain performance for your application via the other replicas.
Now we can answer the questions you have.
Does nodetool repair complexity grow linearly with number of replicas across all data centers?
You can limit this by wrapping your replication strategy with the dynamic snitch and passing the -snapshot option during repair.
Or does nodetool repair complexity grow linearly with a combination of the number of replicas in the current data center and the number of data centers? Vaguely, this model could possibly sync data with each of the individual nodes in the current data center, but perform an EACH_QUORUM-like operation against replicas in other data centers.
The complexity will grow in terms of running time with the number of replicas if you use the approach above. This is because the above approach will do a sequential repair on one replica at a time.
To scale the cluster, is it better to add more nodes in an existing data center or add a new data center assuming constant number of replicas as a whole? I ask this question in the context of nodetool repair performance.
From a nodetool repair perspective, IMO this does not make any difference if you take the above approach, since it depends on the overall number of replicas.
Also, the goal of repair using nodetool is to ensure that deletes do not come back. The hard requirement for routine repair frequency is the value of gc_grace_seconds. In systems that seldom delete or overwrite data, you can raise the value of gc_grace_seconds with minimal impact on disk space, which allows wider intervals for scheduling repair operations with the nodetool utility. One of the recommended ways to avoid frequent repairs is to make records immutable by design. This may be important to you, since you plan to run tens of data centers and ops will otherwise already be painful.
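For reference, gc_grace_seconds is a per-table setting (the default is 864000 seconds, i.e. 10 days); here is a sketch of raising it on a hypothetical table via the Java driver:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class RaiseGcGrace {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Allow up to 20 days between repairs instead of the default 10
                // ("demo.users" is a placeholder table name).
                session.execute("ALTER TABLE demo.users WITH gc_grace_seconds = 1728000");
            }
        }
    }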
