Cassandra Quorum: Consistency Level

I have a 3-DC ring in Cassandra, with each DC having 4 nodes, so 4 nodes * 3 DCs = 12 nodes. I'm testing how Cassandra behaves when some nodes go down while we use the QUORUM consistency level. We have set a replication factor of 3 in each datacenter, so our
quorum = floor(sum of replication factors / 2) + 1 = floor(9 / 2) + 1 = 5.
In theory, if I have five nodes up in my 12-node cluster, I should be good for reads and writes. So I brought down a full datacenter (DC1) and 3 nodes in another datacenter (DC2). That leaves 1 node up in DC2 and the whole of DC3 (4 nodes), i.e. 5 nodes up. In theory this should be enough for my writes to succeed at QUORUM consistency. But when I ran the test, I got:
Cassandra.Unavailable Exception: Not enough replica available for query at consistency ONE (5 required but only 4 alive).
But I do have 5 nodes alive. What am I missing here?

QUORUM is for the entire cluster, while LOCAL_QUORUM is for a single data center. Some basics to understand: Cassandra is a distributed system, meaning data is distributed across your cluster, with each node owning a primary token range and at the same time replicating data owned by other nodes. Only the nodes that are responsible for storing a given piece of data (its replicas) count toward the consistency level. In your case, having 5 nodes up does not mean QUORUM consistency is met for writes or reads: the DC with all nodes up will definitely hold the data on 3 nodes (remember your RF is 3 per DC), but the single remaining node in the other DC may or may not be a replica for the data you are querying.
In your case, if you query the DC with all nodes up using LOCAL_QUORUM, you will get correct results.
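As a hedged illustration (the keyspace, table, and column names below are invented, not taken from the question), a cqlsh session connected to a node in the fully available DC3 might look like this:

-- Keyspace and table created beforehand, while all DCs were still up.
CREATE KEYSPACE IF NOT EXISTS demo_ks
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3, 'DC3' : 3 };
CREATE TABLE IF NOT EXISTS demo_ks.users (id int PRIMARY KEY, name text);

-- cqlsh session setting: LOCAL_QUORUM only counts replicas in the coordinator's DC,
-- so with RF 3 in DC3 it needs 2 of DC3's 3 replicas.
CONSISTENCY LOCAL_QUORUM;

-- Succeeds even with DC1 down and only one node left in DC2,
-- because all 3 replicas in DC3 are up.
INSERT INTO demo_ks.users (id, name) VALUES (1, 'alice');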

QUORUM by itself refers to members of the same data center, of which DC3 has only 4 in your case. But you asked for a QUORUM of 5, which DC3 cannot provide. That is why there are concepts like ONE and LOCAL_ONE. I am pretty sure you will get the same error at a QUORUM of 5 even if all the nodes in your DCs are up.
You can refer to: http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

From my point of view the operations should and will fail.
From the DC that is fully up you can guarantee 3 replicas at any time (RF 3).
The single node still up in the other DC holds roughly 3/4 of that DC's data (RF 3 spread over 4 nodes), so at best it contributes one more replica for a given partition.
3 + 1 = 4 replicas at most.
You're asking for a consistency level that requires 5.
5 > 4, so the operation fails.
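For contrast, a rough sketch of the failing case, reusing the hypothetical demo_ks keyspace from the sketch above (RF 3 in each of the three DCs, so the sum of replication factors is 9 and QUORUM = floor(9/2) + 1 = 5):

-- cqlsh session setting: plain QUORUM counts replicas across the whole cluster.
CONSISTENCY QUORUM;

-- With DC1 entirely down and only one node left in DC2, at most 3 (DC3) + 1 (DC2) = 4
-- replicas of any given partition can respond, so this write fails with an
-- UnavailableException like the one quoted in the question (5 required, 4 alive).
INSERT INTO demo_ks.users (id, name) VALUES (2, 'bob');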

Related

ResponseError: Not enough replicas available for query at consistency SERIAL (2 required but only 1 alive)

I am a newcomer to Cassandra and I have run into an issue. My Cassandra setup is as follows:
1 DC, 1 Cluster
3 Nodes.
SimpleStrategy
durable write : true
Replication factor : 2 when creating keyspace.
Use IF NOT EXISTS to insert data into table.
Seed node: 2 of them
Then I brought down one seed node, and I got the following error:
ResponseError: Not enough replicas available for query at consistency SERIAL (2 required but only 1 alive)
That's normal: SERIAL requires a Paxos transaction with a quorum of replicas. For RF 2, the quorum is 2; in other words, you cannot tolerate any node being down when writing at SERIAL to a keyspace with RF 2.
Rule of thumb: don't use RF 2, it's useless. Your quorum is (2/2) + 1 = 2, but for RF 3 the quorum is (3/2) + 1 = 2 as well, so you should always prefer RF 3. If you change your keyspace to RF 3, your application will be able to write at SERIAL even if one replica is down.
Also see https://www.ecyrd.com/cassandracalculator/
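A minimal sketch of the suggested RF 3 setup with a lightweight transaction, using invented keyspace/table names (the original question's schema isn't shown):

-- Same single-DC, 3-node layout as the question, but with RF 3 instead of RF 2.
CREATE KEYSPACE IF NOT EXISTS demo_lwt
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }
AND durable_writes = true;
CREATE TABLE IF NOT EXISTS demo_lwt.users (id int PRIMARY KEY, name text);

-- cqlsh: the Paxos phase of IF NOT EXISTS runs at this serial consistency level.
SERIAL CONSISTENCY SERIAL;

-- With RF 3 the Paxos quorum is (3/2) + 1 = 2, so this still succeeds
-- with one of the three replicas down.
INSERT INTO demo_lwt.users (id, name) VALUES (1, 'alice') IF NOT EXISTS;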
As per my understanding, SERIAL consistency is equivalent to QUORUM. You have RF=2 in a 3-node cluster, and data in Cassandra is placed based on the partition key's hash, so when you inserted the data, the two replicas may well have landed on both seed nodes. When you then retrieve the data with one seed node down, you can get this error because the cluster cannot achieve the desired consistency level.
Please refer to the link below for more details.
https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbInternals/dbIntConfigSerialConsistency.html

Cassandra QUORUM write consistency level and multiple DC

I'm a bit confused about how a QUORUM write selects the nodes to write to in the case of multiple DCs.
Suppose, for example, that I have a 3-DC cluster with 3 nodes in each DC, and the replication factor is 2, so that the number of replicas needed to achieve QUORUM is 3. Note: this is just an example to help me formulate my question and not the actual configuration.
My question is the following: in the case of a write, how will these 3 replicas be distributed across all the DCs in my cluster? Is it possible that all 3 replicas will end up in the same DC?
Replication is defined at the keyspace level. So, for example:
create keyspace test with replication = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 2, 'DC3' : 2 };
As you can clearly see, each DC will hold two copies of the data for that keyspace and not more. You could have another keyspace in the same cluster defined to replicate in only one DC and not the other two, so it's flexible.
Now for consistency: with 3 DCs and RF=2 in each DC, you have 6 copies of the data. By definition of QUORUM, a majority of those 6 replicas (sum of RF / 2 + 1 = 6/2 + 1 = 4) needs to acknowledge the write before it is reported as successful. So 4 replicas need to respond for a QUORUM write here, and these 4 could be any combination of replicas from any DC. Remember that it is the number of replicas that matters when calculating quorum, not the total number of nodes in a DC.
On a side note, in Cassandra RF=2 is as good as RF=1 in terms of fault tolerance. To simplify, let's imagine a single-DC, 3-node cluster. With RF=2 there are two copies of the data, and in order to achieve quorum (2/2 + 1 = 2), both replicas need to acknowledge the write. So both replica nodes always have to be available; if one of them fails, quorum writes start to fail. Granted, another node can take hints here, but your quorum reads are still bound to fail. So the tolerance for node failure is effectively zero in this situation.
You could use LOCAL_QUORUM instead of QUORUM to speed up the writes. It sacrifices some consistency for speed; welcome to "eventual consistency".
The consistency level determines the number of replicas on which the write must succeed before returning an acknowledgment to the client application.
Even at low consistency levels, the write is still sent to all replicas for the written key, even replicas in other data centers. The consistency level just determines how many replicas are required to respond that they received the write.
Source : http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
So if you set the consistency level to QUORUM, and assuming each DC has an RF of 2 (6 replicas in total), the quorum is 6/2 + 1 = 4. Your write is still sent to all replicas in every DC (3 DCs * 2 = 6 replica nodes), and the coordinator waits for 4 of them to succeed before sending the acknowledgment to the client.
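As a sketch of both options against the test keyspace defined above (the table name is invented):

CREATE TABLE IF NOT EXISTS test.events (id int PRIMARY KEY, payload text);

-- QUORUM: 4 of the 6 replicas must acknowledge; they can come from any DC.
CONSISTENCY QUORUM;
INSERT INTO test.events (id, payload) VALUES (1, 'hello');

-- LOCAL_QUORUM: only the coordinator's DC counts, so (2/2) + 1 = 2 of its 2 replicas.
CONSISTENCY LOCAL_QUORUM;
INSERT INTO test.events (id, payload) VALUES (2, 'world');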

If the replication factor is 3, does it mean I need 4 nodes in a datacenter?

I set up my keyspace like the following:
CREATE KEYSPACE name_of_keyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};
If I want to follow this keyspace's replication settings, do I need to have 3 or 4 nodes in dc1?
The reason I'm confused is that there seem to be two different types of nodes: the coordinator node, and the general nodes that can be chosen when a node fails.
Should I count the coordinator node as one of the general nodes and create only 3 nodes in dc1, or do I need to create 4 nodes to make this work?
In Cassandra, all nodes can act as a coordinator. For a request that requires a coordinator, the node the client connected to will act as the coordinator.
An RF of 3 with 4 nodes is fine for a DC, but the fourth node is not needed unless you are adding it for capacity. In one of my clusters we have 18 nodes for capacity with an RF of 3; that's generally how you scale Cassandra.
The coordinator node is chosen at query time. All nodes have the same capabilities.
When you run a cluster with RF 3 and issue a query, then for a given partition you:
need only one replica up if you read/write with consistency level ONE;
need two replicas up if you read/write with QUORUM or TWO;
need three replicas up if you read/write with ALL or THREE.
Note that reads/writes are issued to all nodes that hold (or should write) the data, but the coordinator only waits for the number of replicas required by the configured level.
Check this page for more information about consistency levels.
So you can run a 3-node cluster with RF 3, and depending on what CL you read/write at, you can survive 0, 1, or 2 nodes being down, as the sketch below illustrates.
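A hedged cqlsh sketch of that mapping, with invented keyspace/table names and the dc1 name taken from the question:

CREATE KEYSPACE IF NOT EXISTS demo_rf3
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3 };
CREATE TABLE IF NOT EXISTS demo_rf3.kv (k int PRIMARY KEY, v text);

CONSISTENCY ONE;      -- 1 of 3 replicas must respond: tolerates 2 replica nodes down
SELECT v FROM demo_rf3.kv WHERE k = 1;

CONSISTENCY QUORUM;   -- 2 of 3 replicas must respond: tolerates 1 replica node down
SELECT v FROM demo_rf3.kv WHERE k = 1;

CONSISTENCY ALL;      -- all 3 replicas must respond: tolerates 0 replica nodes down
SELECT v FROM demo_rf3.kv WHERE k = 1;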

Deciding the optimal number of Cassandra nodes with 3 seed nodes and replication factor 3

I am working on creating a cassandra cluster.
Our system is write-heavy, and we are planning to use 3 seed nodes and a total of 10 Cassandra nodes (including the 3 seed nodes).
We are using a replication factor of 3 and consistency level QUORUM.
Is there any consideration regarding an odd/even number of Cassandra nodes based on the replication factor or the number of seed nodes?
The number of seed nodes is unrelated to the replication factor. The seeds are used when a new node joins the cluster. New nodes consult the seeds to get their initial configuration and learn the addresses of the other nodes. You need 2-3 seeds to provide redundancy, that's all.
The replication factor indicates how many nodes have copies of the data, as you probably know. RF=3 means three nodes have copies of the data. Consistency level QUORUM means that 2 nodes need to reply to the coordinator (because 2 is a quorum of 3). This has nothing to do with the number of nodes in the cluster, as long as you have at least 3 nodes for RF=3! Even/odd doesn't matter, and the number of seeds doesn't matter.

Cassandra Cluster 1.1.10

I am new to Cassandra and at work I have a 4 node cluster.
nodetool gossipinfo tells me that there is one datacentre, 2 racks, and 2 nodes in each rack. The replication factor is defined as 2. nodetool ring tells me that each node has 50% ownership. There are 2 seed nodes in our config; each rack has 1 seed node.
Does this mean that for each rack there is one seed node and its replica node? If that is the case, then why is the data size not the same for the seed node and its replica node?
What happens if one node goes down? Will it have any impact on the data availability of the cluster?
Seeds
Seed nodes are only special in that new nodes joining the cluster contact the seed nodes to find out about the other nodes and the topology of the ring. Otherwise, in Cassandra all nodes are the same, i.e. there is no master or slave, no primary or secondary node. Because of this, you can elect any (or all) nodes as seeds.
Since seeds only relate to gossip information, they do not have anything to do with replicated data.
Size
In relation to data size, nodes will never be exactly the same, since partition/row sizes are never the same. If you look at the nodetool cfstats output, you will see that there is a big range between the minimum and maximum sizes.
Availability
If reads are done with consistency level CL=ONE, then if a node is down the other replica will continue to serve requests. But if reads are done with a higher consistency level, they will fail, since with RF=2 they need both replicas to be available; i.e. CL=LOCAL_QUORUM requires (RF/2 + 1) = 2 replicas to respond.
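A minimal sketch of that behaviour in cqlsh, assuming a recent Cassandra/cqlsh (the 1.1 syntax differs) and an invented keyspace that mirrors the RF 2 layout above; the data-center name datacenter1 is also an assumption:

CREATE KEYSPACE IF NOT EXISTS demo_rf2
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 2 };
CREATE TABLE IF NOT EXISTS demo_rf2.kv (k int PRIMARY KEY, v text);

CONSISTENCY ONE;            -- 1 of 2 replicas: still served if one replica node is down
SELECT v FROM demo_rf2.kv WHERE k = 1;

CONSISTENCY LOCAL_QUORUM;   -- (2/2) + 1 = 2 of 2 replicas: fails if either replica is down
SELECT v FROM demo_rf2.kv WHERE k = 1;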
EDIT: Response to:
Shouldn't each node own 25%?
Ownership
In Cassandra, data is not "distributed" across ALL nodes in ALL DCs. In fact, a DC is a copy of another DC depending on the replication factor.
To illustrate, consider the following keyspace definition:
CREATE KEYSPACE "myKS"
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'DC1' : 2,
'DC2' : 2};
Based on this definition, the myKS keyspace has 2 replicas in DC1 and 2 replicas in DC2. Since each of your data centres only has 2 nodes, this effectively means that each DC is a copy of the other.
Following from that, since the tokens are split between the 2 nodes in each DC, each node owns half of the data, i.e. 50%. So in DC1 each node owns 50%, and in DC2 (which is a copy of DC1) each node also owns 50%.
