Cassandra high availability

I have a six-node Cassandra cluster using NetworkTopologyStrategy, and here is my topology:
Rack1
  Cassandra-01
  Cassandra-02
Rack2
  Cassandra-03
  Cassandra-04
Rack3
  Cassandra-05
  Cassandra-06
We use CL=QUORUM and a replication factor of 3 for reads and writes, so technically we should tolerate a single rack failure (the loss of 2 nodes from one rack).
For example, suppose I write to the cluster (CL=QUORUM, RF=3) and Rack3 goes offline (hardware failure), leaving 4 nodes in total. Theoretically I should still be able to write and read data, because the consistency level can be satisfied. But when I use the [Cassandra calculator] it says:
You can survive the loss of 1 node without impacting the application.
and
You can survive the loss of 1 node without data loss.
But why only 1 node?

The calculator has no knowledge built into it about the rack aspect of the above configuration - so let's leave that for the moment. You have entered 6 nodes, RF 3 and Write / Read at Quorum.
If there were no racks involved (they are all in the same rack) - then the answers make sense.
Since writes were being made at Quorum, you can only guarantee that 2 of the nodes have the data at the point the write is acknowledged as successful. If those 2 nodes then failed immediately after the write, you could suffer data loss (because the 3rd did not get the data yet). Thus you can only tolerate 1 node loss without potential data loss in a worst-case scenario.
You are correct that with NetworkTopologyStrategy, 3 racks, 2 nodes per rack, and Quorum, you could lose an entire rack and still operate. So why does the calculation change?
Well, some of the calculation doesn't - while you can still write at Quorum and read at Quorum, there is the possibility of a node being read not having the data yet, but it should read-repair and fix itself (assuming read repair is enabled on the table, etc.).
You shouldn't lose data, though, since the rack placement gives you a further certainty: the 2 nodes in the same rack which went down did not both hold replicas of the same partitions. So while 2 nodes are down, you did not eliminate 2 copies of any partition - at least one node in another rack has the data (otherwise the quorum write would not have been acknowledged).
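A minimal sketch of that setup, using the DataStax Python driver (the contact point, the 'dc1' datacenter name, and the demo keyspace/table are illustrative assumptions, not from the question):
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# One contact point is enough; the driver discovers the rest of the ring.
session = Cluster(["cassandra-01"]).connect()

# RF=3 in one DC: with 3 racks, NetworkTopologyStrategy places one replica
# of each partition per rack, so losing a whole rack removes at most one
# copy of any partition.
session.execute("""
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")
session.execute("CREATE TABLE IF NOT EXISTS demo.kv (k text PRIMARY KEY, v text)")

# QUORUM = 2 of 3 replicas; with Rack3 offline, the two surviving racks
# still hold 2 replicas, so this write succeeds.
write = SimpleStatement(
    "INSERT INTO demo.kv (k, v) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)
session.execute(write, ("key1", "value1"))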
If you follow the github link on the page itself, you can see the calculation for each of the values it provides in the html, for example:
var dataloss = w - 1;
$('#dataloss').text( dataloss > 1 ? dataloss+" nodes" : dataloss === 1 ? "1 node" : "no nodes");
w in this instance is the 'write' consistency level; when set to Quorum, it calculates w as 2. There is no input for racks, nor any consideration of them in the code.
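The arithmetic is easy to reproduce; a quick sketch in Python (the quorum helper is mine, not from the calculator's source):
# The calculator derives w = floor(RF / 2) + 1 for CL=QUORUM.
def quorum(rf):
    return rf // 2 + 1

rf = 3
w = quorum(rf)      # 2
dataloss = w - 1    # matches `var dataloss = w - 1;` above
print("You can survive the loss of %d node(s) without data loss" % dataloss)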

The answer is in your question.
Check max(write level, read level) - it's 2 in your case.
RF - 2, i.e. 3 - 2 = 1, so only 1 node can be compromised at any point in time.

Related

Cassandra: 2 required but only 1 alive & 3 replica were required but only 2 acknowledged the write

I got two errors while writing data into Cassandra, and I want to know the difference between them.
3 replica were required but only 2 acknowledged the write
2 required but only 1 alive
Consistency Level is LOCAL_QUORUM.
As per my observations, when I got the first exception I see the data was written to one of the nodes; on the second exception I do not see the data on any node.
Is my observation correct? Please help me with this.
It's a bit difficult to provide a clear answer without knowing the cluster topology and the keyspace replication factor. The full error messages + full stack trace are also relevant to know.
For LOCAL_QUORUM consistency to require 3 replicas to respond, it indicates that the keyspace has a replication factor of 4 or 5 in the local DC -- quorum of 4 or 5 is 3.
In the second instance, LOCAL_QUORUM requires 2 replicas when the local replication factor is 2 or 3. And yes, quorum of 2 replicas is still 2 replicas, meaning your app cannot tolerate an outage if either of them goes down. For this reason, we recommend a minimum of 3 nodes (and RF 3) in each DC for production clusters. Cheers!
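If it helps to see the distinction in code, here is a minimal sketch using the DataStax Python driver (the contact point and demo.kv table are hypothetical): the first message comes from a WriteTimeout, the second from an Unavailable error.
from cassandra import ConsistencyLevel, Unavailable, WriteTimeout
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect()
stmt = SimpleStatement(
    "INSERT INTO demo.kv (k, v) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
try:
    session.execute(stmt, ("key1", "value1"))
except Unavailable as e:
    # "2 required but only 1 alive": the coordinator knew up front that too
    # few replicas were alive, so the write was never attempted anywhere.
    print("required=%d alive=%d" % (e.required_replicas, e.alive_replicas))
except WriteTimeout as e:
    # "3 replica were required but only 2 acknowledged the write": the write
    # was sent, and the replicas that did respond have stored the data --
    # which matches the observation of data appearing on some nodes.
    print("required=%d received=%d" % (e.required_responses, e.received_responses))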

ScyllaDB / Cassandra higher replication factor than total number of nodes with CL=QUORUM

I'd highly appreciate it if someone could help with the questions below.
*RF= Replication Factor
*CL= Consistency Level
We have a requirement for strong consistency and high availability. So I have been testing RF and CL on a 7-node ScyllaDB cluster, keeping RF=7 (100% of the data on each node) and CL=QUORUM.
What will happen to data replication if 2 nodes go down? Does the cluster replicate the 2 down nodes' data (the 6th & 7th copies) onto the remaining 5 nodes,
or does it simply discard those copies? What is the effect of RF=7 when there are only 5 active nodes?
I could not find anything in the logs. Is there any document/link covering this case? And how can I verify and prove this behaviour? Please explain.
With RF=7, the data is always replicated to 7 nodes.
When a node (or two) goes down, the remaining five nodes already have a copy, and no additional streaming is required.
Using CL=QUORUM, even three nodes down will not hurt your HA or consistency.
When the failed nodes come back to life, they will be synced, either automatically using Hinted Handoff (for a short failure) or with Repair (for a longer failure) [1].
If you replace a dead node [2], the other replicas will stream the data to it until it is up to speed with the rest of the replicas.
[1] https://docs.scylladb.com/architecture/anti-entropy/
[2] https://docs.scylladb.com/operating-scylla/procedures/cluster-management/replace_dead_node/
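The quorum claim above is plain arithmetic, which you can sanity-check without a cluster:
# With RF=7, QUORUM needs floor(7/2) + 1 = 4 replicas, so reads and writes
# still succeed with up to 7 - 4 = 3 nodes down.
rf = 7
quorum = rf // 2 + 1
print(quorum)       # 4 replicas must respond
print(rf - quorum)  # up to 3 nodes may be down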
Data will always be replicated to all nodes because you have set RF=7. If 2 nodes go down, the remaining nodes will store hints for them; once the nodes come back up, the remaining nodes will replay those hints and replicate the data automatically, provided the outage was within the hint window. If the hint window (default 3 hours) has expired, you need to run a manual repair to get the data in sync across the cluster.

Cassandra cluster works with 2 nodes?

I have 2 nodes with replication factor = 1, which I believe means there will be 1 copy of the data on each node.
Based on the above description, when I use Murmur3Partitioner:
Will data be shared among the nodes, like 50% of the data on node 1 and 50% on node 2?
When I send a read request to node 1, will it internally connect to node 2 for consistency?
My intention is to have a replica so both nodes can serve requests independently, without inter-node communication.
First of all, please try to ask only one question per post.
I have 2 nodes with replication factor = 1, which I believe means there will be 1 copy of the data on each node.
Incorrect. RF=1 indicates that your entire cluster will have 1 copy of the data.
Will data be shared among the nodes, like 50% of the data on node 1 and 50% on node 2?
That is what it will try to do. Do note that it probably won't be exact; it'll probably be something like 49/51-ish.
When I send a read request to node 1, will it internally connect to node 2 for consistency?
With RF=1, no it will not. Based on the hashed token value of your partition key, the request will be directed only to the node which contains the data.
As an example, with RF=2 on 2 nodes, it would depend on the consistency level set for your operation. Reading at ONE will always read from only one replica. Reading at QUORUM will always read from 2 replicas with 2 nodes (after all, quorum of 2 is 2). Reading at ALL requires a response from all replicas, and initiates a read repair if they do not agree.
Important to note: you cannot force your driver to connect to a specific Cassandra node. You may provide one endpoint, but it will find the others via gossip and use them as it needs to.
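As a small illustration of the consistency levels described above (DataStax Python driver; keyspace and table names are hypothetical), the level is set per statement:
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# One endpoint is enough; the driver finds the second node via gossip.
session = Cluster(["127.0.0.1"]).connect("demo")

for cl in (ConsistencyLevel.ONE,      # one replica answers
           ConsistencyLevel.QUORUM,   # with RF=2 this means both nodes
           ConsistencyLevel.ALL):     # all replicas; mismatches trigger read repair
    stmt = SimpleStatement("SELECT v FROM kv WHERE k = %s",
                           consistency_level=cl)
    print(ConsistencyLevel.value_to_name[cl],
          session.execute(stmt, ("key1",)).one())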

2 and 3 node Cassandra cluster setup

I have 2 questions.
1- I need a 2-node cluster. I must be able to write & read even if 1 node is gone.
I think I need: Replication Factor: 2, Consistency: 1.
But there certainly should not be any loss of data.
2- I need a 3-node cluster, and the nodes may increase over time. I should be able to add more nodes with this setup.
I think I need: Replication Factor: 3, Consistency: 2.
The question is: am I right about these configurations?
Thank you for your answers.
tldr; Use a 3 node cluster for both, and write/read at quorum.
Keep in mind: Write Consistency + Read Consistency > Replication Factor is what ensures that your reads are consistent (there's a small sketch of this rule after the options below).
You may as well go with a 3 node cluster for both. Your first setup (2 nodes) is effectively the same as a 1 node 'cluster': you're either writing to ALL, or your reads won't be consistent.
Setup 1:
For Setup 1 with RF 2 your options would be:
Write(2) + Read(1): if a node goes down, you can no longer write to the cluster.
Read(2) + Write(1): if a node goes down you can no longer read from the cluster AND you may have lost data.
Write(2) + Read(2): same effort as a 3 node cluster. Why do 2?
Neither of these is ideal, and the only advantage the 2 node cluster has over a 1 node 'cluster' is that if you're writing to ALL, at least you don't lose data when a node dies.
Setup 2:
Ideally here you go with quorum; otherwise you're more likely to suffer outages.
Write(2) + Read(2) > RF: quorum (option 1) allows 1 node to die while your cluster functions as normal.
Write(3) + Read(1) > RF: writing at 3 lets you read from only 1 node, speeding up reads at the cost of failing writes if a node becomes disconnected.
Write(1) + Read(3) > RF: reading at 3 speeds up writes at the cost of losing data if a node goes down.
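A minimal sketch of that rule (plain Python, nothing driver-specific) applied to the options above:
# Strong consistency needs the read and write replica sets to overlap:
# write_cl + read_cl > rf guarantees at least one replica in common.
def strongly_consistent(write_cl, read_cl, rf):
    return write_cl + read_cl > rf

for w, r, rf in [(2, 1, 2), (1, 2, 2), (2, 2, 2),   # setup 1 options
                 (2, 2, 3), (3, 1, 3), (1, 3, 3)]:  # setup 2 options
    print("W=%d R=%d RF=%d -> consistent: %s" % (w, r, rf, strongly_consistent(w, r, rf)))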
You should keep in mind that each of your servers will hold 100% of your data, so in terms of speed you probably won't gain too much. Otherwise, having a CL of 1 for both reads and writes won't normally lead to data loss: the write is still sent to every replica, and the CL only controls how many acknowledgements the coordinator waits for. Also, don't forget about hints: if 1 node is down, hinted handoff lets it catch up when you bring it back up.
For this one you could also use QUORUM, which requires half of the replicas (rounded down) plus 1 to respond to the operation.
Also, I would suggest some reading: Configuring data consistency and Replication, High Availability and Consistency in Cassandra.

Brand new to Cassandra, having trouble understanding replication topology

So I'm taking over our Cassandra cluster after the previous admin left, and I'm busy trying to learn as much as I can about it. I'm going through all the documentation on DataStax's site, as we're using their product.
That said, on the replication factor part I'm having a bit of trouble understanding why I wouldn't set the replication factor to the number of nodes I have. I currently have four nodes in one datacenter, and all nodes are located in the same physical location.
What, if any, benefit would there be to having a replication factor of less than 4?
I'm just thinking that it would be beneficial from a fault-tolerance standpoint if each node had its own copy/replica of the data; I'm not sure why I would want fewer replicas than the number of nodes I have. Are there performance trade-offs or other reasons? Am I COMPLETELY missing the concept here (entirely possible)?
There are a few reasons why you might not want to increase your RF from 3 to 4:
Increasing your RF effectively multiplies your original data volume by that amount. Depending on your data volume and data density you may not want to incur the additional storage hit. Keeping RF below the number of nodes is also what lets the cluster scale beyond one node's capacity.
Depending on your consistency level you could experience a performance hit. E.g. when writing at quorum consistency level (CL) with an RF of 3, you wait for 2 nodes to respond before confirming the write to the client. With an RF of 4 you would be waiting for 3 nodes.
Regardless of the CL, with RF 4 every write will eventually go to every node. This is more activity on your cluster and may not perform well if your nodes aren't scaled for that workload.
You mentioned fault tolerance. With an RF of 4 and reads at CL ONE, you can absorb up to 3 of your servers being down simultaneously and your app will still be up. From a fault-tolerance perspective this is pretty impressive, but also an unlikely scenario. My guess would be that if you have 3 nodes down at the same time in the same DC, the 4th is probably also down (natural disaster, flood, who knows...).
At the end of the day it all depends on your needs, and C* is nothing if not configurable. An RF of 3 is very common among Cassandra implementations.
Check out this deck by Joe Chu.
The reason why your RF is often less than the number of nodes in the cluster is explained in the post Cassandra column family bigger than nodes drive space, which provides insight into this aspect of Cassandra replication. Here's a summary of the post:
QUESTION: ... every node has 2TB drive space and the column family is replicated on every node, so every node contains a full copy of it ... after some years that column family will exceed 2TB ...
Answer: RF can be less than the number of nodes and does not need to scale if you add more nodes.
For example, if you today had 3 nodes with RF 3, each node will contain a copy of all the data, as you say. But then if you add 3 more nodes and keep RF at 3, each node will have half the data. You can keep adding more nodes so each node contains a smaller and smaller proportion of the data ... there is no limit in principle to how big your data can be.
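The arithmetic behind that example is simple enough to sketch (the function name is mine):
# Each node stores roughly rf / node_count of the total data, so at a
# fixed RF, adding nodes shrinks every node's share.
def per_node_fraction(rf, nodes):
    return rf / nodes

print(per_node_fraction(3, 3))   # 1.0  -> every node holds all the data
print(per_node_fraction(3, 6))   # 0.5  -> each node holds half the data
print(per_node_fraction(3, 12))  # 0.25 -> and so on as the cluster grows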
