I have two questions.
1- I need a 2-node cluster. I must be able to write and read even if one node is down.
I think I need: Replication Factor: 2, Consistency: 1.
But there must be absolutely no data loss.
2- I need a 3-node cluster, and the number of nodes may increase over time. I should be able to add more nodes with this setup.
I think I need: Replication Factor: 3, Consistency: 2.
The question is: am I right about these configurations?
Thank you for your answers.
tl;dr: Use a 3-node cluster for both, with reads and writes at QUORUM.
Keep in mind: Write Consistency + Read Consistency > Replication Factor is what guarantees that your reads see your writes.
You may as well go with a 3-node cluster for both. Your first setup (2 nodes) is effectively the same as a 1-node 'cluster', because you either write to ALL or your reads won't be consistent.
Setup 1:
For Setup 1 with RF 2 your options would be:
Write(2) + Read(1): if a node goes down, you can no longer write to the cluster.
Write(1) + Read(2): if a node goes down, you can no longer read from the cluster AND you may have lost any writes that had only reached the node that died.
Write(2) + Read(2): the same effort as a 3-node cluster, so why only run 2?
None of these is ideal, and the only advantage a 2-node cluster has over a 1-node 'cluster' is that if you write to ALL you at least don't lose data when a node dies.
Setup 2:
Ideally here you go with QUORUM, otherwise you're more likely to suffer outages (see the sketch after this list).
Write(2) + Read(2) > RF: QUORUM (option 1) lets one node die while the cluster keeps functioning as normal.
Write(3) + Read(1) > RF: writing at 3 lets you read from only 1 node, speeding up reads at the cost of failing writes whenever a node becomes disconnected.
Write(1) + Read(3) > RF: reading at 3 speeds up writes at the cost of failing reads whenever a node becomes disconnected (and potentially losing writes that only reached a node that then dies).
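To make the QUORUM option above concrete, here is a minimal sketch using the Python DataStax driver; the contact point, keyspace and table names are placeholders, not anything from the question.

# Minimal sketch: RF=3 keyspace, reads and writes at QUORUM (2 of 3 replicas).
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])          # placeholder contact point
session = cluster.connect()
session.execute("CREATE KEYSPACE IF NOT EXISTS ks "
                "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}")
session.execute("CREATE TABLE IF NOT EXISTS ks.users (id int PRIMARY KEY, name text)")

write = SimpleStatement("INSERT INTO ks.users (id, name) VALUES (%s, %s)",
                        consistency_level=ConsistencyLevel.QUORUM)
session.execute(write, (1, 'alice'))

read = SimpleStatement("SELECT name FROM ks.users WHERE id = %s",
                       consistency_level=ConsistencyLevel.QUORUM)
print(session.execute(read, (1,)).one().name)

Because 2 + 2 > 3, the write and the read always overlap on at least one replica, which is exactly the Write Consistency + Read Consistency > Replication Factor rule above.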
You should keep in mind that each of your servers will hold 100% of your data, so in terms of speed you probably won't gain too much. Otherwise, having a CL of ONE for both reads and writes won't by itself lead to data loss. Also, don't forget about hints: if a node goes down and is later brought back up, the other nodes replay the writes it missed.
For this one you could also use QUORUM, which requires half of the replicas plus one to acknowledge the operation.
I would also suggest some reading: Configuring data consistency, and Replication, High Availability and Consistency in Cassandra.
I got two errors while writing data into Cassandra and want to know the difference between them.
3 replica were required but only 2 acknowledged the write
2 required but only 1 alive
Consistency Level is LOCAL_QUORUM.
From my observations, when I got the first exception the data was written to one of the nodes; with the second exception I do not see the data on any node.
Is my observation correct? Please help me with this.
It's a bit difficult to provide a clear answer without knowing the cluster topology and the keyspace replication factor. The full error messages and stack traces would also be relevant.
For LOCAL_QUORUM consistency to require 3 replicas to respond, the keyspace must have a replication factor of 4 or 5 in the local DC -- quorum of 4 or 5 is 3.
In the second instance, LOCAL_QUORUM requires 2 replicas when the local replication factor is 2 or 3. And yes, quorum of RF=2 is still 2 replicas, meaning your app cannot tolerate an outage if either replica goes down. For this reason, we recommend a minimum of 3 nodes in each DC for production clusters. Cheers!
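The arithmetic behind both error messages is just the quorum formula; a small illustrative helper (not part of the original answer) makes it explicit.

# Quorum is floor(RF / 2) + 1, computed from the replication factor, not the node count.
def quorum(rf: int) -> int:
    return rf // 2 + 1

print(quorum(5))  # 3 -> "3 replica were required"
print(quorum(4))  # 3
print(quorum(3))  # 2 -> "2 required"
print(quorum(2))  # 2 -> no replica may be down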
I have a six-node Cassandra cluster with NetworkTopologyStrategy and here is my rack layout:
Rack1
Cassandra-01
Cassandra-02
Rack2
Cassandra-03
Cassandra-04
Rack3
Cassandra-05
Cassandra-06
We use CL=QUORUM and replication factor 3 for reads/writes, so technically we can tolerate a single rack failure (the loss of the 2 nodes in that rack).
For example, I write to the Cassandra cluster (CL=QUORUM, RF=3) and Rack3 goes offline (hardware failure), leaving me with 4 nodes in total. Theoretically I should still be able to write and read data because the consistency level is satisfied. But when I use the Cassandra calculator it says:
You can survive the loss of 1 node without impacting the application.
and
You can survive the loss of 1 node without data loss.
But why only 1 node?
The calculator has no knowledge built into it about the rack aspect of the above configuration - so let's leave that for the moment. You have entered 6 nodes, RF 3 and Write / Read at Quorum.
If there were no racks involved (they are all in the same rack) - then the answers make sense.
Since writes are made at QUORUM, you can only guarantee that 2 of the nodes have the data at the point the write is acknowledged as successful. If those 2 nodes then failed immediately after the write, you could suffer data loss (because the 3rd replica did not get the data yet). Thus, in a worst-case scenario, you can only tolerate the loss of 1 node without potential data loss.
You are correct to say that using NetworkTopologyStrategy with 3 racks, 2 nodes per rack - and using Quorum, you could lose an entire rack and still operate. Why does the calculation change?
Well, some of the calculation doesn't change - you can still write at QUORUM and read at QUORUM, and there is still the possibility that a node being read does not have the data yet, but it should read-repair and fix itself (assuming read repair is enabled on the table, etc.).
You shouldn't lose data, though, since the rack placement gives you the further guarantee that the 2 nodes which went down in the same rack did not both hold copies of the same partition. So while 2 nodes are down, you have not eliminated 2 copies of any partition - at least one node in another rack has the data (otherwise the quorum write would not have been acknowledged).
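For reference, the rack-aware placement described above comes from NetworkTopologyStrategy combined with a rack-aware snitch. Here is a minimal sketch of the keyspace definition; the data-centre name 'dc1' and the contact point are assumptions, and the rack assignment itself lives in cassandra-rackdc.properties on each node (read by e.g. GossipingPropertyFileSnitch), not in this statement.

# Sketch: keyspace matching the 6-node, RF=3, 3-rack setup described in the question.
from cassandra.cluster import Cluster

session = Cluster(['cassandra-01']).connect()     # contact point is illustrative
session.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = "
                "{'class': 'NetworkTopologyStrategy', 'dc1': 3}")   # 'dc1' is assumed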
If you follow the GitHub link on the page itself, you can see the calculation for each of the values it provides in the HTML, for example:
var dataloss = w - 1;
$('#dataloss').text( dataloss > 1 ? dataloss+" nodes" : dataloss === 1 ? "1 node" : "no nodes");
w in this instance is the 'write' consistency level; when set to QUORUM, the calculator computes w as 2, so dataloss = 2 - 1 = 1 node. There is no input for racks, nor any consideration of them in the code.
The answer is in your question.
Check max(write level, read level); it's 2 in your case.
RF - 2, i.e. 3 - 2 = 1 node can be down at any point in time.
If there is a 4-node Cassandra cluster, is it possible to configure Cassandra in a way that allows half of the nodes (two in this case) to be down without affecting the applications?
Also, how long can nodes be down before Cassandra discards the queued writes (hints)?
This depends on the client CL and DC replication factor.
Let's assume the RF is 4 (all nodes). If the client uses CL=ONE or LOCAL_ONE, the application would not notice any issues. Any other client CL would have problems (e.g. LOCAL_QUORUM of 4 is 3, allowing only a single node to be down).
Let's assume the RF=1 or 2. If CL=ONE or LOCAL_ONE, the application would be unaffected by queries that only manipulate data on the available nodes. However, any access to rows that only exist on the unavailable nodes would be impacted. In other words, CL=ONE or LOCAL_ONE only works if you're manipulating data that has at least one node available to return the response (You only need ONE to respond in this scenario). If the rows you're querying are on both of the unavailable nodes, you'll get an error stating something like: Expected response of 1, received 0.
Many applications configure CL to be some sort of quorum (local or not) - so in that case, the application would certainly fail unless you had RF=5 (so at least 5 nodes). Quorum of 5 is 3, allowing for 2 nodes to fail.
Hopefully that makes sense.
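To make that arithmetic concrete, here is a tiny illustrative helper (not part of the original answer) that computes how many replicas each consistency level needs and, for the 4-node / RF=4 case, how many nodes can therefore be down.

# Replicas required per consistency level, assuming a single data centre
# where every node is a replica (RF = number of nodes = 4).
def required(cl: str, rf: int) -> int:
    return {'ONE': 1, 'LOCAL_ONE': 1, 'TWO': 2, 'THREE': 3,
            'QUORUM': rf // 2 + 1, 'LOCAL_QUORUM': rf // 2 + 1,
            'ALL': rf}[cl]

rf = 4
for cl in ('ONE', 'LOCAL_QUORUM', 'ALL'):
    need = required(cl, rf)
    print(cl, 'needs', need, 'replica(s) ->', rf - need, 'node(s) may be down')
# ONE needs 1 -> 3 down is fine; LOCAL_QUORUM needs 3 -> only 1; ALL needs 4 -> none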
Yes, assuming you are talking about all four nodes in one data centre, if you set your replication factor to 3 or greater and your read and write consistency level to ONE.
For writes, the nodes that are up will store hints for the nodes that are down, so when they come back up the missed writes can be replayed to them. How long the nodes store these hints can be set in cassandra.yaml (max_hint_window_in_ms, which defaults to three hours).
I have a setup with RF=2 and all my reads/writes are done with CL=1. There are a few places where I open a session, write an entry, do some backend processing, and read the entry again. This mostly works, but sometimes the read returns nil. We suspect that the coordinator node sends the read to a node other than the one the write was done on. My understanding was that a coordinator node sends the read request to both replica nodes and returns the result correctly.
We are not worried about updates to a row, as most of the time we need immediate consistency only for newly created rows. We don't really need QUORUM, and RF=2 is mostly for HA, to tolerate the loss of one node. Any pointers on how to achieve immediate consistency with RF=2 and CL=1 are greatly appreciated.
Having RF=3 with QUORUM would give you immediate consistency with the ability to survive the loss of a single node. With anything less than that it is impossible to guarantee, as there will always be windows where one node sees a mutation before the other.
R + W > N to have a consistent read/write.
R (number of nodes needed for read) + W (number of nodes needed for write) > N (number of nodes with data, RF)
With CL=ONE on reads and writes and RF=2 you have 1 + 1 = 2, which is not > 2. You can use ALL, TWO or QUORUM on either the read or the write and you would get your consistency (TWO and QUORUM only work here because RF=2), but then any node failure takes away the ability to do either the reads or the writes. See the sketch below.
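For illustration, here is a minimal sketch (the keyspace, table and column names are made up) of keeping writes at ONE while raising just the read-after-write path to QUORUM, which with RF=2 is the same as TWO and ALL.

# Sketch: RF=2, write at ONE, read-back at QUORUM, so write(1) + read(2) > RF
# and the read is guaranteed to overlap the replica that acknowledged the write.
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('ks')    # contact point and keyspace are placeholders

write = SimpleStatement("INSERT INTO entries (id, payload) VALUES (%s, %s)",
                        consistency_level=ConsistencyLevel.ONE)
session.execute(write, (42, 'fresh row'))

read = SimpleStatement("SELECT payload FROM entries WHERE id = %s",
                       consistency_level=ConsistencyLevel.QUORUM)
row = session.execute(read, (42,)).one()

The trade-off from the answer still applies: with RF=2 that QUORUM read fails whenever either replica of the row is down.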
Let's say I have a 3 node cluster.
I am writing to node #1.
If node #2 in that cluster goes down, and then comes back up and is resyncing the data from the other nodes, and I continue writing to node #1, will the data be synchronously replicated to node #2? That is, is the replication factor of that write honored synchronously or is it behind the queue post resync?
Thanks
Steve
Yes, granted that you are reading and writing at a consistency level that can handle one node becoming unavailable.
Consider the following scenario:
You have a 3 node cluster with a keyspace 'ks' with a replication factor of 3.
You are writing at a Consistency Level of 'QUORUM'
You are reading at a Consistency level of 'QUORUM'.
Node 2 goes down for 10 minutes.
Reads and writes can successfully continue while the node is down, since QUORUM only requires 2 (3/2 + 1 = 2) replicas to be available. While Node 2 is down, both Node 1 and Node 3 store 'hints' for Node 2.
Node 2 comes online. Node 1 and 3 send hints they recorded while Node 2 was down to Node 2.
If a read happens and the coordinating cassandra node detects that nodes are missing data/not consistent, it may execute a 'read repair'
If Node 2 was down for a long time, Node 1 and Node 3 may not retain all hints destined for it. In this case, an operator should consider running repairs on a scheduled basis.
Also note that when doing reads, if Cassandra finds that there is a data mismatch during a digest request, it will always consider the data with the newest timestamp as the right one (see 'Why cassandra doesn't need vector clocks').
Node 2 will immediately start taking new writes and will also receive any hints stored for it by the other nodes. It is a good idea to run a repair (e.g. nodetool repair) on the node after it is back up, which will ensure its data is consistent with the other nodes.
Note that each column has a timestamp stored against it, which helps Cassandra determine which data is the most recent when running the repair.
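As an aside, you can inspect those per-column timestamps yourself with WRITETIME(); the keyspace, table and column names below are placeholders.

# WRITETIME() returns the microsecond timestamp stored with a regular column value;
# the highest timestamp wins when replicas disagree during repair or read repair.
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')    # placeholder contact point and keyspace
for row in session.execute("SELECT id, val, WRITETIME(val) FROM t WHERE id = %s", (1,)):
    print(row)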