For testing purposes I want to break data consistency in my test Cassandra cluster, which consists of two datacenters.
I assumed that using a consistency level of LOCAL_QUORUM or LOCAL_ONE would achieve this. Let us say I have a Cassandra node, node11, belonging to DC1:
cqlsh node11
CONSISTENCY LOCAL_QUORUM;
INSERT INTO test.test (...) VALUES (...) ;
But in fact, the data appears on all nodes. I can read it from node22, which belongs to DC2, even with a LOCAL_* consistency level. I've double-checked: nodetool shows me the two datacenters, and node11 certainly belongs to DC1 while node22 belongs to DC2.
My keyspace test is configured as follows:
CREATE KEYSPACE "test"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2};
and I have two nodes in each DC respectively.
My questions:
It seems that I have misunderstood these consistency levels. They do not prevent data from being written to the other DCs; they only require that the data be acknowledged in at least the current datacenter. Is this understanding correct?
More importantly: is there any way to perform such a trick and achieve this "broken" consistency, where different data is stored in the two datacenters of one cluster?
(At the moment I think the only way to achieve that is to break the ring and prevent nodes in one DC from knowing anything about nodes in the other DC, but I don't like this solution.)
With LOCAL_QUORUM, the coordinator requires a quorum of acknowledgements from the local DC, but the data is still sent to all the replica nodes defined in the keyspace.
Even at low consistency levels, the write is still sent to all
replicas for the written key, even replicas in other data centers. The
consistency level just determines how many replicas are required to
respond that they received the write.
https://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
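The quoted rule can be sketched as a toy model. This is not driver or server code: the function names and the REPLICATION dictionary are made up for illustration, and the replica counts are taken from the keyspace definition in the question.

```python
# Toy model only: replica counts per DC, as in the question's keyspace definition.
REPLICATION = {"dc1": 2, "dc2": 2}


def replicas_written(replication):
    """Number of replicas the write is sent to, regardless of consistency level."""
    return sum(replication.values())


def acks_required(level, replication, local_dc):
    """Number of replica acknowledgements the coordinator waits for."""
    total = sum(replication.values())
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level == "LOCAL_QUORUM":
        return replication[local_dc] // 2 + 1
    if level == "QUORUM":
        return total // 2 + 1
    if level == "ALL":
        return total
    raise ValueError(f"level not covered by this sketch: {level}")


for level in ("LOCAL_ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
    print(level, "sent to", replicas_written(REPLICATION),
          "replicas, waits for", acks_required(level, REPLICATION, "dc1"))
```

In this model the consistency level only changes the second number; the write goes to all four replicas in every case, which is why the row shows up in DC2 as well.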
I don't think there is a proper way to do that.
This suggestion is only for a test scenario, to break data consistency between the 2 DCs (I haven't tried it, but based on my understanding it should work):
Write data in one DC (say DC1) with a LOCAL_* consistency level.
Before the write, take the nodes in DC2 down, so that DC1 stores hints for them.
Let max_hint_window_in_ms (3 hours by default; you can reduce it) pass so that the DC1 coordinator deletes all the hints.
Start the DC2 nodes and query with a LOCAL_* consistency level; the data from DC1 won't be present in DC2.
You can repeat these steps and insert data in DC2 with different values while keeping DC1 down, so that the same data has different values in DC1 and DC2.
Related
Let's say I have two data centers (DC1, DC2) in a single Cassandra cluster.
DC1 - 4 nodes.
DC2 - 4 nodes.
Initially I set the replication factor for all the keyspaces to {DC1:2, DC2:2} (NetworkTopologyStrategy).
But after some time, let's say I alter the keyspaces and change the replication factor to {DC2:2} for all of them, removing DC1, so there is no replication factor for DC1.
So now what will happen? Will DC1 get any data written into it in the future?
Will all the token ranges be assigned to only DC2?
If you exclude DC1, it won't receive data written to that keyspace, nor will data be read from DC1. Before switching off DC1, make sure that you run nodetool repair on the servers in DC2, so that all data is synchronized. After changing RF, you
When you change the RF for a specific keyspace, the drivers and Cassandra itself recalculate the token range assignments, taking into account the information about data centers.
We use a multi-data center (DC) Cassandra cluster. During a write to the cluster, I want only the LOCAL DC to perform writes on its nodes, as we already route each write request to the desired DC based on the source where the write is initiated. So I want only the LOCAL DC to process the write and no other DC to perform the write on its nodes. But later on, by virtue of replication among nodes across DCs, I want the written data to be replicated across DCs. Is this replication across DCs possible when I am restricting the write to only one DC in the first place? If I do not open connections to REMOTE hosts in different DCs during my write operation, is data replication among DCs still possible later on? I definitely need replicas of the data in all DCs because, when reading from the cluster, we want the data to be readable from whichever DC the read request lands on, not necessarily the LOCAL one.
Does anyone have a solution to this?
You may want to use LOCAL_QUORUM consistency for writes if you want them to be acknowledged only in the local DC.
Check the definition of the keyspace you want this restriction for. It should use the NetworkTopologyStrategy class and have an RF in both DCs. Something like this:
ALTER KEYSPACE <Keyspace_name> WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
It states that Cassandra will also propagate the writes to the other DC; the consistency level only has to be satisfied in the local DC.
Use QUORUM consistency for reads if they are not restricted to one DC, but be aware that it might add some latency because Cassandra has to read data from the other data center as well.
Thanks for your answer Nikita. Also, one more clarification. Assume I use LOCAL_QUORUM for read consistency in my multi-DC cluster with three DCs (DC1, DC2, DC3), three nodes in each DC, and a replication factor of 3. During a read, let us assume the request first lands on a node in DC1. This node has failed, so a second node in DC1 is contacted, and so on; assume all nodes in DC1 have failed. Will the cluster then connect to either DC2 or DC3 to satisfy LOCAL_QUORUM, i.e., look for acknowledgement of two consistent reads from one of those DCs (either DC2 or DC3)? I am not expecting one read from DC2 and another from DC3. What I mean to ask is: if the cluster falls back on DC2 after all DC1 nodes fail, will it start evaluating LOCAL_QUORUM from the perspective of DC2, and if so, will the cluster call it a successful read?
A CQL query won't hit other data centers when LOCAL_QUORUM can't be satisfied in the local data center. However, drivers implement such a feature using DCAwareRoundRobinPolicy, as you mentioned, but it seems that it's not recommended. Also, this article can be helpful for choosing a proper consistency level.
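A minimal sketch of that behaviour, assuming LOCAL_QUORUM is evaluated against the replicas in the coordinator's own DC. The function name and the live-replica counts are hypothetical; one DC1 node is kept alive here so that there is still a DC1 coordinator to receive the request.

```python
# Replicas per DC, as in the comment above: three DCs with RF=3 each.
REPLICATION = {"DC1": 3, "DC2": 3, "DC3": 3}


def local_quorum_satisfiable(coordinator_dc, live_replicas, replication=REPLICATION):
    """True if enough replicas are alive in the coordinator's own DC.

    Replicas in other DCs are never counted towards LOCAL_QUORUM.
    """
    needed = replication[coordinator_dc] // 2 + 1
    return live_replicas.get(coordinator_dc, 0) >= needed


# Hypothetical outage: only one DC1 node is up, DC2 and DC3 are healthy.
live = {"DC1": 1, "DC2": 3, "DC3": 3}

print(local_quorum_satisfiable("DC1", live))  # coordinator in DC1
print(local_quorum_satisfiable("DC2", live))  # coordinator in DC2
```

The request handled by a DC1 coordinator cannot reach LOCAL_QUORUM even though six healthy replicas exist elsewhere. If the driver sends the request to a coordinator in DC2 instead, "local" then means DC2 and the read can succeed there.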
I'm a bit confused about how a QUORUM write selects the nodes to write to in the case of multiple DCs.
Suppose, for example, that I have a 3 DC cluster with 3 nodes in each DC, and the replication factor is 2, so that the number of replicas needed to achieve QUORUM is 3. Note: this is just an example to help me formulate my question and not the actual configuration.
My question is the following: in the case of a write, how will these 3 replicas be distributed across the DCs in my cluster? Is it possible that all 3 replicas will end up in the same DC?
The replication is defined at the key space level. So for example
create keyspace test with replication = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 2, 'DC3' : 2 };
As you can see, each DC will hold two copies of the data for that keyspace and no more. You could have another keyspace in the same cluster defined to replicate in only one DC and not the other two. So it's flexible.
Now for consistency: with 3 DCs and RF=2 in each DC, you have 6 copies of the data. By definition, QUORUM means that a majority (RF/2 + 1, rounded down, where RF is the total number of replicas across all DCs) of those 6 replicas needs to acknowledge the write before it is considered successful. So 4 nodes need to respond for a QUORUM write here, and these 4 could be any combination of replicas from any DC. Remember that the number of replicas, not the total number of nodes in a DC, is what matters when calculating the quorum.
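To make the "any combination" point concrete, here is a small sketch that enumerates how the 4 required acknowledgements could be spread over the three DCs. The helper names are illustrative only, and the replica counts come from the keyspace definition above.

```python
from itertools import product

# Replicas per DC, from the example keyspace definition.
REPLICATION = {"DC1": 2, "DC2": 2, "DC3": 2}


def quorum(replication):
    """Majority of the total replica count across all DCs."""
    return sum(replication.values()) // 2 + 1


def ack_distributions(replication):
    """All ways the quorum of acknowledgements can be split per DC."""
    dcs = list(replication)
    needed = quorum(replication)
    per_dc_ranges = [range(replication[dc] + 1) for dc in dcs]
    return [dict(zip(dcs, combo))
            for combo in product(*per_dc_ranges)
            if sum(combo) == needed]


print("acks needed:", quorum(REPLICATION))
for split in ack_distributions(REPLICATION):
    print(split)
```

Because each DC holds only 2 replicas, every possible split involves at least two DCs, so the acknowledgements for a QUORUM write can never all come from a single DC in this configuration.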
On a side note, in Cassandra, RF=2 is as good as RF=1. To simplify, let's imagine a 3-node, single-DC situation. With RF=2 there are two copies of the data, and in order to achieve quorum (2/2 + 1 = 2), 2 nodes need to acknowledge the write. So both replicas always have to be available. If even one of them fails, the writes will start to fail. Even if another node can take hints here, your reads with quorum are bound to fail. So the tolerance for node failure is zero in this situation.
You could use LOCAL_QUORUM instead of QUORUM to speed up the writes. It sacrifices consistency for speed. Welcome to "eventual consistency".
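The arithmetic behind that side note, as a short sketch for a single DC. The function names are illustrative, not part of any Cassandra API.

```python
def quorum(rf):
    """Replicas that must respond at QUORUM for a given replication factor."""
    return rf // 2 + 1


def tolerated_failures(rf):
    """Replicas that can be down while QUORUM is still reachable."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

RF=1 and RF=2 both tolerate zero replica failures at QUORUM; RF=3 is the first value that tolerates one.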
The consistency level determines the number of replicas on which the write must succeed before an acknowledgment is returned to the client application.
Even at low consistency levels, the write is still sent to all replicas for the written key, even replicas in other data centers. The consistency level just determines how many replicas are required to respond that they received the write.
Source : http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
So if you set the consistency level to QUORUM, and I assume each DC has an RF of 2, then QUORUM is 4 (6 / 2 + 1). All your writes are still sent to all replicas in each DC (3 * 2 = 6 nodes); the coordinator waits for 4 nodes to succeed, and after that it sends the acknowledgment to the client.
I am new to Cassandra and at work I have a 4 node cluster.
nodetool gossipinfo tells me that there is one datacentre with 2 racks and 2 nodes in each rack. The replication factor is defined as 2. nodetool ring tells me that each node has 50% ownership. There are 2 seed nodes in our config; each rack has 1 seed node.
Does this mean that for each rack, there is one seed node and its replicated node? If that is the case, then why is the data size not the same for the seed node and its replicated node?
What happens if one node goes down? Will it have any impact on the data availability of the cluster?
Seeds
Seed nodes are only special in that new nodes joining the cluster contact the seed nodes to find out about other nodes and the topology of the ring. But in Cassandra, all nodes are the same, i.e. there is no master or slave, no primary or secondary node. Because of this, you can elect any (or all) nodes as seeds.
Since seeds only relate to gossip information, they do not have anything to do with replicated data.
Size
In relation to data size, each node will never be exactly the same since each partition/row size is never the same. If you look at the nodetool cfstats output, you will see that there is a big range between minimum and maximum sizes.
Availability
If the reads are done with a consistency level CL=ONE, then if a node is down the other replica will continue to serve requests. But if reads are done with a higher consistency, then reads will fail since it needs 2 nodes to be available, i.e. CL=LOCAL_QUORUM requires [ RF/2 + 1 ] nodes to respond.
EDIT: Response to:
Shouldn't each node own 25%?
Ownership
In Cassandra, data is not "distributed" across ALL nodes in ALL DCs. In fact, a DC is a copy of another DC depending on the replication factor.
To illustrate, consider the following keyspace definition:
CREATE KEYSPACE "myKS"
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'DC1' : 2,
'DC2' : 2};
Based on this definition, the myKS keyspace has 2 replicas in DC1 and 2 replicas in DC2. Since each of your data centres only has 2 nodes, this effectively means that each DC is a copy of the other.
Following from that, since the tokens are split between 2 nodes, each node owns half of the data which is 50%. So in DC1, each node owns 50% and in DC2 (which is a copy of DC1) each node also owns 50%.