Cassandra multi-DC: write on LOCAL and read from any DC

We use a multi-data center (DC) Cassandra cluster. When writing to the cluster, I want only the LOCAL DC to perform the write on its nodes, because we already route each write request to the desired DC based on where the write originates. So only the LOCAL DC should process the write, and no other DC should perform it on its nodes. Later on, however, I want the written data to be replicated across DCs through normal replication. Is this cross-DC replication possible if I restrict the write to a single DC in the first place? If I do not open connections to REMOTE hosts in other DCs during the write, can the data still be replicated among DCs later? I need replicas of the data in all DCs because reads should be served by whichever DC the read request lands on, not necessarily the LOCAL one.
Does anyone have a solution to this?

You may want to use LOCAL_QUORUM consistency for writes if you want them acknowledged only in the local DC.
Check the definition of the keyspace you want to restrict. It should use the NetworkTopologyStrategy class and define an RF for both DCs. Something like this:
ALTER KEYSPACE <Keyspace_name> WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
With this definition, Cassandra sends every write to the replicas in both DCs. The write returns once the local DC has satisfied the consistency level, and the replicas in the other DC receive the data asynchronously.
Use QUORUM consistency for reads if they are not restricted to one DC, but be aware that it may add some latency because Cassandra has to read data from the other data center as well.

Related

Cassandra Replication Factor

Let's say I have two data centers (DC1, DC2) in a single Cassandra cluster.
DC1 - 4 nodes.
DC2 - 4 nodes.
Initially I set the replication factor for all the keyspaces to {DC1:2, DC2:2} (NetworkTopologyStrategy).
After some time, let's say I alter the keyspaces and change the replication factor to {DC2:2} for all of them, removing DC1. DC1 then has no replication factor.
What will happen now? Will DC1 get any data written to it in the future?
Will all the token ranges be assigned to DC2 only?
If you exclude DC1, it won't receive writes for that keyspace, and data won't be read from DC1 either. Before switching off DC1, make sure you run nodetool repair on the servers in DC2 so that all data is synchronized. After changing RF, you
When you change the RF for a specific keyspace, the drivers and Cassandra itself recalculate the token range assignments, taking the data center information into account.

Insert rows only in one datacenter in cassandra cluster

For testing purposes I want to break the consistency of data in my test Cassandra cluster, which consists of two datacenters.
I assumed I could achieve this by using a consistency level of LOCAL_QUORUM or LOCAL_ONE. Let us say I have a Cassandra node, node11, belonging to DC1:
cqlsh node11
CONSISTENCY LOCAL_QUORUM;
INSERT INTO test.test (...) VALUES (...) ;
But in fact the data appears on all nodes. I can read it from node22, which belongs to DC2, even with a LOCAL_* consistency level. I've double checked: nodetool shows the two datacenters, and node11 certainly belongs to DC1 while node22 belongs to DC2.
My keyspace test is configured as follows:
CREATE KEYSPACE "test"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2};
and I have two nodes in each DC.
My questions:
It seems I have misunderstood these consistency levels. They do not prevent data from being written to the other DCs; they only require the data to be present in at least the current datacenter. Is that understanding correct?
More importantly: is there any way to achieve such a "broken" consistency, where different data is stored in the two datacenters within one cluster?
(At the moment I think the only way to achieve that is to break the ring and prevent nodes in one DC from knowing anything about nodes in the other DC, but I don't like this solution.)
LOCAL_QUORUM requires a quorum of acknowledgements from the local DC, but the data is still sent to all the replica nodes defined in the keyspace.
Even at low consistency levels, the write is still sent to all
replicas for the written key, even replicas in other data centers. The
consistency level just determines how many replicas are required to
respond that they received the write.
https://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
I don't think there is a proper way to do that.
This suggestion is only for a test scenario that breaks data consistency between 2 DCs. (I haven't tried it, but based on my understanding it should work.)
Write data in one DC (say DC1) with a LOCAL_* consistency level.
Before the write, keep the nodes in DC2 down so that DC1 stores hints for them.
Let max_hint_window_in_ms (3 hours by default, and you can reduce it) pass so that the DC1 coordinator deletes all the hints.
Start the DC2 nodes and query them with a LOCAL_* consistency level; the data from DC1 won't be present in DC2.
You can repeat these steps and insert data in DC2 with different values while keeping DC1 down, so the same rows will have different values in DC1 and DC2.

Cassandra QUORUM write consistency level and multiple DC

I'm a bit confused about how QUORUM write selects nodes to write into in case of multiple DC.
Suppose, for example, that I have a 3 DC cluster with 3 nodes in each DC, and the replication factor is 2, so that the number of replicas needed to achieve QUORUM is 3. Note: this is just an example to help me formulate my question and not the actual configuration.
My question is the following: on a write, how will these 3 replicas be distributed across the DCs in my cluster? Is it possible that all 3 replicas will end up in the same DC?
Replication is defined at the keyspace level. For example:
create keyspace test with replication = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 2, 'DC3' : 2 };
As you can see, each DC will hold two copies of the data for that keyspace and no more. You could have another keyspace in the same cluster defined to replicate in only one DC and not the other two, so it's flexible.
Now for consistency: with 3 DCs and RF=2 in each DC, you have 6 copies of the data. By definition, a quorum is a majority of those 6 replicas (sum of all replication factors / 2 + 1, rounded down), and that many must acknowledge the write before it is reported as successful. So 4 nodes need to respond for a QUORUM write here, and these 4 can be any combination of replicas from any DC. Remember that it is the number of replicas that matters when calculating quorum, not the total number of nodes in a DC.
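The quorum arithmetic above can be checked with a short sketch. This is plain Python, not a Cassandra API: the function names quorum and local_quorum and the dictionary of per-DC replication factors are illustrative assumptions, and the rule used is floor(replicas / 2) + 1.

```python
def quorum(replication_factors):
    """Acknowledgements needed for QUORUM: a majority of all replicas in all DCs."""
    total_replicas = sum(replication_factors.values())
    return total_replicas // 2 + 1


def local_quorum(replication_factors, dc):
    """Acknowledgements needed for LOCAL_QUORUM: a majority of one DC's replicas."""
    return replication_factors[dc] // 2 + 1


# Hypothetical keyspace matching the example: RF=2 in each of three DCs.
rf = {"DC1": 2, "DC2": 2, "DC3": 2}
print(quorum(rf))               # 4 of the 6 replicas, from any DC
print(local_quorum(rf, "DC1"))  # 2 replicas, all in DC1
```

Note that the number of nodes per DC does not appear anywhere in the calculation; only the replication factors do.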
On a side note, in Cassandra RF=2 is only as good as RF=1 when you use QUORUM. To simplify, let's imagine a 3 node, single DC situation. With RF=2 there are two copies of the data, and in order to achieve quorum (2/2 + 1 = 2), 2 nodes need to acknowledge the write. So both replicas always have to be available. If even one of them fails, writes will start to fail. Even if another node takes hints, your QUORUM reads are bound to fail. So the tolerance for node failure is zero in this situation.
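To make the RF=2 point concrete, here is a minimal sketch in plain Python (the function name tolerated_failures is an assumption for illustration) that computes how many replicas of a key can be down in a single DC while QUORUM requests still succeed.

```python
def tolerated_failures(rf):
    """Replicas that may be unavailable while a QUORUM request can still succeed."""
    quorum = rf // 2 + 1
    return rf - quorum


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={rf // 2 + 1}, tolerated failures={tolerated_failures(rf)}")
```

RF=1 and RF=2 both tolerate zero failed replicas at QUORUM, while RF=3 tolerates one, which is why odd replication factors are the usual choice.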
You could use LOCAL_QUORUM instead of QUORUM to speed up the writes. It trades consistency for speed. Welcome to "eventual consistency".
The consistency level determines the number of replicas on which the write must succeed before an acknowledgment is returned to the client application.
Even at low consistency levels, the write is still sent to all replicas for the written key, even replicas in other data centers. The consistency level just determines how many replicas are required to respond that they received the write.
Source : http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
So if you set the consistency level to QUORUM, and I assume each DC has an RF of 2, there are 3 * 2 = 6 replicas and QUORUM is 4. Every write is still sent to all 6 replicas across the DCs, and the coordinator waits for 4 of them to succeed before sending the acknowledgment to the client.

Cassandra Read/Write CONSISTENCY Level in NetworkTopologyStrategy

I have set up Cassandra in 2 data centers with 4 nodes each and a replication factor of 2.
The consistency level is ONE (the default).
I was facing consistency issues when reading data at a consistency level of ONE.
According to the DataStax documentation, the read and write consistency levels added together should be greater than the replication factor.
I decided to change the write consistency level to TWO and keep the read consistency level at ONE, which resolves the inconsistency problem in a single data center.
But with multiple data centers, the problem would be resolved by using LOCAL_QUORUM as the consistency level.
How can I make a write behave as (LOCAL_QUORUM + TWO), so that it is written to the local data center and also to 2 nodes?
Just write using LOCAL_QUORUM in the datacenter you want. If you have a replication factor of 2 in each of your datacenters, then the data you write in the "local" datacenter will eventually be replicated to the "other" datacenter (but you have no guarantee of when).
LOCAL_QUORUM means: "after the write operation returns, the data has effectively been written on a quorum of nodes in the local datacenter"
TWO means: "after the write operation returns, the data has been written on at least 2 nodes in any of the datacenters"
If you want to read the data you have just written with LOCAL_QUORUM in the same datacenter, you should use LOCAL_ONE consistency. If you read with ONE, there is a chance that the closest replica is in the "remote" datacenter, where the data may not yet have been replicated by Cassandra.
This also depends on the load balancing strategy configured at the driver level. You can read more about this here: https://datastax.github.io/java-driver/manual/load_balancing/
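The rule from the question (read level plus write level greater than the replication factor) explains why a LOCAL_QUORUM write followed by a LOCAL_ONE read works with RF=2 per datacenter. A minimal sketch in plain Python follows; the function name read_overlaps_write is an illustrative assumption, and the check only holds when reads and writes are counted against the same set of replicas, i.e. within one datacenter.

```python
def read_overlaps_write(write_acks, read_replicas, rf):
    """True if every read must touch at least one replica that acknowledged the write."""
    return write_acks + read_replicas > rf


local_rf = 2                         # replication factor in the local datacenter
local_quorum = local_rf // 2 + 1     # LOCAL_QUORUM needs 2 acknowledgements here

print(read_overlaps_write(local_quorum, 1, local_rf))  # LOCAL_QUORUM write, LOCAL_ONE read: True
print(read_overlaps_write(1, 1, local_rf))             # ONE write, ONE read: False
```

With ONE for both reads and writes, the read can land on the replica that has not yet received the write, which matches the inconsistency described in the question.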

Cassandra cluster with each node total replication

Hi, I'm new to Cassandra. I have a 2 node Cassandra cluster. For reasons imposed by the front end I need...
Total replication of all data on each of the two nodes.
Eventually consistent writes, so the node being written to responds with an acknowledgement to the front end straight away, without synchronizing on the replication.
Can anyone tell me whether this is possible? Is it done in the YAML file? I know there are properties there for consistency, but I don't see that any of the partitioners suit my needs. Where can I set the replication factor?
Thanks
You set the replication factor during creation of the keyspace. So if you use (and plan for the future on using) a single data center set-up, you create the keyspace using cqlsh like so
CREATE KEYSPACE "Excalibur"
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
Check out the documentation regarding the create keyspace. How this is handled internally is related to the snitch definition of the cluster and a strategy option defined per keyspace. In the case of the SimpleStrategy above, this simply assumes a ring topology of your cluster and places the data clockwise in that ring (see this).
Regarding consistency, you can set different levels of consistency for write and read operations in your client/driver, per operation:
Cassandra extends the concept of eventual consistency by offering tunable consistency―for any given read or write operation, the client application decides how consistent the requested data should be.
Read the doc
If you use Java in your clients, with the DataStax Java driver, you can set the consistency level using
QueryOptions.setConsistencyLevel(ConsistencyLevel consistencyLevel)
"One" is the default setting.
Hope that helps
