Removed DC2 from replication but still able to query data from DC2 nodes - cassandra

I am trying to remove DC2 from my Cassandra cluster. To do this, I started by altering the replication factor for DC2 from 2 to 0. I then insert a row on DC1 node1, and I still receive this row when I query from DC2 nodes.
Why is this happening?

I'm assuming that you're querying the data with cqlsh. By default, it uses a consistency of ONE so it will query any replica. In your case, they all happen to be in DC1.
If you try to query with a local consistency level (e.g. LOCAL_ONE or LOCAL_QUORUM) then you will probably get the result (or lack thereof) which I think you're expecting.
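For example, from a DC2 node (the keyspace and table names here are placeholders for wherever you inserted the row):
cqlsh a_dc2_node
CONSISTENCY LOCAL_ONE;
SELECT * FROM some_ks.some_table;   -- with no replicas in DC2, this should no longer return the row (it may instead fail as unavailable)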
As a side note, although setting replication to 0 is technically valid, it is more customary to simply remove a DC completely from replication so you end up with:
ALTER KEYSPACE some_ks WITH REPLICATION = {
  'class' : 'NetworkTopologyStrategy',
  'DC1' : 3
};

Related

Cassandra Replication Factor

Let's say I have two datacenters (DC1, DC2) in a single Cassandra cluster.
DC1 - 4 nodes.
DC2 - 4 nodes.
Initially I have set the replication factor for all the keyspaces to {DC1: 2, DC2: 2} (NetworkTopologyStrategy).
But after some time, let's say I alter the keyspaces and change the replication factor to {DC2: 2}, removing DC1, i.e. no replication factor for DC1.
So now what will happen? Will DC1 get any data written into it in the future?
Will all the token ranges be assigned to only DC2?
If you exclude DC1, it won't get data written for that keyspace, nor will data be read from DC1. Before switching off DC1, make sure that you perform nodetool repair on the servers in DC2, to make sure that you have all the data synchronized.
When you change the RF for a specific keyspace, the drivers and Cassandra itself recalculate the token range assignments, taking into account the information about data centers.
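A minimal sketch of that sequence (keyspace name some_ks and the RF value are placeholders):
# run on each node in DC2 first, to make sure all data is synchronized
nodetool repair some_ks
-- then drop DC1 from the keyspace definition (in cqlsh)
ALTER KEYSPACE some_ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC2' : 2 };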

Insert rows only in one datacenter in cassandra cluster

For some test purposes I want to break the consistency of data in my test Cassandra cluster, which consists of two datacenters.
I assumed that if I use a consistency level of LOCAL_QUORUM or LOCAL_ONE I would achieve this. Let us say I have a Cassandra node node11 belonging to DC1:
cqlsh node11
CONSISTENCY LOCAL_QUORUM;
INSERT INTO test.test (...) VALUES (...) ;
But in fact, the data appears on all nodes. I can read it from node22, which belongs to DC2, even with a LOCAL_* consistency level. I've double checked: nodetool status shows me the two datacenters, and node11 certainly belongs to DC1 while node22 belongs to DC2.
My keyspace test is configured as follows:
CREATE KEYSPACE "test"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2};
and I have two nodes in each DC respectively.
My questions:
It seems to me that I have misunderstood the idea of these consistency levels. In fact they do not prevent data from being written to the other DC, but only require that the data is acknowledged at least in the current datacenter. Is that a correct understanding?
More importantly: is there any way to perform such a trick and achieve such a "broken" consistency, where different data is stored in the two datacenters within one cluster?
(At the moment I think that the only way to achieve that is to break the ring and not allow nodes from one DC to know anything about nodes from the other DC, but I don't like this solution.)
LOCAL_QUORUM requires a quorum of acknowledgements from replicas in the local DC, but the data is still sent to all the nodes defined by the keyspace's replication.
Even at low consistency levels, the write is still sent to all
replicas for the written key, even replicas in other data centers. The
consistency level just determines how many replicas are required to
respond that they received the write.
https://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
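For example, a quick check of this behaviour (the columns id and value are hypothetical, standing in for your schema): a row written on node11 at LOCAL_QUORUM is immediately readable from DC2:
cqlsh node11
CONSISTENCY LOCAL_QUORUM;
INSERT INTO test.test (id, value) VALUES (1, 'written-in-dc1');
cqlsh node22
CONSISTENCY LOCAL_ONE;
SELECT * FROM test.test WHERE id = 1;   -- the row is returned from DC2 as well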
I don't think there is a proper way to do that.
This suggestion is for a test scenario only, to break data consistency between the 2 DCs (I haven't tried it, but based on my understanding it should work):
Write data in one DC (say DC1) with a LOCAL_* consistency level.
Before the write, keep the nodes in DC2 down, so DC1 will store hints because the DC2 nodes are down.
Let max_hint_window_in_ms (3 hours by default, and you can reduce it) pass so that the DC1 coordinators delete all the hints.
Start the DC2 nodes and query with a LOCAL_* consistency level; the data written in DC1 won't be present in DC2.
You can repeat these steps and insert data in DC2 with different values while keeping DC1 down, so the same rows will have different values in DC1 and DC2.
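A rough sketch of that sequence on a test cluster (the service command, keyspace, table, and column names are placeholders and depend on your installation):
# 1. take the DC2 nodes down (on each DC2 node)
sudo systemctl stop cassandra
# 2. write in DC1 with a local consistency level
cqlsh node11
CONSISTENCY LOCAL_QUORUM;
INSERT INTO test.test (id, value) VALUES (1, 'dc1-only');
# 3. wait longer than max_hint_window_in_ms (lower it in cassandra.yaml to speed this up)
# 4. bring DC2 back (on each DC2 node) and read locally
sudo systemctl start cassandra
cqlsh node22
CONSISTENCY LOCAL_ONE;
SELECT * FROM test.test WHERE id = 1;   -- should not return the DC1-only row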

Different results for same query in different Cassandra nodes

I have 3 Cassandra nodes. When I execute a query, 2 nodes give the same response but 1 node gives a different response.
Suppose I executed the following query:
select * from employee;
Node1 and Node2 return 2 rows but Node3 returns 0 rows (an empty response).
How can I solve this issue?
This is likely because:
1. You are not using NetworkTopologyStrategy.
2. Your replication factor is 2.
Simple strategy : Use only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location).
Go to this link:
https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archDataDistributeReplication.html
I did the following steps; the problem was then solved and now the data is in sync on all 3 nodes:
run the command nodetool rebuild on the instances, and also
update 'replication_factor': '2' to 'replication_factor': '3'
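A sketch of those two steps (keyspace name my_ks is a placeholder, assuming the keyspace uses SimpleStrategy as suggested above):
ALTER KEYSPACE my_ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
nodetool rebuild      # run on each instance; nodetool repair my_ks is the more common way to sync existing replicas after raising the RF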

Synchronizing keyspaces in new cassandra datacenter

I have a question about a potential scenario and wanted to know if our assumption is correct. (using cassandra 3.x with DSE 5.x)
We've learned from the docs that in order to add a new (and fresh) datacenter to a cluster, we need to temporarily set the replication factor like so:
{'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 0 }
Where DC1 is the currently running datacenter and DC2 is the one we are adding.
This test helped us understand the impact of the streaming of data from an existing live ring to a brand new one.
Now to our hypothetical scenario: we want to start replicating a keyspace that was initially replicated to only one DC, so that it is also stored in other currently running DCs.
When creating the keyspace:
CREATE KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 0};
Then, when business requirements change:
ALTER KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 2};
Is it considered safer to define all new keyspaces in an application with all DCs set to 0, so that the value can be modified at some point? And would changing that replication factor be enough to trigger the streaming of the keyspace to the other datacenters, or do we also need to run nodetool rebuild?
The accepted practice is to simply not define a replication factor for a DC that you don't want a particular keyspace to replicate to. I don't think that anything bad would happen if you did it your way, but I feel that not defining it would be the safer way to go.
would changing that replication factor be enough to trigger the streaming of the keyspace to the other datacenters - or do we also need to run nodetool rebuild?
Altering the replication factor on the keyspace will tell all future writes to that keyspace to also go to the new data center. However, for the existing data to replicate to the new data center you will have to run nodetool repair or nodetool rebuild.
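For example, with the keyspace from the question (the rebuild source DC name 'US' follows that example):
ALTER KEYSPACE Foo WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 2 };
# then, on each node in the new EU datacenter, stream the existing data from US:
nodetool rebuild -- US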

Cassandra one way replication

Does Cassandra support one-directional replication? Say I have 2 DCs, DC1 and DC2. Real-time data is written only in DC1 and asynchronous replication happens to DC2. Is there a way such that, if I write to the same data in DC2, it does not get replicated back to DC1?
There is no concept of one-way replication. If your replication factor is 2 then data will be replicated to two nodes. Since you are using DC1 and DC2, you have to use NetworkTopologyStrategy and define the replication factor for each DC. The configured snitch then decides which nodes in the two DCs store the data.
This is controlled when you create a keyspace.
Let's say you want keyspace1 to be replicated on one datacenter and keyspace2 on both datacenters.
This will replicate your data on one datacenter:
CREATE KEYSPACE keyspace1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
And this on both datacenters:
CREATE KEYSPACE keyspace2
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1, 'datacenter2' : 1};
There is no concept of one-way replication. You have a few options:
1) Use low consistency levels (LOCAL_*) when writing to DC2, so the app doesn't block waiting for replication to DC1.
2) Keep the DCs in separate rings, and bulk load asynchronously with sstableloader.
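A rough sketch of option 2's bulk-load step (the host names and the SSTable directory path are placeholders):
# loads the data only into the ring that dc2_node1/dc2_node2 belong to, leaving the DC1 ring untouched
sstableloader -d dc2_node1,dc2_node2 /path/to/my_ks/my_table/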
