My architecture is as follows:
My test cluster, named Test Cluster1, has two data centers, DC1 and DC2. DC1 has two Spark nodes and DC2 has two transactional (data) nodes, so I have 4 nodes in the cluster. My question is: is it possible to set a replication factor of 3 for DC1 or DC2?
No. RF is set per DC, so you can't set an RF higher than the number of nodes in that particular DC. You need to use NetworkTopologyStrategy, like this:
{'class':'NetworkTopologyStrategy', 'DC1':2, 'DC2':2 }
The replication factor should be less than or equal to the number of nodes present in a data center. As you have 2 nodes in each DC, you can set the RF to 2 when altering the keyspace, in the manner below:
cqlsh> ALTER KEYSPACE <keyspace_name> WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};
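The rule from the answers above (per-DC RF no higher than that DC's node count) can be sketched as a small check that also builds the statement. This is only an illustration: the helper `build_alter_keyspace` and the keyspace name `my_keyspace` are hypothetical, and the node counts are the ones from the question.

```python
def build_alter_keyspace(keyspace, rf_by_dc, nodes_by_dc):
    """Return an ALTER KEYSPACE statement, rejecting an RF above a DC's node count."""
    for dc, rf in rf_by_dc.items():
        nodes = nodes_by_dc.get(dc, 0)
        if rf > nodes:
            raise ValueError(f"RF {rf} exceeds the {nodes} node(s) in {dc}")
    # Data center names are quoted, as CQL map keys must be string literals.
    dc_part = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_part}}};"
    )

# Node counts from the question: two nodes in each data center.
nodes_by_dc = {"DC1": 2, "DC2": 2}
# "my_keyspace" is a placeholder name, not one from the question.
statement = build_alter_keyspace("my_keyspace", {"DC1": 2, "DC2": 2}, nodes_by_dc)
print(statement)
```

Asking for `{"DC1": 3}` with the same node counts raises `ValueError` instead of producing a statement.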
Related
Let's say I have two data centers (DC1, DC2) in a single Cassandra cluster.
DC1 - 4 nodes.
DC2 - 4 nodes.
Initially I set the replication factor for all the keyspaces to {DC1:2, DC2:2} (NetworkTopologyStrategy).
But after some time, let's say I alter the keyspaces and change the replication factor to {DC2:2} for all of them (removing DC1), so there is no replication factor for DC1.
So what will happen now? Will DC1 get any data written to it in the future?
Will all the token ranges be assigned only to DC2?
If you exclude DC1, no data will be written to it for that keyspace, and no data will be read from it either. Before switching off DC1, run nodetool repair on the servers in DC2 to make sure that all data is synchronized. After changing RF, you
When you change the RF for a specific keyspace, the drivers and Cassandra itself recalculate the token range assignments, taking the information about data centers into account.
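To make the effect of dropping DC1 from the replication map concrete, here is a toy model of per-DC replica placement. It is a simplified sketch, not Cassandra's actual algorithm: the ring, tokens and node names are made up, and racks are ignored.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node, data center), sorted by token.
RING = [
    (0, "dc1-n1", "DC1"), (10, "dc2-n1", "DC2"),
    (20, "dc1-n2", "DC1"), (30, "dc2-n2", "DC2"),
    (40, "dc1-n3", "DC1"), (50, "dc2-n3", "DC2"),
    (60, "dc1-n4", "DC1"), (70, "dc2-n4", "DC2"),
]

def replicas_for(token, rf_by_dc):
    """Walk the ring clockwise from the token until each DC has its RF."""
    tokens = [t for t, _, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    remaining = dict(rf_by_dc)
    chosen = []
    for i in range(len(RING)):
        _, node, dc = RING[(start + i) % len(RING)]
        # A DC missing from the map has no quota, so its nodes are skipped.
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen

before = replicas_for(25, {"DC1": 2, "DC2": 2})
after = replicas_for(25, {"DC2": 2})
print(before)  # ['dc2-n2', 'dc1-n3', 'dc2-n3', 'dc1-n4']
print(after)   # ['dc2-n2', 'dc2-n3']
```

In this model the DC1 nodes stay in the ring, but with DC1 absent from the map no replica of the keyspace is placed on them.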
What is the replication factor in Cassandra, and how does it affect single-DC or multi-DC nodes?
Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica.
When creating a keyspace, you need to specify the replication factor for each DC.
Example Single DC with SimpleStrategy:
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
Here, a replication_factor of 3 means that each row will be placed on three different nodes.
Example Multi DC :
CREATE KEYSPACE Excalibur WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
This example sets three replicas for a data center named dc1 and two replicas for a data center named dc2.
Source : https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureDataDistributeReplication_c.html
I am a newbie to Cassandra.
What exactly does replication factor mean in Cassandra?
For example,
I have a 3-node cluster (node1, node2, node3). If I create a keyspace with replication factor 1 and insert data through node1, can I read the data from the other 2 nodes?
Or will it store the data only on node1? Is the data available on the other 2 nodes for read/write operations?
The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. You should be able to read/write data from the other two nodes, depending on ports and firewalls between nodes.
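A toy model may help separate "which node stores the row" from "which node the client talks to". This is a simplified sketch under assumptions: the ring, tokens and function names are hypothetical, and placement follows the SimpleStrategy idea of taking the next nodes clockwise on the ring.

```python
from bisect import bisect_left

# Hypothetical three-node ring: (token, node), sorted by token.
RING = [(0, "node1"), (100, "node2"), (200, "node3")]

def simple_strategy_replicas(token, replication_factor):
    """Return the first `replication_factor` nodes clockwise from the token."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    count = min(replication_factor, len(RING))
    return [RING[(start + i) % len(RING)][1] for i in range(count)]

def read(coordinator, token, replication_factor):
    """Any node can coordinate; it forwards the request to the replica(s)."""
    replicas = simple_strategy_replicas(token, replication_factor)
    return {"coordinator": coordinator, "served_by": replicas}

rf1 = read("node3", 50, 1)
rf3 = read("node3", 50, 3)
print(rf1)  # {'coordinator': 'node3', 'served_by': ['node2']}
print(rf3)  # {'coordinator': 'node3', 'served_by': ['node2', 'node3', 'node1']}
```

With RF 1 the row lives on a single node, yet a client connected to any node still gets it, because that node forwards the request to the replica.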
I have a 3 nodes Cassandra cluster with 2 keyspaces. One of them has replication factor 1 and the other, replication factor 2. I want to reduce the cluster, using nodetool decommission, to remove 2 nodes and leave only one (single node cluster).
So, what must I do with the replication factor? I think both keyspaces must have replication factor 1, but when must I modify it? Before decommission?
Thanks a lot!
You must reduce the RF to 1 before the decommissioning and you must also run a repair on the keyspace you are reducing the RF on to be on the safe side. Then you can proceed with the decommissioning in a sequential manner.
You will need to reduce the Replication Factor to 1 and you should do this before you decommission the 2 nodes.
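The sequence described in these answers can be written down as an ordered checklist. The sketch below only builds the command strings; it does not run anything. Assumptions to note: the helper `shrink_plan`, the keyspace name `ks_rf2` and the node names are hypothetical, SimpleStrategy is assumed for the keyspace, and placing the repair before the RF change is my reading of "to be on the safe side", not something the answers state.

```python
def shrink_plan(keyspaces_to_reduce, nodes_to_remove):
    """Build an ordered list of commands for shrinking to a single node."""
    steps = []
    # 1. Repair first so every replica is in sync (ordering is an assumption).
    for ks in keyspaces_to_reduce:
        steps.append(f"nodetool repair {ks}")
    # 2. Reduce the replication factor to 1 (SimpleStrategy is assumed here).
    for ks in keyspaces_to_reduce:
        steps.append(
            f"ALTER KEYSPACE {ks} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': 1}};"
        )
    # 3. Decommission the nodes one at a time, never in parallel.
    for node in nodes_to_remove:
        steps.append(f"nodetool decommission  # run on {node}")
    return steps

plan = shrink_plan(["ks_rf2"], ["node2", "node3"])
for number, step in enumerate(plan, start=1):
    print(number, step)
```

Only the keyspace that currently has RF 2 is passed in, since the other one is already at RF 1.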
Does Cassandra support one-directional replication? Say I have 2 DCs, DC1 and DC2. Real-time data is being written only in DC1, and asynchronous replication happens to DC2. Is there a way to make sure that, if I write to the same data in DC2, it does not get replicated to DC1?
There is no concept of one-way replication. If your replication factor is 2, the data will be replicated to two nodes. Since you are using DC1 and DC2, you have to use NetworkTopologyStrategy and define the replication factor for each DC. The snitch tells Cassandra which data center each node belongs to, and this determines on which nodes in both DCs the data is stored.
This feature is available when you create a keyspace.
Let's say you want keyspace1 to be replicated on one datacenter and keyspace2 on both datacenters:
This will replicate your data on one datacenter:
CREATE KEYSPACE keyspace1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
And this on both datacenters :
CREATE KEYSPACE keyspace2
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1, 'datacenter2' : 1};
There is no concept of one-way replication. You have a few options:
1) Use low consistency levels (LOCAL_*) on writes to DC2 so the app doesn't block waiting for replication to DC1.
2) Keep the DCs in separate rings, and bulk load asynchronously with sstableloader.