How can we understand the concept of replication factor in Cassandra?

What is the replication factor in Cassandra, and how does it affect nodes in a single-DC or multi-DC cluster?

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row, on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica.
When creating a keyspace, you specify the replication factor: a single value with SimpleStrategy, or one value per DC with NetworkTopologyStrategy.
Example Single DC with SimpleStrategy:
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
Here replication_factor 3 means each row is placed on three different nodes.
Example multi-DC with NetworkTopologyStrategy:
CREATE KEYSPACE Excalibur WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
This example sets three replicas for a data center named dc1 and two replicas for a data center named dc2.
Source : https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureDataDistributeReplication_c.html
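As a rough illustration of how the two replication maps above translate into copies per row, here is a minimal Python sketch. The helper name total_replicas is hypothetical, and it assumes a NetworkTopologyStrategy map lists every data center explicitly, as in the examples above.

```python
def total_replicas(replication: dict) -> int:
    """Return how many copies of each row the cluster keeps."""
    strategy = replication["class"]
    if strategy == "SimpleStrategy":
        # One cluster-wide replication factor.
        return int(replication["replication_factor"])
    if strategy == "NetworkTopologyStrategy":
        # Assumption: every key other than 'class' is a data center name
        # mapped to that data center's replication factor.
        return sum(int(rf) for name, rf in replication.items() if name != "class")
    raise ValueError(f"unsupported strategy: {strategy}")


excelsior = {"class": "SimpleStrategy", "replication_factor": 3}
excalibur = {"class": "NetworkTopologyStrategy", "dc1": 3, "dc2": 2}

print(total_replicas(excelsior))  # 3
print(total_replicas(excalibur))  # 5
```

So the Excalibur keyspace keeps five copies of each row in total: three in dc1 and two in dc2.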

Related

Use of replication_factor in Cassandra

I have this code:
CREATE KEYSPACE "KeySpace Name"
WITH replication = {'class': 'Strategy name', 'replication_factor' : 'No. of replicas'}
What is the use of 'replication_factor' in the above Cassandra query?
The replication factor is an integer that determines how many times your data is replicated across your cluster. You usually replicate your data to achieve high availability, at the cost of extra storage. A replication factor (RF) of 3 is by far the most common value. This works if you have at least 3 nodes in your cluster.

How to set the replication factor in multi-data-center DataStax Cassandra

My Architecture is as follows:
I have two data centers, DC1 and DC2, in my test cluster named Test Cluster1. DC1 has two Spark nodes and DC2 has two transactional (data) nodes, so I have 4 nodes in the cluster. Is it possible to set a replication factor of 3 for DC1 or DC2?
No. RF is set per DC, so an RF higher than the number of nodes in a DC cannot be satisfied: with two nodes there are only two places to put a replica. Use NetworkTopologyStrategy like this:
{'class':'NetworkTopologyStrategy', 'DC1':2, 'DC2':2 }
The replication factor should be less than or equal to the number of nodes in a data center. As you have 2 nodes in each DC, you can set the RF to 2 when altering the keyspace, as follows (replace <keyspace_name> with your keyspace):
cqlsh> ALTER KEYSPACE <keyspace_name> WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};
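The rule that a DC's RF should not exceed its node count can be checked before running the statement. The following Python sketch only illustrates that check and builds the CQL text; the function names and the keyspace name my_keyspace are hypothetical, and the node counts are the ones from the question.

```python
def oversized_dcs(nodes_per_dc: dict, rf_per_dc: dict) -> list:
    """List the DCs whose requested RF exceeds their node count."""
    return [dc for dc, rf in rf_per_dc.items() if rf > nodes_per_dc.get(dc, 0)]


def alter_keyspace_cql(keyspace: str, rf_per_dc: dict) -> str:
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}};")


nodes = {"DC1": 2, "DC2": 2}  # two nodes per DC, as in the question

print(oversized_dcs(nodes, {"DC1": 3, "DC2": 2}))  # ['DC1']
print(oversized_dcs(nodes, {"DC1": 2, "DC2": 2}))  # []
# 'my_keyspace' is a placeholder name, not taken from the question.
print(alter_keyspace_cql("my_keyspace", {"DC1": 2, "DC2": 2}))
```

An empty list from oversized_dcs means every requested RF fits within the nodes available in its DC.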

How to disable Cassandra Replication with keyspace

I have a Cassandra DB and a keyspace with some tables which I do not want to replicate.
I know that replication is a key feature of Cassandra, but I do not want to replicate.
I have 3 data centers: dc1, dc2, dc3.
Currently I am creating the keyspace like this on each DC:
CREATE KEYSPACE IF NOT EXISTS myKeyspace
WITH replication={'class':'NetworkTopologyStrategy', 'dc1': '1'};
As I understand it, this means dc1 will be replicated to one of the other DCs?
What should this look like if I do not want to replicate?
CREATE KEYSPACE IF NOT EXISTS myKeyspace WITH replication={'class':'NetworkTopologyStrategy', 'dc1': '1'};
This means that you have replication factor 1 on dc1, so what you currently have is what you want. A replication factor of 1 means only one node will hold the data, and it won't be replicated anywhere else. The number is the total count of nodes that hold the data, not the number of extra copies on top of the original.
If you wanted it replicated to other dcs it would be something like this:
CREATE KEYSPACE IF NOT EXISTS myKeyspace WITH replication={'class':'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1', 'dc3': '3'};
Meaning dc1 will have the data on 1 node, dc2 will have the data on 1 node, and dc3 will have the data on 3 nodes.

Cassandra one way replication

Does Cassandra support one-direction replication? Say I have 2 DCs, DC1 and DC2. Real-time data is written only in DC1 and is replicated asynchronously to DC2. If I then write to the same data in DC2, is there a way to keep that write from being replicated to DC1?
There is no concept of one-way replication. If your replication factor is 2, the data is replicated to two nodes. Since you are using DC1 and DC2, you have to use NetworkTopologyStrategy and define the replication factor for each DC. The snitch tells Cassandra which DC each node belongs to, so the replicas are placed on nodes in both DCs accordingly.
Which datacenters hold a keyspace is decided when you create it.
Let's say you want keyspace1 to be replicated on one datacenter and keyspace2 on both datacenters.
This will replicate your data on one datacenter:
CREATE KEYSPACE keyspace1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
And this on both datacenters:
CREATE KEYSPACE keyspace2
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1, 'datacenter2' : 1};
There is no concept of one way replication. You have a few options:
1) Use local consistency levels (LOCAL_*) for writes to DC2 so the app doesn't block waiting for replication to DC1.
2) Keep the DCs in separate rings, and bulk load asynchronously with sstableloader.

Cassandra different replication factor across cluster

Is it possible to have different replication settings on different nodes of the same cluster?
(All DCs have same keyspace/tables, but different replication settings)
We would like to have DC1 and DC2 collecting sensor data at different geographical locations and sending it to DC3, so that DC3 contains all data from DC1 + DC2.
However, DC1 and DC2 should not contain each other's data (only data which was written by local clients).
Can this be achieved in Cassandra by having different keyspace replication settings on the DCs?
On DC1: 'DC1':1, 'DC3':1
On DC2: 'DC2':1, 'DC3':1
On DC3: 'DC3':1
You can't really do this with NetworkTopologyStrategy. Depending on how much effort you want to put into this you could implement your own replication strategy. I don't think this is very common, but Cassandra does allow it and it likely wouldn't be too difficult to implement what you want (take a look at NTS's implementation as an example).
If you don't want to implement your own strategy I would recommend creating 2 keyspaces with the following configuration:
CREATE KEYSPACE keyspace1
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'DC1' : 1,
'DC3' : 1
};
CREATE KEYSPACE keyspace2
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'DC2' : 1,
'DC3' : 1
};
and then depending on the location of your client you would use either keyspace.
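To make the client-side choice concrete, here is a small Python sketch, assuming the two keyspaces defined above. The KEYSPACES table and the keyspaces_in_dc helper are hypothetical names used only for illustration; a real application would pass the chosen keyspace name to its Cassandra driver.

```python
# Per-DC replication factors copied from the two CREATE KEYSPACE statements.
KEYSPACES = {
    "keyspace1": {"DC1": 1, "DC3": 1},
    "keyspace2": {"DC2": 1, "DC3": 1},
}


def keyspaces_in_dc(dc: str) -> list:
    """Return the keyspaces that keep at least one replica in the given DC."""
    return sorted(name for name, rf in KEYSPACES.items() if rf.get(dc, 0) > 0)


print(keyspaces_in_dc("DC1"))  # ['keyspace1']
print(keyspaces_in_dc("DC2"))  # ['keyspace2']
print(keyspaces_in_dc("DC3"))  # ['keyspace1', 'keyspace2']
```

Clients in DC1 and DC2 each see one local keyspace to write to, while DC3 holds replicas of both and therefore all of the data.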
