Cassandra different replication factor across cluster - cassandra

Is is possible to have different replication settings on different nodes of the same cluster?
(All DCs have same keyspace/tables, but different replication settings)
We would like to have DC1 and DC2 collecting sensor data on different geographical locations, and sending these to a DC3. So DC3 contains all data from DC1 + DC2.
However, DC1 and DC2 should not contain each other's data (only data which was written by local clients).
Can this be achieved in Cassandra by having different keyspace replication settings on the DCs?
On DC1: 'DC1':1, 'DC3':1
On DC2: 'DC2':1, 'DC3':1
On DC3: 'DC3':1

You can't really do this with NetworkTopologyStrategy. Depending on how much effort you want to put into this you could implement your own replication strategy. I don't think this is very common, but Cassandra does allow it and it likely wouldn't be too difficult to implement what you want (take a look at NTS's implementation as an example).
If you don't want to implement your own strategy I would recommend creating 2 keyspaces with the following configuration:
CREATE KEYSPACE keyspace1
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'DC1' : 1,
'DC3' : 1
};
CREATE KEYSPACE keyspace2
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'DC2' : 1,
'DC3' : 1
};
and then depending on the location of your client you would use either keyspace.

Related

use of replication_factor in cassandra

I have this code:
CREATE KEYSPACE “KeySpace Name”
WITH replication = {'class': ‘Strategy name’, 'replication_factor' : ‘No.Of replicas’}
What is the use of 'replication_factor' in the above Cassandra query?
The replication factor is an integer, it determines how many time your data is replicated across your cluster. You usually want to replicate your data to achieve high availability. Of course, it comes at the cost of extra storage. Replication factor (RF) of 3 is by far the most common value. This works if you have at least 3 nodes in your cluster.

How to disable Cassandra Replication with keyspace

I have a Cassandra DB and a keyspace with some Tables which i do not want to repplicate.
I know, a key feature of cassandra is the replication, but i do not want to replicate.
I have 3 DataCenters: dc1, dc2, dc3
currently i'am creating the Keyspace like this on each DC:
CREATE KEYSPACE IF NOT EXISTS myKeyspace
WITH replication={'class':'NetworkTopologyStrategy', 'dc1': '1'};
As i understood this means dc1 will be replicated to one of the three other DCs?
How should this look like if i do not want to replicate?
CREATE KEYSPACE IF NOT EXISTS myKeyspace WITH replication={'class':'NetworkTopologyStrategy', 'dc1': '1'};
This means that you have replication factor 1 on dc1. So what have you have currently is what you want. Replication factor of 1 will mean only one node will hold the data and it won't be replicated anywhere else. The number is not for how many copies but for the number of nodes that hold the data.
If you wanted it replicated to other dcs it would be something like this:
CREATE KEYSPACE IF NOT EXISTS myKeyspace WITH replication={'class':'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1', 'dc3': '3'};
Meaning dc1 will have data on 1 node, dc2 will have data on 1 node and dc1 will have the data on 3 nodes

How can we understand the concept of replication factor in cassandra?

What is replication factor in cassandra and how does it affect single DC or multiple DC nodes ?
Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica
When creating keyspace, you need to specify the replication factor on each DC.
Example Single DC with SimpleStrategy:
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
Here we specify replication_factor 3 means, Each row will be placed on three different node.
Example Multi DC :
CREATE KEYSPACE Excalibur WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
This example sets three replicas for a data center named dc1 and two replicas for a data center named dc2
Source : https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureDataDistributeReplication_c.html

Synchronizing keyspaces in new cassandra datacenter

I have a question about a potential scenario and wanted to know if our assumption is correct. (using cassandra 3.x with DSE 5.x)
We've learned from the docs that in order to add a new (and fresh) datacenter to a cluster, we need to temporarily set ReplicationFactor like so:
{'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 0 }
Where DC1 is the currently running datacenter and DC2 is the one we are adding.
This test helped us understand the impact of the streaming of data from an existing live ring to a brand new one.
Now to our hypothetical scenario, which is to be able to start replicating a keyspace that was initially only replicated to one DC, to now save to other currently running DCs.
When creating the keyspace:
CREATE KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 0};
Then, when business requirements change:
ALTER KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 2};
Is it considered safer to define all new keyspaces in an application with all DCs to 0, so that the value can be modified at some point. And would changing that replication factor be enough to trigger the streaming of the keyspace to the other datacenters - or do we also need to run nodetool rebuild?
The accepted practice is to simply not define a replication factor for a DC that you don't want a particular keyspace to replicate to. I don't think that anything bad would happen if you did it your way, but I feel that not defining it would be the safer way to go.
would changing that replication factor be enough to trigger the streaming of the keyspace to the other datacenters - or do we also need to run nodetool rebuild?
Altering the replication factor on the keyspace will tell all future writes to that keyspace to go to also the new data center. However, for the existing data to replicate to the new data center you will have to run a nodetool repair or nodetool rebuild.

Cassandra one way replication

Does Cassandra support one direction replication? Say I have 2 DCs, DC1 and DC2. Real time data is being written only in DC1 and asynch replication happens in DC2. Is there a way now if I do some write on same data in DC2, it does not get replicated in DC1?
There is no concept of one way replication. If your replication factor is 2 then it will replicate data in any two nodes. You are using DC1 and DC2 then you have to use the "NetworkTopologyStrategy" and define the replication factor for each DC. Your problem will automatically resolve using "Snitch" tool to decided data store in different nodes in both DC's.
This feature is available when you create a keyspace
Let's say you want the keyspace 1 to be replicated on both datacenters and keyspace 2 on one datacenter:
This will replicate your data on one datacenter:
CREATE KEYSPACE keyspace1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
And this on both datacenters :
CREATE KEYSPACE keyspace2
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1, 'datacenter2' : 1};
There is no concept of one way replication. You have a few options:
1) use low consistency levels (LOCAL_*) when writing on writes to DC2 so the app doesn't block to replicate to DC1
2) keep the dcs in separate rings, and bulk load a synchronously with stable loader

Resources