I have created a keyspace in Cassandra, once using NetworkTopologyStrategy and once using SimpleStrategy, with the following syntax:
Keyspace definition:
CREATE KEYSPACE cw WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter16' : 1 };
CREATE KEYSPACE cw WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}
Output of bin/nodetool ring :
Datacenter: 16
==========
Address Rack Status State Load Owns Token
172.16.4.196 4 Up Normal 35.92 KB 100.00% 0
When I create a table in the NetworkTopologyStrategy keyspace and run a SELECT * query on it, it returns the following error:
Unable to complete request: one or more nodes were unavailable
It works fine in the SimpleStrategy keyspace, though. Why is that? Can't we use NetworkTopologyStrategy on a single-node Cassandra cluster?
While the other answers are right, note that you are already using a different snitch, because your data center name is '16'. Your nodetool ring output shows Datacenter: 16, which means the data center's name is actually '16', not 'datacenter16'.
Try this:
CREATE KEYSPACE cw WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '16' : 1 };
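If you're not sure what your data center is actually named, you can check it directly on the node (a quick check, assuming nodetool is available on that host):

nodetool status

The Datacenter: line in its output (here, Datacenter: 16) is the exact name you must use in the NetworkTopologyStrategy options.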
By default, Cassandra is configured to use SimpleSnitch.
SimpleSnitch does not recognize data center or rack information, so it can only be used with SimpleStrategy.
To change the snitch, edit the following setting in cassandra.yaml:
endpoint_snitch: CHANGE THIS TO WHATEVER YOU WANT
You then also have to edit the corresponding properties file to define the data centers and racks.
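For example, if you switch to GossipingPropertyFileSnitch (just one possible choice), each node's data center and rack are defined in cassandra-rackdc.properties; a minimal sketch with illustrative names:

dc=DC1
rack=RAC1

With PropertyFileSnitch the equivalent mapping lives in cassandra-topology.properties instead, and in either case the nodes must be restarted for the snitch change to take effect.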
You have to define a network-aware snitch in order to use NetworkTopologyStrategy. See this document for more information: http://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchPFSnitch_t.html
Related
I am a total beginner with Cassandra. I have tried this:
cqlsh> CREATE KEYSPACE cycling
... WITH REPLICATION = {'class' : 'SimpleStrategy', 'datacenter1' : 1 };
But datacenter1 is not recognized
ConfigurationException: Unrecognized strategy option {datacenter1} passed to SimpleStrategy for keyspace cycling
Why?
The SimpleStrategy doesn't support that option. The correct create statement would be:
CREATE KEYSPACE cycling WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1};
If you want to specify replication by datacenter then you need to use the NetworkTopologyStrategy, in which case the create statement would be:
CREATE KEYSPACE cycling WITH REPLICATION = {'class':'NetworkTopologyStrategy','datacenter1':1};
More information on this can be found here
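To confirm how a keyspace ended up configured, you can inspect it from cqlsh; a small check using the keyspace name from the question:

DESCRIBE KEYSPACE cycling;

or, on Cassandra 3.x and later, query the schema tables directly:

SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = 'cycling';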
What is the replication factor in Cassandra, and how does it affect single-DC and multi-DC clusters?
Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row, on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica.
When creating a keyspace, you need to specify the replication factor for each DC.
Example Single DC with SimpleStrategy:
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
Here we specify replication_factor 3, which means each row will be placed on three different nodes.
Example Multi DC:
CREATE KEYSPACE Excalibur WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
This example sets three replicas for a data center named dc1 and two replicas for a data center named dc2.
Source : https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureDataDistributeReplication_c.html
I have a question about a potential scenario and wanted to know if our assumption is correct (using Cassandra 3.x with DSE 5.x).
We've learned from the docs that in order to add a new (and fresh) datacenter to a cluster, we need to temporarily set the replication factor like so:
{'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 0 }
Where DC1 is the currently running datacenter and DC2 is the one we are adding.
This test helped us understand the impact of the streaming of data from an existing live ring to a brand new one.
Now to our hypothetical scenario: we want to start replicating a keyspace that was initially replicated to only one DC to the other currently running DCs as well.
When creating the keyspace:
CREATE KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 0};
Then, when business requirements change:
ALTER KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 2};
Is it considered safer to define all new keyspaces in an application with every DC set to 0, so that the value can be modified at some point? And would changing that replication factor be enough to trigger streaming of the keyspace to the other datacenters, or do we also need to run nodetool rebuild?
The accepted practice is simply not to define a replication factor for a DC that you don't want a particular keyspace to replicate to. I don't think anything bad would happen if you did it your way, but I feel that not defining it is the safer way to go.
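In other words, instead of listing the unused DC with a factor of 0, you would simply omit it; using the keyspace from the question:

CREATE KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2};

and later add 'EU' : 2 with the same ALTER KEYSPACE statement shown in the question.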
would changing that replication factor be enough to trigger the streaming of the keyspace to the other datacenters - or do we also need to run nodetool rebuild?
Altering the replication factor on the keyspace will cause all future writes to that keyspace to also go to the new data center. However, to replicate the existing data to the new data center, you will have to run nodetool repair or nodetool rebuild.
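A possible sequence, assuming the US/EU names from the question and that the rebuild is run on every node in the new DC:

ALTER KEYSPACE Foo WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'US' : 2, 'EU' : 2};

then, on each node in the EU data center, stream the existing data from US:

nodetool rebuild -- US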
Does Cassandra support one-direction replication? Say I have two DCs, DC1 and DC2. Real-time data is written only in DC1 and is asynchronously replicated to DC2. Is there a way to ensure that if I write to the same data in DC2, it does not get replicated back to DC1?
There is no concept of one-way replication. If your replication factor is 2, data will be replicated to two nodes. Since you are using DC1 and DC2, you have to use NetworkTopologyStrategy and define the replication factor for each DC; the configured snitch then decides which nodes in each DC store the data.
You configure this when you create a keyspace.
Let's say you want keyspace1 to be replicated to one datacenter and keyspace2 to both datacenters:
This will replicate your data on one datacenter:
CREATE KEYSPACE keyspace1 WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
And this to both datacenters:
CREATE KEYSPACE keyspace2
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1, 'datacenter2' : 1};
There is no concept of one-way replication. You have a few options:
1) use local consistency levels (LOCAL_*) on writes to DC2 so the app doesn't block waiting for replication to DC1 (see the sketch below)
2) keep the DCs in separate rings and bulk load asynchronously with sstableloader
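For option 1, a minimal cqlsh sketch (the keyspace, table, and values are made up for illustration):

CONSISTENCY LOCAL_ONE
INSERT INTO my_ks.sensor_data (id, value) VALUES (1, 42);

The write is acknowledged as soon as one replica in the local DC has it; replication to the other DC still happens, just asynchronously in the background.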
Is it possible to have different replication settings on different nodes of the same cluster?
(All DCs have same keyspace/tables, but different replication settings)
We would like to have DC1 and DC2 collecting sensor data on different geographical locations, and sending these to a DC3. So DC3 contains all data from DC1 + DC2.
However, DC1 and DC2 should not contain each other's data (only data which was written by local clients).
Can this be achieved in Cassandra by having different keyspace replication settings on the DCs?
On DC1: 'DC1':1, 'DC3':1
On DC2: 'DC2':1, 'DC3':1
On DC3: 'DC3':1
You can't really do this with NetworkTopologyStrategy. Depending on how much effort you want to put in, you could implement your own replication strategy. I don't think this is very common, but Cassandra does allow it, and it likely wouldn't be too difficult to implement what you want (take a look at NTS's implementation as an example).
If you don't want to implement your own strategy, I would recommend creating two keyspaces with the following configuration:
CREATE KEYSPACE keyspace1
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'DC1' : 1,
'DC3' : 1
};
CREATE KEYSPACE keyspace2
WITH replication = {
'class' : 'NetworkTopologyStrategy',
'DC2' : 1,
'DC3' : 1
};
and then, depending on the location of your client, you would use one keyspace or the other.
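Concretely, you would create the same table in both keyspaces and have each client pick the keyspace that matches its location; a rough sketch (table name and columns are hypothetical):

CREATE TABLE keyspace1.sensor_readings (sensor_id int, ts timestamp, value double, PRIMARY KEY (sensor_id, ts));
CREATE TABLE keyspace2.sensor_readings (sensor_id int, ts timestamp, value double, PRIMARY KEY (sensor_id, ts));

Clients in DC1 write to keyspace1 and clients in DC2 write to keyspace2; DC3 ends up holding a replica of both, while DC1 and DC2 never store each other's data.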