Replication factor in Cassandra

I am a newbie to Cassandra.
What exactly does the replication factor in Cassandra mean?
For example,
I have a 3-node cluster (node1, node2, node3). If I create a keyspace with replication factor 1 and insert data through node1, can I read the data from the other 2 nodes?
Or will the data only be stored on node1? Is the data available on the other 2 nodes for read/write operations?

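The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row, stored on one node. That node is chosen by the token of the row's partition key, so it is not necessarily node1. You should still be able to read/write data through the other two nodes, as long as the nodes can reach each other (ports and firewalls permitting): whichever node receives a request acts as the coordinator and forwards it to the node that holds the replica. With RF 1, however, if that node is down, the rows it holds cannot be read or written.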
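To make the coordinator/replica split concrete, here is a toy model of SimpleStrategy-style placement on a 3-node ring. It is a simplified sketch under stated assumptions, not Cassandra's implementation: the small integer tokens, the MD5-based `token_for` stand-in for the real partitioner (Murmur3 by default), the one-token-per-node ring (no vnodes) and all function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

# Hypothetical 3-node ring, one token per node (no vnodes).
# A node owns the token range that ends at its own token.
RING = {"node1": 100, "node2": 200, "node3": 300}
RING_SIZE = 300  # toy token space: tokens run from 1 to RING_SIZE, then wrap

def token_for(partition_key: str) -> int:
    """Stand-in for the partitioner (Cassandra's default is Murmur3, not MD5)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE + 1

def replicas_for(partition_key: str, rf: int) -> list[str]:
    """SimpleStrategy-style placement: walk the ring clockwise, take rf nodes."""
    nodes = sorted(RING, key=RING.get)  # nodes in token order
    tokens = [RING[n] for n in nodes]
    start = bisect_left(tokens, token_for(partition_key)) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(min(rf, len(nodes)))]

def route_request(coordinator: str, partition_key: str, rf: int) -> str:
    """Whichever node receives the request coordinates it and forwards it to a replica."""
    replica = replicas_for(partition_key, rf)[0]
    return f"{coordinator} -> {replica}"

if __name__ == "__main__":
    # With RF 1, all three coordinators forward to the same single replica.
    for coordinator in RING:
        print(route_request(coordinator, "user:42", rf=1))
```

The single replica for `"user:42"` depends only on its token, so it may or may not be node1 even if the insert went through node1.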

Related

Can a Cassandra cluster have an even number of nodes?

I am currently running a 3-node cluster with replication factor 3 on the keyspaces. I need to add more nodes to the cluster because the data on each node is approaching 2TB.
Can I add just 1 more node and have a 4-node cluster, or does the cluster always need to have an odd number of nodes? I am currently using consistency level ONE for both reads and writes.
You can have as many nodes in the cluster as you want, particularly if you are not using the racks feature in Cassandra (all nodes are in the same logical C* rack).
If you are using C* racks, our recommendation is to have an equal number of nodes in each rack so the load distribution is balanced across the racks in each DC.
For example, if your app keyspaces have a replication factor of 3 and you have 3 racks, then the number of nodes in the DC should be a multiple of the replication factor: 3, 6, 9, 12 and so on. This allows you to configure the same number of nodes in each rack.
This isn't a hard requirement, but it is best practice so that nodes carry an equal amount of load and data. Cheers!
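A quick way to see why unequal racks hurt: when a DC has as many racks as the RF, NetworkTopologyStrategy places one replica of every token range in each rack, so each rack holds a full copy that is split among its nodes. The sketch below assumes exactly that (racks == RF, tokens balanced within each rack); `per_node_share` and the node/rack names are hypothetical.

```python
from collections import Counter

def per_node_share(rack_of_node: dict[str, str], rf: int) -> dict[str, float]:
    """Fraction of the DC's full dataset stored on each node.

    Assumes racks == RF, so every rack holds one replica of every token range,
    and that tokens are evenly balanced among the nodes inside a rack.
    """
    nodes_per_rack = Counter(rack_of_node.values())
    if len(nodes_per_rack) != rf:
        raise ValueError("this sketch only models a DC with as many racks as RF")
    # Each rack holds one full copy, split evenly among the nodes in that rack.
    return {node: 1 / nodes_per_rack[rack] for node, rack in rack_of_node.items()}

# 4 nodes in 3 racks: the two nodes that are alone in their rack each hold a full copy.
print(per_node_share({"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3"}, rf=3))
# 6 nodes in 3 racks of 2: every node holds half of the data.
print(per_node_share({f"n{i}": f"r{i % 3}" for i in range(6)}, rf=3))
```

In the 4-node case, n3 and n4 store twice as much as n1 and n2, which is the imbalance that keeping node counts equal per rack avoids.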
You can have an even number of nodes in a Cassandra cluster, so you can add another node. If you are using vnodes, this is easy; with a single token per node, you have to recalculate and reassign tokens (for example with nodetool move) to rebalance the cluster.
One more thing: reading and writing at consistency level ONE weakens consistency. With RF 3, a read at ONE may hit a replica that has not yet received the latest write made at ONE. If that suits your use case, it is fine, but the general recommendation for production systems is QUORUM.
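The reasoning behind the QUORUM recommendation is the usual overlap rule: if the replicas consulted by a read plus the replicas that acknowledged the write exceed RF, the two sets must share at least one node. The sketch below covers only a few single-DC consistency levels (it ignores LOCAL_QUORUM, EACH_QUORUM and similar); the function names are hypothetical.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must answer at a few single-DC consistency levels."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]

def reads_see_latest_write(read_cl: str, write_cl: str, rf: int) -> bool:
    """True when every read set must overlap every acknowledged write set."""
    return replicas_required(read_cl, rf) + replicas_required(write_cl, rf) > rf

for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, reads_see_latest_write(read_cl, write_cl, rf=3))
```

With RF 3, ONE/ONE gives 1 + 1 = 2, which does not exceed 3, while QUORUM/QUORUM gives 2 + 2 = 4.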

Cassandra Replication Factor

Let's say I have two data centers (DC1, DC2) in a single Cassandra cluster.
DC1 - 4 nodes.
DC2 - 4 nodes.
Initially, I set the replication factor for all the keyspaces to {DC1:2, DC2:2} (NetworkTopologyStrategy).
But after some time, let's say I alter the keyspaces and change the replication factor to {DC2:2} for all of them (removing DC1), so there is no replication factor for DC1.
So what happens now? Will DC1 get any data written to it in the future?
Will all the token ranges be assigned only to DC2?
If you exclude DC1, it won't get data written for that keyspace, and data won't be read from DC1 either. Before switching off DC1, make sure that you run nodetool repair on the servers in DC2 so that all the data is synchronized. After changing RF, you
When you change the RF for a specific keyspace, the drivers and Cassandra itself recalculate the token range assignments, taking the data center information into account. With {DC2:2}, every token range gets its replicas only in DC2.
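A simplified model of NetworkTopologyStrategy shows why DC1 stops receiving data. The `rf` map below plays the role of the replication map in a statement such as `ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': 2};` (`ks` is a placeholder name). The sketch ignores racks and uses a hypothetical 8-node ring with interleaved tokens; it walks the ring clockwise and takes, per DC, the first `rf[dc]` nodes of that DC.

```python
from bisect import bisect_left

# Hypothetical 8-node ring: (token, node, dc), with DC1 and DC2 interleaved.
RING = sorted([
    (10, "dc1-a", "DC1"), (20, "dc2-a", "DC2"), (30, "dc1-b", "DC1"), (40, "dc2-b", "DC2"),
    (50, "dc1-c", "DC1"), (60, "dc2-c", "DC2"), (70, "dc1-d", "DC1"), (80, "dc2-d", "DC2"),
])

def nts_replicas(token: int, rf: dict[str, int]) -> list[str]:
    """Per-DC replica selection (racks ignored): a DC absent from rf gets no replicas."""
    tokens = [t for t, _, _ in RING]
    start = bisect_left(tokens, token) % len(RING)  # wraps past the last token
    chosen, per_dc = [], {dc: 0 for dc in rf}
    for i in range(len(RING)):
        _, node, dc = RING[(start + i) % len(RING)]
        if per_dc.get(dc, 0) < rf.get(dc, 0):
            chosen.append(node)
            per_dc[dc] += 1
    return chosen

print(nts_replicas(25, {"DC1": 2, "DC2": 2}))  # before the ALTER
print(nts_replicas(25, {"DC2": 2}))            # after removing DC1
```

Because DC1 no longer appears in the map, no DC1 node is ever selected for any token, so all ranges are replicated only within DC2.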

What does a system_auth replication factor of 2 mean in Cassandra?

As I understood from the official Cassandra documentation and from other posts here, the system_auth replication factor is 1 by default.
But I would like to understand how system_auth replication works if I set the system_auth replication factor to 2.
Which two nodes will maintain the replicas?
There will be two copies of every row in the system_auth keyspace, and those rows are spread across ALL of your nodes. That way, if one node goes down, the data is still available on another node. Different system_auth entries may be stored on different pairs of nodes, but there will always be two copies of each.
If your replication factor equals the number of nodes, each node holds all of the system_auth data. If your replication factor is greater than the number of nodes, you gain nothing: every node already has a full copy of the data, so there is no extra safety. If your replication factor is less than the number of nodes, no node holds a complete copy of the data; each holds a portion of it.
Here, system_auth replication = 2 means that the system_auth data is replicated on 2 nodes in the cluster (2 copies in total). If one node goes down, you can still log in and authenticate.
You may increase the replication factor further as well.
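The "which two nodes" question can be answered per token range rather than per keyspace. This sketch assumes SimpleStrategy with one token per node and equal-sized ranges (both simplifying assumptions; the helper names are hypothetical): each range is stored on its owner plus the next RF - 1 nodes clockwise, so each node ends up holding about RF/N of the data, capped at all of it.

```python
def replica_sets(nodes: list[str], rf: int) -> list[tuple[str, ...]]:
    """Replica set of each node's primary range (SimpleStrategy, one token per node)."""
    rf = min(rf, len(nodes))  # a row cannot have more copies than there are nodes
    return [tuple(nodes[(i + k) % len(nodes)] for k in range(rf)) for i in range(len(nodes))]

def share_per_node(nodes: list[str], rf: int) -> dict[str, float]:
    """Fraction of all ranges (so of the data, if ranges are equal) each node stores."""
    sets = replica_sets(nodes, rf)
    return {n: sum(n in s for s in sets) / len(sets) for n in nodes}

nodes = ["node1", "node2", "node3", "node4"]
print(replica_sets(nodes, 2))    # each range lives on two neighbouring nodes
print(share_per_node(nodes, 2))  # RF < N: every node holds a portion
print(share_per_node(nodes, 6))  # RF > N: capped, every node holds everything
```

With 4 nodes and RF 2, each node stores half of system_auth, and losing any single node still leaves one copy of every range.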

Deciding the optimal number of Cassandra nodes with 3 seed nodes and a replication factor of 3

I am working on creating a Cassandra cluster.
Our system is write-heavy, and we are planning to use 3 seed nodes and a total of 10 Cassandra nodes (including the 3 seed nodes).
We are using a replication factor of 3 and consistency level QUORUM.
Is there any consideration for an odd/even number of Cassandra nodes based on the replication factor or the number of seed nodes?
The number of seed nodes is unrelated to the replication factor. The seeds are used when a new node joins the cluster. New nodes consult the seeds to get their initial configuration and learn the addresses of the other nodes. You need 2-3 seeds to provide redundancy, that's all.
The replication factor indicates how many nodes have copies of the data, as you probably know. RF=3 means three nodes have copies of the data. Consistency level QUORUM means that 2 replicas need to reply to the coordinator (because 2 is a quorum of 3). This has nothing to do with the number of nodes in the cluster, as long as you have at least 3 nodes for RF=3! Even/odd doesn't matter, and the number of seeds doesn't matter.
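The quorum arithmetic depends only on RF, which is why the cluster's node count does not enter into it. A minimal sketch for a single DC (the function names are hypothetical):

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM in a single DC: a majority of RF."""
    return rf // 2 + 1

def replicas_that_may_be_down(rf: int) -> int:
    """How many replicas of a range can be unavailable while QUORUM still succeeds."""
    return rf - quorum(rf)

for rf in (3, 4, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {replicas_that_may_be_down(rf)} down")
```

For RF=3 this gives a quorum of 2 and tolerates one replica down per range, whether the cluster has 9, 10 or 11 nodes.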

What to do with replication factor when reducing a Cassandra cluster?

I have a 3-node Cassandra cluster with 2 keyspaces. One of them has replication factor 1 and the other replication factor 2. I want to shrink the cluster using nodetool decommission, removing 2 nodes and leaving only one (a single-node cluster).
So, what must I do with the replication factor? I think both keyspaces must have replication factor 1, but when must I modify it? Before decommissioning?
Thanks a lot!
You must reduce the RF to 1 before decommissioning, and to be on the safe side you should also run a repair on the keyspace whose RF you are reducing, before you lower it. Then you can decommission the nodes one at a time.
You will need to reduce the Replication Factor to 1 and you should do this before you decommission the 2 nodes.
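A small planning helper can list which keyspaces still need their RF lowered before the cluster shrinks. This is a hypothetical sketch: it assumes both keyspaces use SimpleStrategy, and the keyspace names are placeholders. It only generates the CQL text; you would run the statements yourself (for example in cqlsh) before the first `nodetool decommission`.

```python
def plan_rf_changes(keyspace_rf: dict[str, int], remaining_nodes: int) -> list[str]:
    """Return ALTER KEYSPACE statements for keyspaces whose RF exceeds the nodes left.

    Assumes every keyspace uses SimpleStrategy; NetworkTopologyStrategy keyspaces
    would need a per-DC replication map instead.
    """
    statements = []
    for ks, rf in sorted(keyspace_rf.items()):
        if rf > remaining_nodes:
            statements.append(
                f"ALTER KEYSPACE {ks} WITH replication = "
                f"{{'class': 'SimpleStrategy', 'replication_factor': {remaining_nodes}}};"
            )
    return statements

# Placeholder names for the two keyspaces in the question (RF 1 and RF 2).
for stmt in plan_rf_changes({"ks_rf1": 1, "ks_rf2": 2}, remaining_nodes=1):
    print(stmt)
```

Only the RF 2 keyspace needs a change here; the RF 1 keyspace already fits on a single node.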
