What to do with replication factor when reducing a Cassandra cluster?

I have a 3-node Cassandra cluster with 2 keyspaces. One of them has replication factor 1 and the other has replication factor 2. I want to shrink the cluster, using nodetool decommission, to remove 2 nodes and leave only one (a single-node cluster).
So, what must I do with the replication factor? I think both keyspaces must have replication factor 1, but when should I change it? Before decommissioning?
Thanks a lot!

You must reduce the RF to 1 before decommissioning, and to be on the safe side you should also run a repair on the keyspace whose RF you are reducing. Then you can proceed with the decommissioning, one node at a time.
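For example, the sequence might look like this (a minimal sketch; the keyspace name ks_rf2 and the SimpleStrategy options are assumptions, so match them to what your keyspace actually uses):

    -- in cqlsh: reduce the keyspace that currently has RF 2 down to 1
    ALTER KEYSPACE ks_rf2
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

    # from a shell on each node: make the remaining replica consistent
    nodetool repair ks_rf2

    # then, on each node being removed, one at a time:
    nodetool decommission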

You will need to reduce the replication factor to 1, and you should do this before you decommission the 2 nodes.

Related

Can a Cassandra cluster have an even number of nodes?

Currently running a 3-node cluster with replication factor 3 on the keyspaces. I need to add more nodes to the cluster as the size of each node is approaching 2TB.
Can I add just 1 more node and have a 4-node cluster, or does the cluster always need an odd number of nodes? I am currently using a consistency level of ONE for both reads and writes.
You can have as many nodes in the cluster as you want, particularly if you are not using the racks feature in Cassandra (all nodes are in the same logical C* rack).
If you are using C* racks, our recommendation is to have an equal number of nodes in each rack so the load distribution is balanced across the racks in each DC.
For example, if your app keyspaces have a replication factor of 3 and you have 3 racks then the number of nodes in the DC should be in multiples of the replication factor -- 3, 6, 9, 12 and so on. This would allow you to configure the same number of nodes in each rack.
This isn't a hard requirement but is best practice so nodes have an equal amount of load and data on them. Cheers!
You can have an even number of nodes in a Cassandra cluster, so you can add another node. If you are using vnodes it will be easier; otherwise, a lot of work needs to be done to rebalance the cluster.
One more thing: reading and writing with consistency level ONE weakens consistency. If that suits your use case then it is fine, but the general recommendation is to use QUORUM on production systems.
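If you want to try QUORUM before touching your application's driver settings, you can set it per session in cqlsh (my_ks.my_table is a made-up name for illustration):

    cqlsh> CONSISTENCY QUORUM;
    Consistency level set to QUORUM.
    cqlsh> SELECT * FROM my_ks.my_table WHERE id = 1;

Note that the CONSISTENCY command only affects that cqlsh session; in your application you would set the consistency level on the driver instead.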

What does Cassandra system_auth replication factor = 2 mean?

As I read and understood from the official Cassandra documentation and from other posts here, system_auth is configured with a replication factor of 1 by default.
But I would like to understand how system_auth replication works if I configure the value as system_auth replication = 2.
Which two nodes will maintain the replicas?
There will be two copies of the system_auth keyspace spread across ALL of your nodes. That way, if one node goes down, the data is still available on another node. Different entries in system_auth may be stored on different nodes, but there will always be two copies of each.
If your replication factor equals the number of nodes, then each node will hold all of the system_auth data. If your replication factor is greater than the number of nodes, you gain nothing, since every node already has a full copy of the data; there is no extra safety here. If your replication factor is less than the number of nodes, no single node will hold a complete copy of the data, but each will hold a portion of it.
Here, system_auth replication = 2 means the system_auth data will be replicated on 2 nodes in the cluster (2 copies of the data in total). If one node goes down, you are still able to log in and authenticate against the cluster.
You may increase the replication factor as well.
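A minimal sketch of making that change, assuming a single datacenter named DC1 (check nodetool status for your actual datacenter name):

    -- in cqlsh
    ALTER KEYSPACE system_auth
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2};

    # on each node afterwards, so the new replicas actually receive the auth data:
    nodetool repair system_auth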

Replication factor in Cassandra

I am a newbie to Cassandra.
What exactly does the replication factor in Cassandra mean?
For example,
I have a 3-node cluster (node1, node2, node3). If I create a keyspace with replication factor 1 and insert data through node1, can I read the data from the other 2 nodes?
Or will it store the data only on node1? Is the data available on the other 2 nodes for read/write operations?
The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row, on one node. You can still read and write that data through the other two nodes, because whichever node you contact acts as the coordinator and forwards the request to the node that owns the data (assuming the ports and firewalls between the nodes allow it).
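A minimal sketch to make this concrete (the keyspace demo and table users are invented for illustration):

    -- connected to node1 in cqlsh
    CREATE KEYSPACE demo
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    CREATE TABLE demo.users (id int PRIMARY KEY, name text);
    INSERT INTO demo.users (id, name) VALUES (1, 'alice');

    -- connected to node2 or node3: this still works, because the node you
    -- talk to acts as coordinator and fetches the row from its single replica
    SELECT * FROM demo.users WHERE id = 1;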

Deciding the optimal number of Cassandra nodes with 3 seed nodes and a replication factor of 3

I am working on creating a Cassandra cluster.
Our system is write-heavy, and we are planning to use 3 seed nodes and a total of 10 Cassandra nodes (including the 3 seed nodes).
We are using a replication factor of 3 and consistency level QUORUM.
Is there any consideration around an odd/even number of Cassandra nodes based on the replication factor or the number of seed nodes?
The number of seed nodes is unrelated to the replication factor. The seeds are used when a new node joins the cluster. New nodes consult the seeds to get their initial configuration and learn the addresses of the other nodes. You need 2-3 seeds to provide redundancy, that's all.
The replication factor indicates how many nodes have copies of the data, as you probably know. RF=3 means three nodes have copies of the data. Consistency level QUORUM means that 2 nodes need to reply to the coordinator (because 2 is a quorum of 3). This has nothing to do with the number of nodes in the cluster, as long as you have at least 3 nodes for RF=3! Even/odd doesn't matter, and the number of seeds doesn't matter.
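For reference, seeds are just a list in cassandra.yaml on each node, completely independent of the replication factor (the IP addresses below are placeholders):

    # cassandra.yaml
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"

And the quorum is computed from the replication factor alone: quorum = floor(RF / 2) + 1, which for RF=3 gives 2 whether the cluster has 4 nodes or 40.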

Do I need to decrease my replication factor if replication > number of nodes? (planning to decommission a node)

We currently have a DC with 3 nodes and a replication factor of 3. I am planning to decommission a node. Do I need to decrease my replication factor to 2, or will just decommissioning the node adjust the data among the two remaining nodes while keeping a replication factor of 3?
Decommissioning a node will not necessarily break your Cassandra cluster, but it will cause a few things to stop working.
Here is what will happen if you decommission the node but don't adjust the replication factor:
First, nothing about your replication factor changes just because you decommission a node. To do otherwise would cause chaos.
- Queries (both read and write) that attempt to use ConsistencyLevel.ALL will fail, because they will not be able to get 3 machines to participate.
- Queries with ConsistencyLevel.QUORUM will be less available, because BOTH remaining machines will need to respond to meet quorum (2 is a quorum of 3, and only 2 replicas remain).
Because you have 3 machines and an RF of 3, every machine has a complete copy of the data. Decommission the node, update your replication factor, and then run nodetool repair on the remaining two nodes. After you do that, you should be good to go.
My 2 cents: I would suggest that you first change your replication factor to 2, run a repair on all nodes, and then issue "nodetool decommission" on the node you want to decommission. There will be data moving around, but by doing it this way nothing should stop working.
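In command form, that order of operations would look something like this (my_ks and the datacenter name DC1 are placeholders for your own keyspace and DC):

    -- in cqlsh: reduce RF from 3 to 2 first
    ALTER KEYSPACE my_ks
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2};

    # from a shell, on every node:
    nodetool repair my_ks

    # finally, on the node being removed:
    nodetool decommission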
