Currently running a 3-node cluster with a replication factor of 3 on the keyspaces. Need to add more nodes to the cluster as the size of each node is approaching 2 TB.
Can I add just 1 more node to the cluster and have a 4-node cluster, or does the cluster always need to have an odd number of nodes? Currently using a consistency level of ONE for both reads and writes.
You can have as many nodes in the cluster as you want, particularly if you are not using the racks feature in Cassandra (all nodes are in the same logical C* rack).
If you are using C* racks, our recommendation is to have an equal number of nodes in each rack so the load distribution is balanced across the racks in each DC.
For example, if your app keyspaces have a replication factor of 3 and you have 3 racks then the number of nodes in the DC should be in multiples of the replication factor -- 3, 6, 9, 12 and so on. This would allow you to configure the same number of nodes in each rack.
This isn't a hard requirement but is best practice so nodes have an equal amount of load and data on them. Cheers!
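For reference, a rough sketch of such a keyspace definition in CQL, assuming a single DC named DC1 and a hypothetical keyspace name:
CREATE KEYSPACE my_app
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
With 3 racks and nodes added in multiples of 3, those 3 replicas can each land in a different rack and every rack carries a similar share of the data.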
You can have an even number of nodes in a Cassandra cluster, so you can add another node. If you are using vnodes this is straightforward; otherwise a lot of work is needed to rebalance the token assignments across the cluster.
One more thing: reading and writing at consistency level ONE weakens consistency. If that suits your use case it is fine, but the general recommendation is to use QUORUM on production systems.
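If you want to see how QUORUM behaves before changing the application, you can try it in cqlsh; the keyspace and table names below are placeholders:
CONSISTENCY QUORUM;
SELECT * FROM my_keyspace.my_table WHERE id = 1;
With RF=3, QUORUM needs 2 of the 3 replicas to respond, so a single replica can be down without the query failing.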
Related
I was wondering which configuration is best suited for an even distribution of data among nodes.
5 nodes across 3 racks (2 nodes on rack1 (node1, node4), 2 nodes on rack2 (node2, node5), 1 node on rack3 (node3))
Replication factor 3 and Read / Write on Quorum
In this case I am wondering whether node3, which is the only node in rack3, will have more data than the other nodes, since the replication strategy tries to put replicas on nodes in different racks.
6 nodes across 3 racks (2 nodes on rack1 (node1, node4), 2 nodes on rack2 (node2, node5), 2 nodes on rack3 (node3, node6))
Replication factor 3 and Read / Write on Quorum
In this case data will be distributed equally among all nodes.
I want to know whether my understanding is correct.
In the case of 5 nodes across 3 racks, yes, one node will be under greater load/stress.
It's a good idea to scale the cluster in multiples of the rack count to keep the data balanced across nodes. For example, in a 3 rack cluster you should add 3 nodes each time you expand the cluster.
If you choose to use multiple racks the ideal rack count should be ≥ your chosen replication factor. This allows Cassandra to store each replica in a separate rack.
In the case of a rack outage the other replicas would be still available.
For example, with RF=3 and 3 racks and queries at QUORUM, you can sustain the failure of a single rack. Whereas, with RF=3 and 2 racks at QUORUM, there is no guarantee that 2 replicas will still be available in the case of a rack failure.
Racks are for informing Cassandra about fault domains. If you're running in your own data center then, as the name implies, racks should be assigned based on the physical rack each node is located in. If you're running in the cloud, the best option is to map racks to AWS availability zones (or whatever is equivalent for your provider).
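As a sketch, if you use GossipingPropertyFileSnitch the DC and rack of each node come from cassandra-rackdc.properties on that node; the values below are example AWS names:
# cassandra-rackdc.properties on a node in availability zone us-east-1a
dc=us-east-1
rack=us-east-1a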
Yes, you should use 6 nodes to make sure you have an equal number of nodes in each rack - having an equal number of nodes in each rack is a basic requirement when going with multiple racks.
But do you really need multiple racks? They make scaling more difficult, because every time you expand you have to take care of the node placement across racks and the resulting data distribution.
In Cassandra, multiple racks provide continued data availability in the cluster during disastrous situations, and the same is recommended for production clusters. Both of your options are fine; however, you should go with an odd number of nodes in the cluster.
I am working on creating a Cassandra cluster.
Our system is write-heavy, and we are planning to use 3 seed nodes and a total of 10 Cassandra nodes (including the 3 seed nodes).
We are using a replication factor of 3 and consistency level QUORUM.
Is there any consideration around an odd/even number of Cassandra nodes based on the replication factor or the number of seed nodes?
The number of seed nodes is unrelated to the replication factor. The seeds are used when a new node joins the cluster. New nodes consult the seeds to get their initial configuration and learn the addresses of the other nodes. You need 2-3 seeds to provide redundancy, that's all.
The replication factor indicates how many nodes have copies of the data, as you probably know. RF=3 means three nodes have copies of the data. Consistency level QUORUM means that 2 nodes need to reply to the coordinator (because 2 is a quorum of 3). This has nothing to do with the total number of nodes in the cluster, as long as you have at least 3 nodes for RF=3. Even/odd doesn't matter, and the number of seeds doesn't matter.
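For illustration, the seed list is just a setting in cassandra.yaml, repeated identically on every node; the addresses below are placeholders:
# cassandra.yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"
Pick 2-3 stable nodes as seeds; the other nodes simply do not appear in the list.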
We currently have a DC with 3 nodes and a replication factor of 3. I am planning to decommission a node. Do I need to decrease my replication factor to 2, or will just decommissioning the node redistribute the data among the two remaining nodes while keeping a replication factor of 3?
Decommissioning a node will not necessarily break your Cassandra cluster, but a few things will stop working.
A few things that will happen if you decommission the node but don't adjust the replication factor:
First, nothing about your replication factor will be changed just because you decommission a node. To do otherwise would cause chaos.
Queries (both read and write) that attempt to use ConsistencyLevel.ALL will fail, because they will not be able to get 3 machines to participate.
Queries with ConsistencyLevel.QUORUM will be less available, because BOTH remaining machines will need to respond to queries to meet quorum.
Because you have 3 machines and a RF of 3, that means that every machine has a complete copy of the data. Decommission the node, update your replication factor, and then run nodetool repair on the remaining two nodes. After you do that, you should be good to go.
My 2 cents: I would suggest you first change your replication factor to 2, run a repair on all nodes, and then issue "nodetool decommission" on the node you want to decommission. There will be data moving around, but by doing it this way nothing should stop working.
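A rough sketch of that sequence, assuming the keyspace is called demo and uses SimpleStrategy (adjust the replication map if your keyspace uses NetworkTopologyStrategy):
-- in cqlsh: drop the RF to 2 first
ALTER KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
# on each node, make sure the remaining replicas are consistent
nodetool repair demo
# finally, on the node being removed
nodetool decommission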
While installing Cassandra on a single node to run some tests, we noticed that we were using an RF of 3 and everything was working correctly.
This is of course because that node has 256 vnodes (by default), so the same data can be replicated on the same node in different vnodes.
This is worrying because if that one node were to fail, you'd lose all your data even though you thought the data was replicated on different nodes.
How can I be sure that in a standard installation (with a ring of several nodes) the same data will not be replicated on the same "physical" node? Is there a setting to prevent Cassandra from using the same node for replicating data?
Replication strategy is schema dependent. You probably used the SimpleStrategy with RF=3 in your schema. That means that each piece of data will be placed on the node determined by the partition key, and successive replicas will be placed on the successive nodes. In your case, the successive node is the same physical node, hence you get 3 copies of your data there.
Increasing the number of nodes solves your problem. In general, your data will be placed on different physical nodes when your replication factor RF is less than or equal to your number of nodes N.
The other solution is to switch replication strategy and use NetworkTopologyStrategy, usually used in multi-datacenter clusters, where you can specify how many replicas you want in each data center. This strategy places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack. NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
Look at the DataStax documentation for more information.
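To make the difference concrete, here is a sketch of both strategy definitions in CQL; the keyspace and DC names are placeholders:
-- rack-unaware: replicas go on successive nodes around the ring
CREATE KEYSPACE ks_simple
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
-- rack-aware: per-DC replica counts, replicas spread across racks where possible
CREATE KEYSPACE ks_nts
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};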
Without vnodes each physical node owns a single token range. With vnodes each physical node will own multiple, non-consecutive token ranges (aka a vnode), and furthermore vnodes are randomly assigned to physical nodes.
Which means that even when data gets replicated on the vnodes right next to the primary replica's node (i.e. when using SimpleStrategy) the replicas will - with high probability but not guaranteed - be on different physical nodes.
This random assignment can be seen in the output of nodetool ring.
More info can be found here.
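As a quick check, the vnode count is set per node in cassandra.yaml and the resulting token assignment is visible with nodetool:
# cassandra.yaml
num_tokens: 256
# then, on any node
nodetool ring
nodetool status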
Cassandra stores the replicas of a keyspace on different nodes; it would be nonsensical to keep multiple copies of the same data on a single node. If the replication factor exceeds the number of nodes, then the number of nodes effectively becomes your replication factor.
But why is this not an error? Because it allows you to provision more nodes later.
As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.
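For example, raising the RF on an existing keyspace is a one-line change (the keyspace name and strategy here are hypothetical):
ALTER KEYSPACE my_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
After the new nodes have joined, run nodetool repair on each node so the additional replicas actually get streamed into place.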
I have a Cassandra cluster with 2 nodes, and I am using NetworkTopologyStrategy.
I was trying to increase the replication factor of a keyspace in Cassandra to 2. I did the following steps:
UPDATE KEYSPACE demo WITH strategy_options = {DC1:2,DC2:2}; on both the nodes
Then I ran nodetool repair on both nodes
Then I ran my Hector code to count the number of rows and columns in the database.
I get the following error: UnavailableException
Also when I run the command
./nodetool -h ip_address ring
I found that both nodes' ownership is 0%. Please tell me how I should fix that.
You mention "both nodes", which implies that you have two total nodes rather than two data centers as would be suggested by your strategy options. Specifying {DC1:2,DC2:2} would require a minimum of four nodes (two in each DC to satisfy the replication factor), although this would not be advised since essentially all your nodes would be points of failure.
A minimal Cassandra cluster should have at least three nodes, in which case a RF of two would allow one node to go down without bringing down the system. It sounds like you have a single cluster (rather than two data centers), so what you really need is one more node (3 total), RF=2, using the SimpleStrategy instead of NetworkTopologyStrategy.
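If you go that route, a sketch of the keyspace change in CQL (ALTER KEYSPACE is the CQL equivalent of the older UPDATE KEYSPACE syntax used in the question), using the demo keyspace from the question:
ALTER KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
Follow it with nodetool repair on each node so every node ends up holding its copy of the data.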