Can Cassandra support multi-DC cluster with different number of nodes? - cassandra

I want to be able to get the backup/replicarw of operational data to a single node so we can do some adhoc queries.
Having just one machine handle this replica will be work for now.
Is this possible ? If not what are the arguments against it ?

Yes, you can have different number of nodes in each data center. Set the replication factor as per your requirement.
E.g. If you have DC1 with 4 nodes and going to add DC2 with 1 node then replication factor for your keyspace should be DC1=x,DC2=1(where x<=4).
To add one more data center you need to check the Topology, Snitch and seeds configurations.
E.g. If you are using SimpleSnitch then you can't have multiple data centers, So you need to change your snitch and topology. Check this link which explains more about changing snitch and topology.

Related

Apache Cassandra decommission second DC and join nodes into first DC as brand new nodes?

My Cassandra cluster consists of 2 DCs, each DC has 5 nodes and replication factor per DC is 3. Both DCs are hosted onto the same docker orchestrator. This is a legacy and probably it was done during last major system migration years ago. At the time being I don't see any advantage of having 2 DCs with same replication factor 3. This way same data is written 6 times. Cluster is at least 80% write heavy, reads are more or less limited.
Cassandra load is struggling at peak times, so I would like to have 1 DC with 10 nodes (instead of 2DCs x 5 ndoes) to be able to balance across 10 nodes, instead of just 5. This way I will also bring down data size per node. Having same amount of RAM and CPU dedicated to Cassandra, I would win performance and free storage space ;-)
So idea is to decommission DC2 and bring all 5 nodes from it to DC1 as brand new nodes.
Steps are known:
alter keyspaces to be limited to DC1 only.
no clients to be writing/reading to/from DC2 - DCAwarePolicy with LOCAL_*
I wonder about next step - it says I need to start decommissioning node by node DC2. Is this mandatory or I could somehow just take those nodes down? Goal is not to decommission some, but all nodes in a DC. If I decommission say node5, data would be transferred to remaining 4 nodes and so on. At some point I would be left with 3 nodes and replication factor 3, so I won't be able to decommission any further. What is more - I guess there would be no free space on those node volumes and I am not willing to extend those any further.
So my questions are:
is there a way to alter keyspace to DC1 only, then just to bring all DC2 nodes down, erase volumes and add them one by one to DC1, expanding DC1? Basically to decommission all DC2 nodes at once?
Is there a way for even quicker move of those 5 DC2 nodes to DC1 (at the end they contain same data as 5 nodes in DC1)? Like just join them to DC1 with all data they contain?
What is the advantage of having 2 DCs in a single cluster, instead of having a single DC with double the nodes? Or it strongly depends on the usage and the way services write and read data from Cassandra?
Appreciate your replies, thanks.
Cheers,
OvivO
is there a way to alter keyspace to DC1 only, then just to bring all DC2 nodes down, erase volumes and add them one by one to DC1, expanding DC1? Basically to decommission all DC2 nodes at once?
Yes, you can adjust the keyspace definition to just replicate within DC1. Since you're basically removing a DC, you could shut them all down, and run a nodetool removenode for each. In theory, that would remove the nodes from gossip and (if they're down) not attempt to move data around. Then yes, add each node back to DC1, one at a time. Once you're done, run a repair, followed by a nodetool cleanup on each node.
Is there a way for even quicker move of those 5 DC2 nodes to DC1 (at the end they contain same data as 5 nodes in DC1)? Like just join them to DC1 with all data they contain?
No. Token range assignment is DC dependent. If they moved to a new DC, their range assingments would change, and the nodes would very likely be responsible for different ranges of data.
What is the advantage of having 2 DCs in a single cluster, instead of having a single DC with double the nodes?
Geographic awareness. If you have a mobile app and users on both the West Coast and East Coast, you don't want your East Coast users making a call for data all the way to the West Coast. You want that data call to happen as locally as possible. So, you'd build up a DC on each coast, and let Cassandra keep them in-sync.

Can Cassandra cluster have even number of nodes?

Currently running a 3 node cluster with replication factor 3 on the keyspaces. Need to add more nodes to the cluster as the size of each node is approaching 2TB.
Can I add just 1 more node to the cluster and have a 4 node cluster or does the cluster always need to have odd number of nodes? Using a consistency level of ONE currently for both read and write.
You can have as many nodes in the cluster as you want, particularly if you are not using the racks feature in Cassandra (all nodes are in the same logical C* rack).
If you are using C* racks, our recommendation is to have an equal number of nodes in each rack so the load distribution is balanced across the racks in each DC.
For example, if your app keyspaces have a replication factor of 3 and you have 3 racks then the number of nodes in the DC should be in multiples of the replication factor -- 3, 6, 9, 12 and so on. This would allow you to configure the same number of nodes in each rack.
This isn't a hard requirement but is best practice so nodes have an equal amount of load and data on them. Cheers!
You can have even number of nodes in a Cassandra cluster. So you can add another node to the cluster. If you are using vnodes, then it will be easier, otherwise a lot of work needs to be done to balance the cluster.
One more thing, reading and writing with consistency level ONE decreases the consistency. If it suits your usecase then it is fine but general recommendation is to use QUORUM on the production system.

How to force Cassandra not to use the same node for replication in a schema with vnodes

Installing Cassandra in a single node to run some tests, we noticed that we were using a RF of 3 and everything was working correctly.
This is of course because that node has 256 vnodes (by default) so the same data can be replicated in the same node in different vnodes.
This is worrying because if one node were to fail, you'd lose all your data even though you thought the data was replicated in different nodes.
How can I be sure that in a standard installation (with a ring with several nodes) the same data will not be replicated in the same "physical" node? Is there a setting to avoid Cassandra from using the same node for replicating data?
Replication strategy is schema dependent. You probably used the SimpleStrategy with RF=3 in your schema. That means that each piece of data will be placed on the node determined by the partition key, and successive replicas will be placed on the successive nodes. In your case, the successive node is the same physical node, hence you get 3 copies of your data there.
Increasing the number of nodes solves your problem. In general, your data will be placed in different physical nodes when your replication factor RF is less than/equal to your number of nodes N.
The other solution is to switch replication strategy and use the NetworkTopologyStrategy, usually used in multi datacenter clusters, and where you can specify how many replicas you want in each data center. This strategy
places replicas in the same data center by walking the ring clockwise
until reaching the first node in another rack. NetworkTopologyStrategy
attempts to place replicas on distinct racks because nodes in the same
rack (or similar physical grouping) often fail at the same time due to
power, cooling, or network issues.
Look at DataStax documentation for more information.
Without vnodes each physical node owns a single token range. With vnodes each physical node will own multiple, non-consecutive token ranges (aka a vnode), and furthermore vnodes are randomly assigned to physical nodes.
Which means that even when data gets replicated on the vnodes right next to the primary replica's node (i.e. when using SimpleStrategy) the replicas will - with high probability but not guaranteed - be on different physical nodes.
This random assignment can be seen in the output of nodetool ring.
More info can be found here.
Cassandra stores replicas on different nodes in the same keyspace. It would be nonsensical to have multiple replicas in the same keyspace. If the replication factor exceeds the number of nodes, than the number of nodes is your replication factor.
But, why is this not an error? Well, this allows for provisioning more nodes later.
As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.

How to migrate single-token cluster to a new vnodes cluster without downtime?

We have Cassandra cluster with single token per node, total 22 nodes, average load per node is 500Gb. It has SimpleStrategy for the main keyspace and SimpleSnitch.
We need to migrate all data to the new datacenter and shutdown the old one without a downtime. New cluster has 28 nodes. I want to have vnodes on it.
I'm thinking of the following process:
Migrate the old cluster to vnodes
Setup the new cluster with vnodes
Add nodes from the new cluster to the old one and wait until it balances everything
Switch clients to the new cluster
Decommission nodes from the old cluster one by one
But there are a lot of technical details. First of all, should I shuffle the old cluster after vnodes migration? Then, what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch? I want to switch to NetworkTopologyStrategy because new cluster has 2 different racks with separate power/network switches.
should I shuffle the old cluster after vnodes migration?
You don't need to. If you go from one token per node to 256 (the default), each node will split its range into 256 adjacent, equally sized ranges. This doesn't affect where data lives. But it means that when you bootstrap in a new node in the new DC it will remain balanced throughout the process.
what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch?
The difficulty is that switching replication strategy is in general not safe since data would need to be moved around the cluster. NetworkToplogyStrategy (NTS) will place data on different nodes if you tell it nodes are in different racks. For this reason, you should move to NTS before adding the new nodes.
Here is a method to do this, after you have upgraded the old cluster to vnodes (your step 1 above):
1a. List all existing nodes as being in DC0 in the properties file. List the new nodes as being in DC1 and their correct racks.
1b. Change the replication strategy to NTS with options DC0:3 (or whatever your current replication factor is) and DC1:0.
Then to add the new nodes, follow the process here: http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-a-data-center-to-a-cluster. Remember to set the number of tokens to 256 since it will be 1 by default.
In step 5, you should set the replication factor for DC0 to be 0 i.e. change replication options to DC0:0, DC1:3. Now those nodes aren't being used so decommission won't stream any data but you should still do it rather than powering them off so they are removed from the ring.
Note one risk is that writes made at a low consistency level to the old nodes could get lost. To guard against this, you could write at CL.LOCAL_QUORUM after you switch to the new DC. There is still a small window where writes could get lost (between steps 3 and 4). If it is important, you can run repair before decommissioning the old nodes to guarantee no losses or write at a high consistency level.
If you are trying to migrate to a new cluster with vnodes, wouldn't you need to change the Partitioner. The documents say that it isn't a good idea to migrate data between different Partitioners.

Cassandra nodes ownership is 0.00%

I have a Cassandra cluster with 2 nodes. I am using NetworkTopologyStrategy
I was trying to increase the replication factor of keyspace in Cassandra to 2. I did the following steps:
UPDATE KEYSPACE demo WITH strategy_options = {DC1:2,DC2:2}; on both the nodes
Then I ran the nodetool repair on both the nodes
Then I ran my Hector code to count the number of rows and columns in the database.
I get the following error: Unavailable Exception
Also when I run the command
./nodetool –h ip_address ring
I found that both nodes ownership is 0 %. Please tell me how should I fix that.
You mention "both nodes", which implies that you have two total nodes rather than two data centers as would be suggested by your strategy options. Specifying {DC1:2,DC2:2} would require a minimum of four nodes (two in each DC to satisfy the replication factor), although this would not be advised since essentially all your nodes would be points of failure.
A minimal Cassandra cluster should have at least three nodes, in which case a RF of two would allow one node to go down without bringing down the system. It sounds like you have a single cluster (rather than two data centers), so what you really need is one more node (3 total), RF=2, using the SimpleStrategy instead of NetworkTopologyStrategy.

Resources