I have a Cassandra cluster with multiple datacenters and nodes.
How can I generate token values for a multi-datacenter Cassandra cluster using Murmur3Partitioner?
I found a site that generates tokens for Murmur3Partitioner and RandomPartitioner, but only for a single DC.
https://www.geroba.com/cassandra/cassandra-token-calculator/
I will update here if I find a better link.
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configGenTokens.html
You just need to make sure that the token ranges are spread evenly across the nodes in each DC. The usual approach for multiple DCs is to compute evenly spaced tokens for each datacenter separately and then offset each additional datacenter's tokens slightly, so that no two nodes in the cluster end up with the same token.
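A minimal sketch of that approach in Python (the DC names, node counts and the offset of 100 are illustrative assumptions, not values from the docs):

    def murmur3_tokens(node_count):
        # Evenly spaced tokens across Murmur3's signed 64-bit range.
        return [((2**64 // node_count) * i) - 2**63 for i in range(node_count)]

    # Hypothetical layout: two DCs with 3 nodes each.
    dcs = {"DC1": 3, "DC2": 3}
    OFFSET = 100  # shift each additional DC so no two nodes share a token

    for i, (dc, node_count) in enumerate(dcs.items()):
        tokens = [t + i * OFFSET for t in murmur3_tokens(node_count)]
        print(dc, tokens)

Each printed value then goes into one node's initial_token setting in cassandra.yaml.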
I am a newbie in Cassandra.
In our production environment a three-node Cassandra cluster is running and serving production traffic, but I have the following doubts:
1) All nodes are configured in different racks, i.e. rack 1, rack 2 and rack 3, in the same DC. Is that fine, or does this configuration have some drawbacks?
2) We are using RF 2 and NetworkTopologyStrategy for all keyspaces except the system tables, which are configured with RF 2 and SimpleStrategy. Is that fine, or does it need to be changed? Should we increase the replication factor of system_auth? Please let me know.
3) Now I want to add another node in the same DC. What is the best procedure to do this without impacting live traffic?
The Cassandra version is Apache Cassandra 3.11.
Thanks in advance.
Ans 1) It is good to have Cassandra nodes in different racks for availability and fault tolerance.
Ans 2) You should increase the RF of system_auth; otherwise, if the node holding a user's only auth replica goes down, cqlsh logins from other nodes will fail.
Ans 3) You can add a new node without affecting live traffic on the existing cluster. Please follow the procedure below:
http://cassandra.apache.org/doc/latest/operating/topo_changes.html
Cassandra is designed as a distributed system, and its architecture is specifically tailored for multi-data-center deployment. These features are robust and flexible enough that you can configure the cluster for optimal geographical distribution, redundancy, fail-over and disaster recovery.
Multi-data-center deployments are excellent for global solutions where some applications are operational in one region and other applications in another region, all using a single Cassandra cluster working across data centers in multiple regions.
For single-region applications, multiple data centers are still the preferred option because they provide disaster recovery even if one region goes down.
Ans 1) For a single-DC Cassandra cluster, the recommendation is to have 4 nodes with RF 3: Rack 1 with 2 nodes and Rack 2 with 2 nodes. Remember that nodes in the same rack typically have a faster network between them than nodes in different racks, so with two nodes on the same rack, queries at LOCAL_QUORUM will be faster compared to a cluster with all nodes on different racks.
If you are not concerned with query latency, putting all nodes in different racks (3 racks) gives better disaster recovery than the two-rack deployment. Having said that, it's always recommended to use multi-DC deployments for a production cluster.
Ans 2) It's always recommended to increase the replication factor of the system_auth keyspace and change its replication class to NetworkTopologyStrategy. Please follow this documentation for more details: https://docs.datastax.com/en/security/6.0/security/secSystemKeyspace.html
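As a rough sketch of that change using the DataStax Python driver (the contact point, the DC name dc1 and the RF of 3 are assumptions to adapt to your topology):

    from cassandra.cluster import Cluster  # DataStax Python driver

    # Connect via any live node.
    session = Cluster(["127.0.0.1"]).connect()

    # Switch system_auth to NetworkTopologyStrategy with a higher RF.
    session.execute("""
        ALTER KEYSPACE system_auth
        WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
    """)

After altering the keyspace, run nodetool repair system_auth on each node so the existing auth data reaches its new replicas.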
Ans 3) Yes, you can add a new node to an existing cluster with ease, without impacting the traffic. Please follow this documentation for more details: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html
I am looking to add 3 nodes to an existing 6-node Cassandra cluster, but I'm a bit confused about how best to do this because of the token assignments.
Currently, the existing 6-node cluster is not using vnodes (this can't be changed) and uses RandomPartitioner, so the current tokens were assigned as per the token generator. The issue is that adding 3 nodes to a 6-node cluster means the recalculated tokens would give the new node 7 the same token as the current node 5.
What is the best practice here? Should I do a nodetool move on the existing nodes to apply the recalculated tokens, THEN bootstrap the new nodes with the correct config and tokens? Or do I add the new nodes with no token and, once they are bootstrapped, nodetool move across all the nodes, assigning the newly calculated tokens starting from the second node (as the first node is always 0 with RandomPartitioner)?
I've done a lot of reading, but can't seem to find a scenario that covers this eventuality. And I can't add more than 3 nodes, long story...
Any help greatly received!
You need to recalculate the tokens for your whole cluster and assign the new tokens to the existing nodes. Detailed instructions can be found here. Since you are recalculating for the whole cluster, you should not have the issues you are describing.
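For illustration, a small sketch of the recalculation (standard evenly spaced RandomPartitioner tokens; the positions are generic, not your actual node names):

    def random_partitioner_tokens(node_count):
        # RandomPartitioner tokens are evenly spaced across [0, 2**127).
        return [i * (2**127 // node_count) for i in range(node_count)]

    old_tokens = set(random_partitioner_tokens(6))
    new_tokens = random_partitioner_tokens(9)

    # Tokens already present in the 6-node layout can stay on the node that
    # currently holds them; every other token is taken by a `nodetool move`
    # on an existing node or by bootstrapping one of the three new nodes.
    for position, token in enumerate(new_tokens, start=1):
        status = "already held" if token in old_tokens else "move/bootstrap"
        print(f"position {position}: {token} ({status})")

In this layout three of the nine recalculated tokens line up with existing ones, including the overlap you noticed between new position 7 and the current node 5, which is why moving the existing nodes to their new tokens before bootstrapping avoids the clash.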
I think the best solution would be to add a new DC where you use vnodes and Murmur3. After the data has replicated and you have moved all your clients to the new DC, you would decommission the old DC.
Can someone explain how vnode allocation works in Cassandra?
If we have a cluster of N nodes and a new node is added how are token ranges allocated to this new node?
Rebalancing a cluster is automatically accomplished when adding or removing nodes. When a node joins the cluster, it assumes responsibility for an even portion of data from the other nodes in the cluster. If a node fails, the load is spread evenly across other nodes in the cluster.
Here is some reading that might help you better understand how vnodes work and how ranges are being allocated - Virtual nodes in Cassandra 1.2
As I said above, Cassandra automatically handles the calculation of token ranges for each node in the cluster in proportion to its num_tokens value. Token assignments for vnodes are calculated by the org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocator class.
When a new node joins the cluster, it injects its own tokens into the ring and takes over some ranges from the existing nodes. This video might also help.
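To make the "takes over some ranges" part concrete, here is a toy simulation (random token allocation, which was the default before the allocator class above; the node names and counts are made up):

    import random

    RING_MIN, RING_MAX = -2**63, 2**63 - 1  # Murmur3 token space
    RING_SPAN = RING_MAX - RING_MIN

    def ownership(tokens_by_node):
        # Each token owns the slice from the previous token on the ring to itself.
        ring = sorted((t, node) for node, ts in tokens_by_node.items() for t in ts)
        owned = {node: 0 for node in tokens_by_node}
        prev = ring[-1][0] - RING_SPAN  # wrap the last token around
        for token, node in ring:
            owned[node] += token - prev
            prev = token
        return {node: round(span / RING_SPAN, 3) for node, span in owned.items()}

    random.seed(1)
    num_tokens = 256  # vnodes per node
    cluster = {f"node{i}": [random.randint(RING_MIN, RING_MAX) for _ in range(num_tokens)]
               for i in (1, 2, 3)}
    print("before join:", ownership(cluster))

    # A new node injects 256 random tokens, carving slices out of existing ranges.
    cluster["node4"] = [random.randint(RING_MIN, RING_MAX) for _ in range(num_tokens)]
    print("after join: ", ownership(cluster))

With 256 vnodes per node, each node ends up owning roughly a quarter of the ring after the fourth node joins, without any manual token moves.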
I am a little confused about token assignment with multiple DCs. When running nodetool ring, we can see that all the tokens are (and need to be) different, even for nodes in different DCs. Does that mean all nodes in the cluster form one ring, or do each DC's nodes form a ring within their own DC?
That's right: the Cassandra token range spans the entire cluster, so there will only be one primary node responsible for any piece of data.
Managing data across multiple datacenters is handled by specifying the desired replication strategy, e.g. NetworkTopologyStrategy.
Logically, though, you can also think of each DC's nodes as forming their own sub-ring: tokens are unique across the whole cluster, but when placing a given DC's replicas, NetworkTopologyStrategy only counts that DC's nodes as it walks the ring, as @alec-collier mentioned.
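A toy sketch of that placement logic (node names, tokens and RFs are invented, and real NetworkTopologyStrategy also accounts for racks):

    import bisect

    # One global ring: nodes from both DCs own tokens in the same token space.
    ring = sorted([(-6000, "dc1-node1"), (-2000, "dc2-node1"),
                   (1000, "dc1-node2"), (5000, "dc2-node2")])
    tokens = [token for token, _ in ring]

    def replicas(token, rf_per_dc):
        # Walk the single ring clockwise from the token, taking the first
        # rf nodes encountered in each DC (simplified NetworkTopologyStrategy).
        start = bisect.bisect_left(tokens, token) % len(ring)
        needed = dict(rf_per_dc)
        chosen = []
        for step in range(len(ring)):
            node = ring[(start + step) % len(ring)][1]
            dc = node.split("-")[0]
            if needed.get(dc, 0) > 0:
                chosen.append(node)
                needed[dc] -= 1
        return chosen

    # The first node reached is the primary; the DC-aware walk adds the rest.
    print(replicas(0, {"dc1": 1, "dc2": 1}))  # ['dc1-node2', 'dc2-node2']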
Can I have RandomPartitioner in the cluster in datacenter1 and Murmur3Partitioner in the cluster in datacenter2?
No, you need to have the same partitioner on all nodes in the cluster.
If you are asking this because you want a way of migrating from RandomPartitioner to Murmur3Partitioner, then unfortunately it won't work. I don't know of a method for moving to Murmur3Partitioner on a live cluster, and the benefit is small, so it is unlikely to be worth doing anyway.
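To see why the partitioners can't be mixed, a small illustration (the key is made up, and Cassandra's exact Murmur3 variant differs slightly from common libraries, so treat the numbers as indicative):

    import hashlib

    key = b"user:42"  # hypothetical partition key

    # RandomPartitioner derives its token from the key's 128-bit MD5 digest,
    # giving a value in [0, 2**127).
    rp_token = abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))
    print(rp_token)

    # Murmur3Partitioner would hash the same key into a signed 64-bit token,
    # a different value in a different range, so the two partitioners would
    # place the same key at different ring positions.

Since every node must agree on where a key lives, a cluster with two partitioners could never route reads and writes consistently.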