Cassandra Virtual Nodes

Although this has been asked and answered many times, I have not found a good answer, neither in forums nor in the Cassandra docs.
How do virtual nodes work?
Suppose a node has 256 virtual nodes, and the docs say they are distributed randomly.
(Setting aside how that "randomly" is done... I have another, more urgent question:)
Is it right that every "physical" Cassandra node is actually responsible for several distinct locations on the ring (256 locations)? Does that mean the "physical" node is sort of "spread" over the whole circle?
How does rebalancing work in that case, if I add a new node?
The ring will get an additional 256 nodes.
How will those additional nodes divide the data with the old nodes?
Will they basically appear as additional "bicycle spokes" randomly spread through the whole ring?
There is a lot of info on the internet, but nobody gives a clear explanation...

Vnodes break up the available range of tokens into smaller ranges, defined by the num_tokens setting in cassandra.yaml. The vnode ranges are randomly distributed across the cluster and are generally non-contiguous. If we use a large number for num_tokens to break up the token ranges, the random distribution makes hot spots less likely. Statistical computation showed that clusters of any size consistently achieved a good token range balance at 256 vnodes per node, which is why the community recommended 256 as the num_tokens default to prevent hot spots in a cluster.
Ans 1:- It is a range of tokens based on num_tokens. If you set it to 256 (the default), you will get 256 token ranges.
Ans 2:- Yes, when you add or remove nodes, the tokens are redistributed across the cluster based on the vnode configuration.
You may refer to https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html for more details.
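To make the random distribution concrete, here is a small Python sketch (an illustration with made-up node names, not Cassandra's actual code) that assigns a handful of random tokens to each node, the way vnodes do with num_tokens. Walking the ring in token order shows each physical node owning many small, non-contiguous slices:

```python
import random

RING_MIN, RING_MAX = -2**63, 2**63 - 1   # Murmur3Partitioner token space

def assign_tokens(node_names, num_tokens):
    """Give each node `num_tokens` random tokens, the way vnodes do."""
    ownership = {}                        # token -> owning node
    for node in node_names:
        for _ in range(num_tokens):
            ownership[random.randint(RING_MIN, RING_MAX)] = node
    return ownership

# 8 tokens per node keeps the output readable; Cassandra's default is 256.
ownership = assign_tokens(["node1", "node2", "node3"], num_tokens=8)

# Walking the ring in token order shows the nodes interleaved: each
# physical node is "spread" over many non-contiguous ranges.
for token in sorted(ownership):
    print(f"{token:>22}  ->  {ownership[token]}")
```

With 256 tokens per node the interleaving is far finer, which is what smooths out hot spots.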

LetsNoSQL's answer is correct. See also https://stackoverflow.com/a/37982696/5209009. I'll only add a few more comments:
Yes, the "physical" node is spread over the token range.
As explained in the link, any new node will take 256 new token ranges, dividing some of the existing ones. There is no other rebalancing: the scheme relies on randomness to achieve balance, which is why it uses a relatively large number of tokens per node (256).
It's worth mentioning that there is another option. You can run vnodes with a smaller number of tokens per node (4-8) together with a token allocation algorithm. New tokens are not allocated randomly; instead a greedy algorithm is used so that the new tokens create a distribution that optimises the load on a given keyspace: it simply halves the token ranges containing the most data. Since it isn't random, it can work with a smaller number of tokens (4-8). This isn't really relevant for small clusters, but for 100+ nodes it can be.
See https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30 and https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.
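A simplified sketch of the greedy idea, under the assumption that "most data" simply means "widest token range" (the real allocator, enabled in Cassandra 3.x via the allocate_tokens_for_keyspace setting, optimises replicated ownership per keyspace and is more sophisticated):

```python
RING = 2**64                                 # width of the Murmur3 token space

def widest_range(tokens):
    """Return (start, end) of the widest gap between consecutive tokens."""
    ts = sorted(tokens)
    gaps = list(zip(ts, ts[1:])) + [(ts[-1], ts[0] + RING)]   # ring wraps around
    return max(gaps, key=lambda g: g[1] - g[0])

def next_token(tokens):
    """Greedy choice: bisect the widest remaining range."""
    start, end = widest_range(tokens)
    mid = (start + end) // 2
    return mid - RING if mid > 2**63 - 1 else mid             # wrap back into range

tokens = [-2**62, 0, 2**62]      # three existing tokens on the ring
for _ in range(3):               # allocate three new tokens greedily
    t = next_token(tokens)
    tokens.append(t)
    print("allocated token:", t)
```

Because each choice is deliberate rather than random, far fewer tokens per node are needed to stay balanced.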

Related

Cassandra Token Distribution

I am going through Cassandra tutorials and came across this picture that represents a multi-node Cassandra cluster:
Shouldn't the total number of tokens (256 in the picture above) be distributed across all three nodes, at roughly 85 tokens each?
No, the num_tokens parameter specifies how many token ranges each node will handle. From the cassandra.yaml description:
This defines the number of tokens randomly assigned to this node on the ring. The more tokens, relative to other nodes, the larger the proportion of data that this node will store. You probably want all nodes to have the same number of tokens assuming they have equal hardware capability.
Otherwise, what would happen if you had a cluster with more than 256 nodes? ;-)
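A quick sanity check of that arithmetic in Python: with the default num_tokens, the total number of vnodes grows with the cluster, rather than 256 being split three ways:

```python
num_tokens = 256   # per-node setting in cassandra.yaml
nodes = 3

total_vnodes = num_tokens * nodes
print(total_vnodes)                # 768 token ranges on the ring
print(num_tokens / total_vnodes)   # each node still owns ~1/3 of the data
```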

Why and when to use Vnodes in Cassandra in real life production scenarios?

I understand that you don't have to rebalance vnodes, but when do we really use
them in production scenarios? Do they function the same way as a physical single-token node? If so, why use single-token nodes at all? Do vnodes help if I have a large amount of data and a large cluster (say 300 nodes)?
The main benefit of using vnodes is more evenly distributed data streaming when bootstrapping a new node. Why? When a new node is added, it requests the data in its token ranges. Optimally, the data it requests is spread evenly across all nodes, reducing the workload for each node sending data to the bootstrapping node (and also speeding up the bootstrap process).
Once you have a high number of physical nodes, like your example of 300, this benefit would seem to be reduced (assuming no hot-spotting or data partitioning issues). I'm not aware of any actual guideline on the number of nodes above or below which to use vnodes, other than what is in the documentation. Yes, vnodes are used in production.
More information can be found here:
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/config/configVnodes.html
In addition to Chris' excellent answer, I'll make an addition. When you have a large cluster with vnodes, it is helpful to let Cassandra manage the token ranges. Without vnodes, you would end up having to size and re-specify the token range for each existing and new node yourself. With vnodes, Cassandra handles that for you.
Compare the difference in the steps listed in the documentation:
Adding a node without vnodes: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsAddRplSingleTokenNodes.html
vs.
Adding with vnodes: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
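To see why the manual route is tedious, here is the evenly spaced single-token layout commonly quoted for Murmur3Partitioner, as a Python sketch. Growing the cluster changes every value, hence the nodetool move steps in the first document above:

```python
def single_token_layout(node_count):
    """Evenly spaced initial_token values for Murmur3Partitioner."""
    return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]

print(single_token_layout(3))
# Growing from 3 to 4 nodes changes every token, so every existing node
# would need a `nodetool move` -- exactly the pain vnodes remove.
print(single_token_layout(4))
```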

Can I use num_tokens as load factor in Cassandra?

If you run Cassandra on machines with different hard disk sizes (e.g. one with 1 TB, another with 2 TB), can I use num_tokens as a load factor?
I want to reduce the risk of one node running out of disk space and balance disk usage across the machines.
I know that the more data one node collects, the more likely it is to become a hotspot. Apart from that, which other considerations do I need to take into account? Which limits or practical restrictions exist for the number of tokens?
Can I change the number of tokens later without trouble if disk space changes?
I would appreciate some advice on this topic, because I have not found much information about it on Google or on the Cassandra website.
EDIT: numnodes replaced by num_tokens
Are you referring to the num_tokens setting? Yes, you can use a different number of tokens based on the hardware resources. Nodes with a larger number of tokens will see higher load and disk usage. Once set, the num_tokens setting cannot be changed at a later point without decommissioning the node.
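As an illustration of the proportionality (a sketch with hypothetical numbers, not an official sizing formula): since each token range covers a statistically similar slice of data, a node's expected share is just its token count over the cluster total:

```python
# Hypothetical two-node cluster: node2 has twice the disk, so twice the tokens.
num_tokens = {"node1": 256, "node2": 512}

total = sum(num_tokens.values())
for name, tokens in num_tokens.items():
    print(f"{name}: ~{tokens / total:.0%} of the data on average")
# node1: ~33% ... node2: ~67%
```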

How to ensure that consistent hashing works?

I'm going to implement consistent hashing over a bunch of nodes. Each node has a limited capacity (let's say 1 GB). I start with one node, and when it gets full I add another node and use consistent hashing to redistribute the data, and move forward by adding new nodes. However, there is still a chance that a node gets full. I know some NoSQL databases, such as Cassandra, use consistent hashing to do something similar to what I'm doing. How can I keep nodes from overflowing when using consistent hashing?
Cassandra does not use consistent hashing in the way you described.
Each table has a partition key (you can think of it as a primary key, or the first part of one, in RDBMS terminology); this key is hashed using the murmur3 algorithm. The whole hash space forms a continuous ring from the lowest possible hash to the highest. This ring is then divided into chunks (vnodes, 256 per node by default), and these chunks are fairly distributed among the nodes. Each node hosts not only its own part of the ring but also maintains replicated copies of other vnodes according to the replication factor.
This way of doing things helps to solve a lot of problems:
balance the data load among all cluster nodes: no specific node can be overloaded (data size, reads and writes are evenly distributed; no hot spots)
if you add a new node to the cluster, it will handle its own part of the ring and pull the required vnodes automatically from the other nodes. No need for manual resharding.
if a node fails, replication means you won't lose any data, because it is already stored on other nodes. In that case you can decommission the failed node, and the other nodes will redistribute its part of the ring among themselves. No need for complex failover scenarios for failed DB nodes.
Of course, you can always implement similar DB behaviour on top of any RDBMS in your application layer, but it is always much harder and more error-prone than using an existing battle-tested solution.
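Here is a minimal, self-contained sketch of the ring described above, using an md5-based stand-in hash instead of murmur3 (this shows the consistent-hashing idea, not Cassandra's implementation):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes (not Cassandra's code)."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node contributes `vnodes` points on the ring.
        self.ring = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Stand-in for murmur3: a stable 64-bit token derived from md5.
        return int(hashlib.md5(key.encode()).hexdigest()[:16], 16)

    def node_for(self, partition_key):
        """The first node clockwise from the key's position owns the key."""
        idx = bisect.bisect_right(self._tokens, self._hash(partition_key))
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["node1", "node2", "node3"])
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", ring.node_for(key))
```

Adding a fourth node only moves the keys that fall into the token slices the new node takes over; everything else stays put, which is the property the answer above relies on.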
I guess you know how keys get moved from one node to another when a node is added or deleted. Coming to your question of how uniform distribution happens:
You can add your own logic here to make it happen. Keep monitoring all the nodes in the hash ring; if any node is getting hot (handling more keys), insert another node before it so that the load is distributed between the old and the new node. Similarly, if any node is under-utilised, you can delete it so that its load shifts to the next node.
Hope this helps!

how to efficiently manage cassandra initial token?

I'm a new Cassandra user. I know that there is an initial token configuration and how to generate the tokens.
The question is: if I have an existing cluster with x nodes and I want to add one or more additional nodes, should I reconfigure all the nodes with new tokens (according to the newly generated values)?
Or is there a more efficient way to manage this?
If you're looking for the best practices for handling such tasks, take a look at this section of the Cassandra 1.0 docs dedicated to token strategy.
Shortened version of your options, from the documentation:
Add capacity by doubling the cluster size -- [..] nodes can keep their existing token assignments, and new nodes are assigned tokens that bisect (or trisect) the existing token ranges.
Recalculate new tokens for all nodes and move nodes -- [..] you will have to recalculate tokens for the entire cluster. Existing nodes will have to have their new tokens assigned using nodetool move.
Add one node at a time and leave initial_token empty -- [..] splits the token range of the heaviest loaded node and places the new node into the ring at that position. [..] not result in a perfectly balanced ring, but it will alleviate hot spots.
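The first option ("double the cluster") is easy to sketch numerically: existing nodes keep their tokens, and each new node takes the midpoint of an existing range. The sketch below assumes the 0..2**127 RandomPartitioner token space those Cassandra 1.0 docs used:

```python
RING = 2**127                # RandomPartitioner token space from the 1.0 docs

def doubled_layout(old_tokens):
    """Keep the old tokens; give each new node the midpoint of an existing range."""
    ts = sorted(old_tokens)
    pairs = zip(ts, ts[1:] + [ts[0] + RING])          # the ring wraps around
    return [((a + b) // 2) % RING for a, b in pairs]

old = [i * (2**127 // 3) for i in range(3)]           # a balanced 3-node ring
print(doubled_layout(old))                            # initial_token for 3 new nodes
```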
If you were seeking a management solution, Priam (from Netflix) might be worth looking at. It's open source and Apache-licensed, but requires some amount of configuration and is probably only worth investing time in for larger clusters.
