Can someone explain how vnode token allocation works in Cassandra?
If we have a cluster of N nodes and a new node is added how are token ranges allocated to this new node?
Rebalancing a cluster is automatically accomplished when adding or removing nodes. When a node joins the cluster, it assumes responsibility for an even portion of data from the other nodes in the cluster. If a node fails, the load is spread evenly across other nodes in the cluster.
Here is some reading that might help you better understand how vnodes work and how ranges are being allocated - Virtual nodes in Cassandra 1.2
As I said above, Cassandra automatically handles the calculation of token ranges for each node in the cluster in proportion to their num_tokens value. Token assignments for vnodes are calculated by the org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocator class.
When a new node joins the cluster, it will inject its own tokens into the ring and take over some ranges from the existing nodes. This video might also help.
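For intuition, here is a minimal Python sketch (not Cassandra's actual implementation) of the random vnode allocation that was the original default behaviour: each node picks num_tokens random points on the Murmur3 ring, and a joining node's tokens simply split whichever existing ranges they land in.

    import random

    TOKEN_MIN, TOKEN_MAX = -2**63, 2**63 - 1  # Murmur3Partitioner token space
    RING_SIZE = 2**64

    def random_tokens(n):
        # Each vnode is just a randomly chosen point on the ring.
        return [random.randint(TOKEN_MIN, TOKEN_MAX) for _ in range(n)]

    def ownership(ring):
        # Fraction of the token space each node primarily owns: a token
        # owns the range from its predecessor (exclusive) to itself.
        tokens = sorted((t, node) for node, ts in ring.items() for t in ts)
        share = dict.fromkeys(ring, 0)
        for (prev, _), (cur, node) in zip([tokens[-1]] + tokens[:-1], tokens):
            share[node] += (cur - prev) % RING_SIZE
        return {node: round(s / RING_SIZE, 3) for node, s in share.items()}

    # Three existing nodes with 8 vnodes each (256 in a real cluster).
    ring = {"node%d" % i: random_tokens(8) for i in range(1, 4)}
    print(ownership(ring))  # roughly 1/3 each

    # A new node joins: it picks its own tokens, each of which splits an
    # existing range; nobody else's tokens need to be recalculated.
    ring["node4"] = random_tokens(8)
    print(ownership(ring))  # roughly 1/4 each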
I am looking to add 3 nodes to an existing 6-node Cassandra cluster, but I'm a bit confused about how best to do this due to the token assignments.
Currently, the existing 6-node cluster is not using vnodes (this can't be changed) and is using RandomPartitioner, so the current tokens were assigned as per the token generator. The issue is that adding 3 nodes to a 6-node cluster means the recalculated tokens would give the new node 7 the same token as the current node 5.
What is the best practice here? Should I do a nodetool move on the existing nodes to apply the recalculated tokens, THEN bootstrap the new nodes with the correct config and tokens? Or do I add the new nodes with no token and, once they have bootstrapped, nodetool move across all the nodes to apply the newly calculated tokens, starting from the second node (as the first node is always 0 with RandomPartitioner)?
I've done a lot of reading, but can't seem to find a scenario that covers this eventuality. And I can't add more than 3 nodes, long story...
Any help greatly received!
You need to recalculate the tokens for your whole cluster and assign the new tokens to the existing nodes. Detailed instructions can be found here. Since you are recalculating for the whole cluster, you should not have the collision you are describing.
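For reference, here is a quick Python sketch of the token generator math for RandomPartitioner (token space 0 to 2**127 - 1), applied to your 6-to-9-node scenario:

    # Evenly spaced tokens for a RandomPartitioner cluster of a given size.
    def tokens(node_count):
        return [i * (2**127 // node_count) for i in range(node_count)]

    old = tokens(6)
    new = tokens(9)
    print(new)

    # new[6] == old[4]: the token computed for new node 7 collides with
    # existing node 5, which is why the whole cluster must be recalculated
    # and moved, not just the new nodes.
    assert new[6] == old[4]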
I think that the best solution would be to add a new DC where you would use vnodes and Murmur3. After the data is replicated and you have moved all your clients to the new DC, you would decommission the old DC.
I'm new to Cassandra and I'm confused about the concepts of nodes and vnodes.
Here's what I had read: The hierarchy of elements in Cassandra is:
Cluster - Data Centre - Rack - Server - Node
The node was described as a data storage layer within a server and the server was the actual physical machine containing the Cassandra software.
From what I could understand, it seems to me that vnodes are different/more efficient than nodes in certain cases.
However I'm having trouble placing them in this hierarchy.
Is a vnode just a different kind of node in the above hierarchy?
or
is it that after the concept of the vnode was introduced, the element called "server" in the above hierarchy is now called a node, and the one called "node" is now called a vnode?
You can see vnodes as the next step in the hierarchy you've described, after physical nodes.
Vnodes help redistribute data based on tokens when you resize your cluster, and they make data distribution much more flexible.
There's a good explanation from datastax site: https://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
EDIT: In old versions of Cassandra, tokens were split so that each server had one token (range), and data was replicated between physical machines based on the replication factor. The vnodes architecture (also used in Riak, for example) virtualizes the "node" layer, splitting the ring into a high number of token ranges (vnodes), with each physical node (Cassandra service) running a number of vnodes. Please review the link provided; there's a very good explanation with examples.
Before Cassandra 1.2, each node was assigned a single token. So adding or replacing a node implied manually calculating the initial_token property in cassandra.yaml, as well as significant data movement across the cluster.
Cassandra's 1.2 release introduced the concept of virtual nodes, also called vnodes for short. Instead of assigning a single token to a node, the token range is broken up into multiple smaller ranges. Each physical node is then assigned multiple tokens. By default, each node will be assigned 256 of these tokens, meaning that it contains 256 virtual nodes. Virtual nodes have been enabled by default since 2.0.
From Cassandra: The Definitive Guide, Jeff Carpenter & Eben Hewitt.
Vnodes are good because you can adjust the number of vnodes on each Cassandra instance (node) to match the machine's capabilities, by adjusting the num_tokens property in the cassandra.yaml file.
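For example (the values are illustrative; pick numbers proportional to each machine's capacity), the cassandra.yaml excerpts might look like:

    # cassandra.yaml on a standard machine
    num_tokens: 256

    # cassandra.yaml on a machine with roughly twice the capacity
    num_tokens: 512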
Token assignments for vnodes are calculated by the org.apache.cassandra.dht.tokenallocator.ReplicationAwareTokenAllocator class.
In Cassandra, virtual nodes are created and distributed among nodes as described in http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2. But who performs that process of creating the virtual nodes and distributing them among peers? Is it some sort of leader? How does it work?
Also, when a node joins, virtual nodes are redistributed, and many more similar actions take place. Who performs all of those?
Edit: Is it that when a node joins, it takes over some portion of the virtual nodes from the existing cluster, thus eliminating the need for a leader?
The new node retrieves information about the cluster from the seed nodes.
It will take its part of the ring based on the num_tokens parameter (by default the load is distributed evenly between all existing nodes), and will bootstrap its part of the data.
The rest of the cluster becomes aware of the change through "gossiping", using the gossip protocol.
Except for the seed-nodes part, there's no need for any "master" in the process.
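For reference, the seed nodes a joining node contacts are listed in its cassandra.yaml; a typical excerpt (the addresses are illustrative) looks like this:

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.0.1,10.0.0.2"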
Old nodes will not delete partitions automatically; you need to run nodetool cleanup on the old nodes after adding a new node.
Here's a good article about it:
http://cassandra.apache.org/doc/latest/operating/topo_changes.html
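In practice, the follow-up after a node finishes joining looks something like this, run on each pre-existing node:

    nodetool status   # confirm the new node is shown as UN (Up/Normal)
    nodetool cleanup  # drop the data this node no longer owns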
I am a little confused about token assignment with multiple DCs. When running nodetool ring, we can see that all the tokens are (and need to be) different, even for nodes in different DCs. Does that mean all nodes in the cluster form only one ring, or do each DC's nodes form a ring in each DC?
That's right, the Cassandra token range spans the entire cluster, so there will only be one primary node responsible for any piece of data.
Managing data across multiple datacenters is handled by specifying the desired replication strategy, e.g. NetworkTopologyStrategy
Each DC's nodes form a ring of their own: with multiple DCs, each DC has its own distinct partition range, independent of the others. As @alec-collier mentioned, NetworkTopologyStrategy will figure out the replicas for a partition within each DC.
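For example, a keyspace replicated independently into two datacenters (the keyspace and DC names are illustrative and must match your snitch configuration) is defined like this:

    CREATE KEYSPACE my_keyspace
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 3,
        'DC2': 3
    };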
I have a 4 node cluster and I have upgraded all the nodes from an older version to Cassandra 1.2.8. Total data present in the cluster is of size 8 GB. Now I need to enable vNodes on all the 4 nodes of cluster without any downtime. How can I do that?
As Nikhil said, you need to increase num_tokens and restart each node. This can be done one node at a time, with no downtime.
However, increasing num_tokens doesn't cause any data to redistribute, so you're not really using vnodes. You have to redistribute it manually: via shuffle (explained in the link Lyuben posted), which often leads to problems; by decommissioning each node and bootstrapping it back, which will temporarily leave your cluster extremely unbalanced, with one node owning all the data; or by duplicating your hardware temporarily, just like creating a new data center. The latter is the only reliable method I know of, but it does require extra hardware.
In conf/cassandra.yaml you will need to comment out the initial_token parameter and enable the num_tokens parameter (256 by default, I believe). Do this for each node, then restart the Cassandra service on each node and wait for the data to be redistributed throughout the cluster. 8 GB should not take too much time (provided your nodes are all in the same cluster), and read requests will still be functional, though you might see degraded performance until the redistribution of data is complete.
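The cassandra.yaml change on each node looks like this (256 was the default vnode count at the time):

    # initial_token:    <-- comment out any single-token assignment
    num_tokens: 256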
EDIT: Here is a potential strategy to migrate your data:
1. Decommission two nodes of the cluster. The token space should get distributed 50-50 between the other two nodes.
2. On the two decommissioned nodes, remove the existing data and restart the Cassandra daemon with a different cluster name and with the num_tokens parameter enabled.
3. Migrate the 8 GB of data from the old cluster to the new cluster. You could write a quick script in Python to achieve this (a minimal sketch follows below). Since the volume of data is small, this should not take too much time.
4. Once the data is migrated to the new cluster, decommission the two old nodes from the old cluster. Remove their data and restart Cassandra on them with the new cluster name and the num_tokens parameter enabled. They will bootstrap, and data will be streamed from the two existing nodes in the new cluster. Preferably, bootstrap only one node at a time.
With these steps, you should never face a situation where your service is completely down. You will be running at reduced capacity for some time, but again, since 8 GB is not a large volume of data, you should be able to complete this quickly enough.
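As a rough illustration of the migration in step 3, here is a minimal sketch using the DataStax Python driver (pip install cassandra-driver); the contact points, keyspace, table, and columns are hypothetical and must be adapted to your schema:

    from cassandra.cluster import Cluster

    # Hypothetical contact points and keyspace; adapt to your clusters.
    old = Cluster(["old-node-1"]).connect("my_keyspace")
    new = Cluster(["new-node-1"]).connect("my_keyspace")

    insert = new.prepare("INSERT INTO my_table (id, payload) VALUES (?, ?)")

    # Page through the old table and re-insert each row into the new cluster.
    for row in old.execute("SELECT id, payload FROM my_table"):
        new.execute(insert, (row.id, row.payload))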
TL;DR:
No, you need to restart the servers once the config has been edited.
The problem is that enabling vnodes means a lot of the data is redistributed randomly (the docs say in a vein similar to the classic 'nodetool move').