I know that vnodes split a node's ownership into many token ranges, controlled by setting num_tokens in the cassandra.yaml file.
Say, for example (a), I have 6 nodes and I have set num_tokens=256 on each node. How many virtual nodes are formed across these 6 nodes? That is, how many virtual nodes (sub token ranges) does each physical node contain?
My understanding is that when every node has num_tokens set to 256, all 6 nodes contain 256 vnodes each. Is this statement true? If not, how do vnodes form the (obviously random) token ranges on each node? It would be really convenient if someone could explain this to me using example (a).
Also, what does the Ring of Vnodes signify in this URL: http://docs.datastax.com/en/cassandra/3.x/cassandra/images/arc_vnodes_compare.png (taken from: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 )?
Every partition key in Cassandra is converted to a numerical token value using the Murmur3 hash function. The token range is from -2^63 to +2^63 - 1, the same as a signed Java long.
num_tokens defines how many token ranges are assigned to a node. Each node calculates 256 (num_tokens) random values in the token range and informs the other nodes what they are, so when a node needs to coordinate a request for a specific token it knows which nodes are responsible for it, according to the replication factor and DC/rack placement.
A better description for this feature would be "automatic token range assignment for better streaming capabilities"; calling it "virtual" is a bit confusing.
In your case you have 6 nodes, each set to 256 token ranges, so you have 6 * 256 = 1536 token ranges in total, and each physical node holds 256 of them.
For example, consider 2 nodes with num_tokens set to 4 and a token range of 0 to 100.
Node 1 calculates tokens 17, 35, 77, 92
Node 2 calculates tokens 4, 25, 68, 85
The ring shows the distribution of token ranges in this case
Node 2 is responsible for token ranges 4-17, 25-35, 68-77, and 85-92, and node 1 for the rest (using the convention that a node owns the range starting at each of its tokens, up to the next token on the ring).
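To make the lookup concrete, here is a minimal Python sketch (not Cassandra code) of that ring: a sorted list of (token, node) pairs searched with a binary search, using the same "range starts at the node's token" convention as the example above:

import bisect

# The announced tokens from the example above, merged into one sorted ring.
ring = sorted([(17, "node1"), (35, "node1"), (77, "node1"), (92, "node1"),
               (4, "node2"), (25, "node2"), (68, "node2"), (85, "node2")])
tokens = [t for t, _ in ring]

def owner(key_token):
    # Find the last ring token <= key_token; index -1 wraps around
    # to the highest token, covering the 92-4 wraparound range.
    i = bisect.bisect_right(tokens, key_token) - 1
    return ring[i][1]

print(owner(30))  # falls in range 25-35 -> node2
print(owner(50))  # falls in range 35-68 -> node1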
We have a 3 DC cluster running Cassandra 3.10. Each DC has 24 nodes total with 8 tokens per node and 3 seed nodes. We use Murmur3Partitioner.
In order to ensure better data distribution, the cluster was created using the token allocation approach, where you manually specify initial_token for the seed nodes and use allocate_tokens_for_keyspace for the non-seed nodes.
Now I need to add another DC to the cluster using the same token allocation approach, but I can't figure out how to calculate initial_token for the new seed nodes. My naive approach was to copy the token values from one of the existing DCs, only to discover that initial token values must be unique across the whole cluster.
So, now I'm kinda lost on how to proceed. Any help will be appreciated, thanks.
Correct, token assignments must be unique across the cluster.
In the days before vnodes and automatic token calculation, we used to have to calculate tokens manually. For multiple data centers, we would use the same tokens offset by 1.
Example: If you use num_tokens: 4 and have 6 nodes in your DC, your nodes' initial tokens might look something like this:
node1: 8454757700450209793, -5380300354831952895, -768614336404564991, 3843071682022821889
node2: -9223372036854775807, -4611686018427387903, 1, 4611686018427387905
node3: -8454757700450211199, -3843071682022823935, 768614336404563969, 5380300354831951873
node4: -7686143364045646591, -3074457345618258943, 1537228672809127937, 6148914691236515841
node5: -6917529027641081855, -2305843009213693951, 2305843009213693953, 6917529027641081857
node6: -6148914691236517375, -1537228672809129983, 3074457345618257921, 7686143364045645825
If you added a second DC, then the tokens for the 6 new nodes in the other DC would look like this:
node1: 8454757700450209794, -5380300354831952894, -768614336404564990, 3843071682022821890
node2: -9223372036854775806, -4611686018427387902, 2, 4611686018427387906
node3: -8454757700450211198, -3843071682022823934, 768614336404563970, 5380300354831951874
node4: -7686143364045646590, -3074457345618258942, 1537228672809127938, 6148914691236515842
node5: -6917529027641081854, -2305843009213693950, 2305843009213693954, 6917529027641081858
node6: -6148914691236517374, -1537228672809129982, 3074457345618257922, 7686143364045645826
If you had to do this for a third DC, you would offset its starting tokens by 2.
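For illustration, here is a minimal Python sketch (not an official tool) of this kind of layout: evenly space nodes * num_tokens tokens over the Murmur3 range, deal them round-robin to the nodes, and shift every token by the DC's offset. It won't reproduce the exact values above, but it shows the even spacing and per-DC offset:

def initial_tokens(nodes, num_tokens, dc_offset):
    # Evenly spaced tokens over the full Murmur3 range (-2^63 to 2^63 - 1),
    # dealt round-robin so each node's tokens are spread around the ring.
    total = nodes * num_tokens
    return [[(2**64 // total) * (n + i * nodes) - 2**63 + dc_offset
             for i in range(num_tokens)]
            for n in range(nodes)]

for n, toks in enumerate(initial_tokens(6, 4, dc_offset=0), start=1):
    print("node%d: %s" % (n, ", ".join(map(str, toks))))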
I'm new to Cassandra, and I'm stuck at one point.
Consider I have a 5 node cluster with an RF=1 (for simplicity)
Token Ranges
==============
N1 : 1-100
N2 : 101-200
N3 : 201-300
N4 : 301-400
N5 : 401-500
I have a keyspace with 10 partition keys:
ID (PartitionKey) | Name
------------------------
1 Joe
2 Sarah
3 Eric
4 Lisa
5 Kate
6 Agnus
7 Lily
8 Angela
9 Rodger
10 Chris
10 partition keys ==> implies ==> 10 hash values
partitionkey ==> token generated
=================================
1 289 (goes on N3)
2 56 (goes on N1)
3 78 (goes on N1)
4 499 (goes on N5)
5 376 (goes on N4)
6 276 (goes on N3)
7 2 (goes on N1)
8 34 (goes on N1)
9 190 (goes on N2)
10 68 (goes on N1)
If this is the case, then:
N1 has the partition keys : 2,3,7,8,10
N2 has the partition keys : 9
N3 has the partition keys : 1,6
N4 has the partition keys : 5
N5 has the partition keys : 4
So we see that N1 is heavily loaded compared to the other nodes (as per my understanding).
Please help me understand how data is evenly distributed in Cassandra, w.r.t Partitioners and consistent hashing.
There is some truth to what you're posting here, mainly because data distribution via hashing is tough with small numbers of keys. But let's add one assumption: let's say we use vnodes, with num_tokens: 4* set in the cassandra.yaml.
So with this new assumption, token range distribution likely looks more like this:
Token Ranges
==============
N1 : 1-25, 126-150, 251-275, 376-400
N2 : 26-50, 151-175, 276-300, 401-425
N3 : 51-75, 176-200, 301-325, 426-450
N4 : 76-100, 201-225, 326-350, 451-475
N5 : 101-125, 226-250, 351-375, 476-500
Given this distribution, your keys are now placed like this:
N1 has the partition keys : 5, 7
N2 has the partition keys : 1, 6, 8
N3 has the partition keys : 2, 9, 10
N4 has the partition keys : 3
N5 has the partition keys : 4
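You can verify that placement with a quick Python sketch that maps the tokens from the question onto the vnode ranges above:

# Token ranges per node, taken from the table above.
ranges = {
    "N1": [(1, 25), (126, 150), (251, 275), (376, 400)],
    "N2": [(26, 50), (151, 175), (276, 300), (401, 425)],
    "N3": [(51, 75), (176, 200), (301, 325), (426, 450)],
    "N4": [(76, 100), (201, 225), (326, 350), (451, 475)],
    "N5": [(101, 125), (226, 250), (351, 375), (476, 500)],
}
# The generated tokens from the question, keyed by partition key.
tokens = {1: 289, 2: 56, 3: 78, 4: 499, 5: 376,
          6: 276, 7: 2, 8: 34, 9: 190, 10: 68}

for key, token in sorted(tokens.items()):
    node = next(n for n, rs in ranges.items()
                if any(lo <= token <= hi for lo, hi in rs))
    print("key %2d (token %3d) -> %s" % (key, token, node))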
Now factor in that there is a random component to the range allocation algorithm, and the actual distribution could be even better.
As with all data sets, the numbers get better as the amount of data increases. I'm sure that you'd see better distribution with 1000 partition keys vs. 10.
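If you want to see that effect yourself, here is a rough simulation (using MD5 as a stand-in for Murmur3, which is close enough for illustrating spread): hash N keys onto 5 equal slices of the token space and count keys per node:

import hashlib
from collections import Counter

def distribute(num_keys, num_nodes=5):
    counts = Counter()
    for k in range(num_keys):
        # Take 8 bytes of the hash as a token in [0, 2^64).
        token = int.from_bytes(hashlib.md5(str(k).encode()).digest()[:8], "big")
        counts["N%d" % (token * num_nodes // 2**64 + 1)] += 1
    return counts

print(distribute(10))    # lumpy: a few nodes get most of the keys
print(distribute(1000))  # much closer to 200 keys per node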
Also, as the size of your data set increases, data distribution will benefit from new nodes being added with allocate_tokens_for_keyspace set. This allows the token allocation algorithm to make smart (less random) decisions about token range assignment based on your keyspace's replication factor.
*Note: Using vNodes with num_tokens: 4 is considered by many Cassandra experts to be an optimal production setting. With the new algorithm, the default of 256 tokens is quite high.
Selecting the partition key is very important for achieving even distribution of data among all the nodes. The partition key should be something with very high cardinality.
For example, in a 10-node cluster, selecting the state of a specific country as the partition key may not be ideal, since there is a very high chance of creating hotspots, especially when the number of records is not even across states. Choosing something like zip code would be better, and something like customer name or order number would be better still.
You can explore having a composite partition key if it helps your use case.
In Cassandra, data is distributed based on the partition key and a hashing algorithm. There are several other parameters to configure for data distribution and replication, such as the replication factor, replication strategy, snitch, etc. Below is the standard recommended document:
https://docs.datastax.com/en/cassandra-oss/2.2/cassandra/architecture/archDataDistributeAbout.html
We are running 2 nodes in a cluster - replication factor 1.
After writing a burst of data, we see the following via node tool status.
Node 1 - load 22G (owns 48.2%)
Node 2 - load 17G (owns 51.8%)
As the payload size per record is exactly equal, what could lead to one node showing a higher load despite lower ownership?
nodetool status uses the Owns column to indicate the effective percentage of the token range owned by each node, while Load is the on-disk size of the data that node holds.
I don't see anything wrong here. Your data is almost evenly distributed across your two nodes, which is exactly what you want for good performance.
I have created an MVP for a nodejs project, following are some of the features that are relevant to the question I am about to ask:
1 - The application has a list of IP addresses with CRUD actions.
2 - The application will ping each IP address every 5 seconds.
3 - It will display each IP address's status, i.e. alive or dead, and the uptime if alive.
I created a working MVP in Node.js with the help of the net-ping, express, mongo, and angular libraries. Now I have a new feature request:
"to calculate the round trip time(latency) for each ping that is generated for each IP address and populate a bar chart or any type of chart that will display the RTT(latency) history(1 months-1 year) of every connection"
I need to store the response of each ping in the database. Assuming, in the best case, that each document I store is 0.5 kb in size, that makes 9.5MB of data stored each day, 285MB each month, and 3.4GB in a year for a single IP address, and I am going to have 100-200 IP addresses in my application.
What is the best solution (including those which are paid) that will suit the best for my requirements considering the app can scale more?
Time-series data requires special treatment from a database perspective, as it introduces challenges to traditional database management in terms of capacity, query performance, read/write optimisation targets, etc.
I wouldn't recommend you store this data in a traditional RDBMS, or object/document database.
The best option is to use a specialised time-series database engine, like InfluxDB, that supports downsampling (aggregation) and raw-data retention rules.
So I changed the schema design for the time-series data after reading this, and that reduced the numbers in my size calculation massively.
previous Schema looked like this:
{
timestamp: ISODate("2013-10-10T23:06:37.000Z"),
type: "Latency",
value: 1000000
},
{
timestamp: ISODate("2013-10-10T23:06:38.000Z"),
type: "Latency",
value: 15000000
}
Size of each document: 0.22kb
number of documents created in an hour = 720
size of data generated in an hour = 0.22 * 720 = 158.4kb
size of data generated by one IP address in a day = 158.4 * 24 ≈ 3.7MB
Since each successive timestamp is just a 5-second increment over the previous one, the schema can be optimized to cut out the redundant data.
The new schema looks like this :
{
timestamp_hour: ISODate("2013-10-10T23:00:00.000Z"), // truncated to the hour
type: "Latency",
values: { // will contain data for all pings in the specific hour, keyed by 5-second slot
0: 999999,
…
37: 1000000,
38: 1500000,
…
719: 2000000
}
}
Size of each document: 0.5kb
number of documents created in an hour = 1
size of data generated in an hour = 0.5kb
size of data generated by one IP address in a day = 0.5 * 24 = 12kb
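For completeness, here is a small hypothetical Python helper (the field names are my assumption, mirroring the schema above) that computes the hourly document id and the 5-second slot index (0-719) for a ping:

from datetime import datetime

def bucket_for(ts):
    # Truncate to the hour for timestamp_hour, then index the 720
    # five-second slots within that hour (3600 s / 5 s = 720).
    timestamp_hour = ts.replace(minute=0, second=0, microsecond=0)
    slot = (ts.minute * 60 + ts.second) // 5
    return timestamp_hour, slot

print(bucket_for(datetime(2013, 10, 10, 23, 6, 37)))
# (datetime.datetime(2013, 10, 10, 23, 0), 79)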
So I am assuming the size of the data will not be an issue anymore, and although there is a debate about what type of storage should be used in such scenarios to ensure the best performance, I am going to trust MongoDB in my case.
In my project, I use Cassandra 2.0 and have 3 database servers.
2 of the 3 servers have 2 TB of hard drive; the last has just 200 GB. So, I want the 2 bigger servers to handle a higher load than the last one.
Cassandra: I use Murmur3Partitioner to partition the data.
My question is: how can I calculate the initial_token for each cassandra instance?
Thanks for your help :)
If you are using a somewhat recent version of Cassandra (2.x) then you can configure the number of tokens a node should hold relative to other nodes in the cluster. There is no need to specify token range boundaries via the initial_token any more. Instead you give a node a "weight" through the num_tokens parameter. As the capacity of your smaller node is roughly 1/10th of the big ones, adjust the weight of that node accordingly. The default weight is 256. So you could start with a weight of 25 for the smaller node and try and see whether it works OK that way.
Murmur3Partitioner: uniformly distributes data across the cluster based on the MurmurHash hash value.
Murmur3Partitioner uses a maximum possible range of hash values from -2^63 to +2^63 - 1. Here is the formula to calculate tokens:
python -c 'print [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]'
For example, to generate tokens for 10 nodes:
python -c 'print [str(((2**64 / 10) * i) - 2**63) for i in range(10)]'
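Note that the one-liner above uses Python 2 print syntax; an equivalent for Python 3 (using // for integer division) would be:

python3 -c 'print([str((2**64 // 10) * i - 2**63) for i in range(10)])'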