I have a 21-node Cassandra cluster, and the seed node is one of the nodes seeing high disk usage, high memory usage, and a large number of client connections. Currently num_tokens=256 across all nodes.
My question: since seed nodes are responsible for processing the gossip info, should they hold less data than the other nodes, e.g. keep num_tokens=64 for seed nodes and num_tokens=256 for the other nodes? I could not find any information related to this. Does anyone know?
The purpose of a seed node is to bootstrap the gossip process for new nodes. Once a node has joined the cluster, the seed node doesn't have any special importance.
Your issue might be with the data model. Your data might have hot partitions, and those hot partitions might be on the seed node, which would cause more requests (reads/writes) on that specific node.
It is just a coincidence that the node is a seed node.
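To illustrate the hot-partition point, here is a hypothetical sketch (the table and column names are made up, not from the original post): a partition key with low cardinality funnels traffic to a few partitions, while adding a bucketing column to the partition key spreads the same data across many partitions and therefore across more nodes.

```cql
-- Hypothetical hot-partition-prone model: every event for a popular sensor_id
-- lands in one partition, so the replicas owning it absorb most of the load.
CREATE TABLE events_by_sensor (
    sensor_id  text,
    event_time timestamp,
    payload    text,
    PRIMARY KEY (sensor_id, event_time)
);

-- Adding a time bucket to the partition key splits that traffic across many
-- partitions, which the partitioner then spreads over many nodes.
CREATE TABLE events_by_sensor_bucketed (
    sensor_id  text,
    day_bucket date,
    event_time timestamp,
    payload    text,
    PRIMARY KEY ((sensor_id, day_bucket), event_time)
);
```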
Does Cassandra maintain the RF when a node goes down? For example, if the number of nodes is 5 and the RF is 2, then when a single node goes down, does the remaining replica copy its data to some other node to maintain the RF of 2?
In DataStax's documentation, it is mentioned that "If a node fails, the load is spread evenly across other nodes in the cluster". Does this mean that migration of data happens when a node goes down? Is this a feature available only in DataStax's Cassandra and not in Apache Cassandra?
No. Instead, a "hint" will be stored on the coordinator node and eventually written to the node that owns the token range once that node comes back up. Whether the write succeeds depends on your consistency level; in the above example the write will succeed if you are writing with consistency level ONE.
If the node is down only for a short period, it will receive the missed data via hints from the other nodes when it comes back. But if you decommission a node, then its data gets streamed to other nodes and those nodes take over the corresponding token ranges (the same rebalancing happens when a node is added to the cluster).
Over time the data in one replica can become inconsistent with the others, and the repair process helps Cassandra fix that - https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesTOC.html
This applies to Apache Cassandra as well.
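For illustration only (the keyspace and table names below are made up), this is roughly how you could exercise that behaviour from cqlsh: with RF=2 and one replica down, a write at consistency level ONE is still acknowledged, and the coordinator keeps a hint for the dead replica.

```cql
-- cqlsh: only one replica acknowledgement is required per request.
CONSISTENCY ONE;

-- With RF=2 and one replica down, the live replica accepts the write;
-- the coordinator stores a hint and replays it when the dead node returns.
INSERT INTO my_ks.users (user_id, name) VALUES (42, 'alice');
```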
I have a cluster of 4 Cassandra nodes. I have recently added a new node, but the data processing is taking too long. Is there a way to make this process faster? The nodetool output is attached as a screenshot.
Less data per node. Your screenshot shows 80TB per node, which is insanely high.
The recommendation is 1TB per node, 2TB at most. The logic behind this is that bootstrap times get too long (as you have noticed). A good Cassandra ring should be able to recover rapidly from node failure. What happens if other nodes fail while the first one is rebuilding?
Keep in mind that the typical model for Cassandra is lots of smaller nodes, in contrast to SQL where you would have a few really powerful servers. (Scale out vs scale up)
So, I would fix the problem by growing your cluster to have 10X - 20X the number of nodes.
https://groups.google.com/forum/m/#!topic/nosql-databases/FpcSJcN9Opw
In a Cassandra cluster, how can we ensure that all nodes hold roughly equal amounts of data, rather than one node having much more data and another very little?
If this scenario occurs, what are the best practices?
Thanks
It is OK to expect a slight variation of 5-10%. The most common causes are that the distribution of your partitions may not be truly random (more partitions land on some nodes), and that there may be a large variation in partition sizes (e.g. the smallest partition is a few kilobytes but the largest is 2GB).
There are also 2 other possible scenarios to consider.
SINGLE-TOKEN CLUSTER
If the tokens are not correctly calculated, some nodes may have a larger token range compared to others. Use the token generation tool to get a list of tokens that is correctly distributed around the ring.
If the cluster is deployed with DataStax Enterprise, the easiest way is to rebalance your cluster with OpsCenter.
VNODES CLUSTER
Confirm that you have allocated the same number of tokens in cassandra.yaml with the num_tokens directive.
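One way to sanity-check the vnode allocation without logging in to every box is to query the system tables from cqlsh; with evenly configured vnodes, every node should report the same number of entries in its tokens set (a quick sketch, nothing cluster-specific assumed):

```cql
-- Tokens owned by the node you are connected to.
SELECT tokens FROM system.local;

-- Tokens owned by every other node in the cluster.
SELECT peer, tokens FROM system.peers;
```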
Unless you are using ByteOrderedPartitioner for your cluster, that should not happen. See the DataStax documentation here for more information about the available partitioners and why this should not (normally) happen.
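If you want to confirm which partitioner the cluster is actually running, it is exposed in the system tables (a quick check from cqlsh; no assumptions about your schema):

```cql
-- Returns e.g. org.apache.cassandra.dht.Murmur3Partitioner
SELECT partitioner FROM system.local;
```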
While installing Cassandra on a single node to run some tests, we noticed that we were using an RF of 3 and everything was working correctly.
This is of course because that node has 256 vnodes (by default), so the same data can be replicated on the same node in different vnodes.
This is worrying, because if that one node were to fail, you'd lose all your data even though you thought the data was replicated on different nodes.
How can I be sure that in a standard installation (a ring with several nodes) the same data will not be replicated on the same "physical" node? Is there a setting to prevent Cassandra from using the same node to hold replicas of the same data?
Replication strategy is schema dependent. You probably used SimpleStrategy with RF=3 in your schema. That means that each piece of data will be placed on the node determined by the partition key, and successive replicas will be placed on successive nodes. In your case, the successive node is the same physical node, hence you get 3 copies of your data there.
Increasing the number of nodes solves your problem. In general, your data will be placed on different physical nodes when your replication factor RF is less than or equal to your number of nodes N.
The other solution is to switch replication strategy and use NetworkTopologyStrategy, usually used in multi-datacenter clusters, where you can specify how many replicas you want in each data center. This strategy "places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack. NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues."
Look at DataStax documentation for more information.
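As a concrete sketch (the keyspace names and the data-center name dc1 are placeholders, not from the original question), the two strategies are chosen when the keyspace is created:

```cql
-- SimpleStrategy: replicas go on consecutive nodes around the ring,
-- with no awareness of racks or data centers.
CREATE KEYSPACE test_simple
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- NetworkTopologyStrategy: the replica count is given per data center,
-- and Cassandra tries to place replicas on distinct racks within each DC.
CREATE KEYSPACE test_nts
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```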
Without vnodes, each physical node owns a single contiguous token range. With vnodes, each physical node owns multiple non-consecutive token ranges (each one a vnode), and those vnodes are randomly assigned to physical nodes.
This means that even when data gets replicated on the vnodes right next to the primary replica (i.e. when using SimpleStrategy), the replicas will - with high probability, but not guaranteed - end up on different physical nodes.
This random assignment can be seen in the output of nodetool ring.
More info can be found here.
Within a keyspace, Cassandra stores replicas on different nodes; it would be nonsensical to keep multiple replicas of the same data on the same node. If the replication factor exceeds the number of nodes, then the number of nodes effectively becomes your replication factor.
But, why is this not an error? Well, this allows for provisioning more nodes later.
As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.
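A minimal sketch of that sequence (the keyspace name my_ks is hypothetical): after raising the replication factor you also need to run a repair so the newly designated replicas actually receive the existing data.

```cql
-- Raise the replication factor once the extra nodes are (or will be) in place.
ALTER KEYSPACE my_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Afterwards, run `nodetool repair -full my_ks` on each node so the
-- new replicas are populated with the existing data.
```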
I have 3 nodes in a cluster:
Node1 = 127.0.0.1:9160
Node2 = 127.0.0.2:9161
Node3 = 127.0.0.3:9162
I want to use only one node (node1) for insertion. The other two nodes should be used for fault tolerance when writing millions of records, i.e. when node1 is down, either node2 or node3 should take over the writes. For that I formed a cluster with a replication factor of 2 and added the seed nodes properly in the cassandra.yaml file. It is working fine. But due to partitioning, whenever I write data to node1, the rows get scattered across all the nodes in the cluster. So is there any way to use the other nodes only for replication in the cluster? Or is there any way to disable the partitioning?
Thanks in advance.
No. Cassandra is a fully distributed system.
What are you trying to achieve here? We have a 6-node cluster with RF=3, and since PlayOrm fixed the config bug they had in astyanax, even if one node starts getting slow, requests automatically start going to the other nodes to keep the system fast. Why would you want to avoid great features like that? If your primary node got slow, you would be stuck in your setup.
If you describe your use-case better, we might be able to give you better ideas.