Cassandra multi-node balancing

I have added a new node into the cluster and was expecting the data on Cassandra to balance itself across nodes.
nodetool status yields:
$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.128.0.7 270.75 GiB 256 48.6% 1a3f6faa-4376-45a8-9c20-11480ae5664c rack1
UN 10.128.0.14 414.36 KiB 256 51.4% 66a89fbf-08ba-4b5d-9f10-55d52a199b41 rack1
The load on node 2 is only about 400 KiB. We store time-series data and query it. How can I rebalance the load between these nodes?
The configuration for both nodes is:
cluster_name: 'cluster1'
- seeds: "node1_ip, node2_ip"
num_tokens: 256
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false
thank you for your time :)

I have added a new node into the cluster and was expecting the data on Cassandra to balance itself across nodes.
Explicitly setting 'auto_bootstrap: false' tells it not to do that.
how can I rebalance the load?
Set your keyspace to a RF of 2.
Run nodetool -h 10.128.0.14 repair.
-Or-
Take 10.128.0.14 out of the cluster.
Set auto_bootstrap: true (or just remove the setting).
Start the node up again. It should join and stream data.
Pro-tip: With a data footprint of 270GB, you should have been running with more than one node to begin with. It would have been much easier to start with 3 nodes (which is probably the minimum you should be running on).
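A node that joined without bootstrapping shows up in nodetool status as owning roughly half the ring while holding almost no data. The following is a minimal Python sketch for spotting that pattern, assuming the column layout shown in the question. The names parse_status and underloaded and the 0.1 threshold are hypothetical choices for illustration, not part of nodetool.

```python
import re

# Size suffixes as printed in the Load column (older versions print KB/MB/GB).
UNITS = {
    "bytes": 1,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
    "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4,
}

# Matches node rows such as: UN 10.128.0.7 270.75 GiB 256 48.6% <host id> rack1
NODE_LINE = re.compile(
    r"^(?P<state>[UD][NLJM])\s+(?P<addr>\S+)\s+(?P<load>[\d.]+)\s+(?P<unit>\w+)\s+"
    r"(?P<tokens>\d+)\s+(?P<owns>[\d.]+)%"
)


def parse_status(text):
    """Extract address, load in bytes, and ownership percent from node rows."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes.append({
                "address": match["addr"],
                "load_bytes": float(match["load"]) * UNITS[match["unit"]],
                "owns_pct": float(match["owns"]),
            })
    return nodes


def underloaded(nodes, ratio=0.1):
    """Return nodes whose share of the load is far below their share of ownership."""
    total_load = sum(n["load_bytes"] for n in nodes)
    total_owns = sum(n["owns_pct"] for n in nodes)
    if total_load == 0 or total_owns == 0:
        return []
    flagged = []
    for n in nodes:
        load_share = n["load_bytes"] / total_load
        owns_share = n["owns_pct"] / total_owns
        if load_share < ratio * owns_share:
            flagged.append(n["address"])
    return flagged


SAMPLE = """\
UN 10.128.0.7 270.75 GiB 256 48.6% 1a3f6faa-4376-45a8-9c20-11480ae5664c rack1
UN 10.128.0.14 414.36 KiB 256 51.4% 66a89fbf-08ba-4b5d-9f10-55d52a199b41 rack1
"""

print(underloaded(parse_status(SAMPLE)))
```

With the output from the question, only 10.128.0.14 is flagged: it owns about half the ring but holds a negligible share of the bytes.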

Related

Elassandra replication information and rack configuration

I recently started working with an Elassandra cluster with two data centers which have been configured using NetworkTopologyStrategy.
Cluster details : Elassandra 6.2.3.15 = Elasticsearch 6.2.3 + Cassandra 3.11.4
Datacenter: DC1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip1> 50 GiB 256 ? 6cab1f4c-8937-437d-b010-0a5677443dc3 rack1
UN <ip2> 48 GiB 256 ? 6c9e7ad5-a642-4c0d-8b77-e78d821d904b rack1
UN <ip3> 50 GiB 256 ? 7e493bc6-c8a5-471e-8eee-3f3fe985b90a rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip4> 47 GiB 256 ? c49c1203-cc38-41a2-b9c8-2b42bc907c17 rack1
UN <ip5> 67 GiB 256 ? 0d9f31bc-9690-49b6-9d88-4fb30c1b6c0d rack1
UN <ip6> 88 GiB 256 ? 80c4d60d-185f-457a-ae9b-2eb611735f07 rack1
schema info
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
The DC2 is kind of a Disaster Recovery site and in an ideal world, we should be able to use only that in case of a disaster.
With the very limited knowledge I have, I strongly suspect that we need
to modify the rack configuration to have a 'proper' D/R cluster (So
that data in DC1 gets replicated in DC2) or am I getting this
wrong? If so, is there a standard guideline to follow?
When there are multiple DCs, does Cassandra automatically replicate this regardless of rack configurations? (Are racks kind of additional fail proof?)
DC2 has more data than DC1. Is this purely related to the hash function?
Are there any other things that can be rectified in this cluster?
Many thanks!
These replication settings mean that the data for your keyspace is replicated in real time between the 2 DCs with each DC having 3 replicas (copies):
CREATE KEYSPACE my_keyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC1': '3',
'DC2': '3'
}
Replication in Cassandra happens in real time: any write sent to one DC is sent to all other DCs at the same time. Unlike a traditional RDBMS or configurations with primary/secondary or active/DR, Cassandra does not replicate in a later, separate step.
Logical Cassandra racks are an additional redundancy mechanism. If you have C* nodes deployed in different (a) physical racks, or (b) public cloud availability zones, Cassandra will distribute the replicas to separate racks so each rack has a full copy of the data. With a replication factor of 3 in the DC spread over three racks, if a rack goes down for whatever reason then there are still full copies of the data in the remaining 2 racks, and read/write requests with a consistency of LOCAL_QUORUM (or lower) will not be affected.
I've explained this in a bit more detail in this post -- https://community.datastax.com/questions/1128/.
If you're new to Cassandra, we recommend https://www.datastax.com/dev which has links to short hands-on tutorials where you can quickly learn the basics of Cassandra -- all free. This tutorial is a good place to start -- https://www.datastax.com/try-it-out. Cheers!
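The LOCAL_QUORUM claim above comes down to simple arithmetic: a quorum is a majority of the replicas in the local DC, that is floor(RF / 2) + 1. A minimal Python sketch, where the function names are hypothetical and the replication map is copied from the keyspace definition above:

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum: a strict majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


# Per-DC replication factors from the keyspace definition in the question.
replication = {"DC1": 3, "DC2": 3}

for dc, rf in replication.items():
    print(f"{dc}: RF={rf}, LOCAL_QUORUM needs {quorum(rf)}, "
          f"tolerates {tolerated_failures(rf)} replica(s) down")
```

With RF=3 per DC, LOCAL_QUORUM needs 2 replicas, so losing one rack that holds one replica still leaves a quorum. With RF=2 the quorum is also 2, so no replica can be down.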

Cassandra removing node and use it separately

I have a few questions about Cassandra, waiting for suggestions from experts. Thank you.
I'm using 3 nodes with replication factor 3, and every node owns 100% of the data.
[root@datab ~]# nodetool status mydatabase
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.1.1 11.83 GiB 8 100.0% c0fb9cfc-b20a-4c65-93b6-8e14107cc411 r1
UN 192.168.1.2 20.13 GiB 8 100.0% dd940c60-9645-45ca-894d-f0af49a7bd83 r1
UN 192.168.1.3 17.27 GiB 8 100.0% 3031587e-5354-4342-9ddc-e5696985fc8c r1
I want to remove node 192.168.1.3 and use that server separately for testing. That is, I want it to keep 100% of the data up to the point where the node is removed.
I tried to decommission it, but then I couldn't use it as a separate instance.
For example, I have a table with 100 GB of data in it, and SELECT * queries are returning slowly. These 3 nodes run on separate hardware (servers). If I add 10 nodes on each server with Docker, will that make queries run faster?
For 3 nodes, what is the difference between replication factor 2 and 3? With replication factor 3, every node keeps 100% of the data, but whenever I alter the replication factor to 2, the ownership percentages drop within a few seconds. With replication factor 2, if I lose one of the servers, do I lose any data?
What are the proper steps to remove 1 node from dc1?
Change the replication factor to 2?
removenode ID?
Or first remove the node and then change the replication factor?
Thank you !!

Third Cassandra node has different load

We had a Cassandra cluster with 2 nodes in the same datacenter, with a replication factor of 2 for the keyspace "newts". When I ran nodetool status, I could see that the load was roughly the same on the two nodes, with each node owning 100%.
I went ahead and added a third node, and I can see all three nodes in the nodetool status output. I increased the replication factor to three, since I now have three nodes, and ran "nodetool repair" on the third node. However, when I now run nodetool status, I can see that the load differs between the three nodes even though each node owns 100%. How can this be, and is there something I'm missing here?
nodetool -u cassandra -pw cassandra status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.6 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.51 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 5.84 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
nodetool status newts output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.85 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.75 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 6.17 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
You added a node, so there are now three nodes, and you increased the replication factor to three. Each node therefore holds a copy of all your data and so owns 100% of it.
The differing "Load" values can result from not running nodetool cleanup on the two old nodes after adding the third: old data in their SSTables is not removed when the node is added, only later by a cleanup and/or compaction:
Load - updates every 90 seconds. The amount of file system data under
the cassandra data directory after excluding all content in the
snapshots subdirectories. Because all SSTable data files are included,
any data that is not cleaned up (such as TTL-expired cells or
tombstoned data) is counted.
(from https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsStatus.html)
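The 100% figure in the answer above follows from a simple ratio. As a rough sketch, assuming a single datacenter and tokens spread evenly across nodes, the effective ownership per node is about RF / N. The function name below is hypothetical.

```python
def expected_owns_pct(replication_factor, node_count):
    """Approximate 'Owns (effective)' per node for one DC with evenly spread tokens."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # A node cannot hold more than one copy of the data, hence the min().
    return min(replication_factor, node_count) / node_count * 100.0


for rf in (1, 2, 3):
    print(f"RF={rf}, 3 nodes: about {expected_owns_pct(rf, 3):.1f}% per node")
```

Note that this says nothing about Load: ownership describes which token ranges a node is responsible for, while Load counts the bytes actually on disk.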
Run nodetool repair on all 3 nodes, then run nodetool cleanup on the existing nodes one at a time, and then restart the nodes one after another. That seemed to work.

Reshuffle data evenly across Cassandra ring

I have a three-node ring of Apache Cassandra 2.1.12. I inserted some data when it was a 2-node ring and then added one more node, 172.16.5.54, to the ring. I am using vnodes. The problem is that the data is not distributed evenly, whereas ownership seems to be. How do I redistribute the data across the ring? I have tried nodetool repair and nodetool cleanup, but with no luck.
Moreover, what do the Load and Owns columns signify in the nodetool status output?
Also, if I import data from a file on one of these three nodes, CPU utilization goes up to 100%, and in the end the data on the other two nodes is distributed evenly, but not on the node running the import. Why is that?
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.16.5.54 1.47 MB 256 67.4% 40d07f44-eef8-46bf-9813-4155ba753370 rack1
UN 172.16.4.196 165.65 MB 256 68.3% 6315bbad-e306-4332-803c-6f2d5b658586 rack1
UN 172.16.3.172 64.69 MB 256 64.4% 26e773ea-f478-49f6-92a5-1d07ae6c0f69 rack1
The columns in the output are explained for cassandra 2.1.x in this doc. The load is the amount of file system data in the cassandra data directories. It seems unbalanced across your 3 nodes, which might imply that your partition keys are clustering on a single node (172.16.4.196), sometimes called a hot spot.
The Owns column is "the percentage of the data owned by the node per datacenter times the replication factor." So I can deduce your RF=2 because each node Owns roughly 2/3 of the data.
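The deduction above can be written out as arithmetic: within one datacenter the effective Owns values sum to roughly RF times 100%. A minimal sketch, using the percentages from the question (the function name is hypothetical):

```python
def estimate_replication_factor(owns_percentages):
    """Estimate RF for one datacenter from the 'Owns (effective)' column."""
    return round(sum(owns_percentages) / 100.0)


# Owns (effective) values from the nodetool status output in the question.
owns = [67.4, 68.3, 64.4]
print(estimate_replication_factor(owns))
```

Here the three values add up to about 200%, which points to RF=2.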
You need to fix the partition keys of your tables.
Cassandra distributes data to nodes based on partition keys: each key is hashed to a token, and each node owns ranges of the token space.
So, for some reason, you have a lot of data for a few partition key values and almost none for the rest.
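To see why a few heavy partition keys produce this picture, here is a deliberately simplified Python sketch of hash-based placement. It is not Cassandra's actual algorithm: it uses MD5 and a modulo over three hypothetical node names, whereas Cassandra's default partitioner uses Murmur3 with token ranges and replicates each partition RF times. The point it illustrates still holds: all rows of one partition key land on the same owner.

```python
import hashlib
from collections import Counter

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def owner(partition_key, nodes):
    """Map a partition key to one node by hashing it (simplified placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]


def rows_per_node(partition_keys, nodes):
    """Count how many rows each node receives, one row per key occurrence."""
    counts = Counter({node: 0 for node in nodes})
    for key in partition_keys:
        counts[owner(key, nodes)] += 1
    return dict(counts)


# 1000 rows spread over 1000 distinct partition keys.
even_keys = [f"sensor-{i}" for i in range(1000)]

# 1000 rows where 900 of them share a single partition key.
skewed_keys = ["sensor-0"] * 900 + [f"sensor-{i}" for i in range(1, 101)]

print("even:  ", rows_per_node(even_keys, NODES))
print("skewed:", rows_per_node(skewed_keys, NODES))
```

In the skewed case, whichever node owns "sensor-0" receives at least 900 of the 1000 rows no matter how the remaining keys hash, which is the hot spot described above. A partition key with more distinct values (for time series, for example a sensor id combined with a time bucket) spreads the rows more evenly.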

how to rebalance cassandra cluster after adding new node

I had a 3-node Cassandra cluster with a replication factor of 2. The nodes were running either dsc1.2.3 or dsc1.2.4. Each node had a num_tokens value of 256, and initial_token was commented out. This 3-node cluster was perfectly balanced, i.e. each node owned around 30% of the data.
One of the nodes crashed, so I started a new node and used nodetool to remove the node that had crashed. The new node was added to the cluster, but the two older nodes have most of the data now (47.0% and 52.3%) and the new node has just 0.7% of the data.
The output of nodetool status is
Datacenter: xx-xxxx
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.xxx.xxx.xxx 649.78 MB 256 47.0% ba3534b3-3d9f-4db7-844d-39a8f98618f1 1c
UN 10.xxx.xxx.xxx 643.11 MB 256 52.3% 562f7c3f-986a-4ba6-bfda-22a10e384960 1a
UN 10.xxx.xxx.xxx 6.84 MB 256 0.7% 5ba6aff7-79d2-4d62-b5b0-c5c67f1e1791 1c
How do I balance this cluster?
You didn't mention running a repair on the new node. If you haven't done that yet, it is likely the reason the new node holds so little data.
Until you run nodetool repair, the new node will only hold new data that gets written to it or data that read repair pulls in. With vnodes you generally shouldn't need to rebalance, if I'm understanding vnodes correctly, but I haven't moved to vnodes myself yet, so I may be wrong about that.
It looks like your new node hasn't bootstrapped. Did you set auto_bootstrap: true in your cassandra.yaml?
If you don't want to bootstrap, you can run nodetool repair on the new node and then nodetool cleanup on the two others until the distribution is fair.
