Replication Factor 3 but ownership total only 100% rather than 300% - cassandra

I'm having a strange issue with some Cassandra clusters on a set of Solaris servers. Each of the clusters has 3 servers with its replication factor set to 3.
[admin@unknown] describe resolve;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
Keyspace: resolve:
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Durable Writes: true
Options: [replication_factor:3]
Column Families:
But when we run the "nodetool ring" command it is reporting that each server owns only 33.33% of the data.
Datacenter: datacenter1
==========
Address Rack Status State Load Owns Token
113427455640312821154458202477256070484
10.46.36.187 rack1 Up Normal 44.78 GB 33.33% 0
10.46.36.189 rack1 Up Normal 39.59 GB 33.33% 56713727820156410577229101238628035242
10.46.36.190 rack1 Up Normal 34.78 GB 33.33% 113427455640312821154458202477256070484
All our other clusters with the same settings report 100% ownership. From this it appears that the replication factor actually in use is still 1. The other odd thing is that nodetool ring is not printing the "Replicas" line here, while our other clusters do print it. All our clusters are currently on version 1.2.5.
I've tried running "nodetool repair" on all the nodes and re-ran the "update keyspace" command to set the replication_factor, but the ownership percentage remains unchanged. Is there anything else I can look at or check to see why this is happening?
Edit:
This is what I normally see in my other clusters:
Datacenter: datacenter1
==========
Replicas: 3
Address Rack Status State Load Owns Token
113427455640312821154458202477256070484
10.50.2.65 rack1 Up Normal 126.65 KB 100.00% 0
10.50.2.66 rack1 Up Normal 122.15 KB 100.00% 56713727820156410577229101238628035242
10.50.2.67 rack1 Up Normal 122.29 KB 100.00% 113427455640312821154458202477256070484

You are misunderstanding the output. The Owns information just tells you, in terms of tokens, what percentage of the ring is handled by each specific node.
The replication factor has nothing to do with this information: each node is responsible for 1/3 of the possible partition key tokens. Have a look at the Token column.
If you want to verify your replication factor, write some data at QUORUM consistency level, take a node out of the cluster, and then read the data back at QUORUM.
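For example, a minimal check along those lines in cqlsh (the rf_check table is made up just for illustration):
CONSISTENCY QUORUM;
CREATE TABLE resolve.rf_check (id int PRIMARY KEY, val text);
INSERT INTO resolve.rf_check (id, val) VALUES (1, 'hello');
-- stop Cassandra on one of the three nodes, then:
SELECT val FROM resolve.rf_check WHERE id = 1;
With RF=3 the read still succeeds (2 of the 3 replicas satisfy QUORUM); if the keyspace were effectively RF=1, the read would fail with an unavailable error whenever the downed node owns the row.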
HTH
Carlo

Related

I added nodes (10 nodes) but the cassandra-stress result is slower than on a single node?

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.170.128 317.66 MiB 256 62.4% 45e953bd-5cca-44d9-ba26-99e0db28398d rack1
UN 192.168.170.129 527.05 MiB 256 60.2% e0d2faec-9714-49cf-af71-bfe2f2fb0783 rack1
UN 192.168.170.130 669.08 MiB 256 60.6% eaa1e39b-2256-4821-bbc8-39e47debf5e8 rack1
UN 192.168.170.132 537.11 MiB 256 60.0% 126e151f-92bc-4197-8007-247e385be0a6 rack1
UN 192.168.170.133 417.6 MiB 256 56.8% 2eb9dd83-ab44-456c-be69-6cead1b5d1fd rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.170.136 386.12 MiB 256 41.0% 2e57fac6-95db-4dc3-88f7-936cd8038cac rack1
UN 192.168.170.137 518.74 MiB 256 40.9% b6d61651-7c65-4ac9-a5b3-053c77cfbd37 rack1
UN 192.168.170.138 554.43 MiB 256 38.6% f1ba3e80-5dac-4a22-9025-85e868685de5 rack1
UN 192.168.170.134 153.76 MiB 256 40.7% 568389b3-304b-4d8f-ae71-58eb2a55601c rack1
UN 192.168.170.135 350.76 MiB 256 38.7% 1a7d557b-8270-4181-957b-98f6e2945fd8 rack1
CREATE KEYSPACE grudb WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '2'} AND durable_writes = true;
That's my setting.
CL IS ONE.
In general, a 10-node cluster can sustain a higher throughput, but whether or not this actually translates into higher cassandra-stress scores depends on what exactly you're doing:
First, you need to ensure that the cassandra-stress client is not your bottleneck. For example, if the machine running cassandra-stress is at 100% CPU or network utilization, you will never get a better score even if you have 100 server nodes.
Second, you need to ensure that cassandra-stress's concurrency is high enough. In the extreme case, if cassandra-stress sends just one request after another, all you're doing is measuring latency, not throughput. Moreover, it doesn't help to have 100 nodes if you only send one request at a time to one of them. So please try increasing cassandra-stress's concurrency to see if that makes any difference.
Now that we've got the potential cassandra-stress issues out of the way, let's look at the server side. You didn't simply increase your cluster from 1 node to 10 nodes. If you had just done that, you'd rightly be surprised if performance didn't increase. But you did something else as well: you also greatly increased the work each write has to do - in your setup each write needs to go to 5 nodes (!), 3 in one DC and 2 in the other (those are the RFs you configured). So even in the best case, you can't expect write throughput on this cluster to be more than twice that of a single node. In fact, because of all the overhead of this replication, you should expect even less than twice the performance - so seeing similar performance is not surprising.
The above estimate was for write performance. For read performance, since you said you're using CL=ONE (you could use CL=LOCAL_ONE, by the way), read throughput should indeed scale linearly with the cluster's size. If it does not, I am guessing you have a setup problem like the ones described above (client bottlenecked or using too little concurrency).
Please try running the read and write benchmarks separately to better understand which of them is the main scalability problem.
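For example, something along these lines (counts, thread numbers and the node list are just placeholders) raises the client-side concurrency and benchmarks writes and reads separately:
cassandra-stress write n=1000000 cl=one -rate threads=200 -node 192.168.170.128,192.168.170.136
cassandra-stress read n=1000000 cl=one -rate threads=200 -node 192.168.170.128,192.168.170.136
If the score stops improving as you raise threads while the server nodes are far from saturated, the client machine is probably the bottleneck.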

Third Cassandra node has different load

We had a Cassandra cluster with 2 nodes in the same datacenter, with a replication factor of 2 for the keyspace "newts". If I ran nodetool status I could see that the load was roughly the same between the two nodes, with each node owning 100%.
I went ahead and added a third node, and I can see all three nodes in the nodetool status output. I increased the replication factor to three since I now have three nodes and ran "nodetool repair" on the third node. However, when I now run nodetool status I can see that the load differs between the three nodes, yet each node owns 100%. How can this be, and is there something I'm missing here?
nodetool -u cassandra -pw cassandra status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.6 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.51 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 5.84 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
nodetool status newts output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.85 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.75 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 6.17 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
Since you added a node (there are now three nodes) and increased your replication factor to three, each node holds a copy of your data and so owns 100% of it.
The different "Load" values can result from not running nodetool cleanup on the two old nodes after adding your third node - data that no longer belongs to them is not removed from their SSTables when the new node joins (only later, after a cleanup and/or compaction):
Load - updates every 90 seconds. The amount of file system data under the cassandra data directory after excluding all content in the snapshots subdirectories. Because all SSTable data files are included, any data that is not cleaned up (such as TTL-expired cells or tombstoned data) is counted.
(from https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsStatus.html)
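In this case that means something like the following, run on each of the two original nodes, one at a time (cleanup is I/O-heavy, so avoid running it everywhere at once):
nodetool -u cassandra -pw cassandra cleanup newts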
I just ran nodetool repair on all 3 nodes, then nodetool cleanup one by one on the existing nodes, and restarted the nodes one after another - that seems to have worked.

Cassandra - can't understand why this simple setup won't work? [duplicate]

This question already has answers here:
Understanding Local_Quorum
(2 answers)
Closed 6 years ago.
I am a newbie to Cassandra. I tried these two simple setups for a single data center cluster, but I can't understand why the second one won't work.
All nodes run Cassandra 3.3 and are configured as described here:
https://docs.datastax.com/en/cassandra/3.x/cassandra/initialize/initSingleDS.html
SETUP 1:
Cluster size of 3 nodes: 192.168.1.201, 192.168.1.202, and 192.168.1.203
Replication factor of: 2
Write consistency level of: QUORUM (2 nodes)
Read consistency level of: QUORUM (2 nodes)
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.201 4.57 MB 256 ? a0138a81-45f9-4df5-af97-362c1bd2e242 rack1
UN 192.168.1.202 1.67 MB 256 ? e8a73b59-8852-4e3d-951e-bf8e231d6b5f rack1
UN 192.168.1.203 4.87 MB 256 ? 7b02c94c-14c5-4b34-8a0d-dc16dec8c1f9 rack1
All 3 nodes are up!
SETUP 2:
Cluster size of 4 nodes - 192.168.1.201, 192.168.1.202, 192.168.1.203, and 192.168.1.204
Replication factor of: 2
Write consistency level of: QUORUM (2 nodes)
Read consistency level of: QUORUM (2 nodes)
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.201 3.87 MB 256 ? a0138a81-45f9-4df5-af97-362c1bd2e242 rack1
UN 192.168.1.202 2.54 MB 256 ? 42bcba40-3941-43af-b694-06c1d4f615cc rack1
UN 192.168.1.203 3.77 MB 256 ? 7b02c94c-14c5-4b34-8a0d-dc16dec8c1f9 rack1
DN 192.168.1.204 1.67 MB 256 ? e8a73b59-8852-4e3d-951e-bf8e231d6b5f rack1
As you can see, node 192.168.1.204 is down (I forced it down for testing purposes), and the other 3 nodes are still up!
But it won't work. Every time I run the query, it returns an error (using the DevCenter GUI tool):
"Not enough replicas available for query at consistency QUORUM (2 required but only 1 alive)"
If I use nodetool to removenode 192.168.1.204, so that SETUP 2 becomes SETUP 1, it works again.
I think SETUP 2 should work just as well as SETUP 1?
Can someone explain why?
To achieve a quorum (more than half of the replicas) you need replication factor / 2 + 1 replicas to respond.
You have a 4-node cluster with 3 nodes up. With a replication factor of 2 you need both replicas up (2/2 + 1 = 2 of the 2 replicas) for a quorum to succeed. If a piece of data belongs on the downed node, you cannot satisfy the quorum requirement, so you get that error. With a consistency level of ONE it would work, however. In order to have 1 node down and still achieve a quorum you need to set the replication factor to at least 3 (3/2 + 1 = 2 of the 3 replicas required).
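If you want the 4-node cluster to keep serving QUORUM reads and writes with one node down, raise the replication factor and repair; a minimal sketch (the keyspace name is just a placeholder, and use NetworkTopologyStrategy with 'dc1': 3 instead if your keyspace is defined that way):
ALTER KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
followed by nodetool repair mykeyspace on each node, so the additional replicas actually receive the existing data.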

Cassandra nodetool status shows ownership as 200%

After setting up a 3 node cassandra cluster (cassandra version - 2.1.9), I ran the "nodetool status" command. I realized that the effective ownership % sums up to 200%.
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN <IP> 105.35 KB 256 67.4% <HostID> rack1
UN <IP> 121.92 KB 256 63.3% <HostID> rack1
UN <IP3> 256.11 KB 256 69.3% <HostID> rack1
Does anyone know why we would get 200% ownership? Is it because of some replication factor? If so, how do I find out about that?
Thanks!
This is dependent on the replication factor of the keyspace you are displaying.
For example, if you create a keyspace like this:
CREATE KEYSPACE test_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 2 };
And then display the status of that keyspace:
nodetool status test_keyspace
Then the Owns column will sum to 200%.
If you used a replication factor of 3, it would sum to 300%, and if you used a replication factor of 1, it would sum to 100%.
To see how a keyspace is defined, go into cqlsh and enter desc keyspace test_keyspace
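For instance, continuing the example above, bumping the keyspace to a replication factor of 3 and re-checking would make the column sum to roughly 300%:
ALTER KEYSPACE test_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
nodetool status test_keyspace
(run nodetool repair afterwards so the new replicas actually receive the data).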

what is the meaning of owns% in cassandra and how to change it?

I've configured a Cassandra cluster (cassandra-1.1) of 4 instances.
I have 2 PCs and I'm running 2 instances on each PC.
The PCs are identical and have 20 GB of RAM.
But when I run nodetool it shows me different Owns %. The question is: why?
./bin/nodetool -p 8001 ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Status State Load Owns Token
51042355038140769519506191114765231718
172.16.40.32 datacenter1 rack1 Up Normal 11.12 KB 70.00% 0
127.0.0.2 datacenter1 rack1 Up Normal 11.31 KB 10.00% 17014118346046923173168730371588410572
172.16.40.202 datacenter1 rack1 Up Normal 6.7 KB 10.00% 34028236692093846346337460743176821145
127.0.0.3 datacenter1 rack1 Up Normal 11.18 KB 10.00% 51042355038140769519506191114765231718
my free -m output looks like this on both machines:
total used free shared buffers cached
Mem: 20119 9621 10497 0 281 7925
-/+ buffers/cache: 1414 18704
Swap: 2894 2 2892
The percentage is determined by the token distribution across the nodes. The token range for Cassandra goes from 0 to 2^127 (170141183460469231731687303715884105728). Your ring's tokens are not evenly distributed between 0 and 2^127, which is why you have one node with 70% ownership. You can use nodetool move to get your ring back in balance.
There is a simple Python script on the Cassandra wiki that will generate evenly balanced tokens, and I also wrote a simple tool to help visualize your ring topology.
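For the RandomPartitioner, the balanced tokens are simply i * 2^127 / node_count; for example, a quick one-liner for 4 nodes:
python -c "n=4; print('\n'.join(str(i*(2**127//n)) for i in range(n)))"
which prints 0, 42535295865117307932921825928971026432, 85070591730234615865843651857942052864 and 127605887595351923798765477786913079296; you can then move each node to its computed position with nodetool move <token>.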
