Cassandra nodetool status shows ownership as 200%

After setting up a 3-node Cassandra cluster (Cassandra version 2.1.9), I ran the "nodetool status" command. I realized that the effective ownership % sums up to 200%.
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN <IP1> 105.35 KB 256 67.4% <HostID1> rack1
UN <IP2> 121.92 KB 256 63.3% <HostID2> rack1
UN <IP3> 256.11 KB 256 69.3% <HostID3> rack1
Does anyone know why we would get 200% ownership? Is it because of the replication factor? If so, how do I find out what it is?
Thanks!

This is dependent on the replication factor of the keyspace you are displaying.
For example, if you create a keyspace like this:
CREATE KEYSPACE test_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 2};
And then display the status of that keyspace:
nodetool status test_keyspace
Then the Owns column will sum to 200%.
If you used a replication factor of 3, it would sum to 300%, and if you used a replication factor of 1, it would sum to 100%.
To see how a keyspace is defined, go into cqlsh and enter desc keyspace test_keyspace
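For example, for the keyspace above, the output would look something like this (illustrative):
cqlsh> DESC KEYSPACE test_keyspace

CREATE KEYSPACE test_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '2'} AND durable_writes = true;
The 'datacenter1': '2' entry is the per-datacenter replication factor, which is why the Owns column sums to 200%.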

Related

Elassandra replication information and rack configuration

I recently started working with an Elassandra cluster with two data centers which have been configured using NetworkTopologyStrategy.
Cluster details: Elassandra 6.2.3.15 = Elasticsearch 6.2.3 + Cassandra 3.11.4
Datacenter: DC1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip1> 50 GiB 256 ? 6cab1f4c-8937-437d-b010-0a5677443dc3 rack1
UN <ip2> 48 GiB 256 ? 6c9e7ad5-a642-4c0d-8b77-e78d821d904b rack1
UN <ip3> 50 GiB 256 ? 7e493bc6-c8a5-471e-8eee-3f3fe985b90a rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN <ip4> 47 GiB 256 ? c49c1203-cc38-41a2-b9c8-2b42bc907c17 rack1
UN <ip5> 67 GiB 256 ? 0d9f31bc-9690-49b6-9d88-4fb30c1b6c0d rack1
UN <ip6> 88 GiB 256 ? 80c4d60d-185f-457a-ae9b-2eb611735f07 rack1
schema info
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
DC2 is kind of a disaster recovery (D/R) site, and in an ideal world we should be able to use only that one in case of a disaster.
With the very limited knowledge I have, I strongly suspect that we need to modify the rack configuration to have a 'proper' D/R cluster (so that data in DC1 gets replicated to DC2), or am I getting this wrong? If so, is there a standard guideline to follow?
When there are multiple DCs, does Cassandra automatically replicate data regardless of the rack configuration? (Are racks a kind of additional failure protection?)
DC2 has more data than DC1. Is this purely related to the hash function?
Are there any other things that could be rectified in this cluster?
Many thanks!
These replication settings mean that the data for your keyspace is replicated in real time between the 2 DCs with each DC having 3 replicas (copies):
CREATE KEYSPACE my_keyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC1': '3',
'DC2': '3'
}
Replication in Cassandra happens in real time -- any write sent to one DC is sent to all other DCs at the same time. Unlike traditional RDBMS or primary/secondary and active/DR configurations, Cassandra replication is immediate.
The logical Cassandra racks are an additional redundancy mechanism. If you have C* nodes deployed in different (a) physical racks, or (b) public cloud availability zones, Cassandra will distribute the replicas across separate racks so each rack holds a full copy of the data. With a replication factor of 3 in the DC, if a rack goes down for whatever reason, there are still full copies of the data in the remaining 2 racks, and read/write requests with a consistency of LOCAL_QUORUM (or lower) will not be affected.
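As a small illustration, the consistency level can be set per session in cqlsh before issuing requests (my_table is a hypothetical table in the keyspace from the question):
cqlsh> CONSISTENCY LOCAL_QUORUM;
Consistency level set to LOCAL_QUORUM.
cqlsh> SELECT * FROM my_keyspace.my_table WHERE id = 1;
With RF 3 per DC, LOCAL_QUORUM needs 2 of the 3 local replicas, so requests keep succeeding even if one rack's replica is down.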
I've explained this in a bit more detail in this post -- https://community.datastax.com/questions/1128/.
If you're new to Cassandra, we recommend https://www.datastax.com/dev which has links to short hands-on tutorials where you can quickly learn the basics of Cassandra -- all free. This tutorial is a good place to start -- https://www.datastax.com/try-it-out. Cheers!

Third Cassandra node has different load

We had a Cassandra cluster with 2 nodes in the same datacenter, with a replication factor of 2 for the keyspace "newts". If I ran nodetool status I could see that the load was roughly the same between the two nodes, with each node owning 100%.
I went ahead and added a third node, and I can see all three nodes in the nodetool status output. I increased the replication factor to three, since I now have three nodes, and ran "nodetool repair" on the third node. However, when I now run nodetool status I can see that the load differs between the three nodes, yet each node owns 100%. How can this be, and is there something I'm missing here?
nodetool -u cassandra -pw cassandra status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.6 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.51 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 5.84 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
nodetool status newts output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 84.19.159.94 38.85 GiB 256 100.0% 2d597a3e-0120-410a-a7b8-16ccf9498c55 rack1
UN 84.19.159.93 42.75 GiB 256 100.0% f746d694-c5c2-4f51-aa7f-0b788676e677 rack1
UN 84.19.159.92 6.17 GiB 256 100.0% 8f034b7f-fc2d-4210-927f-991815387078 rack1
Since you added a third node and increased your replication factor to three, each node now holds a copy of your data and so owns 100% of it.
The differing "Load" values can result from not running nodetool cleanup on the two old nodes after adding your third node - data that no longer belongs to a node is not removed from its SSTables when the new node joins (only later, after a cleanup and/or compaction):
Load - updates every 90 seconds. The amount of file system data under the cassandra data directory after excluding all content in the snapshots subdirectories. Because all SSTable data files are included, any data that is not cleaned up (such as TTL-expired cells or tombstoned data) is counted.
(from https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsStatus.html)
Just run nodetool repair on all 3 nodes, then run nodetool cleanup one by one on the pre-existing nodes and restart the nodes one after another - that worked for me.
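A minimal sketch of that sequence, reusing the credentials from the question (run cleanup on one node at a time):
# on every node: make all replicas consistent first
nodetool -u cassandra -pw cassandra repair
# then on each of the two pre-existing nodes: drop the data
# that no longer belongs to that node
nodetool -u cassandra -pw cassandra cleanup
After the cleanups (and subsequent compactions) the Load values should converge.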

Cassandra multi-region settings/optimization

I configured two DCs with replication in two regions (NCSA and EMEA) using JanusGraph (Gremlin/Cassandra/Elasticsearch). The replication works well and everything, however the performance is not that great.
I get times of around 250ms just for a read on a node in NCSA (vs 30ms when I have only 1 DC / 1 node), and around 800ms for a write.
I tried to modify some configuration:
storage.cassandra.replication-factor
storage.cassandra.read-consistency-level
storage.cassandra.write-consistency-level
Are there any other settings/configurations that I could modify in order to get better performance for a multi-region setup, or is that kind of performance expected with JanusGraph/Cassandra?
Thanks
The lowest times I was able to get were with:
storage.replication-strategy-class=org.apache.cassandra.locator.NetworkTopologyStrategy
storage.cassandra.replication-factor=6
storage.cassandra.read-consistency-level=ONE
storage.cassandra.write-consistency-level=ONE
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.130.xxx.xxx 184.02 KB 256 100.0% 7c4c23f4-0112-4023-8af1-81a179f68973 RAC2
UN 10.130.xxx.xxx 540.67 KB 256 100.0% 193f0814-649f-4450-8b2e-85344f2c3cf2 RAC3
UN 10.130.xxx.xxx 187.47 KB 256 100.0% fbbc42d6-a061-4604-935e-dbe1155d4017 RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.30.xxx.xxx 93.3 KB 256 100.0% e7221808-ccb4-414a-b5b6-6e578ecb6f25 RAC3
UN 10.30.xxx.xxx 287.62 KB 256 100.0% ca868262-4b5d-44d6-80f9-25439f8d2611 RAC2
UN 10.30.xxx.xxx 282.27 KB 256 100.0% 82d0f75d-635c-4016-84ca-ef9d1afda066 RAC1
JanusGraph comes with different cache levels; activating some of them may help.
Regarding the consistency level, in a multi-DC configuration the LOCAL_xxx values will provide better performance, but for safety I would also set the name of the local (or closest) Cassandra datacenter (configuration parameter: storage.cassandra.astyanax.local-datacenter).
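A sketch of what that could look like in the JanusGraph/Titan properties file; the DC1 value is an assumption taken from the nodetool output above, and LOCAL_ONE is just the local-DC analogue of the ONE level used in the question:
storage.cassandra.read-consistency-level=LOCAL_ONE
storage.cassandra.write-consistency-level=LOCAL_ONE
# assumed local DC name -- use the datacenter name shown by nodetool status
storage.cassandra.astyanax.local-datacenter=DC1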
Are you able to say where the time is spent (in the Cassandra layer or in the JanusGraph layer)? To see the response time of Cassandra itself, you can run nodetool proxyhistograms, which shows the full request latency recorded by the coordinator.

distribute graph data in cassandra

I am loading some graph data using the Titan API, with Cassandra configured as the storage backend. My graph data has around 1 million vertices. I want this data to be distributed across N Cassandra nodes.
For this, I configured 3 nodes on the same system with IPs 127.0.0.1, 127.0.0.2 and 127.0.0.3 for each node. The output of nodetool status shows all 3 IPs, with the load shared equally.
I tried loading a graph, but the whole dataset gets replicated on all 3 nodes (1M vertices in node1, 1M vertices in node2 and 1M vertices in node3). I want the data to be distributed across all 3 nodes, like 1M/3 in node1, 1M/3 in node2 and 1M/3 in node3.
output of DESCRIBE KEYSPACE TITAN:
CREATE KEYSPACE titan WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
output of nodetool status:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 7.79 MB 1 ? f5a689f0-f4c1-4f68-ab81-58066e986cd4 rack1
UN 127.0.0.2 229.79 KB 1 ? b6940e7e-b6eb-4d1f-959e-b5bd0f5cea15 rack1
UN 127.0.0.3 7.11 MB 1 ? a3244b16-a73c-4801-868f-05de09615ed9 rack1
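Note that the Owns column shows "?" because no keyspace was passed to nodetool status; as the first answer above explains, effective ownership is computed per keyspace, e.g.:
nodetool status titan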
Can someone please share the details of the correct configuration to distribute the load? Please correct me in case anything is wrong.
Thanks,
Hari

Replication Factor 3 but ownership total only 100% rather than 300%

I'm having a strange issue with some Cassandra clusters on a set of Solaris servers. Each of the clusters has 3 servers with its replication factor set to 3.
[admin@unknown] describe resolve;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
Keyspace: resolve:
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Durable Writes: true
Options: [replication_factor:3]
Column Families:
But when we run the "nodetool ring" command it is reporting that each server owns only 33.33% of the data.
Datacenter: datacenter1
==========
Address Rack Status State Load Owns Token
113427455640312821154458202477256070484
10.46.36.187 rack1 Up Normal 44.78 GB 33.33% 0
10.46.36.189 rack1 Up Normal 39.59 GB 33.33% 56713727820156410577229101238628035242
10.46.36.190 rack1 Up Normal 34.78 GB 33.33% 113427455640312821154458202477256070484
All our other clusters with the same settings report 100% ownership. From this it appears that the replication factor being used is still 1. The other odd thing is that nodetool ring is not printing the "Replicas" line, while our other clusters do. All our clusters are currently version 1.2.5.
I've tried running the "nodetool repair" command on all the nodes and re-ran the "update keyspace" command to set the replication_factor, but the ownership percentage remains unchanged. Is there anything else I can look at or check to see why this is happening?
Edit:
This is what I normally see in my other clusters:
Datacenter: datacenter1
==========
Replicas: 3
Address Rack Status State Load Owns Token
113427455640312821154458202477256070484
10.50.2.65 rack1 Up Normal 126.65 KB 100.00% 0
10.50.2.66 rack1 Up Normal 122.15 KB 100.00% 56713727820156410577229101238628035242
10.50.2.67 rack1 Up Normal 122.29 KB 100.00% 113427455640312821154458202477256070484
You are misunderstanding the output. The Owns information is just telling you, in terms of tokens, how much of the ring (%) is handled by the specific node.
The replication factor has nothing to do with this information: each node is responsible for 1/3 of the possible generated partition key tokens. Take a look at the Token column.
If you want to verify your RF, write some data with QUORUM consistency level, take a node out of the cluster, and then ask for the data with QUORUM again.
HTH,
Carlo
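A minimal sketch of that check in cqlsh (the kv table and values are hypothetical):
cqlsh> CONSISTENCY QUORUM;
cqlsh> INSERT INTO resolve.kv (id, value) VALUES (1, 'check');
-- stop one of the three nodes, then:
cqlsh> SELECT * FROM resolve.kv WHERE id = 1;
With a working RF of 3, QUORUM needs 2 of 3 replicas, so the read still succeeds with one node down; with an effective RF of 1, reads for keys owned by the stopped node would fail.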
