Data not distributed across cluster in Cassandra

We are using a 3-node cluster with REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1}.
But when we insert data, the same row appears to be present on all three nodes (I see it when I query each node individually).
When I run nodetool status, I see the following:
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.31.46.89 6.43 MiB 256 32.8% 2db6dc5c-9d05-4dc7-9bf5-ea9e3c406267 rack1
UN 172.31.47.150 13.17 MiB 256 32.1% eb10cc48-6117-427c-9151-48cb6761a5e6 rack1
DN 172.31.45.131 12.73 MiB 256 35.1% cc33fc04-a02f-41e2-a00b-3835a0d98cb5 rack1
Can anyone help me understand why the data appears to be present on all nodes?

Cassandra is masterless: when you send a query to any node in the cluster, that node asks the appropriate replica(s) to answer your query. With RF=1 the data is not stored on all nodes. If you really want to verify this, look at the data/&lt;keyspace&gt;/&lt;table&gt; directory on each node and run sstabledump on the Data.db file.
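For example, assuming a hypothetical table ks.users with a key 42 and the default data directory /var/lib/cassandra/data (adjust all of these to your setup), a rough way to check is:
# Ask Cassandra which node(s) hold the replica for a given partition key
nodetool getendpoints ks users 42
# On each node, see what is physically on disk for that table
# (the table directory name embeds a UUID, so it differs per cluster)
ls /var/lib/cassandra/data/ks/users-*/
# Dump one of the *-Data.db files listed above, e.g.:
sstabledump /var/lib/cassandra/data/ks/users-<uuid>/mc-1-big-Data.db
With RF=1, only the node returned by getendpoints should actually contain the row in its SSTables; the other nodes merely coordinated your query.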

Data will not be stored on all nodes when RF=1. Instead, whichever node you connect to acts as the coordinator: it fetches the data from the node responsible for that partition and returns the response.
The coordinator only stores the data locally (on a write) if it happens to be one of the nodes responsible for that data's token range.

Related

Why is the load different on a 3 node cluster with RF 3?

I have a 3-node Cassandra cluster with a replication factor of 3.
This means that all data should be replicated onto all 3 nodes.
The following is the output of nodetool status:
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.0.1 27.66 GB 256 100.0% 2e89198f-bc7d-4efd-bf62-9759fd1d4acc RAC1
UN 192.168.0.2 28.77 GB 256 100.0% db5fd62d-3381-42fa-84b5-7cb12f3f946b RAC1
UN 192.168.0.3 27.08 GB 256 100.0% 1ffb4798-44d4-458b-a4a8-a8898e0152a2 RAC1
I have a graph of disk usage over time on all 3 of the nodes (not reproduced here), and the sizes differ noticeably.
My question is: why do these sizes vary so much? Is it that compaction hasn't run at the same time?
I would say several factors could play a role here.
As you note, compaction will not run at the same time, so the number and contents of the SSTables will be somewhat different on each node.
The memtables will also not have been flushed to SSTables at the same time either, so right from the start, each node will have somewhat different SSTables.
If you're using compression for the SSTables, given that their contents are somewhat different, the amount of space saved by compressing the data will vary somewhat.
And even though you are using a replication factor of three, I would imagine the storage overhead for non-primary-range (replica) data differs slightly from that for primary-range data, and it's likely that more primary-range data is being mapped to one node than another.
So basically, unless each node saw the exact same sequence of writes at exactly the same time, they won't end up with exactly the same amount of data on disk.
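If you want to reduce the noise from flush and compaction timing before comparing sizes, one rough approach is the following (keyspace/table names are placeholders, and a major compaction can be expensive on a busy production cluster):
# Run on each node:
# Flush memtables so all nodes have their data on disk as SSTables
nodetool flush my_keyspace
# Force a major compaction so each node ends up with a minimal SSTable set
nodetool compact my_keyspace my_table
# Then compare the per-table on-disk sizes across nodes
nodetool cfstats my_keyspace.my_table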

Now getting error "message="Cannot achieve consistency level ONE" info={'required_replicas': 1, 'alive_replicas': 0, 'consistency': 'ONE'}"

Cassandra version: dsc-cassandra-2.1.9
Had 3 nodes, one of which was down for a long time. Brought it back up and decommissioned it. Then did a nodetool removenode.
When I try to make a cql query I get the above error.
Initially I thought this might be because the replication strategy was SimpleStrategy, so I did an ALTER KEYSPACE history WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'dc1' : 2};
and changed endpoint_snitch to GossipingPropertyFileSnitch instead of SimpleSnitch,
then did a nodetool repair on both nodes and restarted the Cassandra services.
But the problem is still there. What do I do?
EDIT 1: Nodetool status of machine A
-- Address Load Tokens Owns Host ID Rack
UN 192.168.99.xxx 19.8 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxx4ea RAC1
UN 192.168.99.xxx 18.79 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxxx15 RAC1
nodetool status output of machine B
-- Address Load Tokens Owns Host ID Rack
UN 192.168.99.xxx 19.8 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxxxxx4ea RAC1
UN 192.168.99.xxx 18.79 GB 256 ? xxxxxxxx-xxxx-xxx-xxxx-xxxxxxxxf15 RAC1
What is weird is that under the Owns column you have no %, only a ?. This same issue occurred to me in the past when I bootstrapped a new C* cluster using SimpleStrategy and SimpleSnitch. Like you, I did an ALTER KEYSPACE to switch to NetworkTopologyStrategy and GossipingPropertyFileSnitch, but it did not solve my issue, so I rebuilt the cluster from scratch (fortunately I had no data inside).
If you have a backup of your data somewhere, just try to rebuild the 2 nodes from scratch.
Otherwise, consider backing up the SSTable files on one node, rebuilding the cluster, and putting the SSTables back. Be careful, because some file/folder renaming may be necessary; a rough sketch of that path is below.
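This is only a sketch, assuming the default data directory /var/lib/cassandra/data and the keyspace history from your ALTER KEYSPACE; the table name and backup location are placeholders:
# 1. On one node, back up the existing SSTables for the keyspace
cp -a /var/lib/cassandra/data/history /backup/history_sstables
# 2. Rebuild the cluster, recreate the keyspace/tables with the desired
#    NetworkTopologyStrategy settings, then copy the files back into the new
#    table directory (directory names embed a table UUID, so the target
#    directory will not match the source exactly).
# 3. Tell Cassandra to pick up the newly placed SSTables without a restart
nodetool refresh history my_table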

Reshuffle data evenly across Cassandra ring

I have a three-node ring of Apache Cassandra 2.1.12. I inserted some data when it was a 2-node ring and then added one more node, 172.16.5.54, to the ring. I am using vnodes in my ring. The problem is that the data is not distributed evenly, whereas ownership seems distributed evenly. So, how do I redistribute the data across the ring? I have tried nodetool repair and nodetool cleanup, but still no luck.
Moreover, what do the Load and Owns columns signify in the nodetool status output?
Also, if I import data from a file on one of these three nodes, CPU utilization goes up to 100% and the data eventually gets distributed evenly on the other two nodes, but not on the node running the import. Why is that?
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.16.5.54 1.47 MB 256 67.4% 40d07f44-eef8-46bf-9813-4155ba753370 rack1
UN 172.16.4.196 165.65 MB 256 68.3% 6315bbad-e306-4332-803c-6f2d5b658586 rack1
UN 172.16.3.172 64.69 MB 256 64.4% 26e773ea-f478-49f6-92a5-1d07ae6c0f69 rack1
The columns in the output are explained for Cassandra 2.1.x in this doc. Load is the amount of file system data in the Cassandra data directories. It looks unbalanced across your 3 nodes, which might imply that your partition keys are clustering on a single node (172.16.4.196), sometimes called a hot spot.
The Owns column is "the percentage of the data owned by the node per datacenter times the replication factor." So I can deduce that your RF=2, since each node owns roughly 2/3 of the data (67.4% + 68.3% + 64.4% ≈ 200% = RF × 100%).
You need to fix the partition keys of your tables.
Cassandra distributes data to nodes based on the hash of the partition key (each node owns a range of the token ring).
So, for some reason, you have a lot of data for a few partition key values and almost none for the rest; one common fix is sketched below.
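A common way to spread a hot partition, sketched here with made-up table and column names, is to add an artificial bucket to the partition key so that one logical key hashes to several nodes:
-- Original schema: everything for one sensor lands in a single partition
CREATE TABLE readings_by_sensor (
    sensor_id    text,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
);
-- Bucketed schema: each sensor's data is split across several partitions,
-- so it spreads over multiple nodes instead of one hot spot.
CREATE TABLE readings_by_sensor_bucketed (
    sensor_id    text,
    bucket       int,      -- e.g. chosen by the client as hash(reading_time) % 10
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((sensor_id, bucket), reading_time)
);
Reads then have to fan out over all buckets for a sensor, so pick the bucket count as a trade-off between distribution and read cost.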

Replication Factor 3 but ownership total only 100% rather than 300%

I'm having a strange issue with some Cassandra clusters on a set of Solaris servers. Each of the clusters has 3 servers with its replication factor set to 3.
[admin#unknown] describe resolve;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
Keyspace: resolve:
Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Durable Writes: true
Options: [replication_factor:3]
Column Families:
But when we run the "nodetool ring" command it is reporting that each server owns only 33.33% of the data.
Datacenter: datacenter1
==========
Address Rack Status State Load Owns Token
113427455640312821154458202477256070484
10.46.36.187 rack1 Up Normal 44.78 GB 33.33% 0
10.46.36.189 rack1 Up Normal 39.59 GB 33.33% 56713727820156410577229101238628035242
10.46.36.190 rack1 Up Normal 34.78 GB 33.33% 113427455640312821154458202477256070484
In all other clusters with the same settings they report 100% ownership, so from this it appears that the replication factor being used is still 1. The other odd thing is that nodetool ring is not printing the "Replicas" line, while our other clusters do. All our clusters are currently version 1.2.5.
I've tried running "nodetool repair" on all the nodes and re-ran the "update keyspace" command to set the replication_factor, but the ownership percentage remains unchanged. Is there anything else I can look at or check to see why this is happening?
Edit:
This is what I normally see in my other clusters:
Datacenter: datacenter1
==========
Replicas: 3
Address Rack Status State Load Owns Token
113427455640312821154458202477256070484
10.50.2.65 rack1 Up Normal 126.65 KB 100.00% 0
10.50.2.66 rack1 Up Normal 122.15 KB 100.00% 56713727820156410577229101238628035242
10.50.2.67 rack1 Up Normal 122.29 KB 100.00% 113427455640312821154458202477256070484
You are misunderstanding the output. The Owns information is just telling you, in terms of tokens, how much of the ring (%) is handled by the specific node.
The replication factor has nothing to do with this information: each node is responsible for 1/3 of the possible generated partition key tokens. Take a look at the Token column.
If you want to verify your RF, write some data at QUORUM consistency level, take a node out of the cluster, and then read the data back at QUORUM; see the sketch below.
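In cqlsh that check could look roughly like this (the table resolve.rf_check and its values are placeholders; with RF=3, QUORUM needs 2 of 3 replicas, so the read should still succeed with one node down):
-- with all nodes up
CONSISTENCY QUORUM;
INSERT INTO resolve.rf_check (id, val) VALUES (1, 'hello');
-- stop one node, then, still at QUORUM:
SELECT * FROM resolve.rf_check WHERE id = 1;
-- With RF=3 this read succeeds; with RF=1 it fails whenever the stopped node
-- was the single replica for that key ("Cannot achieve consistency level ...").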
HTH
Carlo

How to rebalance a Cassandra cluster after adding a new node

I had a 3-node Cassandra cluster with a replication factor of 2. The nodes were running either dsc1.2.3 or dsc1.2.4. Each node had num_tokens set to 256 and initial_token was commented out. This 3-node cluster was perfectly balanced, i.e. each node owned around 30% of the data.
One of the nodes crashed, so I started a new node and removed the crashed node with nodetool removenode. The new node got added to the cluster, but the two older nodes now hold most of the data (47.0% and 52.3%) while the new node has just 0.7% of the data.
The output of nodetool status is
Datacenter: xx-xxxx
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.xxx.xxx.xxx 649.78 MB 256 47.0% ba3534b3-3d9f-4db7-844d-39a8f98618f1 1c
UN 10.xxx.xxx.xxx 643.11 MB 256 52.3% 562f7c3f-986a-4ba6-bfda-22a10e384960 1a
UN 10.xxx.xxx.xxx 6.84 MB 256 0.7% 5ba6aff7-79d2-4d62-b5b0-c5c67f1e1791 1c
How do i balance this cluster?
You didn't mention running a repair on the new node; if you haven't done that yet, it is likely the cause of the lack of data on the new node.
Until you run a nodetool repair, the new node will only hold the new data that gets written to it or the data that read repair pulls in. With vnodes you generally shouldn't need to rebalance, if I'm understanding vnodes correctly, but I haven't personally moved to using vnodes yet, so I may be wrong about that.
It looks like your new node hasn't bootstrapped. Did you set auto_bootstrap: true in your cassandra.yaml?
If you don't want to bootstrap, you can run nodetool repair on the new node and then nodetool cleanup on the two others until the distribution is fair; a sketch of that sequence is below.
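A minimal sketch of that sequence, assuming a placeholder keyspace name (both repair and cleanup are I/O heavy, so run them at a quiet time):
# On the new node: pull in the data it is now responsible for
nodetool repair my_keyspace
# On each of the two older nodes: drop data they no longer own
nodetool cleanup my_keyspace
# Check the distribution afterwards (passing the keyspace shows effective ownership)
nodetool status my_keyspace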
