Cassandra partitioned despite having connectivity and matching schema

I have a 3-node Cassandra (3.0.15) cluster whose nodes seem to have partitioned. The nodetool status and nodetool describecluster outputs are inconsistent when viewed from different nodes. Gossip information (nodetool gossipinfo) appears to be up to date on all nodes. There are no errors in the logs other than read timeouts, which seem to be caused by the partitioning.
Attempts to resolve this that didn't work:
Rolling restart
Full cluster restart
disable/enable gossip on each node
Node1 (192.168.2.247):
$ nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
59ffe9aa-aca7-34b3-8c5e-b736d221b922: [192.168.2.248, 192.168.2.247]
UNREACHABLE: [192.168.2.249]
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.2.248 210.23 GB 256 67.7% f6593c49-6739-4b8c-a8d9-8321915a660d rack1
DN 192.168.2.249 190.9 GB 256 68.6% 6ac211c2-bab1-4e2b-bc84-18d911e005d0 rack1
UN 192.168.2.247 157.82 GB 256 63.7% eb462857-cbfb-4e41-8be9-5d241d273b81 rack1
Node2 (192.168.2.248):
$ nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
59ffe9aa-aca7-34b3-8c5e-b736d221b922: [192.168.2.248, 192.168.2.249, 192.168.2.247]
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.2.248 210.24 GB 256 67.7% f6593c49-6739-4b8c-a8d9-8321915a660d rack1
UN 192.168.2.249 190.9 GB 256 68.6% 6ac211c2-bab1-4e2b-bc84-18d911e005d0 rack1
UN 192.168.2.247 157.82 GB 256 63.7% eb462857-cbfb-4e41-8be9-5d241d273b81 rack1
Node3 (192.168.2.249):
$ nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
59ffe9aa-aca7-34b3-8c5e-b736d221b922: [192.168.2.248]
UNREACHABLE: [192.168.2.249, 192.168.2.247]
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.2.248 210.24 GB 256 67.7% f6593c49-6739-4b8c-a8d9-8321915a660d rack1
UN 192.168.2.249 190.92 GB 256 68.6% 6ac211c2-bab1-4e2b-bc84-18d911e005d0 rack1
UN 192.168.2.247 157.84 GB 256 63.7% eb462857-cbfb-4e41-8be9-5d241d273b81 rack1
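One way to make the inconsistency explicit is to collect the nodetool status output from every node and compare the views mechanically. The following is a minimal sketch, not an official tool: the helper names parse_status and disagreements are hypothetical, and it assumes each node row starts with the two-letter state followed by an IPv4 address, as in the outputs above.

```python
import re

# Matches a node row such as "UN 192.168.2.248 210.23 GB 256 ..."
# (assumption: two-letter state code, then an IPv4 address).
ROW = re.compile(r"^([UD][NLJM])\s+(\d{1,3}(?:\.\d{1,3}){3})\s")


def parse_status(text):
    """Return {address: state} for every node row in `nodetool status` output."""
    states = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def disagreements(views):
    """views maps an observer name to its parsed status.

    Returns {address: {observer: state}} for every address that is not
    reported with the same state by all observers.
    """
    result = {}
    addresses = set().union(*(view.keys() for view in views.values()))
    for address in sorted(addresses):
        seen = {observer: view.get(address) for observer, view in views.items()}
        if len(set(seen.values())) > 1:
            result[address] = seen
    return result
```

Fed with the three outputs above, such a comparison would single out 192.168.2.249, which Node1 reports as DN while Node2 and Node3 report it as UN.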

Related

Cassandra multi dc replication nodetool rebuild issue and schema mismatch

Team,
I have a Cassandra cluster of 6 nodes and I've added a new datacenter to the existing setup of 5 nodes. I've followed all the steps, but I get the error below when I run nodetool rebuild on the new DC's nodes.
nodetool rebuild -- datacenter1
nodetool: Unable to find sufficient sources for streaming range (389537640310009811,390720100939923035] in keyspace system_distributed
See 'nodetool help' or 'nodetool help <command>'.
nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.21.201.205 1.75 GiB 256 16.5% ff0accd4-c33a-4984-967f-3ec763fe5414 rack1
UN 172.21.201.45 1.55 GiB 256 17.0% d3ac5afa-d561-43ee-89e2-db1d20c59b38 rack1
UN 172.21.201.44 2.37 GiB 256 17.0% 73d8e6c6-0aa3-4a91-80fc-8c7068c78a64 rack1
UN 172.21.201.207 1020.15 MiB 256 16.0% 5751ea7d-b339-43b3-bcfe-89fcbc60dea0 rack1
UN 172.21.201.46 1.64 GiB 256 17.0% 1c1afbfc-6a4b-40f0-a4c3-1eaa543eb2d5 rack1
UN 172.21.201.206 1.13 GiB 256 17.2% b11bfef9-e708-45cc-9ab3-e52983834096 rack1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.41.6.155 983.91 KiB 256 17.3% bf7244bb-70dc-4d91-8131-cbe4886f09e7 rack1
UN 10.41.6.157 946.36 KiB 256 15.5% 5499e7cc-db23-4163-8f0c-8f437f61bd6f rack1
UN 10.41.6.156 1.14 MiB 256 15.3% f27e94a6-7e1c-4177-9f88-36d821a7808d rack1
UN 10.41.6.159 659.3 KiB 256 17.3% 453e97df-5b83-4798-9e5e-a13bbb33acee rack1
UN 10.41.6.158 909.51 KiB 256 18.2% a4bc046a-e2ef-4fd4-9ab7-18be642a4d5a rack1
UN 10.41.6.160 1.08 MiB 256 15.5% 267cf9d0-cd55-4186-a737-998443125b19 rack1
nodetool describecluster output
# nodetool describecluster
Cluster Information:
Name: Recommendation
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
ea63e099-37c5-3d7b-9ace-32f4c833653d: [10.41.6.155, 10.41.6.157, 10.41.6.156, 10.41.6.159, 10.41.6.158, 10.41.6.160]
fd507b64-3070-3ffd-8217-f45f2f188dfc: [172.21.201.205, 172.21.201.45, 172.21.201.44, 172.21.201.207, 172.21.201.206, 172.21.201.46]
OLD DC Cassandra version - 3.1.1
NEW DC Cassandra version - 3.11.4
Can someone quickly help me fix this issue?
Changing the replication of the system_distributed keyspace to NetworkTopologyStrategy, with both DCs included, worked for me.
Command executed
ALTER KEYSPACE system_distributed WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 2, 'dc1' : 2};
Output
system_distributed | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2', 'dc1': '2'}
After further research, I learnt that the difference in Cassandra versions between the two DCs caused this issue. Once I matched the Cassandra version on both DCs, the schema mismatch was resolved.
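The schema mismatch in this thread is visible in the "Schema versions:" part of nodetool describecluster, where the two DCs report different version IDs. A small sketch of how that check could be automated is shown here; the function names schema_versions and in_agreement are made up for illustration, and the parsing assumes the line layout shown in the outputs above.

```python
import re

# A schema line looks like "<uuid>: [host, host, ...]" or "UNREACHABLE: [host, ...]".
VERSION_LINE = re.compile(r"^\s*([0-9a-f-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(text):
    """Return {version_id_or_UNREACHABLE: [hosts]} from describecluster output."""
    versions = {}
    for line in text.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def in_agreement(versions):
    """True only if exactly one schema version is reported and no host is unreachable."""
    real = [version for version in versions if version != "UNREACHABLE"]
    return len(real) == 1 and "UNREACHABLE" not in versions
```

For the output above, schema_versions would return two version IDs with six hosts each, so in_agreement would be False until both DCs converge on one version.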

Cassandra UNREACHABLE node

I have a 3-node Cassandra v3.9 cluster with replication factor 2. The nodes are:
10.0.0.11, 10.0.0.12, 10.0.0.13
10.0.0.11 and 10.0.0.12 are seed nodes.
What could be the possible reasons for the following error in /etc/cassandra/conf/debug.log?
The error is:
DEBUG [RMI TCP Connection(174)-127.0.0.1] 2017-07-12 04:47:49,002 StorageProxy.java:2254 - Hosts not in agreement. Didn't get a response from everybody: 10.0.0.13
UPDATE1:
Here are some statistics from all the servers at the time of the error:
[user1@ip-10-0-0-11 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.0.0.12 2.55 MiB 256 67.2% 83a68750-2238-4a6e-87be-03a3d7246824 rack1
UN 10.0.0.11 1.78 GiB 256 70.6% 052fda9d-0474-4dfb-b2f8-0c5cbec15266 rack1
UN 10.0.0.13 1.78 GiB 256 62.2% 86438dc9-77e0-43b2-a672-5b2e7cf216bf rack1
[user1@ip-10-0-0-11 ~]$ nodetool describecluster
Cluster Information:
Name: PiedmontCluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
3c8d9e82-c688-3d16-a3e9-b84894168283: [10.0.0.12, 10.0.0.11]
UNREACHABLE: [10.0.0.13]
[pnm@ip-10-0-0-13 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.0.0.12 2.55 MiB 256 67.2% 83a68750-2238-4a6e-87be-03a3d7246824 rack1
UN 10.0.0.11 1.78 GiB 256 70.6% 052fda9d-0474-4dfb-b2f8-0c5cbec15266 rack1
UN 10.0.0.13 1.78 GiB 256 62.2% 86438dc9-77e0-43b2-a672-5b2e7cf216bf rack1
[pnm@ip-10-0-0-13 ~]$ nodetool describecluster
Cluster Information:
Name: PiedmontCluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
3c8d9e82-c688-3d16-a3e9-b84894168283: [10.0.0.12, 10.0.0.13]
UNREACHABLE: [10.0.0.11]
[user1@ip-10-0-0-12 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.0.0.12 2.55 MiB 256 67.2% 83a68750-2238-4a6e-87be-03a3d7246824 rack1
UN 10.0.0.11 1.78 GiB 256 70.6% 052fda9d-0474-4dfb-b2f8-0c5cbec15266 rack1
UN 10.0.0.13 1.78 GiB 256 62.2% 86438dc9-77e0-43b2-a672-5b2e7cf216bf rack1
[user1@ip-10-0-0-12 ~]$ nodetool describecluster
Cluster Information:
Name: PiedmontCluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
3c8d9e82-c688-3d16-a3e9-b84894168283: [10.0.0.12, 10.0.0.11, 10.0.0.13]
The error mentioned above is in /var/log/cassandra/debug.log on 10.0.0.11.
The error in /var/log/cassandra/debug.log on 10.0.0.13 is:
DEBUG [RMI TCP Connection(4)-127.0.0.1] 2017-07-13 02:31:23,846 StorageProxy.java:2254 - Hosts not in agreement. Didn't get a response from everybody: 10.0.0.11
ERROR [MessagingService-Incoming-/10.0.0.11] 2017-07-13 02:35:04,982 CassandraDaemon.java:226 - Exception in thread Thread[MessagingService-Incoming-/10.0.0.11,5,main]
java.lang.ArrayIndexOutOfBoundsException: 4
at org.apache.cassandra.db.filter.AbstractClusteringIndexFilter$FilterSerializer.deserialize(AbstractClusteringIndexFilter.java:74) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.db.SinglePartitionReadCommand$Deserializer.deserialize(SinglePartitionReadCommand.java:1041) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:696) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:626) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:114) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:190) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.9.0.jar:3.9.0]
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.9.0.jar:3.9.0]
There is no error in /var/log/cassandra/debug.log on 10.0.0.12.
Remember that 10.0.0.11 and 10.0.0.12 are seed nodes.
Thanks

Opscenter not connected to cluster

I have deployed a Cassandra cluster (DataStax) on Google Cloud across multiple regions. When I first deployed it, I checked OpsCenter; it was connected and I could see the cluster information. I then restarted (shut down and started) the Cassandra VMs and OpsCenter and checked OpsCenter again. It now shows that it is not connected to the cluster.
I checked the OpsCenter log file and found the following error message.
Error Message
[opscenterd] ERROR: Unhandled error in Deferred: There are no clusters with name or ID 'Test_Cluster'
nodetool status
Datacenter: asia-east1-a
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.140.0.2 31.07 MB 64 ? d52bfd45-26e3-4c22-9369-caf5f7aa567a asia-east1-a
UN 10.140.0.3 31.05 MB 64 ? 35da9f1a-b897-4236-a735-366a6f4e5fa2 asia-east1-a
Datacenter: europe-west1-b
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.132.0.2 212.07 KB 64 ? 0e29e8a5-a94e-4843-8515-4ed1ad751009 europe-west1-b
UN 10.132.0.3 195.63 KB 64 ? e1b130ad-52e2-4450-88ef-aa7032d93979 europe-west1-b
Datacenter: us-east1-b
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.142.0.2 295.4 KB 64 ? 032df12d-f45a-4d57-b6fc-aa4c3f516212 us-east1-b
UN 10.142.0.3 214.28 KB 64 ? 714242e5-4c66-4e90-8816-9328c0c311ff us-east1-b
I checked the DataStax agent:
sudo service datastax-agent status
DataStax Agent datastax-agent is running
I faced the same issue when I deployed a new cluster, shut it down, and restarted it.

Cassandra nodes appearing in different datacenters

I am having trouble with three Cassandra nodes, each on its own computer, as I try to set up my first Cassandra setup. I have set everything up as described in the DataStax documentation, and each machine's cassandra.yaml has the same configuration (apart from the respective IPs). After configuring everything, each computer sees the others as DN and itself (localhost) as UN. In addition, on the .101 computer I can see two different datacenters, while on the other computers only one datacenter appears.
On my 192.168.1.101 machine, when I type
sudo nodetool status
I get this output:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 192.168.1.200 ? 256 ? 968d5d1e-a113-40ce-9521-e392a927ea5e r1
DN 192.168.1.102 ? 256 ? fc5c2dbe-8834-4040-9e77-c3d8199b6767 r1
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 446.13 KB 256 ? 6d28d540-2b44-4522-8612-b5f70a3d7d52 rack1
When I type "nodetool status" on one of the other two machines, I get this output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 192.168.1.200 ? 256 ? 968d5d1e-a113-40ce-9521-e392a927ea5e rack1
UN 127.0.0.1 506,04 KB 256 ? fc5c2dbe-8834-4040-9e77-c3d8199b6767 rack1
DN 192.168.1.101 ? 256 ? 6d28d540-2b44-4522-8612-b5f70a3d7d52 rack1
In OpsCenter I can only see my 192.168.1.101 machine:
This makes me think that something differs between the yaml file of this machine and the others, but I have checked several times and the configuration seems to be the same on the other computers. endpoint_snitch is set to "GossipingPropertyFileSnitch".
Any tips on why all the other nodes appear as Down Normal and why I am getting two datacenters would be highly appreciated. It's driving me crazy!
Thanks for reading.
On each machine, edit the $CASSANDRA_HOME/conf/cassandra-rackdc.properties file to set:
dc=dc1
rack=rack1
nodetool status shows that you set the wrong DC name for 2 nodes (DC1 instead of dc1). DC names are case-sensitive.
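Because the names are case-sensitive, a quick comparison of the dc values configured on each machine can catch this kind of mismatch before the nodes start. The sketch below is only an illustration: parse_rackdc and case_conflicts are hypothetical helpers, and it assumes the simple key=value layout of cassandra-rackdc.properties shown above.

```python
def parse_rackdc(text):
    """Parse the key=value lines of a cassandra-rackdc.properties file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def case_conflicts(dc_names):
    """Group DC names that differ only by letter case.

    Returns {lowercased_name: [variants]} for every name that appears
    with more than one spelling.
    """
    groups = {}
    for name in dc_names:
        groups.setdefault(name.lower(), set()).add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}
```

For the names seen in this question, case_conflicts(["DC1", "dc1", "datacenter1"]) would report the two spellings of dc1 and leave datacenter1 alone, since it is a genuinely different name rather than a case variant.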
It looks like some of the previously installed nodes were dead, so on each node I removed the nodes that were not the local machine, i.e.:
nodetool removenode 968d5d1e-a113-40ce-9521-e392a927ea5e
nodetool removenode fc5c2dbe-8834-4040-9e77-c3d8199b6767
After that I got the right output when I executed nodetool status:
[machine1]~$ sudo nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 286.93 KB 256 ? 6d28d540-2b44-4522-8612-b5f70a3d7d52 rack1
[machine2]~$ sudo nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 268.45 KB 256 ? fc5c2dbe-8834-4040-9e77-c3d8199b6767 rack1
I also made sure that the parameters cluster_name, seeds, listen_address and rpc_address were right:
cluster_name: 'Test Cluster'
seeds: "192.168.1.101, 192.168.1.102"
listen_address: 192.168.1.101
rpc_address: 192.168.1.101
I changed listen_address and rpc_address to the IP of each machine in its own cassandra.yaml file.
After that I got the right output (I am using only 2 machines for the nodes now):
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.1.101 309.13 KB 256 51.6% 6d28d540-2b44-4522-8612-b5f70a3d7d52 rack1
UN 192.168.1.102 257.15 KB 256 48.4% fc5c2dbe-8834-4040-9e77-c3d8199b6767 rack1

Cassandra load is high on one of the nodes

I have an 8-node Cassandra cluster (Cassandra 2.0.8). When I check the status using nodetool, I see the following. I am a newbie and am wondering why the load on one of the nodes (my initial seed node) is high compared to the others.
I also noticed that when I push data into a Cassandra table (column family) using Pig, that one node uses very high CPU (95%+) while the others do not (20-30%).
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN xxx.xxx.xx.xxx 15.55 MB 256 6.2% ------------------------------------ rack1
UN xxx.xxx.xx.xxx 36.89 MB 256 6.2% ------------------------------------ rack1
UN xxx.xxx.xx.xxx 3.77 GB 256 6.2% ------------------------------------ rack1
UN xxx.xxx.xx.xxx 1.04 GB 256 56.2% ------------------------------------ rack1
UN xxx.xxx.xx.xxx 43.49 MB 256 6.2% ------------------------------------ rack1
UN xxx.xxx.xx.xxx 40.36 MB 256 6.2% ------------------------------------ rack1
UN xxx.xxx.xx.xxx 43.69 MB 256 6.2% ------------------------------------ rack1
UN xxx.xxx.xx.xxx 40.23 MB 256 6.2% ------------------------------------ rack1
Any help is appreciated. Thank you.
You mentioned that you are pushing data through Pig. If so, are you using Cassandra's Hadoop support?
If yes, it is likely that your splits are causing this.
