Cassandra nodes becomes unreachable to each other - cassandra

I have 3 nodes of elassandra running in docker containers.
Containers created like:
Host 10.0.0.1 : docker run --name elassandra-node-1 --net=host -e CASSANDRA_SEEDS="10.0.0.1" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest
Host 10.0.0.2 : docker run --name elassandra-node-2 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest
Host 10.0.0.3 : docker run --name elassandra-node-3 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2,10.0.0.3" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest
Cluster was working fine for a couple of days since created, elastic, cassandra all was perfect.
Currently however all cassandra nodes became unreachable to each other:
Nodetool status on all nodes is like
Datacenter: DC1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 10.0.0.3 11.95 GiB 8 100.0% 7652f66e-194e-4886-ac10-0fc21ac8afeb r1
DN 10.0.0.2 11.92 GiB 8 100.0% b91fa129-1dd0-4cf8-be96-9c06b23daac6 r1
UN 10.0.0.1 11.9 GiB 8 100.0% 5c1afcff-b0aa-4985-a3cc-7f932056c08f r1
Where the UN is the current host 10.0.0.1
Same on all other nodes.
Nodetool describecluster on 10.0.0.1 is like
Cluster Information:
Name: BD Storage
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
24fa5e55-3935-3c0e-9808-99ce502fe98d: [10.0.0.1]
UNREACHABLE: [10.0.0.2,10.0.0.3]
When attached to the first node its only repeating these infos:
2018-12-09 07:47:32,927 WARN [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager.setupDefaultRole(CassandraRoleManager.java:361) CassandraRoleManager skipped default role setup: some nodes were not ready
2018-12-09 07:47:32,927 INFO [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager$4.run(CassandraRoleManager.java:400) Setup task failed with error, rescheduling
2018-12-09 07:47:32,980 INFO [HANDSHAKE-/10.0.0.2] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.2
2018-12-09 07:47:32,980 INFO [HANDSHAKE-/10.0.0.3] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.3
After a while when some node is restarted:
2018-12-09 07:52:21,972 WARN [MigrationStage:1] org.apache.cassandra.service.MigrationTask.runMayThrow(MigrationTask.java:67) Can't send schema pull request: node /10.0.0.2 is down.
Tried so far:
Restarting all containers at the same time
Restarting all containers one after another
Restarting cassandra in all containers like : service cassandra restart
Nodetool disablegossip then enable it
Nodetool repair : Repair command #1 failed with error Endpoint not alive: /10.0.0.2
Seems that all node schemas are different, but I still dont understand why they are marked as down to each other.

If you have different Cassandra version then nodetool repair will not pull the data.Keep same version of Cassandra. sometimes node showing down or unreachable because of gossip was not happening properly. reason may be network, high load on that node or node is very busy and lots of i/o operation on going such as repair, compaction etc.

Related

Multi-node multi-datacenter CASSANDRA

I am trying to setup a multi-node multi-datacenter cluster in Cassandra 3.11
For data-center 1 I have Cassandra running on 3 nodes(eg. 10.90.22.11, 10.90.22.12 and 10.90.22.13) and for data-center 2 I have Cassandra running on 2 nodes(eg. 10.90.22.21 and 10.90.22.22).
The ring is up but they are working separately. To make them work together I update the endpoint_snitch to be GossipingPropertyFileSnitch and also the dc and rac in cassandra-rackdc.properties to be DC1 and DC2 for respective nodes following the steps mentioned in this link.
After these changes when I restart Cassandra, the status of Cassandra is running however when I check for the ring with nodetool status I receive a error:
nodetool: Failed to connect to '127.0.0.1:7199'
ConnectException: 'Connection refused (Connection refused)'
What am I missing?
This error you posted indicates that nodetool couldn't connect to JMX that is supposed to be listening on port 7199:
Failed to connect to '127.0.0.1:7199'
Verify that Cassandra is running and check that the process is bound to various ports including 7199, 9042 and 7000. You can try running one of these commands:
$ netstat -tnlp
$ sudo lsof -nPi | grep LISTEN | grep java
Cheers!
You should try nodetool command with host/IP what you have put in your cassandra.yaml. Also, you should check your port 7199 or custom port if you set is open/allow from firewall.
nodetool -h hostname/ip status.
you can mention username.password if you enabled. please refer below link for more details and understanding:-
http://cassandra.apache.org/doc/latest/tools/nodetool/status.html

Cannot start Cassandra : Port already in use

I have two Ubuntu 16.04 nodes on which I installed Cassandra 3.11.3 with java version "1.8.0_181". I want to merge these two nodes into a Cassandra cluster. Their intern ips are 172.16.10.20 and 172.16.10.30.
on each /etc/cassandra/cassandra.yaml file I modified the following lines:
cluster_name: 'my_cluster'
- seeds: "172.16.10.20,172.16.10.30"
listen_address: XXXX
rpc_address: XXXX
where XXXX is respectively the intern ip of the current node.
I then restart Cassandra on each node
sudo service cassandra restart
and check that Cassandra runs :
sudo service cassandra status
cassandra.service - LSB: distributed storage system for structured data
Loaded: loaded (/etc/init.d/cassandra; generated)
Active: active (running) since Wed 2018-08-08 00:31:42 UTC; 3s ago
Docs: man:systemd-sysv-generator(8)
and the cluster
nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.16.10.20 190.11 KiB 256 100.0% 84dded4c-c74e-45f4-9481-ff837fec229d rack1
UN 172.16.10.30 265.06 KiB 256 100.0% 4695fef4-70c7-46b2-a0bd-8b752fe5beb6 rack1
Everything is up and normal.
I want to connect now to Cassandra:
cqlsh
and get:
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
Some googling later, I want to start cassandra by hand
cassandra
and get (among a huge message):
ERROR [main] 2018-08-07 23:02:51,365 CassandraDaemon.java:708 - Port already in use: 7199; nested exception is:
java.net.BindException: Address already in use (Bind failed)
It looks like the port 7199 is already in use. I kill the corresponding pid, change in /etc/cassandra/cassandra-env.sh the JMX_PORT to 7200... same issue, the port is said to be in use, plus the error
00:33:06,236 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[SYSTEMLOG] - openFile(/var/log/cassandra/system.log,true) call failed. java.io.FileNotFoundException: /var/log/cassandra/system.log (Permission denied)
I have changed the permission, but the error remains. At this point of the story I am running out of ideas. What I am trying to achieve seem pretty straighforward so I guess others must have run into a similar issue.
The nodetool status output says it all here. You had everything running just fine. So revert any changes in terms of port usage.
As your nodetool status reveals that your node IPs are 172.16.10.20 and 172.16.10.30, try running cqlsh and providing one of those IPs. cqlsh tries to connect to 127.0.0.1 by default, which will not work in a plural node cluster.
cqlsh 172.16.10.20 -u yourusername -p yourpassword
Note: You can omit -u and -p if you don't have auth enabled. But if that's true, then you should really change your cluster to enable auth.

cassandra is not running as service

The system is Linux 14.04.1-Ubuntu x86_64, 200GB space, 8GB memory. Everything is done in both root and user. We installed the Cassandra version 3.6.0 from datastax using the following command (followed the instruction from website: http://docs.datastax.com/en/cassandra/3.x/cassandra/install/installDeb.html):
$ apt-get update
$ apt-get install datastax-ddc
However, the cassandra is not started as service.
root#e7:~# nodetool status
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused'.
root#e7:~# service cassandra start
root#e7:~# service cassandra status
* Cassandra is not running
We can start Cassandra manually using the command:
$ cassandra -R -f
...
INFO 18:45:02 Starting listening for CQL clients on /127.0.0.1:9042 (unencrypted)...
INFO 18:45:02 Binding thrift service to /127.0.0.1:9160
INFO 18:45:02 Listening for thrift clients...
INFO 18:45:12 Scheduling approximate time-check task with a precision of 10 milliseconds
root#e7:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 153.45 KiB 256 100.0% 28ba16df-1e4c-4a40-a786-ebee140364bf rack1
However, we have to start cassandra as a service. Any suggestions how to fix the problem?
Try using http://docs.datastax.com/en/cassandra/3.0/cassandra/install/installDeb.html
This is more stable and I have tried it.
I think the ports are not opened.
Try opening the following ports:
Cassandra inter-node ports
Port number Description
7000 Cassandra inter-node cluster communication.
7001 Cassandra SSL inter-node cluster communication.
7199 Cassandra JMX monitoring port.
Cassandra client port
Port number Description
9042 Cassandra client port.
9160 Cassandra client port (Thrift).
Also what type of Snitch is defined in the Cassandra.yaml file ?

Does OpsCenter Community work with a single-node Cassandra cluster?

I am using Ubuntu 15.10 64 bits with OpenJDK 1.8.0_66. Cassandra 3.3 and OpsCenter 5.2.4 are installed from apt packages and running locally. I am using the following sources.
deb http://www.apache.org/dist/cassandra/debian 33x main
deb http://debian.datastax.com/community stable main
The OpsCenter doc says: "In Add Cluster, enter the Hostnames or IP addresses of two or three nodes in the cluster, set the JMX and Native Transport ports, and click Save Cluster".
I have a single-node cluster, so I don't have two or three IP addresses to enter. When I enter the address of the node, I get the following message: "Error creating cluster: Unable to connect to cluster. Error is: Unable to connect to any seed nodes, tried ['127.0.0.1']".
I am using the default ports. netstat -an gives among others the following lines.
tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN
tcp6 0 0 127.0.0.1:9042 :::* LISTEN
nodetool status is working fine.
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 201.17 KB 256 ? 04da2c59-1a88-4ed3-9af9-8f64ae27e9ac rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
So is cqlsh.
$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.3 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cqlsh>
After looking at other questions I tried uncommenting the following in /etc/cassandra/cassandra-env.sh. I tried both localhost and 127.0.0.1. It did not help.
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<hostname>"
There is nothing in /var/log/cassandra/debug.log or /var/log/cassandra/system.log when the connection attempts fail. The only log messages are from Cassandra starting/stopping.
Is it possible at all to use OpsCenter with a single-node cluster? If so, what might I be missing?
Cassandra 3.3 and OpsCenter 5.2.4
You're out of luck, OpsCenter 5.x only work for Cassandra 2.1 branch and the future OpsCenter 6.x that will be compatible with Cassandra 3.x branch will only be available wit the Datastax Enterprise version, read this: http://docs.datastax.com/en/opscenter/5.2/opsc/opscPolicyChanges.html

Cassandra nodetool connection timed out

Im trying to use nodetool to check the status of my cluster, but its unable to connect.
My cassandra.yaml is configured with listen_address and rpc_address set as the server IP (e.g. 10.10.10.266).
Im able to connect through cqlsh and cassandra-cli using the same IP, but when I connect to nodetool it doesnt work.
/bin$ nodetool -h 10.10.10.266 ring
Failed to connect to '10.10.10.266:7199': Connection has timed out
I dont think I have a firewall enabled on the server (Ubuntu). Im running this directly on the server in question, so I wouldnt have thought it would be a firewall issue anyway.
You probably need to uncomment the following parameter in cassandra-env.sh:
-Djava.rmi.server.hostname=<public name>
Replace with the address of the interface you want the jmx interface to listen on.
nodetool connects through JMX interface. By default it's listening on port 7199 (other tools use RPC interface listening on port 9160 by default). Check JMX settings in cassandra-env.sh file. Most likely JMX server is listening on wrong interface (or probably loopback interface).
Default JMX configuration section (cassandra ver. 1.1.5) contains link to troubleshooting guide:
# jmx: metrics and administration interface
#
# add this if you're having trouble connecting:
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
#
# see
# https://blogs.oracle.com/jmxetc/entry/troubleshooting_connection_problems_in_jconsole
# for more on configuring JMX through firewalls, etc. (Short version:
# get it working with no firewall first.)
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS"
It also worths to list all network interfaces using ifconfig and try telnet'ing port 7199 on all interfaces.
I was facing the same timeout issue. However I found that my cluster was not getting started properly because of token issue and I was getting "Host ID collision between active endpoint". Once i deleted data directory and restarted cluster then nodetool started working fine.
I also saw this same issue but it turned out to be some weirdness in my hosts file that was preventing JMX from binding to the interfaces.
Specifically, the host file had an entry for the external IP address with the hostname. Our servers had two interfaces, one external and one for an internal network. Removing that hosts entry did the trick.
As someone mentioned, it connects to the JMX port.
You can find the JMX port:
In /etc/cassandra/cassandra-env.sh. This won't work for ccm based local clusters OR
(my fav) by looking at the command-line of Cassandra node process running on the node.
My case was a cluster created locally using ccm so all my nodes were running on same host with different JMX port.
vagrant#triforce:~$ ps -eaf | grep cassandra | grepi -o " [^ ]*jmx.local.port[^ ]* "
-Dcassandra.jmx.local.port=7100
-Dcassandra.jmx.local.port=7300
-Dcassandra.jmx.local.port=7200
vagrant#triforce:~$
This is because I have 3 nodes running on the localhost.
vagrant#triforce:~$ nodetool -p 7100 ring
Datacenter: datacenter1
==========
Address Rack Status State Load Owns Token
3074457345618258602
127.0.0.1 rack1 Up Normal 64.65 MB 33.33% -9223372036854775808
127.0.0.2 rack1 Up Normal 65.26 MB 33.33% -3074457345618258603
127.0.0.3 rack1 Up Normal 65.92 MB 33.33% 3074457345618258602
vagrant#triforce:~$

Resources