cassandra "Handshaking version with"

I have 2 nodes:
ip1: node1's IP
ip2: node2's IP
Each node starts, but they do not connect to each other. For example, nodetool status on each node shows only that node itself, not the other one.
In node1's log:
Handshaking version with /ip2
In node2's log there are no info or error messages related to node1.
There are no error messages on either of them. What causes this problem?

A node should not normally be in its own seed list; if it is, it will not try to join the existing cluster. Only the first node in a cluster should be in its own seed list.
Try putting only ip1 in both nodes' seed list and leave ip2 out of the seed list entirely. Also, set auto_bootstrap: true on node 2. Shut down the nodes, remove the /var/lib/cassandra directory from both nodes, and then start node 1. When node 1 finishes starting up (check for status UN using nodetool status), then start node 2. It should now talk to node 1 and join the cluster.
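A minimal sketch of the relevant cassandra.yaml settings, assuming the default package layout and with ip1/ip2 standing in for the real addresses (cluster_name is just a placeholder):
# cassandra.yaml on BOTH nodes: only ip1 is a seed
cluster_name: 'MyCluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "ip1"
# listen_address should be the node's own IP (ip1 on node 1, ip2 on node 2)
listen_address: ip1
# on node 2 only, additionally:
auto_bootstrap: true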

Related

cassandra service (3.11.5) stops automatically after it starts/restart on AWS linux

The cassandra service (3.11.5) stops automatically right after it starts/restarts on AWS Linux.
I have a fresh installation of Cassandra on a new AWS Linux instance (t3.xlarge). After running
sudo service cassandra start
or
sudo service cassandra restart
the service stops automatically after 1 or 2 seconds. I looked into the logs and found the entries below.
I have not changed any configuration related to the snitch; it has always been SimpleSnitch. I am not running multiple Cassandra instances, just this one on a single EC2 instance.
Logs
INFO [main] 2020-02-12 17:40:50,833 ColumnFamilyStore.java:426 - Initializing system.schema_aggregates
INFO [main] 2020-02-12 17:40:50,836 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO [main] 2020-02-12 17:40:51,094 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
Installation steps
sudo curl -OL https://www.apache.org/dist/cassandra/redhat/311x/cassandra-3.11.5-1.noarch.rpm
sudo rpm -i cassandra-3.11.5-1.noarch.rpm
sudo pip install cassandra-driver
export CQLSH_NO_BUNDLED=true
sudo chkconfig --levels 3 cassandra on
The issue is in your log file:
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
It seems that you started the cluster, stopped it, and renamed the data center from dc1 to datacenter1.
To fix it:
If no data is stored, delete the data directories.
If data is stored, rename the data center back to dc1 in the config.
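For example (a sketch assuming a default package install; adjust the paths and snitch to your setup):
# Option 1: nothing worth keeping - wipe the node and start fresh
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start
# Option 2: keep the data - put the data center name back to dc1
# (with GossipingPropertyFileSnitch this lives in cassandra-rackdc.properties)
dc=dc1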
I had the same problem, where the cassandra service stopped immediately after it was started.
In the Cassandra configuration file located at /etc/cassandra/cassandra.yaml, change the cluster_name back to the previous one, like this:
...
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'dc1'
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
...

cassandra remove node from gossipinfo

I have Cassandra version 3.6.
I want to remove the node "261.4.55.161" from Cassandra.
I previously had 2 Cassandra nodes, so I decommissioned one by running this command on the host "261.4.55.161":
[root@b59 conf]# nodetool decommission
Now that node no longer shows up in the "nodetool status cp" output; only one node is shown (which is what I want):
[root@b59 conf]# nodetool status cp
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 12.111.41.22 43.8 GiB 256 100.0% 65f7597b-2l42-4bcb-a65a-53c25d4b7a13 rack1
But when I check gossip with "nodetool gossipinfo",
the node is still showing, with STATUS LEFT, but I want to completely remove this node.
[root@b59 conf]# nodetool gossipinfo
/12.111.41.22
generation:1524471400
heartbeat:755047
STATUS:20:NORMAL,-1025782309085114491
LOAD:754953:4.7034856044E10
SCHEMA:69:79958430-ad10-34dd-baf9-1ac87e9e7910
DC:7:datacenter1
RACK:9:rack1
RELEASE_VERSION:5:3.6.0
RPC_ADDRESS:4:12.111.41.22
SEVERITY:755049:0.5
NET_VERSION:2:10
HOST_ID:3:65f7597b-2l42-4bcb-a65a-53c25d4b7a13
RPC_READY:53:true
TOKENS:19:<hidden>
/261.4.55.161
generation:1524717007
heartbeat:1500
STATUS:1502:LEFT,-1003381131543138657,1524976696131
LOAD:1481:6.4782307931E10
SCHEMA:10:79958430-ad10-34dd-baf9-1ac87e9e7910
DC:6:datacenter1
RACK:8:rack1
RELEASE_VERSION:4:3.6
RPC_ADDRESS:3:261.4.55.161
SEVERITY:1499:0.0
NET_VERSION:1:10
HOST_ID:2:a98d0b43-2b66-4b95-b8a6-e81197d9eb9d
RPC_READY:42:true
TOKENS:13:<hidden>
I don't want this node to show up in gossipinfo either.
My question is: how do I remove the node 261.4.55.161 from gossipinfo?
It should go away after a while (a few days, I think). It remains in that state in gossip info as a precaution, in case a node was offline and missed the decommission. It shouldn't be hurting anything in the LEFT state, so you can just ignore it. In the LEFT state it is no longer part of the cluster.
There is a nodetool assassinate command (on newer versions; on older ones you have to call JMX yourself) to forcibly remove it from gossip, but there is really no need to do that. Best to just ignore it.
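If you really want it gone sooner, the forced removal looks like this (run it from a live node; the subcommand exists from Cassandra 2.2 onwards, so 3.6 has it):
nodetool assassinate 261.4.55.161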

Cassandra restart issues while restoring to a new cluster

I am restoring to a fresh new Cassandra 2.2.5 cluster consisting of 3 nodes.
Initial cluster health of the NEW cluster:
-- Address Load Tokens Owns Host ID Rack
UN 10.40.1.1 259.31 KB 256 ? d2b29b08-9eac-4733-9798-019275d66cfc uswest1adevc
UN 10.40.1.2 230.12 KB 256 ? 5484ab11-32b1-4d01-a5fe-c996a63108f1 uswest1adevc
UN 10.40.1.3 248.47 KB 256 ? bad95fe2-70c5-4a2f-b517-d7fd7a32bc45 uswest1cdevc
As part of the restore instructions in the DataStax docs, I do the following on the new cluster:
1) cassandra stop on all of the three nodes, one by one.
2) Edit cassandra.yaml on all of the three nodes with the backed-up token ring information. [Step 2 from docs]
3) Remove the contents from /var/lib/cassandra/data/system/* [Step 4 from docs]
4) cassandra start on nodes 10.40.1.1, 10.40.1.2, 10.40.1.3 respectively.
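Roughly, on each node that amounts to something like this (a sketch assuming a package install with default paths):
sudo service cassandra stop
# set initial_token in cassandra.yaml to the saved comma-separated token list for this node
sudo rm -rf /var/lib/cassandra/data/system/*
sudo service cassandra start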
Result:
10.40.1.1 restarts successfully:
-- Address Load Tokens Owns Host ID Rack
UN 10.40.1.1 259.31 KB 256 ? 2d23add3-9eac-4733-9798-019275d125d3 uswest1adevc
But the second and third nodes fail to restart, stating:
java.lang.RuntimeException: A node with address 10.40.1.2 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:546) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:766) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:693) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5.jar:2.2.5]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5.jar:2.2.5]
INFO [StorageServiceShutdownHook] 2016-08-09 18:13:21,980 Gossiper.java:1449 - Announcing shutdown
java.lang.RuntimeException: A node with address 10.40.1.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
...
Eventual cluster health:
-- Address Load Tokens Owns Host ID Rack
UN 10.40.1.1 259.31 KB 256 ? 2d23add3-9eac-4733-9798-019275d125d3 uswest1adevc
DN 10.40.1.2 230.12 KB 256 ? 6w2321ad-32b1-4d01-a5fe-c996a63108f1 uswest1adevc
DN 10.40.1.3 248.47 KB 256 ? 9et4944d-70c5-4a2f-b517-d7fd7a32bc45 uswest1cdevc
I understand that the HostID of a node might change after system dirs are removed.
My question is:
Do I need to explicitly tell the node at startup to replace itself? Are the docs incomplete, or am I missing something in my steps?
Turns out there were stale commitlog and saved_caches directories which I had missed deleting earlier. The instructions work correctly once those directories are deleted.
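For anyone following the same docs, the full set of directories to clear on a default install before restarting is roughly (paths are the package defaults; adjust if you changed data_file_directories):
sudo rm -rf /var/lib/cassandra/data/system/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*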
Usually in a situation like this, after I do a
$ systemctl stop cassandra
I will run
$ ps awxs | grep cassandra
and notice that Cassandra still has some processes up.
I usually do a
$ kill -9 <cassandra_pid>
and
$ rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
java.lang.RuntimeException: A node with address 10.40.1.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
If you are still facing the above error, it means your Cassandra process is still running on that node. First log in to the 10.40.1.3 node, then follow these steps:
$ jps
You see some processes running. For example:
9107 Jps
1112 CassandraDaemon
Then kill the CassandraDaemon process using the process id you see after executing jps. In my example, the process id for CassandraDaemon is 1112.
$ kill -9 1112
Then check the processes again after a while:
$ jps
You will see that CassandraDaemon is no longer listed:
9170 Jps
Then remove your saved_caches and commitlog and start cassandra again.
Do this on every node that is hitting the error mentioned above.

Apache Cassandra 3.7 snitch issue cannot start data center

I am using Ubuntu 14.04 with Apache Cassandra 3.7. I am trying to start it but get the following error message:
ERROR [main] 2016-07-15 15:22:10,627 CassandraDaemon.java:731 - Cannot start node if snitch's data center (dc1) differs from previous data center (datacenter1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
I know I can set -Dcassandra.ignore_dc=true, but that is not a fix, it's a band-aid and for development use only; this is supposed to go to production. I tried clearing out all the files and folders in /var/lib/cassandra (I mean every single file and folder), started Apache Cassandra again, and still got the same error message... any other ideas?
In the file /etc/cassandra/cassandra-rackdc.properties, change the dc entry from dc1 to datacenter1 on all nodes, and then do a rolling restart of the nodes.
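A sketch of what the properties file should end up looking like on each node (stock values taken from the error message; file path per the default package install):
# /etc/cassandra/cassandra-rackdc.properties
dc=datacenter1
rack=rack1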
If you have just switched to GossipingPropertyFileSnitch, start Cassandra with the option
-Dcassandra.ignore_dc=true
If it starts successfully, execute:
nodetool repair
nodetool cleanup
Afterwards, Cassandra should be able to start normally without the ignore option.
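One common way to pass the flag for a single start is through JVM_OPTS in cassandra-env.sh (a sketch; take it out again once the node has started cleanly):
# temporary addition to cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_dc=true"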
I faced this issue while upgrading my Apache Cassandra from 3.11.1 to 3.11.4.
cassandra.yaml
old config: endpoint_snitch: GossipingPropertyFileSnitch
new config: endpoint_snitch: SimpleSnitch (changed it back to GossipingPropertyFileSnitch)
cassandra-rackdc.properties
old version config: dc:Dc1 rack:Rack1
new version config: dc:dc rack:rack (changed these back to Dc1 and Rack1)
This resolved my issue.
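In other words, the working combination was roughly this pair of settings (a sketch; file names per the defaults):
# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch
# cassandra-rackdc.properties
dc=Dc1
rack=Rack1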

Unable to start Cassandra: "node already exists"

I am having trouble getting an existing Cassandra node to join the cluster again after a reboot (on a new virtual machine instance).
I had a running Cassandra cluster with 4 nodes, all in state "up and normal" according to nodetool status. The nodes are running on virtual machines in Azure. I changed the instance type of the virtual machine for 10.0.0.6, which resulted in a reboot of that machine. The machine stayed on 10.0.0.6.
After the reboot I am unable to start Cassandra again. I am getting this exception:
INFO 22:39:07 Handshaking version with /10.0.0.4
INFO 22:39:07 Node /10.0.0.6 is now part of the cluster
INFO 22:39:07 Node /10.0.0.5 is now part of the cluster
INFO 22:39:07 Handshaking version with cassandraprd001/10.0.0.6
INFO 22:39:07 Node /10.0.0.9 is now part of the cluster
INFO 22:39:07 Handshaking version with /10.0.0.5
INFO 22:39:07 Node /10.0.0.4 is now part of the cluster
INFO 22:39:07 InetAddress /10.0.0.6 is now UP
INFO 22:39:07 Handshaking version with /10.0.0.9
INFO 22:39:07 InetAddress /10.0.0.4 is now UP
INFO 22:39:07 InetAddress /10.0.0.9 is now UP
INFO 22:39:07 InetAddress /10.0.0.5 is now UP
ERROR 22:39:08 Exception encountered during startup
java.lang.RuntimeException: A node with address cassandraprd001/10.0.0.6 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:455) ~[apache-cassandra-2.1.0.jar:2.1.0]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:667) ~[apache-cassandra-2.1.0.jar:2.1.0]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:615) ~[apache-cassandra-2.1.0.jar:2.1.0]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:509) ~[apache-cassandra-2.1.0.jar:2.1.0]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338) [apache-cassandra-2.1.0.jar:2.1.0]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457) [apache-cassandra-2.1.0.jar:2.1.0]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) [apache-cassandra-2.1.0.jar:2.1.0]
java.lang.RuntimeException: A node with address cassandraprd001/10.0.0.6 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:455)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:667)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:615)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:509)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546)
Exception encountered during startup: A node with address cassandraprd001/10.0.0.6 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
INFO 22:39:08 Announcing shutdown
I am using Cassandra 2.1.0. I am not replacing a dead node - I am just trying to get the old node up and running again. According to nodetool status (on the other nodes) all nodes are "up and normal" except 10.0.0.6, which is "down and normal".
How do I get this node up and running again?
First, on another node, run
nodetool status
The result shows the list of nodes in the cluster. Find the node with the IP that fails to start, get its host ID, and put it into the command:
nodetool removenode <node_id>
Then start Cassandra.
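For example (the host ID below is a placeholder; paste the actual UUID shown for the DN entry):
$ nodetool status                              # note the Host ID of the DN entry for 10.0.0.6
$ nodetool removenode <host-id-of-10.0.0.6>    # placeholder, use the real UUID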
Quick answer: if the node's IP is 10.200.10.200, add this
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.200.10.200"
to the end of your
cassandra-env.sh
Don't forget to remove it once you're done.
You can have a look at this blog post: http://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html.
It worked for me. This is a bug in Cassandra: if your node's host_id changed but it uses the old IP, it will throw this exception.
If you use Cassandra 2.x.x, you should modify cassandra/conf/cassandra-env.sh.
Finally, don't forget to REMOVE the modifications to cassandra-env.sh after the bootstrap completes!
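So the whole sequence, sketched out for Cassandra 2.x with default file locations:
# 1. add to the end of cassandra-env.sh on the affected node
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.6"
# 2. start Cassandra and wait until nodetool status shows the node as UN
# 3. remove the added line from cassandra-env.sh again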
