We have configured a 3-node Cassandra cluster on RHEL 7.2 and we are doing cluster testing. When we start Cassandra on all 3 nodes, they form a cluster and work fine.
But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes much longer to join the cluster. However, if we manually kill and restart the Cassandra process, the node joins the cluster immediately without any issues.
We have provided all 3 IPs as seed nodes, the cluster name is the same on all 3 nodes, and each node uses its own IP as the listen address.
Please help us resolve this issue.
Thanks
Update
Cassandra version: 3.9
While investigating the issue further, we noticed that Node 1 (the rebooted node) is able to send gossip "SYN" and "ACK2" messages to both of the other nodes (Node 2 and Node 3), even though nodetool status shows Node 2 and Node 3 as "DN" only on Node 1.
After 10-15 minutes we noticed a "Connection Timeout" exception on Node 2 and Node 3, thrown from OutboundTcpConnection.java (line 311), which triggers a state change event for Node 1 and changes its state to "UN".
if (logger.isTraceEnabled())
    logger.trace("error writing to {}", poolReference.endPoint(), e);
Please let us know what triggers the "Connection Timeout" exception on Node 2 and Node 3, and how we can resolve this.
We believe this issue is similar to https://issues.apache.org/jira/browse/CASSANDRA-9630
But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes much longer to join the cluster. However, if we manually kill and restart the Cassandra process, the node joins the cluster immediately without any issues.
Remember that Cassandra writes everything to the commit log to ensure durability in case of some "plug-out-of-the-wall" event. At start-up, Cassandra reconciles the data stored on disk with the data in the commit log. If there are differences, it can take a while for that reconciliation to complete.
That's why it's important to run these commands before stopping Cassandra:
nodetool disablegossip
nodetool drain
Disabling gossip makes sure that the node you're shutting down won't take any additional requests. Drain ensures that anything in the commit log is written to disk. Once those are done, you can stop your node. Then the node restart should occur much faster.
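If you want to automate this (for example from whatever script runs before "reboot"), the same two steps can be issued over JMX, which is what nodetool does under the hood. A minimal sketch, assuming the default local JMX port 7199 and no JMX authentication (the class name CleanShutdown is just a placeholder):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CleanShutdown {
    public static void main(String[] args) throws Exception {
        // Default local JMX endpoint of a Cassandra node (port 7199); adjust the
        // host, port, and credentials if your cluster is configured differently.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService =
                    new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Same effect as "nodetool disablegossip": stop announcing this node via gossip.
            mbs.invoke(storageService, "stopGossiping", new Object[0], new String[0]);
            // Same effect as "nodetool drain": flush memtables so the commit log is empty
            // and does not need to be replayed at the next start-up.
            mbs.invoke(storageService, "drain", new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}

Simply wiring the two nodetool commands into the init script or systemd unit that stops Cassandra on reboot achieves the same result without any custom code.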
Related
Any reason why com.datastax.driver.core.Metadata:getHosts() would return state UP for a host that has shut down?
However, nodetool status returns DN for that host.
No matter how many times I check Host.getState(), it still says UP for that dead host.
This is how I'm querying Metadata:
cluster = DseCluster.builder()
        .addContactPoints("192.168.1.1", "192.168.1.2", "192.168.1.3")
        .withPort(9042)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(2000))
        .build();
cluster.getMetadata().getAllHosts();
EDIT: Updated the code to reflect that I'm trying to connect to 3 hosts. I should have stated that the cluster I'm connecting to has 3 nodes, 2 in DC1 and 1 in DC2.
Also, whenever I relaunch my Java process running this code, the behavior changes. Sometimes it gives me the right states, then when I restart it again, it gives me the wrong states, and so on.
I will post an answer which I got from the DataStax Academy Slack:
Host.getState() is the driver's view of what it thinks the host state is, whereas nodetool status is what that C* node thinks the state of all nodes in the cluster is, from its own view (propagated via gossip). There is no way to get that gossip view via the driver.
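For completeness, here is a small self-contained sketch of how to poll the driver's own view, reusing the same DseCluster setup as in the question (the class name HostStates is just a placeholder, and the contact points are the ones from the question). It only prints what the driver currently believes, which, as explained above, can disagree with nodetool status:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.policies.ConstantReconnectionPolicy;
import com.datastax.driver.dse.DseCluster;

public class HostStates {
    public static void main(String[] args) {
        Cluster cluster = DseCluster.builder()
                .addContactPoints("192.168.1.1", "192.168.1.2", "192.168.1.3")
                .withPort(9042)
                .withReconnectionPolicy(new ConstantReconnectionPolicy(2000))
                .build();
        try {
            // init() forces the control connection to be established, so the driver
            // has actually learned the cluster topology before we inspect it.
            cluster.init();
            Metadata metadata = cluster.getMetadata();
            for (Host host : metadata.getAllHosts()) {
                // getState() is the driver's local view ("UP"/"DOWN"), maintained via
                // its own connections and reconnection policy -- not the gossip view.
                System.out.printf("%s dc=%s state=%s%n",
                        host.getAddress(), host.getDatacenter(), host.getState());
            }
        } finally {
            cluster.close();
        }
    }
}

Because the driver refreshes this view from its own connections, the printed states can lag for a while after a node goes down or comes back, which matches the flaky behavior described in the question.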
I'm running a cluster of 10 Cassandra 3.10 nodes and I have seen very strange behavior: after a restart, a node won't immediately open the native_transport_port (9042).
After a node restart, the flow is:
the node finishes reading all commit logs,
updates all its data,
becomes visible to the other nodes in the cluster,
then waits a random amount of time (from 1 minute to hours) before opening port 9042.
My logs are in DEBUG mode, and nothing is written about opening this port.
What is happening and how can I debug this problem?
Output from several nodetool commands:
nodetool enablebinary: does not return at all
nodetool compactionstats: 0 pending tasks
nodetool netstats: Mode: STARTING. Not sending any streams.
nodetool info: Gossip active: true
Thrift active: false
Native Transport active: false
Thank you.
Are you saving your key/row cache? Start-up tends to take a lot of time when that is the case. Also, what is your file max limit?
I have a 3-node Cassandra cluster (replication factor 2) with Solr installed, each node running RHEL with 32 GB RAM, a 1 TB HDD, and DSE 4.8.3. There are lots of writes happening on my nodes, and my web application also reads from them.
I have observed that all the nodes go down every 3-4 days. I have to restart every node, and then they function well until the next 3-4 days, when the same problem repeats. I checked the server logs, but they do not show any errors even when the server goes down. I am unable to figure out why this is happening.
In my application, sometimes when I connect to the nodes through the C# Cassandra driver, I get the following error:
Cassandra.NoHostAvailableException: None of the hosts tried for query are available (tried: 'node-ip':9042)
at Cassandra.Tasks.TaskHelper.WaitToComplete(Task task, Int32 timeout)
at Cassandra.Tasks.TaskHelper.WaitToComplete[T](Task`1 task, Int32 timeout)
at Cassandra.ControlConnection.Init()
at Cassandra.Cluster.Init()
But when I check OpsCenter, none of the nodes are down; all node statuses look perfectly fine. Could this be a problem with the driver? Earlier I was using Cassandra C# driver version 2.5.0 installed from NuGet, but I have since updated it to version 3.0.3 and the error still persists.
Any help on this would be appreciated. Thanks in advance.
If you haven't done so already, you may want to look at turning up your logging levels by running nodetool -h 192.168.XXX.XXX setlogginglevel org.apache.cassandra DEBUG on all your nodes.
Your first issue is most likely an OutOfMemoryError.
For your second issue, the problem is most likely that you have really long GC pauses. Tailing /var/log/cassandra/debug.log or /var/log/cassandra/system.log may give you a hint, but typically doesn't reveal the problem unless you are meticulously looking at the timestamps. The best way to troubleshoot this is to make sure GC logging is enabled in your jvm.options config and then tail your GC logs, taking note of the pause times:
grep 'Total time for which application threads were stopped:' /var/log/cassandra/gc.log.1 | less
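The "Total time for which application threads were stopped" lines that this grep matches only appear if the JVM's GC logging is switched on. In Cassandra 3.x these options normally live (commented out by default) in jvm.options; the exact set below is an assumption based on a typical Java 8 setup and may differ in your version:

-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M

-XX:+PrintGCApplicationStoppedTime is what produces the stopped-time lines; the rotation settings just keep the gc.log files from growing without bound.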
The "Unexpected exception during request; channel = [....] java.io.IOException: Error while read (....): Connection reset by peer" error typically indicates inter-node timeouts, i.e. the coordinator times out waiting for a response from another node and sends a TCP RST packet to close the connection.
What's the behavior when a partition is sent to a node and the node crashes right before executing a job? If a new node is introduced into the cluster, what's the entity that detects the addition of this new machine? Does the new machine get assigned the partition that didn't get processed?
The master considers a worker to have failed if it has not received a heartbeat message from it for the past 60 seconds (per spark.worker.timeout). In that case the partition is assigned to another worker (remember that a partitioned RDD can be reconstructed even if it is lost).
As for whether a newly introduced node is detected: the Spark master will not detect a new node added to the cluster once the slaves have been started. Before an application is submitted to the cluster, sbin/start-master.sh starts the master and sbin/start-slaves.sh reads the conf/slaves file (which contains the IP addresses of all slaves) on the Spark master machine and starts a slave instance on each machine listed. The Spark master will not read this configuration file again after it has started, so it is not possible to add a new node once all the slaves are started.
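As a concrete illustration of that startup path (the addresses below are hypothetical and the exact layout may vary by Spark version), conf/slaves on the master machine simply lists one worker host per line, and it is only consulted when the start scripts run:

# conf/slaves on the spark-master machine (hypothetical addresses)
192.168.10.11
192.168.10.12
192.168.10.13

# run once on the master, before any application is submitted
sbin/start-master.sh
sbin/start-slaves.sh

Editing conf/slaves afterwards has no effect on an already running master, which is the limitation described above.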
I'm trying, but failing, to join a new (well, old, but wiped) node to an existing cluster.
Currently the cluster consists of 2 nodes and runs C* 2.1.2. I start a third node with 2.1.2; it gets to the joining state and bootstraps, i.e. streams some data as shown by nodetool netstats, but after some time it gets stuck. From that point nothing gets streamed and the new node stays in the joining state. I have restarted the node twice; every time it streamed some more data but then got stuck again. (I'm currently on a third round like that.)
Other facts:
I don't see any errors in the log on any of the nodes.
Connectivity seems fine: I can ping and netcat to port 7000 in all directions.
I have 267 GB load per running node, replication 2, 16 tokens.
The load on the new node is around 100 GB now.
I'm guessing that after a few rounds of restarts the node will finally pull in all of the data from the running nodes and join the cluster, but that's definitely not the way it should work.
EDIT: I discovered some more info:
The bootstrapping process stops in the middle of streaming some table, always after sending exactly 10 MB of some SSTable, e.g.:
$ nodetool netstats | grep -P -v "bytes\(100"
Mode: NORMAL
Bootstrap e0abc160-7ca8-11e4-9bc2-cf6aed12690e
/192.168.200.16
Sending 516 files, 124933333900 bytes total
/home/data/cassandra/data/leadbullet/page_view-2a2410103f4411e4a266db7096512b05/leadbullet-page_view-ka-13890-Data.db 10485760/167797071 bytes(6%) sent to idx:0/192.168.200.16
Read Repair Statistics:
Attempted: 2016371
Mismatch (Blocking): 0
Mismatch (Background): 168721
Pool Name Active Pending Completed
Commands n/a 0 55802918
Responses n/a 0 425963
I can't diagnose the error and I'd be grateful for any help!
Try to telnet from one node to another using the correct port.
Make sure you are joining the cluster with the correct name.
Try running: nodetool repair
You might be pinging the external IP addresses, while your cluster communicates using the internal IP addresses.
If you are running on Amazon AWS, make sure the firewall is open on both internal IP addresses.