I'm trying, but failing, to join a new (well, old, but wiped) node to an existing cluster.
The cluster currently consists of 2 nodes and runs C* 2.1.2. I start a third node with 2.1.2; it gets to the joining state and bootstraps, i.e. streams some data as shown by nodetool netstats, but after some time it gets stuck. From that point on, nothing gets streamed and the new node stays in the joining state. I have restarted the node twice; each time it streamed more data but then got stuck again. (I'm currently on the third round of that.)
Other facts:
I don't see any errors in the log on any of the nodes.
Connectivity seems fine: I can ping and netcat to port 7000 in all directions.
I have 267 GB load per running node, replication 2, 16 tokens.
The load on the new node is around 100 GB now.
I'm guessing that after a few more rounds of restarts the node will finally pull in all of the data from the running nodes and join the cluster, but that's definitely not the way it should work.
EDIT: I discovered some more info:
The bootstrapping process stops in the middle of streaming some table, always after sending exactly 10 MB of an SSTable, e.g.:
$ nodetool netstats | grep -P -v "bytes\(100"
Mode: NORMAL
Bootstrap e0abc160-7ca8-11e4-9bc2-cf6aed12690e
/192.168.200.16
Sending 516 files, 124933333900 bytes total
/home/data/cassandra/data/leadbullet/page_view-2a2410103f4411e4a266db7096512b05/leadbullet-page_view-ka-13890-Data.db 10485760/167797071 bytes(6%) sent to idx:0/192.168.200.16
Read Repair Statistics:
Attempted: 2016371
Mismatch (Blocking): 0
Mismatch (Background): 168721
Pool Name Active Pending Completed
Commands n/a 0 55802918
Responses n/a 0 425963
I can't diagnose the error, and I'd be grateful for any help!
Try to telnet from one node to another using the correct port.
Make sure you are joining the cluster with the correct name.
Try running: nodetool repair
You might be pinging the external IP addresses, while your cluster communicates using the internal IP addresses.
If you are running on Amazon AWS, make sure the firewall is open on both internal IP addresses.
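If telnet or netcat are not installed on a node, the same TCP check can be sketched in a few lines of Java; the host and port below are placeholders (7000 is the default inter-node storage port, 9042 the default CQL port):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) throws IOException {
        String host = "192.168.200.16"; // placeholder: the peer node's internal IP
        int port = 7000;                // placeholder: inter-node (storage) port
        try (Socket socket = new Socket()) {
            // Throws an exception if the port is unreachable within 5 seconds
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("TCP connect to " + host + ":" + port + " succeeded");
        }
    }
}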
3 node cluster and RF of 3 means every node has all the data. Consistency is ONE.
So when I query some data on node-1, ideally, since node-1 has all the data, it should be able to complete my query on its own.
But when I checked how my query runs using 'tracing on', it shows that node-2 is also contacted, which is not needed as per my understanding.
Am I missing something here?
Thanks in advance.
Edit:
Added the output of 'tracing on'.
It can be seen in the image that node 10.101.201.3 has contacted 10.101.201.4.
3 node cluster and RF of 3 means every node has all the data.
Just because every node has 100% of all data does not mean that every node has 100% of all token ranges. The token ranges in a 3 node cluster will be split up evenly at ~33% each.
In short, node-1 may have all of the data, but it's only primarily responsible for 33% of it. When the partition key is hashed, the query is likely being directed toward node-2 because it is primarily responsible for that partition key...despite the fact that the other nodes contain secondary and tertiary replicas.
This was in cqlsh. Does this change if I'm running the query from application code?
Yes, because the specified load balancing policy (configured in app code) can also affect this behavior.
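For instance, with the DataStax Java driver a token-aware policy wrapped around the DC-aware policy makes the driver prefer a replica of the partition key as coordinator. This is only a sketch; the contact point and data center name below are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareExample {
    public static void main(String[] args) {
        // Token-aware routing: the driver hashes the partition key itself and
        // prefers a replica as coordinator instead of round-robining all nodes.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.101.201.3") // placeholder contact point
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder()
                                .withLocalDc("DC1") // placeholder data center name
                                .build()))
                .build();
        Session session = cluster.connect();
        // ... run queries here; with RF=3 and CL=ONE a traced read should now
        // usually be handled entirely by the coordinator that owns the key.
        cluster.close();
    }
}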
Any reason why com.datastax.driver.core.Metadata:getHosts() would return state UP for a host that has shut down?
However, nodetool status returns DN for that host.
No matter how many times I check Host.getState(), it still says UP for that dead host.
This is how I'm querying Metadata:
DseCluster cluster = DseCluster.builder()
        .addContactPoints("192.168.1.1", "192.168.1.2", "192.168.1.3")
        .withPort(9042)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(2000)) // retry dropped connections every 2s
        .build();
cluster.getMetadata().getAllHosts();
EDIT: Updated the code to reflect that I'm trying to connect to 3 hosts. I should have stated that the cluster I'm connecting to has 3 nodes: 2 in DC1 and 1 in DC2.
Also, whenever I relaunch my Java process running this code, the behavior changes. Sometimes it gives me the right states, then when I restart it again, it gives me the wrong states, and so on.
I will post an answer I got from the DataStax Academy Slack:
Host.getState() is the driver's view of what it thinks the host state is, whereas nodetool status is what that C* node thinks the state of all nodes in the cluster is, from its own view (propagated via gossip). There is no way to get the latter via the driver.
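If you want to watch the driver's view change over time instead of polling getAllHosts(), you can register a Host.StateListener on the Cluster. This is just a sketch using the core driver with the same contact points as the snippet above; since DseCluster is a Cluster subclass, the same registration should work there too:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;

public class DriverStateWatcher {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoints("192.168.1.1", "192.168.1.2", "192.168.1.3")
                .withPort(9042)
                .build();
        cluster.register(new Host.StateListener() {
            public void onAdd(Host host)    { System.out.println("ADD    " + host); }
            public void onUp(Host host)     { System.out.println("UP     " + host); }
            public void onDown(Host host)   { System.out.println("DOWN   " + host); }
            public void onRemove(Host host) { System.out.println("REMOVE " + host); }
            public void onRegister(Cluster c)   { }
            public void onUnregister(Cluster c) { }
        });
        cluster.init(); // connect so the listener starts receiving the driver's events
        // Leave the process running; these events reflect the driver's view only,
        // not what gossip/nodetool status reports on the server side.
    }
}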
We have configured a 3-node Cassandra cluster on RHEL 7.2 and we are doing cluster testing. When we start Cassandra on all 3 nodes, they form a cluster and work fine.
But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes more time to join the cluster; however, if we manually kill and start the Cassandra process, the node joins the cluster immediately without any issues.
We have provided all 3 IPs as seed nodes, the cluster name is the same on all 3 nodes, and each node's own IP is set as its listen address.
Please help us in resolving this issue.
Thanks
Update
Cassandra version: 3.9
While investigating the issue further, we noticed that Node 1 (the rebooted node) is able to send "SYN" and "ACK2" messages to both of the other nodes (Node 2 and Node 3), even though nodetool status displays Node 2 and Node 3 as "DN" only on Node 1.
After 10-15 minutes we noticed a "Connection Timeout" exception on Node 2 and Node 3, thrown from OutboundTcpConnection.java (line #311), which triggers a state change event for Node 1 and changes its state to "UN".
if (logger.isTraceEnabled())
    logger.trace("error writing to {}", poolReference.endPoint(), e);
Please let us know what triggers the "Connection Timeout" exception on Node 2 and Node 3 and how to resolve this.
We believe this issue is similar to https://issues.apache.org/jira/browse/CASSANDRA-9630
But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes more time to join the cluster; however, if we manually kill and start the Cassandra process, the node joins the cluster immediately without any issues.
Remember that Cassandra writes everything to the commit log to ensure durability in case of some "plug-out-of-the-wall" event. When that happens, Cassandra reconciles the data stored on-disk with data in the commit log at start-up time. If there are differences, it could take a while for that reconciliation to complete.
That's why it's important to run these commands before stopping Cassandra:
nodetool disablegossip
nodetool drain
Disabling gossip makes sure that the node you're shutting down won't take any additional requests. Drain ensures that anything in the commit log is written to disk. Once those are done, you can stop your node. Then the node restart should occur much faster.
We use vnodes on our cluster.
I noticed that when the token space of a node changes (which happens automatically with vnodes, during a repair or a cleanup after adding new nodes), the DataStax Node.js driver gets a lot of "Operation timed out - received only X responses" errors for a few minutes.
I tried using ONE and LOCAL_QUORUM consistencies.
I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue.
What do you suggest we do to avoid this? A custom retry policy? Caching? Changing the consistency level?
Example of behavior
When we see this:
4/7/2016, 10:43am Info Host 172.31.34.155 moved from '8185241953623605265' to '-1108852503760494577'
we see a spike of these:
{
"message":"Operation timed out - received only 0 responses.",
"info":"Represents an error message from the server",
"code":4608,
"consistencies":1,
"received":0,
"blockFor":1,
"isDataPresent":0,
"coordinator":"172.31.34.155:9042",
"query":"SELECT foo FROM foo_bar LIMIT 10"
}
I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue.
In fact, when adding a new node there will be token range movement, but Cassandra can still serve read requests using the old token ranges until the scale-out has finished completely. So the behavior you're facing is very suspicious.
If you can reproduce this error, please activate query tracing to narrow down the issue.
The error can also be related to a node being under heavy load and not replying fast enough.
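A sketch of per-query tracing with the DataStax Java driver (the original question uses the Node.js driver, which also supports enabling tracing per query); the contact point and query are taken from the error example above, and the keyspace name is a placeholder:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TraceExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("172.31.34.155") // placeholder: any reachable node
                .build();
        Session session = cluster.connect("my_keyspace"); // placeholder keyspace

        SimpleStatement stmt = new SimpleStatement("SELECT foo FROM foo_bar LIMIT 10");
        stmt.enableTracing(); // ask the coordinator to record a trace for this query

        ResultSet rs = session.execute(stmt);
        QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
        System.out.println("Coordinator: " + trace.getCoordinator()
                + ", duration: " + trace.getDurationMicros() + " us");
        for (QueryTrace.Event e : trace.getEvents()) {
            // Shows which replicas were actually contacted and where time was spent
            System.out.println(e.getSourceElapsedMicros() + " us  "
                    + e.getSource() + "  " + e.getDescription());
        }
        cluster.close();
    }
}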
I have a 3-node Cassandra cluster (replication factor of 2) with Solr installed; each node runs RHEL with 32 GB RAM, a 1 TB HDD, and DSE 4.8.3. There are lots of writes happening on my nodes, and my web application also reads from them.
I have observed that all the nodes go down every 3-4 days. I have to restart every node, and then they function quite well for the next 3-4 days until the same problem repeats. I checked the server logs, but they do not show any errors even when the servers go down. I am unable to figure out why this is happening.
In my application, sometimes when I connect to the nodes through the C# Cassandra driver, I get the following error:
Cassandra.NoHostAvailableException: None of the hosts tried for query are available (tried: 'node-ip':9042)
   at Cassandra.Tasks.TaskHelper.WaitToComplete(Task task, Int32 timeout)
   at Cassandra.Tasks.TaskHelper.WaitToComplete[T](Task`1 task, Int32 timeout)
   at Cassandra.ControlConnection.Init()
   at Cassandra.Cluster.Init()
But when I check OpsCenter, none of the nodes are down; all node statuses look perfectly fine. Could this be a problem with the driver? Earlier I was using version 2.5.0 of the Cassandra C# driver installed from NuGet, but I have since updated it to version 3.0.3 and the error still persists.
Any help on this would be appreciated. Thanks in advance.
If you haven't done so already, you may want to look at setting your logging level to DEBUG by running nodetool -h 192.168.XXX.XXX setlogginglevel org.apache.cassandra DEBUG on all your nodes.
Your first issue is most likely an OutOfMemory error.
For your second issue, the problem is most likely that you have really long GC pauses. Tailing /var/log/cassandra/debug.log or /var/log/cassandra/system.log may give you a hint, but it typically doesn't reveal the problem unless you are meticulously looking at the timestamps. The best way to troubleshoot this is to ensure you have GC logging enabled in your jvm.options config and then tail your GC logs, taking note of the pause times:
grep 'Total time for which application threads were stopped:' /var/log/cassandra/gc.log.1 | less
The "Unexpected exception during request; channel = [....] java.io.IOException: Error while read (....): Connection reset by peer" error typically indicates inter-node timeouts, i.e. the coordinator times out waiting for a response from another node and sends a TCP RST packet to close the connection.