How to recover a Cassandra node from a failed bootstrap

A node went down while I was bootstrapping a new node, and the bootstrap failed. The new node shut down, leaving the following messages in its log:
INFO [main] 2015-02-07 06:03:32,761 StorageService.java:1025 - JOINING: Starting to bootstrap...
ERROR [main] 2015-02-07 06:03:32,799 CassandraDaemon.java:465 - Exception encountered during startup
java.lang.RuntimeException: A node required to move the data consistently is down (/10.0.3.56). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
How do I recover the situation? Can I restart the bootstrap process once the failed node is back online? Or do I need to revert the partial bootstrap and try again somehow?
I have tracked down the original cause. The new node was able to connect to the node at 10.0.3.56, but 10.0.3.56 was not able to open connections back to the new node. 10.0.3.56 contained the only copy of some data that needed to be moved to the new node (replication factor == 1), but its attempts to send the data were blocked.
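For reference, here is a quick way to confirm that one-way connectivity problem from both hosts (a sketch, assuming the default storage port 7000 and that nc is installed; NEW_NODE_IP is a placeholder for the new node's address):
$ nc -vz 10.0.3.56 7000        # from the new node towards 10.0.3.56: succeeds
$ nc -vz NEW_NODE_IP 7000      # from 10.0.3.56 towards the new node: fails/hangs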

Since this involves a data move, not just replication, and based on the place in the code where the exception is thrown, I assume you are trying to replace a dead node as described here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
By the look of it, the node did not get as far as joining the ring. You can double-check whether it joined at all by running nodetool status.
If it has not, you can simply delete everything from the data, commitlog and saved_caches directories and restart the process. What was wrong with that 10.0.3.56 node?
If the node has joined the ring, it should still be safe to simply restart it once you bring node 10.0.3.56 back up.
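If the node never joined, a minimal cleanup-and-retry sketch (assuming a tarball install with the default directories under install_location; adjust the paths and the stop command to your setup):
$ nodetool status                          # confirm the new node is not in the ring
$ pkill -f CassandraDaemon                 # stop the failed node if it is still running
$ rm -rf install_location/data/data/*      # remove partially streamed data
$ rm -rf install_location/data/commitlog/*
$ rm -rf install_location/data/saved_caches/*
$ bin/cassandra                            # retry the bootstrap once 10.0.3.56 is reachable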

Related

Cassandra node not joining cluster after reboot

We have configured a 3-node Cassandra cluster on RHEL 7.2 and we are doing cluster testing. When we start Cassandra on all 3 nodes, they form a cluster and work fine.
But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes much longer to join the cluster; however, if we manually kill and start the Cassandra process, the node joins the cluster immediately without any issues.
We have provided all 3 IPs as seed nodes, the cluster name is the same on all 3 nodes, and each node uses its own IP as its listen address.
Please help us resolve this issue.
Thanks
Update
Cassandra - 3.9 version
While investigating the issue further we noticed that Node 1 (the rebooted node) is able to send "SYN" and "ACK2" gossip messages to both of the other nodes (Node 2 and Node 3), even though nodetool status displays Node 2 and Node 3 as "DN" only on Node 1.
After 10-15 minutes we noticed a "Connection Timeout" exception on Node 2 and Node 3, thrown from OutboundTcpConnection.java (line 311), which triggers a state change event for Node 1 and changes its state to "UN".
if (logger.isTraceEnabled())
    logger.trace("error writing to {}", poolReference.endPoint(), e);
Please let us know what triggers the "Connection Timeout" exception on Node 2 and Node 3, and how to resolve it.
We believe this issue is similar to https://issues.apache.org/jira/browse/CASSANDRA-9630
But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes much longer to join the cluster; however, if we manually kill and start the Cassandra process, the node joins the cluster immediately without any issues.
Remember that Cassandra writes everything to the commit log to ensure durability in case of some "plug-out-of-the-wall" event. When that happens, Cassandra reconciles the data stored on-disk with data in the commit log at start-up time. If there are differences, it could take a while for that reconciliation to complete.
That's why it's important to run these commands before stopping Cassandra:
nodetool disablegossip
nodetool drain
Disabling gossip makes sure that the node you're shutting down won't take any additional requests. Drain ensures that anything in the commit log is written to disk. Once those are done, you can stop your node. Then the node restart should occur much faster.
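As a sketch, the full sequence for a planned reboot might look like this (the systemctl commands assume a package install; use your own stop/start method otherwise):
$ nodetool disablegossip          # stop announcing this node and taking new requests
$ nodetool drain                  # flush memtables and sync the commit log to disk
$ sudo systemctl stop cassandra
$ sudo reboot
# once the machine is back up and Cassandra has started:
$ nodetool status                 # the node should return to UN much faster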

Cassandra nodejs driver time out after a node moves

We use vnodes on our cluster.
I noticed that when the token space of a node changes (automatically with vnodes, during a repair or a cleanup after adding new nodes), the DataStax Node.js driver gets a lot of "Operation timed out - received only X responses" errors for a few minutes.
I tried using ONE and LOCAL_QUORUM consistencies.
I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue.
What do you suggest we do to avoid this? A custom retry policy? Caching? Changing the consistency?
Example of behavior
when we see this:
4/7/2016, 10:43am Info Host 172.31.34.155 moved from '8185241953623605265' to '-1108852503760494577'
We see a spike of those:
{
"message":"Operation timed out - received only 0 responses.",
"info":"Represents an error message from the server",
"code":4608,
"consistencies":1,
"received":0,
"blockFor":1,
"isDataPresent":0,
"coordinator":"172.31.34.155:9042",
"query":"SELECT foo FROM foo_bar LIMIT 10"
}
I suppose this is due to the coordinator not hitting the right node just after the move. This seems to be a logical behavior (data was moved) but we really want to address this particular issue.
In fact, when adding a new node there will be token range movement, but Cassandra can still serve read requests using the old token ranges until the scale-out has finished completely. So the behavior you're facing is very suspicious.
If you can reproduce this error, please activate query tracing to narrow down the issue.
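One low-effort way to do that is probabilistic tracing on the coordinator node, roughly like this (the 0.01 probability is just an example value):
$ nodetool settraceprobability 0.01    # trace ~1% of requests on this node
# reproduce the timeouts, then inspect system_traces.sessions and system_traces.events in cqlsh
$ nodetool settraceprobability 0       # turn tracing back off when done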
The error can also be related to a node being under heavy load and not replying fast enough.

Cassandra Nodes Going Down

I have a 3-node Cassandra cluster (replication factor 2) with Solr installed; each node has RHEL, 32 GB RAM, a 1 TB HDD and DSE 4.8.3. There are lots of writes happening on my nodes, and my web application also reads from them.
I have observed that all the nodes go down every 3-4 days. I have to restart every node, and then they function quite well until the next 3-4 days, when the same problem repeats. I checked the server logs but they do not show any error even when the server goes down. I am unable to figure out why this is happening.
In my application, sometimes when I connect to the nodes through the C# Cassandra driver, I get the following error
Cassandra.NoHostAvailableException: None of the hosts tried for query are available (tried: 'node-ip':9042) at Cassandra.Tasks.TaskHelper.WaitToComplete(Task task, Int32 timeout) at Cassandra.Tasks.TaskHelper.WaitToComplete[T](Task`1 task, Int32 timeout) at Cassandra.ControlConnection.Init() at Cassandra.Cluster.Init()
But when I check OpsCenter, none of the nodes are down; all node statuses look perfectly fine. Could this be a problem with the driver? Earlier I was using Cassandra C# driver version 2.5.0 installed from NuGet; I have since updated it to version 3.0.3, but the error persists.
Any help on this would be appreciated. Thanks in advance.
If you haven't done so already, you may want to look at raising your logging level to DEBUG by running nodetool -h 192.168.XXX.XXX setlogginglevel org.apache.cassandra DEBUG on all your nodes.
Your first issue is most likely an OutOfMemory Exception.
For your second issue, the problem is most likely that you have really long GC pauses. Tailing /var/log/cassandra/debug.log or /var/log/cassandra/system.log may give you a hint but typically doesn't reveal the problem unless you are meticulously looking at the timestamps. The best way to troubleshoot this is to ensure you have GC logging enabled in your jvm.options config and then tail your gc logs taking note of the pause times:
grep 'Total time for which application threads were stopped:' /var/log/cassandra/gc.log.1 | less
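To summarize the worst pauses instead of eyeballing the raw log, something along these lines works (a sketch; it assumes the standard -XX:+PrintGCApplicationStoppedTime line format and the default gc.log location):
$ grep 'Total time for which application threads were stopped' /var/log/cassandra/gc.log* \
    | awk '{ for (i = 1; i <= NF; i++) if ($i == "stopped:") print $(i+1) }' \
    | sort -rn | head              # longest stop-the-world pauses, in seconds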
The Unexpected exception during request; channel = [....] java.io.IOException: Error while read (....): Connection reset by peer error typically indicates inter-node timeouts, i.e. the coordinator times out waiting for a response from another node and sends a TCP RST packet to close the connection.
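The relevant timeouts live in cassandra.yaml (path below assumes a package install); raising them usually just hides long GC pauses rather than fixing them, but for reference these are the stock defaults:
$ grep request_timeout_in_ms /etc/cassandra/cassandra.yaml    # lists all the per-operation timeouts
# the most relevant stock defaults:
#   read_request_timeout_in_ms: 5000
#   write_request_timeout_in_ms: 2000
#   request_timeout_in_ms: 10000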

How to start/restart the Cassandra node efficiently with auto_bootstrap property

My understanding of auto_bootstrap
Below is my understanding of the auto_bootstrap property. First, please correct me if I am wrong on any point.
Initially the property ‘auto_bootstrap’ is not present in the cassandra.yaml file, which means the default value ‘true’ is used.
true - stream the data to the node from the other nodes in the cluster while starting/restarting
false - do not stream the data while starting/restarting
Where do we need ‘auto_bootstrap: true’
1) When a new node needs to be added to the existing cluster, this needs to be set to ‘true’ so that the data is bootstrapped automatically from the other nodes in the cluster. It will take a considerable amount of time (depending on the current load of the cluster) before the new node is added, but it keeps the cluster load balanced automatically.
Where do we need ‘auto_bootstrap: false’
1) When a new node needs to be added quickly to the existing cluster without bootstrapping the data, this needs to be set to ‘false’. The new node will be added quickly regardless of the current load of the cluster. Later we need to manually stream the data to the new node to balance the cluster load.
2) When initializing a fresh cluster with no data, this needs to be set to ‘false’. At least the first seed node started/added in the fresh cluster should have the value ‘false’.
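For reference, a minimal sketch of setting the property explicitly (the conf path assumes a tarball install; package installs usually keep cassandra.yaml under /etc/cassandra):
$ echo 'auto_bootstrap: false' >> install_location/conf/cassandra.yaml
$ grep auto_bootstrap install_location/conf/cassandra.yaml    # verify the setting before starting the node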
My Question is
We are using Cassandra 2.0.3 on six nodes across two data centers (3 nodes each). Cassandra runs as a stand-alone process (not a service). I am going to change a few properties in the cassandra.yaml file on one node. Obviously the node has to be restarted after updating cassandra.yaml for the changes to take effect. Our cluster is loaded with a huge amount of data.
How to restart the node
After killing the node, I can simply restart it as below:
$ cd install_location
$ bin/cassandra
This restarts the node with no auto_bootstrap property set (so the default of true applies).
with 'true'
1) The node to be restarted already holds a huge amount of its own data. Will the node bootstrap all of its data again and replace the existing data?
2) Will it take more time for the node to join the cluster again?
with 'false'
I do not want to bootstrap the data. So
3) Can I add the property auto_bootstrap: false and restart the node as mentioned above?
4) After a successful restart I will delete the auto_bootstrap property. Is that okay?
Else
5) As I am restarting the node with the same IP address, will the cluster automatically identify through gossip info that this is an existing node, and hence restart it without streaming the data, regardless of whether auto_bootstrap is set to true or not present in the cassandra.yaml file?
As I am restarting an existing node with the same IP address, the restart happens without streaming any data, regardless of the value of auto_bootstrap. So we can simply restart the existing node without touching any parameters; option 5 fits here.
First of all, you should always run
nodetool drain
on the node before killing Cassandra so that client connections/ongoing operations have a chance to gracefully complete.
Assuming that the node was fully bootstrapped and had status "Up" and "Joined": when you start Cassandra up again, the node will not need to bootstrap again, since it has already joined the cluster and taken ownership of certain sets of tokens. However, it will need to catch up with the data that was mutated while it was down; the missed writes are replayed to it via hinted handoff (and a repair covers anything older than the hint window). So it will take much less time to start up than the original bootstrap did. Just don't leave it down for too long.
You should not set auto_bootstrap to false unless you're creating the first seed node for a new cluster.
The node will be identified as a pre-existing node which has tokens assigned to it by virtue of the host id that is assigned to it when it joins the cluster. The IP address does not matter unless it is a seed node.
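Putting that together, a sketch of the whole restart for a stand-alone (tarball) node like the one in the question:
$ nodetool drain                   # flush memtables; lets in-flight operations finish cleanly
$ pkill -f CassandraDaemon         # stop the stand-alone process
# edit conf/cassandra.yaml as needed; there is no need to touch auto_bootstrap
$ cd install_location
$ bin/cassandra
$ nodetool status                  # wait until the node shows UN again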

Cassandra 2.1.2 node stuck on joining the cluster

I'm trying but failing to join a new (well old, but wiped out) node to an existing cluster.
Currently the cluster consists of 2 nodes and runs C* 2.1.2. I start a third node with 2.1.2; it gets to the joining state and bootstraps, i.e. streams some data as shown by nodetool netstats, but after some time it gets stuck. From that point nothing gets streamed and the new node stays in the joining state. I have restarted the node twice; every time it streamed more data but then got stuck again. (I'm currently on a third round like that.)
Other facts:
I don't see any errors in the log on any of the nodes.
The connectivity seems fine: I can ping and netcat to port 7000 in all directions.
I have 267 GB load per running node, replication 2, 16 tokens.
The load of the new node is around 100 GB now.
I'm guessing that after a few rounds of restarts the node will finally pull in all of the data from the running nodes and join the cluster, but that's definitely not the way it should work.
EDIT: I discovered some more info:
The bootstrapping process stops in the middle of streaming some table, always after sending exactly 10MB of some SSTable, e.g.:
$ nodetool netstats | grep -P -v "bytes\(100"
Mode: NORMAL
Bootstrap e0abc160-7ca8-11e4-9bc2-cf6aed12690e
/192.168.200.16
Sending 516 files, 124933333900 bytes total
/home/data/cassandra/data/leadbullet/page_view-2a2410103f4411e4a266db7096512b05/leadbullet-page_view-ka-13890-Data.db 10485760/167797071 bytes(6%) sent to idx:0/192.168.200.16
Read Repair Statistics:
Attempted: 2016371
Mismatch (Blocking): 0
Mismatch (Background): 168721
Pool Name                    Active   Pending      Completed
Commands                        n/a         0       55802918
Responses                       n/a         0         425963
I can't diagnose the error & I'll be grateful for any help!
Try to telnet from one node to another using the correct port.
Make sure you are joining a cluster with the correct name.
Try using: nodetool repair
You might be pinging the external IP addresses, while your cluster communicates using the internal IP addresses.
If you are running on Amazon AWS, make sure the firewall is open on both internal IP addresses.
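A few of those checks as concrete commands (a sketch assuming the default ports, 7000 for inter-node traffic and 9042 for clients; replace the placeholder IP with your nodes' internal addresses):
$ nc -vz <internal-ip-of-other-node> 7000    # storage/streaming port, check in both directions
$ nc -vz <internal-ip-of-other-node> 9042    # native protocol port
$ nodetool describecluster                   # the cluster name must match on every node
$ nodetool repair                            # run once the node has fully joined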
