Cassandra node decommission stuck in UL

The Cassandra decommission has been in the UL state for five days now. Streaming completed within the first three days, but the node has stayed in UL ever since.
Cluster details :
5 nodes, RF 2, all in the same DC.
Analysis:
nodetool netstats shows no active streams.
The data size on the remaining nodes has increased.
Write activity is still appearing in the decommissioned node's log file.
There are no errors on any node in the cluster.
Please comment on:
Has the decommission completed, and how can I find out more about its state?
If it is stuck, restarting the node will not help, because the process would start over from scratch.
Is there any specific property that needs to be added?
Is there a specific log line that indicates the progress of the decommission?

I have seen the decommission process get stuck sometimes. Here are the steps that I'd recommend:
Stop the node and remove it from the cluster. This operation will also stream data in an attempt to prevent data loss.
Run a repair (on each node), and then check for data consistency.
If you notice data is missing, re-add the node, and retry the decommission process.
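As a rough sketch of what those steps might look like with nodetool (the host ID is a placeholder, and exact flags vary by Cassandra version):
$ nodetool netstats | grep -i mode      # LEAVING means the decommission is still in progress
$ nodetool removenode <host-id>         # from a live node, after the stuck node has been stopped
$ nodetool repair -pr                   # then on each remaining node, one at a time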

Related

nodetool decommission is behaving strangely

I tried removing a node from a cluster by issuing "nodetool decommission",
and watched nodetool netstats to see how much data was being streamed to the other nodes, which all looked fine.
After the node was decommissioned, nodetool status run on some of the remaining nodes (not the one I decommissioned) reports a few nodes as 'UD', while other nodes report them as 'UN'.
I'm quite confused about why the reported status differs between nodes after the decommission.
Am I missing any steps before or after decommissioning?
Any comments/help is highly appreciated!
If gossip information is not the same on all nodes, you should do a rolling restart of the cluster. That will reset gossip state on every node.
Was the node you removed a seed node? If it was, don't forget to remove its IP from cassandra.yaml on all nodes.
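For reference, the seeds entry in cassandra.yaml typically looks something like the snippet below; the IPs here are placeholders, and the point is simply that the removed node's address must no longer appear in the list on any remaining node:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"   # decommissioned node's IP removed from this list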

Scale Cassandra with copying data manually

I created an AMI from my Cassandra machine and then launched a new instance. After making config changes (setting the seed to the first node and setting auto_bootstrap: false), when I start Cassandra and run nodetool status it shows data on both nodes. I just want to know whether the cluster actually knows that both nodes hold the data, and whether an incoming request can be routed to the second node as well.
I'm copying the data manually because, without it, streaming never completes: it fails after a certain period of time, and I then have to run 'nodetool bootstrap resume' to restart the bootstrap process, which fails again.
I don't think it should work this way (the manual copying).
Why can't you perform normal bootstrapping? What error messages appear in the logs when you try? What is the RF of your keyspace?
In addition to your data, Cassandra also stores node-specific information on disk in the system tables, for example the node ID, so you can't just replicate the image. If you copied the Cassandra image and only changed the config, it won't work; you should delete all data before starting the node and joining it to the cluster.
EDIT:
If you are going with auto_bootstrap: false:
Remove all the data from the new server (both the data and commit log directories).
Start the node, and after it joins, run nodetool rebuild.
Run a repair after that process has finished.
If you are going with auto_bootstrap: true:
Remove all the data from the new server (both the data and commit log directories).
Start the node and monitor the bootstrapping.
Before trying either of these, remove the node you can't add from the cluster. (A rough command sketch of both paths follows.)
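A sketch of both paths, assuming default package install locations; the data and commitlog paths, service name, and source DC name are placeholders to adjust for your setup:
# auto_bootstrap: false
$ sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
$ sudo service cassandra start
$ nodetool rebuild -- <existing-dc-name>   # stream data from the existing DC after the node joins
$ nodetool repair                          # once the rebuild has finished

# auto_bootstrap: true
$ sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
$ sudo service cassandra start
$ nodetool netstats                        # watch the bootstrap streams until they complete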

Cassandra removing node : is it possible to stop decommission and lauch removenode?

I have an issue with a Cassandra node in a 10-node cluster.
I first launched a decommission on that node to remove it from the cluster.
The decommission is currently running, but the load on this node is such that it is taking forever, and I would like to go faster.
What I thought to do was to stop this node and launch a removenode from another one.
The DataStax documentation explains that we should use decommission or removenode depending on whether the node is up or down, but there is no information about running removenode while the targeted node already has leaving status.
So my question is: Is it possible to launch a removenode of a stopped node while this one has already a leaving status?
So my question is: Is it possible to launch a removenode of a stopped node while this one has already a leaving status?
I had to do this last week, so "yes" it is possible.
Just be careful, though. At the time, I was working on bringing up a new DC in a non-production environment, so I didn't care about losing the data that was on the node (or in the DC, for that matter).
What I thought to do was to stop this node and launch a removenode from another one.
You can do exactly that. Get the Host ID of the node you want to drop, and run:
$ nodetool removenode 2e143c2b-0571-4c5d-22d5-9a2668648710
And if that gets stuck, Ctrl+C out of it, and (on the same node) you can run:
$ nodetool removenode force
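If you need to look up that Host ID first, it is shown in the Host ID column of nodetool status; for example (the IP is a placeholder):
$ nodetool status | grep 10.0.0.5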
Decommissioning a node in Cassandra can only be stopped by restarting that node.
Its status will then change from UL back to UN.
I have tested this approach and the Cassandra cluster worked well afterwards.
After cancelling the decommission this way, trigger nodetool removenode so the data stays consistent.
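One way that sequence might look in practice; the service name and host ID are placeholders, and note that removenode only accepts a node that is down:
$ sudo service cassandra restart        # on the stuck node: clears the leaving (UL) state
$ nodetool status                       # the node should report UN again
$ sudo service cassandra stop           # stop it again if you still want it removed
$ nodetool removenode <host-id>         # run from another, live node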

Bootstrapping new node time/issues

I have an existing 3-node Cassandra cluster and added 3 new nodes to it. The new nodes are still "bootstrapping", but I added them 3 days ago. I'm really concerned about this process.
1) How long does bootstrapping typically take? We have about 40GB on each node.
2) All of the new nodes have died at least once during bootstrapping with no cause given in the logs. Are there any known issues around this?
Using Cassandra 2.0.6 on Ubuntu 12.04. Any help or advice would be greatly appreciated.
Usually, bootstrapping is the process of introducing the brand-new node to the existing nodes in a cluster. It is recommended to allow 2-3 minutes for the node to register in the ring; streaming the data itself can take considerably longer.
Do the new nodes have the same Cassandra configuration? While adding a new node, you can run nodetool netstats from the console to check whether any streams are in progress.
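For example, two standard nodetool subcommands that are commonly used to watch a joining node:
$ nodetool netstats           # active streaming sessions on the joining node
$ nodetool compactionstats    # pending compactions on the joining node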

Cassandra - Removing a node from the cluster

I have a cluster with three nodes and I need to remove one node. How can I make sure the data from the node to be removed will be replicated to the two other nodes before I actually remove it? Is this done using snapshots? How should I proceed?
From the doc:
You can take a node out of the cluster with nodetool decommission to a live node, or nodetool removenode (to any other machine) to remove a dead one. This will assign the ranges the old node was responsible for to other nodes, and replicate the appropriate data there. If decommission is used, the data will stream from the decommissioned node. If removenode is used, the data will stream from the remaining replicas.
You want to run nodetool decommission on the node you want to remove. This will cause the node to stream all its data to the other nodes and then remove itself from the ring.
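A minimal sketch of that flow (run on the node being removed, except where noted):
$ nodetool decommission       # streams this node's data to the remaining replicas
$ nodetool netstats           # on the same node: watch the outbound streams
$ nodetool status             # from any other node: the decommissioned node disappears when done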
