I have a cluster with three nodes and I need to remove one node. How can I make sure the data from the node to be removed will be replicated to the two other nodes before I actually remove it? Is this done using snapshots? How should I proceed?
From the doc
You can take a node out of the cluster with nodetool decommission to a
live node, or nodetool removenode (to any other machine) to remove a
dead one. This will assign the ranges the old node was responsible for
to other nodes, and replicate the appropriate data there. If
decommission is used, the data will stream from the decommissioned
node. If removenode is used, the data will stream from the remaining
replicas.
You want to run nodetool decommission on the node you want to remove. This will cause the node to stream all its data to the other nodes and then remove itself from the ring.
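For illustration, a minimal command sequence (run decommission on the node you are removing; the status checks can be run from any node):

    # check that all three nodes show as UN (Up/Normal)
    nodetool status

    # on the node being removed: stream its ranges to the remaining replicas and leave the ring
    nodetool decommission

    # on the same node, from another shell: watch the streaming progress
    nodetool netstats

    # on one of the remaining nodes: the removed node should no longer be listed
    nodetool status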
Related
I had a cluster with 2 nodes (node 1 and node 2).
After decommissioning node 2 I wanted to use the server as a fresh Cassandra database for other purposes, but as soon as I restart this message appears:
org.apache.cassandra.exceptions.ConfigurationException: This node was
decommissioned and will not rejoin the ring unless
cassandra.override_decommission=true has been set, or all existing
data is removed and the node is bootstrapped again
So I removed all existing data.
But I don't want the node to be bootstrapped again (nor to rejoin the previous ring); I want it to be a fresh, clean Cassandra database that can be reused.
The old node is not on the seed list.
Cassandra version: 3.9
EDIT: I think I was misunderstood, sorry for that. After the decommission I want to have:
Db1: node 1
Db2: node 2
Two different databases with no correlation, totally separate. That's because we want to reuse the machine where node 2 is hosted to deploy a Cassandra DB in another environment.
Don't use override_decommission. That flag is only used for rejoining the same cluster.
You should remove all data files on the node (Cassandra will recreate the system tables on start). Most importantly, you need to change the seed in cassandra.yaml. I suspect that it is still the IP of node 1, so you need to change it to node 2 (itself).
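A rough sketch of that reset, assuming the default package data paths, package-style service commands, and that 10.0.0.2 is node 2's own IP (all of these are placeholders):

    # on node 2, after the decommission has completed
    sudo service cassandra stop

    # remove the old data, commit log and saved caches (default locations; adjust to your install)
    sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches

    # in cassandra.yaml, point the seed list at node 2 itself instead of node 1, e.g.
    #     - seeds: "10.0.0.2"

    sudo service cassandra start
    nodetool status    # should now show a single-node ring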
Use the option cassandra.override_decommission=true. Also, be aware of the definition of cluster_name in cassandra.yaml:
The name of the cluster. This setting prevents nodes in one logical
cluster from joining another. All nodes in a cluster must have the
same value.
So, to be sure, also use a different value for the cluster_name option in cassandra.yaml.
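For reference, the flag is passed as a JVM system property at startup, and the cluster name is a plain setting in cassandra.yaml (the file path and the name below are only examples):

    # only if you actually want the node to rejoin its old ring
    cassandra -Dcassandra.override_decommission=true

    # otherwise, give the node its own cluster name so it cannot join the old cluster
    grep '^cluster_name' /etc/cassandra/cassandra.yaml
    # cluster_name: 'OtherEnvCluster'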
Try these steps:
1. Run in cqlsh: UPDATE system.local SET cluster_name = 'new_name' WHERE key = 'local';
2. Run nodetool flush in order to persist the change.
3. Run nodetool decommission.
4. Stop the node.
5. Change the cluster name in cassandra.yaml.
6. Clean the node with sudo rm -rf /var/lib/cassandra/* /var/log/cassandra/*, though I would just move those files somewhere else until you reach the state that you want.
7. Start the node.
Please check 1, 2
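Put together as a rough shell sequence (the new name, paths and service commands are placeholders; adapt them to your install):

    # 1-2: rename the cluster in the local system table and flush it to disk
    cqlsh -e "UPDATE system.local SET cluster_name = 'new_name' WHERE key = 'local';"
    nodetool flush

    # 3-4: leave the old ring and stop the node
    nodetool decommission
    sudo service cassandra stop

    # 5: set cluster_name: 'new_name' in cassandra.yaml

    # 6: clean the node (or move these directories aside instead of deleting them)
    sudo rm -rf /var/lib/cassandra/* /var/log/cassandra/*

    # 7: start the node as a fresh single-node cluster
    sudo service cassandra start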
I have a 4 node cluster and will be adding an additional node in two days. We aren't using vnodes.
Just wondering the best way to rebalance the cluster after I'm done. Do I just bring the new node up and then start the nodetool move?
Or do I shut each node down, change the initial_token value for each one (using one of those generators to calculate the values for me) and then bring each node up?
I just want to know the simplest way to do this from the command line. The new node already has Cassandra installed, as it was initially a non-production server. I will delete the data off the node and change the config files for the new cluster it will now be a part of; I'm just unsure about the other steps.
From this page, Adding or replacing single-token nodes, the simplest mechanism is to start the new node with its initial_token left empty in cassandra.yaml. This will make the cluster 'split the token range of the heaviest loaded node and position the new node there'. This won't give you a balanced cluster.
If you want a balanced cluster then you have to go through the nodetool move, node restart, nodetool cleanup procedure you mentioned.
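A sketch of that rebalance for what would become a 5-node single-token ring; the token arithmetic assumes the RandomPartitioner range (0 to 2**127) used by older non-vnode clusters, and the example token is the one for the second position:

    # generate 5 evenly spaced tokens (example generator for the RandomPartitioner range)
    python -c "print('\n'.join(str(i * (2**127 // 5)) for i in range(5)))"

    # on each existing node in turn, move it to its newly computed token and wait for streaming
    nodetool move 34028236692093846346337460743176821145

    # once every node sits on its new token, drop the data each node no longer owns
    nodetool cleanup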
I removed a dead node from my Cassandra cluster using nodetool removenode.
Running nodetool status afterwards indicates that the deleted node no longer exists. However, when the dead node comes back online, nodetool status shows the removed node as UP.
Is there any command that prevents dead node from joining the cluster when it is back online?
Once you remove the node, you should change the cluster name in cassandra.yaml. This will prevent this node from rejoining.
The other methods, like changing seeds, may or may not work, depending on how the node was expelled from the cluster. It may still have the cluster info in its cache and will therefore use that instead of the list of seed nodes.
But changing the cluster name will 100% prevent it from rejoining.
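As a rough sketch of that rename on the surviving nodes (the new name is a placeholder, the service command assumes a package install, and each node needs a restart for the change to take effect):

    # on every live node, one at a time
    cqlsh -e "UPDATE system.local SET cluster_name = 'renamed_cluster' WHERE key = 'local';"
    nodetool flush system
    # then set cluster_name: 'renamed_cluster' in cassandra.yaml and restart the node
    sudo service cassandra restart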
Our cluster is unbalanced and most of the data is on one node. Now the node which holds most of the data is down because it ran out of space.
How can we bring the node UP in read-only mode and rebalance the cluster?
We are using vnodes and DSE 4.0.3
There is no explicit read only mode for Cassandra. As such you're likely to need to temporarily add some disk space to the node to get it online and then rebalance the cluster.
If that's not an option then removing snapshots can sometimes give you enough space to get going. Running nodetool cleanup can also help if it's not previously been run.
If you're using vnodes, a common problem is having converted an old-style single-token node to vnodes. The node will just grab an even range of tokens that maps to its original range. If the other nodes in the cluster have randomly generated tokens, this will lead to a huge imbalance between them. Decommissioning the node and then re-adding it should resolve the problem.
The output of nodetool ring will show us if that's happened. Actually the chances are a decom and re-add will be the solution in any case.
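A few commands that often recover enough space and show whether the ring is skewed (nothing here is DSE-specific; these are plain nodetool calls):

    # see how tokens and load are spread across the ring
    nodetool ring

    # drop old snapshots, which can free a surprising amount of disk
    nodetool clearsnapshot

    # remove data this node no longer owns (only possible once the node is back up)
    nodetool cleanup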
Use nodetool drain to stop receiving writes on a certain node.
Is it possible to add a new node to an existing cluster in cassandra 1.2 without running nodetool cleanup on each individual node once data has been added?
It probably isn't, but I need to ask because I'm trying to create an application where each user's machine is a server, allowing for endless scaling.
Any advice would be appreciated.
Yes, it is possible. But you should be aware of the side-effects of not doing so.
nodetool cleanup purges keys that are no longer allocated to that node. According to the Apache docs, these keys count against the allocated data for that node, which can cause the auto bootstrap process for the next node to not properly balance the ring. So depending on how you are bringing new user machines into the ring, this may or may not be a problem.
Also keep in mind that nodetool cleanup only needs to be run on nodes that lost token ranges to the new node - i.e. the adjacent nodes, not all nodes in the cluster.
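If you do decide to run it later, a minimal sketch (run it node by node rather than everywhere at once):

    # after the new node finishes bootstrapping, on each neighbour that lost ranges to it
    nodetool cleanup

    # cleanup is I/O heavy, so do one node at a time and keep an eye on compactions
    nodetool compactionstats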