I have a Cassandra (version 2.2.8) cluster of 6 nodes in total, 3 of which are seed nodes. One of the seed nodes recently went down, and I need to replace it. My cluster is set up in a way that it cannot survive the loss of more than 1 node. I read this documentation on replacing a dead seed node:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
As per the documentation, I am scared to remove the dead seed node from the seed list and do a rolling restart. If for any reason a node doesn't start, I'll lose data.
How should I approach this scenario? Is it OK to leave the dead seed node in the seed list until the new node is fully up and running, given that I already have two working seed nodes in the list? Please advise.
In short: yes, it is okay to wait before removing the seed node.
Explanation: the seed node configuration does two things:
1. When adding new nodes, the new node reads the seed configuration to get a first contact point into the Cassandra cluster. After the node has joined the cluster, it saves information about all Cassandra nodes in its system.peers table. For all future starts it uses this information to connect to the cluster, not the seed node configuration.
2. Cassandra also uses seeds as a way to improve gossip. Basically, seed nodes are more likely to receive gossip messages than normal nodes. This improves the speed at which nodes receive updates about other nodes, such as their status.
Losing a seed node will, in your case, only impact point 2. Since you have two more seed nodes, I don't see this as a big issue. I would still do a rolling restart of all nodes as soon as you have updated your seed configuration.
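As a concrete sketch of the replacement procedure (all IPs here are placeholders, not from your cluster -- adjust paths and addresses to your install):

```shell
# Hypothetical setup: 10.0.0.1-3 are the seeds, 10.0.0.3 is the dead one,
# 10.0.0.7 is the replacement node.

# On the replacement node, BEFORE first start, point it at the live seeds
# in cassandra.yaml (the replacement's own IP must not be in its seed list,
# or it will skip bootstrap):
#   seed_provider:
#     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
#       parameters:
#         - seeds: "10.0.0.1,10.0.0.2"

# Start the replacement with the replace flag so it streams the dead
# node's data and takes over its tokens:
cassandra -Dcassandra.replace_address=10.0.0.3

# Once the new node shows as UN in `nodetool status`, update the seed
# list on the remaining nodes and do the rolling restart at your leisure.
```

The key point is that the dead entry in the other nodes' seed lists is harmless in the meantime, since two live seeds remain.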
Related
I have a simple Cassandra cluster: 1 seed and 2 nodes.
I understand that for redundancy there should be two seed nodes, but my question is: if for some reason all my seed nodes get deleted, how do I create a new seed node and make it join my running cluster?
Assuming you have at least one node that survives (it doesn't have to be a seed), simply add that node's IP to the seed list and it becomes a seed node. You don't need the node that was destroyed at all (as long as the RF > 1 and the nodes are in sync, i.e. no missing data). So if you had, say, a 3-node cluster where node 1 was the only seed and node 1 was destroyed/lost, simply change the cassandra.yaml on the other nodes so the seed list points to any remaining node, and you're done. Either restart Cassandra or, as stated above, use nodetool to reload the seeds.
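A minimal sketch of that recovery, assuming the hypothetical 3-node cluster above with placeholder IPs:

```shell
# Node 1 (10.0.0.1) was the only seed and is gone. On each surviving node,
# edit cassandra.yaml so the seed list points at a remaining node, e.g.:
#   - seeds: "10.0.0.2"

# Then either restart Cassandra one node at a time:
sudo service cassandra restart

# ...or, on versions that ship the subcommand (it is not present in older
# releases), reload the seed list without a restart:
nodetool reloadseeds
```

Either way, 10.0.0.2 is now a seed simply by virtue of appearing in everyone's seed list; no special bootstrap or decommission step is required.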
Is there a proper way to remove a node from the seed list in a Cassandra cluster? I just want to shrink the seed list without actually replacing or decommissioning a node. Is that possible?
Yes, just remove the particular node from the seed list on all nodes, and either perform a rolling restart or call nodetool reloadseeds (if you have a version of Cassandra that has this subcommand - I don't remember in which version it was introduced...)
Remove the node's IP/hostname from the seed list on all nodes of the Cassandra cluster and perform a rolling restart of all nodes.
I understand the seed nodes play the role of a "first point of contact" in the gossip protocol which allows the nodes to discover and maintain the list of their peers.
The documentation recommends the following.
A) Start the seeds one at a time, then the remaining nodes in the cluster.
B) Define the same seeds on each node.
I am not sure why. Below is my best guess.
A) If all the nodes started at the same time, one of them might contact a seed which does not yet have the right information. This might prevent or delay convergence of the peer lists.
B) This is because the seeds have more up-to-date information than the other nodes. As such, the peer lists converge faster if all the nodes use the same seeds.
Are my guesses correct? Is there any other reason? Are there cases where the lists of peers cannot converge because the seeds are configured differently on each node, or the nodes are not started in the right order?
It's important to point out that seed nodes are only relevant when a node first joins the cluster, e.g. when adding a new node to manage capacity. The node will contact the seed node as the first point of contact to get the current node list and topology. After that, the list is maintained through gossip, even when a node goes down and comes back online, so once a node has joined the cluster, seeds are no longer relevant.
To me, those recommendations are to ensure a clean startup of a new cluster.
A) For obvious reasons, the seeds should be started before non-seed nodes. Starting the seed nodes one at a time ensures that nodes started after the first can contact an already-running node and get the cluster info. Otherwise they may think they are the first node in the cluster and spend time doing initial setup -- the nodes should eventually converge to form the desired cluster, but it may take longer.
B) If you have different seeds on each node, you may inadvertently set up a node startup order which has to be followed for successful cluster creation. E.g. if C and D have seeds A and B, but E has seeds C and D, then you will need to configure your startup to bring up C or D after A or B, and then start E after that, even though they may all be seed nodes.
This rule lets you set your configuration to: Start seeds (any order) then start non-seeds (any order).
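The two-phase startup can be sketched as a script. This is only an illustration under assumed conventions: placeholder hostnames (seed1, seed2, node1..node4), SSH access, and a service-managed Cassandra -- none of these come from the question:

```shell
# Phase 1: start seeds one at a time, waiting for each to report
# Up/Normal (the "UN" prefix in `nodetool status`) before moving on.
for h in seed1 seed2; do
  ssh "$h" 'sudo service cassandra start'
  ssh "$h" 'until nodetool status 2>/dev/null | grep -q "^UN"; do sleep 5; done'
done

# Phase 2: start the remaining nodes, order no longer matters.
for h in node1 node2 node3 node4; do
  ssh "$h" 'sudo service cassandra start'
done
```

Because every node carries the same seed list, phase 2 can even be run in parallel.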
We are deploying our application to production this month, and our stack will include a 3-node, single-datacenter Cassandra 1.2 cluster. In anticipation of this, we have been working out our initial cassandra.yaml settings. While doing this I ran into an interesting situation for which I haven't been able to find an answer.
This has to do with setting the -seeds parameter in each node's cassandra.yaml file. All of the reading I've done says it is best practice to:
Have at least 2 seeds per datacenter. This makes sense, so that one of the seeds can go down and other nodes can still be seeded by the second seed.
These two seeds should be the same for all (in our case 3) nodes.
In the deployment I tested this on, I started out with all three nodes having a single seed: node 1's IP address. My intention was to change the seeds of all three nodes to the IP addresses of node 1 and node 2. First I did node 3 by:
1. Decommissioning the node.
2. Shutting down Cassandra.
3. Changing the -seeds value to ip_node1,ip_node2.
4. Starting up Cassandra.
5. Running nodetool status to ensure the node was added back to the cluster.
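The steps above can be sketched as commands (the sed pattern and config path are illustrative and depend on your install layout):

```shell
nodetool decommission                    # 1. remove the node from the ring
sudo service cassandra stop              # 2. shut Cassandra down
sudo sed -i 's/- seeds: ".*"/- seeds: "ip_node1,ip_node2"/' \
    /etc/cassandra/cassandra.yaml        # 3. update the seed list
sudo service cassandra start             # 4. start Cassandra back up
nodetool status                          # 5. confirm the node rejoined
```

In practice you would verify the yaml edit by eye before restarting, since a malformed seed list is exactly the failure mode being debugged here.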
Next I did node 2, following the exact same steps I used for node 3. But something unexpected happened. When I restarted Cassandra on node 2, it did not join the existing ring. Instead, it started its own single-node ring. It seems pretty obvious that, of the two seed parameters I passed it, it used its own IP address and thus believed it was the first node in a new ring.
I was surprised Cassandra didn't select the other seed value I passed it (node 1's). The only way I could get it to join the existing datacenter was to set its seeds to one or both of the other nodes in the cluster.
An obvious workaround is to configure each of my three nodes' seeds value to the IP addresses of the other two nodes in the cluster. But since several sources have suggested this isn't a "best practice", I thought I'd ask how this should be handled. So my questions are:
Is it normal for Cassandra to always use its own IP address as a seed if it is in the seed list?
Is configuring the cluster the way I've suggested, which goes against best practice, a huge issue?
This might not be the solution to your question, but did you compare all your cassandra.yaml files?
They should all be the same, apart from things like listen_address.
Is it possible you had a whitespace error or typo in the cluster name?
I just thought I'd mention it as something good to check.
I'm interested in speeding up the process of bootstrapping a cluster and adding/removing nodes (granted, in the case of node removal, most time will be spent draining the node). I saw in the source code that nodes that are seeds are not bootstrapped, and hence do not sleep for 30 seconds while waiting for gossip to stabilize. Thus, if all nodes are declared to be seeds, the process of creating a cluster will run 30 seconds faster. My question is: is this OK, and what are the downsides? Is there a hidden requirement in Cassandra that we have at least one non-seed node to perform a bootstrap (as suggested in the answer to the following question)? I know I can shorten RING_DELAY by modifying /etc/cassandra/cassandra-env.sh, but if simply setting all nodes to be seeds would be better or faster in some way, that might be preferable. (Intuitively, there must be a downside to setting all nodes to be seeds, since it appears to strictly improve startup time.)
Good question. Making all nodes seeds is not recommended. You want new nodes, and nodes that come back up after going down, to automatically migrate the right data; bootstrapping does that. When initializing a fresh cluster without data, turn bootstrapping off. For data consistency, bootstrapping needs to be on at all other times. A new start-up option, -Dcassandra.auto_bootstrap=false, was added in Cassandra 2.1: you start Cassandra with the option, and auto_bootstrap=false stays in effect only until the node goes down. When the node comes back up, the default auto_bootstrap=true is back in effect. Folks are less likely to go on indefinitely without bootstrapping after creating a cluster -- there is no need to go back and forth configuring the yaml on each node.
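The difference between the two ways of disabling bootstrap can be shown side by side (the yaml path is the usual package default, but may differ on your system):

```shell
# Cassandra 2.1+: a one-off start with bootstrapping disabled; the
# setting lasts only until this node is stopped, then reverts to true.
cassandra -Dcassandra.auto_bootstrap=false

# The older, stickier alternative is the yaml setting, which you would
# have to remember to flip back on every node afterwards:
#   auto_bootstrap: false    # in /etc/cassandra/cassandra.yaml
```

The JVM flag is the safer choice precisely because it cannot be forgotten: the dangerous state does not survive a restart.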
In multiple data-center clusters, the seed list should include at least one node from each data center. To prevent partitions in gossip communications, use the same list of seed nodes in all nodes in a cluster. This is critical the first time a node starts up.
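For example, an identical seed block for a two-datacenter cluster might look like this (IPs are placeholders; the same list goes into every node's cassandra.yaml):

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # at least one node from each data center; here two per DC
      - seeds: "10.0.1.1,10.0.1.2,10.0.2.1,10.0.2.2"
```

Keeping the list identical everywhere avoids the partitioned-gossip scenario described above, where different nodes converge around different seeds.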
These recommendations are mentioned on several pages of the Cassandra 2.1 docs: http://www.datastax.com/documentation/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html.