Reasons for the recommendations about Cassandra seed node configuration?

I understand the seed nodes play the role of a "first point of contact" in the gossip protocol which allows the nodes to discover and maintain the list of their peers.
The documentation recommends the following.
A) Start the seeds one at a time, then the remaining nodes in the cluster.
B) Define the same seeds on each node.
I am not sure why. Below is my best guess.
A) If all the nodes started at the same time, one of them might contact a seed which does not have the right information yet. This might prevent or delay convergence of the peer lists.
B) This is because the seeds have more up-to-date information than the other nodes. As such, the peer lists converge faster if all the nodes use the same seeds.
Are my guesses correct? Is there any other reason? Are there cases where the peer lists cannot converge because the seeds are configured differently on each node, or the nodes are not started in the right order?

It's important to point out that seed nodes are only relevant when a node first joins the cluster, e.g. when adding a new node to manage capacity. The joining node contacts a seed node as its first point of contact to get the current node list and topology. After that, the list is maintained through gossip, even when a node goes down and comes back online, so once a node has joined the cluster, seeds are no longer relevant.
To me, those recommendations are to ensure a clean startup of a new cluster.
A) For obvious reasons, the seeds should be started before non-seed nodes. Starting seed nodes one at a time ensures that nodes started after the first can contact an already-running node and get the cluster info. Otherwise they may think they are the first node in the cluster and spend time doing initial setup; the nodes should eventually converge to form the desired cluster, but it may take longer.
B) If you have different seeds on each node, then you may inadvertently set up a node startup order which has to be followed for successful cluster creation. For example, if C and D have seeds A and B, but E has seeds C and D, then you will need to start C or D after A or B, and then start E after that, even though they may all be seed nodes.
This rule lets you keep the startup procedure simple: start the seeds (in any order), then start the non-seeds (in any order).
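As a concrete sketch (the IP addresses below are placeholders, not from the question), this is what an identical seed list looks like in cassandra.yaml on every node, seed or not:

```yaml
# cassandra.yaml: the same two seeds listed on every node in the cluster
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Placeholder IPs for the two designated seed nodes
          - seeds: "10.0.0.1,10.0.0.2"
```

With this in place, the startup procedure stays simple: bring up 10.0.0.1 and 10.0.0.2 first, then start the remaining nodes in any order.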

Related

Replacing a seed node without removing it from seed list

I have a Cassandra (version 2.2.8) cluster of 6 nodes in total, of which 3 are seed nodes. One of the seed nodes recently went down, and I need to replace it. My cluster is set up in a way where it cannot survive the loss of more than 1 node. I read this documentation on replacing the dead seed node.
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
As per the documentation, I am scared to remove the dead seed node from the seed list and do a rolling restart. If for any reason a node doesn't start, I'll lose the data.
How should I approach this scenario? Is it okay to not remove the dead seed node from the seed list until the new node is fully up and running? I already have two working seed nodes present in the seed list. Please advise.
In short: yes, it is okay to wait before removing the seed node.
Explanation: seed node configuration does two things:
1) When adding new nodes, the new node reads the seed configuration to get a first contact point into the Cassandra cluster. After the node has joined the cluster, it saves information about all Cassandra nodes in its system.peers table, and for all future starts it uses this information to connect to the cluster, not the seed node configuration.
2) Cassandra also uses seeds as a way to improve gossip. Basically, seed nodes are more likely to receive gossip messages than normal nodes, which improves the speed at which nodes receive updates about other nodes, such as their status.
Losing a seed node will, in your case, only impact 2). Since you have two more seed nodes, I don't see this as a big issue. I would still do a rolling restart on all nodes as soon as you have updated your seed configuration.
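As a hedged sketch (the IPs are placeholders, assuming 10.0.0.3 is the dead seed and 10.0.0.4 is its replacement), the seed list on the surviving nodes could look like this during and after the replacement:

```yaml
# cassandra.yaml on the surviving nodes while the replacement joins.
# Leaving the dead seed (10.0.0.3) listed is fine for now, because the
# two live seeds (10.0.0.1, 10.0.0.2) are enough for the new node to join.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"

# Once the replacement (10.0.0.4) is fully up, swap it into the list and
# do a rolling restart of the other nodes:
#         - seeds: "10.0.0.1,10.0.0.2,10.0.0.4"
```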

Cassandra seed values in a three node datacenter

We are deploying our application to production this month, and our stack will include a 3-node, single-datacenter Cassandra 1.2 cluster. In anticipation of this, we have been getting our initial cassandra.yaml settings worked out. While doing this I ran into an interesting situation for which I haven't been able to find an answer.
This has to do with setting the -seeds parameter in each node's cassandra.yaml file. All of the reading I've done says it is best practice to:
Have at least 2 seeds per datacenter. This makes sense, so that if one seed goes down the other nodes can still be seeded by the second seed.
These two seeds should be the same for all (in our case 3) nodes.
In the deployment I tested this on, I started out with all three nodes having a single seed, node 1's IP address. My intention was to change the seeds of all three nodes to the IP addresses of node 1 and node 2. First I did node 3 by:
Decommissioning the node.
Shutting down Cassandra.
Changing the -seeds value to ip_node1,ip_node2.
Starting up Cassandra.
Running nodetool status to ensure the node was added back to the cluster.
Next I did node 2, following the exact same steps I used for node 3. But something unexpected happened. When I restarted Cassandra on node 2, it did not join the existing ring. Instead it started its own single-node ring. It seems pretty obvious that of the two seed parameters I passed it, it used its own IP address and thus believed it was the first node in a new ring.
I was surprised Cassandra didn't use the other seed value I passed it (node 1's). The only way I could get it to join the existing datacenter was to set its seeds to one or both of the other nodes in the cluster.
An obvious workaround is to configure each of my three nodes' seeds value to the IP addresses of the other two nodes in the cluster. But since several sources have suggested this isn't a best practice, I thought I'd ask how this should be handled. So my questions are:
Is it normal for Cassandra to always use its own IP address as a seed if it is in the seed list?
Is configuring the cluster the way I've suggested, which goes against best practice, a huge issue?
This might not be the solution to your question, but did you compare all your cassandra.yaml files?
They should all be the same, apart from things like listen_address.
Is it possible you had a stray whitespace or a typo in the cluster name?
I just thought I'd mention it as something good to check.
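As a quick sketch of what to compare (values are placeholders in the question's ip_nodeN style), these settings should be identical on all three nodes, while the addresses differ per node:

```yaml
# Should be identical on all three nodes (watch for stray whitespace or typos):
cluster_name: 'MyCluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "ip_node1,ip_node2"

# Expected to differ per node:
listen_address: ip_node3   # this node's own address
rpc_address: ip_node3
```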

Is it ok to set all cassandra nodes as seeds?

I'm interested in speeding up the process of bootstrapping a cluster and adding/removing nodes (granted, in the case of node removal, most time will be spent draining the node). I saw in the source code that nodes that are seeds are not bootstrapped, and hence do not sleep for 30 seconds while waiting for gossip to stabilize. Thus, if all nodes are declared to be seeds, the process of creating a cluster will run 30 seconds faster. My question is: is this OK, and what are the downsides? Is there a hidden requirement in Cassandra that we have at least one non-seed node to perform a bootstrap (as suggested in the answer to the following question)? I know I can shorten RING_DELAY by modifying /etc/cassandra/cassandra-env.sh, but if simply setting all nodes to be seeds would be better or faster in some way, that might be preferable. (Intuitively, there must be a downside to setting all nodes to be seeds, since it appears to strictly improve startup time.)
Good question. Making all nodes seeds is not recommended. You want new nodes, and nodes that come back up after going down, to automatically migrate the right data, and bootstrapping does that. When initializing a fresh cluster without data, turn off bootstrapping; for data consistency, bootstrapping needs to be on at all other times. A new start-up option, -Dcassandra.auto_bootstrap=false, was added in Cassandra 2.1: you start Cassandra with the option to put auto_bootstrap=false into effect temporarily, until the node goes down. When the node comes back up, the default auto_bootstrap=true is back in effect. This makes people less likely to go on indefinitely without bootstrapping after creating a cluster, since there is no need to go back and forth configuring the yaml on each node.
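For reference, a minimal sketch of the two mechanisms mentioned above (exact behaviour depends on your version; auto_bootstrap is not present in the default yaml and has to be added):

```yaml
# cassandra.yaml: disable bootstrapping while initializing a fresh,
# empty cluster. This stays in effect until you remove it or set it
# back to true.
auto_bootstrap: false

# From Cassandra 2.1 the same effect can be had temporarily at startup,
# without editing the yaml, via a JVM system property:
#   -Dcassandra.auto_bootstrap=false
# It only applies until the node next goes down.
```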
In multiple data-center clusters, the seed list should include at least one node from each data center. To prevent partitions in gossip communications, use the same list of seed nodes in all nodes in a cluster. This is critical the first time a node starts up.
These recommendations are mentioned on several different pages of the Cassandra 2.1 docs: http://www.datastax.com/documentation/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html
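As a sketch for a hypothetical two-datacenter cluster (placeholder IPs), the same list, containing at least one node from each data center, would go on every node in both DCs:

```yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # 10.1.0.1 and 10.1.0.2 in DC1, 10.2.0.1 in DC2 (placeholders)
          - seeds: "10.1.0.1,10.1.0.2,10.2.0.1"
```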

Consequences of all Cassandra Nodes being Seeds?

Is there any reason why it would be bad to have all Cassandra nodes be in the seed nodes list?
We're working on automated deployments of Cassandra, and so can easily maintain a list of every node that is supposed to be in the cluster and distribute this as a list of seed nodes to all existing nodes, and all new nodes on startup.
All the documentation I can find suggests having a minimum number of seeds, but doesn't clarify what would happen if all nodes were seeds. There is some mention of seeds being preferred in the gossip protocol, but it is not clear what the consequence would be if all nodes were seeds.
There is no reason, as far as I am aware, that it is bad to have all nodes as seeds in your list. I'll post some doc links below to give you some background reading, but to summarise: the seed nodes are used primarily for bootstrapping. Once a node is running, it will maintain a list of the nodes it has established gossip with and use that for subsequent startups.
The only disadvantage of having too many is that the procedure for replacing seed nodes is slightly different:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html
Further background reading (note that some of the older docs, although superseded, sometimes contain lengthier explanations):
http://www.datastax.com/docs/1.0/cluster_architecture/gossip
http://www.datastax.com/documentation/cassandra/1.2/cassandra/initialize/initializeMultipleDS.html
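For the automated-deployment scenario in the question, a sketch of what "every node is a seed" looks like on a hypothetical three-node cluster (placeholder IPs), with the same list pushed to every node:

```yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # All cluster members listed as seeds, identical on every node.
          # Note: a node that finds itself in its own seed list will not
          # bootstrap (stream data) when it first starts.
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"
```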

One Cassandra node fails to join the Cassandra cluster

I'm configuring a three-node Cassandra cluster on three different machines. The machines can ping one another, and SSH works too.
I have set up the Cassandra cluster on these three machines; say they are A, B, and C, where A is the seed. C joins the seed (A) successfully, and a joining log gets printed. When I analyze the cluster via A, I can see that C has joined and has 66.7% ownership, while A has 33.3% ownership. (I have divided the tokens equally.)
But node B didn't join the cluster, and no errors are printed. The configurations of B and C are similar except for listen_address and rpc_address; I verified the config between these two, and they are consistent.
This is probably a network issue, but I'm not sure whether that's the case. No errors get printed. Any suggestions on things I can try here? This seems pretty strange. Maybe this is due to some port issue?
What Cassandra version are you on?
Try shutting down every node and bringing them up one by one. Cassandra prior to 1.1.6 (I think it was that point version) had a problem where nodes sometimes couldn't re-join the ring.
Secondly, make sure every node is configured with the same cluster name and with the same set of seed nodes.
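If you suspect a port issue, these are the stock defaults from cassandra.yaml (plus the JMX port from cassandra-env.sh) that need to be reachable between the machines; gossip in particular uses storage_port:

```yaml
# Default ports in cassandra.yaml
storage_port: 7000          # internode communication (gossip); must be open between all nodes
ssl_storage_port: 7001      # internode communication over SSL
native_transport_port: 9042 # CQL clients
rpc_port: 9160              # Thrift clients
# JMX (used by nodetool) defaults to 7199 and is set in cassandra-env.sh
```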
