How to update configuration of a Cassandra cluster

I have a 3-node Cassandra cluster and I want to make some adjustments to cassandra.yaml.
My question is: how should I perform this? One node at a time, or is there a way to make it happen without shutting down nodes?
Btw, I am using Cassandra 2.2 and this is a production cluster.

There are multiple approaches here:
If you edit the cassandra.yaml file, you need to restart cassandra to re-read the contents of that file. If you restart all nodes at once, your cluster will be unavailable. Restarting one node at a time is almost always safe (provided you have sane replication-factors and consistency-levels). If your cluster is configured to survive a rack or datacenter outage, then you can safely restart more nodes concurrently.
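As a rough sketch, handling a single node during such a rolling restart might look like this (the service name and config path are assumptions for a typical package install; adjust for your environment):
# On each node in turn:
#   first edit the new settings into cassandra.yaml, e.g. /etc/cassandra/cassandra.yaml
nodetool drain                  # flush memtables and stop the node accepting new writes
sudo service cassandra restart  # restart so the node re-reads cassandra.yaml
nodetool status                 # wait for the node to show UN (Up/Normal) before moving to the next one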
Many settings can be changed without a restart via JMX, though I don't have a documentation link handy. Changing a setting via JMX WON'T change cassandra.yaml though, so you'll need to update that file as well, or your config will revert to what's in the file when the node restarts.
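For example, a few throughput settings can be adjusted at runtime through nodetool, which talks to the same JMX MBeans (exact availability varies by version, and the changes are not written back to cassandra.yaml):
nodetool setcompactionthroughput 64    # MB/s; takes effect immediately, lost on restart
nodetool setstreamthroughput 200       # MB/s used when streaming data between nodes
nodetool getcompactionthroughput       # verify the value currently in effect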
If you're using DSE, OpsCenter's Lifecycle Manager feature makes updating configs a simple point-and-click affair (disclaimer, I'm biased as I'm an LCM dev).

Related

What is the correct order to restart a cluster for a point-in-time restore?

I have a mixed workload cluster across multiple datacenters. I have run the sstableloader command for the tables I want to restore, using snapshots which I had backed up. I have added commit log files, which I had backed up to an archive, to a restore directory on all nodes. I have updated the commitlog_archiving.properties file with these configs.
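For context, the restore-related settings in commitlog_archiving.properties look roughly like the following; the values are illustrative placeholders, not the actual configuration used here:
restore_command=cp -f %from %to
restore_directories=/backups/commitlog
restore_point_in_time=2020:01:01 12:00:00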
What is the correct way and order to restart nodes of my cluster?
Do these considerations apply for restarting as well?
As a general rule, we recommend restarting the seed nodes in a DC before the other nodes, so gossip propagation happens faster, particularly for larger clusters (arbitrarily, 15+ nodes). It is important to note that a restart is not required if you restored data using sstableloader.
If you are just performing a rolling restart then the order of the DCs does not matter. But it matters if you are starting up a cluster from a cold shutdown meaning all nodes are down and the cluster is completely offline.
When starting from a cold shutdown, it is important to start with the "Analytics DC" (nodes running in Analytics mode, i.e. with Spark enabled) because it makes it easier to elect a Spark master. Assuming that the replication for Analytics keyspaces is configured with the recommended replication factor of 3, you will need to start 2 or 3 nodes, beginning with the seeds, ideally 1 minute apart, because the LeaderManager requires a quorum of nodes to elect a Spark master.
We recommend leaving DCs with nodes running in Search mode (with Solr enabled) last as a matter of convenience so that all the other DCs are operational before the cluster starts accepting Search requests from the application(s). Cheers!
If you've done all of that, I don't think the order matters too much. You should restart your seed nodes first, though, so that the nodes in the cluster have a common entry point to find their way back in and correctly rejoin.
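A minimal sketch of the seeds-first ordering described above (hostnames and the restart command are placeholders):
for host in seed1-dc1 seed2-dc1; do                      # seeds in the DC first
  ssh "$host" 'nodetool drain && sudo service cassandra restart'
  sleep 60                                               # give gossip (and Spark master election) time to settle
done
for host in node3-dc1 node4-dc1 node5-dc1; do            # then the remaining nodes
  ssh "$host" 'nodetool drain && sudo service cassandra restart'
done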

In a clustered environment, if there is items.xml change, is it enough to do Update Running System in one node and just clear cache the other nodes?

Assume that I have Node1 and Node2. If I add an attribute in items.xml and build-and-deploy it to Node1 and Node2, is it enough for me to do Update Running System on Node1 and just Clear Cache on Node2? Or do I also need to do Update Running System on Node2?
Ideally, it's good to restart the other nodes once you are done with an Update Running System. If zero downtime is not a requirement, then take downtime to get one node ready with an Update Running System and then get the other nodes ready.
Another way:
Take Node1 out of the cluster, get it ready with all the configuration, the system update, etc. Add it back to the cluster and take all the other nodes down. Do the deployment on the remaining nodes and bring them back up. This way you do not need to run Update Running System or clear the cache on them.
If downtime is really critical for the business, then look into a rolling update on the cluster to implement a proper solution.

Cassandra cluster - Migrating all hosts in cluster

I am using Cassandra 3.5 with 20 nodes: data center 1 with 10 nodes and data center 2 with 10 nodes, and it holds a huge amount of data. All hosts are, say, legacy hosts. Now we have newer-generation hosts, say generation-2.
I have tried adding new nodes and decommissioning old nodes, but this will be time consuming.
Q1: How can I migrate all hosts from the legacy hosts to the generation-2 hosts? What is the best approach for that?
Q2: What will be the rollback strategy?
Q3: Finally, how can I validate the data once I migrate to the generation-2 hosts?
If you are just replacing the nodes with newer hardware, keeping the same number of nodes, then it's simple (the following operations should be done on every node):
1. Prepare the new installation on every node, with configuration identical to the existing nodes but with different IP addresses, and don't start the nodes yet.
2. (optional) Disable autocompaction with nodetool disableautocompaction - this could help step 5 execute faster.
3. Copy the data from the old node to the new node using rsync (this could take a long time).
4. Execute nodetool drain and stop the old node.
5. Use rsync to synchronize the changes that happened since the initial copy (this should be relatively fast).
6. Make sure that the old node won't start again (for example, remove the Cassandra package) - otherwise it could cause chaos.
7. Start the new node.
This works because a Cassandra node is identified by a UUID stored in its local system table, so changing the IP doesn't affect operations.
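A rough shell sketch of steps 3-6 for a single old/new node pair (the IP, data path, and package command are placeholders):
NEW=10.0.0.42                                             # placeholder IP of the replacement node
rsync -aH /var/lib/cassandra/ "$NEW":/var/lib/cassandra/  # step 3: bulk copy, can run while the node serves traffic
nodetool drain                                            # step 4: flush memtables and stop accepting writes
sudo service cassandra stop
rsync -aH --delete /var/lib/cassandra/ "$NEW":/var/lib/cassandra/  # step 5: catch-up sync of whatever changed
sudo apt-get remove -y cassandra                          # step 6: make sure the old node can't come back up
# step 7 is done on the new node: start the Cassandra service there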
P.S. In the future, if you need to replace a node that has completely died (rather than a planned migration as described here), use the node replacement procedure - in that case you won't stream data twice, as happens when you decommission and then re-add a node.

Cassandra in Production

Anybody supporting a Cassandra application in production? Curious to know how you handle the cassandra.yaml file. Also, do you think a "seed node" gets the status of a master node (partially)?
Anybody supporting a Cassandra application in production?
Yes, my team supports several applications which use Cassandra in production.
Curious to know how you handle the cassandra.yaml file.
By "handle" the cassandra.yaml file, I assume you mean deploy with different values with automation at large scale. We use an open source tool called Rundeck for that.
Rundeck allows you to build options into your jobs, which is useful for properties like cluster_name, seeds, etc. Then you inject those options into your deploy scripts, using a regex replace (sed) to set specific properties in the yaml. For example:
sed -i "s/cluster_name: 'Test Cluster'/cluster_name: '#cluster_name#'/" cassandra.yaml
Also, do you think a "seed node" gets the status of a master node (partially)?
No, a seed node is not any kind of "master" node.
A seed node is no different from any other node.
In theory, every node in your cluster could be a seed node for another node. All it is, is a way for a new node to discover the network topology of the cluster. Think of it as an entry point to the cluster.
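Concretely, seeds are just a list in the seed_provider section of cassandra.yaml; a minimal sketch with placeholder IPs:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"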

Changing snitch on a live cluster in DataStax 4.5

I have 8 nodes in one region and now I want to add a new node in another region. Presently I am using EC2Snitch; after adding the node in the new region I need to change the snitch on all nodes to EC2MultiRegionSnitch.
Now my question is: will this change impact my currently running cluster? And what would be the best practice for doing this?
Thanks
You should do a rolling restart, changing to EC2MultiRegionSnitch, before adding the new node. It should not impact your running cluster, though I would suggest you briefly bring up a test cluster to test making the change.
To perform a rolling restart from OpsCenter:
1. Click Nodes in the left pane.
2. In the contextual menu, select Restart from the Cluster Actions dropdown.
3. Set the amount of time to wait after restarting each node, select whether each node should be drained before stopping, and then click Restart Cluster.
See more details here:
http://www.datastax.com/documentation/opscenter/5.0/opsc/online_help/opscRestartingCluster_t.html
Here is a link to the DataStax documentation for switching snitches. I found that to be useful when I switched to the GossipingPropertyFileSnitch. I also had to edit cassandra-rackdc.properties on all nodes before doing the rolling restart.
Even though my topology didn't change, I followed the instructions in the reference: stopped all the nodes, restarted them (starting with the seeds), then ran 'nodetool repair' and 'nodetool cleanup' on all nodes.
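For reference, the GossipingPropertyFileSnitch setup mentioned above touches two files on every node; a sketch with placeholder DC/rack names:
endpoint_snitch: GossipingPropertyFileSnitch    # in cassandra.yaml
dc=DC1                                          # in cassandra-rackdc.properties
rack=RAC1                                       # in cassandra-rackdc.properties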
