Changing the snitch on a live cluster in DataStax 4.5 (Cassandra)

I have 8 nodes in one region and now I want to add a new node in another region. Presently I'm using Ec2Snitch; after adding the node in the new region, I need to change the snitch on all nodes to Ec2MultiRegionSnitch.
Now my question is: will this change impact my currently running cluster, and what would be the best practice for doing it?
Thanks

You should do a rolling restart, switching to Ec2MultiRegionSnitch, before adding the new node. It should not impact your running cluster, though I would suggest briefly bringing up a test cluster to try the change first.

To perform a rolling restart from OpsCenter:
1. Click Nodes in the left pane.
2. In the contextual menu, select Restart from the Cluster Actions dropdown.
3. Set the amount of time to wait after restarting each node, select whether the node should be drained before stopping, and then click Restart Cluster.
See more details here:
http://www.datastax.com/documentation/opscenter/5.0/opsc/online_help/opscRestartingCluster_t.html
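If you would rather script the same thing than click through OpsCenter, here is a minimal sketch that only builds the ordered plan the two OpsCenter options describe (drain or not, and how long to wait after each node). The host addresses and the 60-second default are made up, and actually executing each step (for example 'nodetool drain' and then restarting the DSE service on that host) is left to whatever remote-execution tool you already use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str        # "drain", "restart" or "wait"
    host: str = ""     # empty for "wait" steps
    seconds: int = 0   # only used by "wait" steps


def rolling_restart_plan(hosts: List[str], drain: bool = True,
                         wait_seconds: int = 60) -> List[Step]:
    """One node at a time: optional drain, restart, then a pause."""
    plan: List[Step] = []
    for host in hosts:
        if drain:
            # e.g. 'nodetool drain' on that host before stopping it
            plan.append(Step("drain", host))
        plan.append(Step("restart", host))
        # let the node come back up before moving on to the next one
        plan.append(Step("wait", seconds=wait_seconds))
    return plan


# hypothetical host addresses
for step in rolling_restart_plan(["10.0.0.1", "10.0.0.2"], wait_seconds=120):
    print(step)
```

The point of the one-node-at-a-time loop is that only one replica of any range is ever offline, which is what keeps the running cluster unaffected.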

Here is a link to the DataStax documentation for switching snitches. I found it useful when I switched to GossipingPropertyFileSnitch. I also had to edit cassandra-rackdc.properties on all nodes before doing the rolling restart.
Even though my topology didn't change, I followed the instructions in the reference: I stopped all the nodes, restarted them (starting with the seeds), then ran 'nodetool repair' and 'nodetool cleanup' on all nodes.
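The ordering described above (stop everything, bring the seeds up first, then repair and clean up every node) can be written down as a plan like this. It is only a sketch: the node names are hypothetical, and running repairs on all nodes before any cleanup is my reading of the steps, not something the reference spells out.

```python
from typing import List, Tuple


def full_restart_plan(nodes: List[str], seeds: List[str]) -> List[Tuple[str, str]]:
    """Stop everything, start seeds first, then repair and clean up every node."""
    seed_set = set(seeds)
    # seeds first, so the other nodes have a contact point for gossip on startup
    ordered = [n for n in nodes if n in seed_set] + [n for n in nodes if n not in seed_set]
    plan = [("stop", n) for n in nodes]
    plan += [("start", n) for n in ordered]
    plan += [("nodetool repair", n) for n in ordered]
    plan += [("nodetool cleanup", n) for n in ordered]
    return plan


# hypothetical node names; node3 is the only seed
for action, node in full_restart_plan(["node1", "node2", "node3"], seeds=["node3"]):
    print(action, node)
```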

Related

How to update configuration of a Cassandra cluster

I have a 3-node Cassandra cluster and I want to make some adjustments to cassandra.yaml.
My question is, how should I do this? One node at a time, or is there a way to apply the changes without shutting down nodes?
By the way, I am using Cassandra 2.2 and this is a production cluster.
There are multiple approaches here:
If you edit the cassandra.yaml file, you need to restart Cassandra so it re-reads the contents of that file. If you restart all nodes at once, your cluster will be unavailable. Restarting one node at a time is almost always safe (provided you have sane replication factors and consistency levels). If your cluster is configured to survive a rack or datacenter outage, then you can safely restart more nodes concurrently.
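To make "sane replication factors and consistency levels" concrete: a request succeeds as long as enough replicas of the data respond, so the question is how many replicas of a range can be down at once. The sketch below covers only the single-DC levels (ONE, TWO, THREE, QUORUM, ALL) and deliberately ignores LOCAL_QUORUM, EACH_QUORUM and friends.

```python
def replicas_required(rf: int, cl: str) -> int:
    """Replicas that must respond for a given consistency level (single-DC levels only)."""
    cl = cl.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if cl in fixed:
        return fixed[cl]
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def replicas_that_can_be_down(rf: int, cl: str) -> int:
    """How many replicas of a range can be offline while requests still succeed."""
    return max(rf - replicas_required(rf, cl), 0)


print(replicas_that_can_be_down(3, "QUORUM"))  # 1
```

With RF=3 and QUORUM one replica can be down, so restarting one node at a time is fine; with RF=1, or with CL=ALL, every restart makes some requests fail.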
Many settings can be changed without a restart via JMX, though I don't have a documentation link handy. Changing a setting via JMX WON'T change cassandra.yaml, though, so you'll need to update that too, or your config will revert to what's in the file when the node restarts.
If you're using DSE, OpsCenter's Lifecycle Manager feature makes updating configs a simple point-and-click affair (disclaimer: I'm biased, as I'm an LCM dev).

Unable to start DSE using SPARK_ENABLED=1

We are running a 6-node cluster with:
HADOOP_ENABLED=0
SOLR_ENABLED=0
SPARK_ENABLED=0
CFS_ENABLED=0
Now we would like to add Spark to all of them. It seems like "adding" is not the right term, because if it were, this would not fail. Anyway, the steps we've taken:
1. drained one of the nodes
2. changed /etc/default/dse to SPARK_ENABLED=1 and HADOOP_ENABLED=0
3. sudo service dse restart
And got the following in the log:
ERROR [main] 2016-05-17 11:51:12,739 CassandraDaemon.java:294 - Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Cannot start node if snitch's data center (Analytics) differs from previous data center (Cassandra). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
There are two related questions that have already been answered:
Unable to start solr aspect of DSE search
Two node DSE spark cluster error setting up second node. Why?
Unfortunately, clearing the data on the node is not an option. Why would I do that? I need the data to be intact.
Using "-Dcassandra.ignore_rack=true -Dcassandra.ignore_dc=true" is a bit scary in production. I don't understand why DSE wants to create another DC and why it can't just use the existing one.
I know that according to DataStax's docs one should separate workloads by using a different DC for each. In our case we just want to run Spark jobs on the same nodes that Cassandra is running on, in the same DC.
Is that possible?
Thanks!
The other answers are correct. The error is warning you that you previously identified this node as being in another DC. This means it probably doesn't have the right data for any keyspaces using NetworkTopologyStrategy. For example, if you had an NTS keyspace with only one replica in "Cassandra" and changed the node's DC to "Analytics", you could inadvertently lose all of that data.
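One way to see which keyspaces the DC change would hurt is to look at their replication maps: any NTS keyspace with replicas in the old DC and none in the new one is exactly the case described above. A sketch, assuming you have already pulled each keyspace's replication settings into a dict (the keyspace names here are hypothetical):

```python
from typing import Dict, List, Union

Replication = Dict[str, Union[str, int]]


def keyspaces_at_risk(keyspaces: Dict[str, Replication],
                      old_dc: str, new_dc: str) -> List[str]:
    """NTS keyspaces with replicas in old_dc but none in new_dc."""
    at_risk = []
    for name, replication in keyspaces.items():
        if not str(replication.get("class", "")).endswith("NetworkTopologyStrategy"):
            continue  # SimpleStrategy placement doesn't depend on DC names
        if int(replication.get(old_dc, 0)) > 0 and int(replication.get(new_dc, 0)) == 0:
            at_risk.append(name)
    return at_risk


# hypothetical keyspaces
keyspaces = {
    "orders": {"class": "NetworkTopologyStrategy", "Cassandra": 1},
    "events": {"class": "NetworkTopologyStrategy", "Cassandra": 3, "Analytics": 1},
    "legacy": {"class": "SimpleStrategy", "replication_factor": 3},
}
print(keyspaces_at_risk(keyspaces, "Cassandra", "Analytics"))  # ['orders']
```

For "orders", the node would stop being a replica after the switch, and with only one replica in "Cassandra" there may be no other copy to serve it from.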
This warning and the accompanying flag are telling you that you are doing something that you should not be doing in a production cluster.
The real solution is to name your DCs explicitly using GossipingPropertyFileSnitch (GPFS) and not rely on DseSimpleSnitch, which names DCs based on the DSE workload.
In this case, switch to GPFS and set the DC name to Cassandra.
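To illustrate why the error appears and what GPFS changes: under DseSimpleSnitch the DC name follows the workload flags in /etc/default/dse, so flipping SPARK_ENABLED moves the node from "Cassandra" to "Analytics". With GPFS (endpoint_snitch: GossipingPropertyFileSnitch in cassandra.yaml) the name comes from cassandra-rackdc.properties instead. This sketch only models the Cassandra-only and Spark/Hadoop cases from the question; the rack name "rack1" is an assumption, so check what nodetool status currently reports for your nodes.

```python
from typing import Dict


def dse_simple_snitch_dc(flags: Dict[str, str]) -> str:
    """DC name derived from /etc/default/dse flags (Cassandra and Spark/Hadoop cases only)."""
    if flags.get("SPARK_ENABLED") == "1" or flags.get("HADOOP_ENABLED") == "1":
        return "Analytics"
    return "Cassandra"


def rackdc_properties(dc: str, rack: str) -> str:
    """cassandra-rackdc.properties content that pins the DC/rack for GPFS."""
    return f"dc={dc}\nrack={rack}\n"


print(dse_simple_snitch_dc({"SPARK_ENABLED": "0", "HADOOP_ENABLED": "0"}))  # Cassandra
print(dse_simple_snitch_dc({"SPARK_ENABLED": "1", "HADOOP_ENABLED": "0"}))  # Analytics
print(rackdc_properties("Cassandra", "rack1"), end="")
```

The two printed DC names are the same pair as in the ConfigurationException; writing dc=Cassandra into the properties file keeps the name stable no matter which workload flags are set.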

Cassandra new node bootstrapping while it should not

I'm adding a new datacenter to an existing cluster and I'm following this "procedure".
However, the first node I start is apparently bootstrapping: the load reported by nodetool status keeps growing...
I added
auto_bootstrap: false
to cassandra.yaml.
Am I missing something?
By adding auto_bootstrap: false, you tell the node not to bootstrap; that DOESN'T tell it not to take any writes. What are the replication settings for the new datacenter? Did you already add it to the various keyspaces? If so, it will participate in writes.
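For reference, "enabling" a datacenter in a keyspace means adding it to the NetworkTopologyStrategy replication map with ALTER KEYSPACE. Once DC2 appears there, nodes in DC2 start receiving writes for that keyspace, which would explain the growing load. A small sketch that builds such a statement; the keyspace name, DC names and replica counts are hypothetical.

```python
from typing import Dict


def alter_keyspace_cql(keyspace: str, dc_replicas: Dict[str, int]) -> str:
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in dc_replicas.items()]
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{', '.join(parts)}}};"


print(alter_keyspace_cql("my_keyspace", {"DC1": 3, "DC2": 3}))
```

Leaving DC2 out of the map (or holding off on the ALTER until the new DC is ready) keeps its nodes from taking writes for that keyspace.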
When you say you see the load increasing, is it streaming? Do you see files being transferred in nodetool netstats? Is the node Up, Normal or Up, Joining?
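The Up/Normal vs Up/Joining question can be answered from the two-letter code at the start of each nodetool status row: the first letter is the status (U/D) and the second the state (N, L, J, M). A tiny decoder; the sample row is made up, and since the remaining columns vary between versions, only the first column is used.

```python
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def describe_status(row: str) -> str:
    """Decode the two-letter code at the start of a nodetool status row."""
    code = row.split()[0]
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError(f"not a nodetool status row: {row!r}")
    return f"{STATUS[code[0]]}, {STATE[code[1]]}"


print(describe_status("UJ  10.0.1.5  1.02 GB  256  ?  rack1"))  # Up, Joining
```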
Check your cassandra-rackdc.properties, and don't forget to change the DC name (dc=DC1) in that file if you are creating a new datacenter; otherwise the new node will be considered part of the same datacenter.
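A quick pre-flight check along those lines: read the dc= entry from the new node's cassandra-rackdc.properties and make sure it is not one of the existing DC names. This is a minimal sketch assuming the file only contains plain key=value lines and # comments (it is not a full Java .properties parser); the DC names are hypothetical.

```python
from typing import Dict, Iterable


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser; skips blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_new_dc(text: str, existing_dcs: Iterable[str]) -> str:
    """Return the dc= value, or raise if it would join an existing DC."""
    dc = parse_properties(text).get("dc")
    if dc is None:
        raise ValueError("no dc= entry found")
    if dc in set(existing_dcs):
        raise ValueError(f"dc={dc} already exists; the node would join that DC")
    return dc


print(check_new_dc("# new datacenter\ndc=DC2\nrack=RAC1\n", ["DC1"]))  # DC2
```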

Adding new node to Cassandra cluster

I have a 4-node cluster and will be adding an additional node in two days. We aren't using vnodes.
I'm just wondering about the best way to rebalance the cluster after I'm done. Do I just bring the new node up and then run nodetool move?
Or do I shut each node down, change the initial_token value for each one (using one of those generators to calculate the values for me), and then bring each node up?
I just want to know the simplest way to do this from the command line. The new node already has Cassandra installed, as it was initially a non-production server. I will delete the data off the node and change the config files accordingly for the new cluster it will now be part of; I'm just unsure about the other steps.
From the page Adding or replacing single-token nodes, the simplest mechanism is to start the new node with its initial_token left empty in cassandra.yaml. This makes the cluster 'split the token range of the heaviest loaded node and position the new node there'. This won't give you a balanced cluster.
If you want a balanced cluster, then you have to go through the nodetool move, node restart, nodetool cleanup procedure you mentioned.
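The "generators" mentioned in the question just space tokens evenly around the ring. The question doesn't say which partitioner is in use, so this sketch assumes Murmur3Partitioner (the default since Cassandra 1.2, token range -2**63 to 2**63 - 1) and also shows the RandomPartitioner variant (range 0 to 2**127). The resulting values are what you would feed to nodetool move (or initial_token for the new node).

```python
from typing import List


def balanced_tokens(node_count: int, partitioner: str = "Murmur3Partitioner") -> List[int]:
    """Evenly spaced single tokens for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("need at least one node")
    if partitioner == "Murmur3Partitioner":
        # tokens span -2**63 .. 2**63 - 1
        return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]
    if partitioner == "RandomPartitioner":
        # tokens span 0 .. 2**127
        return [i * (2**127 // node_count) for i in range(node_count)]
    raise ValueError(f"unsupported partitioner: {partitioner}")


for token in balanced_tokens(5):
    print(token)
```

For the 4-to-5 node case this prints five targets; one goes to the new node, and each existing node is moved to one of the others, followed by cleanup.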

Adding a new node to existing cluster

Is it possible to add a new node to an existing cluster in Cassandra 1.2 without running nodetool cleanup on each individual node once data has been added?
It probably isn't, but I need to ask because I'm trying to create an application where each user's machine is a server, allowing for endless scaling.
Any advice would be appreciated.
Yes, it is possible, but you should be aware of the side effects of not doing so.
nodetool cleanup purges keys that are no longer allocated to that node. According to the Apache docs, these keys still count against the data for that node, which can cause the auto-bootstrap process for the next node to not balance the ring properly. So depending on how you are bringing new user machines into the ring, this may or may not be a problem.
Also keep in mind that nodetool cleanup only needs to be run on nodes that lost token ranges to the new node, i.e. adjacent nodes, not all nodes in the cluster.
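To pin down "adjacent nodes": with single-token nodes and SimpleStrategy-style placement (each range stored on the next RF nodes clockwise), a node joining at token T takes ranges away from the RF existing nodes immediately clockwise of T, so those are the ones that need cleanup. This sketch assumes exactly that model; with NetworkTopologyStrategy or vnodes the affected set is computed differently.

```python
import bisect
from typing import List


def nodes_needing_cleanup(ring: List[int], new_token: int, rf: int) -> List[int]:
    """Tokens of existing nodes that lose ranges when a node joins at new_token.

    Assumes single-token nodes and replicas placed on the next rf nodes clockwise.
    """
    tokens = sorted(ring)
    if new_token in tokens:
        raise ValueError("token already taken")
    if not 1 <= rf <= len(tokens):
        raise ValueError("rf must be between 1 and the number of existing nodes")
    start = bisect.bisect_right(tokens, new_token)  # first node clockwise of new_token
    return [tokens[(start + k) % len(tokens)] for k in range(rf)]


print(nodes_needing_cleanup([0, 100, 200, 300], 50, rf=2))  # [100, 200]
```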
