Nodetool rebuild not working reliably on Cassandra 3.11.3

I presently have a Cassandra 3.11.3 cluster with a single DC. I recently added another DC to my cluster, following the instructions at
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
As per the instructions I ran 'nodetool rebuild -ks -- dc1' on each of the new nodes. However, this rebuild command did not actually work as intended: my data is partially missing on the new nodes. I know this because I sampled the data in the new DC through my app using consistency LOCAL_ONE. I don't see the data replenished through read repair either. I should also mention that there were no errors in the logs following the rebuild command, so everything appeared to have succeeded.
What am I missing here? Is there a known issue reported on this?

You should run nodetool rebuild -- <existing DC> on each node. This command pulls all keyspace data from the existing data center based on the allocated tokens and replication factor (RF). To ensure consistency, please run a full repair on the nodes as well.
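A minimal sketch of that sequence on each node in the new DC, assuming the existing DC is named dc1 and using my_ks as a placeholder keyspace (omit -ks to rebuild every keyspace):

# Stream existing data for the keyspace from the original DC (dc1).
nodetool rebuild -ks my_ks -- dc1

# After all rebuilds finish, run a full (not incremental) repair on each node
# to reconcile anything the rebuild missed.
nodetool repair -full my_ks

# Verify ownership and data load afterwards.
nodetool status my_ks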

Related

Removing DC from multi DC cluster in Cassandra

I have a two-datacenter site (dc1 and dc2). I am writing to dc1 with a replication factor of 3 in each DC (dc1:3, dc2:3). dc2 is a backup site taking no traffic. I upgraded all the nodes of dc2 to C* version 3.11.2. The nodes of dc1 are on C* version 2.1.16. Now, due to some issue, I have to roll back my upgrade. I have two options:
Restore the complete site (dc1 and dc2) from backup - this would cause a lot of data loss.
Remove dc2 from the cluster using the steps given here.
Is there any issue with removing a site (dc2) while the C* versions are mixed?
If it were me, I would:
Take DC2 out of replication.
Shutdown nodes on DC2.
Remove the nodes/assassinate them.
Uninstall C* completely.
Wipe the nodes of all data/logs/configuration.
Install C* and reconfigure.
Add nodes to a new DC.
This way there's no data loss from having to restore from backups. Cheers!
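As a sketch of the first step above, the replication change is a CQL statement run for each keyspace that currently replicates to DC2 (my_ks and the DC names are placeholders, assuming NetworkTopologyStrategy):

# Drop DC2 from replication so no keyspace expects replicas there any more.
# Repeat for every user keyspace (and for system_auth / system_distributed if
# they use NetworkTopologyStrategy).
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"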
Yes, the second option seems good, and you can recover your data safely. You should remove the DC2 datacenter from your existing cluster. Since, as you say, there is no traffic on DC2, the removal and later re-addition should be straightforward.
You need to follow the steps below:
Change the replication factor of the keyspaces so they no longer replicate to DC2.
Stop the Cassandra service on the DC2 nodes.
Remove the nodes from the existing cluster via the nodetool removenode command; if that causes an issue, you can use nodetool assassinate.
Once the nodes have been removed from the cluster one by one, uninstall Cassandra on them.
Completely remove the existing data on each removed node.
Then install a fresh Cassandra based on the previous configuration; you can refer to the config files from the existing cluster, or to the backup you took of your 2.1.16 config.
Now add your datacenter to the cluster again.
In this way you can get your datacenter and data back quickly.
You can refer to the documentation here if there is any confusion about adding the DC:
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddDCToClusterDesigDC.html
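A rough sketch of the node-removal part of those steps (the Host ID and IP below are placeholders read from nodetool status output):

# On any live DC1 node: note the Host ID of each down DC2 node.
nodetool status

# Remove each DC2 node by its Host ID, one at a time, waiting for each to finish.
nodetool removenode 2d3e1f0a-0000-0000-0000-000000000000

# If a removal hangs, check progress; as a last resort, assassinate the node.
nodetool removenode status
nodetool assassinate 10.0.0.12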

What are reasons to restart a Cassandra cluster?

So far I have just one reason to restart the cluster. All the nodes have the same hardware configuration.
1. When I update the cassandra.yaml file.
Are there other reasons?
What you are asking about is a rolling restart of a Cassandra cluster. There are many reasons to restart a Cassandra cluster; I'm just mentioning some below:
When you update any value in cassandra.yaml (as you mentioned above).
When nodetool gets stuck somehow. For example, you ran nodetool repair and cancelled the command, but it stayed stuck in the background, so you cannot start another nodetool repair.
When you are adding a new node to the cluster and streaming fails (stream_failed) due to the nproc limit. In that case the running cluster nodes can go down because of this issue and stay stuck in that state.
When you don't want to use sstableloader and you need to restore your data from snapshots. In that case you need to copy your snapshots into the data directory on each node and do a rolling restart.
When you are about to upgrade your Cassandra version.
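For reference, a typical rolling-restart sequence for the cases above, run on one node at a time (assuming a package install managed via the service command, as elsewhere in these threads):

# Flush memtables and stop accepting requests before stopping the node.
nodetool drain

# Restart the service (use 'dse' instead of 'cassandra' on DSE installs).
sudo service cassandra restart

# Wait until the node reports UN (Up/Normal) before moving on to the next one.
nodetool status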

Temporarily change multi node to single node

I have configured Cassandra 3.0.9 on 3 nodes, but I have to use only 1 node for some time. I have disconnected the other 2 nodes from the network and also removed the entries for both nodes from cassandra.yaml and the rackdc and topology files.
When I check nodetool status it still shows me both of the down nodes. When I try to execute any query in cqlsh it gives me the error below:
OperationTimedOut: errors={'127.0.0.1': 'Request timed out while waiting for schema agreement. See Session.execute_async and Cluster.max_schema_agreement_wait.'}, last_host=127.0.0.1
Warning: schema version mismatch detected; check the schema versions of your nodes in system.local and system.peers.
How can I resolve this?
That's not how you remove a node from a Cassandra cluster. In fact, what you're doing is quite dangerous. Typically, you'd use nodetool decommission. If your other two nodes are still intact and just offline, I suggest bringing them back online temporarily and letting decommission do its thing.
I'm going to also throw this out there - it's possible you're missing a good portion of your data with the steps you did above unless all keyspaces had RF=3. Cassandra distributes data evenly between the nodes in a respective DC. The decommission step I mention above redistributes the data.
Now if you don't have the other 2 nodes to run a nodetool decommission, you may have to remove the node with nodetool removenode and in the worst case, nodetool assassinate.
Check these docs for reference and the full steps to removing a node: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddingRemovingNodeTOC.html
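A minimal sketch of the two paths described above (the Host ID and IP are placeholders):

# Preferred: bring the other two nodes back online, then on each node you want
# to remove, run decommission so its data streams to the remaining node(s).
nodetool decommission

# Fallback if a node can never come back: from the surviving node, remove it by
# Host ID (shown in nodetool status), or assassinate it as a last resort.
nodetool removenode 8f4c2b1e-0000-0000-0000-000000000000
nodetool assassinate 192.168.1.12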

Unable to start DSE using SPARK_ENABLED=1

We are running 6 node cluster with:
HADOOP_ENABLED=0
SOLR_ENABLED=0
SPARK_ENABLED=0
CFS_ENABLED=0
Now, we would like to add Spark to all of them. It seems like "adding" is not quite the right term, because then this would not fail. Anyway, these are the steps we've done:
1. drained one of the nodes
2. changed /etc/default/dse to SPARK_ENABLED=1 and HADOOP_ENABLED=0
3. sudo service dse restart
And got the following in the log:
ERROR [main] 2016-05-17 11:51:12,739 CassandraDaemon.java:294 - Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Cannot start node if snitch's data center (Analytics) differs from previous data center (Cassandra). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
There are two related questions that have been already answered:
Unable to start solr aspect of DSE search
Two node DSE spark cluster error setting up second node. Why?
Unfortunately, clearing the data on the node is not an option - why would I do that? I need the data to be intact.
Using "-Dcassandra.ignore_rack=true -Dcassandra.ignore_dc=true" is a bit scary in production. I don't understand why DSE wants to create another DC and why can't it just use the existing one?
I know that according to datastax's doc one should partition the load using different DC for different workloads. In our case we just want to run SPARK jobs on the same nodes that Cassandra is running using the same DC.
Is that possible?
Thanks!
The other answers are correct. The issue here is that Cassandra is trying to warn you that you have previously identified this node as being in another DC. This means that it probably doesn't have the right data for any keyspaces using NetworkTopologyStrategy. For example, if you had an NTS keyspace which had only one replica in "Cassandra" and changed the DC to "Analytics", you could inadvertently lose all of the data.
This warning and the accompanying flag are telling you that you are doing something that you should not be doing in a production cluster.
The real solution to this is to explicitly name your DCs using GossipingPropertyFileSnitch rather than relying on DseSimpleSnitch, which names the DC based on the DSE workload.
In this case, switch to GossipingPropertyFileSnitch (GPFS) and set the DC name to Cassandra.
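A sketch of what that could look like (file locations and the rack name are assumptions; depending on the DSE version the snitch is selected in cassandra.yaml or delegated via dse.yaml, so check the docs for your release):

# cassandra.yaml: select GPFS so the DC name comes from a file you control,
# not from the DSE workload:
#   endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties: keep the node in its original DC so existing
# NetworkTopologyStrategy keyspaces still find their replicas:
#   dc=Cassandra
#   rack=rack1

# Apply with a rolling restart, then confirm the DC/rack assignment per node.
sudo service dse restart
nodetool status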

How to repair Cassandra? I am not sure how to use nodetool after changing the snitch

I am an intern without much experience, so I am sure you guys can help me out here a bit. I find all kinds of nodetool syntax on Google and I am not sure how to use it or when I should use it.
My 3-node cluster was only showing 1 node on the ring, so I changed the snitch to RackInferringSnitch and restarted Cassandra. Now do I have to run nodetool repair? How?
I am concerned that nodetool ring isn't seeing all of your nodes. Fix that first before running nodetool repair. Otherwise, nodetool repair has no other nodes to talk to and will not complete correctly.
To get your ring intact, make sure all of your nodes are configured to talk gossip on the same network. Make sure they all have at least one seed node which is configured the same on all nodes and is up. Make sure they all have the same cluster name. If that fails, bring them all down and then bring them back up one by one.
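A rough checklist of what to verify on each node before repairing (the values shown are placeholders for illustration):

# In cassandra.yaml, these must line up across the cluster:
#   cluster_name: 'MyCluster'                      # identical on every node
#   listen_address: <this node's reachable IP>     # same network for gossip
#   seeds: "10.0.0.1,10.0.0.2"                     # same seed list on every node

# Once all three nodes show UN in the ring, a repair can actually stream data.
nodetool status
nodetool repair -full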
