Workaround for CASSANDRA-12813: NPE in auth for bootstrapping node

I have a Cassandra 3.9 production cluster and I am trying to add a node to it. However, I am hitting this issue:
CASSANDRA-12813 NPE in auth for bootstrapping node
https://issues.apache.org/jira/browse/CASSANDRA-12813
Shy of upgrading my production cluster to 3.11 (which I may not be able to do immediately), is there a known workaround to this issue?

An undocumented (but working) way around this is to copy the "system_auth" directory from another node into the new node's data directory, and to start Cassandra only after that step. Bootstrap then finds the existing content instead of setting up new auth tables, which bypasses the failing step. The copied system_auth SSTables will not cause any harm, as they are a copy of the users/roles for the tokens owned by that other node. Repair will take responsibility for cleaning up anything that does not belong to the new node's tokens.
Once the node has bootstrapped successfully, run "nodetool repair" on the system_auth keyspace so that the node ends up with the exact replica copies.
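
For anyone who wants the order of operations spelled out, here is a minimal sketch that only builds the command list for the new node; it does not run anything. The data directory /var/lib/cassandra/data, the service name "cassandra", the use of rsync over SSH and the host name are all my assumptions, not something from the ticket, so adjust them to your install.

```python
from typing import List


def system_auth_workaround_plan(
    donor_host: str,
    data_dir: str = "/var/lib/cassandra/data",  # assumption: package-default data dir
) -> List[List[str]]:
    """Return the commands to run on the NEW node, in order (nothing is executed)."""
    src = f"{donor_host}:{data_dir}/system_auth/"
    dst = f"{data_dir}/system_auth/"
    return [
        # 1. Copy the donor's system_auth keyspace directory before the first start.
        ["rsync", "-a", src, dst],
        # 2. Start Cassandra only after the copy (service name is an assumption).
        ["sudo", "service", "cassandra", "start"],
        # 3. Once bootstrap has finished, repair the auth keyspace.
        ["nodetool", "repair", "system_auth"],
    ]


if __name__ == "__main__":
    # "donor.example.com" is a placeholder host name.
    for cmd in system_auth_workaround_plan("donor.example.com"):
        print(" ".join(cmd))
```

Printing the plan first and running the lines by hand keeps the "start only after the copy" ordering explicit.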

Related

Scale Cassandra with copying data manually

I created an AMI from my Cassandra machine and then launched a new instance from it. After making config changes (setting the seed node to the first one and setting auto_bootstrap: false), when I start Cassandra and run nodetool status, it shows data on both nodes. I just want to know whether the cluster actually knows that both nodes have the data and whether it can route an incoming request to the second node as well.
Without manually copying the data, the streaming never completes. It somehow fails after a certain period of time, and I then have to run 'nodetool bootstrap resume' to restart the bootstrapping process, which fails again.

I don't think it should work this way (the whole copying approach).
Why can't you perform normal bootstrapping? What are the error messages in the logs when you try it? What is the RF of your keyspace?
In addition to your data, Cassandra also saves information about the node itself on disk in the system tables (the node ID, for example), so you can't just replicate the image. If you copied the Cassandra image and only changed the config, this won't work; you should delete all data before starting the node and joining it to the cluster.
EDIT:
If you are going with auto_bootstrap: false
1. Remove all the data from the new server (both data and commit log directories).
2. Start the node and, after it joins, run rebuild.
3. Run repair after the process is finished.
If you are going with auto_bootstrap: true
1. Remove all the data from the new server (both data and commit log directories).
2. Start the node and monitor the bootstrapping.
Before trying either of these, remove the node you can't add from the cluster.
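
To make the two paths above easier to compare, here is a rough sketch that just prints the checklist as shell lines; it does not execute them. The directory paths and the service name are assumptions about a package install, and using "nodetool netstats" to watch streaming is my choice of monitoring command, not something the steps above prescribe.

```python
from typing import List


def join_plan(
    auto_bootstrap: bool,
    data_dir: str = "/var/lib/cassandra/data",            # assumed default path
    commitlog_dir: str = "/var/lib/cassandra/commitlog",  # assumed default path
) -> List[str]:
    """Return the shell lines for joining a wiped node (printed, never executed)."""
    steps = [
        # Wipe both data and commit log directories on the NEW server only.
        f"rm -rf {data_dir}/* {commitlog_dir}/*",
        # Service name is an assumption.
        "sudo service cassandra start",
    ]
    if auto_bootstrap:
        # Bootstrap streams data on its own; just watch the progress.
        steps.append("nodetool netstats")
    else:
        # No bootstrap streaming happened, so pull the data in explicitly.
        steps.append("nodetool rebuild")
        steps.append("nodetool repair")
    return steps


if __name__ == "__main__":
    for flag in (False, True):
        print(f"# auto_bootstrap: {str(flag).lower()}")
        print("\n".join(join_plan(flag)))
```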

Unable to start DSE using SPARK_ENABLED=1

We are running 6 node cluster with:
HADOOP_ENABLED=0
SOLR_ENABLED=0
SPARK_ENABLED=0
CFS_ENABLED=0
Now we would like to add Spark to all of them. ("Adding" does not seem to be the right term, because simply adding something should not fail.) Anyway, these are the steps we took:
1. drained one of the nodes
2. changed /etc/default/dse to SPARK_ENABLED=1 and HADOOP_ENABLED=0
3. sudo service dse restart
And got the following in the log:
ERROR [main] 2016-05-17 11:51:12,739 CassandraDaemon.java:294 - Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Cannot start node if snitch's data center (Analytics) differs from previous data center (Cassandra). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
There are two related questions that have been already answered:
Unable to start solr aspect of DSE search
Two node DSE spark cluster error setting up second node. Why?
Unfortunately, clearing the data on the node is not an option - why would I do that? I need the data to be intact.
Using "-Dcassandra.ignore_rack=true -Dcassandra.ignore_dc=true" is a bit scary in production. I don't understand why DSE wants to create another DC and why can't it just use the existing one?
I know that, according to the DataStax docs, one should partition the load by using a different DC for each workload. In our case we just want to run Spark jobs on the same nodes that Cassandra is running on, using the same DC.
Is that possible?
Thanks!

The other answers are correct. The error is warning you that this node was previously identified as being in another DC, which means it probably doesn't have the right data for any keyspaces that use NetworkTopologyStrategy. For example, if you had an NTS keyspace with only one replica in "Cassandra" and changed the DC to "Analytics", you could inadvertently lose all of the data.
This warning and the accompanying flag are telling you that you are doing something that you should not be doing in a production cluster.
The real solution is to name your DCs explicitly using GossipingPropertyFileSnitch (GPFS) rather than relying on DseSimpleSnitch, which names the DC based on the DSE workload.
In this case, switch to GPFS and set the DC name to Cassandra.
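
As a rough illustration of what the GPFS switch involves: GossipingPropertyFileSnitch reads the DC and rack names from cassandra-rackdc.properties, and cassandra.yaml selects the snitch through endpoint_snitch. The sketch below only renders those two pieces of text. The rack name "rack1" is an assumption; it has to match the rack the node already reports (nodetool status shows it), otherwise you will hit the rack variant of the same startup check.

```python
def render_rackdc_properties(dc: str, rack: str = "rack1") -> str:
    """Render the contents of cassandra-rackdc.properties for GPFS.

    The default rack name is an assumption; use the node's existing rack.
    """
    return f"dc={dc}\nrack={rack}\n"


# Line to set in cassandra.yaml so the node uses GPFS.
SNITCH_YAML_LINE = "endpoint_snitch: GossipingPropertyFileSnitch"


if __name__ == "__main__":
    # Keep the DC name the node already had, as suggested above.
    print(render_rackdc_properties("Cassandra"), end="")
    print(SNITCH_YAML_LINE)
```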

Cassandra cluster old data is not replicated in new node

I installed Apache Cassandra on my local system for testing purposes. With 1 system (1 node) I was able to read, write and query the database. I then added another node and created a cluster. Now the data that I write on my system is replicated to the other node and vice versa, but the data that was present on my system before the new node was added is not replicated. The keyspaces and tables are present on the new node, but they are empty. Did I do something wrong while adding the new node to the cluster?

My best guess is that you have auto_bootstrap turned off (it is ON by default). From the documentation:
auto_bootstrap
(Default: true ) This setting has been removed from default configuration. It makes new (non-seed) nodes automatically migrate the right data to themselves. When initializing a fresh cluster with no data, add auto_bootstrap: false.
The easiest way to fix this is to run a repair on the node which will ensure that it gets any data it's missing.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
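
Since the setting is absent from the default configuration, a missing line means "true". Here is a small sketch for checking what a given cassandra.yaml effectively says. It is a deliberately naive line scan rather than a real YAML parser, and the function name is made up for illustration.

```python
import re


def effective_auto_bootstrap(yaml_text: str) -> bool:
    """Return the effective auto_bootstrap value from cassandra.yaml text.

    Naive scan: looks for an uncommented top-level 'auto_bootstrap:' line.
    If the key is absent, the documented default (true) applies.
    """
    for line in yaml_text.splitlines():
        match = re.match(r"^auto_bootstrap\s*:\s*(\S+)", line)
        if match:
            return match.group(1).strip("'\"").lower() == "true"
    return True


if __name__ == "__main__":
    # The path is an assumption; point it at your own cassandra.yaml.
    with open("/etc/cassandra/cassandra.yaml") as fh:
        print(effective_auto_bootstrap(fh.read()))
```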

If the data on the new node is not up to date, simply run this on it:
$CASSANDRA_HOME/bin/nodetool rebuild
Then log in with cqlsh and verify that the new node has received the data.
Also have a look at:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

If your seeds and replication settings are correct but the data is still not being replicated to the new data center, run nodetool repair with the -full option; this will replicate the data to the nodes in the new data center.

Cassandra ring configuration

I am trying to connect Apache Cassandra nodes into a ring. They are not DataStax versions, but Cassandra 1.2.8 from the Apache website. When trying to add one node as the seed of the other, I get the following exception:
Unable to find compaction strategy class 'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'
Before that, I changed "listen_address" and "rpc_address" to the local IP address of each node. As the next step, I added one node's IP as a seed on the other node. The nodes start up and the exception is printed, but both nodes run fine until they are restarted. After restarting either node, the exception is printed and the node does not run.
This is very strange, since I do not have any DSE components.

Did you previously use any DSE components? If you did and are using the same data directory on any of your nodes, it may find old column families that were created with this compaction strategy. If you have no data you want in the data directories on all your nodes, you should clear them by stopping all nodes, deleting the directories, then starting the nodes.
Or if you have any DSE nodes still up, they may be joining the new cluster and propagating their schema, so creating column families with this compaction strategy. You can find out by looking in the logs and seeing which nodes try to connect. If any aren't from your 1.2.8 ring then this is probably the cause.

That error means you either had a DSE Analytics node in your ring at some point, or you restored your schema from someplace that had an Analytics node.
I would check if you have the folder /etc/dse/ on your VM, that would mean DSE was installed there.
To wipe the node's schema and start from scratch, stop the node, remove the /system/schema_* folders, then start the node again. When it starts it will have no schema. Re-create any keyspaces/column families you had before, and their data will be read from disk.
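
Before deleting anything, it may help to list exactly which folders the schema_* pattern would match. A minimal sketch, assuming the system keyspace lives under <data_dir>/system (the data directory itself is something you have to supply); it only lists the directories and leaves the removal to you, with the node stopped.

```python
from pathlib import Path
from typing import List


def schema_dirs(data_dir: str) -> List[Path]:
    """List the schema_* directories under <data_dir>/system, sorted by path."""
    system = Path(data_dir) / "system"
    return sorted(p for p in system.glob("schema_*") if p.is_dir())


if __name__ == "__main__":
    # "/var/lib/cassandra/data" is an assumed default; adjust as needed.
    for path in schema_dirs("/var/lib/cassandra/data"):
        print(path)
```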

Adding a new node to existing cluster

Is it possible to add a new node to an existing cluster in cassandra 1.2 without running nodetool cleanup on each individual node once data has been added?
It probably isn't but I need to ask because I'm trying to create an application where each user's machine is a server allowing for endless scaling.
Any advice would be appreciated.

Yes, it is possible. But you should be aware of the side-effects of not doing so.
nodetool cleanup purges keys that are no longer allocated to that node. According to the Apache docs, these keys count against the allocated data for that node, which can cause the auto bootstrap process for the next node to not properly balance the ring. So depending on how you are bringing new user machines into the ring, this may or may not be a problem.
Also keep in mind that nodetool cleanup only needs to be run on the nodes that lost part of their key range to the new node, i.e. the adjacent nodes, not every node in the cluster.
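
To illustrate which nodes count as "adjacent", here is a toy model of the ring. It assumes one token per node and SimpleStrategy-style placement (the replicas of a key are the next RF nodes clockwise from its token); it is not how you would query a real cluster, and the node names and tokens are made up.

```python
from bisect import bisect_left
from typing import Dict, List, Set


def replicas(ring: Dict[int, str], key_token: int, rf: int) -> List[str]:
    """Nodes holding key_token: the first node at or after it, then the next ones clockwise."""
    tokens = sorted(ring)
    start = bisect_left(tokens, key_token) % len(tokens)  # wrap around the ring
    count = min(rf, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


def nodes_needing_cleanup(
    ring: Dict[int, str], new_token: int, new_node: str, rf: int = 1
) -> Set[str]:
    """Nodes that stop being a replica for some range once new_node joins."""
    new_ring = dict(ring)
    new_ring[new_token] = new_node
    losers: Set[str] = set()
    # One probe per range of the new ring is enough: ownership is constant within a range.
    for token in new_ring:
        before = set(replicas(ring, token, rf))
        after = set(replicas(new_ring, token, rf))
        losers |= before - after
    return losers


if __name__ == "__main__":
    ring = {0: "a", 100: "b", 200: "c", 300: "d"}
    print(sorted(nodes_needing_cleanup(ring, 150, "e", rf=1)))
    print(sorted(nodes_needing_cleanup(ring, 150, "e", rf=2)))
```

In this model, with RF=1 only the node right after the new token gives up data; with higher RF the set grows to the next RF nodes clockwise, which is still a small neighbourhood rather than the whole ring.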
