Is it mandatory for all nodes of a Cassandra cluster to have the same cluster name?

Cassandra version: 2.1.8
Is it mandatory for all nodes of a Cassandra cluster to have the same cluster name?

The answer is yes. Otherwise, you'll get an error like the following at startup.
Example ERROR for a mismatched cluster_name:
ERROR [main] 2014-02-25 01:51:17,377 CassandraDaemon.java (line 237) Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name Test Cluster != configured name thisisstupid
at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:542)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:233)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:462)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:552)
It is mandatory to have the same cluster_name for every single node in a cluster.

All nodes in the same cluster must have the same cluster name. It's mandatory.
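For reference, the name is set via cluster_name in cassandra.yaml; the startup check in SystemKeyspace.checkHealth compares the configured value against the name saved in the local system keyspace, which is what produces the error above. A minimal sketch (the name itself is just an example):
# cassandra.yaml -- must be identical on every node in the cluster
cluster_name: 'Test Cluster'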

Related

Getting schema version mismatch error while trying to migrate from DSE Cassandra(6.0.5) to Apache Cassandra(3.11.3)

We are trying to migrate/replicate data from a DSE Cassandra node to an Apache Cassandra node.
We have done a POC of this on a local machine and are hitting a schema version problem.
Below are the details of the POC and the problem.
DSE Cassandra node Details:
dse_version: 6.0.5
release_version: 4.0.0.605
sstable format:
aa-1-bti-CompressionInfo.db
aa-1-bti-Digest.crc32
aa-1-bti-Partitions.db
aa-1-bti-Statistics.db
aa-1-bti-Data.db
aa-1-bti-Filter.db
aa-1-bti-Rows.db
aa-1-bti-TOC.txt
Apache Cassandra node Details:
release_version: 3.11.3
sstable format:
mc-1-big-CompressionInfo.db
mc-1-big-Digest.crc32
mc-1-big-Statistics.db
mc-1-big-Data.db
mc-1-big-Filter.db
mc-1-big-TOC.txt
mc-1-big-Summary.db
There is 1 cluster (My Cluster) in which I have a total of 4 nodes.
2 DSE nodes (let's say DSE1 and DSE2) are in one datacenter (i.e., dc1).
2 Apache nodes (let's say APC1 and APC2) are in the other datacenter (i.e., dc2).
Note: I have used NetworkTopologyStrategy for the keyspace and GossipingPropertyFileSnitch as the endpoint_snitch. I have also added
JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true" to the cassandra-env.sh file.
When I create the keyspace on the DSE1 node with the following CQL query:
CREATE KEYSPACE abc
  WITH REPLICATION = {
    'class' : 'NetworkTopologyStrategy',
    'dc1' : 2,
    'dc2' : 2
  }
  AND DURABLE_WRITES = true;
the keyspace gets created on the DSE2 node as well, but the following warning is thrown by cqlsh:
Warning: schema version mismatch detected; check the schema versions of your nodes in system.local and system.peers
Also, the following error is thrown on one of the 2 Apache nodes (APC1/APC2):
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 02559ab1-91ee-11ea-8450-2df21166f6a4. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation
I have also checked the schema versions on all 4 nodes (output of nodetool describecluster) and get the result below:
Cluster Information:
Name: My Cluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
84c22c85-8165-398f-ab9a-e25a6169b7d3: [127.0.0.4, 127.0.0.6]
4c451173-5a05-3691-9a14-520419f849da: [127.0.0.5, 127.0.0.7]
I have tried to resolve this using the solution given in the following link:
https://myadventuresincoding.wordpress.com/2019/04/03/cassandra-fix-schema-disagreement/
but the problem still persists.
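For reference, the per-node schema versions the warning refers to can also be read directly in cqlsh on each node:
SELECT schema_version FROM system.local;          -- this node's schema version
SELECT peer, schema_version FROM system.peers;    -- versions this node sees for its peers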
Also, can we migrate data from a DSE Cassandra node to an Apache Cassandra node directly, as suggested in the following link?
Migrate Datastax Enterprise Cassandra to Apache Cassandra
Can anyone please suggest how to overcome this problem? Is there any other upgrade or compatibility fix we need to implement to resolve it?

How to print out the Spark connection of a Spark session?

Suppose I run the pyspark command and get a global variable spark of type SparkSession. As I understand it, this spark holds a connection to the Spark master. Can I print out the details of this connection, including the hostname of the Spark master?
For basic information you can use the master property:
spark.sparkContext.master
To get details on YARN you might have to dig through hadoopConfiguration:
hadoopConfiguration = spark.sparkContext._jsc.hadoopConfiguration()
hadoopConfiguration.get("yarn.resourcemanager.hostname")
or
hadoopConfiguration.get("yarn.resourcemanager.address")
When submitted to YARN, Spark uses the Hadoop configuration to determine the resource manager, so these values should match the ones present in the configuration placed in HADOOP_CONF_DIR or YARN_CONF_DIR.
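Putting it together, a minimal pyspark sketch that prints the basic connection details (these are standard SparkContext attributes on recent Spark versions):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.sparkContext.master)          # e.g. "local[*]", "yarn", or "spark://host:7077"
print(spark.sparkContext.appName)         # application name
print(spark.sparkContext.applicationId)   # e.g. "application_..." when running on YARN
print(spark.sparkContext.uiWebUrl)        # driver web UI URL, which includes the driver host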

Connecting to Cassandra with Spark

First, I bought the new O'Reilly Spark book and tried its Cassandra setup instructions. I've also found other Stack Overflow posts and various guides around the web. None of them work as-is. Below is as far as I could get.
This is a test with only a handful of records of dummy test data. I am running the most recent Cassandra 2.0.7 Virtual Box VM provided by plasetcassandra.org linked from the main Cassandra project page.
I downloaded Spark 1.2.1 source and got the latest Cassandra Connector code from github and built both against Scala 2.11. I have JDK 1.8.0_40 and Scala 2.11.6 setup on Mac OS 10.10.2.
I run the Spark shell with the Cassandra connector loaded:
bin/spark-shell --driver-class-path ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
Then I do what should be a simple row-count test on a test table of four records:
import com.datastax.spark.connector._
sc.stop  // stop the shell's default context so we can supply Cassandra settings
val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "192.168.56.101")
val sc = new org.apache.spark.SparkContext(conf)
val table = sc.cassandraTable("mykeyspace", "playlists")  // RDD over the Cassandra table
table.count
I get the following error. What is confusing is that it gets errors trying to find Cassandra at 127.0.0.1, yet it also recognizes the hostname that I configured, which is 192.168.56.101.
15/03/16 15:56:54 INFO Cluster: New Cassandra host /192.168.56.101:9042 added
15/03/16 15:56:54 INFO CassandraConnector: Connected to Cassandra cluster: Cluster on a Stick
15/03/16 15:56:54 ERROR ServerSideTokenRangeSplitter: Failure while fetching splits from Cassandra
java.io.IOException: Failed to open thrift connection to Cassandra at 127.0.0.1:9160
<snip>
java.io.IOException: Failed to fetch splits of TokenRange(0,0,Set(CassandraNode(/127.0.0.1,/127.0.0.1)),None) from all endpoints: CassandraNode(/127.0.0.1,/127.0.0.1)
BTW, I can also use a configuration file at conf/spark-defaults.conf to do the above without having to close/recreate a Spark context or pass in the --driver-class-path argument. I ultimately hit the same error though, and the above steps seem easier to communicate in this post.
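For reference, the equivalent conf/spark-defaults.conf entries would look roughly like this (the jar path is illustrative):
spark.cassandra.connection.host   192.168.56.101
spark.driver.extraClassPath       /path/to/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar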
Any ideas?
Check the rpc_address setting in the cassandra.yaml file on your Cassandra node. It's likely that the Spark connector is reading that value from the system.local/system.peers tables, and it may be set to 127.0.0.1 in your cassandra.yaml.
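A minimal sketch of the relevant cassandra.yaml change, using the address from the question:
# cassandra.yaml on the Cassandra node
rpc_address: 192.168.56.101    # a client-reachable address instead of 127.0.0.1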
The Spark connector uses Thrift to get token range splits from Cassandra. I'm betting this will eventually be replaced, as C* 2.1.4 has a new table called system.size_estimates (CASSANDRA-7688). It looks like the connector gets the host metadata to find the nearest host and then makes the splits query over Thrift on port 9160.
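For reference, on 2.1.4+ that table can be queried directly from cqlsh:
SELECT * FROM system.size_estimates;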

Cassandra Streaming error - Unknown keyspace system_traces

In our dev cluster, which had been running smoothly before, the following failure occurs when we replace a node (which we have been doing constantly) and prevents the replacement node from joining.
The Cassandra version is 2.0.7.
What can be done about it?
ERROR [STREAM-IN-/10.128.---.---] 2014-11-19 12:35:58,007 StreamSession.java (line 420) [Stream #9cad81f0-6fe8-11e4-b575-4b49634010a9] Streaming error occurred
java.lang.AssertionError: Unknown keyspace system_traces
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:260)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:239)
at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:368)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:289)
at java.lang.Thread.run(Thread.java:745)
I got the same error while I was trying to set up my cluster. As I was experimenting with different switches in cassandra.yaml, I restarted the service multiple times and removed the system dir under the data directory (/var/lib/cassandra/data, as mentioned here).
I guess for some reason Cassandra tries to load the system_traces keyspace (the other dir under /var/lib/cassandra/data) and fails, and nodetool throws this error. You can just remove both the system and system_traces directories before starting the Cassandra service, or, even better, delete all contents of the commitlog, data, and saved_caches directories there.
Obviously, this only works if you don't have any data in the system yet.
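A sketch of that cleanup, assuming the default package-install paths (warning: this wipes all local data, so only do it on a node with nothing to keep):
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/system /var/lib/cassandra/data/system_traces
sudo rm -rf /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start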

cassandra sstable-loader error: "Got an unknow host from describe_ring()"

I am trying to load SSTables into a two-node Cassandra cluster with the sstableloader utility provided in Cassandra 0.8.4.
1) I have loaded the data successfully in a single-node environment.
2) Once I created the cluster of two nodes, loading fails; after gossip it throws the exception:
java.lang.RuntimeException: Got an unknow host from describe_ring()
This is a bug in 0.8.4 (https://issues.apache.org/jira/browse/CASSANDRA-3044). It's fixed in 0.8.5; you can test that by following the link on the release thread here.
