Getting schema version mismatch error while trying to migrate from DSE Cassandra (6.0.5) to Apache Cassandra (3.11.3)

We are trying to migrate/replicate data from a DSE Cassandra node to an Apache Cassandra node.
We have done a proof of concept (POC) of this on a local machine and are hitting a schema version problem.
Below are the details of the POC and the problem.
DSE Cassandra node Details:
dse_version: 6.0.5
release_version: 4.0.0.605
sstable format:
aa-1-bti-CompressionInfo.db
aa-1-bti-Digest.crc32
aa-1-bti-Partitions.db
aa-1-bti-Statistics.db
aa-1-bti-Data.db
aa-1-bti-Filter.db
aa-1-bti-Rows.db
aa-1-bti-TOC.txt
Apache Cassandra node Details:
release_version: 3.11.3
sstable format:
mc-1-big-CompressionInfo.db
mc-1-big-Digest.crc32
mc-1-big-Statistics.db
mc-1-big-Data.db
mc-1-big-Filter.db
mc-1-big-TOC.txt
mc-1-big-Summary.db
There is one cluster (My Cluster) with a total of 4 nodes.
Two DSE nodes (let's say DSE1 and DSE2) are in one datacenter (dc1).
Two Apache nodes (let's say APC1 and APC2) are in the other datacenter (dc2).
Note: the keyspace uses NetworkTopologyStrategy, and endpoint_snitch is set to GossipingPropertyFileSnitch. I have also added JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true" to the cassandra-env.sh file.
When I create the keyspace on the DSE1 node with the following CQL query:
CREATE KEYSPACE abc
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'dc1' : 2,
'dc2' : 2
}
AND DURABLE_WRITES = true;
The keyspace does get created on the DSE2 node, but cqlsh throws the following warning:
Warning: schema version mismatch detected; check the schema versions of your nodes in system.local and system.peers
In addition, the following error is thrown on one of the two Apache nodes (APC1/APC2):
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 02559ab1-91ee-11ea-8450-2df21166f6a4. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation
I have also checked the schema version on all 4 nodes and get the following result:
Cluster Information:
Name: My Cluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
84c22c85-8165-398f-ab9a-e25a6169b7d3: [127.0.0.4, 127.0.0.6]
4c451173-5a05-3691-9a14-520419f849da: [127.0.0.5, 127.0.0.7]
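For reference, the schema versions in that output come from the system tables; you can check them per node with cqlsh, or cluster-wide with nodetool (a sketch of the check, run on each node; both commands exist in DSE and Apache Cassandra alike):

```shell
# Cluster-wide view of schema versions (same data as "Cluster Information" above)
nodetool describecluster

# Per-node view: this node's own schema version ...
cqlsh -e "SELECT schema_version FROM system.local;"
# ... and the versions it has learned from its peers via gossip
cqlsh -e "SELECT peer, schema_version FROM system.peers;"
```

A healthy cluster converges on a single schema version across all nodes; two persistent versions split along the dc1/dc2 boundary, as shown above, means the two datacenters are not reaching schema agreement.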
I have tried to resolve this using the solution given in the following link:
https://myadventuresincoding.wordpress.com/2019/04/03/cassandra-fix-schema-disagreement/
but the problem still persists.
Also, can we migrate data from a DSE Cassandra node to an Apache Cassandra node directly, as suggested in the following link:
Migrate Datastax Enterprise Cassandra to Apache Cassandra
Can anyone please suggest how to overcome this problem? Is there any other upgrade or compatibility fix we need to apply to resolve it?

Related

Upgraded to Cassandra Java driver 4.x, getting error "You specified dc1 as the local DC, but some contact points are from a different DC"

We are running DataStax Java driver version 1.9, and our config is like this (this works):
cassandra.contactpoints=tfi-db-ddac-001.tfi.myCompany.net,tfi-db-ddac-002.tfi.myCompany.net
cassandra.username=username
cassandra.password=password
cassandra.keyspace.create=CREATE KEYSPACE myKeySpace WITH replication = {'class' : 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 1};
cassandra.keyspace_log.create=CREATE KEYSPACE myKeySpace_log WITH replication = {'class' : 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 1};
cassandra.log_entries.write_consistency_level=TWO
cassandra.metric_monitor.write_consistency_level=TWO
cassandra.app_tracking.write_consistency_level=TWO
With this config, after updating the dependencies to 4.x:
try (CqlSession session = CqlSession.builder()
        .addContactPoint(new InetSocketAddress("tfi-db-ddac-001.tfi.myCompany.net", 9042))
        .addContactPoint(new InetSocketAddress("tfi-db-ddac-002.tfi.myCompany.net", 9042))
        .withLocalDatacenter("dc1")
        .withAuthCredentials("username", "password")
        .build()) {
    // ... use the session ...
}
I always get this error:
You specified dc1 as the local DC, but some contact points are from a different DC: Node(endPoint=tfi-db-ddac-002.tfi.myCompany.net/10.8.64.97:9042, hostId=43e3df16-1e44-4aff-b0ac-2fee0e17ace5, hashCode=2043258b)=ggi-l, Node(endPoint=tfi-db-ddac-001.tfi.myCompany.net/10.8.64.95:9042, hostId=4d5b9290-8c92-4f1a-b348-51d42d439e2b, hashCode=7f1da1b2)=ggi-s; please provide the correct local DC, or check your contact points
Can someone please give me advice or an example of how to migrate this to the 4.x driver?
Thanks in advance!
So this:
'DC1' : 1
And this:
.withLocalDatacenter("dc1")
are not using the same datacenter name. Remember, Cassandra is case-sensitive. Since the older driver config was working, I'm guessing the datacenter name should be uppercase DC1. You can always check with nodetool status just to be sure.
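For example, the datacenter name is printed verbatim in the nodetool status header (the output below is illustrative, not taken from the cluster in question):

```shell
$ nodetool status
Datacenter: DC1
===============
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  10.8.64.95   1.21 GiB   256     ?     4d5b9290-8c92-4f1a-b348-51d42d439e2b  rack1
```

Whatever string follows "Datacenter:" is exactly, case included, what should be passed to .withLocalDatacenter(...).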
Version 4 of the Cassandra Java driver was refactored from the ground up so it is not binary-compatible with older versions including the DSE releases of the driver.
One of the biggest differences compared to older versions of the Java driver is that the built-in load-balancing policies (DefaultLoadBalancingPolicy and DcInferringLoadBalancingPolicy) will only ever connect to just ONE datacenter. For this reason, the driver will only accept contact points which belong to the local DC configured (if using the default load-balancing policy).
During the initialisation phase, the driver explicitly checks the DCs of the configured contact points (see OptionalLocalDcHelper.checkLocalDatacenterCompatibility()). When the driver detects a bad contact point, it logs a WARN message with instructions to either (a) provide the correct local DC, or (b) check the list of contact points.
It doesn't matter whether none of the cluster DCs are "local" to your application instances -- the driver will still alert you when it detects this unrecommended configuration. Since it is logged as a warning (WARN), your application will still work but the driver will never include remote nodes in the query plan.
For more information, see Load balancing with the Java driver. Cheers!

sqoop job hangs on DataStax Enterprise 4.8.7

We have a 6-node Cassandra cluster, in which one node runs in analytics mode and the rest run in search mode.
We use DSE sqoop to load data from an Oracle 11g database into a sample Cassandra keyspace with the command:
dse sqoop cql-import --connect jdbc:oracle:thin:@hostname:port:servicename --username --password --table TEST --cassandra-keyspace test --cassandra-table test_table --cassandra-column-mapping id:ID,name:NAME --cassandra_host --verbose
Note: /tmp/sqoop-cassandra/compile/87h70484m9mfkfl79/TEST.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
The job stays in the above state with no further output or errors, and hangs forever.
When we check on the Oracle side, the session stays inactive with the wait event "SQL*Net message from client".
The table structure is very simple, with two columns (test on Oracle, test_table on Cassandra):
Table structure on the Oracle side: id number (primary key), name varchar()
Table structure on the Cassandra side: id int (primary key), name text
The keyspace definition uses NetworkTopologyStrategy with a replication factor of one on the datacenter running the analytics node.
I have spent a couple of days trying to find the reason for this issue (why the job hangs and the Oracle session remains inactive). Kindly help with this issue.

Is it mandatory for all nodes of cassandra cluster to have same cluster name?

Cassandra version 2.1.8.
Is it mandatory for all nodes of a Cassandra cluster to have the same cluster name?
The answer is YES. Otherwise, you'll get the following error.
Example ERROR for different cluster_name:
ERROR [main] 2014-02-25 01:51:17,377 CassandraDaemon.java (line 237) Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name Test Cluster != configured name thisisstupid
at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:542)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:233)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:462)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:552)
It is mandatory to have the same cluster_name for every single node in a cluster.
All nodes in the same cluster should have the same cluster name. It's mandatory.
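The name itself lives in cassandra.yaml on every node, e.g.:

```yaml
# cassandra.yaml -- must be identical on every node in the cluster
cluster_name: 'Test Cluster'
```

If a node was already started under a different name, editing the yaml alone is not enough, because the name is also persisted in the local system keyspace (that is exactly what the "Saved cluster name ... != configured name" check above is comparing). One commonly cited workaround, hedged, so try it on a throwaway node first, is to update it via cqlsh with UPDATE system.local SET cluster_name = 'Test Cluster' WHERE key = 'local'; then run nodetool flush system before restarting.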

Connecting to Cassandra with Spark

First, I bought the new O'Reilly Spark book and tried its Cassandra setup instructions. I've also found other Stack Overflow posts and various posts and guides around the web. None of them work as-is. Below is as far as I could get.
This is a test with only a handful of records of dummy test data. I am running the most recent Cassandra 2.0.7 Virtual Box VM provided by plasetcassandra.org linked from the main Cassandra project page.
I downloaded Spark 1.2.1 source and got the latest Cassandra Connector code from github and built both against Scala 2.11. I have JDK 1.8.0_40 and Scala 2.11.6 setup on Mac OS 10.10.2.
I run the spark shell with the cassandra connector loaded:
bin/spark-shell --driver-class-path ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
Then I do what should be a simple row count type test on a test table of four records:
import com.datastax.spark.connector._
sc.stop
val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "192.168.56.101")
val sc = new org.apache.spark.SparkContext(conf)
val table = sc.cassandraTable("mykeyspace", "playlists")
table.count
I get the following error. What is confusing is that it fails trying to reach Cassandra at 127.0.0.1, even though it clearly recognizes the host I configured, 192.168.56.101:
15/03/16 15:56:54 INFO Cluster: New Cassandra host /192.168.56.101:9042 added
15/03/16 15:56:54 INFO CassandraConnector: Connected to Cassandra cluster: Cluster on a Stick
15/03/16 15:56:54 ERROR ServerSideTokenRangeSplitter: Failure while fetching splits from Cassandra
java.io.IOException: Failed to open thrift connection to Cassandra at 127.0.0.1:9160
<snip>
java.io.IOException: Failed to fetch splits of TokenRange(0,0,Set(CassandraNode(/127.0.0.1,/127.0.0.1)),None) from all endpoints: CassandraNode(/127.0.0.1,/127.0.0.1)
BTW, I can also use a configuration file at conf/spark-defaults.conf to do the above without having to close/recreate a Spark context or pass in the --driver-class-path argument. I ultimately hit the same error, though, and the above steps seem easier to communicate in this post.
Any ideas?
Check the rpc_address setting in the cassandra.yaml file on your Cassandra node. The Spark connector likely reads that value from the system.local/system.peers tables, and it may be set to 127.0.0.1 in your cassandra.yaml.
The Spark connector uses Thrift to get token range splits from Cassandra. I'm betting this will eventually be replaced, since C* 2.1.4 has a new table called system.size_estimates (CASSANDRA-7688). It looks like the connector gets the host metadata to find the nearest host, and then makes the query over Thrift on port 9160.
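A sketch of the relevant cassandra.yaml setting, assuming 192.168.56.101 is the address the VM exposes to the host machine:

```yaml
# cassandra.yaml on the Cassandra node
# Address that Cassandra advertises to clients, and that ends up in
# system.local/system.peers, where the connector picks it up from.
# If this is left at 127.0.0.1 (or localhost), remote clients will be
# told to connect back to themselves, matching the error above.
rpc_address: 192.168.56.101
```

The node has to be restarted for a change here to take effect.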

Cassandra Streaming error - Unknown keyspace system_traces

In our dev cluster, which had been running smoothly before, when we replace a node (which we do constantly), the following failure occurs and prevents the replacement node from joining.
The Cassandra version is 2.0.7.
What can be done about it?
ERROR [STREAM-IN-/10.128.---.---] 2014-11-19 12:35:58,007 StreamSession.java (line 420) [Stream #9cad81f0-6fe8-11e4-b575-4b49634010a9] Streaming error occurred
java.lang.AssertionError: Unknown keyspace system_traces
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:260)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:239)
at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:368)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:289)
at java.lang.Thread.run(Thread.java:745)
I got the same error while I was trying to set up my cluster. As I was experimenting with different settings in cassandra.yaml, I restarted the service multiple times and removed the system directory under the data directory (/var/lib/cassandra/data, as mentioned here).
I guess that for some reason Cassandra tries to load the system_traces keyspace (the other directory under /var/lib/cassandra/data) and fails, and nodetool throws this error. You can remove both system and system_traces before starting the Cassandra service, or, even better, delete all contents of the commitlog, data and saved_caches directories there.
Obviously, this only works if you don't have any data in the system yet.
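As a hedged sketch of that reset (assuming a package install with the default /var/lib/cassandra layout; this wipes all local data, so only do it on a node whose data you can afford to lose):

```shell
sudo service cassandra stop
# Remove all local state so the node starts up and recreates the
# system and system_traces keyspaces from scratch
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*
sudo service cassandra start
```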
