Upgraded to Cassandra Java driver 4.x, getting error "You specified dc1 as the local DC, but some contact points are from a different DC" - cassandra

We are running DataStax Java driver version 1.9, and our config looks like this (this works):
cassandra.contactpoints=tfi-db-ddac-001.tfi.myCompany.net,tfi-db-ddac-002.tfi.myCompany.net
cassandra.username=username
cassandra.password=password
cassandra.keyspace.create=CREATE KEYSPACE myKeySpace WITH replication = {'class' : 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 1};
cassandra.keyspace_log.create=CREATE KEYSPACE myKeySpace_log WITH replication = {'class' : 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 1};
cassandra.log_entries.write_consistency_level=TWO
cassandra.metric_monitor.write_consistency_level=TWO
cassandra.app_tracking.write_consistency_level=TWO
After updating the dependencies to 4.x, I use this:
try (CqlSession session = CqlSession.builder()
        .addContactPoint(new InetSocketAddress("tfi-db-ddac-001.tfi.myCompany.net", 9042))
        .addContactPoint(new InetSocketAddress("tfi-db-ddac-002.tfi.myCompany.net", 9042))
        .withLocalDatacenter("dc1")
        .withAuthCredentials("username", "password")
        .build()) {
I always get this error:
You specified dc1 as the local DC, but some contact points are from a different DC: Node(endPoint=tfi-db-ddac-002.tfi.myCompany.net/10.8.64.97:9042, hostId=43e3df16-1e44-4aff-b0ac-2fee0e17ace5, hashCode=2043258b)=ggi-l, Node(endPoint=tfi-db-ddac-001.tfi.myCompany.net/10.8.64.95:9042, hostId=4d5b9290-8c92-4f1a-b348-51d42d439e2b, hashCode=7f1da1b2)=ggi-s; please provide the correct local DC, or check your contact points
Can someone please give me advice or an example of how to migrate this to the 4.x driver?
Thanks in advance!

So this:
'DC1' : 1
And this:
.withLocalDatacenter("dc1")
are not using the same datacenter name. Remember, Cassandra datacenter names are case-sensitive. Since the older driver config was working, I'm guessing that the datacenter name should be uppercase DC1, although you can always check with nodetool status just to be sure.
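If it helps, here is a minimal sketch of the corrected builder. The "DC1" value is my assumption; replace it with the exact, case-sensitive datacenter name that nodetool status reports for the DC local to your application:
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

try (CqlSession session = CqlSession.builder()
        .addContactPoint(new InetSocketAddress("tfi-db-ddac-001.tfi.myCompany.net", 9042))
        .addContactPoint(new InetSocketAddress("tfi-db-ddac-002.tfi.myCompany.net", 9042))
        // must match the cluster's datacenter name exactly, including case
        .withLocalDatacenter("DC1")
        .withAuthCredentials("username", "password")
        .build()) {
    // run your statements with this session
}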

Version 4 of the Cassandra Java driver was refactored from the ground up, so it is not binary-compatible with older versions, including the DSE releases of the driver.
One of the biggest differences compared to older versions of the Java driver is that the built-in load-balancing policies (DefaultLoadBalancingPolicy and DcInferringLoadBalancingPolicy) will only ever connect to just ONE datacenter. For this reason, the driver will only accept contact points which belong to the local DC configured (if using the default load-balancing policy).
During the initialisation phase, the driver explicitly checks the DCs of the configured contact points (see OptionalLocalDcHelper.checkLocalDatacenterCompatibility()). When the driver detects one or more "bad" contact points, it logs a WARN message with instructions to either (a) provide the correct local DC, or (b) check the list of contact points.
Even if none of the cluster's DCs are "local" to your application instances, the driver will still alert you when it detects this unrecommended configuration. Since it is logged as a warning (WARN), your application will still work, but the driver will never include remote nodes in the query plan.
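As a side note (not part of the original answer), you can also verify the datacenter name each node reports from the application itself, using the driver's metadata API. A rough sketch, reusing the contact points and credentials from your snippet:
import java.net.InetSocketAddress;
import com.datastax.oss.driver.api.core.CqlSession;

try (CqlSession session = CqlSession.builder()
        .addContactPoint(new InetSocketAddress("tfi-db-ddac-001.tfi.myCompany.net", 9042))
        .withLocalDatacenter("DC1") // a mismatched value only triggers the WARN described above
        .withAuthCredentials("username", "password")
        .build()) {
    // print the datacenter name each node actually reports
    session.getMetadata().getNodes().values().forEach(node ->
            System.out.println(node.getEndPoint() + " => " + node.getDatacenter()));
}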
For more information, see Load balancing with the Java driver. Cheers!

Related

Cassandra Java driver reports "Some contact points don't match local data center"

I had a 3-node on-prem Cassandra cluster with 1 node in each DC, for a total of 3 data centers (DC1, DC2, DC3).
Now I'm migrating from on-prem to GCP, so I created another 3 DCs (DC4, DC5, DC6) on GCP and replicated the data from on-prem. After that I brought the on-prem nodes (DC1, DC2, DC3) down, and the cluster is supposed to run with only the GCP nodes in DC4, DC5, and DC6.
I'm facing the error below.
2022-08-11 14:33:24.199 INFO 1 --- [ main] c.d.d.c.NettyUtil : Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
2022-08-11 14:33:25.178 INFO 1 --- [ main] c.d.d.c.p.DCAwareRoundRobinPolicy : Using data-center name 'DC3' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
2022-08-11 14:33:25.179 WARN 1 --- [ main] c.d.d.c.p.DCAwareRoundRobinPolicy : Some contact points don't match local data center. Local DC = DC3. Non-conforming contact points: /1.2.3.4:9042 (DC1),/1.2.3.7:9042 (DC4),/1.2.3.9:9042 (DC6),/1.2.3.5:9042 (DC2),/1.2.3.8:9042 (DC5)
I have given only the DC4, DC5, and DC6 node IPs as contact points for the application.
My Cassandra driver connection code:
public class CassandraCluster {
    private static final int USED_HOSTS_PER_REMOTE_DC = 2;

    // consistency levels applied to all queries
    final QueryOptions queryOptions = new QueryOptions()
            .setConsistencyLevel(ConsistencyLevel.QUORUM)
            .setSerialConsistencyLevel(ConsistencyLevel.SERIAL);

    // DC-aware policy; note that no local DC is specified here
    final DCAwareRoundRobinPolicy dCAwareRoundRobinPolicy = DCAwareRoundRobinPolicy.builder()
            .withUsedHostsPerRemoteDc(CassandraCluster.USED_HOSTS_PER_REMOTE_DC)
            .allowRemoteDCsForLocalConsistencyLevel()
            .build();
You didn't indicate which Java driver you're using, so based on the API in your sample code, I will assume that your application is using the older v3.11.
Just a minor correction, this log message is a warning (WARN) and not an error:
WARN 1 --- [ main] c.d.d.c.p.DCAwareRoundRobinPolicy : \
Some contact points don't match local data center. Local DC = DC3. \
Non-conforming contact points: \
/1.2.3.4:9042 (DC1),\
/1.2.3.7:9042 (DC4),\
/1.2.3.9:9042 (DC6),\
/1.2.3.5:9042 (DC2),\
/1.2.3.8:9042 (DC5)
The driver reports the warning because the contact points belong to different data centres. This is normal based on your cluster topology and you should not be concerned.
What is more important is that you explicitly specify the local DC by calling withLocalDc(), for example:
Cluster cluster = Cluster.builder()
    .addContactPoint(contactpoints)
    .withLoadBalancingPolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("DC3")
            .build()
    ).build();
Since you didn't specify the local DC, the driver used the first contact point's DC as the local DC. This can be dangerous in your case because if the first few contact points are unavailable, the driver can pick any DC to be "local".
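If it helps, here is a fuller sketch that combines the QueryOptions and policy from your snippet with the builder call above. The contact points are the GCP node IPs from your warning, and "DC4" is an assumption on my part; use whichever GCP DC is actually local to the application:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class CassandraCluster {
    private static final int USED_HOSTS_PER_REMOTE_DC = 2;

    public static Session connect() {
        QueryOptions queryOptions = new QueryOptions()
                .setConsistencyLevel(ConsistencyLevel.QUORUM)
                .setSerialConsistencyLevel(ConsistencyLevel.SERIAL);

        DCAwareRoundRobinPolicy policy = DCAwareRoundRobinPolicy.builder()
                .withLocalDc("DC4") // assumption: the GCP DC closest to this application
                .withUsedHostsPerRemoteDc(USED_HOSTS_PER_REMOTE_DC)
                .allowRemoteDCsForLocalConsistencyLevel()
                .build();

        Cluster cluster = Cluster.builder()
                .addContactPoints("1.2.3.7", "1.2.3.8", "1.2.3.9") // the GCP nodes from your warning
                .withQueryOptions(queryOptions)
                .withLoadBalancingPolicy(policy)
                .build();
        return cluster.connect();
    }
}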
For more info, see Load balancing in Java driver v3.11. Cheers!

Getting schema version mismatch error while trying to migrate from DSE Cassandra (6.0.5) to Apache Cassandra (3.11.3)

We are trying to migrate/replicate data from a DSE Cassandra node to an Apache Cassandra node.
We have done a POC of this on a local machine and are getting a schema version problem.
Below are the details of the POC and the problem.
DSE Cassandra node Details:
dse_version: 6.0.5
release_version: 4.0.0.605
sstable format:
aa-1-bti-CompressionInfo.db
aa-1-bti-Digest.crc32
aa-1-bti-Partitions.db
aa-1-bti-Statistics.db
aa-1-bti-Data.db
aa-1-bti-Filter.db
aa-1-bti-Rows.db
aa-1-bti-TOC.txt
Apache Cassandra node Details:
release_version: 3.11.3
sstable format:
mc-1-big-CompressionInfo.db
mc-1-big-Digest.crc32
mc-1-big-Statistics.db
mc-1-big-Data.db
mc-1-big-Filter.db
mc-1-big-TOC.txt
mc-1-big-Summary.db
There is 1 cluster (My Cluster) in which I have a total of 4 nodes.
2 DSE nodes (let's say DSE1 and DSE2) are in one datacenter (dc1).
2 Apache nodes (let's say APC1 and APC2) are in the other datacenter (dc2).
Note: I have used NetworkTopologyStrategy as the replication strategy for the keyspace and GossipingPropertyFileSnitch as the endpoint_snitch. I have also added
JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true" to the cassandra-env.sh file.
When I create a keyspace on the DSE1 node with the following CQL query:
CREATE KEYSPACE abc
  WITH REPLICATION = {
    'class' : 'NetworkTopologyStrategy',
    'dc1' : 2,
    'dc2' : 2
  }
  AND DURABLE_WRITES = true;
The keyspace gets created on the DSE2 node, but cqlsh reports the following warning:
Warning: schema version mismatch detected; check the schema versions of your nodes in system.local and system.peers
Also, the following error is thrown on one of the 2 Apache nodes (APC1/APC2):
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId 02559ab1-91ee-11ea-8450-2df21166f6a4. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation
I have also checked the schema version on all 4 nodes and get the result below:
Cluster Information:
Name: My Cluster
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
84c22c85-8165-398f-ab9a-e25a6169b7d3: [127.0.0.4, 127.0.0.6]
4c451173-5a05-3691-9a14-520419f849da: [127.0.0.5, 127.0.0.7]
I have tried to resolve this using the solution given in the following link:
https://myadventuresincoding.wordpress.com/2019/04/03/cassandra-fix-schema-disagreement/
But the problem still persists.
Also, can we migrate data from a DSE Cassandra node to an Apache Cassandra node directly, as suggested in the following link:
Migrate Datastax Enterprise Cassandra to Apache Cassandra
Can anyone please suggest how to overcome this problem? Is there any other upgrade or compatibility fix we need to implement to resolve this?

Sqoop job hangs on DataStax Enterprise 4.8.7

We have a 6-node Cassandra cluster in which one node is running in analytics mode and the rest in search mode.
We are using DSE Sqoop to load data from an Oracle 11g database into a Cassandra sample keyspace with the command:
dse sqoop cql-import --connect jdbc:oracle:thin:#hostname:port:servicename --username --password --table TEST --cassandra-keyspace test --cassandra-table test_table --cassandra-column-mapping id:ID,name:NAME --cassandra_host --verbose
Note: /tmp/sqoop-cassandra/compile/87h70484m9mfkfl79/TEST.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
The job stays in the above state forever, with no other output or errors.
When we check on the Oracle database side, the session stays inactive with the wait event "SQL*Net message from client".
The table structure is very simple, with two columns (TEST on Oracle and test_table on Cassandra).
Table structure on the Oracle side: id number (primary key), name varchar()
Table structure on the Cassandra side: id int (primary key), name text
The keyspace is defined with NetworkTopologyStrategy and a replication factor of 1 on the datacenter containing the analytics node.
I have spent a couple of days trying to find the reason for this issue: why the job hangs and why the session on the Oracle side remains inactive. Kindly help with this issue.

Connecting to Cassandra with Spark

First, I have bought the new O'Reilly Spark book and tried those Cassandra setup instructions. I've also found other stackoverflow posts and various posts and guides over the web. None of them work as-is. Below is as far as I could get.
This is a test with only a handful of records of dummy test data. I am running the most recent Cassandra 2.0.7 Virtual Box VM provided by plasetcassandra.org linked from the main Cassandra project page.
I downloaded Spark 1.2.1 source and got the latest Cassandra Connector code from github and built both against Scala 2.11. I have JDK 1.8.0_40 and Scala 2.11.6 setup on Mac OS 10.10.2.
I run the spark shell with the cassandra connector loaded:
bin/spark-shell --driver-class-path ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
Then I do what should be a simple row count type test on a test table of four records:
import com.datastax.spark.connector._
sc.stop
val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "192.168.56.101")
val sc = new org.apache.spark.SparkContext(conf)
val table = sc.cassandraTable("mykeyspace", "playlists")
table.count
I get the following error. What is confusing is that it is getting errors trying to find Cassandra at 127.0.0.1, but it also recognizes the host name that I configured, which is 192.168.56.101.
15/03/16 15:56:54 INFO Cluster: New Cassandra host /192.168.56.101:9042 added
15/03/16 15:56:54 INFO CassandraConnector: Connected to Cassandra cluster: Cluster on a Stick
15/03/16 15:56:54 ERROR ServerSideTokenRangeSplitter: Failure while fetching splits from Cassandra
java.io.IOException: Failed to open thrift connection to Cassandra at 127.0.0.1:9160
<snip>
java.io.IOException: Failed to fetch splits of TokenRange(0,0,Set(CassandraNode(/127.0.0.1,/127.0.0.1)),None) from all endpoints: CassandraNode(/127.0.0.1,/127.0.0.1)
BTW, I can also use a configuration file at conf/spark-defaults.conf to do the above without having to close/recreate a Spark context or pass in the --driver-class-path argument. I ultimately hit the same error though, and the above steps seem easier to communicate in this post.
Any ideas?
Check the rpc_address config in your cassandra.yaml file on your cassandra node. It's likely that the spark connector is using that value from the system.local/system.peers tables and it may be set to 127.0.0.1 in your cassandra.yaml.
The spark connector uses thrift to get token range splits from cassandra. Eventually I'm betting this will be replaced as C* 2.1.4 has a new table called system.size_estimates (CASSANDRA-7688). It looks like it's getting the host metadata to find the nearest host and then making the query using thrift on port 9160.

OpsCenter Community, keep data on different cluster

I'm trying to set up OpsCenter Community (free) to keep its data on a different cluster, but I'm getting this error:
WARN: Unable to find a matching cluster for node with IP [u'x.x.x.1']; the message was {u'os-load': 0.35}. This usually indicates that an OpsCenter agent is still running on an old node that was decommissioned or is part of a cluster that OpsCenter is no longer monitoring.
The same error appears for the second node in the cluster :(
But if I set [dse].enterprise_override = true in the cluster config, everything works fine.
My config is:
user#casnode1:~/opscenter/conf/clusters# cat ClusterTest.conf
[jmx]
username =
password =
port = 7199
[kerberos_client_principals]
[kerberos]
[agents]
[kerberos_hostnames]
[kerberos_services]
[storage_cassandra]
seed_hosts = x.x.x.2
api_port = 9160
connect_timeout = 6.0
bind_interface =
connection_pool_size = 5
username =
password =
send_thrift_rpc = True
keyspace = OpsCenter2
[cassandra]
username =
seed_hosts = x.x.x.1, x.x.x.4
api_port = 9160
password =
So, the question is: is it possible in OpsCenter Community to set up a different cluster to keep the OpsCenter data?
OpsCenter version is 4.0.3
Is it possible in OpsCenter Community to set up a different cluster to keep the OpsCenter data?
It is not. Storing data on a separate cluster is only supported on DataStax Enterprise clusters.
Note: Using the override you mentioned without permission from DataStax is a violation of the OpsCenter license agreement, and will not be supported.
