Will using DNS failover work as a Multi-DC failover strategy? - cassandra

If I have a multi-DC cluster, DC1 and DC2, where DC2 is only used for failover. And in the driver on the client side, I define the contact points using the domain names (foo1.net, foo2.net, and foo3.net). I have foo* pointing to DC1 and if I ever detect any error with DC1, I will make the DNS route foo* to point to DC2.
This approach seems to work on paper, but will it actually work? Any issues with this approach?

In the case of the DataStax Java Driver 3.x this will not work since DNS is only evaluated at the beginning of Cluster instantiation.
The contact points provided are resolved using DNS via InetAddress.getAllByName in Cluster.Builder.addContactPoint:
public Builder addContactPoint(String address) {
// We explicitly check for nulls because InetAdress.getByName() will happily
// accept it and use localhost (while a null here almost likely mean a user error,
// not "connect to localhost")
if (address == null)
throw new NullPointerException();
try {
addContactPoints(InetAddress.getAllByName(address));
return this;
} catch (UnknownHostException e) {
throw new IllegalArgumentException("Failed to add contact point: " + address, e);
}
}
If DNS is changed during the lifecycle of the Cluster, the driver will not be aware of this unless you construct a new Cluster.Builder instance and create a new Cluster from it.
I prefer a design that pushes Data Center failover outside the scope of your application and into a higher level of your architecture. Instead of making your client application responsible for failing over, you should run instances of your clients colocated in each C* data center. Your application load balancer/router/DNS could direct traffic to instances of your application in other data centers when data centers become unavailable.

Related

Spring Boot app can't connect to Cassandra cluster, driver returning "AllNodesFailedException: Could not reach any contact point"

i've updated my spring-boot to v3.0.0 and spring-data-cassandra to v4.0.0 which resulted in unable to connect to cassandra cluster which is deployed in stg env and runs on IPv6 address having different datacenter rather DC1
i've added a config file which accepts localDC programatically
`#Bean(destroyMethod = "close")
public CqlSession session() {
CqlSession session = CqlSession.builder()
.addContactPoint(InetSocketAddress.createUnresolved("[240b:c0e0:1xx:xxx8:xxxx:x:x:x]", port))
.withConfigLoader(
DriverConfigLoader.programmaticBuilder()
.withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, localDatacenter)
.withString(DefaultDriverOption.AUTH_PROVIDER_PASSWORD,password)
.withString(DefaultDriverOption.CONNECTION_INIT_QUERY_TIMEOUT,"10s")
.withString(DefaultDriverOption.CONNECTION_CONNECT_TIMEOUT, "20s")
.withString(DefaultDriverOption.REQUEST_TIMEOUT, "20s")
.withString(DefaultDriverOption.CONTROL_CONNECTION_TIMEOUT, "20s")
.withString(DefaultDriverOption.SESSION_KEYSPACE,keyspace)
.build())
//.addContactPoint(InetSocketAddress.createUnresolved(InetAddress.getByName(contactPoints).getHostName(), port))
.build();
}
return session;`
and this is my application.yml file
spring:
data:
cassandra:
keyspace-name: xxx
contact-points: [xxxx:xxxx:xxxx:xxx:xxx:xxx]
port: xxx
local-datacenter: xxxx
use-dc-aware: true
username: xxxxx
password: xxxxx
ssl: true
SchemaAction: CREATE_IF_NOT_EXISTS
So locally I was able to connect to cassandra (by default it is pointing to localhost) , but in stg env my appplication is not able to connect to that cluster
logs in my stg env
caused by: com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node (endPoint=/[240b:cOe0:102:xxxx:xxxx:x:x:x]:3xxx,hostId-null,hashCode=4e9ba6a8):[com.datastax.oss.driver.api.core.connection.ConnectionInitException:[s0|controllid:0x984419ed,L:/[240b:cOe0:102:5dd7: xxxx:x:x:xxx]:4xxx - R:/[240b:c0e0:102:xxxx:xxxx:x:x:x]:3xxx] Protocol initialization request, step 1 (OPTIONS: unexpected tarlure com.datastax.oss.driver.apt.core.connection.closedconnectiontxception: Lost connection to remote peer)]
Network
You appear to have a networking issue. The driver can't connect to any of the nodes because they are unreachable from a network perspective as it states in the error message:
... AllNodesFailedException: Could not reach any contact point ...
You need to check that:
you have configured the correct IP addresses,
you have configured the correct CQL port, and
there is network connectivity between your app and the cluster.
Security
I also noted that you configured the driver to use SSL:
ssl: true
but I don't see anywhere where you've configured the certificate credentials and this could explain why the driver can't initiate connections.
Check that the cluster has client-to-node encryption enabled. If it does then you need to prepare the client certificates and configure SSL on the driver.
Driver build
This post appears to be a duplicate of another question you posted but is now closed due to lack of clarity and details.
In that question it appears you are running a version of the Java driver not produced by DataStax as pointed out by #absurdface:
Specifically I note that java-driver-core-4.11.4-yb-1-RC1.jar isn't a Java driver artifact released by DataStax (there isn't even a 4.11.4 Java driver release). This could be relevant for reasons we'll get into ...
We are not aware of where this build came from and without knowing much about it, it could be the reason you are not able to connect to the cluster.
We recommend that you switch to one of the supported builds of the Java driver. Cheers!
A hearty +1 to everything #erick-ramirez mentioned above. I would also expand on his answers with an observation or two.
Normally spring-data-cassandra is used to automatically configure a CqlSession and make it available for injection (or for use in CqlTemplate etc.). That's what you'd normally be configuring with your application.yml file. But you're apparently creating the CqlSession directly in code, which means that spring-data-cassandra isn't involved... and therefore what's in your application.yml likely isn't being used.
This analysis strongly suggests that your CqlSession is not being configured to use SSL. My understanding is that your testing sequence went as follows:
Tested app locally on a local server, everything worked
Tested app against test environment, observed the errors above
If this sequence is correct and you have SSL enabled in you test environment but not on your local Cassandra instance that could very easily explain the behaviour you're describing.
This explanation could also explain the specific error you cite in the error message. "Lost connection to remote peer" indicates that something is unexpectedly killing your socket connection before any protocol messages are explained... and an SSL issue would cause almost exactly that behaviour.
I would recommend checking the SSL configuration for both servers involved in your testing. I would also suggest consulting the SSL-related documentation referenced by Erick above and confirm that you have all the relevant materials when building your CqlSession.
added the certificate in my spring application
public CqlSession session() throws IOException, CertificateException, NoSuchAlgorithmException, KeyStoreException, KeyManagementException {
Resource resource = new ClassPathResource("root.crt");
InputStream inputStream = resource.getInputStream();
CertificateFactory cf = CertificateFactory.getInstance("X.509");
Certificate cert = cf.generateCertificate(inputStream);
TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
keyStore.load(null);
keyStore.setCertificateEntry("ca", cert);
trustManagerFactory.init(keyStore);
SSLContext sslContext = SSLContext.getInstance("TLSv1.3");
sslContext.init(null, trustManagerFactory.getTrustManagers(), null);
return CqlSession.builder()
.withSslContext(sslContext)
.addContactPoint(new InetSocketAddress(contactPoints,port))
.withAuthCredentials(username, password)
.withLocalDatacenter(localDatacenter)
.withKeyspace(keyspace)
.build();
}
so added the cert file in the configuration file of the cqlsession builder and this helped me in connecting to the remote cassandra cluster

What load balancing policies are available in Cassandra Java driver 4.x?

We are upgrading datastax Cassandra java driver from 3.2 to 4.x to support DSE 6.8.
Load balancing policies our application currently supports are RoundRobinPolicy and DCAwareRoundRobinPolicy.
These policies aren't available in java-driver-core 4.12.
How can we support the above policies.Please help..
Current code in our application using cassandra-driver-core-3.1.0.jar:
public static LoadBalancingPolicy getLoadBalancingPolicy(String loadBalanceStr, boolean isTokenAware) {
LoadBalancingPolicy loadBalance = null;
if (isTokenAware) {
loadBalance = new TokenAwarePolicy(loadBalanceDataConvert(loadBalanceStr));
} else {
loadBalance = loadBalanceDataConvert(loadBalanceStr);
}
return loadBalance;
}
private static LoadBalancingPolicy loadBalanceDataConvert(String loadBalanceStr) {
if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_DC.equals(loadBalanceStr)) {
return new DCAwareRoundRobinPolicy.Builder().build();
} else if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_ROUND.equals(loadBalanceStr)) {
return new RoundRobinPolicy();
}
return null;
}
The load balancing has been heavily simplified in version 4.x of the Cassandra Java driver. You no longer need to nest multiple policies within each other to achieve high availability.
In our opinion, the best policy is the DefaultLoadBalancingPolicy which is enabled by default and achieves all the best attributes as the policies in older versions.
The DefaultLoadBalancingPolicy generates a query plan that is token-aware by default so replicas which own the data appear first and prioritised over other nodes in the local DC. For token-awareness to work, you must provide routing information either by keyspace (with getRoutingKeyspace()), or by routing key (with getRoutingKey()).
If routing information is not provided, the DefaultLoadBalancingPolicy generates a query plan that is a simple round-robin shuffle of available nodes in the local DC.
We understand that developers who are used to configuring DCAwareRoundRobinPolicy in older versions would like to continue using it but we do not recommend it. It is our opinion that failover should take place at the infrastructure layer, not the application layer.
Our opinion is that the DefaultLoadBalancingPolicy is the right choice in all cases. If you prefer to configure DC-failover, make sure you fully understand the implications and know that we think it is the wrong choice.
For details, see the following documents:
Java driver v4 Upgrade Guide
Load Balancing in Java driver v4

Datastax Cassandra C/C++ driver cass_cluster_set_blacklist_filtering functionality

Datastax C/C++ driver has a blacklist filtering functionality as part of its load balancing controls.
https://docs.datastax.com/en/developer/cpp-driver/2.5/topics/configuration/
Correct me If I missing something but my understanding is that a CQL client can't connect to blacklisted hosts.
I'm using C/C++ driver v2.5 and the below codeblock and trying to connect to a multinode cluster:
CassCluster* cluster = cass_cluster_new();
CassSession* session = cass_session_new();
const char* hosts = "192.168.57.101";
cass_cluster_set_contact_points(cluster, hosts);
cass_cluster_set_blacklist_filtering(cluster, hosts);
CassFuture* connect_future = cass_session_connect(session, cluster);
In this codeblock the host to which the CQL client is trying to connect is set as blacklisted. However, CQL client seems to connect to this host and executes any queries. Is there something wrong with the above codeblock? If not so, is this the expected behavior? Does it behaves differently because it is a multinode cluster and establish connection to the other peers?
Any help will be appreciated.
Thank you in advance
Since you are supplying only one contact point, that IP address is being used to establish the control connection into the cluster. Once that control connection is established and the peers table is read to determine other nodes available in the cluster, connections are made to those other nodes. At this point all queries will be routed to those other nodes and not your initial/blacklisted contact point; however the connection to the initial contact point will remain as it is the control connection into the cluster.
To get a better look at what is going on inside the driver you can enable logging in the driver. Here is an example to enable logging via the console:
void on_log(const CassLogMessage* message, void* data) {
fprintf(stderr, "%u.%03u [%s] (%s:%d:%s): %s\n",
(unsigned int) (message->time_ms / 1000),
(unsigned int) (message->time_ms % 1000),
cass_log_level_string(message->severity),
message->file, message->line, message->function,
message->message);
}
/* Log configuration *MUST* be done before any other driver call */
cass_log_set_level(CASS_LOG_TRACE);
cass_log_set_callback(on_log, NULL);
In order to reduce the extra connection on a node that will be blacklisted you can supply a different contact point into the cluster that is not the same as the node (or nodes) that will be blacklisted.

Can Hazelcast connect as a client to existing Hazelcast cluster instead of joining as a member of the cluster to implement vertx clustering

We are currently using vertx and hazelcast as its clustering implementation. For it to work as per the docs hazelcast is embedded inside our application meaning it will join as a member of the cluster. We would like our application to be independent of Hazelcast. The reason is when ever Hazelcast cache becomes inconsistent we are bringing down all our servers and restarting. Instead we would like to keep Hazelcast to its own server and connect vertx as a client so we restart hazelcast independent of our application server. Zookeeper cluster implementation does exactly how we would like but we don't want to maintain another cluster for just this purpose because we are also using Hazelcast for other cache purposes internal to our application. Currently we are doing some thing like this to make vertx work.
Config hazelcastConfig = new Config();
//Group
GroupConfig groupConfig = new GroupConfig();
groupConfig.setName(hzGroupName);
groupConfig.setPassword(groupPassword);
hazelcastConfig.setGroupConfig(groupConfig);
//Properties
Properties properties = new Properties();
properties.setProperty("hazelcast.mancenter.enabled", "false");
properties.setProperty("hazelcast.memcache.enabled", "false");
properties.setProperty("hazelcast.rest.enabled", "false");
properties.setProperty("hazelcast.wait.seconds.before.join", "0");
properties.setProperty("hazelcast.logging.type", "jdk");
hazelcastConfig.setProperties(properties);
//Network
NetworkConfig networkConfig = new NetworkConfig();
networkConfig.setPort(networkPort);
networkConfig.setPortAutoIncrement(networkPortAutoincrement);
//Interfaces
InterfacesConfig interfacesConfig = new InterfacesConfig();
interfacesConfig.setEnabled(true);
interfacesConfig.setInterfaces(interfaces);
networkConfig.setInterfaces(interfacesConfig);
//Join
JoinConfig joinConfig = new JoinConfig();
MulticastConfig multicastConfig = new MulticastConfig();
multicastConfig.setEnabled(false);
joinConfig.setMulticastConfig(multicastConfig);
TcpIpConfig tcpIpConfig = new TcpIpConfig();
tcpIpConfig.setEnabled(true);
List<String> members = Arrays.asList(hzNetworkMembers.split(","));
tcpIpConfig.setMembers(members);
joinConfig.setTcpIpConfig(tcpIpConfig);
networkConfig.setJoin(joinConfig);
//Finish Network
hazelcastConfig.setNetworkConfig(networkConfig);
clusterManager = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions().setClusterManager(clusterManager);
options.setClusterHost(interfaces.get(0));
options.setMaxWorkerExecuteTime(VertxOptions.DEFAULT_MAX_WORKER_EXECUTE_TIME * workerVerticleMaxExecutionTime);
options.setBlockedThreadCheckInterval(1000 * 60 * 60);
Vertx.clusteredVertx(options, res -> {
if (res.succeeded()) {
vertx = res.result();
} else {
throw new RuntimeException("Unable to launch Vert.x");
}
});
********* Alternate Solution **********
we actually changed our distributed caching implementation from hazelcast to Redis (Amazon ElastiCache).
We coudnt rely on hazelcast for 3 reasons.
1) because of its inconsistency during server restarts 2) we were using embedded hazelcast and we ended up restarting our app when hazelcast data in inconsistent and we want our app to be independent of other services 3) memory allocation (hazelcast data) now is independent of application server
Vertx 3.2.0 now supports handing it a preconfigured Hazelcast instance for which to build a cluster. Therefore you have complete control over the Hazelcast configuration including how and where you want data stored. But you also need a bug fix from Vert.x 3.2.1 release to really use this.
See updated documentation at https://github.com/vert-x3/vertx-hazelcast/blob/master/src/main/asciidoc/index.adoc#using-an-existing-hazelcast-cluster
Note: When you create your own cluster, you need to have the extra Hazelcast settings required by Vertx. And those are noted in the documentation above.
Vert.x 3.2.1 release fixes an issue that blocks the use of client connections. Be aware that if you do distributed locks with Hazelcast clients, the default timeout is 60 seconds for the lock to go away if the network connection is stalled in a way that isn't obvious to the server nodes (all other JVM exits should immediately clear a lock).
You can lower this amount using:
// This is checked every 10 seconds, so any value < 10 will be treated the same
System.setProperty("hazelcast.client.max.no.heartbeat.seconds", "9");
Also be aware that with Hazelcast clients you may want to use near caching for some maps and look at other advanced configuration options for performance tuning a client which will behave differently than a full data node.
Since version 3.2.1 you can run other full Hazelcast nodes configured correctly with the map settings required by Vertx. And then create custom Hazelcast clients when starting Vertx (taken from a new unit test case):
ClientConfig clientConfig = new ClientConfig().setGroupConfig(new GroupConfig("dev", "dev-pass"));
HazelcastInstance clientNode1 = HazelcastClient.newHazelcastClient(clientConfig);
HazelcastClusterManager mgr1 = new HazelcastClusterManager(clientNode1);
VertxOptions options1 = new VertxOptions().setClusterManager(mgr1).setClustered(true).setClusterHost("127.0.0.1");
Vertx.clusteredVertx(options1, ...)
Obviously your client configuration and needs will differ. Consult the Hazelcast documentation for Client configuration: http://docs.hazelcast.org/docs/3.5/manual/html-single/index.html

Cassandra - Connection and Data Insertion

Below is my basic program
public static void main(String[] args) {
Cluster cluster;
Session session;
cluster = Cluster.builder().addContactPoint("192.168.20.131").withPort(9042).build();
System.out.println("Connection Established");
cluster.close();
}
Now i want to know that i have a 7 node cluster and i have cassandra instance running on all 7 nodes. Assuming that the above mentioned IP address is my entry point how does it actually works. Suppose some other user try to run a program on any other cassandra node out of 7 so which IP will be entered as Contact Point. Or do i have to add all the 7 nodes IP addresses comma separated in my main() method ..?
As it is described here,
The driver discovers the nodes that constitute a cluster by querying
the contact points used in building the cluster object. After this it
is up to the cluster's load balancing policy to keep track of node
events (that is add, down, remove, or up) by its implementation of the
Host.StateListener interface.

Resources