What load balancing policies are available in Cassandra Java driver 4.x? - cassandra

We are upgrading datastax Cassandra java driver from 3.2 to 4.x to support DSE 6.8.
Load balancing policies our application currently supports are RoundRobinPolicy and DCAwareRoundRobinPolicy.
These policies aren't available in java-driver-core 4.12.
How can we support the above policies.Please help..
Current code in our application using cassandra-driver-core-3.1.0.jar:
public static LoadBalancingPolicy getLoadBalancingPolicy(String loadBalanceStr, boolean isTokenAware) {
LoadBalancingPolicy loadBalance = null;
if (isTokenAware) {
loadBalance = new TokenAwarePolicy(loadBalanceDataConvert(loadBalanceStr));
} else {
loadBalance = loadBalanceDataConvert(loadBalanceStr);
}
return loadBalance;
}
private static LoadBalancingPolicy loadBalanceDataConvert(String loadBalanceStr) {
if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_DC.equals(loadBalanceStr)) {
return new DCAwareRoundRobinPolicy.Builder().build();
} else if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_ROUND.equals(loadBalanceStr)) {
return new RoundRobinPolicy();
}
return null;
}

The load balancing has been heavily simplified in version 4.x of the Cassandra Java driver. You no longer need to nest multiple policies within each other to achieve high availability.
In our opinion, the best policy is the DefaultLoadBalancingPolicy which is enabled by default and achieves all the best attributes as the policies in older versions.
The DefaultLoadBalancingPolicy generates a query plan that is token-aware by default so replicas which own the data appear first and prioritised over other nodes in the local DC. For token-awareness to work, you must provide routing information either by keyspace (with getRoutingKeyspace()), or by routing key (with getRoutingKey()).
If routing information is not provided, the DefaultLoadBalancingPolicy generates a query plan that is a simple round-robin shuffle of available nodes in the local DC.
We understand that developers who are used to configuring DCAwareRoundRobinPolicy in older versions would like to continue using it but we do not recommend it. It is our opinion that failover should take place at the infrastructure layer, not the application layer.
Our opinion is that the DefaultLoadBalancingPolicy is the right choice in all cases. If you prefer to configure DC-failover, make sure you fully understand the implications and know that we think it is the wrong choice.
For details, see the following documents:
Java driver v4 Upgrade Guide
Load Balancing in Java driver v4

Related

Will using DNS failover work as a Multi-DC failover strategy?

If I have a multi-DC cluster, DC1 and DC2, where DC2 is only used for failover. And in the driver on the client side, I define the contact points using the domain names (foo1.net, foo2.net, and foo3.net). I have foo* pointing to DC1 and if I ever detect any error with DC1, I will make the DNS route foo* to point to DC2.
This approach seems to work on paper, but will it actually work? Any issues with this approach?
In the case of the DataStax Java Driver 3.x this will not work since DNS is only evaluated at the beginning of Cluster instantiation.
The contact points provided are resolved using DNS via InetAddress.getAllByName in Cluster.Builder.addContactPoint:
public Builder addContactPoint(String address) {
// We explicitly check for nulls because InetAdress.getByName() will happily
// accept it and use localhost (while a null here almost likely mean a user error,
// not "connect to localhost")
if (address == null)
throw new NullPointerException();
try {
addContactPoints(InetAddress.getAllByName(address));
return this;
} catch (UnknownHostException e) {
throw new IllegalArgumentException("Failed to add contact point: " + address, e);
}
}
If DNS is changed during the lifecycle of the Cluster, the driver will not be aware of this unless you construct a new Cluster.Builder instance and create a new Cluster from it.
I prefer a design that pushes Data Center failover outside the scope of your application and into a higher level of your architecture. Instead of making your client application responsible for failing over, you should run instances of your clients colocated in each C* data center. Your application load balancer/router/DNS could direct traffic to instances of your application in other data centers when data centers become unavailable.

How to access counters from Reporting task in Nifi 1.2.0

IN Nifi 1.4.0, in order to access counters from a scripted reporting task, you can do something like:
context.eventAccess.controllerStatus.processGroupStatus.each { pg ->
pg.processorStatus.each { ps ->
ps.counters.each { counter
System.out.println("${counter.key} -> ${counter.value})
}
}
That's because ProcessorStatus API exposes:
Map<String,Long> getCounters()
However, I'm with NiFi 1.2.0.
Which does not have this method for class ProcessorReportingTask.
I'm desperately searching for a way to access the counters from a ReportingTask (so, through a ReportingContext, because that's what's bound within the script).
Because my ReportingTask is reporting metrics to our graphite server.
Any Idea?
I know I could access the metrics through REST APIs.
But then I would vanish completely the goal of my ScriptedReporingTask, and I would need to setup an additional piece of software to collect these metrics from outside. While the ReponrtingTask will just run from NiFi system.
I believe you will have to upgrade to 1.4.0 in order to get access to the getCounters method, there is no other way I know of through a ReportingContext which is why it was added in 1.4.0 as part of this JIRA:
https://issues.apache.org/jira/browse/NIFI-106

How to configure multiple cassandra contact-points for lagom?

In Lagom it appears the contact point get loaded from the service locator which accepts only a single URI. How can we specify multiple cassandra contact-points?
lagom.services {
cas_native = "tcp://10.0.0.120:9042"
}
I have tried setting just the contact points in the akka persistence config but that doesn't seem to override the service locator config.
All that I was missing was the session provider to override service lookup:
session-provider = akka.persistence.cassandra.ConfigSessionProvider
contact-points = ["10.0.0.120", "10.0.3.114", "10.0.4.168"]
was needed in the lagom cassandra config

How to set client side timestamp in spring data cassandra?

In datastax driver we have api like
withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
to enable the feature to setting timestamp per query at client side. How can we achieve same with spring data canssandra.
I am aware that i can use "USING TIMESTAMP value" in cql but is there something which spring data cassandra provide ? I dont find such api in CassandraClusterFactoryBean .
You are correct!
Unfortunately, it appears SD Cassandra is missing a configuration option on the CassandraCqlClusterFactoryBean class supporting the withTimestampGenerator(:TimestampGenerator) configuration setting with the DataStax Java driver Cluster.Builder API.
Equally unfortunate is there is no workaround (other than the USING TIMESTAMP in CQL) at the moment either.
It also appears the CassandraCqlClusterFactoryBean is missing configuration options for:
Cluster.Builder.withAddressTranslator(:AddressTranslator)
Cluster.Builder.withClusterName(:String)
Cluster.Builder.withCodeRegistry(:CodecRegistry)
Cluster.Builder.withMaxSchemaAgreementWaitSeconds(:int)
Cluster.Builder.withSpeculativeExecutionPolicy(:SpeculativeExecutionPolicy)
Though, beware, the withTimestampGenerator(..) is only supported in version 3 of the DataStax Java Driver, which the next version (i.e. 1.5.0) of SD Cassandra will support...
This feature is only available with version V3 or above of the native protocol. With earlier versions, timestamps are always generated server-side, and setting a generator through this method will have no effect.
The timestamp capability is available in SD 1.5.x,
public void setTimestampGenerator(TimestampGenerator timestampGenerator) {
this.timestampGenerator = timestampGenerator;
}
https://github.com/spring-projects/spring-data-cassandra/blob/cc4625f492c256e5fa3cb6640d19b4e048b9542b/spring-data-cassandra/src/main/java/org/springframework/data/cql/config/CassandraCqlClusterFactoryBean.java.

Can Hazelcast connect as a client to existing Hazelcast cluster instead of joining as a member of the cluster to implement vertx clustering

We are currently using vertx and hazelcast as its clustering implementation. For it to work as per the docs hazelcast is embedded inside our application meaning it will join as a member of the cluster. We would like our application to be independent of Hazelcast. The reason is when ever Hazelcast cache becomes inconsistent we are bringing down all our servers and restarting. Instead we would like to keep Hazelcast to its own server and connect vertx as a client so we restart hazelcast independent of our application server. Zookeeper cluster implementation does exactly how we would like but we don't want to maintain another cluster for just this purpose because we are also using Hazelcast for other cache purposes internal to our application. Currently we are doing some thing like this to make vertx work.
Config hazelcastConfig = new Config();
//Group
GroupConfig groupConfig = new GroupConfig();
groupConfig.setName(hzGroupName);
groupConfig.setPassword(groupPassword);
hazelcastConfig.setGroupConfig(groupConfig);
//Properties
Properties properties = new Properties();
properties.setProperty("hazelcast.mancenter.enabled", "false");
properties.setProperty("hazelcast.memcache.enabled", "false");
properties.setProperty("hazelcast.rest.enabled", "false");
properties.setProperty("hazelcast.wait.seconds.before.join", "0");
properties.setProperty("hazelcast.logging.type", "jdk");
hazelcastConfig.setProperties(properties);
//Network
NetworkConfig networkConfig = new NetworkConfig();
networkConfig.setPort(networkPort);
networkConfig.setPortAutoIncrement(networkPortAutoincrement);
//Interfaces
InterfacesConfig interfacesConfig = new InterfacesConfig();
interfacesConfig.setEnabled(true);
interfacesConfig.setInterfaces(interfaces);
networkConfig.setInterfaces(interfacesConfig);
//Join
JoinConfig joinConfig = new JoinConfig();
MulticastConfig multicastConfig = new MulticastConfig();
multicastConfig.setEnabled(false);
joinConfig.setMulticastConfig(multicastConfig);
TcpIpConfig tcpIpConfig = new TcpIpConfig();
tcpIpConfig.setEnabled(true);
List<String> members = Arrays.asList(hzNetworkMembers.split(","));
tcpIpConfig.setMembers(members);
joinConfig.setTcpIpConfig(tcpIpConfig);
networkConfig.setJoin(joinConfig);
//Finish Network
hazelcastConfig.setNetworkConfig(networkConfig);
clusterManager = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions().setClusterManager(clusterManager);
options.setClusterHost(interfaces.get(0));
options.setMaxWorkerExecuteTime(VertxOptions.DEFAULT_MAX_WORKER_EXECUTE_TIME * workerVerticleMaxExecutionTime);
options.setBlockedThreadCheckInterval(1000 * 60 * 60);
Vertx.clusteredVertx(options, res -> {
if (res.succeeded()) {
vertx = res.result();
} else {
throw new RuntimeException("Unable to launch Vert.x");
}
});
********* Alternate Solution **********
we actually changed our distributed caching implementation from hazelcast to Redis (Amazon ElastiCache).
We coudnt rely on hazelcast for 3 reasons.
1) because of its inconsistency during server restarts 2) we were using embedded hazelcast and we ended up restarting our app when hazelcast data in inconsistent and we want our app to be independent of other services 3) memory allocation (hazelcast data) now is independent of application server
Vertx 3.2.0 now supports handing it a preconfigured Hazelcast instance for which to build a cluster. Therefore you have complete control over the Hazelcast configuration including how and where you want data stored. But you also need a bug fix from Vert.x 3.2.1 release to really use this.
See updated documentation at https://github.com/vert-x3/vertx-hazelcast/blob/master/src/main/asciidoc/index.adoc#using-an-existing-hazelcast-cluster
Note: When you create your own cluster, you need to have the extra Hazelcast settings required by Vertx. And those are noted in the documentation above.
Vert.x 3.2.1 release fixes an issue that blocks the use of client connections. Be aware that if you do distributed locks with Hazelcast clients, the default timeout is 60 seconds for the lock to go away if the network connection is stalled in a way that isn't obvious to the server nodes (all other JVM exits should immediately clear a lock).
You can lower this amount using:
// This is checked every 10 seconds, so any value < 10 will be treated the same
System.setProperty("hazelcast.client.max.no.heartbeat.seconds", "9");
Also be aware that with Hazelcast clients you may want to use near caching for some maps and look at other advanced configuration options for performance tuning a client which will behave differently than a full data node.
Since version 3.2.1 you can run other full Hazelcast nodes configured correctly with the map settings required by Vertx. And then create custom Hazelcast clients when starting Vertx (taken from a new unit test case):
ClientConfig clientConfig = new ClientConfig().setGroupConfig(new GroupConfig("dev", "dev-pass"));
HazelcastInstance clientNode1 = HazelcastClient.newHazelcastClient(clientConfig);
HazelcastClusterManager mgr1 = new HazelcastClusterManager(clientNode1);
VertxOptions options1 = new VertxOptions().setClusterManager(mgr1).setClustered(true).setClusterHost("127.0.0.1");
Vertx.clusteredVertx(options1, ...)
Obviously your client configuration and needs will differ. Consult the Hazelcast documentation for Client configuration: http://docs.hazelcast.org/docs/3.5/manual/html-single/index.html

Resources