Hazelcast IMDG vs Hazelcast Jet config - hazelcast

How to read Hazlecast IMDG data in the Hazelcast jet.
In my case I required both Hazlecast IMDG (Distributed cache) to store data for the future and also jet to perform batch and stream processing.
So I will be saving data using Hazelcast IMDG(MapStore) and filtering using Hazelcast jet.
public class Test {
HazelcastInstance hz = Hazelcast.newHazelcastInstance();
JetInstance jet = Jet.newJetInstance();
public static void main(String[] args) {
Test t = new Test();
t.loadIntoIMap();
t.readFromIMap();
}
public void loadIntoIMap() {
IMap<String, String> map = hz.getMap("my-distributed-map");
// Standard Put and Get
map.put("1", "John");
map.put("2", "Mary");
map.put("3", "Jane");
}
public void readFromIMap() {
System.err.println("--manu---");
jet.getMap("s").put("1", "2");
System.err.println(jet.getMap("s").size());
System.err.println(jet.getMap("my-distributed-map").size());
}
}
Do we need separate configuration both(jet and IMDG) or in single config can I share Hz IMap data inside jet.
I'm little confused between jet and Hazelcast IMDG

The answer differs depending on the version you want to use.
IMDG up to 4.2 and Jet 4.5
Hazelcast Jet is built on top of Hazelcast IMDG. When you start a Jet instance there is automatically an IMDG instance running. There is JetInstance#getHazelcastInstance method to retrieve the IMDG instance from Jet instance and JetConfig#setHazelcastConfig to configure IMDG specific configs.
You can access the maps from your cluster in Jet using com.hazelcast.jet.pipeline.Sources#map(String)
You should not start both IMDG and Jet separately on the same machine. However, you can create 2 clusters, one IMDG, one Jet and connect from Jet using com.hazelcast.jet.pipeline.Sources#remoteMap(String, ClientConfig) and similar for other data structures.
If you are already using Hazelcast it's likely this version.
Hazelcast 5.0
With a recent 5.0 release these two products were merged together. There is a single artefact to use - com.hazelcast:hazelcast. You just create a Hazelcast instance and, if enabled, you can get the Jet engine from there using HazelcastInstance#getJet
5.0 is 100 % compatible with IMDG 4.2, just change the dependency and mostly compatible with Jet 4.5, some code changes are needed though.

Related

What load balancing policies are available in Cassandra Java driver 4.x?

We are upgrading datastax Cassandra java driver from 3.2 to 4.x to support DSE 6.8.
Load balancing policies our application currently supports are RoundRobinPolicy and DCAwareRoundRobinPolicy.
These policies aren't available in java-driver-core 4.12.
How can we support the above policies.Please help..
Current code in our application using cassandra-driver-core-3.1.0.jar:
public static LoadBalancingPolicy getLoadBalancingPolicy(String loadBalanceStr, boolean isTokenAware) {
LoadBalancingPolicy loadBalance = null;
if (isTokenAware) {
loadBalance = new TokenAwarePolicy(loadBalanceDataConvert(loadBalanceStr));
} else {
loadBalance = loadBalanceDataConvert(loadBalanceStr);
}
return loadBalance;
}
private static LoadBalancingPolicy loadBalanceDataConvert(String loadBalanceStr) {
if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_DC.equals(loadBalanceStr)) {
return new DCAwareRoundRobinPolicy.Builder().build();
} else if (CassandraConstants.CASSANDRACONNECTION_LOADBALANCEPOLICY_ROUND.equals(loadBalanceStr)) {
return new RoundRobinPolicy();
}
return null;
}
The load balancing has been heavily simplified in version 4.x of the Cassandra Java driver. You no longer need to nest multiple policies within each other to achieve high availability.
In our opinion, the best policy is the DefaultLoadBalancingPolicy which is enabled by default and achieves all the best attributes as the policies in older versions.
The DefaultLoadBalancingPolicy generates a query plan that is token-aware by default so replicas which own the data appear first and prioritised over other nodes in the local DC. For token-awareness to work, you must provide routing information either by keyspace (with getRoutingKeyspace()), or by routing key (with getRoutingKey()).
If routing information is not provided, the DefaultLoadBalancingPolicy generates a query plan that is a simple round-robin shuffle of available nodes in the local DC.
We understand that developers who are used to configuring DCAwareRoundRobinPolicy in older versions would like to continue using it but we do not recommend it. It is our opinion that failover should take place at the infrastructure layer, not the application layer.
Our opinion is that the DefaultLoadBalancingPolicy is the right choice in all cases. If you prefer to configure DC-failover, make sure you fully understand the implications and know that we think it is the wrong choice.
For details, see the following documents:
Java driver v4 Upgrade Guide
Load Balancing in Java driver v4

How to set client side timestamp in spring data cassandra?

In datastax driver we have api like
withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
to enable the feature to setting timestamp per query at client side. How can we achieve same with spring data canssandra.
I am aware that i can use "USING TIMESTAMP value" in cql but is there something which spring data cassandra provide ? I dont find such api in CassandraClusterFactoryBean .
You are correct!
Unfortunately, it appears SD Cassandra is missing a configuration option on the CassandraCqlClusterFactoryBean class supporting the withTimestampGenerator(:TimestampGenerator) configuration setting with the DataStax Java driver Cluster.Builder API.
Equally unfortunate is there is no workaround (other than the USING TIMESTAMP in CQL) at the moment either.
It also appears the CassandraCqlClusterFactoryBean is missing configuration options for:
Cluster.Builder.withAddressTranslator(:AddressTranslator)
Cluster.Builder.withClusterName(:String)
Cluster.Builder.withCodeRegistry(:CodecRegistry)
Cluster.Builder.withMaxSchemaAgreementWaitSeconds(:int)
Cluster.Builder.withSpeculativeExecutionPolicy(:SpeculativeExecutionPolicy)
Though, beware, the withTimestampGenerator(..) is only supported in version 3 of the DataStax Java Driver, which the next version (i.e. 1.5.0) of SD Cassandra will support...
This feature is only available with version V3 or above of the native protocol. With earlier versions, timestamps are always generated server-side, and setting a generator through this method will have no effect.
The timestamp capability is available in SD 1.5.x,
public void setTimestampGenerator(TimestampGenerator timestampGenerator) {
this.timestampGenerator = timestampGenerator;
}
https://github.com/spring-projects/spring-data-cassandra/blob/cc4625f492c256e5fa3cb6640d19b4e048b9542b/spring-data-cassandra/src/main/java/org/springframework/data/cql/config/CassandraCqlClusterFactoryBean.java.

Cassandra Cluster Recovery

I have a Spring Boot Application that uses Spring Data for Cassandra. One of the requirements is that the application will start even if the Cassandra Cluster is unavailable. The Application logs the situation and all its endpoints will not work properly but the Application does not shutdown. It should retry to connect to the cluster during this time. When the cluster is available the application should start to operate normally.
If I am able to connect during the application start and the cluster becomes unavailable after that, the cassandra java driver is capable of managing the retries.
How can I manage the retries during application start and still use Cassandra Repositories from Spring Data?
Thanx
It is possible to start a Spring Boot application if Apache Cassandra is not available but you need to define the Session and CassandraTemplate beans on your own with #Lazy. The beans are provided out of the box with CassandraAutoConfiguration but are initialized eagerly (default behavior) which creates a Session. The Session requires a connection to Cassandra which will prevent a startup if it's not initialized lazily.
The following code will initialize the resources lazily:
#Configuration
public class MyCassandraConfiguration {
#Bean
#Lazy
public CassandraTemplate cassandraTemplate(#Lazy Session session, CassandraConverter converter) throws Exception {
return new CassandraTemplate(session, converter);
}
#Bean
#Lazy
public Session session(CassandraConverter converter, Cluster cluster,
CassandraProperties cassandraProperties) throws Exception {
CassandraSessionFactoryBean session = new CassandraSessionFactoryBean();
session.setCluster(cluster);
session.setConverter(converter);
session.setKeyspaceName(cassandraProperties.getKeyspaceName());
session.setSchemaAction(SchemaAction.NONE);
return session.getObject();
}
}
One of the requirements is that the application will start even if the Cassandra Cluster is unavailable
I think you should read this session from the Java driver doc: http://datastax.github.io/java-driver/manual/#cluster-initialization
The Cluster object does not connect automatically unless some calls are executed.
Since you're using Spring Data Cassandra (that I do not recommend since it has less feature than the plain Mapper Module of the Java driver ...) I don't know if the Cluster object or Session object are exposed directly to the users ...
For retry, you can put the cluster.init() call in a try/catch block and if the cluster is still unavaible, you'll catch an NoHostAvailableException according to the docs. Upon the exception, you can schedule a retry of cluster.init() later

Can Hazelcast connect as a client to existing Hazelcast cluster instead of joining as a member of the cluster to implement vertx clustering

We are currently using vertx and hazelcast as its clustering implementation. For it to work as per the docs hazelcast is embedded inside our application meaning it will join as a member of the cluster. We would like our application to be independent of Hazelcast. The reason is when ever Hazelcast cache becomes inconsistent we are bringing down all our servers and restarting. Instead we would like to keep Hazelcast to its own server and connect vertx as a client so we restart hazelcast independent of our application server. Zookeeper cluster implementation does exactly how we would like but we don't want to maintain another cluster for just this purpose because we are also using Hazelcast for other cache purposes internal to our application. Currently we are doing some thing like this to make vertx work.
Config hazelcastConfig = new Config();
//Group
GroupConfig groupConfig = new GroupConfig();
groupConfig.setName(hzGroupName);
groupConfig.setPassword(groupPassword);
hazelcastConfig.setGroupConfig(groupConfig);
//Properties
Properties properties = new Properties();
properties.setProperty("hazelcast.mancenter.enabled", "false");
properties.setProperty("hazelcast.memcache.enabled", "false");
properties.setProperty("hazelcast.rest.enabled", "false");
properties.setProperty("hazelcast.wait.seconds.before.join", "0");
properties.setProperty("hazelcast.logging.type", "jdk");
hazelcastConfig.setProperties(properties);
//Network
NetworkConfig networkConfig = new NetworkConfig();
networkConfig.setPort(networkPort);
networkConfig.setPortAutoIncrement(networkPortAutoincrement);
//Interfaces
InterfacesConfig interfacesConfig = new InterfacesConfig();
interfacesConfig.setEnabled(true);
interfacesConfig.setInterfaces(interfaces);
networkConfig.setInterfaces(interfacesConfig);
//Join
JoinConfig joinConfig = new JoinConfig();
MulticastConfig multicastConfig = new MulticastConfig();
multicastConfig.setEnabled(false);
joinConfig.setMulticastConfig(multicastConfig);
TcpIpConfig tcpIpConfig = new TcpIpConfig();
tcpIpConfig.setEnabled(true);
List<String> members = Arrays.asList(hzNetworkMembers.split(","));
tcpIpConfig.setMembers(members);
joinConfig.setTcpIpConfig(tcpIpConfig);
networkConfig.setJoin(joinConfig);
//Finish Network
hazelcastConfig.setNetworkConfig(networkConfig);
clusterManager = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions().setClusterManager(clusterManager);
options.setClusterHost(interfaces.get(0));
options.setMaxWorkerExecuteTime(VertxOptions.DEFAULT_MAX_WORKER_EXECUTE_TIME * workerVerticleMaxExecutionTime);
options.setBlockedThreadCheckInterval(1000 * 60 * 60);
Vertx.clusteredVertx(options, res -> {
if (res.succeeded()) {
vertx = res.result();
} else {
throw new RuntimeException("Unable to launch Vert.x");
}
});
********* Alternate Solution **********
we actually changed our distributed caching implementation from hazelcast to Redis (Amazon ElastiCache).
We coudnt rely on hazelcast for 3 reasons.
1) because of its inconsistency during server restarts 2) we were using embedded hazelcast and we ended up restarting our app when hazelcast data in inconsistent and we want our app to be independent of other services 3) memory allocation (hazelcast data) now is independent of application server
Vertx 3.2.0 now supports handing it a preconfigured Hazelcast instance for which to build a cluster. Therefore you have complete control over the Hazelcast configuration including how and where you want data stored. But you also need a bug fix from Vert.x 3.2.1 release to really use this.
See updated documentation at https://github.com/vert-x3/vertx-hazelcast/blob/master/src/main/asciidoc/index.adoc#using-an-existing-hazelcast-cluster
Note: When you create your own cluster, you need to have the extra Hazelcast settings required by Vertx. And those are noted in the documentation above.
Vert.x 3.2.1 release fixes an issue that blocks the use of client connections. Be aware that if you do distributed locks with Hazelcast clients, the default timeout is 60 seconds for the lock to go away if the network connection is stalled in a way that isn't obvious to the server nodes (all other JVM exits should immediately clear a lock).
You can lower this amount using:
// This is checked every 10 seconds, so any value < 10 will be treated the same
System.setProperty("hazelcast.client.max.no.heartbeat.seconds", "9");
Also be aware that with Hazelcast clients you may want to use near caching for some maps and look at other advanced configuration options for performance tuning a client which will behave differently than a full data node.
Since version 3.2.1 you can run other full Hazelcast nodes configured correctly with the map settings required by Vertx. And then create custom Hazelcast clients when starting Vertx (taken from a new unit test case):
ClientConfig clientConfig = new ClientConfig().setGroupConfig(new GroupConfig("dev", "dev-pass"));
HazelcastInstance clientNode1 = HazelcastClient.newHazelcastClient(clientConfig);
HazelcastClusterManager mgr1 = new HazelcastClusterManager(clientNode1);
VertxOptions options1 = new VertxOptions().setClusterManager(mgr1).setClustered(true).setClusterHost("127.0.0.1");
Vertx.clusteredVertx(options1, ...)
Obviously your client configuration and needs will differ. Consult the Hazelcast documentation for Client configuration: http://docs.hazelcast.org/docs/3.5/manual/html-single/index.html

Getting UnsupportedOperationException when i do HazelcastInstance.getConfig() from remote client

I Am using a Hazelcast java client(on node1), and creating Hazelcast maps on the different node(different laptop--node2).
My setup:
on node2 - Hazelcast is running.
on node1 - Stand -alone java program which acts like a Hazelcast java client.
ClientConfig config = new ClientConfig();
config.getGroupConfig().setName("dev").setPassword("dev-pass");
config.addAddress("<node2-ip>:5701");
HazelcastInstance inst = HazelcastClient.newHazelcastClient(config);
//Creating a mapconfig
MapConfig mcfg = new MapConfig();
mcfg.setName("democache");
//creating a mapstore config
MapStoreConfig mapStoreCfg = new MapStoreConfig();
mapStoreCfg.setClassName("com.main.MyMapStore").setEnabled(true);
MyMapStore is my implementation of Hazelcast MapStore. This class resides on
mcfg.setMapStoreConfig(mapStoreCfg);
**inst.getConfig()**.addMapConfig(mcfg);
I am getting "UnsupportedOperationException" when i run this code.. When i do inst.getConfig(), getting this exception.. Can anyone please let me know what is work around for this!
Stacktrace is:
Exception in thread "main" java.lang.UnsupportedOperationException
at com.hazelcast.client.HazelcastClient.getConfig(HazelcastClient.java:144)
at ClientClass.main(ClientClass.java:34)
Hazelcast clients can not access cluster nodes' configuration. This operation is unsupported.
Also you should not update/change configuration after cluster is up.
UnsupportedOperationException, when doing HazelcastInstance.getConfig() from hazelcast client
Client do not store data, so it does not use MapStore, so you should configure mapstore not on client, but the other hazelcast server instances. Like that:
Config config = new Config();
config.addMapConfig(mapconfig);
HazelcastInstance node1 = Hazelcast.newHazelcastInstance(cfg);

Resources