Cassandra Cluster Recovery - cassandra

I have a Spring Boot application that uses Spring Data for Cassandra. One of the requirements is that the application will start even if the Cassandra cluster is unavailable. The application logs the situation, and while its endpoints will not work properly, the application does not shut down. It should keep retrying to connect to the cluster during this time. Once the cluster becomes available, the application should start operating normally.
If I am able to connect during application start and the cluster becomes unavailable afterwards, the Cassandra Java driver is capable of managing the retries.
How can I manage the retries during application start and still use Cassandra repositories from Spring Data?
Thanks

It is possible to start a Spring Boot application if Apache Cassandra is not available, but you need to define the Session and CassandraTemplate beans yourself with @Lazy. These beans are provided out of the box by CassandraAutoConfiguration, but they are initialized eagerly (the default behavior), which creates a Session. The Session requires a connection to Cassandra, which will prevent startup if it is not initialized lazily.
The following code will initialize the resources lazily:
@Configuration
public class MyCassandraConfiguration {

    @Bean
    @Lazy
    public CassandraTemplate cassandraTemplate(@Lazy Session session, CassandraConverter converter) throws Exception {
        return new CassandraTemplate(session, converter);
    }

    @Bean
    @Lazy
    public Session session(CassandraConverter converter, Cluster cluster,
            CassandraProperties cassandraProperties) throws Exception {
        CassandraSessionFactoryBean session = new CassandraSessionFactoryBean();
        session.setCluster(cluster);
        session.setConverter(converter);
        session.setKeyspaceName(cassandraProperties.getKeyspaceName());
        session.setSchemaAction(SchemaAction.NONE);
        // Initialize the factory bean manually; Spring does not invoke its lifecycle
        // callbacks here because the @Bean method returns the Session, not the factory.
        session.afterPropertiesSet();
        return session.getObject();
    }
}
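With this configuration, nothing connects to Cassandra until the Session is first used. One caveat (my assumption, not part of the original answer): a bean that injects the Session or CassandraTemplate eagerly will still trigger the connection at startup, so such injection points may need @Lazy as well, for example:
// Hypothetical consumer: @Lazy on the injection point defers Session creation until first use.
@Autowired
@Lazy
private CassandraTemplate cassandraTemplate;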

"One of the requirements is that the application will start even if the Cassandra Cluster is unavailable"
I think you should read this section of the Java driver docs: http://datastax.github.io/java-driver/manual/#cluster-initialization
The Cluster object does not connect automatically; it only connects when certain calls (such as init() or connect()) are executed.
Since you're using Spring Data Cassandra (which I do not recommend, since it has fewer features than the plain mapper module of the Java driver ...), I don't know whether the Cluster or Session objects are exposed directly to users ...
For retry, you can put the cluster.init() call in a try/catch block; if the cluster is still unavailable, you'll catch a NoHostAvailableException according to the docs. Upon the exception, you can schedule a retry of cluster.init() for later.
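A minimal sketch of that retry loop, assuming a plain Cluster built with the 3.x driver (the contact point, delay, and scheduler are illustrative, not from the original answer):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.exceptions.NoHostAvailableException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CassandraStartupRetry {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();

    public void connectWithRetry() {
        try {
            cluster.init(); // forces the control connection; throws if no host is reachable
            // cluster is up: proceed with creating sessions / normal operation
        } catch (NoHostAvailableException e) {
            // cluster still down: log and schedule another attempt
            scheduler.schedule(this::connectWithRetry, 10, TimeUnit.SECONDS);
        }
    }
}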

Related

CqlSession cannot connect to Azure Cosmos DB Cassandra

I have this code:
@Bean
public CqlSession getCqlSession() {
    return CqlSession.builder()
            .addContactPoint(new InetSocketAddress(cassandraHost, cassandraPort))
            .withAuthCredentials(cassandraUsername, cassandraPassword)
            .build();
}
The connection is failing with this exception:
Failed to instantiate [com.datastax.oss.driver.api.core.CqlSession]: Factory method 'getCqlSession' threw
exception; nested exception is com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach
any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors()
for more): Node(endPoint=tinyurl-cassandra.cassandra.cosmos.azure.com/52.230.23.170:10350, hostId=null,
hashCode=237f706): [com.datastax.oss.driver.api.core.DriverTimeoutException: [s0|control|id: 0xb89dacff,
L:/192.168.0.101:59158 - R:tinyurl-cassandra.cassandra.cosmos.azure.com/52.230.23.170:10350] Protocol
initialization request, step 1 (OPTIONS): timed out after 5000 ms]
I am new to Cassandra and have tried the following:
Validated that the credentials are okay.
Tried with cqlsh - could not connect either.
Checked that there's no firewall set up on my machine. I can telnet to the host and port.
I can open a Cassandra shell from Azure Data Explorer.
What am I missing? I am new to this. Any help will be appreciated.
Looks like you are using the 4.x version of the Java driver. The default load balancing policy in this driver requires that you provide the local data center, e.g.:
CqlSession.builder()
        .withSslContext(sc)
        .addContactPoint(new InetSocketAddress(cassandraHost, cassandraPort))
        .withLocalDatacenter("UK South")
        .withAuthCredentials(cassandraUsername, cassandraPassword)
        .build();
You could take a look at this getting started sample for further reference: https://github.com/Azure-Samples/azure-cosmos-db-cassandra-java-getting-started-v4

Hazelcast not injecting spring dependencies

I'm using Hazelcast 3.8.5 as the store for JCache.
It appears Hazelcast is not injecting SpringAware dependencies into the CacheLoader.
I took a peek at AbstractCacheRecordStore and it seems like only HazelcastInstanceAware dependencies are injected, not SpringAware + Autowired ones.
I'm setting up the cluster's managed context programmatically like this:
config.setManagedContext(springManagedContext);
Update
A workaround I've found is to put the ApplicationContext into the UserContext of Hazelcast, make the CacheLoader implement HazelcastInstanceAware, pull the context out of there, and finish autowiring the CacheLoader. Not ideal, but it works.
Created https://github.com/hazelcast/hazelcast/issues/11384
The only workaround is getting the Spring application context out of the Hazelcast user context.
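A minimal sketch of that workaround, assuming the application has previously placed the ApplicationContext into the instance's user context under a key such as "applicationContext" (the key and class names are illustrative, not from the original post):
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import org.springframework.context.ApplicationContext;
import javax.cache.integration.CacheLoader;

// Hypothetical CacheLoader base class that wires itself from the Spring context
// stored in Hazelcast's user context.
public abstract class SpringWiredCacheLoader<K, V> implements CacheLoader<K, V>, HazelcastInstanceAware {

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        // The application is assumed to have done:
        //   config.getUserContext().put("applicationContext", applicationContext);
        ApplicationContext context =
                (ApplicationContext) hazelcastInstance.getUserContext().get("applicationContext");
        // Finish autowiring this (deserialized) instance.
        context.getAutowireCapableBeanFactory().autowireBean(this);
    }
}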

How to leverage a spark cluster from a web app?

A lot of people have asked this question, but there is no clear answer except links and references, and most of them are not recent. The question is this:
I have a web app that needs to leverage a Spark cluster to run a Spark SQL query. My understanding is that the spark-submit script is asynchronous, hence this won't work here. How do I leverage Spark in such a setup? Can I just write code in the web app like I do in a self-contained Spark application, i.e. create a context, set the master URL, and do what I need to do? Will this work in a web app? If yes, then when would I need a job server that provides REST APIs to submit jobs?
Library for launching Spark applications.
This library allows applications to launch Spark programmatically. There's only one entry point to the library - the SparkLauncher class.
To launch a Spark application, just instantiate a SparkLauncher and configure the application to run. For example:
import org.apache.spark.launcher.SparkLauncher;

public class MyLauncher {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
                .setAppResource("/my/app.jar")
                .setMainClass("my.spark.app.Main")
                .setMaster("local")
                .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
                .launch();
        spark.waitFor();
    }
}
References:
https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/launcher/package-summary.html
I think the options are:
1) Through a REST API like Livy (Livy is a new open source Spark REST server for submitting and interacting with your Spark jobs from anywhere) or a Spark job server (REST APIs). See how they connect to Spark interactively using a kernel: https://www.youtube.com/watch?v=TD1J7MzYcFo&feature=youtu.be&t=33m19s and https://developer.ibm.com/open/apache-toree/ - a sketch of the Livy option follows below.
2) Through JDBC (running via the Thrift JDBC/ODBC server).
3) Through SSH: submit a job and wait for the YARN status (SSH to the cluster and do a spark-submit through YARN; YARN gives you an application ID and you can keep track of the application status with the yarn application status command).
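As an illustration of the Livy option, a minimal sketch that POSTs a batch job to Livy's /batches endpoint (the host, jar path, and class name are placeholders, and this assumes a reachable Livy server):
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class LivySubmitExample {
    public static void main(String[] args) throws Exception {
        // Livy batch submission: POST /batches with the application jar and main class.
        String payload = "{\"file\": \"hdfs:///jars/my-spark-app.jar\", \"className\": \"my.spark.app.Main\"}";

        HttpURLConnection conn = (HttpURLConnection) new URL("http://livy-host:8998/batches").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }

        // 201 Created means Livy accepted the batch; its id can then be polled via GET /batches/{id}.
        System.out.println("Livy response code: " + conn.getResponseCode());
    }
}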

Can Hazelcast connect as a client to existing Hazelcast cluster instead of joining as a member of the cluster to implement vertx clustering

We are currently using Vert.x with Hazelcast as its clustering implementation. For it to work as per the docs, Hazelcast is embedded inside our application, meaning it joins as a member of the cluster. We would like our application to be independent of Hazelcast. The reason is that whenever the Hazelcast cache becomes inconsistent, we bring down all our servers and restart them. Instead, we would like to keep Hazelcast on its own servers and connect Vert.x as a client, so we can restart Hazelcast independently of our application servers. The Zookeeper cluster implementation does exactly what we want, but we don't want to maintain another cluster just for this purpose because we are also using Hazelcast for other caching purposes internal to our application. Currently we are doing something like this to make Vert.x work:
Config hazelcastConfig = new Config();
//Group
GroupConfig groupConfig = new GroupConfig();
groupConfig.setName(hzGroupName);
groupConfig.setPassword(groupPassword);
hazelcastConfig.setGroupConfig(groupConfig);
//Properties
Properties properties = new Properties();
properties.setProperty("hazelcast.mancenter.enabled", "false");
properties.setProperty("hazelcast.memcache.enabled", "false");
properties.setProperty("hazelcast.rest.enabled", "false");
properties.setProperty("hazelcast.wait.seconds.before.join", "0");
properties.setProperty("hazelcast.logging.type", "jdk");
hazelcastConfig.setProperties(properties);
//Network
NetworkConfig networkConfig = new NetworkConfig();
networkConfig.setPort(networkPort);
networkConfig.setPortAutoIncrement(networkPortAutoincrement);
//Interfaces
InterfacesConfig interfacesConfig = new InterfacesConfig();
interfacesConfig.setEnabled(true);
interfacesConfig.setInterfaces(interfaces);
networkConfig.setInterfaces(interfacesConfig);
//Join
JoinConfig joinConfig = new JoinConfig();
MulticastConfig multicastConfig = new MulticastConfig();
multicastConfig.setEnabled(false);
joinConfig.setMulticastConfig(multicastConfig);
TcpIpConfig tcpIpConfig = new TcpIpConfig();
tcpIpConfig.setEnabled(true);
List<String> members = Arrays.asList(hzNetworkMembers.split(","));
tcpIpConfig.setMembers(members);
joinConfig.setTcpIpConfig(tcpIpConfig);
networkConfig.setJoin(joinConfig);
//Finish Network
hazelcastConfig.setNetworkConfig(networkConfig);
clusterManager = new HazelcastClusterManager(hazelcastConfig);
VertxOptions options = new VertxOptions().setClusterManager(clusterManager);
options.setClusterHost(interfaces.get(0));
options.setMaxWorkerExecuteTime(VertxOptions.DEFAULT_MAX_WORKER_EXECUTE_TIME * workerVerticleMaxExecutionTime);
options.setBlockedThreadCheckInterval(1000 * 60 * 60);
Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        vertx = res.result();
    } else {
        throw new RuntimeException("Unable to launch Vert.x");
    }
});
********* Alternate Solution **********
We actually changed our distributed caching implementation from Hazelcast to Redis (Amazon ElastiCache).
We couldn't rely on Hazelcast for three reasons:
1) its inconsistency during server restarts; 2) we were using embedded Hazelcast and ended up restarting our app whenever Hazelcast data was inconsistent, and we want our app to be independent of other services; 3) memory allocation (for Hazelcast data) is now independent of the application server.
Vert.x 3.2.0 now supports handing it a preconfigured Hazelcast instance with which to build a cluster. Therefore you have complete control over the Hazelcast configuration, including how and where you want data stored. But you also need a bug fix from the Vert.x 3.2.1 release to really use this.
See updated documentation at https://github.com/vert-x3/vertx-hazelcast/blob/master/src/main/asciidoc/index.adoc#using-an-existing-hazelcast-cluster
Note: when you create your own cluster, you need the extra Hazelcast settings required by Vert.x, and those are noted in the documentation above.
Vert.x 3.2.1 release fixes an issue that blocks the use of client connections. Be aware that if you do distributed locks with Hazelcast clients, the default timeout is 60 seconds for the lock to go away if the network connection is stalled in a way that isn't obvious to the server nodes (all other JVM exits should immediately clear a lock).
You can lower this amount using:
// This is checked every 10 seconds, so any value < 10 will be treated the same
System.setProperty("hazelcast.client.max.no.heartbeat.seconds", "9");
Also be aware that with Hazelcast clients you may want to use near caching for some maps and look at other advanced configuration options for performance tuning; a client will behave differently than a full data node.
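For example, a minimal sketch of enabling a near cache on a client for one map (the map name is illustrative, and which options are worth tuning depends on your workload):
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.NearCacheConfig;

// Hypothetical client config: keep entries of the "sessions" map in a local near cache on the client.
ClientConfig clientConfig = new ClientConfig();
NearCacheConfig nearCache = new NearCacheConfig("sessions")
        .setInMemoryFormat(InMemoryFormat.OBJECT)
        .setInvalidateOnChange(true); // drop local copies when members report changes
clientConfig.addNearCacheConfig(nearCache);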
Since version 3.2.1 you can run full Hazelcast nodes configured with the map settings required by Vert.x, and then create custom Hazelcast clients when starting Vert.x (taken from a new unit test case):
ClientConfig clientConfig = new ClientConfig().setGroupConfig(new GroupConfig("dev", "dev-pass"));
HazelcastInstance clientNode1 = HazelcastClient.newHazelcastClient(clientConfig);
HazelcastClusterManager mgr1 = new HazelcastClusterManager(clientNode1);
VertxOptions options1 = new VertxOptions().setClusterManager(mgr1).setClustered(true).setClusterHost("127.0.0.1");
Vertx.clusteredVertx(options1, ...)
Obviously your client configuration and needs will differ. Consult the Hazelcast documentation for Client configuration: http://docs.hazelcast.org/docs/3.5/manual/html-single/index.html

Transaction management and Multithreading in Hibernate 4

I have a requirement to execute a parent task which may or may not have child tasks. Each parent and child task should run in a thread. If something goes wrong in the parent or child execution, the transaction of both the parent and the child task must be rolled back. I am using Hibernate 4.
If I got it right, the parent and the child task will run in different threads.
In my opinion, this is a very bad idea that is not worth considering.
While it may be possible using JTA transactions, it's clearly not the case using Hibernate's transaction management delegation to the underlying JDBC connection (you have one connection per session and MUST NOT share a Hibernate session between threads).
Using JTA, you will have to handle connection retrieval and transactions yourself, and so can't take advantage of connection pooling and container-managed transactions (Spring or Java EE ones). It may be overcomplicated for roughly no performance improvement, as sharing the database connection between two threads will probably just move the bottleneck one level down.
See how to share one transaction between multi threads
According to the OP's expectation, here is pseudo-code for Hibernate 4 standalone session management with a JDBC transaction (I personally advise going with a container (Java EE or Spring) and JTA container-managed transactions).
In hibernate.cfg.xml:
<property name="hibernate.current_session_context_class">thread</property>
SessionFactory:
Configuration configuration = new Configuration();
configuration.configure("hibernate.cfg.xml");
StandardServiceRegistryBuilder builder = new StandardServiceRegistryBuilder().applySettings(configuration.getProperties());
SessionFactory sessionFactory = configuration.buildSessionFactory(builder.build());
The session factory should be exposed as a singleton (whichever way you choose, you must have only one instance for the whole app).
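For illustration, a minimal sketch of such a singleton holder (the class name is illustrative, not from the original answer):
import org.hibernate.SessionFactory;
import org.hibernate.boot.registry.StandardServiceRegistryBuilder;
import org.hibernate.cfg.Configuration;

// Hypothetical singleton holder; the whole application retrieves the single SessionFactory from here.
public final class HibernateUtil {

    private static final SessionFactory SESSION_FACTORY = buildSessionFactory();

    private HibernateUtil() {
    }

    private static SessionFactory buildSessionFactory() {
        Configuration configuration = new Configuration();
        configuration.configure("hibernate.cfg.xml");
        StandardServiceRegistryBuilder builder =
                new StandardServiceRegistryBuilder().applySettings(configuration.getProperties());
        return configuration.buildSessionFactory(builder.build());
    }

    public static SessionFactory getSessionFactory() {
        return SESSION_FACTORY;
    }
}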
public void executeParentTask() {
    try {
        sessionFactory.getCurrentSession().beginTransaction();
        sessionFactory.getCurrentSession().persist(someEntity);
        myChildTask.execute();
        sessionFactory.getCurrentSession().getTransaction().commit();
    } catch (RuntimeException e) {
        sessionFactory.getCurrentSession().getTransaction().rollback();
        throw e; // or display error message
    }
}
getCurrentSession() will return the session bound to the current thread. If you manage the thread execution yourself, you should create the session at the beginning of the thread's execution and close it at the end.
The child task will retrieve the same session as the parent one using sessionFactory.getCurrentSession().
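A minimal sketch of what that child task could look like (the class name and entity are illustrative, not from the original answer):
import org.hibernate.SessionFactory;

// Hypothetical child task: it reuses the session bound to the current thread,
// so it participates in the transaction opened by executeParentTask().
public class MyChildTask {

    private final SessionFactory sessionFactory;
    private final Object someChildEntity;

    public MyChildTask(SessionFactory sessionFactory, Object someChildEntity) {
        this.sessionFactory = sessionFactory;
        this.someChildEntity = someChildEntity;
    }

    public void execute() {
        // Same thread-bound session as the parent; no begin/commit/rollback here,
        // because the parent task owns the transaction demarcation.
        sessionFactory.getCurrentSession().persist(someChildEntity);
    }
}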
See https://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch03.html#configuration-sessionfactory
http://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html_single/#transactions-demarcation-nonmanaged
You may find this interesting too: How to configure and get session in Hibernate 4.3.4.Final?
