Multiple Hazelcast instances or one instance - hazelcast

I am new to Hazelcast. I was able to set up the Hazelcast server and start it.
My web application is a monolithic application, and I need to introduce a distributed caching mechanism. It will receive a fairly large number of hits, so my question is: if I write code like the example below, is that a good approach, given that it creates many instances? Or is that the expected behaviour? Sorry if this is a dumb question.
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
public class HazlecastMain {
    public static void main(String[] args) {
        // Client configuration (note: this object is never passed to anything below)
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.setClusterName("dev");
        // addAddress expects "host:port" (for example "127.0.0.1:5701"), not an HTTP URL
        clientConfig.getNetworkConfig().addAddress("http://localhost:8080");
        // each newHazelcastInstance() call starts a new embedded cluster member
        HazelcastInstance newHazelcastInstance = Hazelcast.newHazelcastInstance();
        IMap<Object, Object> map = newHazelcastInstance.getMap("customers");
        map.put("1", "AA");
        map.put("2", "BB");
        map.put("3", "CC");
        System.out.println(map.get("1"));
        // a second member joins the same cluster, so it sees the same "customers" map
        HazelcastInstance newHazelcastInstance2 = Hazelcast.newHazelcastInstance();
        IMap<Object, Object> map2 = newHazelcastInstance2.getMap("customers");
        System.out.println(map2.get("2"));
    }
}

You don't need to create a new Hazelcast instance each time you want to access a map, and the same goes for the getMap invocation: both are expensive remote operations. Get a handle to your map with getMap once and keep using that same reference for all map operations. Likewise, avoid creating multiple new instances for Hazelcast access; reuse the one you have already created (newHazelcastInstance).
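To make the "create once, reuse" advice concrete, here is a minimal, library-free sketch of a lazily initialized holder that creates an expensive object exactly once and hands out the same reference afterwards. The class name SharedClient is hypothetical. In a Hazelcast client application the factory would be something like `() -> HazelcastClient.newHazelcastClient(clientConfig)`, and you would likewise call getMap("customers") once and keep the returned IMap. A plain ConcurrentHashMap stands in for the map in the usage below so the sketch runs without Hazelcast.

```java
import java.util.function.Supplier;

/**
 * Lazily creates one expensive object (hypothetically a Hazelcast client
 * or its IMap) and returns the same reference on every later call.
 */
final class SharedClient<T> {
    private final Supplier<T> factory;
    // volatile so a fully constructed instance is visible to all threads
    private volatile T instance;

    SharedClient(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the single instance, creating it on first use (double-checked locking). */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

In a web application, one such holder (or a single container-managed bean) per JVM is enough; every request handler then reuses the same instance instead of starting a new one.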

Related

Injecting custom datastax.session into Spring-Data Cassandra

Is it possible to use a custom DataStax session with Spring Data?
Hi, I know Spring Data for Cassandra uses a DataStax session internally. However, I have a custom DataStax session object (provided by another service) that I would like Spring Data to use instead of the pre-wired one. Assuming both DataStax sessions are the same version, is this possible?
Yes, it's possible.
Depending on your setup, there are a couple of approaches. Let me explain the two most common scenarios:
Direct usage of Template API
Session yourSession = …;
CqlTemplate cqlTemplate = new CqlTemplate(yourSession);
CassandraTemplate cassandraTemplate = new CassandraTemplate(yourSession);
Exposing the Session as a @Bean
This one might require a bit more setup, as the configuration support expects you to use CassandraSessionFactoryBean and CassandraClusterFactoryBean.
Take a look at AbstractCassandraConfiguration to see which supporting beans (CassandraConverter, CassandraMappingContext) it sets up for Spring Data's Cassandra support.
@Configuration
class MyCassandraConfig {
    private final Session mySession;
    public MyCassandraConfig(Session mySession) {
        this.mySession = mySession;
    }
    @Bean
    public CassandraConverter cassandraConverter() {
        MappingCassandraConverter mappingCassandraConverter = new MappingCassandraConverter(cassandraMapping());
        mappingCassandraConverter.setCustomConversions(customConversions());
        return mappingCassandraConverter;
    }
    @Bean
    public CassandraMappingContext cassandraMapping() {
        Cluster cluster = mySession.getCluster();
        String keyspace = mySession.getLoggedKeyspace();
        CassandraMappingContext mappingContext = new CassandraMappingContext(
                new SimpleUserTypeResolver(cluster, keyspace), new SimpleTupleTypeFactory(cluster));
        CustomConversions customConversions = customConversions();
        mappingContext.setCustomConversions(customConversions);
        mappingContext.setSimpleTypeHolder(customConversions.getSimpleTypeHolder());
        return mappingContext;
    }
    @Bean
    public CustomConversions customConversions() {
        return new CassandraCustomConversions(Collections.emptyList());
    }
    @Bean
    public CassandraTemplate cassandraTemplate() {
        return new CassandraTemplate(mySession, cassandraConverter());
    }
}

How to load balance leader using zookeeper and spring integration

Using Spring Integration and ZooKeeper, one can elect a leader to perform activities such as polling.
However, how do we distribute the leader responsibility across all nodes in the cluster to balance the load?
With the code below, once the application starts, I see the same node keeping the leader role and fetching events. I want to distribute this activity across every node in the cluster to balance the load better.
Is there any way to schedule each node in the cluster to gain and then give up leadership in a round-robin manner?
@Bean
public LeaderInitiatorFactoryBean fooLeaderInitiator(CuratorFramework client) {
    new LeaderInitiatorFactoryBean()
            .setClient(client)
            .setPath("/foofeed")
            .setRole("foo");
}
@Bean
@InboundChannelAdapter(channel = "fooIncomingEvents", autoStartup = "false", poller = @Poller(fixedDelay = "5000"))
@Role("foo")
public FooTriggerMessageSource fooInboundChannelAdapter() {
    new FooMessageSource("foo")
}
I was able to simulate load balancing with the code below, though I'm not sure it is the correct approach. I see the "fetching events" log statement from only one node at a time in the cluster. This code yields leadership after gaining it and performing its job.
@Bean
public LeaderInitiator fooLeaderInitiator(CuratorFramework client,
        FooPollingCandidate fooPollingCandidate) {
    LeaderInitiator leader = new LeaderInitiator(client, fooPollingCandidate, zooKeeperNamespace)
    leader.start()
    leader
}
@Component
class FooPollingCandidate extends DefaultCandidate {
    final Logger log = LoggerFactory.getLogger(this.getClass());
    FooPollingCandidate() {
        super("fooPoller", "foo")
    }
    @Override
    void onGranted(Context ctx) {
        log.debug("Leadership granted {}", ctx)
        pullEvents()
        ctx.yield();
    }
    @Override
    void onRevoked(Context ctx) {
        log.debug("Leadership revoked")
    }
    @Override
    void yieldLeadership() {
        log.debug("yielding Leadership")
    }
    //pull events and drop them on any channel needed
    void pullEvents() {
        log.debug("fetching events")
        //simulate delay
        sleep(5000)
    }
}
What you are suggesting is an abuse of leader election, which is intended for warm failover when the current leader fails; manually yielding leadership after each event is an anti-pattern.
What you probably want is competing pollers: all pollers are active, but they use a shared store to prevent duplicate processing.
For example, if you are polling a shared directory for files to process, you would use a FileSystemPersistentAcceptOnceFileListFilter with a shared MetadataStore (such as the ZooKeeper implementation) to prevent multiple instances from processing the same file.
You can use the same technique (shared metadata store) for any polled message source.
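The competing-pollers idea can be sketched without Spring: every poller stays active, and an atomic "claim" in a shared store decides which one processes each item. In Spring Integration the shared store would be a ConcurrentMetadataStore such as the ZooKeeper-backed one, whose putIfAbsent plays the role of claim below. Here an in-memory ConcurrentHashMap stands in, so this sketch only coordinates pollers inside a single JVM; SharedStore and CompetingPoller are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Stand-in for a shared metadata store; a real one would live in ZooKeeper, Redis, etc. */
final class SharedStore {
    private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<>();

    /** Atomically records the owner; returns true only for the first caller per key. */
    boolean claim(String key, String owner) {
        return entries.putIfAbsent(key, owner) == null;
    }
}

/** One of several always-active pollers competing for the same items. */
final class CompetingPoller {
    private final String name;
    private final SharedStore store;
    private final List<String> processed = new ArrayList<>();

    CompetingPoller(String name, SharedStore store) {
        this.name = name;
        this.store = store;
    }

    /** Looks at everything currently available and processes only the items it wins. */
    void poll(List<String> availableItems) {
        for (String item : availableItems) {
            if (store.claim(item, name)) {
                processed.add(item); // real processing would happen here
            }
        }
    }

    List<String> processed() {
        return Collections.unmodifiableList(processed);
    }
}
```

No node ever has to give up a role: whichever poller reaches an item first claims it, and the others simply skip it on their next poll.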

How to write JavaRDD to marklogic database

I am evaluating Spark with a MarkLogic database. I have read a CSV file, and now I have a JavaRDD object that I need to dump into the MarkLogic database.
SparkConf conf = new SparkConf().setAppName("org.sparkexample.Dataload").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> data = sc.textFile("/root/ml/workArea/data.csv");
SQLContext sqlContext = new SQLContext(sc);
JavaRDD<Record> rdd_records = data.map(
        new Function<String, Record>() {
            public Record call(String line) throws Exception {
                String[] fields = line.split(",");
                Record sd = new Record(fields[0], fields[1], fields[2], fields[3], fields[4]);
                return sd;
            }
        });
I want to write this JavaRDD object to the MarkLogic database.
Is there any Spark API available for faster writing to the MarkLogic database?
Let's say we cannot write a JavaRDD directly to MarkLogic; what is the correct approach to achieve this?
Here is the code I am using to write the JavaRDD data to the MarkLogic database; let me know if this is the wrong way to do it.
final DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8070, "MLTest");
final XMLDocumentManager docMgr = client.newXMLDocumentManager();
rdd_records.foreachPartition(new VoidFunction<Iterator<Record>>() {
    public void call(Iterator<Record> partitionOfRecords) {
        while (partitionOfRecords.hasNext()) {
            Record record = partitionOfRecords.next();
            System.out.println("partitionOfRecords - " + record.toString());
            String docId = "/example/" + record.getID() + ".xml";
            JAXBContext context = JAXBContext.newInstance(Record.class);
            JAXBHandle<Record> handle = new JAXBHandle<Record>(context);
            handle.set(record);
            docMgr.writeAs(docId, handle);
        }
    }
});
client.release();
I have used the Java Client API to write the data, but I am getting the exception below, even though the POJO class Record implements the Serializable interface. Please let me know what could be the reason and how to solve it.
org.apache.spark.SparkException: Task not serializable
The easiest way to get data into MarkLogic is via HTTP and the client REST API - specifically the /v1/documents endpoints - http://docs.marklogic.com/REST/client/management .
There are a variety of ways to optimize this, such as via a write set, but based on your question, I think the first thing to decide is: what kind of document do you want to write for each Record? Your example shows 5 columns in the CSV; typically, you'll write either a JSON or XML document with 5 fields/elements, each named based on the column index. So you'd need to write a little code to generate that JSON/XML, and then use whatever HTTP client you prefer (one option is the MarkLogic Java Client API) to write that document to MarkLogic.
That addresses your question of how to write a JavaRDD to MarkLogic - but if your goal is to get data from a CSV into MarkLogic as fast as possible, then skip Spark and use mlcp - https://docs.marklogic.com/guide/mlcp/import#id_70366 - which involves zero coding.
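As a minimal sketch of the "write a little code to generate that JSON" step, assuming plain comma-separated lines without quoted commas, each CSV line can become a flat JSON document with one field per column. The field names column0, column1, ... and the class name CsvToJson are a hypothetical naming scheme; you would then write the resulting string to MarkLogic with whichever client you prefer.

```java
import java.util.StringJoiner;

/** Builds a flat JSON document from one CSV line, naming each field after its column index. */
final class CsvToJson {
    private CsvToJson() {}

    static String toJson(String csvLine) {
        // naive split: assumes no quoted fields containing commas; -1 keeps trailing empty columns
        String[] fields = csvLine.split(",", -1);
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (int i = 0; i < fields.length; i++) {
            json.add("\"column" + i + "\":\"" + escape(fields[i]) + "\"");
        }
        return json.toString();
    }

    // escapes quotes, backslashes and control characters for a JSON string value
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

For example, the line `1,Alice,NY` becomes `{"column0":"1","column1":"Alice","column2":"NY"}`. A real CSV parser would be needed if fields can contain quoted commas.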
Here is a modified example from the Spark Streaming guide. You will have to implement the connection and writing logic specific to your database.
public void send(JavaRDD<String> rdd) {
    rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
        @Override
        public void call(Iterator<String> partitionOfRecords) {
            // ConnectionPool is a static, lazily initialized pool of connections
            Connection connection = ConnectionPool.getConnection();
            while (partitionOfRecords.hasNext()) {
                connection.send(partitionOfRecords.next());
            }
            ConnectionPool.returnConnection(connection); // return to the pool for future reuse
        }
    });
}
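The example assumes a ConnectionPool class that you provide yourself. One possible shape, sketched with the JDK only, is a static pool that opens connections lazily and keeps returned ones for reuse. Because it is static, it is never serialized with the closure, and each executor JVM gets its own pool. Connection and its send method are hypothetical stand-ins for your database client.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical connection; a real one would wrap your database client. */
final class Connection {
    private static final AtomicInteger OPENED = new AtomicInteger();

    Connection() {
        OPENED.incrementAndGet(); // counts how many connections were actually opened
    }

    void send(String record) {
        // write the record to the database here
    }

    static int openedCount() {
        return OPENED.get();
    }
}

/** Static, lazily filled pool: one per executor JVM, never serialized with the closure. */
final class ConnectionPool {
    private static final ConcurrentLinkedQueue<Connection> IDLE = new ConcurrentLinkedQueue<>();

    private ConnectionPool() {}

    /** Reuses an idle connection if one exists, otherwise opens a new one. */
    static Connection getConnection() {
        Connection idle = IDLE.poll();
        return idle != null ? idle : new Connection();
    }

    /** Hands a connection back so later partitions on this executor can reuse it. */
    static void returnConnection(Connection connection) {
        IDLE.offer(connection);
    }
}
```

A production pool would also cap its size, validate connections before reuse, and close them when the executor shuts down.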
I suspect you just need to make sure that everything you access inside your VoidFunction that was instantiated outside it is serializable (see this page). DatabaseClient and XMLDocumentManager are of course not serializable, as they're connected resources. You're right, however, not to instantiate DatabaseClient inside your VoidFunction, as that would be less efficient (though it would work). I don't know whether the following idea works with Spark, but I'm guessing you could create a class that holds a singleton DatabaseClient instance:
public static class MLClient {
    private static DatabaseClient singleton;
    private MLClient() {}
    // synchronized so that concurrent tasks in the same executor JVM
    // cannot race and create two clients
    public static synchronized DatabaseClient get(DatabaseClientFactory.Bean connectionInfo) {
        if ( connectionInfo == null ) {
            throw new IllegalArgumentException("connectionInfo cannot be null");
        }
        if ( singleton == null ) {
            singleton = connectionInfo.newClient();
        }
        return singleton;
    }
}
Then you just create a serializable DatabaseClientFactory.Bean outside your VoidFunction, so your auth info is still centralized:
DatabaseClientFactory.Bean connectionInfo =
new DatabaseClientFactory.Bean();
connectionInfo.setHost("localhost");
connectionInfo.setPort(8000);
connectionInfo.setUser("admin");
connectionInfo.setPassword("admin");
connectionInfo.setAuthenticationValue("digest");
Then, inside your VoidFunction, you could get that singleton DatabaseClient and create a new XMLDocumentManager like so:
DatabaseClient client = MLClient.get(connectionInfo);
XMLDocumentManager docMgr = client.newXMLDocumentManager();
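To see why the "serializable settings outside, live client inside" split avoids "Task not serializable", here is a JDK-only sketch that mimics what Spark does when it ships a closure to an executor: serialize it, then deserialize it. ClientSettings is serializable and keeps its live client in a transient field that is rebuilt lazily after deserialization, while BadHolder captures a non-serializable client directly and fails, as in the question. FakeClient, ClientSettings, BadHolder and SerializationCheck are hypothetical stand-ins for DatabaseClient and DatabaseClientFactory.Bean.

```java
import java.io.*;

/** Hypothetical connected resource; like DatabaseClient, it is not Serializable. */
final class FakeClient {
    final String address;
    FakeClient(String address) { this.address = address; }
}

/** Serializable connection settings; the live client is transient and rebuilt on demand. */
final class ClientSettings implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String host;
    private final int port;
    private transient FakeClient client; // skipped by serialization, null after shipping

    ClientSettings(String host, int port) {
        this.host = host;
        this.port = port;
    }

    synchronized FakeClient client() {
        if (client == null) {
            client = new FakeClient(host + ":" + port);
        }
        return client;
    }
}

/** Captures the client itself, which is what triggers "Task not serializable". */
final class BadHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    final FakeClient client = new FakeClient("localhost:8000");
}

final class SerializationCheck {
    private SerializationCheck() {}

    /** Serializes and deserializes obj, roughly as Spark does with a task closure. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("not serializable: " + e, e);
        }
    }
}
```

A transient field like this is rebuilt once per deserialized copy, which in Spark typically means once per task; the static MLClient holder goes further and shares one client per executor JVM.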

How to pass a Cassandra cluster connection from one bolt to another bolt

My Storm topology reads data from Kafka and writes it into Cassandra tables.
In Storm, I am creating the Cassandra cluster connection and session in the prepare method:
cassandraCluster = Cluster.builder().withoutJMXReporting().withoutMetrics()
        .addContactPoints(nodes)
        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
        .withReconnectionPolicy(new ExponentialReconnectionPolicy(100L,
                TimeUnit.MINUTES.toMillis(5)))
        .withLoadBalancingPolicy(
                new TokenAwarePolicy(new RoundRobinPolicy()))
        .build();
session = cassandraCluster.connect(keyspace);
In the execute method, I can process the tuple and save it to a Cassandra table.
Suppose I want to write the data from a single tuple into multiple tables.
Writing a separate bolt for each table seems like a good choice, but then I have to create a cluster connection and session for each table in each bolt.
However, according to this link, a single connection per cluster is a good idea for performance:
http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
Does anyone have an idea how to create the cluster connection in one bolt and use that connection in other bolts?
It depends on how Storm allocates the bolts and spouts to the workers. You can't assume that you can share connections between bolts, because they might be running in different workers (read: JVMs) or on different nodes entirely.
See my answer here: Mongo connection pooling for Storm topology
Might look something like this pseudocode:
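import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
import com.datastax.driver.core.policies.ExponentialReconnectionPolicy;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;
public class CassandraBolt extends BaseRichBolt {
    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(CassandraBolt.class);
    // plain strings, so they serialize with the bolt when the topology is submitted
    private final String[] nodes;
    private final String keyspace;
    private OutputCollector _collector;
    // Cluster and Session are not serializable, so they have to be transient
    // and created in prepare(), which runs inside the worker JVM
    private transient Cluster cassandraCluster;
    private transient Session _session;
    public CassandraBolt(String[] nodes, String keyspace) {
        this.nodes = nodes;
        this.keyspace = keyspace;
    }
    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        // nodes and keyspace could also be read from stormConf instead of the constructor
        cassandraCluster = Cluster.builder().withoutJMXReporting().withoutMetrics()
                .addContactPoints(nodes)
                .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .withReconnectionPolicy(new ExponentialReconnectionPolicy(100L,
                        TimeUnit.MINUTES.toMillis(5)))
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();
        _session = cassandraCluster.connect(keyspace);
    }
    @Override
    public void execute(Tuple input) {
        try {
            // use _session to talk to cassandra
            _collector.ack(input); // BaseRichBolt does not ack automatically
        } catch (Exception e) {
            LOG.error("CassandraBolt error", e);
            _collector.reportError(e);
            _collector.fail(input);
        }
    }
    @Override
    public void cleanup() {
        if (cassandraCluster != null) {
            cassandraCluster.close(); // also closes sessions created from it
        }
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt does not emit any tuples
    }
}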
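If several bolt instances end up in the same worker JVM, they can share one Cluster through a static holder, which is about as close to "one connection per cluster" as Storm allows; bolts in other workers still need their own. Below is a minimal JDK-only sketch of such a holder with reference counting. SharedPerWorker is a hypothetical name: in a bolt you might keep it in a static final field whose factory builds the Cluster as in prepare() above (the 3.x driver's Cluster implements Closeable), call acquire() in prepare() and release() in cleanup(). Storm does not guarantee that cleanup() runs when a worker is killed, so the release step is best effort.

```java
import java.util.function.Supplier;

/**
 * One shared resource per JVM (hypothetically a Cassandra Cluster) for all
 * bolt instances in the same worker. Workers are separate JVMs, so each
 * worker still ends up with its own copy.
 */
final class SharedPerWorker<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private T resource; // guarded by this
    private int users;  // number of bolts currently holding the resource

    SharedPerWorker(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Call from prepare(): creates the resource on first use, then shares it. */
    synchronized T acquire() {
        if (resource == null) {
            resource = factory.get();
        }
        users++;
        return resource;
    }

    /** Call from cleanup(): closes the resource once the last user releases it. */
    synchronized void release() {
        if (users == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        users--;
        if (users == 0) {
            T toClose = resource;
            resource = null;
            try {
                toClose.close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close shared resource", e);
            }
        }
    }
}
```

Each bolt then prepares its own statements against the shared Session, so writing one tuple to several tables does not multiply the number of cluster connections within a worker.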

Global Static Dictionary Initialization from Database in Webapi

I want to initialize a global Dictionary from the database in my Web API. Do I need to inject my DbContext in Global.asax or in the OWIN Startup? Any example would be much appreciated.
Any kind of initialization can be done in your custom OWIN Startup class, like this:
using System;
using System.Collections.Generic;
using Microsoft.Owin;
using Microsoft.Owin.Security.OAuth;
using Owin;
[assembly: OwinStartup(typeof(YourNamespace.Startup))]
namespace YourNamespace
{
    public class Startup
    {
        public Dictionary<string, string> Table { get; private set; }
        public void Configuration(IAppBuilder app)
        {
            // token generation
            app.UseOAuthAuthorizationServer(new OAuthAuthorizationServerOptions
            {
                AllowInsecureHttp = false,
                TokenEndpointPath = new PathString("/token"),
                AccessTokenExpireTimeSpan = TimeSpan.FromHours(8),
                Provider = new SimpleAuthorizationServerProvider()
            });
            // token consumption
            app.UseOAuthBearerAuthentication(new OAuthBearerAuthenticationOptions());
            app.UseWebApi(WebApiConfig.Register());
            Table = ... Connect from DB and fill your table logic ...
        }
    }
}
After that you can use your Startup.Table property from your application.
In general, it is bad practice to access objects through static fields in ASP.NET applications, because this can lead to bugs that are hard to detect and reproduce. This is especially true for mutable, non-thread-safe objects like Dictionary.
I assume you want to cache some DB data in memory to avoid excessive SQL queries. It is a good idea to use the standard ASP.NET cache for this purpose:
using System;
using System.Collections;
using System.Web;
using System.Web.Caching;
public IDictionary GetDict()
{
    const string cacheKey = "uniqueCacheKey";
    var dict = HttpRuntime.Cache.Get(cacheKey) as IDictionary;
    if (dict == null)
    {
        dict = doLoadDictionaryFromDB(); // your code that loads data from DB
        HttpRuntime.Cache.Add(cacheKey, dict,
            null, Cache.NoAbsoluteExpiration,
            new TimeSpan(0, 5, 0), // sliding expiration: evicted after 5 minutes without access
            CacheItemPriority.Normal, null);
    }
    return dict;
}
This approach lets you choose an appropriate expiration policy without reinventing the wheel with a static dictionary.
If you still want to use a static dictionary, you can populate it at application start (Global.asax):
void Application_Start(object sender, EventArgs e)
{
// your code that initializes dictionary with data from DB
}
