How to load balance the leader role using ZooKeeper and Spring Integration

Using Spring Integration and ZooKeeper, one can elect a leader to perform activities such as polling. However, how do we distribute the leader responsibility across all nodes in the cluster to balance the load?
Given the code below, once the application starts I see that the same node keeps the leader role and fetches the events. I want to distribute this activity to every node in the cluster for better load balancing.
Is there any way I can schedule each node in the cluster to gain and then yield leadership in a round-robin manner?
@Bean
public LeaderInitiatorFactoryBean fooLeaderInitiator(CuratorFramework client) {
    LeaderInitiatorFactoryBean initiator = new LeaderInitiatorFactoryBean();
    initiator.setClient(client);
    initiator.setPath("/foofeed");
    initiator.setRole("foo");
    return initiator;
}
@Bean
@InboundChannelAdapter(channel = "fooIncomingEvents", autoStartup = "false",
        poller = @Poller(fixedDelay = "5000"))
@Role("foo")
public FooTriggerMessageSource fooInboundChannelAdapter() {
    return new FooTriggerMessageSource("foo");
}

I could simulate load balancing with the code below, though I am not sure this is the correct approach. With it, I can see the "fetching events" log statement from only one node at a time in the cluster. This code yields leadership after gaining it and performing its job.
@Bean
public LeaderInitiator fooLeaderInitiator(CuratorFramework client,
        FooPollingCandidate fooPollingCandidate) {
    LeaderInitiator leader = new LeaderInitiator(client, fooPollingCandidate, zooKeeperNamespace);
    leader.start();
    return leader;
}
@Component
public class FooPollingCandidate extends DefaultCandidate {

    private final Logger log = LoggerFactory.getLogger(getClass());

    FooPollingCandidate() {
        super("fooPoller", "foo");
    }

    @Override
    public void onGranted(Context ctx) {
        log.debug("Leadership granted {}", ctx);
        pullEvents();
        ctx.yield();
    }

    @Override
    public void onRevoked(Context ctx) {
        log.debug("Leadership revoked");
    }

    @Override
    public void yieldLeadership() {
        log.debug("yielding leadership");
    }

    // pull events and drop them on any channel needed
    void pullEvents() {
        log.debug("fetching events");
        try {
            Thread.sleep(5000); // simulate delay
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

What you are suggesting is an abuse of the leader election technology, which is intended for warm failover when the current leader fails; manually yielding leadership after each event is an anti-pattern.
What you probably want is competing pollers, where all pollers are active but use a shared store to prevent duplicate processing.
For example, if you are polling a shared directory for files to process, you would use a FileSystemPersistentAcceptOnceFileListFilter with a shared MetadataStore (such as the ZooKeeper implementation) to prevent multiple instances from processing the same file.
You can use the same technique (a shared metadata store) for any polled message source.
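A minimal sketch of that competing-pollers arrangement for a shared directory feed, assuming the spring-integration-file and spring-integration-zookeeper modules (the bean names, channel name, directory, and key prefix here are illustrative, not from the original answer):
import java.io.File;

import org.apache.curator.framework.CuratorFramework;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.file.FileReadingMessageSource;
import org.springframework.integration.file.filters.FileSystemPersistentAcceptOnceFileListFilter;
import org.springframework.integration.zookeeper.metadata.ZookeeperMetadataStore;

@Configuration
public class CompetingPollersConfig {

    @Bean
    public ZookeeperMetadataStore fooMetadataStore(CuratorFramework client) {
        return new ZookeeperMetadataStore(client);
    }

    @Bean
    public FileSystemPersistentAcceptOnceFileListFilter fooFilter(ZookeeperMetadataStore fooMetadataStore) {
        // "foo-" is an arbitrary key prefix under which processed file names are stored
        return new FileSystemPersistentAcceptOnceFileListFilter(fooMetadataStore, "foo-");
    }

    @Bean
    @InboundChannelAdapter(channel = "fooIncomingEvents", poller = @Poller(fixedDelay = "5000"))
    public MessageSource<File> fooFileSource(FileSystemPersistentAcceptOnceFileListFilter fooFilter) {
        FileReadingMessageSource source = new FileReadingMessageSource();
        source.setDirectory(new File("/shared/foofeed"));
        source.setFilter(fooFilter);
        return source;
    }
}
Every node polls actively and there is no leader at all; the ZooKeeper-backed store guarantees each file is accepted by only one instance, so the work spreads across whichever nodes poll first.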

Related

Spring Data Reactive Cassandra: limit concurrent threads

I want to limit the number of concurrent threads when some operation is being performed in Cassandra. It can be limited with direct use of cassandraTemplate like this:
public void insert(List<MyEntity> entities, int maxConcurrentThreadsAllowed) {
    Flux.fromIterable(entities)
        .flatMap(this::insert, maxConcurrentThreadsAllowed)
        .subscribe();
}

public Mono<MyEntity> insert(MyEntity e) {
    return cassandraTemplate.insert(e);
}
Is it possible to achieve the same using reactive repositories? Perhaps some kind of configuration at the cassandraSession level?

How to read messages from MQs such as ZeroMQ or RabbitMQ using Spark Streaming?

As the Spark docs say, Spark supports Kafka as a data streaming source, but I use ZeroMQ and there is no ZeroMQUtils. How can I use it, and, more generally, what about other MQs? I am totally new to Spark and Spark Streaming, so I am sorry if the question is stupid. Could anyone give me a solution? Thanks.
BTW, I use Python.
Update: I finally did it in Java with a custom receiver. Below is my solution:
public class ZeroMQReceiver<T> extends Receiver<T> {

    private static final ObjectMapper mapper = new ObjectMapper();

    public ZeroMQReceiver() {
        super(StorageLevel.MEMORY_AND_DISK_2());
    }

    @Override
    public void onStart() {
        // Start the thread that receives data over a connection
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() {
        // There is nothing much to do, as the thread calling receive()
        // is designed to stop by itself once isStopped() returns true
    }

    /** Create a socket connection and receive data until the receiver is stopped */
    private void receive() {
        String message = null;
        try {
            ZMQ.Context context = ZMQ.context(1);
            ZMQ.Socket subscriber = context.socket(ZMQ.SUB);
            subscriber.connect("tcp://ip:port");
            subscriber.subscribe("".getBytes());
            // Keep reading until stopped or the connection is broken
            while (!isStopped() && (message = subscriber.recvStr()) != null) {
                List<T> results = mapper.readValue(message,
                        new TypeReference<List<T>>() { });
                for (T item : results) {
                    store(item);
                }
            }
            // Restart in an attempt to connect again when the server is active again
            restart("Trying to connect again");
        } catch (Throwable t) {
            // Restart if there is any other error
            restart("Error receiving data", t);
        }
    }
}
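For reference, a hedged sketch of wiring such a receiver into a DStream job; jssc.receiverStream() is the standard hook for custom receivers, while the MyEvent payload type, app name, and batch interval are made up for illustration:
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ZeroMQStreamJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("zmq-receiver");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // MyEvent is a hypothetical payload type matching the receiver's JSON lists
        JavaReceiverInputDStream<MyEvent> stream =
                jssc.receiverStream(new ZeroMQReceiver<MyEvent>());
        stream.print();

        jssc.start();
        jssc.awaitTermination();
    }
}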
I assume you are talking about Structured Streaming.
I am not familiar with ZeroMQ, but an important point about Spark Structured Streaming sources is replayability (to ensure fault tolerance), which, if I understand correctly, ZeroMQ doesn't deliver out of the box.
A practical approach would be to buffer the data, either in Kafka (using the KafkaSource) or as files in a directory on a local FS/NFS, HDFS, or S3 (using the FileSource for reading); cf. the Spark docs. If you use the FileSource, make sure not to append anything to an existing file in the FileSource's input directory; instead, move files into the directory atomically.
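For example, a minimal FileSource sketch under those assumptions (the input directory, JSON schema, and console sink are all made up; whatever process drains ZeroMQ must move completed files into the directory atomically):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.StructType;

public class BufferedFileSourceJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("buffered-zmq").getOrCreate();

        // FileSource: each completed file dropped into the directory becomes new input
        Dataset<Row> events = spark.readStream()
                .schema(new StructType().add("id", "string").add("payload", "string"))
                .json("/data/zmq-buffer");

        StreamingQuery query = events.writeStream()
                .format("console")
                .outputMode("append")
                .start();
        query.awaitTermination();
    }
}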

How to pass a Cassandra cluster connection from one bolt to another bolt

My Storm topology reads data from Kafka and writes it into Cassandra tables.
In Storm I am creating the Cassandra cluster connection and session in the prepare method:
cassandraCluster = Cluster.builder().withoutJMXReporting().withoutMetrics()
        .addContactPoints(nodes)
        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
        .withReconnectionPolicy(new ExponentialReconnectionPolicy(100L,
                TimeUnit.MINUTES.toMillis(5)))
        .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
        .build();
session = cassandraCluster.connect(keyspace);
In the execute method I process the tuple and save it in a Cassandra table.
Suppose I want to write the data from a single tuple into multiple tables: writing a separate bolt for each table would be a good choice, but then I have to create a cluster connection and session for each table in each bolt. Yet according to this link, a single session per cluster is a good idea for performance:
http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
Does anyone have an idea how to create the cluster connection in one bolt and use that connection in another bolt?
It depends on how Storm allocates the bolts and spouts to the workers. You can't assume that you can share connections between bolts, because they might be running in different workers (read: JVMs) or on different nodes entirely.
See my answer here: Mongo connection pooling for Storm topology
It might look something like this pseudocode:
public class CassandraBolt extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(CassandraBolt.class);

    OutputCollector _collector;
    // whatever your cassandra session is;
    // has to be transient because the session is not serializable
    protected transient Session _session;

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        // maybe get properties from stormConf instead of hard coding them
        Cluster cassandraCluster = Cluster.builder().withoutJMXReporting().withoutMetrics()
                .addContactPoints(nodes)
                .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .withReconnectionPolicy(new ExponentialReconnectionPolicy(100L,
                        TimeUnit.MINUTES.toMillis(5)))
                .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();
        _session = cassandraCluster.connect(keyspace);
    }

    @Override
    public void execute(Tuple input) {
        try {
            // use _session to talk to cassandra
        } catch (Exception e) {
            LOG.error("CassandraBolt error", e);
            _collector.reportError(e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // TODO Auto-generated method stub
    }
}
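If several bolts do land in the same worker, you can at least avoid opening one Cluster per bolt instance inside that JVM with a static holder. A hedged sketch follows (the class and its method are mine, not from the linked answer); note it still cannot share anything across workers:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public final class SharedCassandraSession {

    private static Session session;

    private SharedCassandraSession() {
    }

    // every bolt calls this from prepare(); the first caller in the worker
    // JVM builds the Cluster, later callers reuse the same Session
    public static synchronized Session get(String[] nodes, String keyspace) {
        if (session == null) {
            Cluster cluster = Cluster.builder()
                    .addContactPoints(nodes)
                    .build();
            session = cluster.connect(keyspace);
        }
        return session;
    }
}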

Akka.net Passivation for DDD Aggregate Coordinator (repository)

I am using Akka.net and looking to implement a reactive equivalent of a 'DDD repository', based on what I have seen here: http://qnalist.com/questions/5585484/ddd-eventsourcing-with-akka-persistence and here: https://gitter.im/petabridge/akka-bootcamp/archives/2015/06/25
I understand the idea of having a coordinator that keeps a number of actors in memory, according to some live in-memory count or some amount of elapsed time.
As a summary (based on the links above), I am trying to:
1. Create an aggregate coordinator (one per actor type) that returns aggregates on request.
2. Have each aggregate use the Context.SetReceiveTimeout method to detect whether it has gone unused for some period of time; if so, it receives a ReceiveTimeout message.
3. On receipt of the timeout message, have the child send a Passivate message back to the coordinator (which in turn causes the coordinator to shut the child down).
4. While the child is shutting down, have all messages to the child intercepted by the coordinator and buffered.
5. Once the shutdown of the child has been confirmed (in the coordinator), recreate the child if there are buffered messages for it and flush them through.
How would one intercept the messages being sent to the child (step 4) and route them to the parent instead? In other words, at the point of sending the Passivate message I want the child to also say "hey, don't send me any more messages, send them to my parent instead".
This would save me routing everything through the coordinator (or am I going about this the wrong way: is message interception impossible, and should I instead proxy everything through the parent)?
I have my message contracts:
public class GetActor
{
    public readonly string Identity;

    public GetActor(string identity)
    {
        Identity = identity;
    }
}

public class GetActorReply
{
    public readonly IActorRef ActorRef;

    public GetActorReply(IActorRef actorRef)
    {
        ActorRef = actorRef;
    }
}

public class Passivate // sent from child aggregate to parent coordinator
{
}
The coordinator class, of which there is a unique instance per aggregate type:
public class ActorLifetimeCoordinator<T> : ReceiveActor where T : ActorBase
{
    protected Dictionary<string, IActorRef> Actors = new Dictionary<string, IActorRef>();
    protected Dictionary<string, List<object>> BufferedMsgs = new Dictionary<string, List<object>>();

    public ActorLifetimeCoordinator()
    {
        Receive<GetActor>(message =>
        {
            var actor = GetActor(message.Identity);
            Sender.Tell(new GetActorReply(actor), Self); // reply with the retrieved actor
        });
        Receive<Passivate>(message =>
        {
            var actorToUnload = Context.Sender;
            var task = actorToUnload.GracefulStop(TimeSpan.FromSeconds(10));
            // between the line above and the line below, we need to intercept messages
            // to the child that is being removed from memory - how to do this?
            task.Wait(); // don't block the thread - use PipeTo instead?
        });
    }

    protected IActorRef GetActor(string identity)
    {
        IActorRef value;
        return Actors.TryGetValue(identity, out value)
            ? value
            : Context.System.ActorOf(Props.Create<T>(identity));
    }
}
Aggregate base class from which all aggregates derive:
public abstract class AggregateRoot : ReceivePersistentActor
{
    private readonly DispatchByReflectionStrategy _dispatchStrategy
        = new DispatchByReflectionStrategy("When");

    protected AggregateRoot(Identity identity)
    {
        PersistenceId = Context.Parent.Path.Name + "/" + Self.Path.Name + "/" + identity;
        Recover((Action<IDomainEvent>)Dispatch);
        Command<ReceiveTimeout>(message =>
        {
            Context.Parent.Tell(new Passivate());
        });
        Context.SetReceiveTimeout(TimeSpan.FromMinutes(5));
    }

    public override string PersistenceId { get; }

    private void Dispatch(IDomainEvent domainEvent)
    {
        _dispatchStrategy.Dispatch(this, domainEvent);
    }

    protected void Emit(IDomainEvent domainEvent)
    {
        Persist(domainEvent, success =>
        {
            Dispatch(domainEvent);
        });
    }
}
The easiest (though not the simplest) option here is to use the Akka.Cluster.Sharding module, which covers the coordinator pattern and adds support for actor distribution and balancing across the cluster.
If you decide that you don't need it, you will unfortunately have to pass messages through the coordinator, and the messages themselves need to carry the identifier used to determine the recipient. Otherwise you may end up sending messages to a dead actor.

Netty OrderedMemoryAwareThreadPoolExecutor not creating multiple threads

I use Netty for a multithreaded TCP server with a single persistent client connection.
The client sends many binary messages (10,000 in my use case) and is supposed to receive an answer for each message. I added an OrderedMemoryAwareThreadPoolExecutor to the pipeline to handle the execution of DB calls on multiple threads.
If I run a DB call in the messageReceived() method (or simulate it with Thread.sleep(50)), then all events are handled by a single thread:
5 count of {main}
1 count of {New
10000 count of {pool-3-thread-4}
For a simple implementation of messageReceived() the server creates many executor threads, as expected.
How should I configure the ExecutionHandler to get multiple executor threads for the business logic?
Here is my code:
public class MyServer {
    public void run() {
        OrderedMemoryAwareThreadPoolExecutor eventExecutor =
                new OrderedMemoryAwareThreadPoolExecutor(16, 1048576L, 1048576L,
                        1000, TimeUnit.MILLISECONDS, Executors.defaultThreadFactory());
        ExecutionHandler executionHandler = new ExecutionHandler(eventExecutor);
        bootstrap.setPipelineFactory(new ServerChannelPipelineFactory(executionHandler));
    }
}

public class ServerChannelPipelineFactory implements ChannelPipelineFactory {

    private final ExecutionHandler executionHandler;

    public ServerChannelPipelineFactory(ExecutionHandler executionHandler) {
        this.executionHandler = executionHandler;
    }

    public ChannelPipeline getPipeline() throws Exception {
        ChannelPipeline pipeline = Channels.pipeline();
        pipeline.addLast("encoder", new MyProtocolEncoder());
        pipeline.addLast("decoder", new MyProtocolDecoder());
        pipeline.addLast("executor", executionHandler);
        pipeline.addLast("myHandler", new MyServerHandler(dataSource));
        return pipeline;
    }
}
public class MyServerHandler extends SimpleChannelHandler {

    @Override
    public void messageReceived(ChannelHandlerContext ctx, final MessageEvent e) {
        // long-running DB call simulation
        try {
            Thread.sleep(50);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        // a simple answer message
        final MyMessage answerMsg = new MyMessage();
        if (e.getChannel().isWritable()) {
            e.getChannel().write(answerMsg);
        }
    }
}
OrderedMemoryAwareThreadPoolExecutor guarantees that events from a single channel are processed in order. You can think of it as binding a channel to a specific thread in the pool and then processing all of its events on that thread, although it's a bit more complex than that, so don't depend on a channel always being processed by the same thread.
If you start up a second client, you'll (most likely) see it being processed on another thread from the pool. If you really can process a single client's requests in parallel, then you probably want MemoryAwareThreadPoolExecutor, but be aware that it offers no guarantees about the order of channel events.
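Applied to the question's code, the swap is a one-line change in MyServer.run(); a sketch with the same pool sizing (everything else stays as above, but per-channel ordering is lost):
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.jboss.netty.handler.execution.ExecutionHandler;
import org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor;

public void run() {
    // same sizing as before, but events from one channel may now run
    // on any of the 16 threads concurrently
    MemoryAwareThreadPoolExecutor eventExecutor = new MemoryAwareThreadPoolExecutor(
            16, 1048576L, 1048576L, 1000, TimeUnit.MILLISECONDS,
            Executors.defaultThreadFactory());
    ExecutionHandler executionHandler = new ExecutionHandler(eventExecutor);
    bootstrap.setPipelineFactory(new ServerChannelPipelineFactory(executionHandler));
}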
