Azure EventProcessorHost and Worker role

I was hoping for some guidance on how to use the EventProcessorHost with a worker role. Basically I am hoping to have the EventProcessorHost process the partitions in parallel and I'm wondering where I should go about placing this type of code within the worker role and if I'm missing anything key.
var manager = NamespaceManager.CreateFromConnectionString(connectionString);
var desc = manager.CreateEventHubIfNotExistsAsync(path).Result;
var client = Microsoft.ServiceBus.Messaging.EventHubClient.CreateFromConnectionString(connectionString, path);
var host = new EventProcessorHost(hostname, path, consumerGroup, connectionString, blobStorageConnectionString);
EventHubProcessorFactory<EventData> factory = new EventHubProcessorFactory<EventData>();
host.RegisterEventProcessorFactoryAsync(factory);
Everything I've read says the EventProcessorHost will divide up the partitions on its own, but is the above code sufficient to process all the partitions asynchronously?

Here's a simplified version of how we process our event hub from a Worker Role. We keep the EventProcessorHost instance in the main worker role and register the IEventProcessor to start processing.
This way we can start it and shut it down when the worker responds to shutdown events, etc.
EDIT:
As for processing in parallel, the IEventProcessor class will just grab 10 more events from the event hub when it has finished processing the current ones, and the host handles all the fancy partition leasing for you.
It's a synchronous workflow. When I scale to multiple worker role instances, I start to see the partitions get split between instances and processing gets faster. You'd have to roll your own solution if you wanted to process the event hub in a different way.
public class WorkerRole : RoleEntryPoint
{
private readonly CancellationTokenSource _cancellationTokenSource = new CancellationTokenSource();
private readonly ManualResetEvent _runCompleteEvent = new ManualResetEvent(false);
private EventProcessorHost _eventProcessorHost;
public override bool OnStart()
{
ThreadPool.SetMaxThreads(4096, 2048);
ServicePointManager.DefaultConnectionLimit = 500;
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
var eventClient = EventHubClient.CreateFromConnectionString("consumersConnectionString",
"eventHubName");
_eventProcessorHost = new EventProcessorHost(Dns.GetHostName(), eventClient.Path,
eventClient.GetDefaultConsumerGroup().GroupName,
"consumersConnectionString", "blobLeaseConnectionString");
return base.OnStart();
}
public override void Run()
{
try
{
RunAsync(this._cancellationTokenSource.Token).Wait();
}
finally
{
_runCompleteEvent.Set();
}
}
private async Task RunAsync(CancellationToken cancellationToken)
{
// starts processing here
await _eventProcessorHost.RegisterEventProcessorAsync<EventProcessor>();
while (!cancellationToken.IsCancellationRequested)
{
await Task.Delay(TimeSpan.FromMinutes(1));
}
}
public override void OnStop()
{
_eventProcessorHost.UnregisterEventProcessorAsync().Wait();
_cancellationTokenSource.Cancel();
_runCompleteEvent.WaitOne();
base.OnStop();
}
}
I have multiple processors for the specific partitions (you can guarantee FIFO this way), but you can easily implement your own logic, i.e. skip the EventDataProcessor classes and the Dictionary lookup in my example and just implement some logic within the ProcessEventsAsync method.
public class EventProcessor : IEventProcessor
{
private readonly Dictionary<string, IEventDataProcessor> _eventDataProcessors;
public EventProcessor()
{
_eventDataProcessors = new Dictionary<string, IEventDataProcessor>
{
{"A", new EventDataProcessorA()},
{"B", new EventDataProcessorB()},
{"C", new EventDataProcessorC()}
};
}
public Task OpenAsync(PartitionContext context)
{
return Task.FromResult<object>(null);
}
public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
foreach(EventData eventData in messages)
{
// implement your own logic here, you could just process the data here, just remember that they will all be from the same partition in this block
try
{
IEventDataProcessor eventDataProcessor;
if(_eventDataProcessors.TryGetValue(eventData.PartitionKey, out eventDataProcessor))
{
await eventDataProcessor.ProcessMessage(eventData);
}
}
catch (Exception ex)
{
// log the exception
}
}
await context.CheckpointAsync();
}
public async Task CloseAsync(PartitionContext context, CloseReason reason)
{
if (reason == CloseReason.Shutdown)
await context.CheckpointAsync();
}
}
Example of one of our EventDataProcessors
public interface IEventDataProcessor
{
Task ProcessMessage(EventData eventData);
}
public class EventDataProcessorA : IEventDataProcessor
{
public async Task ProcessMessage(EventData eventData)
{
// Do Something specific with data from Partition "A"
}
}
public class EventDataProcessorB : IEventDataProcessor
{
public async Task ProcessMessage(EventData eventData)
{
// Do Something specific with data from Partition "B"
}
}
Hope this helps, it's been rock solid for us so far and scales easily to multiple instances

Related

Azure Service Bus - Subscribe multiple topics inside the same worker/hosted service

We have a scenario where we must integrate requests with the same destination system, which exposes its operations through REST APIs (provided by a third party, most likely not Azure). So this is a scenario where n messages are mapped to n actions on the same destination system. There is no multicast or broadcast.
So we are considering Service Bus to achieve this, based on previous experiences on other use cases, and taking advantage of dead letter mechanism among other things.
We need to integrate 6 or 7 different actions with the 3rd party. So on Service Bus we can achieve this by creating 1 topic per action, and this is important because the data that travels on the message is different from action to action.
But we are facing a problem when consuming topics. We are able to have a hosted service in Azure (App Service) that listens on a specific topic and does its work.
But since we are trying to listen on several topics, we would like to avoid writing and deploying multiple app services, we would like (if possible) to have a single app service where we 'trigger' each ServiceBusProcessor (one per topic) and even though they all rely on the limits of the app service itself, each processor is independent and is listening on its topic in parallel and processing.
I'll share a code sample of our hosted service below. We have found two options and would like opinions on them:
Option 1: we send all messages to the same topic, then by using filters we determine which is the appropriate action. This would make code simple, but it would put all messages on the same 'line' which would make the topic an all purpose topic, which seems wrong
Option 2: based on our sample below, which represents a single hosted service which listens on a single topic, we would break it and inject a List of listeners that implement the same interface, and each one of them would be working independently on its topic and its message. We are not sure if this is feasible and if it works properly, because the app service would have to handle multiple ServiceBusProcessors side by side.
We'd like to know if we are missing some option, or if there is any other better way to achieve this. Hope I've explained it well.
Below is a sample of our hosted service. Thanks a lot.
public class MyService : IHostedService, IMyService
{
private ILogger<MyService> _logger;
public MyService(ILogger<MyService> logger)
{
_logger = logger;
}
public Task StartAsync(CancellationToken cancellationToken)
{
ServiceBusClient client = new ServiceBusClient("connectionString");
ServiceBusProcessor processor = client.CreateProcessor("topicName", "subscriptionName");
processor.ProcessMessageAsync += ProcessMessageAsync;
processor.ProcessErrorAsync += ProcessErrorAsync;
_logger.LogInformation("Listener initialized");
return Task.CompletedTask;
}
public Task StopAsync(CancellationToken cancellationToken)
{
return Task.CompletedTask;
}
public async Task ProcessMessageAsync(ProcessMessageEventArgs args)
{
var body = args.Message.Body;
// Do stuff with this body...
await args.CompleteMessageAsync(args.Message);
}
public Task ProcessErrorAsync(ProcessErrorEventArgs args)
{
_logger.LogError($"Error occurred: {args.Exception.ToString()} with message: {args.Exception.Message}");
return Task.CompletedTask;
}
}
Then at ConfigureServices:
services.AddHostedService<MyService>();
So, following option 2, the sample above would be transformed in the following, considering 2 listeners:
public interface IMyService
{
}
public interface IMyListener
{
Task Initialize();
Task ProcessMessageAsync(ProcessMessageEventArgs args);
Task ProcessErrorAsync(ProcessErrorEventArgs args);
}
public class BaseListener
{
private string _connectionString;
private string _topicName;
private string _subscriptionName;
private ILogger<BaseListener> _logger;
public BaseListener(ILogger<BaseListener> logger, string connectionString, string topicName, string subscriptionName)
{
this._connectionString = connectionString;
this._topicName = topicName;
this._subscriptionName = subscriptionName;
this._logger = logger;
}
public Task Initialize()
{
ServiceBusClient client = new ServiceBusClient(this._connectionString);
ServiceBusProcessor processor = client.CreateProcessor(this._topicName, this._subscriptionName);
processor.ProcessMessageAsync += ProcessMessageAsync;
processor.ProcessErrorAsync += ProcessErrorAsync;
_logger.LogInformation("Listener initialized");
return Task.CompletedTask;
}
public async Task ProcessMessageAsync(ProcessMessageEventArgs args)
{
var body = args.Message.Body;
// Do stuff with this body...
await args.CompleteMessageAsync(args.Message);
}
public Task ProcessErrorAsync(ProcessErrorEventArgs args)
{
return Task.CompletedTask;
}
}
public class MyListener1: BaseListener, IMyListener
{
public MyListener1(ILogger<MyListener1> logger) : base(logger, "connectionString", "topic1", "subscription")
{
}
}
public class MyListener2 : BaseListener, IMyListener
{
public MyListener2(ILogger<MyListener2> logger) : base(logger, "connectionString", "topic2", "subscription")
{
}
}
public class MyService : IHostedService, IMyService
{
private ILogger<MyService> _logger;
private IEnumerable<IMyListener> _listeners;
public MyService(ILogger<MyService> logger, IEnumerable<IMyListener> listeners)
{
_logger = logger;
_listeners = listeners;
}
public Task StartAsync(CancellationToken cancellationToken)
{
foreach(var listener in this._listeners)
{
listener.Initialize();
}
_logger.LogInformation("Listeners initialized");
return Task.CompletedTask;
}
public Task StopAsync(CancellationToken cancellationToken)
{
return Task.CompletedTask;
}
}
And on ConfigureServices:
services.AddHostedService<MyService>();
services.AddSingleton<IMyListener, MyListener1>();
services.AddSingleton<IMyListener, MyListener2>();

How does RSocket issue leases to multiple clients?

I could create a server lease to a single client as follows:
@Slf4j
public class LeaseServer {
private static final String SERVER_TAG = "server";
public static void main(String[] args) throws InterruptedException {
// Queue for incoming messages represented as Flux
// Imagine that every fireAndForget that is pushed is processed by a worker
int queueCapacity = 50;
BlockingQueue<String> messagesQueue = new ArrayBlockingQueue<>(queueCapacity);
// emulating a worker that process data from the queue
Thread workerThread =
new Thread(
() -> {
try {
while (!Thread.currentThread().isInterrupted()) {
String message = messagesQueue.take();
System.out.println("consume message:" + message);
Thread.sleep(100000); // emulating processing
}
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
});
workerThread.start();
CloseableChannel server = getFireAndForgetServer(messagesQueue, workerThread);
TimeUnit.MINUTES.sleep(10);
server.dispose();
}
private static CloseableChannel getFireAndForgetServer(BlockingQueue<String> messagesQueue, Thread workerThread) {
CloseableChannel server =
RSocketServer.create((setup, sendingSocket) ->
Mono.just(new RSocket() {
@Override
public Mono<Void> fireAndForget(Payload payload) {
// add element. if overflows errors and terminates execution
// specifically to show that lease can limit rate of fnf requests in
// that example
try {
if (!messagesQueue.offer(payload.getDataUtf8())) {
System.out.println("Queue has been overflowed. Terminating execution");
sendingSocket.dispose();
workerThread.interrupt();
}
} finally {
payload.release();
}
return Mono.empty();
}
}))
.lease(() -> Leases.create().sender(new LeaseCalculator(SERVER_TAG, messagesQueue)))
.bindNow(TcpServerTransport.create("localhost", 7000));
return server;
}
}
But how do I issue leases to multiple clients connected to that server?
Otherwise multiple clients will write to my queue at the same time and overflow the service.
I can't find the details in the public documentation and materials.
Your help would be very much appreciated.

How can I parallel consumption kafka with spark streaming? I set concurrentJobs but something error [duplicate]

The Kafka documentation describes the following approach:
One Consumer Per Thread: A simple option is to give each thread its own consumer instance.
My code:
public class KafkaConsumerRunner implements Runnable {
private final AtomicBoolean closed = new AtomicBoolean(false);
private final CloudKafkaConsumer consumer;
private final String topicName;
public KafkaConsumerRunner(CloudKafkaConsumer consumer, String topicName) {
this.consumer = consumer;
this.topicName = topicName;
}
@Override
public void run() {
try {
this.consumer.subscribe(topicName);
ConsumerRecords<String, String> records;
while (!closed.get()) {
synchronized (consumer) {
records = consumer.poll(100);
}
for (ConsumerRecord<String, String> tmp : records) {
System.out.println(tmp.value());
}
}
} catch (WakeupException e) {
// Ignore exception if closing
System.out.println(e);
//if (!closed.get()) throw e;
}
}
// Shutdown hook which can be called from a separate thread
public void shutdown() {
closed.set(true);
consumer.wakeup();
}
public static void main(String[] args) {
CloudKafkaConsumer kafkaConsumer = KafkaConsumerBuilder.builder()
.withBootstrapServers("172.31.1.159:9092")
.withGroupId("test")
.build();
ExecutorService executorService = Executors.newFixedThreadPool(5);
executorService.execute(new KafkaConsumerRunner(kafkaConsumer, "log"));
executorService.execute(new KafkaConsumerRunner(kafkaConsumer, "log.info"));
executorService.shutdown();
}
}
but it doesn't work and throws an exception:
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
Furthermore, I read the source of Flink (an open source platform for distributed stream and batch data processing). Flink's multi-threaded consumer looks similar to mine:
long pollTimeout = Long.parseLong(flinkKafkaConsumer.properties.getProperty(KEY_POLL_TIMEOUT, Long.toString(DEFAULT_POLL_TIMEOUT)));
pollLoop: while (running) {
ConsumerRecords<byte[], byte[]> records;
//noinspection SynchronizeOnNonFinalField
synchronized (flinkKafkaConsumer.consumer) {
try {
records = flinkKafkaConsumer.consumer.poll(pollTimeout);
} catch (WakeupException we) {
if (running) {
throw we;
}
// leave loop
continue;
}
}
(Flink's multi-threaded consumer code)
What's wrong?
The Kafka consumer is not thread-safe. As you pointed out in your question, the documentation states:
A simple option is to give each thread its own consumer instance
But in your code, the same consumer instance is wrapped by different KafkaConsumerRunner instances, so multiple threads are accessing the same consumer instance. The Kafka documentation clearly states:
The Kafka consumer is NOT thread-safe. All network I/O happens in the
thread of the application making the call. It is the responsibility of
the user to ensure that multi-threaded access is properly
synchronized. Un-synchronized access will result in
ConcurrentModificationException.
That's exactly the exception you received.
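To make the one-consumer-per-thread arrangement concrete, here is a minimal Java sketch. It deliberately does not use the Kafka client: `GuardedConsumer` is a hypothetical stand-in that imitates the kind of single-thread guard behind the exception quoted above, so the example runs with the JDK alone. `OneConsumerPerThread` and `consumeAll` are likewise made-up names. In real code each task would build and close its own KafkaConsumer instead of the stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a consumer that is not thread-safe (NOT the Kafka API).
class GuardedConsumer {
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private final String topic;

    GuardedConsumer(String topic) {
        this.topic = topic;
    }

    String poll() {
        // Fail fast if another thread is currently inside this instance.
        if (!owner.compareAndSet(null, Thread.currentThread())) {
            throw new ConcurrentModificationException("not safe for multi-threaded access");
        }
        try {
            return "record from " + topic;
        } finally {
            owner.set(null); // release the guard
        }
    }
}

class OneConsumerPerThread {
    // Each task creates its own consumer, so no instance is ever shared between threads.
    static List<String> consumeAll(List<String> topics) {
        ExecutorService pool = Executors.newFixedThreadPool(topics.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String topic : topics) {
                Callable<String> task = () -> new GuardedConsumer(topic).poll();
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // rethrows any failure from the worker thread
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(Arrays.asList("log", "log.info")));
    }
}
```

The point of the design is that the consumer is constructed inside the task rather than passed in from `main`, which is the difference from the code in the question, where both runners received the same instance.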
It is throwing the exception on your call to subscribe. this.consumer.subscribe(topicName);
Move that block into a synchronized block like this:
@Override
public void run() {
try {
synchronized (consumer) {
this.consumer.subscribe(topicName);
}
ConsumerRecords<String, String> records;
while (!closed.get()) {
synchronized (consumer) {
records = consumer.poll(100);
}
for (ConsumerRecord<String, String> tmp : records) {
System.out.println(tmp.value());
}
}
} catch (WakeupException e) {
// Ignore exception if closing
System.out.println(e);
//if (!closed.get()) throw e;
}
}
This may not be your case, but if you are merging the processing of data from several topics, you can read from multiple topics with the same consumer. If not, it is preferable to create separate jobs, each consuming one topic.

Azure Service Bus SessionHandler issue with partitioned queue

I ran into an issue with IMessageSessionAsyncHandlerFactory where new instances of IMessageSessionAsyncHandler are not created when the volume of writes drops to 0 and then goes back up to a normal level.
To be more precise, I'm using SessionHandlerOptions with a value of 500 for MaxConcurrentSessions. This allows reading at a speed of more than 1k msg/s.
The queue I'm reading from is a partitioned queue.
The volume of messages in the queue is rather constant, but from time to time it gets down to 0. When the volume gets back to the normal level, the SessionFactory is not spawning any handlers so I'm not able to read messages anymore. It's like the sessions were not correctly recycled or are held into a sort of continuous waiting.
Here is the code for the factory registering:
private void RegisterHandler()
{
var sessionHandlerOptions = new SessionHandlerOptions
{
AutoRenewTimeout = TimeSpan.FromMinutes(1),
MessageWaitTimeout = TimeSpan.FromSeconds(1),
MaxConcurrentSessions = 500
};
_queueClient.RegisterSessionHandlerFactoryAsync(new SessionHandlerFactory(_callback), sessionHandlerOptions);
}
The factory class:
public class SessionHandlerFactory : IMessageSessionAsyncHandlerFactory
{
private readonly Action<BrokeredMessage> _callback;
public SessionHandlerFactory(Action<BrokeredMessage> callback)
{
_callback = callback;
}
public IMessageSessionAsyncHandler CreateInstance(MessageSession session, BrokeredMessage message)
{
return new SessionHandler(session.SessionId, _callback);
}
public void DisposeInstance(IMessageSessionAsyncHandler handler)
{
var disposable = handler as IDisposable;
disposable?.Dispose();
}
}
And the handler:
public class SessionHandler : MessageSessionAsyncHandler
{
private readonly Action<BrokeredMessage> _callback;
public SessionHandler(string sessionId, Action<BrokeredMessage> callback)
{
SessionId = sessionId;
_callback = callback;
}
public string SessionId { get; }
protected override async Task OnMessageAsync(MessageSession session, BrokeredMessage message)
{
try
{
_callback(message);
}
catch (Exception ex)
{
Logger.Error(...);
}
}
}
I can see that the session handlers are closed and that the factories are disposed when the writing/reading is at a normal level. However, once the queue empties, there's no way new session handlers are created. Is there a policy for allocating session IDs that forbids reallocating the same sessions after a period of inactivity?
Edit 1:
I'm adding two pictures to illustrate the behavior:
When the writer is stopped and restarted, the running reader is not able to read as much as before.
The number of sessions created after that moment is also much lower than before:
The volume of messages in the queue is rather constant, but from time to time it gets down to 0. When the volume gets back to the normal level, the SessionFactory is not spawning any handlers so I'm not able to read messages anymore. It's like the sessions were not correctly recycled or are held into a sort of continuous waiting.
When using IMessageSessionHandlerFactory to control how the IMessageSessionAsyncHandler instances are created, you could try to log the creation and destruction for all of your IMessageSessionAsyncHandler instances.
Based on your code, I created a console application to test this issue on my side. Here is my code snippet for initializing the queue client and handling messages:
InitializeReceiver
static void InitializeReceiver(string connectionString, string queuePath)
{
_queueClient = QueueClient.CreateFromConnectionString(connectionString, queuePath, ReceiveMode.PeekLock);
var sessionHandlerOptions = new SessionHandlerOptions
{
AutoRenewTimeout = TimeSpan.FromMinutes(1),
MessageWaitTimeout = TimeSpan.FromSeconds(5),
MaxConcurrentSessions = 500
};
_queueClient.RegisterSessionHandlerFactoryAsync(new SessionHandlerFactory(OnMessageHandler), sessionHandlerOptions);
}
OnMessageHandler
static void OnMessageHandler(BrokeredMessage message)
{
var body = message.GetBody<Stream>();
dynamic recipeStep = JsonConvert.DeserializeObject(new StreamReader(body, true).ReadToEnd());
lock (Console.Out)
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine(
"Message received: \n\tSessionId = {0}, \n\tMessageId = {1}, \n\tSequenceNumber = {2}," +
"\n\tContent: [ title = {3} ]",
message.SessionId,
message.MessageId,
message.SequenceNumber,
recipeStep.title);
Console.ResetColor();
}
Task.Delay(TimeSpan.FromSeconds(3)).Wait();
message.Complete();
}
In my test, the SessionHandler worked as expected when the volume of messages in the queue went from normal to zero and then from zero back to normal for some time, as follows:
I also tried QueueClient.RegisterSessionHandlerAsync to test this issue and it works as well. Additionally, I found this git sample about Service Bus Sessions that you could refer to.

Threading multiple async calls

Part of my Silverlight application requires data from three service requests. Up until now I've been chaining the requests so as one completes the other starts... until the end of the chain where I do what I need to do with the data.
Now, I know that's not the best method(!). I've been looking at AutoResetEvent (link to MSDN example) to thread and then synchronize the results, but cannot seem to get this to work with async service calls.
Does anyone have any reason to doubt this method or should this work? Code samples gratefully received!
Take a look at this example:
It fires the Completed event and prints 'done' to the Debug output once both services have returned.
The key thing is that the waiting on the AutoResetEvents happens on a background thread.
public partial class MainPage : UserControl
{
public MainPage()
{
InitializeComponent();
Completed += (s, a) => { Debug.WriteLine("done"); };
wrk.DoWork += (s, a) =>
{
Start();
};
wrk.RunWorkerAsync();
}
public event EventHandler Completed;
private void Start()
{
auto1.WaitOne();
auto2.WaitOne();
Completed(this, EventArgs.Empty);
}
public AutoResetEvent auto1 = new AutoResetEvent(false);
public AutoResetEvent auto2 = new AutoResetEvent(false);
BackgroundWorker wrk = new BackgroundWorker();
private void Button_Click(object sender, RoutedEventArgs e)
{
ServiceReference1.Service1Client clien = new SilverlightAsyncTest.ServiceReference1.Service1Client();
clien.DoWorkCompleted += new EventHandler<SilverlightAsyncTest.ServiceReference1.DoWorkCompletedEventArgs>(clien_DoWorkCompleted);
clien.DoWork2Completed += new EventHandler<SilverlightAsyncTest.ServiceReference1.DoWork2CompletedEventArgs>(clien_DoWork2Completed);
clien.DoWorkAsync();
clien.DoWork2Async();
}
void clien_DoWork2Completed(object sender, SilverlightAsyncTest.ServiceReference1.DoWork2CompletedEventArgs e)
{
Debug.WriteLine("2");
auto1.Set();
}
void clien_DoWorkCompleted(object sender, SilverlightAsyncTest.ServiceReference1.DoWorkCompletedEventArgs e)
{
Debug.WriteLine("1");
auto2.Set();
}
}
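For comparison, the same wait-until-every-call-has-reported idea can be sketched with a countdown latch. This is written in Java rather than Silverlight C# so that it runs standalone, and it is only an illustration: `WaitForAllCalls` and `runAll` are made-up names, and `CompletableFuture.runAsync` stands in for the generated service proxy calls and their completed handlers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WaitForAllCalls {
    // Starts `calls` simulated async requests and blocks until all have signalled.
    static int runAll(int calls) {
        CountDownLatch pending = new CountDownLatch(calls);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < calls; i++) {
            // Stand-in for an async service call plus its completed handler.
            CompletableFuture.runAsync(() -> {
                completed.incrementAndGet();
                pending.countDown(); // plays the role of auto1.Set() / auto2.Set()
            });
        }

        try {
            // Plays the role of the WaitOne() calls; do this off the UI thread.
            if (!pending.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for calls");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed calls: " + runAll(3));
    }
}
```

A single latch replaces one event per call, so adding a third or fourth request only changes the count. As in the example above, the blocking wait belongs on a background thread, never on the UI thread.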
It could be done using the WaitHandle in the IAsyncResult returned by each async method.
The code is simple. In Silverlight I just do 10 service calls, each of which adds an item to a ListBox. I wait until all the service calls end before adding another message to the list (this has to run on a different thread to avoid blocking the UI). Also note that adding items to the list has to be done through the Dispatcher, since it modifies the UI. There are a bunch of lambdas, but it's easy to follow.
public MainPage()
{
InitializeComponent();
var results = new ObservableCollection<string>();
var asyncResults = new List<IAsyncResult>();
resultsList.ItemsSource = results;
var service = new Service1Client() as Service1;
1.To(10).Do(i=>
asyncResults.Add(service.BeginDoWork(ar =>
Dispatcher.BeginInvoke(() => results.Add(String.Format("Call {0} finished: {1}", i, service.EndDoWork(ar)))),
null))
);
new Thread(()=>
{
asyncResults.ForEach(a => a.AsyncWaitHandle.WaitOne());
Dispatcher.BeginInvoke(() => results.Add("Everything finished"));
}).Start();
}
Just to help with the testing, this is the service
public class Service1
{
private const int maxMilliSecs = 500;
private const int minMillisSecs = 100;
[OperationContract]
public int DoWork()
{
int millisSecsToWait = new Random().Next(maxMilliSecs - minMillisSecs) + minMillisSecs;
Thread.Sleep(millisSecsToWait);
return millisSecsToWait;
}
}
