About checkpoint strategy in Event Hub processor - Azure

I use the Event Hubs processor host to receive and process events from Event Hubs. For better performance, I call checkpoint every 3 minutes instead of every time I receive events:
public async Task ProcessEventAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (var eventData in messages)
    {
        // do something
    }
    if (this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(3))
    {
        await context.CheckpointAsync();
    }
}
But the problem is that some events may never be checkpointed if no new events are sent to the Event Hub, because ProcessEventAsync is not invoked when there are no new messages.
Any suggestions on how to make sure all processed events get checkpointed, while still only checkpointing every few minutes?
Update: Per Sreeram's suggestion, I updated the code as below:
public async Task ProcessEventAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (var eventData in messages)
    {
        // do something
    }
    this.lastProcessedEventsCount += messages.Count();
    if (this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(3))
    {
        this.checkpointStopWatch.Restart();
        if (this.lastProcessedEventsCount > 0)
        {
            await context.CheckpointAsync();
            this.lastProcessedEventsCount = 0;
        }
    }
}

Great case you are covering!
You could experience loss of event checkpoints (and, as a result, event replay) in the following two cases:
When you have a sparse data flow (for example, a batch of messages every 5 minutes while your checkpoint interval is 3 minutes) and the EventProcessorHost instance closes for some reason, you could see up to 2 minutes of EventData re-processing. To handle that case:
Keep track of the last processed event after completing IEventProcessor.onEvents/IEventProcessor.ProcessEventsAsync, and checkpoint when you get notified of close in IEventProcessor.onClose/IEventProcessor.CloseAsync.
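For example, a minimal sketch of that close-time checkpoint, reusing the lastProcessedEventsCount field from the updated code above:
public async Task CloseAsync(PartitionContext context, CloseReason reason)
{
    // Checkpoint only on a clean shutdown; on a lease loss the checkpoint
    // typically fails anyway, because the lease is already gone.
    if (reason == CloseReason.Shutdown && this.lastProcessedEventsCount > 0)
    {
        await context.CheckpointAsync();
        this.lastProcessedEventsCount = 0;
    }
}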
There might also be a case where no more events flow to a specific Event Hubs partition. In this case, you would never see the last event being checkpointed with your checkpointing strategy. However, this is uncommon when you have a continuous flow of EventData and you are not sending to a specific EventHubs partition (EventHubClient.send(EventData_Without_PartitionKey)). If you think you could run into this situation, use the:
EventProcessorOptions.setInvokeProcessorAfterReceiveTimeout(true); // in java or
EventProcessorOptions.InvokeProcessorAfterReceiveTimeout = true; // in C#
flag to wake up ProcessEventsAsync every so often. Then keep track of LastProcessedEventData and LastCheckpointedEventData, and use the EventData.SequenceNumber property on those events to judge whether to checkpoint when no events are received.
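For illustration, a minimal sketch of that judgement call, assuming hypothetical lastProcessedSequenceNumber and lastCheckpointedSequenceNumber fields and the older SDK, where EventData exposes SequenceNumber directly:
public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (var eventData in messages)
    {
        // do something
        this.lastProcessedSequenceNumber = eventData.SequenceNumber;
    }

    // With InvokeProcessorAfterReceiveTimeout enabled, this method is also
    // invoked with an empty batch after each receive timeout.
    bool idleWithUncheckpointedWork = !messages.Any()
        && this.lastProcessedSequenceNumber > this.lastCheckpointedSequenceNumber;

    if (idleWithUncheckpointedWork || this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(3))
    {
        await context.CheckpointAsync();
        this.lastCheckpointedSequenceNumber = this.lastProcessedSequenceNumber;
        this.checkpointStopWatch.Restart();
    }
}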

Related

Event Hub Consumer in Service Fabric

I'm trying to get a Service Fabric service to consistently pull messages from an Azure Event Hub. I seem to have everything wired up, but I notice that my consumer just stops pulling events.
I have a hub with a couple thousand events I've pushed to it. I configured the hub with 1 partition and gave my Service Fabric service only 1 partition as well, to ease debugging.
The service starts, creates the EventHubClient, and from there uses it to create a PartitionReceiver. The receiver is passed to an "EventLoop" that enters an "infinite" while loop that calls receiver.ReceiveAsync. The code for the EventLoop is below.
What I am observing is that the first time through the loop I almost always get 1 message. The second time through I get somewhere between 103 and 200-ish messages. After that, I get no messages. It also seems that if I restart the service, I get the same messages again - but that's because when I restart the service I have it start back at the beginning of the stream.
I would expect this to keep running until my 2000 messages were consumed and then wait for me (polling occasionally).
Is there something specific I need to do with the Azure.Messaging.EventHubs 5.3.0 package to make it keep pulling events?
//Here is how I am creating the EventHubClient:
var connectionString = "something secret";
var connectionStringBuilder = new EventHubsConnectionStringBuilder(connectionString)
{
    EntityPath = "NameOfMyEventHub"
};
try
{
    m_eventHubClient = EventHubClient.Create(connectionStringBuilder);
}
catch (Exception)
{
    // exception handling elided in the original snippet
    throw;
}
//Here is how I am getting the partition receiver
var receiver = m_eventHubClient.CreateReceiver("$Default", m_partitionId, EventPosition.FromStart());
//The event loop which the receiver is passed to
private async Task EventLoop(PartitionReceiver receiver)
{
    m_started = true;
    while (m_keepRunning)
    {
        var events = await receiver.ReceiveAsync(m_options.BatchSize, TimeSpan.FromSeconds(5));
        if (events != null) // First 2/3 times events aren't null. After that, always null, and I know there are more in the partition.
        {
            var eventsArray = events as EventData[] ?? events.ToArray();
            m_state.NumProcessedSinceLastSave += eventsArray.Count();
            foreach (var evt in eventsArray)
            {
                // Process the event
                await m_options.Processor.ProcessMessageAsync(evt, null);
                string lastOffset = evt.SystemProperties.Offset;
                if (m_state.NumProcessedSinceLastSave >= m_options.BatchSize)
                {
                    m_state.Offset = lastOffset;
                    m_state.NumProcessedSinceLastSave = 0;
                    await m_state.SaveAsync();
                }
            }
        }
    }
    m_started = false;
}
EDIT: a question was asked about the number of partitions. The event hub has a single partition and the SF service also has a single one.
I intend to use Service Fabric state to keep track of my offset into the hub, but that's not the concern for now.
Partition listeners are created for each partition. I get the partitions like this:
public async Task StartAsync()
{
    // slice the pie according to distribution
    // this partition can get one or more assigned Event Hub partition ids
    string[] eventHubPartitionIds = (await m_eventHubClient.GetRuntimeInformationAsync()).PartitionIds;
    string[] resolvedEventHubPartitionIds = m_options.ResolveAssignedEventHubPartitions(eventHubPartitionIds);
    foreach (var resolvedPartition in resolvedEventHubPartitionIds)
    {
        var partitionReceiver = new EventHubListenerPartitionReceiver(m_eventHubClient, resolvedPartition, m_options);
        await partitionReceiver.StartAsync();
        m_partitionReceivers.Add(partitionReceiver);
    }
}
When partitionListener.StartAsync is called, it actually creates the receiver, like this (it's actually a bit more than this, but the branch taken is this one):
m_eventHubClient.CreateReceiver(m_options.EventHubConsumerGroupName, m_partitionId, EventPosition.FromStart());
Thanks for any tips.
Will
How many partitions do you have? I can't see in your code how you make sure you read all partitions in the default consumer group.
Is there any specific reason why you are using a PartitionReceiver instead of an EventProcessorHost?
To me, SF seems like a perfect fit for using the event processor host. I see there is already an SF-integrated solution that uses stateful services for checkpointing.
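For reference, a minimal sketch of wiring up an EventProcessorHost from the Microsoft.Azure.EventHubs.Processor package; MyEventProcessor stands in for your IEventProcessor implementation, and the connection strings and lease container name are placeholders:
var host = new EventProcessorHost(
    "NameOfMyEventHub",
    PartitionReceiver.DefaultConsumerGroupName,
    "<event hub connection string>",   // placeholder
    "<storage connection string>",     // placeholder; leases and checkpoints live here
    "eventhub-leases");                // placeholder lease container name

// The host distributes partitions across instances and drives your
// IEventProcessor implementation, which owns the checkpointing.
await host.RegisterEventProcessorAsync<MyEventProcessor>();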

Queue-triggered function app not getting completed

We are using a queue-triggered function app on the premium plan, where messages contain details such as an Azure subscription name. For each subscription we make many API calls, especially to Azure storage accounts (around 400 to 500). Since the 'list' API call to a storage account is limited to 100 calls per 5 minutes, we get a 429 error response on the 101st call. To mitigate this we applied exponential retry logic (we tried both our own and the Polly library), which calls again after a certain delay. This works for some subscriptions but fails for many, where the retry logic does not retry after the first attempt (we kept 3 retries with a 60-second delay). While monitoring the function app through live metrics, we also observed that the CPU usage of some function instances sometimes drops to zero (even though we do some work, such as logging or a for loop in the delay operation, to keep the function alive), which leads to that particular function instance being killed, the message being pushed back to the queue, and the process starting again with a fresh instance.
Note that since many subscriptions are processed in parallel, the function app automatically scales up as required. Also, since we are using the premium plan, one VM is always in the on state. So the killing of any instance (which makes around 400 to 500 storage API calls for a particular subscription) is weird, since in our delay the thread sleep time is only 10 seconds for around 6, 12, or 18 (Time_delay) iterations. The delay function below is used in our retry logic.
private void Delay(int Time_delay, string requestUri, int retryCount)
{
    for (int i = 0; i < Time_delay; i++)
    {
        _logger.LogWarning($"Sleep initiated for id: {requestUri}, RetryCount: {retryCount} CurrentTimeDelay: {Time_delay}");
        Thread.Sleep(10000);
        _logger.LogWarning($"Sleep completed for id: {requestUri}, RetryCount: {retryCount} CurrentTimeDelay: {Time_delay}");
    }
}
Note: the function app does not throw any exception other than the dependency 429 error response.
Would it be possible for you to requeue instead of using Thread.Sleep? You can use an initial visibility delay when requeuing:
public class Function1
{
    [FunctionName(nameof(TryDoWork))]
    public static async Task TryDoWork(
        [QueueTrigger("some-queue")] SomeItem item,
        [Queue("some-queue")] CloudQueue queue)
    {
        var result = _SomeService.SomeWork(item);
        if (result == 429)
        {
            item.Retries++;
            var json = JsonConvert.SerializeObject(item);
            var message = new CloudQueueMessage(json);
            var delay = TimeSpan.FromSeconds(item.Retries);
            await queue.AddMessageAsync(message, null, delay, null, null);
        }
    }
}
It might be that the sleeping is causing some wonky function app behavior. I think I remember reading about issues pertaining to the usage of Thread.Sleep, but I can't find them right now.
Also, you might want to add some sort of handling for messages that end up retrying more than 3 times (or however many you think is reasonable), as sketched below.
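For example, a minimal sketch of such handling, extending the snippet above with a hypothetical poison queue binding ([Queue("some-queue-poison")] CloudQueue poisonQueue):
if (result == 429)
{
    item.Retries++;
    var json = JsonConvert.SerializeObject(item);
    if (item.Retries > 3)
    {
        // Park the message for manual inspection instead of retrying forever.
        await poisonQueue.AddMessageAsync(new CloudQueueMessage(json));
        return;
    }
    var delay = TimeSpan.FromSeconds(Math.Pow(2, item.Retries)); // exponential backoff
    await queue.AddMessageAsync(new CloudQueueMessage(json), null, delay, null, null);
}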

IEventProcessor processes messages twice

Hi, I am using Event Hub for ingesting data at a frequency of 1 second.
I am continuously pushing simulated data from a console application to Event Hub and then storing it in a SQL database.
It has now been more than 5 days, and I have found that every day my receiver sometimes processes data twice, which is why I get duplicate records in the database.
Since it happens only once or twice a day, I am not even able to trace it.
Has anyone faced such a situation so far?
Or is it possible that the host can process the same messages twice?
Or is it an issue with the async behavior of the receiver?
Looking forward to your help...
Code snippet :
public class SimpleEventProcessor : IEventProcessor
{
    Stopwatch checkpointStopWatch;

    async Task IEventProcessor.CloseAsync(PartitionContext context, CloseReason reason)
    {
        Console.WriteLine("Processor Shutting Down. Partition '{0}', Reason: '{1}'.", context.Lease.PartitionId, reason);
        if (reason == CloseReason.Shutdown)
        {
            await context.CheckpointAsync();
        }
    }

    Task IEventProcessor.OpenAsync(PartitionContext context)
    {
        Console.WriteLine("SimpleEventProcessor initialized. Partition: '{0}', Offset: '{1}'", context.Lease.PartitionId, context.Lease.Offset);
        this.checkpointStopWatch = new Stopwatch();
        this.checkpointStopWatch.Start();
        return Task.FromResult<object>(null);
    }

    async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (EventData eventData in messages)
        {
            string data = Encoding.UTF8.GetString(eventData.GetBytes());
            // store data into SQL database / database call.
        }
        // Call checkpoint every 5 minutes, so that worker can resume processing from 5 minutes back if it restarts.
        if (this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(0))
        {
            await context.CheckpointAsync();
            this.checkpointStopWatch.Restart();
        }
        if (messages.Count() > 0)
            await context.CheckpointAsync();
    }
}
Event Hub guarantees at least once delivery:
It has the following characteristics:
low latency
capable of receiving and processing millions of events per second
at least once delivery
So you can expect this to happen.
Also take into account the situation where checkpointing has just occurred, then some more messages (let's call them A and B) are processed, and then the process fails. The next time the reading process starts after the failure, message consumption will begin at the last checkpointed message; in other words, messages A and B will be processed again.
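Given at-least-once delivery, the usual mitigation is to make the database write idempotent, for example by treating the partition id plus EventData.SequenceNumber as a unique key. A minimal sketch, where the table and column names are made up and connection is an open SqlConnection:
// Replayed events become no-ops because the (PartitionId, SequenceNumber)
// pair has already been inserted.
string sql = @"
    IF NOT EXISTS (SELECT 1 FROM dbo.Readings
                   WHERE PartitionId = @pid AND SequenceNumber = @seq)
        INSERT INTO dbo.Readings (PartitionId, SequenceNumber, Payload)
        VALUES (@pid, @seq, @payload);";

using (var cmd = new SqlCommand(sql, connection))
{
    cmd.Parameters.AddWithValue("@pid", context.Lease.PartitionId);
    cmd.Parameters.AddWithValue("@seq", eventData.SequenceNumber);
    cmd.Parameters.AddWithValue("@payload", data);
    await cmd.ExecuteNonQueryAsync();
}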

Event Hub receiver does not read all messages

I implemented 2 simple services in Service Fabric, which communicate over Event Hub, and I encounter very strange behavior.
The listener service reads the messages using a PartitionReceiver with the ReceiveAsync method. It always reads the messages from the start of the partition, but even though the maxMessageCount parameter is set to a very high number, which definitely exceeds the number of messages in the partition, it reads only a "random" amount of messages, never the full list. It always starts to read correctly from the beginning of the partition, but it almost never reads the full list of messages that should be present there...
Did I miss something in the documentation and this is normal behavior, or am I right that this is very strange behavior?
A code snippet of my receiver service:
PartitionReceiver receiver = eventHubClient.CreateReceiver(PartitionReceiver.DefaultConsumerGroupName, Convert.ToString(partition), PartitionReceiver.StartOfStream);
ServiceEventSource.Current.Write("RecieveStart");
IEnumerable<EventData> ehEvents = null;
int i = 0;
do
{
    try
    {
        ehEvents = await receiver.ReceiveAsync(1000);
        break;
    }
    catch (OperationCanceledException)
    {
        if (i == NUM_OF_RETRIES - 1)
        {
            await eventHubClient.CloseAsync();
            StatusCode(500);
        }
    }
    i++;
} while (i < NUM_OF_RETRIES);
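One thing to note about the API: ReceiveAsync returns at most maxMessageCount events per call and usually fewer (whatever happens to be available when the call completes), so a single call is never guaranteed to drain the partition. A minimal sketch of reading until the receiver times out with no further events:
var allEvents = new List<EventData>();
IEnumerable<EventData> batch;
// ReceiveAsync returns null when no events arrive within the wait time.
while ((batch = await receiver.ReceiveAsync(1000, TimeSpan.FromSeconds(5))) != null)
{
    allEvents.AddRange(batch);
}
// allEvents now holds everything that was available in the partition.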

Getting Data from EventHub is delayed

I have an EventHub configured in Azure, and a consumer group for reading the data. It was working fine for some days. Suddenly, I see there is a delay in incoming data (around 3 days). I use a Windows service to consume the data on my server. I have around 500 incoming messages per minute. Can anyone help me figure this out?
It might be that you are processing the items too slowly. The work to be done then grows and you lag further and further behind.
To get some insight into where you are in the event stream, you can use code like this:
private void LogProgressRecord(PartitionContext context)
{
    if (namespaceManager == null)
        return;

    var currentSeqNo = context.Lease.SequenceNumber;
    var lastSeqNo = namespaceManager.GetEventHubPartition(context.EventHubPath, context.ConsumerGroupName, context.Lease.PartitionId).EndSequenceNumber;
    var delta = lastSeqNo - currentSeqNo;

    logWriter.Write(
        $"Last processed seqnr for partition {context.Lease.PartitionId}: {currentSeqNo} of {lastSeqNo} in consumergroup '{context.ConsumerGroupName}' (lag: {delta})",
        EventLevel.Informational);
}
The namespaceManager is built like this:
namespaceManager = NamespaceManager.CreateFromConnectionString("Endpoint=sb://xxx.servicebus.windows.net/;SharedAccessKeyName=yyy;SharedAccessKey=zzz");
I call this logging method in the CloseAsync method:
public Task CloseAsync(PartitionContext context, CloseReason reason)
{
    LogProgressRecord(context);
    return Task.CompletedTask;
}
logWriter is just some logging class I have used to write info to blob storage.
It now outputs messages like
Last processed seqnr for partition 3: 32780931 of 32823804 in consumergroup 'telemetry' (lag: 42873)
So when the lag is very high, you could be processing events that occurred a long time ago. In that case you need to scale your processor up/out.
If you notice a lag, you should measure how long it takes to process a given number of items. You can then try to optimize performance and see whether it improves. We did it like this:
public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> events)
{
    try
    {
        stopwatch.Restart();
        // process items here
        stopwatch.Stop();
        await CheckPointAsync(context);
        logWriter.Write(
            $"Processed {events.Count()} events in {stopwatch.ElapsedMilliseconds}ms using partition {context.Lease.PartitionId} in consumergroup {context.ConsumerGroupName}.",
            EventLevel.Informational);
    }
    catch (Exception ex)
    {
        // exception handling elided in the original snippet
        logWriter.Write(ex.Message, EventLevel.Error);
    }
}
