Azure Event Hubs: Offset vs Sequence number

I see this question being asked on a lot of forums but none of them are solving my confusion.
This documentation seems to suggest that both offset and sequence number are unique within a partition.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.servicebus.messaging.eventdata?view=azure-dotnet
It is clearly understood that the sequence number is an integer which increments sequentially:
https://social.msdn.microsoft.com/Forums/azure/en-US/acc25820-a28a-4da4-95ce-4139aac9bc44/sequence-number-vs-offset?forum=azureiothub#:~:text=SequenceNumber%20is%20the%20logical%20sequence,the%20Event%20Hub%20partition%20stream.&text=The%20sequence%20number%20can%20be,authority%20and%20not%20by%20clients.
But what of the offset? Is it unique only within a partition, or across all partitions within a consumer group? If it is the former, why have two different variables?

An offset is a relative position within the partition's event stream. In the current Event Hubs implementation, it represents the number of bytes from the beginning of the partition to the first byte in a given event.
Within the context of a partition, the offset is unique. The same offset value may appear in other partitions - it should not be treated as globally unique across the Event Hub.
If it is the former, why have two different variables?
The offset is guaranteed only to uniquely identify an event within the partition. It is not safe to reason about the offset's value or about how it changes from event to event.
Sequence Number, on the other hand, follows a predictable pattern where numbering is contiguous and unique within the scope of a partition. Because of this, it is safe to use in calculations like "if I want to rewind by 5 events, I'll take the current sequence number and subtract 5."
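As a concrete sketch of the difference, using the current Java client (azure-messaging-eventhubs); the connection string, hub name, partition id, and the offset value 57344 are placeholders, not values from the question:

import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubConsumerClient;
import com.azure.messaging.eventhubs.models.EventPosition;

public class RewindExample {
    public static void main(String[] args) {
        EventHubConsumerClient consumer = new EventHubClientBuilder()
            .connectionString("<connection-string>", "<event-hub-name>")
            .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
            .buildConsumerClient();

        long current = 105; // sequence number of the last event processed

        // Safe: sequence numbers are contiguous within a partition, so
        // "rewind by 5 events" is well-defined arithmetic.
        EventPosition rewound = EventPosition.fromSequenceNumber(current - 5);

        // Also valid, but only as an opaque bookmark for the same partition:
        // never do arithmetic on an offset or reuse it across partitions.
        EventPosition bookmark = EventPosition.fromOffset(57344L);

        consumer.receiveFromPartition("0", 10, rewound)
                .forEach(e -> System.out.println(e.getData().getSequenceNumber()));
        consumer.close();
    }
}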

The stored offset (the checkpoint) is tied to a consumer group, not just to a partition.
For every consumer group a small storage container is created, and each consumer group has its own read offset, so you can have several consumer groups and every group will read the Event Hub data at its own pace. In other words, the offset container holds a small blob with data about the read checkpoint, which advances every time you execute context.CheckpointAsync(). If you delete the container created for the consumer group, the group will begin to read the data from the beginning again:
List<EventProcessorHost> eventProcessorHosts = new List<EventProcessorHost>();

var eventProcessorHost = new EventProcessorHost(
    EventHubName,
    PartitionReceiver.DefaultConsumerGroupName,
    EventHubConnectionString,
    StorageConnectionString,
    StorageContainerName);

eventProcessorHosts.Add(eventProcessorHost);
eventProcessorHosts[0].RegisterEventProcessorAsync<SimpleEventProcessor>();
...

public Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (var eventData in messages)
    {
        var data = Encoding.UTF8.GetString(eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
        Console.WriteLine($"messages count: {messages.Count()} Message received. Partition: '{context.PartitionId}', Data: '{data}', thread: {Thread.CurrentThread.ManagedThreadId}");
    }

    // Writes the current offset and sequence number to the checkpoint store via the
    // checkpoint manager.
    return context.CheckpointAsync();
}
Check the storage container that was passed to the EventProcessorHost constructor.

Related

Azure Function with Event Hub trigger receives a weird number of events

I have an Event Hub and an Azure Function connected to it. With small amounts of data everything works well, but when I tested it with 10,000 events, I got very peculiar results.
For test purposes I send the numbers 0 to 9999 into the Event Hub and log the data in Application Insights and in Service Bus. In the first test I see in Azure that the hub received exactly 10,000 events, but Service Bus and AI got all messages between 0 and 4500, and only every second message after 4500 (so about 30% were lost). In the second test, I got all messages from 0 to 9999, but every second message between 3200 and 3500 was duplicated. I would like to get every message exactly once; what did I do wrong?
public async Task Run([EventHubTrigger("%EventHubName%", Connection = "AzureEventHubConnectionString")] EventData[] events, ILogger log)
{
    // _random, _context, and queueClient are class-level fields in the original app.
    int id = _random.Next(1, 100000);
    _context.Log.TraceInfo("Started. Count: " + events.Length + ". " + id); // AI log

    foreach (var message in events)
    {
        // Forward each event body to Service Bus for logging.
        var mess = new Message();
        mess.Body = message.EventBody.ToArray();
        await queueClient.SendAsync(mess);
    }

    _context.Log.TraceInfo("Completed. " + id); // AI log
}
By using EventData[] events, you are reading events from the hub in batch mode; that's why you see X events processed at one moment and the next batch processed a second later.
Instead of EventData[], simply use EventData (see the sketch below).
When you send events to the hub, check that all of them are sent with the same partition key if you want to try batch processing; otherwise they can be split across several partitions depending on TUs (throughput units), PUs (processing units), and CUs (capacity units).
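For illustration, single-event processing with an Event Hub trigger might look like this sketch; it uses the Azure Functions Java annotations (the hub name is a placeholder), and in C# the change is simply EventData instead of EventData[]:

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.Cardinality;
import com.microsoft.azure.functions.annotation.EventHubTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;

public class SingleEventFunction {
    // cardinality = ONE delivers one event per invocation instead of a batch.
    @FunctionName("ProcessOneEvent")
    public void run(
            @EventHubTrigger(name = "event", eventHubName = "my-hub",
                             connection = "AzureEventHubConnectionString",
                             cardinality = Cardinality.ONE) String event,
            final ExecutionContext context) {
        context.getLogger().info("Received: " + event);
    }
}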
Refer to this document on the throughput limits for the Basic, Standard, and Premium tiers; for example, egress is limited to 2 MB per second or 4096 events per second per throughput unit.
There are a couple of things likely happening, though I can only speculate with the limited context that we have. Knowing more about the testing methodology, tier of your Event Hubs namespace, and the number of partitions in your Event Hub would help.
The first thing to be aware of is that the timing between when an event is published and when it is available in a partition to be read is non-deterministic. When a publish operation completes, the Event Hubs broker has acknowledged receipt of the events and taken responsibility for ensuring they are persisted to multiple replicas and made available in a specific partition. However, it is not a guarantee that the event can immediately be read.
Depending on how you sent the events, the broker may also need to route events from a gateway by performing a round-robin or applying a hash algorithm. If you're looking to optimize the time from publish to availability, taking ownership of partition distribution and publishing directly to a partition can help, as can ensuring that you're publishing with the right degree of concurrency for your host environment and scenario.
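As a sketch of "taking ownership of partition distribution": publishing a batch directly to a fixed partition with the current Java client could look like this (connection string, hub name, and partition id are placeholders):

import com.azure.messaging.eventhubs.EventData;
import com.azure.messaging.eventhubs.EventDataBatch;
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubProducerClient;
import com.azure.messaging.eventhubs.models.CreateBatchOptions;

public class DirectPartitionPublisher {
    public static void main(String[] args) {
        EventHubProducerClient producer = new EventHubClientBuilder()
            .connectionString("<connection-string>", "<event-hub-name>")
            .buildProducerClient();

        // Pinning the batch to partition "0" bypasses gateway routing
        // (round-robin or hash-based), removing one source of latency variance.
        CreateBatchOptions options = new CreateBatchOptions().setPartitionId("0");
        EventDataBatch batch = producer.createBatch(options);
        for (int i = 0; i < 100; i++) {
            batch.tryAdd(new EventData(Integer.toString(i)));
        }
        producer.send(batch);
        producer.close();
    }
}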
With respect to duplication, it's important to be aware that Event Hubs offers an "at least once" guarantee; your consuming application should expect some duplicates and needs to be able to handle them in the way that is appropriate for your application scenario.
Azure Functions uses a set of event processors in its infrastructure to read events. The processors collaborate with one another to share work and distribute the responsibility for partitions between them. Because collaboration takes place using storage as an intermediary to synchronize, there is an overlap of partition ownership when instances are scaled up or scaled down, during which time the potential for duplication is increased.
Functions decides to scale based on the number of events that it sees waiting in partitions to be read. In the case of your test, if your publishing pattern ramps up rapidly and Functions sees the event backlog grow to the point that it feels the need to scale out by multiple instances, you'll see more duplication than you otherwise would for a period of 10-30 seconds until partition ownership normalizes. To mitigate this, gradually increasing the publishing speed over a 1-2 minute period can help smooth out the scaling and reduce (but not eliminate) duplication.
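Since duplicates cannot be avoided entirely, the consuming side needs to handle them. A minimal, hypothetical sketch of one approach, assuming each event carries a unique id (nothing in the question guarantees this):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DedupingHandler {
    // Bounded, access-ordered set of recently seen event ids (a crude LRU).
    private final Set<String> seen = Collections.newSetFromMap(
        new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > 10_000;
            }
        });

    public void handle(String eventId, byte[] body) {
        if (!seen.add(eventId)) {
            return; // duplicate delivery from the at-least-once guarantee; skip it
        }
        // ... process body ...
    }
}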

Stream Analytics query hits size limit

I'm new to Azure Stream Analytics. I have an Event hub as input source and now I'm trying to execute a simple query on this stream. An example query is like this:
SELECT
    COUNT(*)
INTO
    [output1]
FROM
    [input1] TIMESTAMP BY Time
GROUP BY
    TumblingWindow(second, 10)
So I want to count the events which arrived within a certain time frame.
When executing this query, I always get the following error:
Request exceeded maximum allowed size limit
I have already narrowed down the time window being checked, and I am certain that the number of events within it is not very big (at most a few hundred), so I'm not sure how to avoid this error.
Do you have a hint?
Thanks!
Request exceeded maximum allowed size limit
This error (which, I believe, should be more explicit) indicates that you violated the Azure Stream Analytics resource and object limits.
It's not just about quantity; it's also about size. Please check your source inputs' size, or try to reduce the window size and test again.
1. Does the record size of the source query mean that one event can only be 64 KB, or does this parameter mean 64 K events?
It means the size of one event should be below 64 KB.
2. Is there a possibility to use Stream Analytics to select only certain subfields of the event, or is the only way to reduce the event size before it is sent to the event hub?
As far as I know, ASA only collects data for processing, so the size depends entirely on the source side and your query SQL. Since you need to use COUNT, I'm afraid you have to do something on the Event Hub side. Here is my thought:
Use an Azure Function with an Event Hub trigger: when an event streams into the event hub, trigger the function, pick only the key-values you need, and save them into another event hub namespace (just to reduce the size of the source events). Since you only need to COUNT records, I think this would work for you.
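A minimal sketch of that idea in Java (the hub names, connection setting, and the "Time" field are hypothetical, and Gson is used only for illustration):

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.EventHubOutput;
import com.microsoft.azure.functions.annotation.EventHubTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;

public class SlimmingFunction {
    // Reads from the large source hub and emits a trimmed event to a second
    // hub, which Stream Analytics then uses as its input.
    @FunctionName("SlimEvents")
    @EventHubOutput(name = "output", eventHubName = "slim-hub", connection = "EventHubConnection")
    public String run(
            @EventHubTrigger(name = "event", eventHubName = "source-hub",
                             connection = "EventHubConnection") String event,
            final ExecutionContext context) {
        JsonObject full = JsonParser.parseString(event).getAsJsonObject();
        JsonObject slim = new JsonObject();
        // Keep only the fields the ASA query needs; drop everything else.
        slim.add("Time", full.get("Time"));
        return slim.toString();
    }
}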

LMAX Disruptor Partition and join batch

So currently I have an Executor implementation with a blocking queue, and the specifics of the implementation are: I have a list of items per request, I divide them into partitions, each partition is then computed, and finally they are joined to produce the final list.
How do I go about implementing this in LMAX? I see that once I partition the list and push the partitions into the RingBuffer, each partition is treated as a separate item, so I am joining them by hand.
something like,
ConcurrentHashMap<Long, LongAdder> map = new ConcurrentHashMap<>();

@Override
public List<SomeTask> score(final List<SomeTask> tasks) {
    long id = tasks.get(0).id;
    map.put(id, new LongAdder());

    // Publish each task of this partitioned request to the ring buffer.
    for (SomeTask task : tasks) {
        producer.onData(task);
    }

    // Busy-wait until the consumers have incremented the adder once per task.
    while (map.get(id).intValue() != tasks.size()) ;

    map.remove(id);
    return tasks;
}
Is there a clean way to do this? I looked at https://github.com/LMAX-Exchange/disruptor/tree/master/src/test/java/com/lmax/disruptor/example and at KeyedBatching specifically, but they seem to batch and execute on one thread.
Currently each partition takes around 200 ms to compute, and I wanted to execute them in parallel.
Any help is greatly appreciated.
I think you should take a look at the worker-pool options, followed by a final event processor which re-combines the shards; a sketch follows below.
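A rough sketch of that shape with Disruptor 3.x (where worker pools are available); SomeTask, score, and join stand in for the question's own types and logic:

import java.util.List;

import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.WorkHandler;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class PartitionedScoring {
    // One ring-buffer event carries one partition of a request.
    static class PartitionEvent {
        long requestId;
        List<SomeTask> partition;
    }

    static class SomeTask { long id; }

    static void score(List<SomeTask> partition) { /* ~200 ms of computation */ }

    static void join(long requestId, List<SomeTask> partition) { /* re-combine the shards */ }

    public static void main(String[] args) {
        Disruptor<PartitionEvent> disruptor =
            new Disruptor<>(PartitionEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Worker pool: each event is claimed by exactly one worker, so the
        // partitions of a request are scored in parallel across workers.
        WorkHandler<PartitionEvent> scorer = event -> score(event.partition);

        // A final single-threaded handler re-combines the shards per request id.
        EventHandler<PartitionEvent> joiner =
            (event, sequence, endOfBatch) -> join(event.requestId, event.partition);

        disruptor.handleEventsWithWorkerPool(scorer, scorer, scorer).then(joiner);
        disruptor.start();
    }
}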

Hazelcast Jet stream processing end window emission

I've stumbled across an interesting observation while trying to cross-check the results of aggregation in my stream processing. I created a test case in which a pre-defined data set was fed into a journaled map, and the aggregation was supposed to produce exactly one result, since the window size/slide and the amount of data with pre-determined timestamps were aligned. However, the result was never published: the window was not emitted, although a few accumulate/combine operations were executed. It works differently with real data, but the result of the aggregation is always 'behind' the amount of data drawn from the source. I guess this has something to do with watermarks? How can I make sure in my test case that it doesn't wait for more data to come? Will allowed lateness help?
First, I'll refer you to the two sections in the manual which describe how watermarks work and also talk about the concept of stream skew:
http://docs.hazelcast.org/docs/jet/0.6.1/manual/#unbounded-stream-processing
http://docs.hazelcast.org/docs/jet/0.6.1/manual/#stream-skew
The concept of "current time" in Jet only advances as long as there are events with advancing timestamps. There are typically several factors at play here:
Allowed lateness: this defines your lag per partition, assuming you are using a partitioned source like Kafka. It describes the tolerable degree of out-of-orderness of timestamps within a single partition. If the allowed lateness is 2 seconds, a window will only close once you have received an event at N + 2 seconds across all input partitions.
Stream skew: This can happen when for example you have 10 Kafka partitions but only 3 are producing any events. As Jet coalesces watermarks from all partitions, this will cause the stream to wait until the other 7 partitions have some data. There's a timeout after which these partitions are considered idle, but this is by default 60 sec and currently not configurable in the pipeline API. So in this case you won't have any output until these partitions are marked as idle.
When using test data, it's quite common to have very low volume of events and many partitions, which can make it a challenge to advance the time correctly.
The points in Can Gencer's answer are valid. But for a test you can also use a batch source, such as Sources.list. By adding timestamps to a BatchStage you convert it to a StreamStage, on which you can do window aggregation. The aggregate transform will emit pending windows at the end of the batch.
JetInstance inst = Jet.newJetInstance();

IListJet<TimestampedEntry<String, Integer>> list = inst.getList("data");
list.add(new TimestampedEntry<>(1, "a", 1));
list.add(new TimestampedEntry<>(1, "b", 2));
list.add(new TimestampedEntry<>(1, "a", 3));
list.add(new TimestampedEntry<>(1, "b", 4));

Pipeline p = Pipeline.create();
p.drawFrom(Sources.<TimestampedEntry<String, Integer>>list("data"))
 .addTimestamps(TimestampedEntry::getTimestamp, 0)
 .groupingKey(TimestampedEntry::getKey)
 .window(tumbling(1))
 .aggregate(AggregateOperations.summingLong(TimestampedEntry::getValue))
 .drainTo(Sinks.logger());

inst.newJob(p).join();
inst.shutdown();
The above code prints:
TimestampedEntry{ts=01:00:00.002, key='a', value='4'}
TimestampedEntry{ts=01:00:00.002, key='b', value='6'}
Remember to keep the data in the list ordered by time, since we use allowedLag=0.
Answer is valid for Jet 0.6.1.

Spring Kafka Listening to all topics and adjusting partition offsets

Based on the documentation at spring-kafka, I am using the annotation-based @KafkaListener to configure my consumer.
What I see is that:
Unless I set the offset to zero, on start the Kafka consumer picks up future messages and not the existing ones. (I understand this is expected behavior because I am not setting the offset to what I want.)
I see an option in the documentation to specify a topic + partition combination along with an offset of zero, but if I do this, I have to explicitly specify which topic I want my consumer to listen to.
Using the second approach above, this is how my consumer looks now:
@KafkaListener(id = "{group.id}",
        topicPartitions = {
            @TopicPartition(topic = "${kafka.topic.name}",
                    partitionOffsets = @PartitionOffset(partition = "0", initialOffset = "0"))
        },
        containerFactory = "kafkaListenerContainerFactory")
public void listen(@Payload String payload,
                   Acknowledgment ack) throws InterruptedException, IOException {
    logger.debug("This is what we received in the Kafka Consumer = " + payload);
    idService.process(payload);
    ack.acknowledge();
}
While I understand that there is an option to specify the "topicPattern" wildcard or a "topics" list as part of the annotation configuration, I don't see a place where I can provide the offset value to start from zero for the topics / topic patterns listed. Is there a way to do a combination of both? Please advise.
When using topics and topicPatterns (rather than explicitly declaring the partitions), Kafka decides which consumer instance will get which partitions.
Kafka will allocate the partitions and the initial offset will be the last committed for that group id. You cannot currently change that offset but we are considering adding a seek function.
If you always want to start at the first available offset, use a unique group id (e.g. UUID.randomUUID().toString()) and set
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Since Kafka will have no existing offset for that group id it will use that property to determine where to start.
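As a sketch, wiring that up in a Spring Kafka consumer factory could look like this (the bootstrap server address and String deserializers are assumptions for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

public class ConsumerFactoryConfig {
    public DefaultKafkaConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // A fresh group id on every start means no committed offset exists...
        props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString());
        // ...so this property decides where to begin: the earliest offset.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(props);
    }
}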
You can also use MANUAL ack mode and never ack, which will effectively do the same thing.
