Understanding ConsumerGroup behaviour in Azure EventHub

I am writing a pub/sub implementation which uses Azure EventHub as the underlying event ingestion service. In my application, the publishers will publish events to a particular EventHub partition and the consumers who are subscribed to that particular partition will receive events. Usually a consumer will be assigned to a unique EventHub ConsumerGroup, and in some cases there can be multiple consumer assignments to the same ConsumerGroup.
Let's say I have two consumers (consumer-1, consumer-2) in the same ConsumerGroup (consumer-group-1) who are subscribed to events of a particular EventHub partition (partition '0' of event-hub-1).
When we send an event to partition '0' of 'event-hub-1', how does message delivery happen?
Will both consumers (consumer-1, consumer-2) get the same message?
Or will the ConsumerGroup load-balance the messages among the consumers, as in traditional Kafka, so that only one consumer gets the message?
Sample Code: https://github.com/ballerina-platform/ballerina-standard-library/issues/3483#issuecomment-1272824977
Note:
The application is written in the Ballerina language, which internally uses the Kafka Java client.

A consumergroup is a "group of consumers" as the name suggests. Each consumergroup gets a copy of the message and one consumer of that consumergroup out of many receives that message. So, regarding your scenario, either consumer-1 or consumer-2 will get the message since they are in the same consumergroup.

A Kafka consumer supports two models for consuming messages from a topic:
Join a ConsumerGroup and subscribe to the topic.
If the consumer knows which partition to read events from, assign itself to the relevant partition of the topic.
The two models are mutually exclusive: a given Kafka consumer should use only one of them to consume messages.
When a Kafka consumer joins a particular ConsumerGroup, it is assigned a set of partitions from the topic to which it has subscribed. Two consumers from the same ConsumerGroup are not assigned to the same partition(s) of a given topic. According to the Kafka documentation, this is handled either by ZooKeeper or by the Kafka cluster itself.
But when we assign partitions manually to a consumer, the consumer will not use the consumer's group management functionality.
In the above scenario I have manually assigned partitions to the consumers, so neither consumer uses the group management functionality. As a result, both consumers will get all the messages sent to that particular partition. This is explained in the EventHub documentation.
For more information about the inner workings of the Kafka consumer, see the section "Standalone Consumer: Why and How to Use a Consumer Without a Group" in Chapter 4 of Kafka: The Definitive Guide.
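The difference between the two models can be sketched with a toy in-memory model. This is not the Kafka client API: the function names, the message labels, and the round-robin spreading of partitions are all assumptions made for illustration (a real group coordinator uses a configurable assignment strategy).

```python
from collections import defaultdict

# Toy in-memory model (not the real Kafka client): a topic maps each
# partition number to an append-only list of messages.
topic = {0: ["m0", "m1", "m2"], 1: ["m3", "m4"]}


def group_managed_assignment(consumers, partitions):
    """Mimic subscribe(): every partition ends up with exactly one owner.

    Round-robin spreading is an assumption made for this sketch.
    """
    owned = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        owned[consumers[i % len(consumers)]].append(partition)
    return dict(owned)


def manual_assignment(requests):
    """Mimic assign(): each consumer gets exactly the partitions it asked for."""
    return {consumer: list(parts) for consumer, parts in requests.items()}


def read_all(assignment):
    """Return the messages each consumer would read from its partitions."""
    return {
        consumer: [msg for p in parts for msg in topic[p]]
        for consumer, parts in assignment.items()
    }


subscribed = read_all(group_managed_assignment(["consumer-1", "consumer-2"], topic))
assigned = read_all(manual_assignment({"consumer-1": [0], "consumer-2": [0]}))

print(subscribed)  # partitions 0 and 1 go to different consumers
print(assigned)    # both consumers read every message in partition 0
```

In the manual case nothing coordinates the two consumers, which is why both of them see every message in partition 0.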

Related

When does it make sense for multiple consumers from the same consumer group to read from the same partition?

I'm learning the concepts behind Azure Event Hubs and Kafka.
A consumer group can have 1 or more consumers. And one or more consumers from a consumer group can read 1 or more partitions.
1 consumer from the consumer group should ideally consume from 1 Partition.
I'm trying to understand in what scenario does it make sense for multiple consumers from the same consumer group to read from the same partition?
"I'm trying to understand in what scenario does it make sense for multiple consumers from the same consumer group to read from the same partition?"
I can think of none. If you use the official SDKs by using an event processor for example, it isn't even possible since each consumer locks a single partition, see the docs.
"1 consumer from the consumer group should ideally consume from 1 Partition."
That is correct. When using the SDK a consumer in a consumer group takes a lock on the partition it is reading from. If multiple consumers in the same consumer group want to read from the same partition they will have to compete to get the lock, which is not efficient.
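The locking behaviour described above can be pictured with a small ownership table. This is a toy sketch, not the Event Hubs SDK: the class `OwnershipTable`, its `try_claim` method, and the consumer names are hypothetical, and real processors also renew and expire their claims.

```python
class OwnershipTable:
    """Toy model of per-partition ownership inside a consumer group."""

    def __init__(self):
        # Maps (consumer_group, partition_id) to the consumer holding the lock.
        self._owners = {}

    def try_claim(self, consumer_group, partition_id, consumer):
        """Return True if `consumer` holds the lock after this call."""
        key = (consumer_group, partition_id)
        # setdefault stores the claimant only when nobody owns the key yet.
        owner = self._owners.setdefault(key, consumer)
        return owner == consumer


table = OwnershipTable()
first = table.try_claim("consumer-group-1", "0", "consumer-a")
second = table.try_claim("consumer-group-1", "0", "consumer-b")
other_group = table.try_claim("consumer-group-2", "0", "consumer-b")

print(first, second, other_group)  # True False True
```

The second claim fails because the lock is scoped to the consumer group and partition pair; a consumer in a different group can still claim the same partition.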

Azure Event Hub - Partitions usecase question

I'm new to Azure Event Hubs and I'm having a hard time understanding the Partitions.
I have the following scenario:
1 Event Hub Namespace
1 actual Event Hub
2 Partitions in the Event Hub
2 Consumer groups
1 Event Producer
2 Event Consumers, one per Consumer group
The Event Producer sends out 10 events to the Event Hub. The events get distributed to the partitions with a round-robin mechanism. So the Event Hub looks like this:
Partition 1: [0] [2] [5] [6] [8]
Partition 2: [1] [3] [4] [7] [9]
When the Event Consumers start reading, each consumer would end up with only a part of the events, like so:
Consumer 1: Gets events 0,2,5,6,8
Consumer 2: Gets events 1,3,4,7,9
Is it true that a Consumer group can only access a subset of the Partitions?
My assumption is that the Event Hub architecture supports broadcasting of events to multiple consumers. And that every consumer wants all the events.
But it seems to me that Event Hub isn't designed to have all consumers get all the events, but I don't understand why that would be useful.
Can anyone help me understand Partitions?
Each Event Hubs partition is a persistent stream of events that is available to all consumers, regardless of which consumer group they are associated with. Any consumer can read from any partition at any point in the event stream.
Partitions are used to help scale resources to support a greater degree of concurrency and increase throughput for the Event Hub. Generally speaking, the more partitions that are in use, the more concurrent operations the Event Hub can handle. More information can be found in the Event Hubs overview.
"My assumption is that the Event Hub architecture supports broadcasting of events to multiple consumers."
Not quite. Consumers are responsible for pulling events from the partitions of an Event Hub; events are not pushed to them. Any consumer with permissions can connect to a partition and read independently. Events are not removed once read; they remain in the partition until their age exceeds the retention period.
"But it seems to me that Event Hub isn't designed to have all consumers get all the events"
That is not correct. Event Hubs exposes the events for any consumer wishing to read them. Using a client like the EventProcessorClient from the Event Hubs SDK allows an application to consume from all partitions without having to manage each partition consumer individually.
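As a rough illustration of this answer, the sketch below distributes 10 events over 2 partitions and lets one consumer per consumer group read every partition. It is a toy model rather than the Event Hubs SDK: the helper names are hypothetical, and strict alternation is a simplification of the service's round-robin behaviour.

```python
from itertools import cycle


def distribute_round_robin(events, partition_count):
    """Place events into partitions by strict alternation (a simplification)."""
    partitions = [[] for _ in range(partition_count)]
    for event, index in zip(events, cycle(range(partition_count))):
        partitions[index].append(event)
    return partitions


def read_every_partition(partitions):
    """A consumer that reads all partitions sees every event.

    Events in different partitions have no relative order, so the result
    is sorted only to make the output easy to compare.
    """
    return sorted(event for partition in partitions for event in partition)


partitions = distribute_round_robin(range(10), 2)
received = {
    group: read_every_partition(partitions)
    for group in ("consumer-group-1", "consumer-group-2")
}

print(partitions)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
print(received)    # each consumer group sees all ten events
```

A consumer only ends up with half of the events if it reads a single partition; reading is not destructive, so the second consumer group gets the same ten events as the first.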

Azure Service Bus ordered processing of message

Suppose an Azure Service Bus topic has a single subscription with some filter. A microservice A has created a SubscriptionClient for that subscription with a concurrency of 1 for reading messages. Suppose also that there are 2 replicas of service A, and that 3 messages were inserted into an unpartitioned Service Bus topic at times t1, t2 and t3.
t1 < t2 < t3
Is there a possibility that the t2 message gets delivered by Service Bus to Replica-2 before the t1 message gets delivered to Replica-1?
If not, what is the scaling strategy for Service Bus topics when processing subscriptions and adding replicas of the consuming microservice?
Note: By comparison, Kafka ensures that a message for one partition is delivered to only one replica, and to the one thread listening to that partition, so ordered processing of messages is guaranteed. I am not sure about Azure Service Bus topics: if multiple replicas are listening to the same subscription with different SubscriptionClients, can they receive/process messages out of order?
If you want to enable ordered message processing with Azure Service Bus, you have to use Sessions.
You can use a message's SessionId as an equivalent of the partitionId you may use in Kafka. This way, you can still scale your consumers, limited by the number of distinct SessionId values at any given time.
Message sessions. Implement workflows that require message ordering or message deferral.
Sessions provide concurrent de-multiplexing of interleaved message streams while preserving and guaranteeing ordered delivery.
When multiple concurrent receivers pull from the queue, the messages belonging to a particular session are dispatched to the specific receiver that currently holds the lock for that session. With that operation, an interleaved message stream in one queue or subscription is cleanly de-multiplexed to different receivers and those receivers can also live on different client machines, since the lock management happens service-side, inside Service Bus.
Service Bus Partitions spread out the load across multiple nodes, and don't provide any ordering guarantees.
Partitioning means that the overall throughput of a partitioned entity is no longer limited by the performance of a single message broker or messaging store. In addition, a temporary outage of a messaging store does not render a partitioned queue or topic unavailable.
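The session-based de-multiplexing described in this answer can be sketched with a toy model. This is not the Service Bus SDK: the session IDs, replica names, and the rule that a new session goes to the next receiver in turn are assumptions for illustration only.

```python
from collections import defaultdict

# Interleaved stream of (session_id, sequence_number) pairs, in arrival order.
messages = [
    ("order-A", 1),
    ("order-B", 1),
    ("order-A", 2),
    ("order-B", 2),
    ("order-A", 3),
]


def demultiplex(messages, receivers):
    """Deliver every message of a session to the receiver holding its lock."""
    session_owner = {}
    delivered = defaultdict(list)
    for session_id, sequence in messages:
        if session_id not in session_owner:
            # Assumption: an unlocked session is handed to the next receiver in turn.
            session_owner[session_id] = receivers[len(session_owner) % len(receivers)]
        delivered[session_owner[session_id]].append((session_id, sequence))
    return dict(delivered)


delivered = demultiplex(messages, ["replica-1", "replica-2"])
print(delivered)
```

Each replica receives the messages of its session in the original order, and the number of replicas that can do useful work is bounded by the number of distinct sessions, which matches the scaling limit mentioned above.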

IoT Hub Routing Messages to Only One Partition of Event Hub

I have a data pipeline set up in Azure where I send messages to an IoTHub which then routes those messages to an EventHub. When I read from the EventHub using the standard EventProcessorHost method, I find that only one of the partitions is being read from. I assume that only one partition is actually having messages routed to it. I have not specified a partition key anywhere and expect that the messages would be routed to all of the partitions of the event hub using round robin (as per the documentation at https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-programming-guide).
How can I configure my setup to route messages to all partitions of the event hub?
Like I said in the comment:
Is it possible you are only receiving data from one device? IoT Hub does automatic partitioning based on the deviceId, so the partition affinity might be the cause.
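The partition affinity suggested here can be pictured with a small sketch. The hash below is an assumption chosen for illustration and is not the function IoT Hub uses; `partition_for` and the device names are hypothetical.

```python
import hashlib


def partition_for(device_id, partition_count):
    """Map a device ID to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


# One device sending 100 messages always lands in the same partition.
single_device = {partition_for("device-1", 4) for _ in range(100)}

# Many devices spread their messages over several partitions.
many_devices = {partition_for(f"device-{n}", 4) for n in range(100)}

print(len(single_device))    # 1
print(sorted(many_devices))  # more than one partition in use
```

If only one device is sending, only one partition receives data, which would explain why a single partition appears to be read from.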

Kafka - can the consumer group ids be configured with spring-integration simple consumers?

Is it possible to configure kafka consumer group-ids with spring-integration simple consumers - the message driven channel option?
If not, what would be the best alternative if one topic needs to function both as a queue (among consumers with the same group id) and as a classical topic (between consumers with different group ids)?
