Azure Event Hub - Partitions use case question

I'm new to Azure Event Hubs and I'm having a hard time understanding the Partitions.
I have the following scenario:
1 Event Hub Namespace
1 actual Event Hub
2 Partitions in the Event Hub
2 Consumer groups
1 Event Producer
2 Event Consumers, one per Consumer group
The Event Producer sends out 10 events to the Event Hub. The events get distributed to the partitions by a round-robin mechanism, so the Event Hub looks like this:
Partition 1: [0] [2] [5] [6] [8]
Partition 2: [1] [3] [4] [7] [9]
When the Event Consumers start reading, each consumer would end up with only a part of the events, like so:
Consumer 1: Gets events 0,2,5,6,8
Consumer 2: Gets events 1,3,4,7,9
Is it true that a Consumer group can only access a subset of the Partitions?
My assumption is that the Event Hub architecture supports broadcasting of events to multiple consumers. And that every consumer wants all the events.
But it seems to me that Event Hubs isn't designed to give every consumer all the events, and I don't understand why that would be useful.
Can anyone help me understand Partitions?

Each Event Hubs partition is a persistent stream of events that is available to all consumers, regardless of which consumer group they are associated with. Any consumer can read from any partition at any point in the event stream.
Partitions are used to help scale resources to support a greater degree of concurrency and increase throughput for the Event Hub. Generally speaking, the more partitions that are in use, the more concurrent operations the Event Hub can handle. More information can be found in the Event Hubs overview.
My assumption is that the Event Hub architecture supports broadcasting of events to multiple consumers.
Not quite; consumers are responsible for pulling events from the partitions of an Event Hub; events are not pushed to consumers. Any consumer with permissions can connect to a partition and read independently. Events are not removed once read; they remain in the partition until their age exceeds the retention period.
But it seems to me that Event Hub isn't designed to have all consumers get all the events
That is not correct. Event Hubs exposes the events for any consumer wishing to read them. Using a client like the EventProcessorClient from the Event Hubs SDK allows an application to consume from all partitions without having to manage each partition consumer individually.
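To make the broadcast behavior concrete, here is a minimal in-memory sketch (the class and method names are made up for illustration, not the Event Hubs SDK): because reading is non-destructive, every consumer group reads the same partitions independently and receives all of the events.

```python
# Hypothetical in-memory model of an Event Hub (illustrative only, not
# the Azure SDK). It shows that reads are non-destructive, so every
# consumer group sees ALL events, not a subset.
class MiniEventHub:
    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        self._next = 0  # round-robin cursor for events without a key

    def send(self, event):
        # No partition key: the service distributes events round-robin.
        self.partitions[self._next].append(event)
        self._next = (self._next + 1) % len(self.partitions)

    def read_all(self, consumer_group):
        # Reading is non-destructive: events stay in the partition until
        # retention expires, so any consumer group can read everything.
        received = []
        for partition in self.partitions:
            received.extend(partition)
        return received

hub = MiniEventHub(partition_count=2)
for i in range(10):
    hub.send(i)

group_a = hub.read_all("consumer-group-a")
group_b = hub.read_all("consumer-group-b")
print(sorted(group_a))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(group_b))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Both consumer groups get all 10 events; the partitions only determine how the stream is split for parallel reading, not which consumer gets which events.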

Related

Azure Service Bus ordered processing of message

Suppose an Azure Service Bus topic has a single subscription with some filter, and a microservice A has created a SubscriptionClient for that subscription with a concurrency of 1 for reading messages. Also suppose there are 2 replicas of service A, and 3 messages in an unpartitioned Service Bus topic, inserted at times t1, t2 and t3.
t1 < t2 < t3
Is there a possibility that the t2 message gets delivered by Service Bus to Replica-2 before t1 gets delivered to Replica-1?
If not, what is the scaling strategy for Service Bus topics when processing subscriptions and adding replicas of the consuming microservice?
Note: By comparison, Kafka ensures that a message for one partition is delivered to only one replica, and to one thread listening to that partition, so ordered processing is guaranteed. But I am not sure whether, with an Azure Service Bus topic, multiple replicas listening to the same subscription with different SubscriptionClients can receive/process messages out of order.
If you want to enable ordered message processing with Azure Service Bus, you have to use Sessions.
You can use a message's SessionId as an equivalent of the partitionId you may use in Kafka. This way, you can still scale your consumers, limited by the number of distinct SessionId values at any given time.
Message sessions. Implement workflows that require message ordering or message deferral.
Sessions provide concurrent de-multiplexing of interleaved message streams while preserving and guaranteeing ordered delivery.
When multiple concurrent receivers pull from the queue, the messages belonging to a particular session are dispatched to the specific receiver that currently holds the lock for that session. With that operation, an interleaved message stream in one queue or subscription is cleanly de-multiplexed to different receivers and those receivers can also live on different client machines, since the lock management happens service-side, inside Service Bus.
Service Bus Partitions spread out the load across multiple nodes, and don't provide any ordering guarantees.
Partitioning means that the overall throughput of a partitioned entity is no longer limited by the performance of a single message broker or messaging store. In addition, a temporary outage of a messaging store does not render a partitioned queue or topic unavailable.
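The session-lock behavior described above can be sketched as follows (the names and the dispatch logic are illustrative, not the Service Bus SDK or its actual load-balancing algorithm): each session is locked to exactly one receiver, so per-session order is preserved while different sessions are still processed concurrently.

```python
# Illustrative sketch of session-based de-multiplexing (not the Service
# Bus SDK): messages carrying a session_id are locked to one receiver
# per session, so ordering is preserved within each session.
from collections import defaultdict

def dispatch(messages, receivers):
    """Assign each session to exactly one receiver (first come, first
    locked), mimicking service-side session locks."""
    session_owner = {}
    delivered = defaultdict(list)
    for msg in messages:
        sid = msg["session_id"]
        if sid not in session_owner:
            # Hand the session lock to the least-loaded receiver.
            session_owner[sid] = min(receivers, key=lambda r: len(delivered[r]))
        delivered[session_owner[sid]].append(msg)
    return delivered

messages = [
    {"session_id": "device-1", "seq": 1},
    {"session_id": "device-2", "seq": 1},
    {"session_id": "device-1", "seq": 2},
    {"session_id": "device-2", "seq": 2},
]
result = dispatch(messages, ["replica-1", "replica-2"])
# Each replica owns whole sessions, so per-session order stays intact.
for receiver, msgs in result.items():
    print(receiver, [(m["session_id"], m["seq"]) for m in msgs])
```

This is the Service Bus analogue of Kafka's partition-to-consumer assignment: the SessionId plays the role of the partition key, and the number of distinct sessions bounds your useful consumer concurrency.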

Is it possible to configure Azure Event Hub to retain message if Azure Function fails processing it?

I have an Azure Function that listens for messages in an Event Hub. The function takes messages from the Event Hub, processes them, and passes them to another Hub. At this point the messages are removed from the Event Hub.
If the Function fails processing the message for whatever reason, is it possible to tell the Event Hub to not remove the message, and to try to deliver it to the Function again at some point in the future?
I understand that the Event Hubs have a maximum retention period of 7 days. I would like for the Event Hub & Function to continue trying during that period.
Readers never "remove" messages from an Event Hub; in this respect Event Hubs differs from Service Bus topics and queues.
Event Hubs rely on clients to maintain their own bookmarks for each partition. The high-level API EventProcessorHost does this for you:
The EventProcessorHost class also implements an Azure storage-based checkpointing mechanism. This mechanism stores the offset on a per partition basis, so that each consumer can determine what the last checkpoint from the previous consumer was.
But the lower-level EventHubReceiver exposes the StartingSequenceNumber property for you to control this explicitly.
However a desire for guaranteed delivery strongly suggests that you may want to copy the messages requiring guaranteed delivery from an Event Hub to a Service Bus Topic or Queue or perhaps an Azure SQL Database table for processing.
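The retry behavior the question asks about can be built from the checkpoint mechanism alone. Here is a hedged sketch (the function and variable names are made up, not the EventProcessorHost API): the reader advances its stored offset only after processing succeeds, so a failed event is simply re-read on the next pass instead of being lost, as long as it is still within the retention period.

```python
# Illustrative sketch of client-side checkpointing (not the actual
# EventProcessorHost API): the checkpoint advances only on success, so
# a failed event is retried on the next pass - the hub never deletes it.

def process_partition(events, checkpoint, process):
    """Read from the last checkpoint; advance it only on success."""
    offset = checkpoint.get("offset", 0)
    while offset < len(events):
        try:
            process(events[offset])
        except Exception:
            break  # leave the checkpoint where it is; retry later
        offset += 1
        checkpoint["offset"] = offset  # persist after each success
    return checkpoint

events = ["a", "b", "c", "d"]
checkpoint = {}
attempts = {"count": 0}

def flaky(event):
    # Fails the first time it sees "c", succeeds on the retry.
    if event == "c" and attempts["count"] == 0:
        attempts["count"] += 1
        raise RuntimeError("transient failure")

process_partition(events, checkpoint, flaky)
print(checkpoint)  # {'offset': 2} - stopped before the failed event
process_partition(events, checkpoint, flaky)
print(checkpoint)  # {'offset': 4} - retry resumed at "c" and finished
```

In a real deployment the checkpoint dictionary would be persisted (e.g. to blob storage) so the reader survives restarts.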

Can I create thousands of event hubs in one Azure Event Hubs namespace

I need to send messages from a few thousand devices to a central hub and be able to get a live stream of messages for a specific device from that hub. So far, Azure Event Hubs seems to be the cheapest option in terms of message count. An Event Hub namespace allows creating distinct event hubs within it.
Can I create a few thousand such hubs, one per device?
Is it a good idea? What could be potential drawbacks?
How is the price calculated - per namespace or per event hub? (I think per namespace, but I cannot find this info.)
If per namespace, does it mean that purchased throughput units are shared among all event hubs? If yes, will a single namespace with 1000 event hubs consume the same amount of resources as a single namespace with one event hub that receives messages from 1000 devices?
No, you are limited to 10 Event Hubs per namespace.
Event Hub per device is not the recommended usage. Usual scenario is to put all messages from all devices to the same Event Hub, and then you can separate them again in the processing side. This will scale much better.
Event Hubs quotas
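The recommended design above ("one hub, separate on the processing side") usually means using the device id as the partition key. A hedged sketch of the idea, with made-up names and a simple CRC32 hash standing in for the service's own hash function: the same device always maps to the same partition, which preserves per-device ordering, and the consumer separates streams by device id when reading.

```python
# Illustrative sketch (not the SDK): one event hub for all devices, with
# the device id as the partition key. A stable hash keeps each device's
# events in a single partition, preserving per-device ordering.
import zlib

PARTITION_COUNT = 4

def partition_for(device_id):
    # Stable hash -> partition index; the same device always lands in
    # the same partition. (The real service uses its own hash function.)
    return zlib.crc32(device_id.encode()) % PARTITION_COUNT

partitions = [[] for _ in range(PARTITION_COUNT)]
for device in ["device-1", "device-2", "device-3"]:
    for seq in range(3):
        partitions[partition_for(device)].append((device, seq))

# Consumer side: de-multiplex a partition back into per-device streams.
def events_for(device):
    return [e for e in partitions[partition_for(device)] if e[0] == device]

print(events_for("device-2"))  # [('device-2', 0), ('device-2', 1), ('device-2', 2)]
```

This scales to thousands of devices with a handful of partitions, whereas one hub per device runs into the 10-hubs-per-namespace quota almost immediately.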
Azure Event Hubs is an event ingestion service to which you can send events from event publishers. The events are made available in the event hub's partitions, to which different consumer groups subscribe. Partitions can be designed to receive only specific kinds of events.
You can also create multiple event hubs within an Event Hub namespace: a maximum of 10 event hubs per namespace, 32 partitions per event hub, and 20 consumer groups per event hub. So you can use partitions to separate the events from the publishers and consume them on the processing side very easily.
Throughput units are purchased at the namespace level and shared across all event hubs in that namespace. Based on the tier you choose, you get different features:
Basic tier:
You can have only 1 consumer group
Standard and Dedicated tier:
You can create up to 20 consumer groups.
For example,
If you choose Basic or Standard tier and region as East US, You will be charged $0.028 per million events for ingress and $0.015 per unit/hour for throughput.
If you choose Dedicated tier, you will be charged 6.849$ per hour which includes the unlimited ingress and throughput charges, but the minimum hours charged are 4hrs.
The main advantages of the Dedicated tier are a longer message retention period (up to 90 days, whereas Standard supports up to 7 days and Basic just 1 day) and a larger maximum message size of 1 MB (versus 256 KB in the Basic tier).
Refer https://azure.microsoft.com/en-in/pricing/details/event-hubs/.

purpose of Azure iot hub device-to-cloud partitions

When creating a new Azure IOT Hub you are asked how many device-to-cloud partitions you need. You can select between 2-32 partitions for standard tiers.
I understand that the SKU and number of units determine the maximum daily quota of messages that you can send to IOT Hub. And that it is recommended to shard your devices into multiple IOT hubs to smooth traffic bursts. However, device-to-cloud partitions need clarification.
1>> What is the purpose of those device-to-cloud partitions under a single IOT hub?
2>> How are we supposed to take advantage of those IOT Hub device-to-cloud partitions? 
Thanks.
1>> What is the purpose of those device-to-cloud partitions under a single IOT hub?
The partition count is a setting for the Event Hub-compatible messaging endpoint (messages/events) built into Azure IoT Hub. From this we can see that "partitions" is a concept that belongs to Event Hubs.
Event Hubs is designed to allow a single partition reader per consumer group. A single partition within a consumer group cannot have more than 5 concurrent readers connected at any time. More partitions enables you to have more concurrent readers processing your data, improving your aggregate throughput.
Ref: Built-in endpoint: messages/events and How many partitions do I need?
2>> How are we supposed to take advantage of those IOT Hub device-to-cloud partitions?
Event Hubs has two primary models for event consumption: direct receivers and higher-level abstractions, such as EventProcessorHost. Direct receivers are responsible for their own coordination of access to partitions within a consumer group.
Ref:Event consumers.
More information about the partitioning model of Azure Event Hubs is available here.
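The "one active reader per partition" pattern behind those answers can be sketched like this (the function is illustrative, not an SDK API; real processors such as EventProcessorHost negotiate ownership dynamically rather than by a fixed modulo): with N partitions you can run up to N concurrent readers, and each partition has exactly one owner, which is why more partitions mean more aggregate throughput.

```python
# Sketch of the "one active reader per partition" pattern (illustrative,
# not an SDK API): partitions are spread across reader instances so that
# each partition has exactly one owner and readers never compete.
def assign_partitions(partition_ids, reader_count):
    """Spread partitions across readers with a simple modulo scheme."""
    assignment = {r: [] for r in range(reader_count)}
    for i, pid in enumerate(partition_ids):
        assignment[i % reader_count].append(pid)
    return assignment

# 4 partitions, 2 reader instances: each instance owns 2 partitions.
print(assign_partitions(["0", "1", "2", "3"], 2))
# {0: ['0', '2'], 1: ['1', '3']}
```

Adding reader instances (up to the partition count) shrinks each instance's share, which is how partition count caps your read-side parallelism.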

What is Partition Id,Offset,Host Name in Azure Event Hub Receiver?

I am working with Azure Event Hubs and I have some doubts.
What is the partition ID in an Azure Event Hub receiver? Is this ID the same as the partition key in an Azure Event Hub publisher?
What is an offset, and what is it used for in an Azure Event Hub consumer?
Can I consume messages without using a consumer group?
Can I consume messages with a single receiver?
What is the use of blob storage in an Event Hub consumer? I only want to view the messages I sent.
This article Event Hubs Overview should answer your questions in detail, but to summarize:
When you create a new Event Hub in the portal, you specify how many partitions you need. The Publisher hashes the partition key of an event to determine which partition to send the event to. An event hub receiver receives events from those partitions.
An event hub consumer tracks which events it has received by using an offset into each partition. By changing the offset you can, for example, re-read events from a partition.
You must have at least one consumer group (there is a default one). Each consumer group has its own view of the partitions (different offset values) that lets it read the events from the partitions independently of the other consumer groups.
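The independent-offsets point can be sketched in a few lines (illustrative names only, not the Event Hubs API): each consumer group keeps its own offset into the same partition, so the groups read at their own pace without affecting each other.

```python
# Illustrative sketch: each consumer group keeps its own offset per
# partition, so groups read the same events independently and at their
# own pace. Names are made up, not the Event Hubs API.
partition = ["e0", "e1", "e2", "e3", "e4"]

offsets = {"group-a": 0, "group-b": 0}

def read(group, count):
    start = offsets[group]
    batch = partition[start:start + count]
    offsets[group] = start + len(batch)  # advance only this group's view
    return batch

print(read("group-a", 3))  # ['e0', 'e1', 'e2']
print(read("group-b", 1))  # ['e0'] - unaffected by group-a's progress
print(read("group-a", 2))  # ['e3', 'e4']
```

Rewinding a group is just resetting its offset; the events themselves never move.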
Typically, you have one receiver per partition to enable scale-out. An event hub can have up to 32 partitions in the standard tier.
Offset values are managed by the client. You can checkpoint your latest position in each partition to enable you to restart at the latest event if the client restarts. The checkpoint mechanism writes the latest offset values to blob storage.