High-scale message processing in Event Hubs - Azure

As per my understanding, Event Hubs can process/ingest millions of messages per second, and to tune ingestion we can use throughput units.
More throughput units = more ingestion power.
But on the receiving/consuming side, you can create up to 32 receivers (since we can create 32 partitions and one partition can be consumed by only one receiver).
Based on the above, if a single message takes 100 milliseconds to process, one consumer can process 10 messages per second and 32 consumers can process 32 * 10 = 320 messages per second.
How can I make my receivers consume more messages (for example, 5-10k per second)?
1) Either I process messages asynchronously inside ProcessEventsAsync, but in that case I would not be able to maintain ordering.
2) Or I have to ask Microsoft to allow me to create more partitions.
Please advise.

TL;DR: You will need to ask Microsoft to increase the number of partitions you are allowed, and remember that there is currently no way to increase the partition count of an existing Event Hub.
You are correct that your unit of consumption parallelism is the partition. If your consumers can only do 10/second in order, or even 100/second in order, then you will need more partitions to consume millions of events. While 100 ms/event certainly seems slow to me and I think you should look for optimizations there (i.e. farm out work you don't need to wait for, commit less often, etc.), you will reach the point of needing more partitions at scale.
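As an illustration of "farm out work you don't need to wait for", here is a minimal sketch (not Event Hubs specific; ordered_step and slow_side_effect are hypothetical stand-ins for your own processing): keep the order-sensitive part inline and hand everything else to a background pool.

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

def ordered_step(event):
    ...  # placeholder: the part that must see events in partition order

def slow_side_effect(event):
    ...  # placeholder: e.g. auditing or notifications you don't need to wait for

def handle(event):
    ordered_step(event)                   # stays synchronous, so ordering holds
    pool.submit(slow_side_effect, event)  # farmed out, not awaited

This brings the per-event latency down to the cost of the ordered part only, while the pool absorbs the rest.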
Some things to keep in mind: 32 partitions gives you only 32 MB/s of ingress and 64 MB/s of egress. Both of these factors matter, since that egress throughput is shared by all the consumer groups you use. So if you have 4 consumer groups reading the data (16 MB/s each), you'll need twice as many partitions (or at least throughput units) for input as you would based solely on your data ingress (because otherwise you would fall behind).
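To make that arithmetic concrete, here is a rough sizing sketch in Python using the same per-partition figures the answer assumes (1 MB/s ingress and 2 MB/s egress per partition); the 16 MB/s incoming rate and the 4 consumer groups are illustrative numbers only.

# Back-of-the-envelope partition sizing (assumed figures, not an official formula)
INGRESS_PER_PARTITION_MB_S = 1   # assumed per-partition ingress limit
EGRESS_PER_PARTITION_MB_S = 2    # assumed per-partition egress limit

incoming_mb_s = 16               # example ingress rate
consumer_groups = 4              # each group reads the full stream

total_egress_mb_s = incoming_mb_s * consumer_groups                     # 64 MB/s read in total
partitions_for_ingress = incoming_mb_s / INGRESS_PER_PARTITION_MB_S     # 16
partitions_for_egress = total_egress_mb_s / EGRESS_PER_PARTITION_MB_S   # 32

print(max(partitions_for_ingress, partitions_for_egress))               # egress dominates: 32.0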
With regards to your comment about multi-tenancy: will you have one 'database consumer' group that handles all your tenants, with all of their data flowing through the same hub? If so, that sounds like a sensible use. What would not be so sensible is having one consumer group per tenant, each consuming the entire stream.

Related

What is the maximum number of put operations allowed in a partitioned Azure Service Bus queue?

I have a use case where I need to send a very large number of messages to an Azure Service Bus queue. From this https://github.com/Huachao/azure-content/blob/master/articles/service-bus/service-bus-azure-and-service-bus-queues-compared-contrasted.md I learned that the Azure Service Bus queue supports 2000 put operations per second (1 KB messages).
But my writes will be more than 2000 per second.
From Microsoft's docs:
https://learn.microsoft.com/en-us/azure/service-bus-messaging/enable-partitions#:~:text=Service%20Bus%20partitions%20enable%20queues,message%20broker%20or%20messaging%20store.
I have seen that we can create a partitioned queue, which creates 16 partitions and increases the queue size by 16 times. But I am not able to find whether this has any impact on put operations. Will the put operations also be increased by 16 times, resulting in 32,000 writes per second?
You are looking at a very outdated document. The current limit for the standard SKU is 1,000 credits per second (per namespace); take a look at this doc for more info on how credits work.
Regarding your question: what partitioned entities do is divide your entity into multiple logical components in order to achieve higher resiliency. When you send a message to a partitioned entity, an internal load-balancing mechanism distributes messages across all partitions. This is not counted as additional operations, so if you send 1,000 messages per second, that is equivalent to 1,000 credits.
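Sending to a partitioned queue looks exactly like sending to a regular one; the broker picks the partition for you. A minimal sketch with the Python azure-servicebus SDK (the connection string and queue name are placeholders):

from azure.servicebus import ServiceBusClient, ServiceBusMessage

client = ServiceBusClient.from_connection_string("<connection string>")
with client:
    sender = client.get_queue_sender(queue_name="<partitioned queue>")
    with sender:
        # No partition is named here; the service load-balances the message
        # across the queue's internal partitions, and it still costs one credit.
        sender.send_messages(ServiceBusMessage(b"payload"))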

Why does a single-broker setup perform better with a single topic partition rather than multiple partitions?

We are exploring Kafka for coordination across multiple tasks in a Spark job. Each Spark task acts as both a producer AND consumer of messages on the SAME topic. So far we are seeing decent performance, but I am wondering if there is a way to improve it, considering that we are getting the best performance by doing things CONTRARY to what the docs suggest. At the moment we use only a single Broker machine with multiple CPUs, but we can use more if needed.
So far we have tried the following setups:
Single topic, single partition, multiple consumers, NOT using Group ID: BEST PERFORMANCE
Single topic, single partition, multiple consumers each using its own Group ID: 2x slower than (1)
Single topic, single partition, multiple consumers, all using the same Group ID: stuck or dead slow
Single topic, as many partitions as consumers, single Group ID: stuck or dead slow
Single topic, as many partitions as consumers, each using its own Group ID or no Group ID: works, but a lot slower than (1) or (2)
I don't understand why we are getting best performance by doing things against what the docs suggest.
My questions are:
There's a lot written out there about the benefits of having multiple partitions, even on a single broker, but clearly here we are seeing performance deterioration.
Apart from resilience considerations, what's the benefit of adding additional Brokers? We see that our single Broker CPU utilization never goes above 50% even in times of stress. And it's easier to simply increase the CPU count on a single VM rather than manage multiple VMs. Is there any merit in getting more Brokers? (for speed considerations, not resilience)
If the above is YES, then clearly we can't have a broker per each consumer. Right now we are running 30-60 Spark tasks, but it can go up to hundreds. So almost inevitably we will be in a situation that each Broker is responsible for tens of partitions, if each task were to have a partition. So based on the above tests, we are still going to see worse performance?
Note that we are setting up the producer to not wait for acknowledgment from the Brokers, as we'd seen in the docs that, with many partitions, waiting for acks can slow things down:
from kafka import KafkaProducer  # kafka-python
producer = KafkaProducer(bootstrap_servers=[SERVER], acks=0)
Thanks for your thoughts.
I think you are missing an important concept: within a consumer group, Kafka allows only one consumer per topic partition, while multiple consumer groups may read from the same partition. It seems that you have a problem with committing the offsets or too many group-rebalancing problems.
Here are my thoughts;
Single topic, single partition, multiple consumers, NOT using Group ID: BEST PERFORMANCE
What actually happens here is that one of your consumers is idle.
Single topic, single partition, multiple consumers each using its own Group ID: 2x slower than (1)
Both consumers are fetching and processing the same messages independently.
Single topic, single partition, multiple consumers, all using the same Group ID: stuck or dead slow
Only one member of the same group can read from a single partition. This should not give results different from the first case.
Single topic, as many partitions as consumers, single Group ID: stuck or dead slow
This is the situation where each consumer is assigned to a different partition, and it is the case where we would expect consumption to be fastest.
Single topic, as many partitions as consumers, each using its own Group ID or no Group ID: works, but a lot slower than (1) or (2)
Same remarks as for the first and second cases.
There's a lot written out there about the benefits of having multiple partitions, even on a single broker, but clearly here we are seeing performance deterioration.
Indeed, by having multiple partitions, we can parallelize the consumers. If the consumers have the same group id, then they will consume from different partitions. Otherwise, each consumer will consume from all partitions.
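To illustrate the group-id behaviour, here is a minimal kafka-python sketch (the topic name, server address and the print-out are placeholders): start several copies of this script with the same group_id and the partitions are split between them; give each copy a different group_id (or none) and every copy reads the full topic.

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "my-topic",                           # placeholder topic name
    bootstrap_servers=["localhost:9092"],
    group_id="spark-tasks",               # same group => partitions divided among members
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.partition, message.offset, message.value)  # placeholder per-message work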
Apart from resilience considerations, what's the benefit of adding additional Brokers? We see that our single Broker CPU utilization never goes above 50% even in times of stress. And it's easier to simply increase the CPU count on a single VM rather than manage multiple VMs. Is there any merit in getting more Brokers? (for speed considerations, not resilience)
If the above is YES, then clearly we can't have a broker per each consumer. Right now we are running 30-60 Spark tasks, but it can go up to hundreds. So almost inevitably we will be in a situation that each Broker is responsible for tens of partitions, if each task were to have a partition. So based on the above tests, we are still going to see worse performance?
When a new topic is created, for each partition one of the brokers in the cluster is selected as the partition leader, and all read/write operations for that partition are handled by it. So when you have many topics (and partitions), the workload is automatically distributed between the brokers. If you have a single broker, all producers and consumers will be producing to and consuming from that same broker.

In-order processing in Azure event hubs with Partitions and multiple "event processor" clients

I plan to utilize all 32 partitions in Azure event hubs.
Requirement: "Ordered" processing per partition is critical.
Question: If I increase the TUs (Throughput Units) to the max available of 20 across all 32 partitions, I get 40 MB/s of egress. Let's say I calculated that I need 500 client threads processing in parallel (EventProcessorClient) to achieve my throughput needs. How do I achieve this level of parallelism with EventProcessorClient while honoring my "Ordering" requirement?
Btw, in Kafka I can create 500 partitions in a topic, and Kafka allows only 1 consumer thread per partition, guaranteeing event order.
In short, you really can't do what you're looking to do in the way that you're describing.
The EventProcessorClient is bound to a given Event Hub and consumer group combination and will collaborate with other processors using the same Event Hub/consumer group to evenly distribute the load. Adding more processors than the number of partitions would result in them being idle. You could work around this by using additional consumer groups, but the EventProcessorClient instances will only coordinate with others in the same consumer group; the processors for each consumer group would act independently and you'd end up processing the same events multiple times.
There are also quotas on the service side that you may not be taking into account.
Assuming that you're using the Standard tier, the maximum number of concurrent reads that you could have for one Event Hub, across all partitions, is 100. For a given Event Hub, you can create a maximum of 20 consumer groups. Each consumer group may have a maximum of 5 active readers at a time. The Event Hubs Quotas page discusses these limits. That said, a dedicated instance allows higher limits, but you would still have a gap with the strict ordering that you're looking to achieve.
Without knowing more about your specific application scenarios, how long it takes for an event to be processed, the relative size of the event body, and what your throughput target is, it's difficult to offer alternative suggestions that may better fit your needs.
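For reference, the one-active-reader-per-partition model looks like the sketch below in the Python SDK (EventHubConsumerClient, the rough equivalent of the .NET EventProcessorClient the question mentions). Running several copies of it with the same consumer group and checkpoint store makes them split the partitions between themselves, and within each partition events are delivered to on_event in order; the connection strings, names and process() are placeholders.

from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

# A shared checkpoint store lets multiple processor instances in the same
# consumer group coordinate partition ownership (load balancing).
checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<storage connection string>", "<blob container>")

client = EventHubConsumerClient.from_connection_string(
    conn_str="<event hubs connection string>",
    consumer_group="$Default",
    eventhub_name="<hub name>",
    checkpoint_store=checkpoint_store,
)

def process(event):
    ...  # placeholder for your per-event work

def on_event(partition_context, event):
    process(event)  # per-partition ordering holds only while this stays synchronous
    partition_context.update_checkpoint(event)

with client:
    client.receive(on_event=on_event, starting_position="-1")

No matter how many of these you run, only one reader per partition per consumer group is active at a time, which is exactly the ceiling the answer describes.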

How do I decide how many partitions to use in Azure Event Hubs?

Or phrased differently: what reason do I have to not take the max number of partitions (currently 32 without contacting Microsoft directly).
As far as I can tell more partitions means (potential) larger egress throughput, at no added monetary or computational cost. What's the catch? When would I not want to use as many partitions as I am possibly allowed to provision?
You are right in the observation that having a larger number of partitions won't cost you an extra dime when provisioning the event hub. But when the data comes in at scale you will have to allocate more TUs, so it will cost you extra based on the amount of data flowing in and out.
from the docs
Throughput in Event Hubs defines the amount of data in megabytes or the number (in thousands) of 1-KB events that ingress and egress through Event Hubs. This throughput is measured in throughput units (TUs). Purchase TUs before you can start using the Event Hubs service. You can explicitly select Event Hubs TUs either by using the portal or Event Hubs Resource Manager templates.
Another thing is that if you are using, for example, the Event Processor Host to process the data, it has to spin up listeners for all partitions. If the incoming data is not that much and it is divided over all those partitions, you will have a lot of partitions each dealing with a small amount of data, which can make processing less than optimal.
From the docs:
The partition count on an event hub cannot be modified after setup. With that in mind, it is important to think about how many partitions you need before getting started.
Event Hubs is designed to allow a single partition reader per consumer group. In most use cases, the default setting of four partitions is sufficient. If you are looking to scale your event processing, you may want to consider adding additional partitions. There is no specific throughput limit on a partition, however the aggregate throughput in your namespace is limited by the number of throughput units. As you increase the number of throughput units in your namespace, you may want additional partitions to allow concurrent readers to achieve their own maximum throughput.
However, if you have a model in which your application has an affinity to a particular partition, increasing the number of partitions may not be of any benefit to you. For more information, see availability and consistency.
Your data processing pipeline also has to deal with all of those partitions; if you have just one process/machine, it has to handle the enormous amount of data that can theoretically be sent to an event hub.

Tradeoffs involved in count of partitions of event hub with azure functions

I realize this may be a duplicate of Why not always configure for max number of event hub partitions?. However, the product has evolved and the defaults have changed, and the original issues may no longer be a factor.
Consider the scenario where the primary consumers of an event hub will be eventHubTrigger Azure Functions. The trigger function is backed by an EventProcessorHost which will automatically scale up or down to consume from all available partitions as needed without any administration effort.
As I understand it, the monetary cost of the Azure Function is based on the execution duration and count of invocations, which would be driven only by the count of events consumed and not affected by the degree of parallelism due to the count of partitions.
In this case, would there be any higher costs or complexity from creating a hub with the max of 32 partitions, compared to one with a small number like 4?
Your thought process makes sense: I found myself creating 32 partitions by default in this exact scenario and had no issues with that so far.
You pay for provisioned ingress/egress (throughput units); the partition count doesn't add cost.
The only requirement is that your partition key has enough unique values to load partitions more or less evenly.
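As a sketch of what "enough unique values" means in practice (Python azure-eventhub SDK; the connection string, hub name and key are placeholders): every event sent with the same partition_key hashes to the same partition, so a high-cardinality key such as a device or tenant id spreads load across all 32 partitions, while a low-cardinality one funnels everything into a few.

from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<connection string>", eventhub_name="<hub name>")

with producer:
    # Every event in this batch hashes to the same partition via the key.
    batch = producer.create_batch(partition_key="device-000042")
    batch.add(EventData('{"temperature": 21.5}'))
    producer.send_batch(batch)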
