In the service high-level description Microsoft mentions that I can stream millions of events per second and it is highly scalable
Event Hubs is a fully managed, real-time data ingestion service that’s simple, trusted, and scalable. Stream millions of events per second from any source
https://azure.microsoft.com/en-us/services/event-hubs/
But when I go to the official documentation the maximum throughput units (TUs) limit is 20, which translates into 1000 event per TU * 20 TUs = 20,000 events:
Event Hubs traffic is controlled by throughput units. A single throughput unit allows 1 MB per second or 1000 events per second of ingress and twice that amount of egress. Standard Event Hubs can be configured with 1-20 throughput units, and you can purchase more with a quota increase support request.
https://azure.microsoft.com/en-us/services/event-hubs/
How does 20TUs translate into streaming millions of events?
You can increase 20-TUs by raising a support request.
But if you need to go very high you can also use Dedicated Clusters for Event Hubs.
Two important notes from the docs
A Dedicated cluster guarantees capacity at full scale, and can ingress up to gigabytes of streaming data with fully durable storage and sub-second latency to accommodate any burst in traffic.
At high ingress volumes (>100 TUs), a cluster costs significantly less per hour than purchasing a comparable quantity of throughput units in the Standard offering.
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dedicated-overview
The throughput capacity of Event Hubs is controlled by throughput units. Throughput units are pre-purchased units of capacity. A single throughput unit lets you: Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first). Egress: Up to 2 MB per second or 4096 events per second.
Related
Is there any message receiving limit per device on Azure IoTHub?
If any, can I remove or raise the upper limit without registering additional devices?
I tested 2 things to make sure if I can place enough load (ideally, 18000 message/s)on Azure IoT Hub in the future load tests.
① Send a certain amount of mqtt messages from a VM.
② Send a certain amount of mqtt messages from two VMs.
I expected that the traffic of ② would be twice as large as that of ①. But it wasn't. Maximum messages per minute on IoTHub of ② is not so different from that of ①. Both of them are around 3.6k [message/min]. At that time, I registered only one device on IoT Hub. So I added another device and tested ② again to see if the second device could increase the traffic. As a result, it increased the traffic and IoT Hub had bigger messages per minute.
Judging from this result, I thought IoTHub has some kind of limit on receiving message per device. But I am not sure. So if anyone know about the limit, could you tell me what kind of limit it is and how to raise the upper limit without registering additional devices because in production we use only one device.
For your information, I know there is a "unit" to increase the throughput in IoTHub. To increase the load I changed the number of unit from 2 to 20 in both ① and ②. However, it did not make messages/min in IotHub bigger. I'd also like to know why the "unit" did not work as expected.
Thank you for reading, in advance. Any comment would be my help.
Every basic (B1,B2, B3) or standard unit of IoT Hub SKU (S1, S2, S3) has specific daily message quota as per https://azure.microsoft.com/en-us/pricing/details/iot-hub/. A single IoTHub can support 1 million devices and there is no per device cost associated, only the msg/day quota as above.
e.g. S1 SKU has 400,000 msg/day quota and you can add multiple units of S1 to increase capacity. S2 has 6000,000 msg/day and S3 has 300,000,000 msg/day quota per unit and more units can be added.
Before this limit is reached IoTHub will raise alert which can be used to automatically add more units or jump to higher SKU.
Regarding your test, there are specific throttling limits to avoid misuse of the service here -
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-quotas-throttling
As an example, for 18000 msg/sec you will need 3 units of S3 SKU (each with 6000 msg/sec rate limit). In addition there are other limits like how quickly connections can be attempted, if using Azure IoT SDK's the built-in retry logic helps overcome this otherwise you need to have retry policy. Basically you dont want million device trying to connect at the same time, IoTHub will only accept connections at a certain rate. This is not concurrent connection limit but a rate at which new connnections are accepted.
How do I calculate the incoming bytes per second for an event hub namespace?
I do not control the data producer and so cannot predict the incoming bytes upfront.
I am interested in adjusting the maximum throughput units I need, without using the auto-inflate feature.
1 TU provides 1 MB/s ingress & 2 MB/s egress, but the metrics are reported per minute, not per second.
Can I make a decision based on the sum/avg/max incoming bytes reported in the Azure portal?
I believe you'll need to use Stream Analytics to query your stream and based on the query output change your TU on Event Hub.
You can also try to use Azure Monitor, but I believe it won't group per second as you need, so you'd better try the first option.
Per second metrics cannot be reliable due to very nature of potential intermittent spikes at the traffic in and out. 1 minute averages are good to monitor and you can easily take action via a Logic App.
Check messaging metrics to monitor here - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-metrics-azure-monitor#message-metrics
I'm new to azure event grid concepts and currently doing research on event grid to implement in our project.
Can any one tell about the throughput of event grid,
how many events I can push per second and what is the egress of event grid per second, means count of output events per second from event grid.
I asked Microsoft about this topic and this was their response:
Nape:
Publish rate limit is about 5000 events per second. This is the events you can publish to Event Grid. You can achieve rates higher than this if the service instance is not as loaded. Keep in mind EG is a multitenant system.
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-event-grid-routing-comparison
High: Capable of routing 10,000,000 events per second per region.
Above represents the capability of the entire service in dispatching events, per region.
Event Grid is build for large scale - millions of events per second in on throughput for both ingress and egress: https://learn.microsoft.com/en-us/azure/event-grid/overview#capabilities
There isn't a concept of a namespace in Event Grid, so you don't pre-provision a certain throughput capacity. It scales as you use it on a pay-per-operation basis.
I am working on the POC for Azure Event hubs to implement the same into our application.
Quick Brief on flow.
Created tool to read the CSV data from local folder and send it to event hub.
We are sending Event Data in Batch to event hub.
With 12 instance of tool (Parallel), I can send a total of 600 000 lines of messages to Event hub within 1 min.
But, On receiver side, to receive the 600 000 lines of data, it takes more than 10 mins.
Need to achieve
I would like to Match/double my egress speed on the receiver to
process the data. Existing Configuration
The configuration I have made user are
TU - 10 One Event hub with 32 Partition.
Coding logic goes same as mentioned in MSDN
Only difference is, I am sending line of data in a batch.
EventProcessorhost with options {MaxBatchSize= 1000000,
PrefetchCount=1000000
To achieve higher egress rate (aka faster processing pipeline) in eventhubs:
Create a Scaled-out pipeline - each partition in EventHub is the unit-of-scale for processing events out of EventHub. With the Scale you described (6Lakh events per min --> 10K events per sec - with 32 partitions - you already got this right). Make sure you create as many partitions as you envision your pipeline need in near future. Imagine analyzing traffic on a Highway and no. of lanes is the only limitation for the amount of traffic.
Equal load distribution across partitions: if you are using SendToASpecificPartition or SendUsingPartitionKey - you will need to take care of equal load distribution. If you use EventHubClient.Send(EventDataWithOutPartitionKey) - EventHubs service will make sure all of your partitions are equally loaded. If a single EventHub Partition is heavily loaded - the amount of time you can process all events on EventHub will be bound by no. of events on this Partition.
Scale-out physical resources on the Receiver/EventProcessorHost: most importantly Network (Sockets & bandwidth) & after-a-point, CPU & Memory. Use PartitionManagerOptions.MaxReceiveClients to increase the maximum number of EventHubClients (which has a dedicated MessagingFactory, which maps to 1 socket) created per EventProcessorHost instance. By default it is 16.
Let me know how it went... :)
More on Event Hubs.
We are experiencing lots of these exceptions sending events to EventHubs during peak traffic:
"Failed to send event to EventHub. Exception : Microsoft.ServiceBus.Messaging.MessagingException: The server was unable to process the request; please retry the operation. If the problem persists, please contact your Service Bus administrator and provide the tracking id."
or
"Failed to send event to EventHub. Exception : System.TimeoutException: The operation did not complete within the allocated time "
You can see it clearly here:
As you can see, we got lots of Internal Errors, Server Busy Errors, Failed Request when Incoming messages are over 400K events/hour (or ~270 MB/hour). This is not just a transient issue. It's clearly related to throughput.
Our EH has 32 partitions, message retention of 7 days, and 5 throughput units assigned. OperationTimeout is set to 5 mins, and we are using the default RetryPolicy.
Is it anything we still need to tweak here? We are really concerned about the scalability of EH.
Thanks
Send throughput tuning can be achieved using efficient partition distribution strategies. There isn't any single knob which can do this. Below is the basic information you will need to be able to design for High-Thruput Scenarios.
1) Lets start from the Namespace: Throughput Units(aka TUs) are configured at Namespace level. Pls. bear in mind, that, TUs configured is applied - aggregate of all EventHubs under that Namespace. If you have 5 TUs on your Namespace and 5 eventhubs under it - it will be divided among all 5 eventhubs.
2) Now lets look at EventHub level: If the EventHub is allocated with 5 TUs and it has 32 partitions - No single partition can use all 5 TUs. For ex. if you are trying to send 5TU of data to 1 partition and 'Zero' to all other 31 partitions - this is not possible. Maximum you should plan per Partition is 1 TU. In general, you will need to ensure that the data is distributed evenly across all partitions. EventHubs support 3 types of sends - which gives users different level of control on Partition distribution:
EventHubClient.Send(EventDataWithoutPartitionKey) -> if you are using this API to send - eventhub will take care of evenly distributing the data across all partitions. EventHubs service gateway will round-robin the data to all partitions. When a specific partition is down - the Gateways auto-detect and ensure Clients doesn't see any impact. This is the most recommended way to Send to EventHubs.
EventHubClient.Send(EventDataWithPartitionKey) -> if you are using this API to send to EventHubs - the partitionKey will determine the distribution of your data. PartitionKey is used to Hash the EventData to the appropriate partition (algo. to hash is Microsoft Proprietary and not Shared). Typically users who require correlation of a group of messages will use this variant of Send.
EventHubSender.Send(EventData) -> In this variant, the Sender is already attached to the Partition. So - this gives complete control of Distribution across partitions to the Client.
To measure your present distribution of Data - use EventHubClient.GetPartitionRuntimeInfo Api to estimate which Partition is overloaded. The difference b/w BeginSequenceNumber and LastEnqueuedSequenceNumber is supposed to give an estimate of that partitions load compared to others.
3) Last but not the least - you can tune performance (not Throughput) at send operation level - using the SendBatch API.
1 TU can buy a Max of 1000 msgs/sec or 1MBPS - you will be throttled with whichever limit hits first - this cannot be changed.
If your messages are small - lets say 100 bytes and you can send only 1000 msgs/sec (as per the TU limit) - you will first hit the 1000 events/sec limit. However, overall using SendBatch API - you can batch lets say 10 of 100byte msgs and push at the same rate - 1000 msgs/sec with just 100 API calls and improve the end-to-end latency of the system (as it helps service also to persist messages efficiently). Remember, the only limitation here is the Max. Msg Size that can be sent - which is 256 kb (this limit will apply on your BatchSize if you use SendBatch API).
Given that background, in your case:
- Having 32 partitions and 5 TUs - I would really double-check the Partition distribution strategy.
here's some more general reading on Event Hubs...
After a lot of digging we decided to stop setting the PK for posted messages, and the issue simply went away!. We were using GUID as PK. We start to get very few erros on the Azure Portal, and no more exceptions. Hope this helps someone else