I'm new to Azure Event Grid and am currently researching it for use in our project.
Can anyone tell me about the throughput of Event Grid:
how many events can I push per second, and what is the egress rate, meaning the number of output events per second from Event Grid?
I asked Microsoft about this topic and this was their response:
Nape:
Publish rate limit is about 5000 events per second. This is the events you can publish to Event Grid. You can achieve rates higher than this if the service instance is not as loaded. Keep in mind EG is a multitenant system.
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-event-grid-routing-comparison
High: Capable of routing 10,000,000 events per second per region.
Above represents the capability of the entire service in dispatching events, per region.
Event Grid is built for large scale, with millions of events per second of throughput for both ingress and egress: https://learn.microsoft.com/en-us/azure/event-grid/overview#capabilities
There isn't a concept of a namespace in Event Grid, so you don't pre-provision a certain throughput capacity. It scales as you use it on a pay-per-operation basis.
I'm seeing multi-second pauses in the event stream, even reading from the retention pool.
Here's the main nugget of EH setup:
BlobContainerClient storageClient = new BlobContainerClient(blobcon, BLOB_NAME);
RTMTest.eventProcessor = new EventProcessorClient(storageClient, consumerGroup, ehubcon, EVENTHUB_NAME);
And then the do nothing processor:
static async Task processEventHandler(ProcessEventArgs eventArgs)
{
    RTMTest.eventsPerSecond++;
    RTMTest.eventCount++;
    if ((RTMTest.eventCount % 16) == 0)
    {
        await eventArgs.UpdateCheckpointAsync(eventArgs.CancellationToken);
    }
}
And then a typical execution:
15:02:23: no events
15:02:24: no events
15:02:25: reqs=643
15:02:26: reqs=656
15:02:27: reqs=1280
15:02:28: reqs=2221
15:02:29: no events
15:02:30: no events
15:02:31: no events
15:02:32: no events
15:02:33: no events
15:02:34: no events
15:02:35: no events
15:02:36: no events
15:02:37: no events
15:02:38: no events
15:02:39: no events
15:02:40: no events
15:02:41: no events
15:02:42: no events
15:02:43: no events
15:02:44: reqs=3027
15:02:45: reqs=3440
15:02:47: reqs=4320
15:02:48: reqs=9232
15:02:49: reqs=4064
15:02:50: reqs=395
15:02:51: no events
15:02:52: no events
15:02:53: no events
The event hub, blob storage and RTMTest webjob are all in US West 2. The event hub has 16 partitions. It's correctly calling my handler, as evidenced by the bursts of data. The error handler is not called.
Here are two applications side by side, left using Redis, right using Event Hub. The events turn into the animations so you can visually watch the long stalls. Note: these are vaccines being reported around the US, either live or via batch reconciliations from the pharmacies.
[Image: vaccine reporting animations]
Any idea why I see the multi-second stalls?
Thanks.
Event Hubs consumers make use of a prefetch queue when reading. This is essentially a local cache of events that the consumer tries to keep full by streaming in continually from the service. To prioritize throughput and avoid waiting on the network, consumers read exclusively from prefetch.
The pattern that you're describing falls into the "many smaller events" category, which will often drain the prefetch quickly if event processing is also quick. If your application is reading more quickly than the prefetch can refill, reads will start to take longer and return fewer events, as it waits on network operations.
One thing that may help is to test using higher values for PrefetchCount and CacheEventCount in the options when creating your processor. These default to a prefetch of 300 and a cache event count of 100. You may want to try testing with something like 750/250 and see what happens. We recommend keeping at least a 3:1 ratio of prefetch to cache event count.
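As a rough sketch of that suggestion, the options can be passed when the processor is constructed. This assumes the Azure.Messaging.EventHubs.Processor and Azure.Storage.Blobs packages; the four string values are placeholders standing in for the variables in the question, and 750/250 is only a starting point to measure against, not a recommended final setting.

```csharp
using Azure.Messaging.EventHubs;   // EventProcessorClient, EventProcessorClientOptions
using Azure.Storage.Blobs;         // BlobContainerClient

// Placeholders for the values used in the question (assumptions, not real settings).
string blobcon = "<storage connection string>";
string BLOB_NAME = "<checkpoint container name>";
string ehubcon = "<event hubs connection string>";
string EVENTHUB_NAME = "<event hub name>";
string consumerGroup = "<consumer group>";

// Larger prefetch and cache than the defaults (300 / 100), keeping the 3:1 ratio.
var options = new EventProcessorClientOptions
{
    PrefetchCount = 750,
    CacheEventCount = 250
};

var storageClient = new BlobContainerClient(blobcon, BLOB_NAME);
var eventProcessor = new EventProcessorClient(
    storageClient, consumerGroup, ehubcon, EVENTHUB_NAME, options);
```

Larger values trade memory for fewer stalls on the network, so it is worth changing them in steps and watching the per-second counts after each change.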
It is also possible that your processor is being asked to do more work than is recommended for consistent performance across all partitions it owns. There's good discussion of different behaviors in the Troubleshooting Guide, and ultimately, capturing a +/- 5-minute slice of the SDK logs described here would give us the best view of what's going on. That's more detail and requires more back-and-forth discussion than works well on StackOverflow; I'd invite you to open an issue in the Azure SDK repository if you go down that path.
Something to keep in mind is that Event Hubs is optimized to maximize overall throughput and not for minimizing latency for individual events. The service offers no SLA for the time between when an event is received by the service and when it becomes available to be read from a partition.
When the service receives an event, it acknowledges receipt to the publisher and the send call completes. At this point, the event still needs to be committed to a partition. Until that process is complete, it isn't available to be read. Normally, this takes milliseconds but may occasionally take longer for the Standard tier because it is a shared instance. Transient failures, such as a partition node being rebooted/migrated, can also impact this.
With your near real-time reading, you may be processing quickly enough that there's nothing client-side that will help. In this case, you'd need to consider adding more TUs, moving to a Premium/Dedicated tier, or using more partitions to increase concurrency.
Update:
For those interested without access to the chat, log analysis shows a pattern of errors that indicates that either the host owns too many partitions and load balancing is unhealthy or there is a rogue processor running in the same consumer group but not using the same storage container.
In either case, partition ownership is bouncing frequently causing them to stop, move to a new host, reinitialize, and restart - only to stop and have to move again.
I've suggested reading through the Troubleshooting Guide, as this scenario and some of the other symptoms are discussed in detail.
I've also suggested reading through the samples for the processor - particularly Event Processor Configuration and Event Processor Handlers. Each has guidance around processor use and configuration that should be followed to maximize throughput.
#jesse very patiently examined my logs and led me to the "duh" moment of realizing I just needed a separate consumer group for this 2nd application of the EventHub data. Now things are rock solid. Thanks Jesse!
In the service high-level description Microsoft mentions that I can stream millions of events per second and it is highly scalable
Event Hubs is a fully managed, real-time data ingestion service that’s simple, trusted, and scalable. Stream millions of events per second from any source
https://azure.microsoft.com/en-us/services/event-hubs/
But when I go to the official documentation, the maximum number of throughput units (TUs) is 20, which translates into 1000 events per second per TU * 20 TUs = 20,000 events per second:
Event Hubs traffic is controlled by throughput units. A single throughput unit allows 1 MB per second or 1000 events per second of ingress and twice that amount of egress. Standard Event Hubs can be configured with 1-20 throughput units, and you can purchase more with a quota increase support request.
https://azure.microsoft.com/en-us/services/event-hubs/
How do 20 TUs translate into streaming millions of events?
You can go beyond the 20-TU limit by raising a support request.
But if you need to go very high you can also use Dedicated Clusters for Event Hubs.
Two important notes from the docs
A Dedicated cluster guarantees capacity at full scale, and can ingress up to gigabytes of streaming data with fully durable storage and sub-second latency to accommodate any burst in traffic.
At high ingress volumes (>100 TUs), a cluster costs significantly less per hour than purchasing a comparable quantity of throughput units in the Standard offering.
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dedicated-overview
The throughput capacity of Event Hubs is controlled by throughput units. Throughput units are pre-purchased units of capacity. A single throughput unit lets you: Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first). Egress: Up to 2 MB per second or 4096 events per second.
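To make the arithmetic behind those limits concrete, here is a minimal sketch that estimates how many TUs a given load would need. The class and method names are made up for illustration, and the only inputs are the per-TU limits quoted above; it ignores bursts and any service-side behavior.

```csharp
using System;

public static class ThroughputUnits
{
    // Per-TU limits as quoted from the documentation above.
    const double IngressMBPerTU = 1.0;
    const double IngressEventsPerTU = 1000.0;
    const double EgressMBPerTU = 2.0;
    const double EgressEventsPerTU = 4096.0;

    // Returns the smallest whole number of TUs that covers all four limits.
    public static int Required(double ingressMBps, double ingressEventsPerSec,
                               double egressMBps, double egressEventsPerSec)
    {
        double needed = Math.Max(
            Math.Max(ingressMBps / IngressMBPerTU, ingressEventsPerSec / IngressEventsPerTU),
            Math.Max(egressMBps / EgressMBPerTU, egressEventsPerSec / EgressEventsPerTU));
        return Math.Max(1, (int)Math.Ceiling(needed));
    }
}
```

For example, `ThroughputUnits.Required(5, 20000, 5, 20000)` is driven by the ingress event count (20,000 / 1000) and comes out at 20 TUs, the Standard ceiling mentioned in the question.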
How do I calculate the incoming bytes per second for an event hub namespace?
I do not control the data producer and so cannot predict the incoming bytes upfront.
I am interested in adjusting the maximum throughput units I need, without using the auto-inflate feature.
1 TU provides 1 MB/s ingress & 2 MB/s egress, but the metrics are reported per minute, not per second.
Can I make a decision based on the sum/avg/max incoming bytes reported in the Azure portal?
I believe you'll need to use Stream Analytics to query your stream and based on the query output change your TU on Event Hub.
You can also try to use Azure Monitor, but I believe it won't group per second as you need, so you'd better try the first option.
Per-second metrics cannot be reliable, because traffic in and out can spike intermittently. One-minute averages are good enough to monitor, and you can easily take action on them via a Logic App.
Check messaging metrics to monitor here - https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-metrics-azure-monitor#message-metrics
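One way to turn the per-minute numbers into a TU decision is to divide the busiest minute by 60 and add headroom for spikes that a one-minute total hides. This is only a sketch: the names are hypothetical, the input is assumed to be the per-minute sums of incoming bytes copied from the metrics, 1 MB is taken as 1024 * 1024 bytes, and the headroom factor is a guess you would need to tune.

```csharp
using System;
using System.Linq;

public static class MetricSizing
{
    // Assumption: 1 TU allows 1 MB/s of ingress, with 1 MB = 1024 * 1024 bytes.
    const double BytesPerTUPerSecond = 1024.0 * 1024.0;

    // incomingBytesPerMinute: per-minute totals of incoming bytes.
    // headroom: multiplier (e.g. 1.5) to allow for sub-minute spikes.
    public static int SuggestTUs(double[] incomingBytesPerMinute, double headroom)
    {
        if (incomingBytesPerMinute.Length == 0) return 1;

        // Average bytes per second during the busiest minute.
        double peakAvgBytesPerSec = incomingBytesPerMinute.Max() / 60.0;
        double needed = peakAvgBytesPerSec * headroom / BytesPerTUPerSecond;
        return Math.Max(1, (int)Math.Ceiling(needed));
    }
}
```

The same calculation would need repeating for incoming message counts and for egress, since whichever limit is hit first is the one that throttles.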
I'm testing placing scheduled messages onto my Azure Service Bus queues. No more than 10 or so in total, but the statistics in my dashboard seem to show otherwise!
It looks like it's showing thousands of incoming requests!
Question - am I not reading the chart correctly when it says 93.63k as 93,000+ ?
Incoming messages is the metric you need to select for determining the number of incoming messages, check the metrics list here.
Because the graph by default displays the metrics of the namespace and not those of a particular queue or topic, the values may look high. Use a dimension filter to display the metrics for a specific queue or topic.
I am working on a POC for Azure Event Hubs, with a view to implementing it in our application.
Quick brief on the flow:
I created a tool that reads CSV data from a local folder and sends it to the event hub.
We send the event data to the event hub in batches.
With 12 instances of the tool running in parallel, I can send a total of 600,000 lines of messages to the event hub within 1 minute.
But on the receiver side, it takes more than 10 minutes to receive the 600,000 lines of data.
Need to achieve
I would like the egress speed on the receiver side to match, or be double, the ingress speed when processing the data.
Existing configuration
The configuration I have used is:
TU - 10, one event hub with 32 partitions.
The coding logic is the same as described on MSDN.
The only difference is that I am sending lines of data in a batch.
EventProcessorHost with options { MaxBatchSize = 1000000, PrefetchCount = 1000000 }
To achieve higher egress rate (aka faster processing pipeline) in eventhubs:
Create a scaled-out pipeline: each partition in Event Hubs is the unit of scale for processing events out of the event hub. At the scale you described (600,000 events per minute, about 10K events per second, with 32 partitions) you already have this right. Make sure you create as many partitions as you expect your pipeline to need in the near future. Think of analyzing traffic on a highway, where the number of lanes is the only limit on the amount of traffic.
Equal load distribution across partitions: if you are using SendToASpecificPartition or SendUsingPartitionKey, you will need to take care of equal load distribution yourself. If you use EventHubClient.Send(EventDataWithOutPartitionKey), the Event Hubs service will make sure all of your partitions are equally loaded. If a single partition is heavily loaded, the time needed to process all events in the event hub will be bound by the number of events on that partition.
Scale-out physical resources on the Receiver/EventProcessorHost: most importantly Network (Sockets & bandwidth) & after-a-point, CPU & Memory. Use PartitionManagerOptions.MaxReceiveClients to increase the maximum number of EventHubClients (which has a dedicated MessagingFactory, which maps to 1 socket) created per EventProcessorHost instance. By default it is 16.
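The point about load distribution can be illustrated with a small calculation. This sketch assumes every partition has its own reader running in parallel at the same rate; the class name and the per-partition read rate are hypothetical, chosen only to show how the busiest partition sets the total drain time.

```csharp
using System;
using System.Linq;

public static class PartitionDrain
{
    // Seconds needed to drain every partition when all are read in parallel
    // at the same rate: the most heavily loaded partition finishes last.
    public static double DrainSeconds(long[] eventsPerPartition, double eventsPerSecondPerPartition)
    {
        if (eventsPerPartition.Length == 0) return 0;
        return eventsPerPartition.Max() / eventsPerSecondPerPartition;
    }
}
```

With 600,000 events spread evenly over 32 partitions (18,750 each) and an assumed 1,000 events per second per reader, the drain time is 18.75 seconds; if half of the events landed on a single partition, the same readers would need 300 seconds.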
Let me know how it went... :)
More on Event Hubs.