When to use EventGrid and when to use ServiceBus / Storage Queue? - azure

In Azure, we have two separate messaging technologies, and it's not well documented when to use which. While EventGrid is really cool, I have not come across guidance on when to use EventGrid (and in which scenarios) versus a Storage or ServiceBus queue. Can someone help?
E.g. if I have the following scenario:
The status of a flag changes and, based on that, I want to trigger an algorithm that does recalculations, a few inserts/updates, etc. in the database.
To implement this, I can use either EventGrid or a Storage Queue. How do we figure out what to use in such a scenario? I am looking for some kind of guidance.

Basically, Azure Event Grid handles events and Azure ServiceBus handles messages. A message is raw data produced by a service to be consumed or stored. Events are also messages (lightweight), but they generally don't convey any publisher intent other than to inform.
1) If the purpose is just to store the information, ServiceBus can be used.
2) If the information received is used to trigger another service, Azure Event Grid can be used.
Find more info here:
https://learn.microsoft.com/en-us/azure/event-grid/compare-messaging-services
https://azure.microsoft.com/en-us/blog/events-data-points-and-messages-choosing-the-right-azure-messaging-service-for-your-data/

Events are like notifications from a service to inform the world that something happened in the domain of the publisher (similar to an email notification). The publisher has no expectation that any action will be taken. A message is a command you send to a specific receiver with the expectation that the message will be processed (like an asynchronous POST request).
Events work in a pub/sub pattern, and multiple subscribers can be configured for the same events. The service that needs to react to an event gets notified by the event grid when the event occurs (HTTP call from the event grid to the receiver). The event will remain in the event grid until deletion (cleanup), and there is no guarantee of keeping the original order (no FIFO).
On the other hand, messages are added to a queue and are deleted once the "message processor" is done with them. The messages in the queue keep their original order (FIFO). The message processor has to pull messages from the queue.
In your scenario, you could use a combination of both. Service A sends a "StatusChanged" event, then you configure a subscription to that event that sends a message to a queue, and your logic processes that message. This ends up as a fully async communication pattern. It is ideal for scenarios where your processor is down or too busy: the incoming messages simply accumulate in the queue and are eventually processed once the service is back up and running, without affecting the original service that sent the "StatusChanged" event.
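To make the event-to-queue hand-off described above concrete, here is a minimal in-memory sketch. It does not use any Azure SDK: `EventBroker` and `WorkQueue` are hypothetical stand-ins for Event Grid and a Storage/Service Bus queue, and the event subjects are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Event:
    event_type: str
    subject: str
    data: dict


class EventBroker:
    """Hypothetical stand-in for Event Grid: pushes each event to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers.get(event.event_type, []):
            handler(event)


class WorkQueue:
    """Hypothetical stand-in for a queue: holds messages until a processor pulls them."""

    def __init__(self) -> None:
        self._messages: Deque[dict] = deque()

    def enqueue(self, message: dict) -> None:
        self._messages.append(message)

    def drain(self, process: Callable[[dict], None]) -> int:
        handled = 0
        while self._messages:
            process(self._messages.popleft())
            handled += 1
        return handled

    def __len__(self) -> int:
        return len(self._messages)


broker = EventBroker()
queue = WorkQueue()

# The subscription does no real work; it only turns the event into a queued message.
broker.subscribe(
    "StatusChanged",
    lambda e: queue.enqueue({"subject": e.subject, **e.data}),
)

# Service A publishes while the processor is "down"; the messages accumulate.
broker.publish(Event("StatusChanged", "flags/42", {"status": "on"}))
broker.publish(Event("StatusChanged", "flags/43", {"status": "off"}))

# The processor comes back later and pulls the backlog in arrival order.
recalculated: List[str] = []
queue.drain(lambda m: recalculated.append(m["subject"]))
```

The publisher never waits on the processor: it only sees `publish` return, which is the decoupling the answer is after.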

Related

Azure service bus, Auto forwarding does not wait to message completed

I want to use the auto-forwarding feature of Azure Service Bus. I have a topic called "trip" that has a subscription called "test".
I have enabled auto-forwarding and set it to forward messages to another topic called "trip_elaborated". This is working fine. However, it does not wait for the message to be completed before auto-forwarding it to the other topic.
E.g. the "test" subscription takes 30 seconds to process the message, and before processing has completed, the message is already forwarded to the "trip_elaborated" topic. I want this operation to happen in sync.
Is there any configuration needed? Or any other way to achieve this kind of scenario?
I would prefer to manage this using Service Bus Explorer (without explicitly doing it in the consumer code).
When auto-forwarding is enabled on an entity, messages are forwarded automatically and cannot be processed from the entity they were originally sent to. If you want to process the message and forward it in a synchronous manner, you'd need to do it in your processor. Azure Service Bus forwards the message from the subscription straight to the destination the moment a message arriving at the topic meets the filter criteria.
To achieve processing and forwarding, you can process the incoming message in a transactional manner, something Azure Service Bus supports. See documentation for more details.
In case you can tolerate processing and forwarding in parallel, you'd have two subscriptions, one for processing and another solely for auto-forwarding.
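The ordering that "process first, then forward" implies can be sketched as follows. This is only an in-memory illustration under stated assumptions: `Topic`, `process_then_forward` and `elaborate` are hypothetical names, and no Service Bus SDK calls or real transactions are shown. In a real consumer, completing the incoming message and sending the outgoing one would be wrapped in the transaction the answer mentions.

```python
from typing import Callable, List


class Topic:
    """Hypothetical in-memory stand-in for a Service Bus topic."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.messages: List[dict] = []

    def send(self, message: dict) -> None:
        self.messages.append(message)


def process_then_forward(
    message: dict,
    process: Callable[[dict], dict],
    destination: Topic,
) -> bool:
    """Forward only after processing succeeds; on failure nothing is forwarded."""
    try:
        result = process(message)
    except Exception:
        # Real code would abandon the message here so that it can be retried.
        return False
    destination.send(result)
    return True


def elaborate(message: dict) -> dict:
    # Placeholder for the 30-second processing step from the question.
    if "trip_id" not in message:
        raise ValueError("missing trip_id")
    return {**message, "elaborated": True}


trip_elaborated = Topic("trip_elaborated")
ok = process_then_forward({"trip_id": 7}, elaborate, trip_elaborated)
failed = process_then_forward({"bad": "payload"}, elaborate, trip_elaborated)
```

Only the successfully processed message reaches `trip_elaborated`; the failed one is never forwarded, which auto-forwarding on the subscription cannot guarantee.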

Diagnosing failures in azure event grid?

I did not find much guidance on troubleshooting lost events in Azure Event Grid.
Hence I am asking this question in relation to the following scenario:
Our code publishes the events to the domain.
The events are delivered to the configured web hook in the subscription.
This works for a while.
The consumer (who owns the web hook endpoint) complains that he is not receiving some events but most are coming through.
We look in the configured dead-letter queue and find that there are no events. It has been more than a day and hence all retries are already exhausted.
Hence we assume that all events are being delivered because there are no failed delivery events in the metrics.
We also make sure that we indeed submitted these mysterious events to the grid.
But the consumer insists there is a problem and has shown that nothing is wrong on his side.
Now we need to figure out if some of these events are being swallowed by the event grid.
How do I go about troubleshooting this scenario?
The current version of AEG (Azure Event Grid) is not integrated with the Diagnostic settings feature, which would be very helpful for streaming the metrics and logs.
For your scenario, which is based on Event Domains (still in public preview, see limits), the Azure Monitoring REST API can help: it lets you see all metrics for your specific Event Domain.
The valid metrics are:
PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,MatchedEventCount,DeliveryAttemptFailCount,DeliverySuccessCount,DestinationProcessingDurationInMs,DroppedEventCount,DeadLetteredCount
The following example is a REST GET request to obtain all metrics values within your event domain for specific timespan and interval:
https://management.azure.com/subscriptions/{mySubId}/resourceGroups/{myRG}/providers/Microsoft.EventGrid/domains/{myDomain}/providers/Microsoft.Insights/metrics?api-version=2018-01-01&interval=PT1H&aggregation=count,total&timespan=2019-02-06T07:58:12Z/2019-02-07T08:58:12Z&metricnames=PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,MatchedEventCount,DeliveryAttemptFailCount,DeliverySuccessCount,DestinationProcessingDurationInMs,DroppedEventCount,DeadLetteredCount
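For the polling approach, the request URL above can be assembled programmatically. The following is a small sketch using only the Python standard library; `build_metrics_url` is a hypothetical helper name, the path and query parameters are copied from the example request, and authentication (a bearer token on the actual GET) is deliberately left out.

```python
from urllib.parse import urlencode

# Metric names as listed in the answer above.
METRIC_NAMES = [
    "PublishSuccessCount", "PublishFailCount", "PublishSuccessLatencyInMs",
    "MatchedEventCount", "DeliveryAttemptFailCount", "DeliverySuccessCount",
    "DestinationProcessingDurationInMs", "DroppedEventCount", "DeadLetteredCount",
]


def build_metrics_url(
    subscription_id: str,
    resource_group: str,
    domain: str,
    timespan: str,
    interval: str = "PT1H",
    aggregation: str = "count,total",
    api_version: str = "2018-01-01",
) -> str:
    """Build the metrics URL for one event domain (no request is sent)."""
    resource = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.EventGrid/domains/{domain}"
    )
    query = urlencode(
        {
            "api-version": api_version,
            "interval": interval,
            "aggregation": aggregation,
            "timespan": timespan,
            "metricnames": ",".join(METRIC_NAMES),
        },
        safe=",:/",  # keep commas, colons and slashes readable, as in the example URL
    )
    return (
        f"https://management.azure.com{resource}"
        f"/providers/Microsoft.Insights/metrics?{query}"
    )


url = build_metrics_url(
    "mySubId", "myRG", "myDomain",
    timespan="2019-02-06T07:58:12Z/2019-02-07T08:58:12Z",
)
```

A polling job would call this with a sliding `timespan` and compare, for example, PublishSuccessCount against DeliverySuccessCount and DroppedEventCount to spot gaps.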
Based on the response values, you can see metrics of AEG's behavior from the publisher side and of event delivery to the subscriber. For your production version, I recommend using a polling technique to obtain all metrics from AEG and push them to an Event Hub for stream analysis, alerting, etc. Depending on the query parameters (such as timespan, interval, etc.), this can be close to real time. Once Diagnostic settings are supported by AEG, this polling and publishing of metrics becomes obsolete, and the stream analysis job can continue with only a small modification.
The other point is to extend your eventing model with an auditing part. I recommend the following:
Add a domain-scope subscription to capture all events in the event domain and push them to the Event Hub for streaming purposes. Note that any event published within that event domain should appear in this stream pipeline.
Add a storage subscription for dead-letter messages and push them to the same Event Hub for streaming purposes.
(optional) Add the Diagnostic settings (some metrics) of the dead-letter storage to the same Event Hub for streaming purposes. Note that a dead-letter message is dropped after 4 hours of trying to store it in the blob container. There is no log message for that failed process, only a metric counter.
On the consumer side, I recommend that each subscriber create a log message (aeg headers + event message) for auditing and troubleshooting purposes. It can be stored in a blob container, or locally and then uploaded, etc. The point is that this reference can be very useful for the stream analysis job to quickly figure out where the problem is.
In addition to your eventing model, your publisher should periodically (for instance once per hour) probe the event domain endpoint and also send a probe event message to a probe topic for test purposes. The event subscription for that probe topic is configured with a dead-lettering option. The subscriber webhook handler should always fail with error code HttpStatusCode.BadRequest, so that no retry is attempted. Note that there is a 300-second delay before the dead-letter message is stored in the storage. In other words, 5 minutes after the probe event, the dead-letter message should be in the stream pipeline. This probe scenario exercises AEG from both the publisher and the delivery point of view.
The above described solution is shown in the following screen snippet:

How do you maintain idempotency with Azure EventGrid webhooks?

I have configured an EventGrid subscription to initiate a web hook call for events in a resource group when a resource is created.
The web hook call is successfully handled, and I return a 200 OK. To maintain idempotency, I store all events that have occurred in a webhook_events table with the id of the event. Any new events are checked to see if they exist in that table by their id.
Azure EventGrid attempts to remove the event from the retry queue after returning a 200 OK. No matter how quickly I respond with a 200 OK, EventGrid reliably retries sending.
I am receiving the same event multiple times (as I said, EventGrid always retries, as it cannot remove the event from the retry queue fast enough). This however is not the focus of my question; rather, the issue exists in the fact that each of these retries presents me with a different id for the event. This means that I cannot logically determine the uniqueness of an event, and my application code is not being executed in an idempotent fashion.
How can I maintain idempotency between my application and Azure despite there being no unique identifier between event retries?
This is the way EventGrid is implemented. From the documentation:
If the endpoint responds within 3 minutes, Event Grid will attempt to
remove the event from the retry queue on a best effort basis but
duplicates may still be received.
You can use back-end code to clean up logs and stored data, using event and message IDs to identify duplicates.
The id field is in fact unique per event and kept identical between retries & therefore can be used for dedupe.
What you're running into is a specific issue with some events generated by Azure Resource Manager (ARM). Specifically, the two events you are seeing are in fact distinct events, not duplicates, generated by ARM at different stages of the creation flow for some resource types.
ARM acts as the API front door to the various Azure services and emits a generalized set of events, so to get the details of what has occurred you often need to look in the data payload. For example, ARM will emit a success event for each 2xx status code it receives from an Azure service, so a 202 Accepted and a 201 Created can result in two events being emitted, and the only way to see the difference is in the data payload.
This is a known pain point, and we are working to emit more high-fidelity events that will be clearer and easier to react to in these scenarios. The ideal state will be a change-feed of sorts for the Azure control plane.
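Since the id field stays the same across retries, deduplication along the lines the question describes can be sketched as below. This is a minimal illustration, not the asker's actual code: it uses an in-memory SQLite database, reuses the `webhook_events` table name from the question, and `handle_once` is a hypothetical helper.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    """Create the dedupe table; the primary key enforces one row per event id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE webhook_events (id TEXT PRIMARY KEY)")
    return conn


def handle_once(
    conn: sqlite3.Connection,
    event: dict,
    action: Callable[[dict], None],
) -> bool:
    """Run action(event) only the first time this event id is seen."""
    try:
        # One transaction: if the action raises, the id insert is rolled back,
        # so a later retry of the same event is processed again.
        with conn:
            conn.execute(
                "INSERT INTO webhook_events (id) VALUES (?)", (event["id"],)
            )
            action(event)
    except sqlite3.IntegrityError:
        return False  # duplicate delivery of an event that was already handled
    return True


conn = make_store()
handled = []
first = handle_once(conn, {"id": "a1", "eventType": "example"}, handled.append)
retry = handle_once(conn, {"id": "a1", "eventType": "example"}, handled.append)
other = handle_once(conn, {"id": "b2", "eventType": "example"}, handled.append)
```

Note that this only removes true retries. The two ARM events described above carry different ids, so both pass the check; telling them apart still requires inspecting the data payload.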

Should I change how our microservices communicate?

Our application consists of 7 microservices that have some intercommunication. Currently we're using simple storage queues that a microservice publishes events to (the number of events is relatively low). Then we have an Azure Function for each queue that might call another microservice. This is working fine for us right now; the services use about 20 queues, each with a corresponding function.
Now we need to handle a blob storage event, and I did some googling and started to get really confused. Suddenly there were a lot of questions:
Should we switch to Azure Event Grid
It handles blob storage without any limitations (the Functions blob storage trigger has some)
It allows for multiple subscribers (storage queues do not)
It has a lot of buzz - maybe this is the new recommended way
I like the idea of one central thing, but it reminds me a bit of BizTalk...
Should I switch to Azure Service Bus
It has a nice tool (ServiceBusExplorer) for monitoring the queues and listeners, and I could do a repost of any failed events
It visualizes my Azure Functions subscribers nicely
Should I continue with only storage queues
A bit difficult to monitor, but it works nicely
I'll be really thankful for any advice or insights to this question.
Thanks
EventGrid is great when you have notifications flowing to multiple subscribers. Is that the case for you?
One example of a difference is deferring messages: with queues you can defer a message, with EventGrid you cannot. Whether to choose Storage Queues or Service Bus depends on the specific requirements you have. Do you need de-duplication? Or ordered delivery? If you do, Service Bus is the way. Otherwise Storage Queues are enough.
First of all, I would like to recommend these two articles; they will clarify most of your doubts about these services:
Choose between Azure services that deliver messages
Storage queues and Service Bus queues compared
Regarding Event Grid, it acts like a bridge between the publisher and the subscriber: the publisher sends messages and forgets about whether they have been processed or not, and Event Grid handles the retry if the receiver/subscriber does not acknowledge that it processed them successfully.
As you mentioned, storage queues have limitations, as do blob-triggered functions, and maybe Service Bus, but it will depend on your design requirements. I would like to point out some things you might consider before moving to Event Grid.
Storage queues and Service Bus do not care about your message schema. In Event Grid you have to create a custom event based on its schema to wrap your event, so the publisher and subscriber have to understand Event Grid for that. Not that it is a big deal, but now you have both sides coupled to Event Grid.
If you want to send the event straight to your microservice, you have to implement the subscription validation in your service, otherwise the service won't be able to receive the events.
Event Grid only retries the delivery of your messages for 24 hours. If your service is down or does not process the message correctly for longer than 24h, the event becomes dead. Currently, there is no way to query dead messages. With Storage Queues and Service Bus, how long a message is kept is configurable, and it can be kept for many days.
Your service webhook must acknowledge the receipt (HTTP 200 or 202) of an event within 60 seconds, otherwise the delivery is considered failed. If your operation takes longer than that, you should send it to a queue and handle the locking from your service.
Probably there are more limitations, but these are the ones I remember right now, and they might change any time soon. I think Event Grid is a great technology that is still in its early days, and there is much to improve. I would recommend it only as a hub for Azure management events; I don't think it is ready for use as an application integrator.
Regarding your comment about a queue manager: for Service Bus you have the Service Bus Explorer, and for Azure Storage you have the Azure Storage Explorer, where you can check the messages in the queue. It is not the same as for Service Bus, but it helps.
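The subscription validation and fast-acknowledgement points above can be sketched as a framework-free handler. This assumes the Event Grid event schema, where the validation event has eventType `Microsoft.EventGrid.SubscriptionValidationEvent` and the endpoint must echo `data.validationCode` back as `validationResponse`. The function name `handle_webhook` and the `enqueue` callback are hypothetical; wiring this into an actual HTTP server is not shown.

```python
import json
from typing import Callable, List, Tuple

VALIDATION_EVENT = "Microsoft.EventGrid.SubscriptionValidationEvent"


def handle_webhook(body: str, enqueue: Callable[[dict], None]) -> Tuple[int, str]:
    """Return (status_code, response_body) for one Event Grid POST body."""
    events = json.loads(body)  # Event Grid posts a JSON array of events

    # Handshake: echo the validation code so the subscription can be created.
    for event in events:
        if event.get("eventType") == VALIDATION_EVENT:
            code = event["data"]["validationCode"]
            return 200, json.dumps({"validationResponse": code})

    # Normal delivery: hand the work to a queue and acknowledge immediately,
    # so long-running processing does not exceed the acknowledgement window.
    for event in events:
        enqueue(event)
    return 200, ""


backlog: List[dict] = []

handshake_body = json.dumps(
    [{"id": "1", "eventType": VALIDATION_EVENT, "data": {"validationCode": "abc123"}}]
)
handshake = handle_webhook(handshake_body, backlog.append)

delivery_body = json.dumps(
    [{"id": "2", "eventType": "Microsoft.Storage.BlobCreated", "data": {}}]
)
delivery = handle_webhook(delivery_body, backlog.append)
```

Acknowledging right after the enqueue keeps the webhook fast and moves retries and locking to the queue consumer, as the answer suggests for long operations.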
It very much depends on how you are consuming the queue messages. You can take a look at this comparison: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted
If you don't need ordering and if you don't have a strong limit on message volume, size or TTL, you can stick to storage queues.

Does Azure client .OnMessage generate billable request for empty queues?

You can subscribe to asynchronous updates from Azure topics and queues by using SubscriptionClient/QueueClient's .OnMessage call which will presumably create a separate thread polling the topic/queue with default settings and calling a defined callback if it receives anything.
The Azure website says that receiving a message is a billable action, which is understandable. However, it isn't clear whether each of those poll requests is considered billable even when it does not return anything, i.e. when the queue in question has no pending messages.
Based on the Azure Service Bus Pricing FAQ, the answer to your question is yes:
In general, management operations and “control messages,” such as
completes and deferrals, are not counted as billable messages. There
are two exceptions:
Null messages delivered by the Service Bus in
response to requests against an empty queue, subscription, or message
buffer, are also billable. Thus, applications that poll against
Service Bus entities will effectively be charged one message per poll.
Setting and getting state on a MessageSession will also result in
billable messages, using the same message size-based calculation
described above.
Given the price is $0.01 per 10,000 messages, I don't think you should worry too much about that.
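To put a number on that, here is a back-of-the-envelope calculation. The price is the figure quoted above and may no longer be current; the poll intervals are assumptions for illustration, since the effective poll rate of `.OnMessage` depends on client settings.

```python
PRICE_PER_10K_MESSAGES = 0.01  # USD, as quoted in the answer; verify current pricing


def monthly_poll_cost(poll_interval_seconds: float, days: int = 30) -> float:
    """Cost of polling an empty entity, at one billable message per poll."""
    polls = days * 24 * 3600 / poll_interval_seconds
    return polls / 10_000 * PRICE_PER_10K_MESSAGES


every_second = monthly_poll_cost(1)    # 2,592,000 polls in 30 days
every_minute = monthly_poll_cost(60)   # 43,200 polls in 30 days
```

Under these assumptions, one empty poll per second costs about $2.59 over 30 days and one per minute about $0.04, per listener.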

Resources