Getting event notifications for Azure Blob objects created in the past

I plan on using Azure Event Grid to get notifications when Blob objects are created. If my service goes down and comes back up, will I get notifications of Blob objects that were created while my service was down? I would prefer not to get notifications of Blob objects that were created while my service was down.

The Retry schedule and duration documentation describes the types of destination endpoints and the errors for which retry doesn't happen.
As you can see in the table of failure codes, a 503 Service Unavailable response makes Event Grid retry the delivery after 30 seconds or more.
It would be nice if each subscription could control its retry policy based on the status code, but for now I can see the following workaround:
Use Azure API Management to return a suitable status code back to AEG; in other words, APIM maps the 503 status code from your service to a 403 status code for AEG, which is not retried. Note that this APIM mediator also has to handle the webhook validation for the AEG subscription.
I do recommend enabling the dead-lettering feature for auditing the messages that were not delivered while your service was down.
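If APIM is not an option, the same idea can be prototyped with a thin relay in front of your service. Below is a minimal Python sketch (assuming Flask and the requests library; the backend URL is a hypothetical placeholder) that answers the AEG subscription validation handshake and maps a 503 from the backend to a 403 so that AEG stops retrying:

    # relay.py - minimal sketch of an AEG webhook relay (assumes Flask + requests)
    import requests
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    BACKEND_URL = "https://myservice.example.com/events"  # hypothetical backend endpoint

    @app.route("/eventgrid", methods=["POST"])
    def relay():
        events = request.get_json()
        # Handle the AEG subscription validation handshake.
        for event in events:
            if event.get("eventType") == "Microsoft.EventGrid.SubscriptionValidationEvent":
                code = event["data"]["validationCode"]
                return jsonify({"validationResponse": code})
        try:
            resp = requests.post(BACKEND_URL, json=events, timeout=10)
        except requests.RequestException:
            # Backend unreachable: report 403 so AEG dead-letters instead of retrying.
            return "", 403
        # Map the retryable 503 to the non-retryable 403.
        if resp.status_code == 503:
            return "", 403
        return "", resp.status_code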

Related

dead lettering in Azure event grid does not work

As per the Microsoft documentation, Event Grid does not implement retry on specific errors like 400, 413, 401. In such cases Event Grid will either perform dead-lettering on the event or drop the event if dead-letter isn't configured.
So I enabled dead-lettering on my webhook (receiver endpoint) and produced a 400 error, but it is still not captured in the dead-letter container.
Is there something I am missing?
Please have a look at the Event Grid message delivery and retry - Dead-letter events documentation and see if your configuration is correct.
By default, Event Grid doesn't turn on dead-lettering. To enable it, you must specify a storage account to hold undelivered events when creating the event subscription. You pull events from this storage account to resolve deliveries.
[...]
Before setting the dead-letter location, you must have a storage account with a container. You provide the endpoint for this container when creating the event subscription. The endpoint is in the format of: /subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.Storage/storageAccounts/<storage-name>/blobServices/default/containers/<container-name>
Also, take into account that
There's a five-minute delay between the last attempt to deliver an event and when it's delivered to the dead-letter location. This delay is intended to reduce the number of Blob storage operations. If the dead-letter location is unavailable for four hours, the event is dropped.
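For illustration, here is a minimal Python sketch of creating an event subscription with a dead-letter destination through the ARM REST API (assuming the requests library, a valid ARM bearer token, and placeholder resource names; the dead-letter container must already exist):

    # Sketch: create/update an Event Grid subscription with a dead-letter
    # destination via the ARM REST API (assumes the requests library and a
    # valid ARM bearer token; all resource names are placeholders).
    import requests

    token = "<arm-bearer-token>"
    scope = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>"
             "/providers/Microsoft.EventGrid/topics/<topic-name>")
    url = (f"https://management.azure.com{scope}"
           "/providers/Microsoft.EventGrid/eventSubscriptions/<subscription-name>"
           "?api-version=2022-06-15")

    body = {
        "properties": {
            "destination": {
                "endpointType": "WebHook",
                "properties": {"endpointUrl": "https://myservice.example.com/eventgrid"},
            },
            # The dead-letter container must exist before this call.
            "deadLetterDestination": {
                "endpointType": "StorageBlob",
                "properties": {
                    "resourceId": "/subscriptions/<subscription-id>/resourceGroups"
                                  "/<resource-group-name>/providers/Microsoft.Storage"
                                  "/storageAccounts/<storage-name>",
                    "blobContainerName": "<container-name>",
                },
            },
        }
    }

    resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()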

Diagnosing failures in azure event grid?

I did not find much in the way of troubleshooting a lost-events scenario in Azure Event Grid, so I am asking about the following scenario:
Our code publishes the events to the domain.
The events are delivered to the configured web hook in the subscription.
This works for a while.
The consumer (who owns the web hook endpoint) complains that he is not receiving some events but most are coming through.
We look in the configured dead-letter queue and find that there are no events. It has been more than a day and hence all retries are already exhausted.
Hence we assume that all events are being delivered because there are no failed delivery events in the metrics.
We also make sure that we indeed submitted these mysterious events to the grid.
But the consumer insists there is a problem and demonstrates that nothing is wrong on his side.
Now we need to figure out if some of these events are being swallowed by the event grid.
How do I go about troubleshooting this scenario?
The current version of AEG is not integrated with the Diagnostic settings feature, which would help a lot with streaming its metrics and logs.
For your scenario, which is based on Event Domains (still in public preview, see the limits), the Azure Monitor REST API can help you see all the metrics of your specific event domain.
The valid metrics are:
PublishSuccessCount, PublishFailCount, PublishSuccessLatencyInMs, MatchedEventCount, DeliveryAttemptFailCount, DeliverySuccessCount, DestinationProcessingDurationInMs, DroppedEventCount, DeadLetteredCount
The following example is a REST GET request that obtains all the metric values within your event domain for a specific timespan and interval:
https://management.azure.com/subscriptions/{mySubId}/resourceGroups/{myRG}/providers/Microsoft.EventGrid/domains/{myDomain}/providers/Microsoft.Insights/metrics?api-version=2018-01-01&interval=PT1H&aggregation=count,total&timespan=2019-02-06T07:58:12Z/2019-02-07T08:58:12Z&metricnames=PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,MatchedEventCount,DeliveryAttemptFailCount,DeliverySuccessCount,DestinationProcessingDurationInMs,DroppedEventCount,DeadLetteredCount
Based on the response values, you can see the metrics of AEG's behavior on both the publisher side and the event delivery to the subscriber. For your production version, I do recommend using a polling technique to obtain all the metrics from AEG and pushing them to an Event Hub for stream analysis, alerting, etc. Based on the query parameters (such as timespan, interval, etc.), it can be close to real time. When Diagnostic settings are supported by AEG, this polling and publishing of the metrics becomes obsolete, and the analytics stream job can continue with only a small modification.
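As a sketch, the same GET request can be polled with a few lines of Python (assuming the requests library and a valid ARM bearer token; the resource path placeholders are taken from the URL above):

    # Sketch: poll the Azure Monitor metrics of an event domain (assumes the
    # requests library and a valid ARM bearer token; names are placeholders).
    import requests

    token = "<arm-bearer-token>"
    resource = ("/subscriptions/{mySubId}/resourceGroups/{myRG}"
                "/providers/Microsoft.EventGrid/domains/{myDomain}")
    metrics = ",".join([
        "PublishSuccessCount", "PublishFailCount", "PublishSuccessLatencyInMs",
        "MatchedEventCount", "DeliveryAttemptFailCount", "DeliverySuccessCount",
        "DestinationProcessingDurationInMs", "DroppedEventCount", "DeadLetteredCount",
    ])

    resp = requests.get(
        f"https://management.azure.com{resource}/providers/Microsoft.Insights/metrics",
        params={
            "api-version": "2018-01-01",
            "interval": "PT1H",
            "aggregation": "count,total",
            "timespan": "2019-02-06T07:58:12Z/2019-02-07T08:58:12Z",
            "metricnames": metrics,
        },
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    for metric in resp.json()["value"]:
        print(metric["name"]["value"], metric["timeseries"][0]["data"])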
The other point is to extend your eventing model with an auditing part. I do recommend the following:
Add a domain-scope subscription to capture all events in the event domain and push them to an Event Hub for streaming purposes. Note that any event published within that event domain will show up in this stream pipeline.
Add a storage subscription for the dead-letter messages and push them to the same Event Hub for streaming purposes.
(optional) Add the Diagnostic settings (some metrics) of the dead-letter storage to the same Event Hub for streaming purposes. Note that a dead-letter message is dropped after 4 hours of trying to store it in the blob container; there is no log message for that failed process, only a metric counter.
On the consumer side, I do recommend that each subscriber create a log message (the aeg-* headers + the event message) for auditing and troubleshooting purposes, stored in a blob container or locally and then uploaded, etc. The point is that this reference can be very useful for the analytics stream job to quickly figure out where the problem is.
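A minimal Python sketch of such a subscriber-side audit log (assuming Flask; the local log file is a placeholder for whatever store you upload to blob storage later):

    # Sketch: subscriber-side audit log of the aeg-* headers plus the event
    # body (assumes Flask; the log path is a placeholder).
    import json
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/eventgrid", methods=["POST"])
    def handle():
        aeg_headers = {k: v for k, v in request.headers.items()
                       if k.lower().startswith("aeg-")}
        record = {"headers": aeg_headers, "events": request.get_json()}
        # Append to a local audit log; upload to blob storage later.
        with open("audit.log", "a") as f:
            f.write(json.dumps(record) + "\n")
        # ... process the events ...
        return "", 200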
In addition, your publisher should periodically (for instance, once per hour) probe the event domain endpoint by sending a probe event message to a dedicated probe topic for test purposes. The event subscription for that probe topic should be configured with the dead-lettering option, and the subscriber webhook handler should always fail with an error code of HttpStatusCode.BadRequest (400), so that no retrying occurs. Note that there is a 300-second delay before the dead-letter message is stored; in other words, about 5 minutes after the probe event, the dead-letter message should show up in the stream pipeline. This probe scenario tests the functionality of AEG from both the publisher and the delivery points of view.
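A minimal Python sketch of such a probe publisher (assuming the requests library; the domain endpoint, access key, and probe topic name are placeholders):

    # Sketch: publish an hourly probe event to a dedicated probe topic in the
    # event domain (assumes the requests library; endpoint, key, and topic
    # name are placeholders).
    import uuid
    import datetime
    import requests

    DOMAIN_ENDPOINT = "https://<myDomain>.<region>-1.eventgrid.azure.net/api/events"
    DOMAIN_KEY = "<domain-access-key>"

    probe_event = [{
        "id": str(uuid.uuid4()),
        "topic": "probe",                   # the dedicated probe topic
        "subject": "probe/heartbeat",
        "eventType": "Probe.Heartbeat",
        "eventTime": datetime.datetime.utcnow().isoformat() + "Z",
        "data": {"note": "probe event; the subscriber always returns 400"},
        "dataVersion": "1.0",
    }]

    resp = requests.post(DOMAIN_ENDPOINT, json=probe_event,
                         headers={"aeg-sas-key": DOMAIN_KEY})
    resp.raise_for_status()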

How do you maintain idempotency with Azure EventGrid webhooks?

I have configured an EventGrid subscription to initiate a web hook call for events in a resource group when a resource is created.
The web hook call is successfully handled, and I return a 200 OK. To maintain idempotency, I store all events that have occurred in a webhook_events table with the id of the event. Any new events are checked to see if they exist in that table by their id.
Azure EventGrid attempts to remove the event from the retry queue after my endpoint returns a 200 OK, but no matter how quickly I respond with a 200 OK, EventGrid reliably retries sending.
I am receiving the same event multiple times (as I said, EventGrid always retries, since it cannot remove the event from the retry queue fast enough). This, however, is not the focus of my question; rather, the issue is that each of these retries presents me with a different id for the event. This means that I cannot logically determine the uniqueness of an event, and my application code is not being executed in an idempotent fashion.
How can I maintain idempotency between my application and Azure despite there being no unique identifier between event retries?
That's the way Event Grid is implemented; if you look at the documentation:
If the endpoint responds within 3 minutes, Event Grid will attempt to remove the event from the retry queue on a best effort basis but duplicates may still be received.
You can use back-end code to clean up logs and stored data, using event and message IDs to identify duplicates.
The id field is in fact unique per event and kept identical between retries, and therefore it can be used for dedupe.
What you're running into is a specific issue with some events generated by Azure Resource Manager (ARM). Specifically, the two events you are seeing are in fact distinct events, not duplicates, generated by ARM at different stages of the creation flow for some resource types.
ARM acts as the API front door to the various Azure services and emits a set of events that are generalized; often, to get the details of what has occurred, you need to look in the data payload. For example, ARM will emit a success event for each 2xx status code it receives from an Azure service, so a 202 Accepted and a 201 Created can result in two events being emitted, and the only way to see the difference is in the data payload.
This is a known pain point, and we are working to emit more high-fidelity events that will be clearer and easier to react to in these scenarios. The ideal state will be a change-feed of sorts for the Azure control plane.
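For illustration, a minimal Python sketch of deduplication keyed on the id field (an in-memory set stands in for the webhook_events table mentioned in the question, and the process function is a hypothetical stand-in for the application logic):

    # Sketch: dedupe retried deliveries on the event id field (an in-memory
    # set stands in for a durable webhook_events table).
    seen_event_ids = set()

    def process(event):
        # Hypothetical application logic.
        print("processing", event["eventType"], event["subject"])

    def handle_events(events):
        for event in events:
            event_id = event["id"]  # unique per event, identical across retries
            if event_id in seen_event_ids:
                continue  # duplicate delivery: skip
            seen_event_ids.add(event_id)
            process(event)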

How do I dead-letter a message with the Azure Service Bus HTTP API

I'm trying to integrate with the Azure Service Bus to perform brokered messaging. I've used the managed .NET API successfully before, but this time I need to use the HTTP API.
When processing a message, if I determine that a message is poisonous (i.e. it can never be processed successfully), I want to move the message to the dead-letter queue.
In the managed API, I'd call BrokeredMessage.DeadLetterAsync() which lets me specify the reasons for dead-lettering the message and moves it to the dead-letter queue as an atomic operation.
I've been reading through the HTTP API documentation and have found and invoked operations to perform the other actions, such as peek-lock, deleting a locked message, or abandoning a lock, but I can't find an explicit operation to dead-letter a message.
Does this operation exist in the HTTP API?
The DeadLetter operation is not supported through the HTTP/REST API today. We will add that support in an upcoming release. When the max delivery count for a message is reached and it is still not completed, it will be automatically dead-lettered if that is enabled for the queue/subscription. The connectivity mode mentioned above applies to the .NET API, where the SBMP service bus protocol is tunneled over an HTTP/port-80 connection, so it is not using the REST API for that.
Even though I did not find any documentation for it, you can access dead-letter messages via:
https://{servicebusnamespace}/{topic}/subscriptions/{subscriptionname}/$deadletterqueue/messages/head
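For illustration, here is a minimal Python sketch of reading from that dead-letter queue (assuming the requests library and a pre-generated SAS token; all names are placeholders). A DELETE on messages/head is a destructive receive-and-delete, while a POST would be a peek-lock:

    # Sketch: destructive read of the head of the subscription's dead-letter
    # queue over HTTP (assumes the requests library and a pre-generated SAS
    # token; names are placeholders).
    import requests

    SAS_TOKEN = "SharedAccessSignature sr=...&sig=...&se=...&skn=..."
    url = ("https://<servicebusnamespace>.servicebus.windows.net/<topic>"
           "/subscriptions/<subscriptionname>/$deadletterqueue/messages/head")

    # DELETE = receive-and-delete; POST would be peek-lock.
    resp = requests.delete(url, params={"timeout": 60},
                           headers={"Authorization": SAS_TOKEN})
    if resp.status_code == 200:
        print(resp.headers.get("BrokerProperties"), resp.text)
    elif resp.status_code == 204:
        print("dead-letter queue is empty")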
I took a look at the REST API reference too, and I could not find a way. There's a comparison table that shows which features are available through the REST API and which through the .NET SDK:
http://msdn.microsoft.com/en-us/library/windowsazure/hh780771.aspx
It sounds strange to me, because I thought the .NET SDK called a REST API resource.
I believe you must apply a peek-lock to the message and, after processing it, delete it.
Peek-lock message:
http://msdn.microsoft.com/en-us/library/windowsazure/hh780735.aspx
Delete:
http://msdn.microsoft.com/en-us/library/windowsazure/hh780768.aspx
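As a sketch, the peek-lock-then-delete flow looks roughly like this in Python (assuming the requests library and a pre-generated SAS token; the process function is a hypothetical stand-in, and the PUT unlock is the documented way to release the lock on failure):

    # Sketch: peek-lock a message and delete it after successful processing
    # over HTTP (assumes the requests library and a pre-generated SAS token).
    import requests

    SAS_TOKEN = "SharedAccessSignature sr=...&sig=...&se=...&skn=..."
    BASE = "https://<servicebusnamespace>.servicebus.windows.net/<queue>"
    HEADERS = {"Authorization": SAS_TOKEN}

    def process(body):
        # Hypothetical processing step; raise on failure.
        print("processing message:", body)

    # 1) Peek-lock the message at the head of the queue (201 = message locked).
    resp = requests.post(f"{BASE}/messages/head", params={"timeout": 60},
                         headers=HEADERS)
    if resp.status_code == 201:
        message_uri = resp.headers["Location"]  # URI of the locked message
        try:
            process(resp.text)
            # 2) Success: delete (complete) the locked message.
            requests.delete(message_uri, headers=HEADERS).raise_for_status()
        except Exception:
            # Failure: unlock the message so it can be redelivered.
            requests.put(message_uri, headers=HEADERS).raise_for_status()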

Azure service bus statistics/Monitoring

I want to make a dashboard that shows the status of our Azure Service Bus queues and displays the history of "messages added to queue", "length of queue", "messages processed", etc. Using the Azure Management Portal, I can see most of these statistics manually for each queue.
Is there any way to access the data that is displayed in the Management Portal through one of the APIs? I want to combine the data from the queues that we use into a single interface. I have searched in vain, and I don't want to log my own statistics, as that seems like redoing a task that Microsoft already performs.
Currently, with the REST API, all I can see is how to get the current approximate count of messages in the queue.
There is an API for this now (wasn't back when the OP created the thread):
https://msdn.microsoft.com/en-gb/library/azure/dn163589.aspx (REST)
https://msdn.microsoft.com/en-us/library/mt348562.aspx (.NET)
Also, I believe it should be available via Azure Insights API:
https://msdn.microsoft.com/en-us/library/microsoft.azure.insights.aspx
It is possible to fetch the count of messages in a queue, incoming messages, and outgoing messages with the help of the latest Azure Monitor metrics, with which you can build your own dashboard. Or you can make use of Azure Monitor in the Azure portal, which allows you to configure dashboards and alerts.
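As a sketch, those counters can be pulled per queue with the Azure Monitor metrics REST API (assuming the requests library and a valid ARM bearer token; all names are placeholders, and the EntityName dimension filter gives the per-queue breakdown):

    # Sketch: pull per-queue counters from Azure Monitor metrics for a Service
    # Bus namespace (assumes the requests library and a valid ARM bearer token;
    # names are placeholders).
    import requests

    token = "<arm-bearer-token>"
    resource = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
                "/providers/Microsoft.ServiceBus/namespaces/<namespace>")

    resp = requests.get(
        f"https://management.azure.com{resource}/providers/Microsoft.Insights/metrics",
        params={
            "api-version": "2018-01-01",
            "metricnames": "IncomingMessages,OutgoingMessages,ActiveMessages",
            "aggregation": "total,average",
            "interval": "PT1H",
            "$filter": "EntityName eq '<queue-name>'",  # per-queue breakdown
        },
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    for metric in resp.json()["value"]:
        print(metric["name"]["value"], metric["timeseries"])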
