How to retry sending events from Azure Event Grid to Logic Apps - azure

I have an event grid which publishes a lot of events, and a logic app which needs to consume some of them. These events aren't guaranteed to be in order, and events which require another event to be processed first, might end up in the logic app prematurely, causing them to fail.
From the documentation, I can see that event grid supports a retry policy, with an increasing time interval. This would solve my problem.
However, it seems like the logic app in question, always acknowledges events from the event grid, even though the process is stopped early with the Terminate action in the failure state and with an error code.
From the logic app overview, the runs are shown as failed. But the event grid never attempts a retry, and seems to consider the events successful. What can I do to make the event grid retry failed logic app runs?

It seems that once the Azure logic app is triggered, the event in the Azure event grid is considered to be processed.
I think you can configure retry policy at the step where your Azure logic app failed, please refer to Retry policies.
Take the example of Httpaction:
You can click ··· in the upper right corner of the Http action, then click Settings, and select the type you want under Retry Policy:

Event Grid will retry depending on how you terminate your Logic App. If you terminate using http response action (status code 500) then event grid will attempt retries.
Now, depending on what is going on in your Logic app, handle the failures in a way that it terminates on http response action with status code 500.

Related

Logic App (Consumption) Handling Lost Locks (not Timeout)

We have instances where our service bus message lock has been lost before it can be completed. MS referred me to the documentation:
Important
It is important to note that the lock that PeekLock acquires on the
message is volatile and may be lost in the following conditions
Service Update OS update Changing properties on the entity (Queue,
Topic, Subscription) while holding the lock. When the lock is lost,
Azure Service Bus will generate a MessageLockLostException which will
be surfaced on the client application code. In this case, the client's
default retry logic should automatically kick in and retry the
operation.
We already handle the 5 minute timeout with a parallel loop. Now we need to handle a lost lock due to volatility. What is everyone's best practice here?
A resubmit is not appropriate - in case of duplication
Dead-lettering cannot be done because the lock is lost, a second instance will already have started for the same message
Message could be completed immediately, however we lose the dead-letter ability etc...
Could this be a good solution for you?
Change the logic app to be http triggered
Add another logic app triggered by the message that creates a record in a storage of some sort with the state of processing the message set to 0 for example and then calls the first logic app
Add another logic app that sets the record to complete state 1 for example
When the logic app finishes, it calls the second logic app to update the record
What happens:
message arrives
new logicapp1 picks it and completes the message
new logicapp1 creates a record and calls your main logic app
main logic app do its processing
main logic app calls new logicapp2
new logicapp2 updates the record as completed

ServiceBus message delivery time reliable?

I'm working on creating an events system with Azure ServiceBus, I find events generally hits reliably at the scheduled time I had them set to run - so if event 'pop' is supposed to run at 12:30pm it generally would be delivered at that time to my reciever.
I wanted to know is there a guarantee that events are always fired within the scheduled time or is that more of a suggested time and the system can get clogged and backlogged causing longer queues to form?
There are quite a few differences between messages (which are handled with Service Bus) and events, as you can see in the article Choose between Azure messaging services - Event Grid, Event Hubs, and Service Bus.
An event is a lightweight notification of a condition or a state change. The publisher of the event has no expectation about how the event is handled. The consumer of the event decides what to do with the notification. Events can be discrete units or part of a series.
[...]
A message is raw data produced by a service to be consumed or stored elsewhere. The message contains the data that triggered the message pipeline.
It sounds like you need a reliable way to have a timer trigger execute on a specific time. Service Bus is not the correct service for that, since "the message enquing time does not mean that the message will be sent at that time. It will get enqueued, but the actual sending time depends on the queue's workload and its state." (see BrokeredMessage.ScheduledEnqueueTimeUtc Property).
For handling the triggering in a reliable way, you could use services like Logic Apps (if you want to create it low-code/no-code) or Azure Functions (for the Serverless solution with code).
If you're actually looking for events, consider Event Grid.

Where are "queued" Azure Event Grid Blob trigger event messages stored and how can I clear them?

Pardon if my terminology is a little off; I'm new to this.
I have created an Azure Event Grid subscription which triggers an event whenever I upload a file to blob storage. I have an Azure Function which responds to this event. I've got this all working finally, but I have a slough of left-over messages from previous (bad) uploads that are failing periodically (as viewed from the Logs window in the Azure portal for the associated Azure Function). It's as if they're stored in a queue somewhere and retried periodically, though I'm not sure if that's how it works.
In any case, what I want to be able to do is purge any in-transit or queued events, but I don't know where to find them to do this. As far as I know they're just floating about in the ether.
How can I purge these events so they don't keep triggering my Azure Function at random times?
Event Grid will automatically retry delivery of the message if anything other then a 200 or 202 (OK/Accepted) is returned when a delivery attempt is made. By default it will try again for 24 hours and it uses an exponential backup that adds additional time in between each request until it gives up. What you're seeing is that default process running. (You can also configure a dead letter handling with a storage account so the undelivered messages get stored somewhere if it eventually fails).
What you are likely looking for is the Retry Policy you can create when creating a subscription. Pretty sure you can set the number of maximum delivery attempts to 1 so it won't retry (and without dead letter support turned on the message would essentially be dropped). More details on this can be found at https://learn.microsoft.com/en-us/azure/event-grid/manage-event-delivery#set-retry-policy
I'm not aware of any way to "dequeue" already submitted messages without that retry policy already in place - you may have to delete and recreate the subscription to that event grid topic.
In addition to #JoshCarlisle's answer and more clear to the Event Grid message delivery and retry document:
The dead-lettering enables a special case in the retry policy logic.
In the case of the dead-lettering is turn-on and subscriber failed with a HttpStatusCode.BadRequest, the Event Grid will stop a retrying process and the event is sent to the dead-letter endpoint. This error code indicates, that the delivery will never succeed.
the following code snippet shows some properties in the dead-letter message:
"deadLetterReason": "UndeliverableDueToHttpBadRequest",
"deliveryAttempts": 1,
"lastDeliveryOutcome": "BadRequest",
"lastHttpStatusCode": 400,
the following list shows some of the status codes where the Event Grid will continue in the retrying process:
HttpStatusCode.ServiceUnavailable
HttpStatusCode.InternalServerError
HttpStatusCode.RequestTimeout
HttpStatusCode.NotFound
HttpStatusCode.Conflict
HttpStatusCode.Forbidden
HttpStatusCode.Unauthorized
HttpStatusCode.NotImplemented
HttpStatusCode.Gone
Example of the some dead-letter properties, when the HttpStatusCode.RequestTimeout:
"deadLetterReason":"MaxDeliveryAttemptsExceeded",
"deliveryAttempts":3,
"lastDeliveryOutcome":"TimedOut",
"lastHttpStatusCode":408,
Now, you can see the above two difference cases described in the deadLetterReason property such as "UndeliverableDueToHttpBadRequest" vs "MaxDeliveryAttemptsExceeded"
One more thing:
When the dead-lettering is turn-on, the Event Grid will NOT deliver a dead-letter message to the dead-letter endpoint immediately, but after ~300 seconds. I hope this is a bug and it will be fix soon.
Practically, if the subscriber failed for instance HttpStatusCode.BadRequest, we can not wait for 5 minutes the event from the container storage, it must be an event driven close to the real-time.

Azure EventGrid Webhook timeout

Came to know from the documentation that timeout for webhook is 60 secs. If that's the case then are we expecting developers to do asynchronous operations? I mean what if the work that I want to do as part of the webhook takes more than 60 secs? But if we make that operation asynchronous and the work I want do as part of the webhook fails then how do we recover from that situation because we already responded event grid 200 OK. In that case - would we lose the event?
In the scenario like yours such as the event handler processing over 60 seconds, the following can be implemented, based on the retrying and dead-lettering technique:
use the primary event subscription with a retry policy and
dead-lettering. This subscriber (function) with a binding to the storage table will handle a state of the long running (max 24 hrs) event processing and also forwarding the first event message to to the storage queue for triggering a long running process. The response from this primary subscriber will depend from the state of the StorageQueueTrigger function.
every new retry event message will check the state of the long running process and based on that, the response code (for instance OK(200) or Service.Unavailable(503)) is sent back to the Event Grid.
In the above scenario, the retry mechanism represents a "watchdog timer" for watching a long running event message processing. The second function such as QueueTrigger function is yielding a process between the Event Grid and long running process.
In summary, your scenario will require the following:
EventSubscriber with retry policy and dead-lettering for Webhook (EventGridTrigger or HttpTrigger function)
EventGridTrigger or HttpTrigger function
Storage Table
QueueTrigger Function
If anything unusually happen during the watchdog timer, the dead-lettering is sent to your container storage with a deadLetterReason.
Note, that in the case if your long running process is over 5/10 minutes, the StorageQueue trigger needs to be considered in the App Service plan or using your custom worker processor.
Update:
The following screen snippet shows the above solution for "long running subscriber" with a Watchdog timer:
also it can be used directly a StorageQueue Event Handler to yield the long running process from the EventGrid, but in this case, the function has a more responsibilities such as retrying, notification, dead-lettering, etc., see the following picture:

Check the response of a 'Send Event' Logic App action

I have a Logic App that sends an event to a specified Event Hub using the Send Event action.
It seems that regardless of whether or not the event is accepted by the specified Event Hub, the Logic App continues on regardless. Unlike the Azure Functions action, there appears to be no automatically generated StatusCode property available for Send Event action.
Is it possible to check the response from Event Hubs so that I may determain whether or not to halt execution?
Update
After a completed run, it seems that there is a status code returned by Event Hubs, although unusually it seems to be 200 where as typically when sending events it's 201.
However, when editing the Logig App, there doesn't seem to be any way of accessing that status code in order to check the success/failure of the send event action.
You should be able to use #outputs('Send_event')?['statusCode'] to access the status code.

Resources