Azure EventGrid Webhook timeout

I learned from the documentation that the timeout for a webhook is 60 seconds. If that's the case, are we expecting developers to do asynchronous operations? I mean, what if the work I want to do as part of the webhook takes more than 60 seconds? But if we make that operation asynchronous and the work fails, how do we recover from that situation, given that we already responded to Event Grid with 200 OK? In that case, would we lose the event?

For a scenario like yours, where the event handler takes longer than 60 seconds, the following can be implemented based on the retry and dead-lettering features:
Use the primary event subscription with a retry policy and dead-lettering. This subscriber (a function; see the sketch below) has a binding to a storage table, tracks the state of the long-running (max 24 hrs) event processing, and forwards the first event message to a storage queue to trigger the long-running process. The response from this primary subscriber depends on the state maintained by the StorageQueueTrigger function.
Every retried event message checks the state of the long-running process and, based on that, a response code (for instance OK (200) or ServiceUnavailable (503)) is sent back to Event Grid.
In this scenario, the retry mechanism acts as a "watchdog timer" for the long-running event message processing. The second function, the QueueTrigger function, decouples the long-running process from Event Grid.
In summary, your scenario will require the following:
EventSubscriber with retry policy and dead-lettering for Webhook (EventGridTrigger or HttpTrigger function)
EventGridTrigger or HttpTrigger function
Storage Table
QueueTrigger Function
If anything unusual happens during the watchdog period, the event is dead-lettered to your storage container with a deadLetterReason.
Note that if your long-running process takes more than 5/10 minutes (the Consumption plan limits), the QueueTrigger function needs to run in an App Service plan or in your own custom worker process.
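To make the pattern concrete, the primary subscriber described above could look roughly like this Java HttpTrigger sketch (the queue name longrunning-jobs and the lookUpJobState helper are placeholders of mine, not part of the solution, and the Event Grid subscription-validation handshake is omitted):

import java.util.Optional;
import com.microsoft.azure.functions.*;
import com.microsoft.azure.functions.annotation.*;

public class EventGridWebhookSubscriber {

    @FunctionName("EventGridWebhookSubscriber")
    public HttpResponseMessage run(
            @HttpTrigger(name = "req", methods = {HttpMethod.POST},
                         authLevel = AuthorizationLevel.FUNCTION)
            HttpRequestMessage<Optional<String>> request,
            // Forwards the first delivery to the storage queue that triggers the long-running worker.
            @QueueOutput(name = "job", queueName = "longrunning-jobs",
                         connection = "AzureWebJobsStorage")
            OutputBinding<String> job,
            final ExecutionContext context) {

        String eventPayload = request.getBody().orElse("");
        String state = lookUpJobState(eventPayload);   // read the state row from the storage table

        if (state == null) {
            // First delivery: kick off the long-running work and report "not done yet",
            // so Event Grid's retries act as the watchdog timer.
            job.setValue(eventPayload);
            return request.createResponseBuilder(HttpStatus.SERVICE_UNAVAILABLE).build();
        }
        if ("completed".equals(state)) {
            // The QueueTrigger worker finished since the last retry: acknowledge the event.
            return request.createResponseBuilder(HttpStatus.OK).build();
        }
        // Still running: ask Event Grid to retry again later.
        return request.createResponseBuilder(HttpStatus.SERVICE_UNAVAILABLE).build();
    }

    // Placeholder for the storage-table lookup; returns null when no state row exists yet.
    private String lookUpJobState(String eventPayload) {
        return null;
    }
}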
Update:
The following screen snippet shows the above solution for the "long running subscriber" with a watchdog timer:
Alternatively, a Storage Queue event handler can be used directly to decouple the long-running process from Event Grid, but in that case the function has more responsibilities, such as retrying, notification, dead-lettering, etc.; see the following picture:

Related

Logic App (Consumption) Handling Lost Locks (not Timeout)

We have instances where our service bus message lock has been lost before it can be completed. MS referred me to the documentation:
Important
It is important to note that the lock that PeekLock acquires on the message is volatile and may be lost in the following conditions:
Service update
OS update
Changing properties on the entity (Queue, Topic, Subscription) while holding the lock
When the lock is lost, Azure Service Bus will generate a MessageLockLostException which will be surfaced on the client application code. In this case, the client's default retry logic should automatically kick in and retry the operation.
We already handle the 5 minute timeout with a parallel loop. Now we need to handle a lost lock due to volatility. What is everyone's best practice here?
A resubmit is not appropriate, because of the risk of duplication.
Dead-lettering cannot be done because the lock is lost; a second instance will already have started for the same message.
The message could be completed immediately, but then we lose the dead-letter ability, etc.
Could this be a good solution for you?
Change the logic app to be HTTP triggered.
Add another logic app, triggered by the message, that creates a record in some kind of storage with the processing state set to 0 (for example) and then calls the first logic app.
Add another logic app that sets the record to the completed state, 1 for example.
When the main logic app finishes, it calls the second logic app to update the record.
What happens:
message arrives
new logicapp1 picks it and completes the message
new logicapp1 creates a record and calls your main logic app
main logic app does its processing
main logic app calls new logicapp2
new logicapp2 updates the record as completed
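If the two bookkeeping logic apps were ever replaced with code, the record handling they perform boils down to something like this sketch (the MessageState table name and the connection setup are assumptions of mine; the 0/1 state values are the ones from the steps above):

import com.azure.data.tables.TableClient;
import com.azure.data.tables.TableClientBuilder;
import com.azure.data.tables.models.TableEntity;

public class MessageStateSketch {
    private final TableClient table = new TableClientBuilder()
            .connectionString(System.getenv("STORAGE_CONNECTION"))
            .tableName("MessageState")
            .buildClient();

    // logicapp1 equivalent: record "processing started" (state 0) before calling the main logic app.
    public void markStarted(String messageId) {
        table.upsertEntity(new TableEntity("messages", messageId).addProperty("State", 0));
    }

    // logicapp2 equivalent: called when the main logic app finishes, sets the record to state 1.
    public void markCompleted(String messageId) {
        table.upsertEntity(new TableEntity("messages", messageId).addProperty("State", 1));
    }
}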

ServiceBus message delivery time reliable?

I'm working on creating an events system with Azure Service Bus. I find events generally hit reliably at the scheduled time I set them to run, so if event 'pop' is supposed to run at 12:30pm it is generally delivered at that time to my receiver.
I wanted to know: is there a guarantee that events are always fired at the scheduled time, or is that more of a suggested time, and can the system get clogged and backlogged, causing longer queues to form?
There are quite a few differences between messages (which are handled with Service Bus) and events, as you can see in the article Choose between Azure messaging services - Event Grid, Event Hubs, and Service Bus.
An event is a lightweight notification of a condition or a state change. The publisher of the event has no expectation about how the event is handled. The consumer of the event decides what to do with the notification. Events can be discrete units or part of a series.
[...]
A message is raw data produced by a service to be consumed or stored elsewhere. The message contains the data that triggered the message pipeline.
It sounds like you need a reliable way to have a timer trigger execute on a specific time. Service Bus is not the correct service for that, since "the message enqueuing time does not mean that the message will be sent at that time. It will get enqueued, but the actual sending time depends on the queue's workload and its state." (see BrokeredMessage.ScheduledEnqueueTimeUtc Property).
For handling the triggering in a reliable way, you could use services like Logic Apps (if you want to create it low-code/no-code) or Azure Functions (for the Serverless solution with code).
If you're actually looking for events, consider Event Grid.
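For reference, scheduling a message with ScheduledEnqueueTimeUtc using the same Service Bus Java client as elsewhere on this page looks roughly like the sketch below (the connection string variable and queue name are placeholders); the caveat quoted above still applies, since this sets an enqueue time, not a guaranteed delivery time:

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import com.microsoft.azure.servicebus.Message;
import com.microsoft.azure.servicebus.QueueClient;
import com.microsoft.azure.servicebus.ReceiveMode;
import com.microsoft.azure.servicebus.primitives.ConnectionStringBuilder;

public class ScheduledSendSketch {
    public static void main(String[] args) throws Exception {
        QueueClient sendClient = new QueueClient(
                new ConnectionStringBuilder(System.getenv("SB_CONNECTION"), "events"),
                ReceiveMode.PEEKLOCK);

        Message pop = new Message("pop");
        // The broker makes the message visible at this time; actual delivery to the
        // receiver still depends on the queue's workload and state.
        pop.setScheduledEnqueueTimeUtc(Instant.now().plus(30, ChronoUnit.MINUTES));
        sendClient.send(pop);
        sendClient.close();
    }
}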

What happens to the messages being processed on functions running when we disable the function?

We are working with Azure Functions, which are triggered on every message in the Service Bus queue. We are trying to solve a problem whereby we need to dynamically disable a function on the function app that processes messages, so that it does not process any further messages, without losing any messages in the process.
We can disable the functions in multiple ways (referring to the link), but the problem remains the same: we are unable to figure out what happens to the executions already spawned when we disable the function.
Since the function is Service Bus triggered, there is always a possibility that the function is processing a message when we disable it. Does the message get processed? Is some sort of cancellation raised? Does it just die with an exception?
It would be great if someone could direct me to some documentation or something. Thanks.
An Azure Service Bus triggered function will already have a lock on the message that's being processed. If the function is terminated and the message was not completed or otherwise settled, the lock will expire and the message will reappear on the queue. That's because the Functions runtime receives messages in PeekLock mode.
One factor to consider is the queue's MaxDeliveryCount. If the function is terminated on the last processing attempt, the message will be dead-lettered, as all processing attempts have been exhausted. That's standard Azure Service Bus behaviour.
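A rough sketch of that PeekLock behaviour with the plain Service Bus Java client (the queue name and process() are placeholders of mine): the message is only removed from the queue when complete() is called; if the host disappears first, the lock simply expires and the message comes back.

import com.microsoft.azure.servicebus.ClientFactory;
import com.microsoft.azure.servicebus.IMessage;
import com.microsoft.azure.servicebus.IMessageReceiver;
import com.microsoft.azure.servicebus.ReceiveMode;
import com.microsoft.azure.servicebus.primitives.ConnectionStringBuilder;

public class PeekLockSketch {
    public static void main(String[] args) throws Exception {
        IMessageReceiver receiver = ClientFactory.createMessageReceiverFromConnectionStringBuilder(
                new ConnectionStringBuilder(System.getenv("SB_CONNECTION"), "myqueue"),
                ReceiveMode.PEEKLOCK);

        IMessage message = receiver.receive();   // locked, not removed (null if the queue is empty)
        if (message != null) {
            try {
                process(message);                              // work done while the lock is held
                receiver.complete(message.getLockToken());     // only now is the message removed
            } catch (Exception e) {
                // If the host is torn down before complete(), the lock expires, DeliveryCount is
                // incremented and the message reappears; past MaxDeliveryCount it is dead-lettered.
                receiver.abandon(message.getLockToken());
            }
        }
        receiver.close();
    }

    private static void process(IMessage message) { /* ... */ }
}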

Azure Service Bus - cancelled scheduled messages getting re-queued

I'm using the latest Java bindings (v3.1.3) for Azure Service Bus: https://github.com/Azure/azure-sdk-for-java/tree/master/sdk/servicebus
When I create a new queue client, schedule a message, and cancel it...
QueueClient sendClient = new QueueClient(new ConnectionStringBuilder(connectionString, queueName), ReceiveMode.PEEKLOCK);
long sequenceNumber = sendClient.scheduleMessage(message, instant);
...
sendClient.cancelScheduledMessage(sequenceNumber);
...the code appears to work as intended: The active message count goes to 0. But as soon as the scheduled message gets to the time it was supposed to be scheduled (I tested with 10 seconds and 100 seconds in the future), the message sometimes gets re-queued with a new sequence number. I'm not getting any errors when scheduling or cancelling the messages. Is there something I can do to make sure cancelled messages don't get re-queued?
From my own testing, I found that cancelling a Service Bus message within a short time frame after the scheduled message was sent to the queue does not always process the cancellation as expected. In general we're talking only a few seconds, but the behaviour is not entirely consistent.
My conclusion is that there is some latency between the scheduled message being queued and the cancellation of that same message being registered, which means that cancelling a scheduled message almost straight away after sending it to the queue will not always stop it from being processed.
Therefore, in my environment I had to provide my own fallback: I check additional custom properties on the Service Bus message, so when it arrives back at my subscriber app I use an if statement on the status of the custom property to choose whether to ignore it and not process anything further.
This really caught me out for a while, as my environment was rather complex and I assumed there was some issue in my code somewhere along the line. Once I factored in the above anomaly and watched how the service bus responded to the scheduled message cancellation, I was able to overcome the issue.
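As a sketch of that fallback, assuming a PEEKLOCK receiver like the client in the question (the operationId property name and the isCancelled() lookup are placeholders of mine for whatever you stamp on the message and wherever you record the cancellation):

IMessage message = receiver.receive();
Object operationId = message.getProperties().get("operationId");   // custom property set at scheduling time
if (operationId != null && isCancelled(operationId.toString())) {
    // The scheduled message slipped through the cancellation window: swallow it without processing.
    receiver.complete(message.getLockToken());
} else {
    process(message);                                               // normal handling
    receiver.complete(message.getLockToken());
}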
You can schedule messages either by setting the ScheduledEnqueueTimeUtc property when sending a message through the regular send path, or explicitly with the ScheduleMessageAsync API. The latter immediately returns the scheduled message's SequenceNumber, which you can later use to cancel the scheduled message if needed.
Cancels the enqueuing of an already sent scheduled message, if it was not already enqueued. This is an asynchronous method returning a CompletableFuture which completes when the message is cancelled.
So, I suggest that you use cancelScheduledMessageAsync to cancel the scheduled message.
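Continuing the snippet from the question, the async variant looks like this (blocking on the returned CompletableFutures here only to surface any scheduling or cancellation error):

long sequenceNumber = sendClient.scheduleMessageAsync(message, instant).get();
...
// Completes once the broker has registered the cancellation; .get() surfaces any failure.
sendClient.cancelScheduledMessageAsync(sequenceNumber).get();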

Where are "queued" Azure Event Grid Blob trigger event messages stored and how can I clear them?

Pardon if my terminology is a little off; I'm new to this.
I have created an Azure Event Grid subscription which triggers an event whenever I upload a file to blob storage. I have an Azure Function which responds to this event. I've finally got this all working, but I have a slew of left-over messages from previous (bad) uploads that keep failing periodically (as viewed from the Logs window in the Azure portal for the associated Azure Function). It's as if they're stored in a queue somewhere and retried periodically, though I'm not sure if that's how it works.
In any case, what I want to be able to do is purge any in-transit or queued events, but I don't know where to find them to do this. As far as I know they're just floating about in the ether.
How can I purge these events so they don't keep triggering my Azure Function at random times?
Event Grid will automatically retry delivery of the message if anything other than a 200 or 202 (OK/Accepted) is returned when a delivery attempt is made. By default it will try again for 24 hours, and it uses an exponential backoff that adds additional time between each request until it gives up. What you're seeing is that default process running. (You can also configure dead-letter handling with a storage account so the undelivered messages get stored somewhere if delivery eventually fails.)
What you are likely looking for is the retry policy you can set when creating a subscription. Pretty sure you can set the maximum number of delivery attempts to 1 so it won't retry (and without dead-letter support turned on, the message would essentially be dropped). More details on this can be found at https://learn.microsoft.com/en-us/azure/event-grid/manage-event-delivery#set-retry-policy
I'm not aware of any way to "dequeue" already-submitted messages without that retry policy already in place; you may have to delete and recreate the subscription to that Event Grid topic.
In addition to @JoshCarlisle's answer, and to make the Event Grid message delivery and retry behaviour clearer:
Dead-lettering enables a special case in the retry policy logic.
When dead-lettering is turned on and the subscriber fails with HttpStatusCode.BadRequest, Event Grid stops the retry process and the event is sent to the dead-letter endpoint. This error code indicates that the delivery will never succeed.
The following code snippet shows some properties in the dead-letter message:
"deadLetterReason": "UndeliverableDueToHttpBadRequest",
"deliveryAttempts": 1,
"lastDeliveryOutcome": "BadRequest",
"lastHttpStatusCode": 400,
The following list shows some of the status codes for which Event Grid will continue the retry process:
HttpStatusCode.ServiceUnavailable
HttpStatusCode.InternalServerError
HttpStatusCode.RequestTimeout
HttpStatusCode.NotFound
HttpStatusCode.Conflict
HttpStatusCode.Forbidden
HttpStatusCode.Unauthorized
HttpStatusCode.NotImplemented
HttpStatusCode.Gone
Example of some dead-letter properties when the subscriber fails with HttpStatusCode.RequestTimeout:
"deadLetterReason":"MaxDeliveryAttemptsExceeded",
"deliveryAttempts":3,
"lastDeliveryOutcome":"TimedOut",
"lastHttpStatusCode":408,
Now you can see the two different cases reflected in the deadLetterReason property: "UndeliverableDueToHttpBadRequest" vs. "MaxDeliveryAttemptsExceeded".
One more thing:
When dead-lettering is turned on, Event Grid does NOT deliver a dead-letter message to the dead-letter endpoint immediately, but only after ~300 seconds. I hope this is a bug and will be fixed soon.
Practically, if the subscriber fails with, for instance, HttpStatusCode.BadRequest, we cannot wait 5 minutes for the event to show up in the container storage; it needs to be event-driven and close to real time.
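Putting the two cases together, a webhook subscriber can steer the behaviour above through the status code it returns: 400 dead-letters the event immediately, 503 keeps the retry process going. A Java HttpTrigger sketch (isWellFormed and downstreamAvailable are hypothetical helpers of mine):

import java.util.Optional;
import com.microsoft.azure.functions.*;
import com.microsoft.azure.functions.annotation.*;

public class EventGridSubscriber {

    @FunctionName("EventGridSubscriber")
    public HttpResponseMessage run(
            @HttpTrigger(name = "req", methods = {HttpMethod.POST},
                         authLevel = AuthorizationLevel.FUNCTION)
            HttpRequestMessage<Optional<String>> request,
            final ExecutionContext context) {

        String payload = request.getBody().orElse("");

        if (!isWellFormed(payload)) {
            // Permanent failure: Event Grid stops retrying and dead-letters the event
            // with deadLetterReason "UndeliverableDueToHttpBadRequest".
            return request.createResponseBuilder(HttpStatus.BAD_REQUEST).build();
        }
        if (!downstreamAvailable()) {
            // Transient failure: Event Grid keeps retrying with backoff and eventually
            // dead-letters the event with deadLetterReason "MaxDeliveryAttemptsExceeded".
            return request.createResponseBuilder(HttpStatus.SERVICE_UNAVAILABLE).build();
        }
        return request.createResponseBuilder(HttpStatus.OK).build();
    }

    // Hypothetical helpers standing in for real validation and dependency checks.
    private boolean isWellFormed(String payload) { return payload != null && !payload.isEmpty(); }
    private boolean downstreamAvailable() { return true; }
}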
