Retriggering an Azure Cosmos DB Trigger - azure

I've run into a problem with my Azure Cosmos DB trigger. Apparently some of the trigger executions failed and thus didn't finish sending the data to a specific service. As far as I can see, there is no easy way to 'retrigger' those events without actually inserting the data into Cosmos again.
I read somewhere that I could insert the incoming data from the trigger into a ServiceBus queue message and handle it from there. Then I can use the dead-letter queue to potentially requeue failed items. However, the messages contain a couple of kB of data, and I'm not sure if that is wise.
What would be the best way to tackle this issue?
Thanks!

You can only retrigger by:
Modifying the document (replace)
Manually calling the trigger through the API and passing the document's content
Putting the message into a separate queue, as you mentioned
Using retries on the CosmosDB trigger for short-lived transient issues
We have been doing the ServiceBus solution for quite a bit now without any issues. The maximum message size is 256KB for standard tier, which is plenty.
If the size is really an issue for you, you could put only the documentId into ServiceBus. However, this creates a solution that is more read-intensive for your CosmosDB, because the consumer has to read each document back. If you want to avoid that, the solution gets even more complex.
This is already quite opinionated, but the ServiceBus solution is in my experience very robust and not very complex. You can use the manual approach if you only need this very rarely to "fake" re-triggering of the event.
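As a rough illustration of the size trade-off described above, here is a minimal Python sketch that decides whether to forward the full document or only its id. The function name `build_queue_payload`, the `kind` field, and the headroom figure are assumptions made for illustration; only the 256 KB standard-tier limit comes from the answer. No Azure SDK is used; the returned string is what you would pass as the message body.

```python
import json

# Standard-tier Service Bus message limit mentioned above (256 KB).
MAX_MESSAGE_BYTES = 256 * 1024
# Hypothetical headroom for headers and properties; tune for your setup.
HEADROOM_BYTES = 8 * 1024


def build_queue_payload(document: dict) -> str:
    """Return a JSON body: the full document if it fits, else only its id."""
    full = json.dumps({"kind": "full", "document": document})
    if len(full.encode("utf-8")) <= MAX_MESSAGE_BYTES - HEADROOM_BYTES:
        return full
    # Too large: the consumer must read the document back from Cosmos DB.
    return json.dumps({"kind": "reference", "documentId": document["id"]})
```

With documents of a couple of kB, this sketch would always take the first branch, so the extra read against Cosmos DB only occurs for unusually large documents.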

Related

Using Azure Service Bus to handle lots of votes and process results with Azure Functions

I am creating a poll app. Users can define one or more poll questions and configure answer options. Guests can join a session and when a poll (question) is activated, start voting. Basically what a standard poll looks like.
For processing the incoming votes, I use the Azure Service Bus. I have an endpoint that accepts votes and sends a message to a Service Bus Queue. Then, an Azure Function with a Service Bus Queue trigger will consume that message and persist the vote somewhere in a repository.
My problem is that I want another 'background process', I imagine another Azure Function, that will be triggered when votes come in, to go and calculate the cumulative of votes to be able to draw a pie chart.
Now I want this Function to be triggered as efficiently as possible. Key is that it must be accurate. What I'm looking for, is a method that will trigger the calculation once when a vote comes in, but when a bunch of votes comes in, I want to trigger the calculation only once after the last vote was persisted. I was thinking of introducing a new queue to send 'calculation commands' to. I use a real-time framework to update the pie chart. I would like to send pie-chart updates frequently, but not necessarily thousands of times a second when huge amounts of votes came in in a short amount of time.
I looked for a solution where I can use the de-duplication of an SB queue, but I think this de-dup also checks for previously sent messages. And using this solution does not guarantee that the calculation takes place after the last vote has been processed, because the message may be recognized as a duplicate and therefore ignored.
Another solution may be to introduce a SessionId for the votes queue allowing me to overcome the problem that vote messages are handled simultaneously, but this feels like an anti-pattern using the Service Bus. In the end, you want the thing to scale like a maniac when large amounts of votes come in, so for that reason, the session is a no go to me.
And now I'm running out of ideas, is there a mechanism that I overlooked that I can take advantage of to (for example) only put a message on a queue when there is no similar message waiting to be processed (e.g. without a lock) or something?
You can trigger the Function using one of the available Event Grid events for Service Bus, if the concern is that you don't want a listener to run at all times.
The Azure Functions approach suggested by Clemens is a viable approach. You probably don't need Event Grid because your function could be triggered by the Service Bus queue.
I want to trigger the calculation only once after the last vote was persisted.
If there is a way to indicate that the voting period is over, you could have a second function that runs the calculations from the data stored while processing the voting messages. One thing to watch out for is how the first function, which accepts the voting messages, stores the data. If the data is stored in append-only mode, you're good. If you're trying to keep only a counter, you'll have contention, and I don't recommend that approach. Append-only is the more efficient approach.
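The append-only idea can be sketched roughly as follows. This is an in-memory illustration only: the `Vote` record, the `store` list, and the `tally` function are hypothetical names, and a real implementation would append rows to a durable store instead of a Python list.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Vote(NamedTuple):
    poll_id: str
    voter_id: str
    option: str


def tally(votes: Iterable[Vote], poll_id: str) -> Counter:
    """Count votes per option for one poll from an append-only record list."""
    return Counter(v.option for v in votes if v.poll_id == poll_id)


# Each incoming message only appends a record; nothing is updated in place,
# so concurrent writers never compete for a shared counter.
store = []
store.append(Vote("poll-1", "alice", "A"))
store.append(Vote("poll-1", "bob", "B"))
store.append(Vote("poll-1", "carol", "A"))
store.append(Vote("poll-2", "dave", "A"))

print(tally(store, "poll-1"))
```

Because the totals are derived from the records on demand, the calculation function can run as often or as rarely as the chart needs without affecting accuracy.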

Azure Durable Functions as Message Queue

I have a serverless function that receives orders, about 30 per day. This function depends on a third-party API to perform some additional lookups and checks. However, this external endpoint isn't 100% reliable, and I need to be able to store order requests if the other API isn't available for a couple of hours (or more).
My initial thought was to split the function into two, the first part would receive orders, do some initial checks such as validating the order, then post the request into a message queue or pub/sub system. On the other side, there's a consumer that reads orders and tries to perform the API requests, if the API isn't available the orders get posted back into the queue.
However, someone suggested that I simply use an Azure Durable Function for the requests and store the current backlog in the function state, using the Aggregator Pattern (especially since the API will be working fine 99.99..% of the time). This would make the architecture a lot simpler.
What are the advantages/disadvantages of using one over the other, am I missing any important considerations?
I would appreciate any insight or other suggestions you have. Let me know if additional information is needed.
You could solve this problem with Durable Task Framework or Azure Storage or Service Bus Queues, but at your transaction volume, I think that's overcomplicating the solution.
If you're dealing with ~30 orders per day, consider one of the simpler solutions:
Use Polly, a well-supported resilience and fault-tolerance framework.
Write request information to your database. Have an Azure Function Timer Trigger read occasionally and finish processing orders that aren't marked as complete.
Durable Task Framework is great when you get into serious volume. But there's a non-trivial learning curve for the framework.
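The database-plus-timer option above could look roughly like this. It is a minimal sketch that uses an in-memory SQLite database as a stand-in for your real database; the table layout and the `call_api` callback (representing the third-party lookup) are assumptions for illustration, and `process_pending` is the part a timer-triggered function would run.

```python
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "complete INTEGER NOT NULL DEFAULT 0)"
    )


def save_order(conn: sqlite3.Connection, payload: str) -> int:
    """Persist the incoming order before any third-party call is attempted."""
    cur = conn.execute("INSERT INTO orders (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def process_pending(conn: sqlite3.Connection, call_api: Callable[[str], None]) -> int:
    """Body of the periodic job: retry every order not yet marked complete."""
    rows = conn.execute(
        "SELECT id, payload FROM orders WHERE complete = 0 ORDER BY id"
    ).fetchall()
    done = 0
    for order_id, payload in rows:
        try:
            call_api(payload)  # hypothetical third-party lookup
        except Exception:
            continue  # leave the row pending; the next timer run retries it
        conn.execute("UPDATE orders SET complete = 1 WHERE id = ?", (order_id,))
        conn.commit()
        done += 1
    return done
```

If the API is down for hours, the orders simply stay unmarked and are picked up by a later run, which is all the durability this volume needs.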

Azure Function Event Hub Trigger reliability

I'm a bit confused regarding the EventHubTrigger for Azure functions.
I've got an IoT Hub, and am using its eventhub-compatible endpoint to trigger an Azure function that is going to process and store the received data.
However, if my function fails (= throws an exception), that message (or messages) being processed during that function call will get lost. I actually would expect the Azure function runtime to process the messages at a later time again. Specifically, I would expect this behavior because the EventHubTrigger is keeping checkpoints in the Function Apps storage account in order to keep track of where in the event stream it has to continue.
The documentation of the EventHubTrigger even states that
If all function executions succeed without errors, checkpoints are added to the associated storage account
But still, even when I deliberately throw exceptions in my function, the checkpoints will get updated and the messages will not get received again.
Is my understanding of the EventHubTriggers documentation wrong, or is the EventHubTriggers implementation (or its documentation) wrong?
This piece of documentation seems confusing indeed. I guess they mean the errors of Function App host itself, not of your code. An exception inside function execution doesn't stop the processing and checkpointing progress.
The fact is that Event Hubs are not designed for individual message retries. The processor works in batches, and it can either mark the whole batch as processed (i.e. create a checkpoint after it), or retry the whole batch (e.g. if the process crashed).
See this forum question and answer.
If you still need to re-process failed events from Event Hub (and errors don't happen too often), you could implement such mechanism yourself. E.g.
Add an output Queue binding to your Azure Function.
Add try-catch around processing code.
If exception is thrown, add the problematic event to the Queue.
Have another Function with Queue trigger to process those events.
Note that the downside of this is that you will lose the ordering guarantee provided by Event Hubs (since the Queue message will be processed later than its neighbors).
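The steps above can be sketched in a few lines. This is a plain-Python illustration, not Azure Functions binding code: `process_batch` and `handle` are hypothetical names, and the `retry_queue` list stands in for the output Queue binding that a second, queue-triggered function would consume.

```python
from typing import Callable, Iterable, List


def process_batch(
    events: Iterable[str],
    handle: Callable[[str], None],
    retry_queue: List[str],
) -> int:
    """Process each event; park failures on retry_queue instead of raising."""
    succeeded = 0
    for event in events:
        try:
            handle(event)
            succeeded += 1
        except Exception:
            # Stand-in for the output Queue binding: the checkpoint still
            # advances, but the event is kept for a second function.
            retry_queue.append(event)
    return succeeded
```

Catching per event rather than around the whole batch means one bad event does not prevent its neighbors from being processed.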
Quick fix: a retry policy will not help if the downstream system is down for a few hours. Instead, you can call Process.GetCurrentProcess().Kill(); in the exception handler, which stops the checkpoint from moving forward. I have tested this with a Consumption-plan function app. You will not see anything in the logs, so I added an email notification to report that something went wrong; killing the function instance is what avoids the data loss.
Hope this helps.
I plan to write a blog post about this and about the other part of the workflow, where I use a Logic App to stop the function in case of continuous failures on the downstream system.

Azure Functions notification on failure

I have timer-triggered Azure functions running in production, but now I want to be notified if the function fails.
In my case, access to various connected services can cause crashes, and there are many to troubleshoot. The crash is the type of error I need notification for.
When the function does fail, the log entry indicates failure, so I wonder if there is a hook in the system that would allow me to cause the system to generate a notification.
I know that blob and queue bindings, for instance, support the creation of poison queue entries, but timer trigger binding doesn't say anything about any trigger outputs of that nature.
I see that functions can pass their $return status as input to other functions, but that operation is not explained in depth in the docs. Also, in that case, I need to write another function to process the error status, and I was looking for something built-in.
I have inquired with #AzureSupport on this, but their answer had nothing to do with Azure Functions, instead referring me to DLL notification hooks, then recommending I file on uservoice.
I'm sure there must be people here who have implemented some sort of error status notification. I prefer a solution that doesn't require code.
The recommended way to monitor and alert on failures is to use Application Insights (AppInsights), which now integrates fully with Azure Functions:
https://blogs.msdn.microsoft.com/appserviceteam/2017/04/06/azure-functions-application-insights/
Since all the logs are available in AppInsights, it's easy to monitor for failures and set up alerts based on your own criteria.
However, if you only care about alerting and not things like monitoring etc, you could use Azure Monitor instead: https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-get-started
When the function does fail, the log entry indicates failure, so I wonder if there is a hook in the system that would allow me to cause the system to generate a notification.
...
I prefer a solution that doesn't require code.
This is a zero-code solution:
I poked #AzureFunctions once before on this topic, and a suggested response was to use Application Insights. It can handle the alerts upon failure and also can use webhooks.
See the Azure Functions App-Insights documentation on how to link your function app to App Insights. Then set up any alerts you want.
Unfortunately this hook doesn't exist.
Can you switch from a timer trigger to a queue trigger?
You can get retries (if you want them), and after the specified number of attempts the message is sent to a poison queue.
To schedule executions you can add queue messages with a visibility timeout to match your schedule.
In order to get alerts on failure you have two options:
A timer trigger that scans the execution logs (via SFTP) for failures.
Wrap the whole function in a try/catch block and in the catch block write a few lines to send you an email with the error details.
Hope this helps.
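For the try/catch option, one possible shape is a small wrapper that reports the error and then re-raises it. This is only a sketch: `notify_on_failure` is a hypothetical helper, and the `send` callback is an assumption standing in for whatever actually delivers the email (subject and body are passed as two strings).

```python
import functools
import traceback
from typing import Callable


def notify_on_failure(send: Callable[[str, str], None]):
    """Decorator: report any exception through `send`, then re-raise it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                send(f"{func.__name__} failed", traceback.format_exc())
                raise  # keep the failure visible in the function logs
        return wrapper
    return decorator
```

Re-raising matters here: swallowing the exception would make the run look successful in the logs and hide the failure from any other monitoring.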
No code:
Go to your Azure cloud account
From the menu, select Monitor
Then select Add New Rule
Then select your condition and action, and add the alert details.

Requeue or delete messages in Azure Storage Queues via WebJobs

I was hoping if someone can clarify a few things regarding Azure Storage Queues and their interaction with WebJobs:
To perform recurring background tasks (i.e. add to queue once, then repeat at set intervals), is there a way to update the same message delivered in the QueueTrigger function so that its lease (visibility) can be extended as a way to requeue and avoid expiry?
With the above-mentioned pattern for recurring background jobs, I'm also trying to figure out a way to delete/expire a job 'on demand'. Since this doesn't seem possible outside the context of WebJobs, I was thinking of maybe storing the messageId and popReceipt for the message(s) to be deleted in Table storage as persistent cache, and then upon delivery of message in the QueueTrigger function do a Table lookup to perform a DeleteMessage, so that the message is not repeated any more.
Any suggestions or tips are appreciated. Cheers :)
Azure Storage Queues are used to store messages that may be consumed by your Azure Webjob, WorkerRole, etc. The Azure Webjobs SDK provides an easy way to interact with Azure Storage (that includes Queues, Table Storage, Blobs, and Service Bus). That being said, you can also have an Azure Webjob that does not use the Webjobs SDK and does not interact with Azure Storage. In fact, I do run a Webjob that interacts with a SQL Azure database.
I'll briefly explain how the Webjobs SDK interacts with Azure Queues. Once a message arrives in a queue (or is made 'visible', more on this later), the function in the Webjob is triggered (assuming you're running in continuous mode). If that function returns with no error, the message is deleted. If something goes wrong, the message goes back to the queue to be processed again. You can handle the failed message accordingly. Here is an example on how to do this.
The SDK will call a function up to 5 times to process a queue message. If the fifth try fails, the message is moved to a poison queue. The maximum number of retries is configurable.
Regarding visibility, when you add a message to the queue, there is a visibility timeout property. By default it is zero. Therefore, if you want a message to be processed in the future (up to 7 days ahead), you can set this property to the desired value.
Optional. If specified, the request must be made using an x-ms-version of 2011-08-18 or newer. If not specified, the default value is 0. Specifies the new visibility timeout value, in seconds, relative to server time. The new value must be larger than or equal to 0, and cannot be larger than 7 days. The visibility timeout of a message cannot be set to a value later than the expiry time. visibilitytimeout should be set to a value smaller than the time-to-live value.
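As a small worked example of the rule quoted above, the following sketch converts a desired run time into a visibility timeout in seconds and enforces the 7-day ceiling. The helper name `visibility_timeout_seconds` is hypothetical, and the sketch does not check the separate requirement that the timeout stay below the message's time-to-live.

```python
from datetime import datetime, timedelta, timezone

MAX_VISIBILITY = timedelta(days=7)  # upper bound quoted above


def visibility_timeout_seconds(run_at: datetime, now: datetime) -> int:
    """Seconds to keep a new message invisible so it surfaces at run_at."""
    delay = run_at - now
    if delay < timedelta(0):
        return 0  # already due: make the message visible immediately
    if delay > MAX_VISIBILITY:
        raise ValueError("visibility timeout cannot exceed 7 days")
    return int(delay.total_seconds())


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(visibility_timeout_seconds(now + timedelta(hours=1), now))  # 3600
```

The resulting integer is the value you would supply as the visibility timeout when adding the message to the queue.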
Now the suggestions for your app.
I would just add a message to the queue for every task that you want to accomplish. The message will obviously have the pertinent information for processing. If you need to schedule several tasks, you can run a Scheduled Webjob (on a schedule of your choice) that adds messages to the queue. Then your continuous Webjob will pick up that message and process it.
Add a GUID to each message that goes to the queue. Store that GUID in some other domain of your application (a database). So when you dequeue the message for processing, the first thing you do is check against your database if the message needs to be processed. If you need to cancel the execution of a message, instead of deleting it from the queue, just update the GUID in your database.
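The GUID check could be sketched like this. Everything here is illustrative: the `active_tasks` dictionary stands in for the database table, the plain list stands in for the queue, and the function names are hypothetical. The sketch marks a task inactive to cancel it, which is one way to read "update the GUID in your database".

```python
import uuid
from typing import Callable, Dict

# Stand-in for a database table: task GUID -> whether it should still run.
active_tasks: Dict[str, bool] = {}


def enqueue_task(queue: list, payload: str) -> str:
    task_id = str(uuid.uuid4())
    active_tasks[task_id] = True
    queue.append({"taskId": task_id, "payload": payload})
    return task_id


def cancel_task(task_id: str) -> None:
    """Cancel without touching the queue: only the lookup record changes."""
    active_tasks[task_id] = False


def handle_message(message: dict, work: Callable[[str], None]) -> bool:
    """Queue-trigger body: skip messages whose GUID is no longer active."""
    if not active_tasks.get(message["taskId"], False):
        return False  # returning normally lets the message be deleted
    work(message["payload"])
    return True
```

Since a function that returns without error causes the message to be deleted, a cancelled message disappears the next time it is delivered, with no need to track its messageId or popReceipt.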
There's more info here.
Hope this helps,
As for the first part of the question, you can use the Update Message operation to extend the visibility timeout of a message.
The Update Message operation can be used to continually extend the invisibility of a queue message. This functionality can be useful if you want a worker role to “lease” a queue message. For example, if a worker role calls Get Messages and recognizes that it needs more time to process a message, it can continually extend the message’s invisibility until it is processed. If the worker role were to fail during processing, eventually the message would become visible again and another worker role could process it.
You can check the REST API documentation here: https://msdn.microsoft.com/en-us/library/azure/hh452234.aspx
For the second part of your question, there are multiple ways, and your method of storing the id/popReceipt as a lookup is a possible option. Alternatively, you can have a Web Job dedicated to receiving messages on a different queue (e.g. plz-delete-msg): you send it a message containing the "messageId", and this Web Job uses the Get Messages operation to find the message and then deletes it. (You can make the job generic by passing the queue name as well!)
https://msdn.microsoft.com/en-us/library/azure/dd179474.aspx
https://msdn.microsoft.com/en-us/library/azure/dd179347.aspx

Resources