I might be confused about how Event Hubs is supposed to be used, or I need guidance on how to reliably process events posted into an Event Hub. I export the Azure Activity Log to an Event Hub and currently just use a console application to read those messages. What I don't understand is what I'm supposed to do with events I have already read and processed. Say I want to write the content of all messages into a Storage account AppendLog. For this I would need to delete messages I have already processed (as I would with a message queue); how do I do that with Event Hubs?
You cannot delete them. From the docs:
Event Hubs retains data for a configured retention time that applies across all partitions in the event hub. Events expire on a time basis; you cannot explicitly delete them.
Back to your question:
Say I want to write the content of all messages into a Storage account AppendLog. For this I would need to delete messages I have already processed
I am not sure why you need this, though. You can keep a pointer to the last message you read, so you are able to process only new messages. Why would you need to delete the older ones? You can read about offsets and checkpointing here.
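Below is a minimal sketch of that checkpointing pattern with the azure-eventhub Python SDK (v5) and a blob-based checkpoint store; the connection strings, container, and hub names are placeholders:

```python
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

# Checkpoints (the "pointer to the last read message") live in a blob container.
checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<storage-connection-string>", "<checkpoint-container>"
)

client = EventHubConsumerClient.from_connection_string(
    "<eventhub-connection-string>",
    consumer_group="$Default",
    eventhub_name="<eventhub-name>",
    checkpoint_store=checkpoint_store,
)

def on_event(partition_context, event):
    # Process the event, e.g. append its body to your storage log.
    print(partition_context.partition_id, event.body_as_str())
    # Record the offset so a restarted consumer resumes after this event
    # instead of re-reading the whole retention window.
    partition_context.update_checkpoint(event)

with client:
    # starting_position="-1" reads from the beginning on the first run;
    # after that, the stored checkpoints take precedence.
    client.receive(on_event=on_event, starting_position="-1")
```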
What technique are you using for reading the messages?
If you need a pattern where messages are popped, you need a Queue or Topic from Azure Service Bus. When you ack (complete) a message, it is removed from the queue.
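For illustration, a minimal sketch of that pop pattern with the azure-servicebus Python SDK (v7), assuming a queue named myqueue:

```python
from azure.servicebus import ServiceBusClient

def process(body: str) -> None:
    print("processing:", body)  # placeholder for real work

with ServiceBusClient.from_connection_string("<servicebus-connection-string>") as client:
    # Default receive mode is PeekLock: the message is locked, not yet removed.
    with client.get_queue_receiver(queue_name="myqueue") as receiver:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            process(str(msg))
            receiver.complete_message(msg)  # the "ack": message is removed
```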
It is possible to set a TTL for messages in Azure Service Bus. I am wondering if there is a way to do the same for the Dead Letter Queue?
What I want to achieve is "auto-cleaning" of the DLQ of the old messages that are probably not relevant anymore anyway, so that we don't need to do this manually (which is not supported out of the box either).
You can receive and delete messages from the dead-letter queue, but you cannot set a TTL on dead-lettered messages, as those are created and moved into the sub-queue by the service. While the dead-letter queue mimics a regular queue in many respects, it is not one.
One semi-automated option would be a process that receives messages and completes them based on criteria you define, such as message age; a sketch follows below. Unfortunately, there is no way to receive messages selectively, so not much can be done for dead-lettered messages other than receiving them all and filtering out those that need to be actioned.
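A hedged sketch of that clean-up with the azure-servicebus Python SDK (v7); the queue name and the 14-day cut-off are assumptions:

```python
from datetime import datetime, timedelta, timezone
from azure.servicebus import ServiceBusClient, ServiceBusSubQueue

CUTOFF = timedelta(days=14)  # messages older than this are retired

with ServiceBusClient.from_connection_string("<connection-string>") as client:
    receiver = client.get_queue_receiver(
        queue_name="myqueue", sub_queue=ServiceBusSubQueue.DEAD_LETTER
    )
    with receiver:
        for msg in receiver.receive_messages(max_message_count=100, max_wait_time=5):
            age = datetime.now(timezone.utc) - msg.enqueued_time_utc
            if age > CUTOFF:
                receiver.complete_message(msg)  # retire the old message
            else:
                receiver.abandon_message(msg)   # release the lock; keep it for later
```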
Another alternative is to transition the dead-lettered messages into a database and then have a process retire them based on the defined criteria, without needing to constantly receive all of the messages.
All,
I have a storage-queue-triggered Azure Function. It loads various data into a database from files. I specify the input file in the message sent to the input queue.
However, when I send a message to the queue, my function starts multiple instances that all try to insert the same file into the DB. If I log msg.dequeue_count, I see it rising.
What should I do to start only one function per message? Please note I'd like to keep the ability to start multiple instances for multiple messages, so that different files can be loaded in parallel.
This question was also asked here and the answer was to check out the chart comparing storage and service bus queues.
Bottom line is that storage queues offer 'at least once' delivery. If you want 'at most once', you should use Service Bus with the ReceiveAndDelete receive mode (PeekLock still gives you at-least-once).
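A sketch of at-most-once consumption with the azure-servicebus Python SDK; the queue name and loader are placeholders, and note the trade-off that a crash mid-processing loses the message, since ReceiveAndDelete removes it as soon as it is handed over:

```python
from azure.servicebus import ServiceBusClient, ServiceBusReceiveMode

def load_file_into_db(body: str) -> None:
    print("loading", body)  # placeholder for the real import logic

with ServiceBusClient.from_connection_string("<connection-string>") as client:
    receiver = client.get_queue_receiver(
        queue_name="file-loads",
        receive_mode=ServiceBusReceiveMode.RECEIVE_AND_DELETE,
    )
    with receiver:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            # The message is already deleted from the broker at this point,
            # so no second instance can pick it up.
            load_file_into_db(str(msg))
```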
I have 65k records in an Azure Service Bus topic. While testing, whenever my test application starts, it reads all 65k records. Can you please help me understand how to avoid reading messages that have already been read, or how to read only the messages sent after the test application starts?
From the question, it's unclear what exactly you're after. Here are a few things for consideration.
Queues/subscriptions are intended to be read by consumers, not to store messages for conditional access. To avoid re-consuming messages, you should consume them either with the ReceiveAndDelete receive mode, or with PeekLock and completing the received messages.
If these messages are test messages and are not intended for production, do not mix the environments; use different namespaces.
Alternatively, set a short TimeToLive on your test messages to get rid of them; a sketch follows below. You could also drop the entity and recreate it, but I would avoid that if you're running tests quite often.
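As a sketch of the TTL approach with the azure-servicebus Python SDK (topic name and the 5-minute TTL are assumptions):

```python
from datetime import timedelta
from azure.servicebus import ServiceBusClient, ServiceBusMessage

with ServiceBusClient.from_connection_string("<connection-string>") as client:
    with client.get_topic_sender(topic_name="mytopic") as sender:
        # The message expires on its own if no consumer picks it up in time.
        msg = ServiceBusMessage("test payload", time_to_live=timedelta(minutes=5))
        sender.send_messages(msg)
```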
My service consumes messages from an Azure Service Bus subscription. A dependency of my service was down for a while, which caused a lot of messages to end up in the deadletter queue (DLQ). Now that the service is back up, I want to reprocess all messages from the DLQ. How can I move/resubmit all messages from the DLQ back into the main queue?
Restrictions:
It's thousands of messages, so manually handling them isn't feasible.
The topic has about ten subscriptions. I don't want to resubmit the messages to the topic, because then all subscriptions would receive the messages, leading to double-processing.
I don't want to run the service against the DLQ directly, because some messages are broken and cause permanent errors, i.e. they would end up in the DLQ again, which would lead to an infinite loop. Moreover, the broken messages are put back at the front of the queue, effectively starving healthy messages that come after the broken ones.
I realize this is a while after the original post, but if anyone else stumbles on this problem, there is a fairly handy solution baked into Service Bus Explorer (which I have found invaluable for ASB development).
After connecting to your Service Bus and finding the right namespace, find the desired topic and the subscription with the dead letters in it. From there, right-click, choose Receive Deadletter Queue Messages, and hit OK.
From there, highlight the messages you would like to send back to the main queue and hit Resubmit Selected Messages in Batch Mode.
Thomas, you probably already found your answer, since this was quite a while ago. Think of the DLQ (or any existing queue you have) as just another collection variable, like in a PC app, but residing in the cloud. Just like an in-memory collection variable from your toolkit, you have many ways of utilising it. Of course there are limitations and differences between these two types of collection variables, but you design your solution around them, treating the DLQ as just another collection variable while knowing those limitations and differences.
For some queuing implementations, one solution would be to have another instance of the same app pointing at the DLQ, but with a fairly long visibility timeout (e.g. 6, 12, or even 24 hours depending on your SLA), since you don't want to repeat the messages too often. However, this is not applicable to Azure Service Bus, which limits the lock duration to at most 5 minutes.
If the DLQ contains broken, unrecoverable jobs, you should fix the app to delete them based on the error messages raised when the unknown exception occurred. Once the fix is deployed, such broken, unrecoverable jobs will be removed by your app and never get sent to the DLQ in the first place, and those already in the DLQ will be removed by the fixed app.
The only option for replaying DLQ messages is to receive them from the DLQ, create a new message with the same content, and send it to the topic again. They will end up at the end of the subscription's queue.
You can't send messages directly to a subscription. There is a trick: add a metadata property to the message, and then adjust all subscriptions except one to filter out such messages. It's up to you to decide whether that helps in your scenario.
As for tooling, we always did this with custom code, because we always needed some extra work to be done, like logging each replayed message for further analysis; a rough sketch follows.
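A hedged sketch of such custom replay code using the azure-servicebus Python SDK (v7); the topic, subscription, and marker property names are assumptions:

```python
from azure.servicebus import (
    ServiceBusClient, ServiceBusMessage, ServiceBusSubQueue,
)

with ServiceBusClient.from_connection_string("<connection-string>") as client:
    dlq = client.get_subscription_receiver(
        topic_name="mytopic", subscription_name="mysub",
        sub_queue=ServiceBusSubQueue.DEAD_LETTER,
    )
    sender = client.get_topic_sender(topic_name="mytopic")
    with dlq, sender:
        for msg in dlq.receive_messages(max_message_count=100, max_wait_time=5):
            print("replaying", msg.message_id)  # keep a trail for analysis
            copy = ServiceBusMessage(
                body=b"".join(msg.body),
                # Marker property that all but the target subscription filter out.
                application_properties={"replay-target": "mysub"},
            )
            sender.send_messages(copy)
            dlq.complete_message(msg)  # remove from the DLQ only after resending
```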
The quick answer is that you cannot directly move messages back to the main queue of a subscription. This is by design with how Microsoft implemented their topics and subscriptions.
Option #1
There is the option to use Azure Service Bus topic filters (https://learn.microsoft.com/en-us/azure/service-bus-messaging/topic-filters) and to define/tag your messages in a manner that allows them to be received only by the targeted subscription.
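As a sketch of Option #1 with the azure-servicebus Python management client (the rule, property, and entity names are illustrative, and note that it drops the default match-all rule):

```python
from azure.servicebus.management import ServiceBusAdministrationClient, SqlRuleFilter

with ServiceBusAdministrationClient.from_connection_string("<connection-string>") as admin:
    # Remove the match-everything default rule first, then add the filter
    # so only messages tagged for this subscription get through.
    admin.delete_rule("mytopic", "mysub", "$Default")
    admin.create_rule(
        "mytopic", "mysub", "only-mine",
        filter=SqlRuleFilter("target = 'mysub'"),
    )
```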
Option #2
The other option would be to change your current implementation. You would set up "delivery queues" (regular Service Bus queues) and configure each corresponding subscription to auto-forward its messages to these delivery queues. Your message processing logic would then listen on these "delivery queues" instead of the subscriptions. Any failures would then result in DLQ messages on the associated "delivery queues", which could be handled outside of the topic/subscriptions.
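A rough sketch of that setup with the azure-servicebus Python management client, with placeholder names:

```python
from azure.servicebus.management import ServiceBusAdministrationClient

with ServiceBusAdministrationClient.from_connection_string("<connection-string>") as admin:
    # The "delivery queue" that the processing logic will actually listen on.
    admin.create_queue("mysub-delivery")
    # Point the subscription at it via auto-forwarding; failures then
    # dead-letter on the queue, not on the subscription.
    sub = admin.get_subscription("mytopic", "mysub")
    sub.forward_to = "mysub-delivery"
    admin.update_subscription("mytopic", sub)
```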
I was hoping someone could clarify a few things regarding Azure Storage Queues and their interaction with WebJobs:
To perform recurring background tasks (i.e. add to queue once, then repeat at set intervals), is there a way to update the same message delivered in the QueueTrigger function so that its lease (visibility) can be extended as a way to requeue and avoid expiry?
With the above-mentioned pattern for recurring background jobs, I'm also trying to figure out a way to delete/expire a job 'on demand'. Since this doesn't seem possible outside the context of WebJobs, I was thinking of maybe storing the messageId and popReceipt for the message(s) to be deleted in Table storage as a persistent cache, and then, upon delivery of a message in the QueueTrigger function, doing a Table lookup and performing a DeleteMessage so that the message is not repeated any more.
Any suggestions or tips are appreciated. Cheers :)
Azure Storage Queues are used to store messages that may be consumed by your Azure Webjob, WorkerRole, etc. The Azure Webjobs SDK provides an easy way to interact with Azure Storage (that includes Queues, Table Storage, Blobs, and Service Bus). That being said, you can also have an Azure Webjob that does not use the Webjobs SDK and does not interact with Azure Storage. In fact, I do run a Webjob that interacts with a SQL Azure database.
I'll briefly explain how the WebJobs SDK interacts with Azure Queues. Once a message arrives in a queue (or is made 'visible'; more on this later), the function in the WebJob is triggered (assuming you're running in continuous mode). If that function returns with no error, the message is deleted. If something goes wrong, the message goes back to the queue to be processed again. You can handle the failed message accordingly. Here is an example of how to do this.
The SDK will call a function up to 5 times to process a queue message. If the fifth try fails, the message is moved to a poison queue. The maximum number of retries is configurable.
Regarding visibility: when you add a message to the queue, there is a visibility timeout property, which is zero by default. Therefore, if you want to process a message in the future (up to 7 days out), you can do so by setting this property to the desired value.
Optional. If specified, the request must be made using an x-ms-version of 2011-08-18 or newer. If not specified, the default value is 0. Specifies the new visibility timeout value, in seconds, relative to server time. The new value must be larger than or equal to 0, and cannot be larger than 7 days. The visibility timeout of a message cannot be set to a value later than the expiry time. visibilitytimeout should be set to a value smaller than the time-to-live value.
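For example, with the azure-storage-queue Python SDK (the queue name and one-hour delay are placeholders):

```python
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<storage-connection-string>", "tasks")
# Invisible for one hour; the WebJob trigger only fires once it appears.
queue.send_message("do-the-thing", visibility_timeout=3600)
```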
Now the suggestions for your app.
I would just add a message to the queue for every task that you want to accomplish. The message will obviously have the pertinent information for processing. If you need to schedule several tasks, you can run a Scheduled Webjob (on a schedule of your choice) that adds messages to the queue. Then your continuous Webjob will pick up that message and process it.
Add a GUID to each message that goes to the queue. Store that GUID in some other domain of your application (a database). So when you dequeue the message for processing, the first thing you do is check against your database whether the message still needs to be processed. If you need to cancel the execution of a message, instead of deleting it from the queue, just update the GUID's state in your database.
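A minimal sketch of that GUID bookkeeping; the in-memory `pending` set stands in for whatever database you actually use, and `schedule`/`cancel`/`handle` are hypothetical names:

```python
import json
import uuid
from azure.storage.queue import QueueClient

# In-memory stand-in for the database table that tracks scheduled work.
pending: set[str] = set()

queue = QueueClient.from_connection_string("<storage-connection-string>", "tasks")

def schedule(payload: dict) -> str:
    task_id = str(uuid.uuid4())
    pending.add(task_id)  # persist this in a real database
    queue.send_message(json.dumps({"id": task_id, **payload}))
    return task_id

def cancel(task_id: str) -> None:
    pending.discard(task_id)  # no need to touch the queue itself

def handle(raw: str) -> None:
    msg = json.loads(raw)
    if msg["id"] not in pending:  # cancelled in the database? just drop it
        return
    print("processing", msg)  # placeholder for the real work
```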
There's more info here.
Hope this helps,
As for the first part of the question, you can use the Update Message operation to extend the visibility timeout of a message.
The Update Message operation can be used to continually extend the invisibility of a queue message. This functionality can be useful if you want a worker role to "lease" a queue message. For example, if a worker role calls Get Messages and recognizes that it needs more time to process a message, it can continually extend the message's invisibility until it is processed. If the worker role were to fail during processing, eventually the message would become visible again and another worker role could process it.
You can check the REST API documentation here: https://msdn.microsoft.com/en-us/library/azure/hh452234.aspx
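A small sketch of that lease-extension pattern with the azure-storage-queue Python SDK; note that each update returns a fresh pop receipt:

```python
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<storage-connection-string>", "tasks")
msg = next(iter(queue.receive_messages(visibility_timeout=30)), None)
if msg:
    # Need more time? Push visibility out another 5 minutes. Keep the
    # returned message, because its pop_receipt changes on every update.
    msg = queue.update_message(msg, pop_receipt=msg.pop_receipt, visibility_timeout=300)
```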
For the second part of your question, there are really multiple ways, and your method of storing the id/popReceipt as a lookup is a possible option. You could also have a WebJob dedicated to receiving messages on a different queue (e.g. plz-delete-msg): you send a message containing the messageId, and this WebJob can use the Get Message operation and then delete the message. (You can make the job generic by passing the queue name!)
https://msdn.microsoft.com/en-us/library/azure/dd179474.aspx
https://msdn.microsoft.com/en-us/library/azure/dd179347.aspx
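To illustrate, a hedged sketch of such a deletion WebJob in Python with azure-storage-queue; the queue names and message format are assumptions, and unmatched messages simply become visible again once their short visibility timeout lapses:

```python
import json
from azure.storage.queue import QueueClient

CONN = "<storage-connection-string>"
requests = QueueClient.from_connection_string(CONN, "plz-delete-msg")

for req in requests.receive_messages():
    target = json.loads(req.content)  # e.g. {"queue": "tasks", "messageId": "..."}
    victim_queue = QueueClient.from_connection_string(CONN, target["queue"])
    # Get messages from the target queue until we find the one to delete;
    # deleting requires the pop receipt from this very receive.
    for msg in victim_queue.receive_messages(visibility_timeout=5):
        if msg.id == target["messageId"]:
            victim_queue.delete_message(msg)
            break
    requests.delete_message(req)  # the deletion request itself is done
```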