I'm trying to use one WebJob and one Worker Role together.
The WebJob will have a BlobTrigger: every time a blob is added to the container, a new message will be added to an Azure Storage Queue (call it pending blobs).
There will also be a Worker Role that polls messages from the pending blobs queue and adds the blob names to an internal blocking collection, which will be processed concurrently by several tasks started by the Worker Role.
I designed this solution with scalability in mind: a lot of blobs will be arriving in the container, and I don't want to have peaks of CPU consumption.
Some questions came to my mind while developing the solution:
Is there a way to check if the Azure Storage Queue has messages inside?
If I call the GetMessage method and the queue does not have any messages, will execution be blocked until a new message arrives?
Is there a way to manually delete blob receipts?
Is there a way to check if the Azure Storage Queue has messages inside?
Queues have an ApproximateMessageCount property that you can check for queue depth (note: this isn't 100% accurate, because messages may be added or deleted while you are checking).
If I call the GetMessage method and the queue does not have any messages, will execution be blocked until a new message arrives?
GetMessage() is non-blocking. If there are no messages, the call returns. Note: Since you're planning on creating your own reader in a worker role, just be careful when dealing with an empty queue: If you put yourself in a tight loop and continue to blast the queue, you run the risk of exhausting the queue's 2000 transaction/second limit (and you'll probably see excessive network traffic and cpu utilization). How you implement a backoff strategy is up to you, but you'll want to incorporate a backoff of some type.
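One possible shape for such a backoff is sketched below. It is a minimal illustration, not Azure SDK code: `FakeQueue`, `next_delay`, and `poll` are hypothetical names, the queue is an in-memory stand-in whose `get_message` returns `None` when empty (mirroring the non-blocking behavior described above), and the 1-second minimum and 60-second cap are assumed values you would tune.

```python
import time
from collections import deque


class FakeQueue:
    """In-memory stand-in for a storage queue (hypothetical, not the Azure SDK)."""

    def __init__(self, messages=()):
        self._messages = deque(messages)

    def get_message(self):
        # Non-blocking: returns None immediately when the queue is empty.
        return self._messages.popleft() if self._messages else None


def next_delay(current, minimum=1.0, maximum=60.0):
    """Double the delay after an empty poll, capped at `maximum` seconds."""
    return min(max(current, minimum) * 2, maximum)


def poll(queue, handle, max_empty_polls, minimum=1.0, maximum=60.0, sleep=time.sleep):
    """Poll until `max_empty_polls` consecutive empty polls have been seen."""
    delay = minimum
    empty = 0
    while empty < max_empty_polls:
        message = queue.get_message()
        if message is not None:
            handle(message)
            delay = minimum  # work arrived: go back to the shortest delay
            empty = 0
            continue
        empty += 1
        sleep(delay)  # queue was empty: wait before asking again
        delay = next_delay(delay, minimum, maximum)
```

A real worker would loop indefinitely rather than stop after `max_empty_polls`; the limit is only there so the sketch terminates. Passing `sleep` as a parameter makes the loop easy to exercise without actually waiting.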
Maybe it would be better to use Azure Functions to handle your queue messages. The function is only triggered when a new message appears on the queue.
https://azure.microsoft.com/en-us/documentation/articles/functions-bindings-storage/
The FetchAttributes method of the CloudQueue class can be used to get the various attributes of a queue. The message count is one of those attributes.
According to documentation
The ApproximateMessageCount property returns the last value retrieved by the FetchAttributes method, without calling the Queue service.
Related
I have a slightly philosophical problem. We are using Storage Queues for processing the "tickets". The way we have implemented that is we have a background service (worker role) that is polling the storage queue and finding out if there is any ticket to be processed. The nature of the work we do is seasonal. Which means that there won't be tickets all the time to be processed. The problem we are facing with this is - since multiple worker role instances are continuously polling the storage queue, we have cost impact as it's just too many GetMessage() calls.
I came across the Service Bus queue, which has event-based capability. There we have the concept of OnMessage(), which gets called every time a new message becomes available on a Service Bus queue.
But my question is: does OnMessage() go ahead and call Receive() internally? In other words, is it just syntactic sugar with polling still going on internally, and would there be a cost impact in the Service Bus queue case as well?
Any insights into this will be helpful.
The Azure Service Bus client uses long polling to retrieve messages from the broker.
By default, a receive call waits up to 1 minute or returns as soon as a message arrives. So if a message shows up before the minute elapses, it is retrieved and another 1-minute poll is issued. OnMessage/MessageHandler are no exception: they are a higher-level abstraction on top of the low-level receive operation.
My scenario: I have an Azure Storage Queue where messages can come in at any time. If I have 10 items in that queue, it's imperative that they be processed in order. I'm using C# and the Windows Azure Storage SDK.
If the first item fails after, say, 2 seconds it remains invisible on the queue for another 28 seconds (30 second invisibility by default).
Now, my worker will just continue to check a queue for messages and process them as and when. If a queue message fails, it remains invisible and so the next queue item will be processed before the first message is retried.
This seems like really basic functionality for anyone needing a queue where the items are processed in order.
No, I can't set the timeout to a smaller amount because tasks can take varying lengths of time.
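The reordering described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: `VisibilityQueue` and `run` are hypothetical names, time is simulated rather than real, and the model is a simplification of visibility timeouts (a dequeued message is hidden for 30 seconds and reappears unless deleted), not the storage SDK itself.

```python
class VisibilityQueue:
    """Toy model of a queue with a visibility timeout (hypothetical, not the SDK)."""

    def __init__(self, items, visibility_timeout=30.0):
        # Each entry is [visible_at, body]; everything starts out visible.
        self._entries = [[0.0, body] for body in items]
        self._timeout = visibility_timeout

    def get_message(self, now):
        """Return the first visible entry and hide it for the timeout."""
        for entry in self._entries:
            if entry[0] <= now:
                entry[0] = now + self._timeout
                return entry
        return None

    def delete_message(self, entry):
        self._entries = [e for e in self._entries if e is not entry]


def run(queue, fails_once, step=2.0, end=40.0):
    """Poll every `step` simulated seconds; return bodies in completion order."""
    processed = []
    already_failed = set()
    now = 0.0
    while now <= end:
        entry = queue.get_message(now)
        if entry is not None:
            body = entry[1]
            if body in fails_once and body not in already_failed:
                already_failed.add(body)  # first attempt fails; message stays hidden
            else:
                processed.append(body)
                queue.delete_message(entry)
        now += step
    return processed


order = run(VisibilityQueue(["first", "second", "third"]), fails_once={"first"})
print(order)  # ['second', 'third', 'first']
```

The failed message only becomes visible again at the 30-second mark, so the two later messages complete before it is retried.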
George, if you are looking for a messaging queue solution that processes items in order, you should consider using Azure Service Bus Queues:
As a solution architect/developer, you should consider using Service Bus queues when:
Your solution must be able to receive messages without having to poll the queue. With Service Bus, this can be achieved through the use of the long-polling receive operation using the TCP-based protocols that Service Bus supports.
Your solution requires the queue to provide a guaranteed first-in-first-out (FIFO) ordered delivery.
You want a symmetric experience in Azure and on Windows Server (private cloud).
For more information, see Service Bus for Windows Server.
Your solution must be able to support automatic duplicate detection.
There is a good article comparing Storage Queues and Service Bus: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted . You may find the latter better suited to your case.
When using an azure webjobs queue, is it possible to queue a single message with a particular visibility time (i.e. when the message becomes available on the queue for processing)?
For the sake of retrying messages that we fail to process, it would be helpful to be able to re-queue with some sort of back-off so that transient problems have a chance to resolve themselves.
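It is indeed possible when working with Service Bus queues: the QueueClient ScheduleMessageAsync method will allow you to do so. Azure Storage queues do NOT have an equivalent scheduling method, although CloudQueue.AddMessage does accept an initialVisibilityDelay argument that keeps a newly added message invisible for a given time.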
You can subscribe to asynchronous updates from Azure topics and queues by using SubscriptionClient/QueueClient's .OnMessage call, which will presumably create a separate thread that polls the topic/queue with default settings and calls a defined callback if it receives anything.
The Azure website says that receiving a message is a billable action, which is understandable. However, it isn't clear whether each of those poll requests is considered billable even when it does not return anything, i.e. when the queue in question has no pending messages.
Based on the Azure Service Bus Pricing FAQ - the answer to your question is yes
In general, management operations and “control messages,” such as completes and deferrals, are not counted as billable messages. There are two exceptions:
Null messages delivered by the Service Bus in response to requests against an empty queue, subscription, or message buffer, are also billable. Thus, applications that poll against Service Bus entities will effectively be charged one message per poll.
Setting and getting state on a MessageSession will also result in billable messages, using the same message size-based calculation described above.
Given the price is $0.01 per 10,000 messages, I don't think you should worry too much about that.
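To put a rough number on that, here is a back-of-the-envelope calculation. It is a sketch only: `monthly_poll_cost` is a hypothetical helper, and it assumes a queue that stays empty all month, one billable null message per poll (as the FAQ excerpt states), a 60-second long-poll timeout, a 30-day month, and the $0.01 per 10,000 messages price quoted above.

```python
def monthly_poll_cost(receivers, poll_seconds=60, price_per_10k=0.01, days=30):
    """Return (polls, cost in dollars) for receivers polling an always-empty queue."""
    polls_per_receiver = days * 24 * 3600 // poll_seconds
    polls = receivers * polls_per_receiver
    return polls, polls / 10_000 * price_per_10k


polls, cost = monthly_poll_cost(receivers=1)
print(polls, round(cost, 4))  # 43200 0.0432
```

Under those assumptions a single idle receiver generates 43,200 polls a month, which comes to a little over four cents; the figure scales linearly with the number of receivers and inversely with the poll timeout.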
I need to keep track of how many failed attempts have been made to process a message in an Azure storage queue and delete the message after N unsuccessful attempts.
I have searched, but have not found any particular property that does this automatically, and was wondering if there was a way other than using a counter in a storage table.
Each cloud queue message has a DequeueCount property. Does this help?
REST API reference here.
As for how to delete messages automatically after n attempts: There's nothing that automatically does this. You'll need to implement your own poison-message handling in Windows Azure queues, based on DequeueCount.
Alternatively, Azure Service Bus queues have a dead-letter queue for undeliverable messages (or ones that can't be processed). More info here.
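A hand-rolled poison-message check based on the dequeue count might look roughly like this. It is a minimal sketch with an in-memory model, not SDK code: `Message`, `receive`, the `poison` list, and the limit of 5 attempts are all hypothetical, and the counter is incremented locally here only to imitate what the queue service does on each dequeue.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    dequeue_count: int = 0  # stands in for the DequeueCount property


def receive(message, handle, poison, max_attempts=5):
    """Process one dequeued message; return True if it should be deleted."""
    message.dequeue_count += 1  # imitates the service counting each dequeue
    if message.dequeue_count > max_attempts:
        poison.append(message)  # park it for inspection instead of retrying
        return True
    try:
        handle(message.body)
    except Exception:
        return False  # leave it on the queue; it becomes visible again later
    return True
```

In a real worker the `poison` list would typically be a separate queue or a table, so that failed messages can be examined later rather than lost.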