Azure Storage Queue - long time to process - azure

I need to generate quite a number of reports, and a report can take about 5 minutes to generate (large amounts of data from many different sources).
The client will post messages to an Azure Storage Queue. There is a worker role that processes the messages and generates the reports.
If I want to scale this up, let's say I end up with 10 worker roles that will process the messages from the queue and generate the reports. Then I will add messages to the queue like this:
message 1: process reports from 1 - 5
message 2: process reports from 6 - 11
........
message 10: process reports from 50 - 55 (the range might not be accurate)
If my worker role 1 takes the first message and puts a lock on it, but the processing takes 5 minutes, the lock will expire and the message will become visible again in the queue, so worker role 2 will take it and start processing it as well... and so forth.
How can I ensure that a queue message is consumed only once, keeping in mind that the task is a long one?

First of all: Using Azure Storage queues, you should be prepared for all of your operations to be idempotent: in case your queue item is processed multiple times, the same result should happen each time. The reason I bring this up: there's simply no way to guarantee you'll process the message exactly once (unless you check the DequeueCount property of the message and halt processing accordingly), due to unexpected events such as your role instance crashing/rebooting or your queue-item processing code doing something unexpected like throwing an exception.
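As a rough illustration (a sketch only, assuming the classic CloudQueue client from the Azure Storage SDK; the connection string variable, the "reports" queue name, the threshold of 3 and the ProcessReports helper are all made up for illustration), checking DequeueCount before doing the work might look like this:

CloudQueue queue = CloudStorageAccount.Parse(connectionString)
    .CreateCloudQueueClient()
    .GetQueueReference("reports"); // hypothetical queue name

CloudQueueMessage message = queue.GetMessage(TimeSpan.FromMinutes(6)); // stay invisible a bit longer than one report takes
if (message != null)
{
    if (message.DequeueCount > 3) // arbitrary threshold
    {
        // This message has already been attempted several times; stop retrying it.
        queue.DeleteMessage(message);
    }
    else
    {
        ProcessReports(message.AsString); // hypothetical - must be idempotent
        queue.DeleteMessage(message);
    }
}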
Next: The queue message invisibility timeout can be programmatically extended. This can be done via the Queue REST API or via one of the language SDKs. In C# (something like this - I didn't test it), extending by an additional minute:
queue.UpdateMessage(message,
    TimeSpan.FromSeconds(60),
    MessageUpdateFields.Visibility);
You can also modify the message along the way (perhaps as a hint to your code, to let you know which of the 5 reports have been completed. This should help with your specific issue: in the event the message gets reprocessed, you don't have to process all five reports if the message has been modified to say something like "process reports from 3-5"). Note: you can combine the MessageUpdateFields flags via |:
queue.UpdateMessage(message,
    TimeSpan.FromSeconds(0),
    MessageUpdateFields.Content | MessageUpdateFields.Visibility);
Lastly: If you're concerned with the length of time it takes to process a batch of reports, perhaps rethink why you're processing five reports per message rather than one report per message. You can always read queue messages in batches. This is getting a bit subjective, as there's really no right or wrong way to do it, but it's something for you to think about.

Related

How to process hundreds of JMS messages from 2 queues, with response times of 1 second and 1 minute respectively

I have a business requirement where I have to process messages with a certain priority, say priority1 and priority2.
We have decided to use 2 JMS queues: priority1 messages will be sent to priority1Queue and priority2 messages will be sent to priority2Queue.
The response-time requirement for priority1Queue is that the moment a message is in the queue, I need to read it, process it and send the response back to, say, another queue within 1 second. This means I should process these messages immediately as they arrive on priority1Queue, and I will have hundreds of such messages coming in per second, so I will definitely need multiple concurrent consumers on this queue so that each message is consumed and processed within 1 second.
The response-time requirement for priority2Queue is that I need to read, process and send the response back to, say, another queue within 1 minute. So priority2 messages have a lower priority than priority1 messages, but I still need to respond within a minute.
Can you suggest the best possible approach for this, so that I can concurrently read messages from both queues and give higher priority to priority1 messages, so that each priority1 message can be read and processed within 1 second?
Mainly, how can a message be read and fed to a processor so that the next message can be read, and so on?
I need to write a java based component that does the reading and processing.
I also need to ensure this component is highly available and doesn't run out of memory (OutOfMemoryError). I will have this component running across multiple JVMs and multiple application servers, so I can have multiple clusters running this Java component.
First off, the requirement to process within 1 second is not going to depend on your messaging approach, but more on the actual processing of the message and the raw CPU available. Picking up hundreds of messages per second from a queue is child's play; the JMS provider is most likely not the issue. Depending on your deployment platform (Tomcat, Mule, JEE, whatever), there should be a way to have n listeners so you can scale up appropriately. Because the messages sit on the queue until you pick them up, it's doubtful you'll run out of memory. I've built these kinds of apps and processed many more messages without problems.
Second, there are a number of strategies for prioritizing messages that don't necessarily require different queues. I'm leaning towards using message priorities and message filters, where one group of listeners takes care of the highest-priority messages and another listener filters off lower-priority messages but makes sure it does enough to get them out within a minute.
You could also do something where a lower-priority message gets rewritten back to the same queue with a higher priority, based on how close to the 1-minute deadline you are. I know that sounds wrong, but reading/writing from JMS has very little overhead (at least compared to doing the equivalent, column-driven database transactions), and the listener for lower-priority messages could just keep increasing the priority until the message has to be processed.
Or, simpler, just have more listeners on the high-priority queue/messages than on the lower-priority ones; an imbalance in the number of consumers might be all it needs.
Lots of possibilities, time for a PoC.

Windows Azure staging <--> production causing conflicts & errors on table storage

We had a terrible problem/experience yesterday when trying to swap our staging <--> production role.
Here is our setup:
We have a worker role picking up messages from the queue. These messages are processed on the role (Table Storage inserts, db selects, etc.). This can take maybe 1-3 seconds per queue message depending on how many Table Storage inserts it needs to make. It deletes the message when everything is finished.
Problem when swapping:
When our staging deployment went online, our production worker role started erroring.
When the role tried to process a queue message it gave a constant stream of 'EntityAlreadyExists' errors. Because of these errors the queue messages weren't getting deleted, so they kept reappearing in the queue and getting processed again, and so on...
When looking inside these queue messages and analysing what had happened with them, we saw they had actually been processed but not deleted.
The problem wasn't over after deleting these faulty messages. New queue messages weren't processed either while the faulty ones were still pending, and no Table Storage records were added, which seems very strange.
After deleting both staging and production and publishing to production again, everything started to work just fine.
Possible problem(s)?
We have little to no idea what actually happened.
Maybe both roles picked up the same messages, and one did the insert and the other errored?
...???
Possible solution(s)?
We have some idea's on how to solve this 'problem'.
Make a poison-message failover system? When the dequeue count gets over X we would just delete that queue message or place it into a separate 'poison queue'.
Catch the EntityAlreadyExists error and just delete that queue message or put it in a separate queue.
...????
Multiple roles
I suppose we will have the same problem when putting up multiple roles?
Many thanks.
EDIT 24/02/2012 - Extra information
We actually use GetMessage().
Every item in the queue is unique and will generate unique entities in Table Storage. A little more information about the process: a user posts something that has to be distributed to certain other users. The message generated from that user will have a unique id (GUID). This message will be posted to the queue and picked up by the worker role. The message is distributed over several other tables (PartitionKey -> UserId, RowKey -> a timestamp in ticks plus the unique message id), so there is almost no chance the same messages will be posted in a normal situation.
The invisibility timeout COULD be a logical explanation, because some messages may be distributed to 10-20 tables. This means 10-20 inserts without the batch option. Can you set or extend this invisibility timeout?
Not deleting the queue message because of an exception COULD be an explanation as well, because we haven't implemented any poison-message failover YET ;).
Regardless of the Staging vs. Production issue, having a mechanism that handles poison messages is critical. We've implemented an abstraction layer over Azure queues that automatically moves messages over to a poison queue once they've been attempted to be processed some configurable number of times.
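A minimal sketch of that kind of check (assuming the classic CloudQueue client; workQueue and poisonQueue are pre-existing CloudQueue references, and MaxAttempts and ProcessMessage are invented for illustration):

const int MaxAttempts = 5; // configurable threshold

CloudQueueMessage msg = workQueue.GetMessage();
if (msg != null)
{
    if (msg.DequeueCount > MaxAttempts)
    {
        // Park the payload in a separate poison queue for offline inspection,
        // then delete the original so it stops clogging the main queue.
        poisonQueue.AddMessage(new CloudQueueMessage(msg.AsString));
        workQueue.DeleteMessage(msg);
    }
    else
    {
        ProcessMessage(msg); // hypothetical, idempotent processing
        workQueue.DeleteMessage(msg);
    }
}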
You clearly have a fault in handling duplicate messages. The fact that your id is unique doesn't mean that the message will not be processed twice on occasions such as:
The role dying with partially finished work, so the message reappears in the queue for processing
The role crashing unexpectedly, so the message ends up back in the queue
The Fabric Controller migrating your role when you don't have code to handle this situation, so the message ends up back in the queue
In all cases, you need code that handles the fact that the message will reappear. One way is to use the DequeueCount property and check how many times the message has been removed from the queue and received for processing. Make sure you have code that handles partial processing of a message.
Now, what probably happened during the swap: when the production environment became staging and staging became production, both of them were trying to receive the same messages, so they were basically competing for those messages. That in itself isn't necessarily bad, because it's a known pattern that works anyway; but when you killed your old production (now staging), every message that had been received for processing but wasn't finished ended up back in the queue, and your new production environment picked up those messages for processing again. Since there was no code logic to handle this scenario, and the messages had been partially processed (some records already existed in the tables), it started causing the behavior you noticed.
There are a few possible causes:
How are you reading the queue messages? If you are doing a Peek Message, the message will still be visible and can be picked up by another role instance (or your staging environment) before the message is deleted. You want to make sure you are using Get Message so the message is invisible until it can be deleted (a short Peek vs. Get sketch follows at the end of this answer).
Is it possible that your first role crashed after doing the work for the message but prior to deleting the message? This would cause the message to become visible again and get picked up by another role instance. At that point the message will be a poison message which will cause your instances to constantly crash.
This problem almost certainly has nothing to do with Staging vs Production, but is most likely caused by having multiple instances reading from the same queue. You can probably reproduce the same problem by specifying 2 instances, or by deploying the same code to 2 different production services, or by running the code locally on your dev machine (still pointing to Azure storage) using 2 instances.
In general you do need to handle poison messages, so you need to implement that logic anyway, but I would suggest getting to the root cause of this problem first; otherwise you are just going to run into a lot more problems later on.
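To make the Peek vs. Get distinction concrete (a sketch only, assuming the classic CloudQueue client with an existing queue reference):

// PeekMessage only inspects the front of the queue; the message stays visible,
// so another instance (or your staging deployment) can read the very same message.
CloudQueueMessage peeked = queue.PeekMessage();

// GetMessage dequeues the message and makes it invisible for the visibility
// timeout (30 seconds by default), so other readers won't see it while you work.
CloudQueueMessage received = queue.GetMessage();
if (received != null)
{
    // ... do the Table Storage work here ...
    queue.DeleteMessage(received); // must happen before the timeout expires
}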
With queues you need to code with idempotency in mind, and expect and handle 'EntityAlreadyExists' as a viable response.
As others have suggested, causes could be
Multiple messages in the queue with the same identifier.
Peeking at the message instead of reading it from the queue, so it is never made invisible.
Not deleting the message because an exception was thrown before you could delete it.
Taking too long to process the message, so it cannot be deleted (because the invisibility timeout expired) and it appears again.
Without looking at the code, I am guessing that it is either option 3 or 4 that is occurring.
If you cannot detect the issue with a code review, you may consider adding time based logging and try/catch wrappers to get a better understanding.
Using queues effectively, in a multi-role environment, requires a slightly different mindset and running into such issues early is actually a blessing in disguise.
Appended 2/24
Just to clarify, modifying the invisibility timeout is not a generic solution to this type of problem. Also note that this feature, although available in the REST API, may not be available in the queue client.
Other options involve writing to Table Storage in an asynchronous manner to speed up your processing time, but again this is a stopgap measure which does not really address the underlying paradigm of working with queues.
So, the bottom line is to be idempotent. You can try using the Table Storage upsert (update or insert) feature to avoid getting the 'EntityAlreadyExists' error, if that works for your code. If all you are doing is inserting new entities into Azure Table Storage, then upsert should solve your problem with minimal code change.
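A minimal sketch of the upsert approach (assuming the WindowsAzure.Storage table client; the ReportEntity type, the table name and the local variables are invented for illustration):

public class ReportEntity : TableEntity // hypothetical entity type
{
    public string Payload { get; set; }
}

// InsertOrReplace (upsert) succeeds whether or not the entity already exists,
// so reprocessing the same queue message no longer throws EntityAlreadyExists.
CloudTable table = account.CreateCloudTableClient().GetTableReference("reports"); // hypothetical table name
var entity = new ReportEntity
{
    PartitionKey = userId,  // e.g. the target user's id
    RowKey = messageId,     // the unique GUID carried in the queue message
    Payload = payload
};
table.Execute(TableOperation.InsertOrReplace(entity));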
If you are doing updates, then it is a different ball game altogether. One pattern is to pair each update with a dummy insert in the same table with the same partition key, so that it errors out if the update occurred previously and you can skip the update. Later, after the message is deleted, you can delete the dummy inserts. However, all this adds complexity, so it is much better to revisit the architecture of the product; for example, do you really need to insert/update into so many tables?
Without knowing what your worker role is actually doing I'm taking a guess here, but it sounds like when you have two instances of your worker role running you are getting conflicts while trying to write to an Azure table. It is likely to be because you have code that looks something like this:
var queueMessage = GetNextMessageFromQueue();
Foo myFoo = GetFooFromTableStorage(queueMessage.FooId);
if (myFoo == null)
{
    myFoo = new Foo {
        PartitionKey = queueMessage.FooId
    };
    AddFooToTableStorage(myFoo);
}
DeleteMessageFromQueue(queueMessage);
If you have two adjacent messages in the queue with the same FooId, it is quite likely that you'll end up with both instances checking to see if the Foo exists, not finding it, then trying to create it. Whichever instance is the last to try and save the item will get the "Entity already exists" error. Because it errored, it never gets to the delete-message part of the code, and therefore the message becomes visible again on the queue after a period of time.
As others have said, dealing with poison messages is a really good idea.
Update 27/02
If it's not subsequent messages (which, based on your partition/row key scheme, I would say is unlikely), then my next bet would be that it's the same message reappearing in the queue after the visibility timeout. By default, if you're using .GetMessage() the timeout is 30 seconds. It has an overload which allows you to specify how long that time frame is. There is also the .UpdateMessage() function that allows you to update that timeout as you're processing the message. For example, you could set the initial visibility to 1 minute, then if you're still processing the message 50 seconds later, extend it for another minute.
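Roughly like this (a sketch only, assuming the classic CloudQueue client; the timings and the per-table helpers are illustrative):

// Take the message with a 1-minute visibility window instead of the default 30 seconds.
CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(1));
if (msg != null)
{
    foreach (var target in GetTargetTables(msg)) // hypothetical: the 10-20 per-user tables
    {
        InsertIntoTable(target, msg); // hypothetical single insert

        // Still busy? Push the invisibility window out another minute so no other
        // instance picks up the same message mid-processing.
        queue.UpdateMessage(msg, TimeSpan.FromMinutes(1), MessageUpdateFields.Visibility);
    }
    queue.DeleteMessage(msg);
}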

Azure queue and table storage transactions best practices

Using Azure Table Storage, I read about Entity Group Transactions within the same partition. Now, what happens if I use the Azure Queue together with Table Storage? Is it possible to get a message from the queue and insert into Table Storage, and if something breaks, roll back and put the message back on the queue?
Or how should I handle such a scenario with Azure?
Tables and queues don't have any associated transactions.
Here's some general queue usage guidance:
Make sure queue actions are idempotent - that is, you'd have the same results executing a queue message more than once, with repeatable side-effects
Set a reasonable queue message visibility timeout. If your task looks like it will take longer, you can extend a message's invisibility timeout. This prevents other threads / role instances from grabbing the same queue item while you're still working on it.
For long-running tasks (or those where you want to avoid consuming a resource multiple times if possible), modify your queue message along the way, giving yourself status hints. For example: you have a render-video queue message: 'RENDER|Source-URL'. You're rendering video, and it needs two passes. You've done pass 1 with the results stored in a temporary blob. You could modify the message to something like 'RENDER|Source-URL|Pass1-URL'. Now assume something goes wrong and your rendering task fails for some reason. Later, when you pick up this message again, you can start from pass 2 instead of from the very beginning (there's a sketch of this pattern after the last point below).
You don't need to worry about putting messages back on the queue. Messages aren't actually removed from the queue until you explicitly delete them; they simply become invisible during the invisibility timeout period you choose. If you don't delete the message by the end of that period (or extend the period), it becomes visible again for someone else to read. Note: at that point, once someone else reads the queue message, the original message-holder will no longer be able to delete it.
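Here's a minimal sketch of the status-hint idea from the render-video example above (assuming the classic CloudQueue client; the message format and the RenderPass1/RenderPass2 helpers are invented for illustration):

CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(10)); // generous window for a render
if (msg != null)
{
    string[] parts = msg.AsString.Split('|'); // "RENDER|Source-URL" or "RENDER|Source-URL|Pass1-URL"

    if (parts.Length == 2)
    {
        // Pass 1 hasn't completed yet for this message.
        string pass1Url = RenderPass1(parts[1]); // hypothetical
        msg.SetMessageContent(msg.AsString + "|" + pass1Url); // record progress in the message itself
        queue.UpdateMessage(msg,
            TimeSpan.FromMinutes(10),
            MessageUpdateFields.Content | MessageUpdateFields.Visibility);
    }

    RenderPass2(msg.AsString.Split('|')[2]); // hypothetical: resume from the pass-1 output
    queue.DeleteMessage(msg);
}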

Does having multiple Azure worker roles polling the same queue cause deadlocks or poison messages?

Scenario:
I spin off multiple worker roles, or ONE worker role with multiple threads, polling for new messages in an Azure Queue.
Could someone please confirm whether this is the correct design approach? The reason I would like to have many worker roles is to speed up the PROCESSJOB. Our application should be near real time, i.e. as soon as there are messages we should get them, apply complex business rules and commit to the AZURE DB. We are expecting 11,000 messages per 3 minutes.
Thank you.
You may have as many queue-readers as you like. It's very common to scale out worker role instances, as they can all read from the same queue, giving you much greater work throughput.
When you read a queue message, it's marked "invisible" for a period of time, to prevent others from reading and doing the same work. The owner of the message must delete it before the time period expires, otherwise the message becomes visible again, and an exception will be thrown when the original reader attempts to delete it. This means your operations must be idempotent.
There's no direct poison-message handling, but it's easy to implement, as each message has a dequeue count. Just check it and remove poison messages after they've been read 3-4 times. You can also dynamically adjust the timeout period based on the dequeue count, as perhaps the processing fails due to too short a time window.
Here's the MSDN documentation for DequeueCount.
EDIT: As far as processing 11,000 messages in 3 minutes: the scalability target for a queue is 500-2,000 TPS, or up to 360,000 transactions in 3 minutes (far beyond the 11,000-message requirement you have). You can speed things up further by combining work items into a single queue message, as well as by reading multiple messages at a time, which will also reduce your transaction count. You can also look at the ApproximateMessageCount property of a queue to see if your queue is backing up (and then scale out to additional instances to help consume queue items).
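For example (a sketch only, assuming the WindowsAzure.Storage 2.x queue client; the batch size of 32 is the per-call maximum, and the backlog threshold is arbitrary):

// Read up to 32 messages in a single transaction instead of one GetMessage call each.
foreach (CloudQueueMessage msg in queue.GetMessages(32, TimeSpan.FromMinutes(5)))
{
    ProcessMessage(msg); // hypothetical, idempotent
    queue.DeleteMessage(msg);
}

// Check queue depth to decide whether to scale out to more instances.
queue.FetchAttributes();
int backlog = queue.ApproximateMessageCount ?? 0;
if (backlog > 10000)
{
    // e.g. increase the worker role instance count (Management API, autoscale, etc.)
}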

Azure Queue unique message

I would like to make sure that I don't insert a message to the queue multiple times. Is there any ID/Name I can use to enforce uniqueness?
vtortola pretty much covered it, but I wanted to add a bit more detail on why it's at-least-once delivery.
When you read a queue item, it's not removed from the queue; instead, it becomes invisible but stays in the queue. That invisibility period defaults to 30 seconds (max: 2 hours). During that time, the code that got the item off the queue has that much time to process whatever command was in the queue message and delete the queue item.
Assuming the queue item is deleted before the timeout period is reached, all is well. However: Once the timeout period is reached, the queue item becomes visible again, and the code holding the queue item may no longer delete it. In this case, someone else can read the same queue message and re-process that message.
Because a queue message can time out and reappear:
Your queue processing must be idempotent - operations on a queue message must result in the same outcome (such as rendering a thumbnail for a photo).
You need to think about timeout adjustments. You might find that commands are valid but processing is taking too long (maybe your 45-second thumbnail-rendering code worked just fine until someone uploaded a 25MP image).
You need to think about poison messages - those that will never process correctly. Maybe they cause an exception to be thrown, or have some invalid condition that causes the message processor to abort processing, which leads to the message eventually reappearing in the queue. There's a property called DequeueCount - consider checking that property upon reading a queue item and, if it's equal to, say, 3, pushing the message into a table or blob and sending yourself a notification to spend some time debugging that message offline.
More details on the low-level get-messages REST API are here. This will give you more insight into Windows Azure queue message handling.
Azure queues don't ensure message order or message uniqueness. Messages will be processed "at least once", but nothing ensures a message won't be processed twice, so there is no "at most once" guarantee.
You should be prepared to receive the same message twice. You can put an ID in the body of the message as part of your data.
