I have to develop a Robot which has to work for 7 days. I've created the proccess and my question, do I have to create a Work Queue and configure my Proccess or how do I do that.
Work queue creation is not a compulsory task, it all depend on process to process until and unless we do not require any output from the BOT and not a large volume of data i.e.
We do not need to get status(error/completed) of an item.
Business do not need the status report
We do not need to keep track of the items got completed and pending
But, I would suggest you to create and use a Work queue as
It will keep track of number of records got processed
Easy to generate business report (how many requests got executed successfully or got exception)
For each record item it will give us the status whether it is been executed successfully or got exception
We can easily track the error.
And the most important : If suppose BOT execution is getting failed due to some reason and we need to restart the BOT then,
A. BOT will not pick the executed item if we are using work queue. It will pick the next pending item from work queue
B. If we are not using work queue, BOT will/can pick the items which were executed previously. There is no point to pick the items which were already got processed.
You can also refer the documentation provided by Blue Prism on their portal:
Work Queue Guide
Related
I'm creating an app that uses a JobQueue using Amazon SQS.
Every time a user logs in, I create a bunch of jobs for that specific user, and I want him to wait until all his jobs have been processed before taking the user to a specific screen.
My problem is that I don't know how to query the queue to see if there are still pending jobs for a specific user, or how is the correct way to implement such solution.
Everything regarding the queue (Job creation and processing is working as expected). But I am missing that final step.
Just for the record:
In my previous implementation I was using Redis + Kue and I had created a key with the user Id and the job count, every time a job was added that job count was incremented, and every time a job finished or failed I decremented that count. But now I want to move away from Redi + Kue and I am not sure how to implement this step.
Amazon SQS is not the ideal tool for the scenario you describe. A queueing system is normally used in a "Send and Forget" situation, where the sending system doesn't remain interested in later processing.
You could investigate Amazon Simple Workflow (SWF), which allows work to be monitored as it goes through several processes. Your existing code could mostly be re-used, just with the SWF framework added. Or even power it from Lambda, since you are already using node.js.
I was hoping if someone can clarify a few things regarding Azure Storage Queues and their interaction with WebJobs:
To perform recurring background tasks (i.e. add to queue once, then repeat at set intervals), is there a way to update the same message delivered in the QueueTrigger function so that its lease (visibility) can be extended as a way to requeue and avoid expiry?
With the above-mentioned pattern for recurring background jobs, I'm also trying to figure out a way to delete/expire a job 'on demand'. Since this doesn't seem possible outside the context of WebJobs, I was thinking of maybe storing the messageId and popReceipt for the message(s) to be deleted in Table storage as persistent cache, and then upon delivery of message in the QueueTrigger function do a Table lookup to perform a DeleteMessage, so that the message is not repeated any more.
Any suggestions or tips are appreciated. Cheers :)
Azure Storage Queues are used to store messages that may be consumed by your Azure Webjob, WorkerRole, etc. The Azure Webjobs SDK provides an easy way to interact with Azure Storage (that includes Queues, Table Storage, Blobs, and Service Bus). That being said, you can also have an Azure Webjob that does not use the Webjobs SDK and does not interact with Azure Storage. In fact, I do run a Webjob that interacts with a SQL Azure database.
I'll briefly explain how the Webjobs SDK interact with Azure Queues. Once a message arrives to a queue (or is made 'visible', more on this later) the function in the Webjob is triggered (assuming you're running in continuous mode). If that function returns with no error, the message is deleted. If something goes wrong, the message goes back to the queue to be processed again. You can handle the failed message accordingly. Here is an example on how to do this.
The SDK will call a function up to 5 times to process a queue message. If the fifth try fails, the message is moved to a poison queue. The maximum number of retries is configurable.
Regarding visibility, when you add a message to the queue, there is a visibility timeout property. By default is zero. Therefore, if you want to process a message in the future you can do it (up to 7 days in the future) by setting this property to a desired value.
Optional. If specified, the request must be made using an x-ms-version of 2011-08-18 or newer. If not specified, the default value is 0. Specifies the new visibility timeout value, in seconds, relative to server time. The new value must be larger than or equal to 0, and cannot be larger than 7 days. The visibility timeout of a message cannot be set to a value later than the expiry time. visibilitytimeout should be set to a value smaller than the time-to-live value.
Now the suggestions for your app.
I would just add a message to the queue for every task that you want to accomplish. The message will obviously have the pertinent information for processing. If you need to schedule several tasks, you can run a Scheduled Webjob (on a schedule of your choice) that adds messages to the queue. Then your continuous Webjob will pick up that message and process it.
Add a GUID to each message that goes to the queue. Store that GUID in some other domain of your application (a database). So when you dequeue the message for processing, the first thing you do is check against your database if the message needs to be processed. If you need to cancel the execution of a message, instead of deleting it from the queue, just update the GUID in your database.
There's more info here.
Hope this helps,
As for the first part of the question, you can use the Update Message operation to extend the visibility timeout of a message.
The Update Message operation can be used to continually extend the
invisibility of a queue message. This functionality can be useful if
you want a worker role to “lease” a queue message. For example, if a
worker role calls Get Messages and recognizes that it needs more time
to process a message, it can continually extend the message’s
invisibility until it is processed. If the worker role were to fail
during processing, eventually the message would become visible again
and another worker role could process it.
You can check the REST API documentation here: https://msdn.microsoft.com/en-us/library/azure/hh452234.aspx
For the second part of your question, there are really multiple ways and your method of storing the id/popReceipt as a lookup is a possible option, you can actually have a Web Job dedicated to receive messages on a different queue (e.g plz-delete-msg) and you send a message containing the "messageId" and this Web Job can use Get Message operation then Delete it. (you can make the job generic by passing the queue name!)
https://msdn.microsoft.com/en-us/library/azure/dd179474.aspx
https://msdn.microsoft.com/en-us/library/azure/dd179347.aspx
It seems that new azure SDK extends the visibilitytimeout to <= 7 days. I know by default, when I add a message to an azure queue, the live time is 7days. When I get message out, and set the visibilitytimeout to 7days. Does that mean I don't need to delete this message if I don't care about message reliable? the message will disappear later 7 days.
I want to take this way because DeleteMessage is very slow. If I don't delete message, doesn't it have any impact on performance of GetMessage?
Based on the documentation for Get Messages, I believe it is certainly possible to set the VisibilityTimeout period to 7 days so that messages are fetched only once. However I see some issues with this approach instead of just deleting the message once the process is done:
What happens when you get the message and start processing it and somehow the process fails? If you set the visibility timeout to be 7 days, then the message would never appear in the queue again and thus the process it was supposed to do never gets done.
Even though the message is hidden, it is still there in the queue thus you keep on incurring storage charges for that message. Even though the cost is trivial but why keep the message when you don't really need it.
A lot of systems rely on Approximate Messages Count property of a queue to check on the health of processes which are performed by messages in a queue. Please note that even though you make the message hidden, it is still there in the queue and thus will be included in total messages count in the queue. So if you're building a system which relies on this for health check, you will always find your system to be unhealthy because you're never deleting the messages.
I'm curious to know why you find deleting messages to be very slow. In my experience this is quite fast. How are you monitoring message deletion?
Rather than hacking around the problem I think you should drill into understanding why the deletes are slow. Have you enabled logs and looked at the e2elatency and serverlatency numbers across all your queue operations. Ideally you shouldn't be seeing a large difference between the two for all your queue operations. If you do see a large difference then it implies something is happening on the client that you should investigate further.
For more information on logging take a look at the following articles:
http://blogs.msdn.com/b/windowsazurestorage/archive/tags/analytics+2d00+logging+_2600_amp_3b00_+metrics/
http://msdn.microsoft.com/en-us/library/azure/hh343262.aspx
Information on client side logging can also be found in this post: e client side logging – which you can learn more about in this blog post. http://blogs.msdn.com/b/windowsazurestorage/archive/2013/09/07/announcing-storage-client-library-2-1-rtm.aspx
Please let me know what you find.
Jason
I'm aware a single workflow instance run in a single thread at a time. I've a workflow with two receive activities inside a pick activity. Message correlation is implemented to make sure the requests to both the activities should be routed to the same instance.
In the first receive branch I've a parallel activity with a delay activity in one branch. The parallel activity will complete either the delay is over or a flag is set to true.
When the parallel activity is waiting for the condition to meet how I can receive calls from the second receive activity? because the flag will be set to true only through through it's branch. I'm waiting for your suggestions or ideas.
Check out my blog The Workflow Parallel Activity and Task Parallelism This will help you understand how WF works
Not quite sure what you are trying to achieve here.
If you have a Pick with 2 branched and both branches contain a Receive it will continue after you receive either of the 2 messages the 2 Receive activities are waiting for. The other will be canceled and not receive anything. The fact that one Receive is in a Parallel will not make a difference here. So unless this is on a loop you will not receive more than one WCF message in your workflow.
I have a simple work role in azure that does some data processing on an SQL azure database.
The worker basically adds data from a 3rd party datasource to my database every 2 minutes. When I have two instances of the role, this obviously doubles up unnecessarily. I would like to have 2 instances for redundancy and the 99.95 uptime, but do not want them both processing at the same time as they will just duplicate the same job. Is there a standard pattern for this that I am missing?
I know I could set flags in the database, but am hoping there is another easier or better way to manage this.
Thanks
As Mark suggested, you can use an Azure queue to post a message. You can have the worker role instance post a followup message to the queue as the last thing it does when processing the current message. That should deal with the issue Mark brought up regarding the need for a semaphore. In your queue message, you can embed a timestamp marking when the message can be processed. When creating a new message, just add two minutes to current time.
And... in case it's not obvious: in the event the worker role instance crashes before completing processing and fails to repost a new queue message, that's fine. In this case, the current queue message will simply reappear on the queue and another instance is then free to process it.
There is not a super easy way to do this, I dont think.
You can use a semaphore as Mark has mentioned, to basically record the start and the stop of processing. Then you can have any amount of instances running, each inspecting the semaphore record and only acting out if semaphore allows it.
However, the caveat here is that what happens if one of the instances crashes in the middle of processing and never releases the semaphore? You can implement a "timeout" value after which other instances will attempt to kick-start processing if there hasnt been an unlock for X amount of time.
Alternatively, you can use a third party monitoring service like AzureWatch to watch for unresponsive instances in Azure and start a new instance if the amount of "Ready" instances is under 1. This will save you can save some money by not having to have 2 instances up and running all the time, but there is a slight lag between when an instance fails and when a new one is started.
A Semaphor as suggested would be the way to go, although I'd probably go with a simple timestamp heartbeat in blob store.
The other thought is, how necessary is it? If your loads can sustain being down for a few minutes, maybe just let the role recycle?
Small catch on David's solution. Re-posting the message to the queue would happen as the last thing on the current execution so that if the machine crashes along the way the current message would expire and re-surface on the queue. That assumes that the message was originally peeked and requires a de-queue operation to remove from the queue. The de-queue must happen before inserting the new message to the queue. If the role crashes in between these 2 operations, then there will be no tokens left in the system and will come to a halt.
The ESB dup check sounds like a feasible approach, but it does not sound like it would be deterministic either since the bus can only check for identical messages currently existing in a queue. But if one of the messages comes in right after the previous one was de-queued, there is a chance to end up with 2 processes running in parallel.
An alternative solution, if you can afford it, would be to never de-queue and just lease the message via Peek operations. You would have to ensure that the invisibility timeout never goes beyond the processing time in your worker role. As far as creating the token in the first place, the same worker role startup strategy described before combined with ASB dup check should work (since messages would never move from the queue).