I am writing a small Azure worker role that removes old files from my Azure Storage account.
I am planning to run this code once a month. The task takes less than 10 minutes to execute.
My plan is to run the worker role and, when it finishes, stop it (that is, quit). I then want to schedule another task that starts the worker role on the first day of every month.
Solution 1: While reading this article, I found the Quartz library unsuitable, because the worker role would keep running for the whole month (and I would keep paying).
Solution 2: I saw that it is possible to use Azure Queues to start the first instance of my application when a message arrives in the queue. But that is too much to handle for such a simple task. I am looking for an easier solution.
Any better solution? Maybe Azure-Worker-Role is not suitable for this task?
A Worker Role may not be the best choice for this task. You have two alternatives that might be better:
Use an Azure WebJob instead of a Worker Role. WebJobs support scheduling.
http://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
Use the Azure Scheduler.
http://azure.microsoft.com/en-us/services/scheduler/
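Whichever scheduler ends up triggering it, the job body itself reduces to "delete everything older than a cutoff". Here is a minimal sketch of that selection step in Python. It assumes a hypothetical in-memory listing of (name, last_modified) pairs in place of a real storage client, and the 30-day retention period is also an assumption, not something stated in the question:

```python
from datetime import datetime, timedelta, timezone

def select_expired(blobs, now, max_age=timedelta(days=30)):
    """Return the names of blobs last modified more than max_age before now.

    blobs is an iterable of (name, last_modified) pairs; in a real job
    these would come from listing the storage container.
    """
    cutoff = now - max_age
    return [name for name, last_modified in blobs if last_modified < cutoff]

# Hypothetical listing, as seen by a run on the first day of the month.
now = datetime(2015, 3, 1, tzinfo=timezone.utc)
listing = [
    ("logs/january.txt", datetime(2015, 1, 10, tzinfo=timezone.utc)),
    ("logs/february.txt", datetime(2015, 2, 20, tzinfo=timezone.utc)),
]
expired = select_expired(listing, now)
print(expired)  # only the January file is past the cutoff
```

Keeping the selection separate from the actual delete call makes the job easy to dry-run before pointing it at a real storage account.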
I need an Azure VM (Ubuntu) to run a task (a Java application) every 10 minutes. Because the task usually lasts less than a minute, I would save money if I could start the machine every 10 minutes and stop it when the task completes. I learned that I can schedule start and stop times in an Automation account, but it would be better to stop the VM at the very moment the task completes. Is there a simple way to do that?
This really sounds like a job for Azure Batch. If you are looking for an IaaS solution, Azure Batch will do the job for you. Have a look at it: https://azure.microsoft.com/en-gb/services/batch/#overview.
It allows you to use VMs with your preferred OS (in Azure Batch a VM is called a node) and run a set of tasks. Once finished, the VM will be de-allocated.
A pool is a set of nodes; a job is assigned to a pool, and each job contains tasks. A task can be, for example, a command line that runs a specific application. So you could run example.exe 1 2 on Windows, or the equivalent command line on Ubuntu.
The power here is that tasks are allocated to a VM when you add them to the job, the VM is disposed of once they finish, and you only pay for the compute time.
The disadvantage is that the VM is stateless, so anything you need installed or stored has to be handled by other means. Azure Batch lets you install a program (for example, your Java application) each time a node starts. If your program reads or produces files, you will also need blob storage: keep the input files there and have the program write its output back to it.
Finally, the scheduler. This really depends on how you want to handle it. If you have a local server, or a server on Azure, that already runs 24/7, you can add a scheduled job there that runs a program to add the task to Azure Batch. Or, if you don't mind using Azure Functions, you can add a timer-triggered Azure Function that adds a task to the job. There are multiple ways of dealing with this, and you may already have an existing solution.
Hope you find this useful!
I am developing an application using an Azure Cloud Service and Web API. I would like to let users who create a consultation session change the price of that session, but I also want to give all users 30 days to leave the session before the new price applies to the members currently signed up for it. My first thought is to use queue storage and set the visibility timeout to the 30-day limit, but it seems the queue could grow really fast over time, especially since each message must stay hidden for 30 days; not to mention the ordering issues. I am looking at the task scheduler as well, but session pricing changes are not recurring; they happen at random. Is the queue idea a good approach, or is there a better and more efficient way to accomplish this?
The stuff you are trying to do should be done with a relational database. You can use timestamps to record when prices for session changed. I wouldn't use a queue at all for this. A queue is more for passing messages in a distributed system. Your problem is just about tracking what prices changed on what sessions and when. That data should be modeled in a database.
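To make the timestamp idea concrete, here is a rough sketch using Python's built-in sqlite3 module. The table name, column names, and the two helper functions are all made up for illustration, and a real application would use its own database and schema. The point is that each price row carries the date it takes effect, so the current price is simply looked up at read time and nothing has to "fire" after 30 days:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE session_price ("
    " session_id INTEGER NOT NULL,"
    " price REAL NOT NULL,"
    " effective_at TEXT NOT NULL)"  # ISO-8601 timestamp, sorts chronologically
)

def schedule_price(session_id, price, announced_at, notice=timedelta(days=30)):
    """Record a price that takes effect once the notice period has passed."""
    effective_at = announced_at + notice
    conn.execute(
        "INSERT INTO session_price VALUES (?, ?, ?)",
        (session_id, price, effective_at.isoformat()),
    )

def price_at(session_id, when):
    """Return the latest price whose effective date is not after `when`."""
    row = conn.execute(
        "SELECT price FROM session_price"
        " WHERE session_id = ? AND effective_at <= ?"
        " ORDER BY effective_at DESC LIMIT 1",
        (session_id, when.isoformat()),
    ).fetchone()
    return row[0] if row else None

# Initial price applies immediately; the change is announced on 1 March.
schedule_price(1, 50.0, datetime(2024, 1, 1), notice=timedelta(0))
schedule_price(1, 65.0, datetime(2024, 3, 1))  # effective 31 March

print(price_at(1, datetime(2024, 3, 15)))  # 50.0, still inside the notice period
print(price_at(1, datetime(2024, 4, 1)))   # 65.0, new price now in effect
```

Members who leave during the notice period never see the new price, and the full price history stays available for auditing.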
I think Azure Scheduler suits this scenario better. Programmatically create a one-time job set to run 30 days later. When the scheduler triggers the job, have its action call back to one of your APIs or services to apply the price change and any other required updates, and have that action also remove the job from the scheduler to keep the job list clean. The premium plan of an Azure Scheduler job collection gives you an unlimited number of jobs.
Hope this is exactly what you were looking for...
I would consider using Azure WebJobs. A WebJob basically gives you the ability to run a .NET console application within the context of an Azure Web App. It can run on demand, continuously, or on a recurring schedule. If your processing requirements are low enough, WebJobs can also run alongside your Web App on the same resources, which saves money because they cost nothing extra that way.
You could schedule the WebJob to run once or twice per day, examine the situation, and react as appropriate. Since it's really just a .NET console application, you have ultimate flexibility.
We would like to make our customers able to schedule recurring tasks on a daily, weekly and monthly basis. Linear scalability is really important to us, that is why we use Windows Azure Table Storage instead of SQL Azure. The current design is the following:
- Scheduling information is stored in a Table Storage table. For example: Task A, daily; Task B, weekly; ...
- There are worker processes that run hourly and query this table. They then decide whether or not to run a given task.
But what if multiple worker roles start to run the same task?
Some other requirements:
- The worker processes can be in different time zones.
Windows Azure Queue Storage could solve all the concurrency problems mentioned above, but it also introduces some new issues:
- How many queue items should we generate?
- What if the customer changes the recurrence rate or revokes the scheduling?
So, my question is: how to design a recurring task scheduler with multiple asynchronous workers using Windows Azure Storage?
Perhaps the new Azure Scheduler service could help?
http://www.windowsazure.com/en-us/services/scheduler/
Some thoughts:
But what if multiple worker roles start to run the same task?
This could very well happen. To avoid this, what you could do is have a worker role instance (any worker role instance from the pool) read from table and push messages in a queue. While this instance is doing this work, all other instances wait. To decide which instance does this work, you can make use of blob lease functionality.
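The lease idea can be sketched roughly as follows. This is only an in-memory simulation in Python and not the storage API: the Lease class, its method, and the worker names are all hypothetical, and the real service is what guarantees that acquisition is atomic across instances. It just shows the behaviour you rely on: one holder at a time, and the lease frees itself if the holder dies without releasing it.

```python
class Lease:
    """In-memory stand-in for a blob lease: one holder at a time, with expiry."""

    def __init__(self, duration_seconds=60):
        self.duration = duration_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, instance_id, now):
        """Grant the lease if it is free, expired, or already held by the caller."""
        if self.holder in (None, instance_id) or now >= self.expires_at:
            self.holder = instance_id
            self.expires_at = now + self.duration
            return True
        return False

lease = Lease(duration_seconds=60)

# All three instances wake up at the same time; only the first one wins.
results = [lease.try_acquire(name, now=0.0)
           for name in ("worker_0", "worker_1", "worker_2")]
print(results)  # [True, False, False]

# If worker_0 crashes and never renews, another instance can take over later.
took_over = lease.try_acquire("worker_1", now=61.0)
print(took_over)  # True
```

The instance that gets the lease reads the table and pushes the queue messages; the others simply skip that step for this round.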
Some other requirements: The worker processes can be in different time zones.
Not sure about this. Assuming you're talking about Cloud Services Worker Roles, they could be in different data centers but all of them will be in UTC time zone.
How many queue items should we generate?
It really depends on how much work needs to be done. You could put all messages in a queue. Only a maximum of 32 messages can be dequeued from a queue by a client at a time. So if you have say 100 tasks and thus 100 messages, each instance can only read up to 32 messages from the queue in a single call to queue service.
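As a quick illustration of the arithmetic, here is a small Python sketch of draining 100 messages in batches of at most 32. The list of message strings is a stand-in for the real queue, and the helper name is made up:

```python
MAX_BATCH = 32  # per-call dequeue limit mentioned above

def drain_in_batches(messages, batch_size=MAX_BATCH):
    """Yield successive batches of at most batch_size messages."""
    for start in range(0, len(messages), batch_size):
        yield messages[start:start + batch_size]

messages = [f"task-{n}" for n in range(100)]
batch_sizes = [len(batch) for batch in drain_in_batches(messages)]
print(batch_sizes)  # [32, 32, 32, 4]
```

So a single instance needs four calls to pick up 100 messages, or several instances can each take a batch in parallel.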
What if the customer changes the recurrence rate or revokes the scheduling?
That should be OK, because once the task is completed you must remove the message from the queue. The next time the task is invoked, you read from the table again, which gives you the latest information about the task.
I would continue using Azure Table Storage, but mark the task as "in progress" before a worker starts working on it. Since ATS supports optimistic concurrency controlled by ETags, you can be sure that two workers won't be able to start the same task.
I would, however, think about retry logic for jobs that fail unexpectedly, and have a process that restarts jobs that appear to have been orphaned.
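A rough sketch of both ideas together, in Python. The TaskTable class below is an in-memory stand-in and not the Table Storage API; its method names, the row fields, and the one-hour staleness threshold are all assumptions made for the example. A write only succeeds if the ETag the worker read is still current, and a claim that has sat "in progress" for too long is treated as orphaned and can be taken over.

```python
import itertools

class TaskTable:
    """In-memory stand-in for a table whose rows carry an ETag."""

    def __init__(self):
        self._rows = {}
        self._etags = itertools.count(1)

    def insert(self, key, row):
        self._rows[key] = (dict(row), next(self._etags))

    def read(self, key):
        row, etag = self._rows[key]
        return dict(row), etag

    def replace_if_match(self, key, row, etag):
        """Write only if the caller's ETag is still the current one."""
        _, current = self._rows[key]
        if etag != current:
            return False  # someone else updated the row first
        self._rows[key] = (dict(row), next(self._etags))
        return True

STALE_AFTER = 3600  # assumed: seconds before an "in progress" claim counts as orphaned

def try_claim(table, key, worker, now):
    """Claim a pending (or orphaned) task; return True if this worker won."""
    row, etag = table.read(key)
    stale = row["status"] == "in progress" and now - row["claimed_at"] > STALE_AFTER
    if row["status"] != "pending" and not stale:
        return False
    row.update(status="in progress", owner=worker, claimed_at=now)
    return table.replace_if_match(key, row, etag)

table = TaskTable()
table.insert("task-a", {"status": "pending", "owner": None, "claimed_at": 0})

# Two workers read the same version of the row before either one writes.
row1, etag1 = table.read("task-a")
row2, etag2 = table.read("task-a")
row1.update(status="in progress", owner="w1", claimed_at=10)
row2.update(status="in progress", owner="w2", claimed_at=10)
first = table.replace_if_match("task-a", row1, etag1)   # succeeds
second = table.replace_if_match("task-a", row2, etag2)  # rejected, ETag changed

too_soon = try_claim(table, "task-a", "w3", now=20)      # w1's claim is still fresh
recovered = try_claim(table, "task-a", "w3", now=3700)   # w1 looks orphaned by now
print(first, second, too_soon, recovered)  # True False False True
```

The staleness threshold needs to be comfortably longer than the longest legitimate run, otherwise a slow but healthy worker gets its task stolen.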
I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder how this would work if I create two instances of the Worker Role. Using the "Compute Emulator UI", I noticed that the command "Trace.WriteLine" runs on both instances at the same time. Can someone clarify this point?
I created the same crawler using PHP and set a cron job to start the script once a day, but it took 6 hours to grab the whole content; that's why I want to use Azure.
The right way to do this is with Azure WebJobs, which Microsoft introduced in January 2014: you create a project (a console application, for example) and run it as a scheduled task (one-time or recurring).
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows Server 2008, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
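The queue option can be sketched like this. It is an in-memory simulation in Python, not the Azure Queue API: the VisibilityQueue class, its methods, and the 30-second timeout are assumptions made for the example. It shows the two properties the answer relies on: a message that is picked up but never deleted reappears after its timeout, and a message can be enqueued so that it only becomes visible a day later.

```python
class VisibilityQueue:
    """In-memory stand-in for a queue with per-message visibility timeouts."""

    def __init__(self):
        self._messages = []  # each entry is [visible_at, body]

    def put(self, body, now, delay=0):
        """Add a message that becomes visible `delay` seconds from now."""
        self._messages.append([now + delay, body])

    def get(self, now, timeout=30):
        """Return the first visible message and hide it for `timeout` seconds."""
        for entry in self._messages:
            if entry[0] <= now:
                entry[0] = now + timeout
                return entry
        return None

    def delete(self, entry):
        """Remove a message once its work has completed."""
        self._messages.remove(entry)

DAY = 24 * 60 * 60
queue = VisibilityQueue()
queue.put("crawl", now=0)

msg = queue.get(now=0)       # instance A picks the message up...
hidden = queue.get(now=10)   # ...so instance B sees nothing
retry = queue.get(now=31)    # A crashed without deleting: the message is back
queue.delete(retry)          # this time the work finished
queue.put("crawl", now=31, delay=DAY)  # schedule tomorrow's run

early = queue.get(now=31 + DAY - 1)    # not visible yet
due = queue.get(now=31 + DAY)          # visible again a day later
print(hidden, retry[1], early, due[1])  # None crawl None crawl
```

Because the message is only deleted after the work succeeds, a reboot in the middle of a crawl costs you a retry rather than a lost run.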
One thing to consider is breaking up your web crawl into separate tasks (one per URL, perhaps) and placing those individually on the queue. With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web crawling is likely to be a blocking operation, rather than a CPU- and bandwidth-intensive one).
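For the multiple-threads variant, a minimal Python sketch might look like the following. The fetch function is a placeholder that returns fake content instead of doing a real download, and the URLs and worker count are made up; in practice each worker would pull its URLs from the queue rather than from a list.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Placeholder for the real download; returns fake page content."""
    return f"<html>content of {url}</html>"

def crawl(urls, workers=4):
    """Fetch every URL on a thread pool and return {url: content}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fetch, urls)))

urls = [f"http://example.com/page/{n}" for n in range(10)]
pages = crawl(urls)
print(len(pages))  # 10
```

Since each fetch spends most of its time waiting on the network, threads overlap that waiting, which is where the speed-up over a sequential crawl would come from.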
A single worker role running once a day is probably the best approach. I would not use Thread.Sleep though: if the instance restarts, then depending on your code the next run may start earlier or later than one day after the previous one. What about putting the task command as a message on an Azure Queue, dequeuing it once a worker role has picked it up, and then adding a new task command to the queue once the work is done?
I wonder if there's a way to use scheduled tasks with SQL Azure?
Any help is appreciated.
The point is, that I want to run a simple, single line statement every day and would like to prevent setting up a worker role.
There's no SQL Agent equivalent for SQL Azure today. You'd have to call your single-line statement from a background task. However, if you have a Web Role already, you can easily spawn a thread to handle this in your web role without having to create a Worker Role. I blogged about the concept here. To spawn a thread, you can either do it in the OnStart() event handler (where the Role instance is not yet added to the load balancer), or in the Run() method (where the Role instance has been added to the load balancer). Usually it's a good idea to do setup in the OnStart().
One caveat that might not be obvious, whether you execute this call in its own worker role or in a background thread of an existing Web Role: If you scale your Role to, say, two instances, you need to ensure that the daily call only occurs from one of the instances (otherwise you could end up with either duplicates, or a possibly-costly operation being performed multiple times). There are a few techniques you can use to avoid this, such as a table row-lock or an Azure Storage blob lease. With the former, you can use that row to store the timestamp of the last time the operation was executed. If you acquire the lock, you can check to see if the operation occurred within a set time window (maybe an hour?) to decide whether one of the other instances already executed it. If you fail to acquire the lock, you can assume another instance has the lock and is executing the command. There are other techniques - this is just one idea.
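The timestamp check can be sketched as below, in Python. The state dictionary stands in for the locked table row, and the function names and instance names are hypothetical; the sketch assumes the caller already holds the lock when it runs the check, which is the part the storage service has to provide.

```python
from datetime import datetime, timedelta

def should_run(last_run, now, window=timedelta(hours=1)):
    """Return True unless another instance already ran within the window."""
    return last_run is None or now - last_run > window

state = {"last_run": None}  # stands in for the locked table row

def run_daily(instance, now):
    """Run the daily statement once, no matter how many instances call this."""
    if not should_run(state["last_run"], now):
        return f"{instance}: skipped"
    state["last_run"] = now  # record the run before releasing the lock
    return f"{instance}: executed"

t = datetime(2024, 1, 1, 2, 0)
first = run_daily("instance_0", t)
second = run_daily("instance_1", t + timedelta(minutes=5))
print(first)   # instance_0: executed
print(second)  # instance_1: skipped
```

The window only has to cover the spread between the instances' timers, so an hour is generous for a daily job.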
In addition to David's answer, if you have a lot of scheduled tasks to do then it might be worth looking at:
lokad.cloud - which has good handling of periodic tasks - http://lokadcloud.codeplex.com/
quartz.net - which is a good all-round scheduling solution - http://quartznet.sourceforge.net/
(You could use quartz.net within the thread that David mentioned, but lokad.cloud would require a slightly bigger architectural change)
I hope it is acceptable to talk about your own company. We have a web based service that allows you to do this. You can click this link to see more details on how to schedule execution of SQL Azure queries.
To overcome the issue of multiple role instances executing the same task, you can check the role instance ID and make sure that only the first instance executes the task.
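using Microsoft.WindowsAzure.ServiceRuntime;

// Instance IDs typically end in an index such as "_IN_0". Matching "_0"
// rather than "0" avoids also matching instances 10, 20, and so on.
string instanceId = RoleEnvironment.CurrentRoleInstance.Id;
if (!instanceId.EndsWith("_0"))
{
    // Not the first instance: skip the scheduled task.
    return;
}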