How do I auto-start an Azure queue? - azure

I want to build an Azure application that has two worker roles and NO web roles. When the worker roles first start up I want ONLY ONE of the roles to do the following a single time:
Download and parse a master file, then enqueue multiple "child" tasks based on the contents of the master file
Enqueue a single master file download "child" task to run the next day
Each of the "child" tasks would then be processed by both workers until the task queue was exhausted. Think of the whole thing as "priming the pump".
This sort of thing is really easy if I add the first "master" task to a queue manually by calling a web role, but it seems really hard to do in an auto-start mode.
Any help in this regard would be greatly appreciated!
Thanks.....

One possibility: instead of calling a web role, just load the queue directly. (It sounds like this is the sort of application you'll want to automatically spin up to do some work and then shut down again... if you're automating that, it should be trivial to also automate loading the queue.)
A (perhaps) better option: Use some sort of locking mechanism to make sure only one worker instance does the initialization work. One way to do this is to try to create the queue (or a blob, or an entity in a table). If it already exists, then the other instance is handling initialization. If the create succeeds, then it's this instance's job.
Note that it's always better to use a lease than a lock, in case the instance that's doing the initialization fails. Consider using a timeout (e.g. storing a timestamp in table storage or in the metadata of the blob or in the name of the queue...).
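As a rough illustration of the lease-with-timeout idea above, here is a minimal Python sketch. It uses an in-memory object as a stand-in for the blob or table entity; `LeaseStore`, `try_acquire`, and the injected clock are hypothetical names, not part of any Azure SDK. In real storage the check-and-set would have to be atomic on the server side; the local lock here only imitates that.

```python
import threading
import time


class LeaseStore:
    """In-memory stand-in for a blob/table record that holds one lease."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable so expiry can be simulated
        self._guard = threading.Lock() # imitates an atomic server-side check
        self._owner = None
        self._expires_at = 0.0

    def try_acquire(self, owner, duration):
        """Grant the lease if it is free, expired, or already held by owner."""
        with self._guard:
            now = self._clock()
            held_by_other = self._owner not in (None, owner)
            if held_by_other and now < self._expires_at:
                return False
            self._owner = owner
            self._expires_at = now + duration
            return True

    def release(self, owner):
        """Give the lease up early; a no-op for anyone who is not the owner."""
        with self._guard:
            if self._owner == owner:
                self._owner = None
```

A worker would call `try_acquire` at startup and only do the initialization if it returns `True`. If that worker dies without releasing, the lease simply expires and another instance can take over, which is the advantage over a plain lock.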

We ended up with exactly the same sort of problem, which is why we introduced an O/C mapper (object to cloud). Basically, you want to introduce two types of cloud services:
QueueService that consumes messages whenever available.
ScheduledService that triggers operations on a scheduled basis.
Then, as others suggested, in the cloud you really should prefer leases over locks, so that your cloud app does not end up frozen forever because of a temporary hardware (or infrastructure) issue.

Related

One-time jobs on Azure workers

What is the best way to do a one-time job on Azure?
Say we want to extend a table in the associated database with a double column. All the new entries will have this value computed by the worker(s) at insertion, but somebody has to take care of the entries that are already in the table. I thought of two alternatives:
a method called by the worker only if a database entry (say, "JobRun") is set to true, and the method would flip the entry to false.
a separate app that does the job, and which is downloaded and run manually using remote desktop (I cannot connect the local app to the Azure SQL server).
The first alternative is messy (how should I deal with the code at the next deployment? delete it? comment it? leave it there? also, what if I will have another job in the future? create a new database entry "Job2Run"?). The second one looks like a cheap hack. I am sure that there is a better way I could not think of.
If you want to run a job once you'll need to take into account the following:
Concurrency: While the job is running, make sure no other worker picks up the job and runs it at the same time (you can use leases for this. More info here).
Once the job is done, you'll need to keep track (in Table Storage, SQL Azure, ...) that the job completed successfully. The next time a worker tries to pick up the job, it will look in Table Storage / SQL Azure / ..., it will see that the job completed and skip the job.
Failure: If your worker crashes during the job, another worker should be able to pick the job up without any issue.
In your specific use case I would also consider using a tool like dbup to manage updates to your schema and existing data with SQL Scripts. Tools like these keep track of which scripts have been executed by adding them in a table in the database.
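To make the "keep track of which scripts have been executed" idea concrete, here is a small Python sketch using SQLite as a stand-in for the real database. It is not dbup's implementation or API; the function name, the `SchemaVersions` journal table, and the script names in the usage note are assumptions made for illustration.

```python
import sqlite3


def run_pending_scripts(conn, scripts):
    """Run each (name, sql) pair at most once, recording names in a journal table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS SchemaVersions (ScriptName TEXT PRIMARY KEY)"
    )
    already_run = {
        row[0] for row in conn.execute("SELECT ScriptName FROM SchemaVersions")
    }
    executed = []
    for name, sql in scripts:
        if name in already_run:
            continue  # this one-time job has already been done
        with conn:  # commits on success, rolls back the open transaction on error
            conn.execute(sql)
            conn.execute(
                "INSERT INTO SchemaVersions (ScriptName) VALUES (?)", (name,)
            )
        executed.append(name)
    return executed
```

Called on every deployment with the full list, for example `[("001_add_score", "ALTER TABLE Items ADD COLUMN Score REAL"), ("002_backfill", "UPDATE Items SET Score = 0.0 WHERE Score IS NULL")]`, it only runs what is new, so the one-time code never has to be deleted or commented out, and a future job is just another entry in the list.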

Creating a Web Crawler using Windows Azure

I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder, if I use this Worker Role, how would it work if I create two instances of it? I noticed using "Compute Emulator UI" that the command "Trace.WriteLine" works on both instances at the same time, can someone clarify this point.
I created the same crawler using PHP and set a cron job to start the script once a day, but it took 6 hours to grab the whole content; that's why I want to use Azure.
This is the right way to do it: as of January 2014, Microsoft has introduced Azure WebJobs, where you can create a project (a console application, for example) and run it as a scheduled task (once or recurring).
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows Server 2008, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
One thing to consider is breaking up your web crawl into separate tasks (URLs?) and placing those individually on the queue. With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web crawling is likely to be a blocking operation, rather than a CPU- and bandwidth-intensive one).
A single worker role running once a day is probably the best approach. I would not use Thread.Sleep though, since you may want to restart the instance, and then it may, depending on your programming, start earlier or later than one day after the previous run. What about putting the task command as a message on the Azure Queue, dequeuing it once it has been picked up by a worker role, and then adding a new task command to the queue?
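The queue approach described above (a message that stays hidden for a day, and reappears if processing fails) can be sketched with an in-memory stand-in. This is only a simulation of the visibility-timeout behaviour, not the Azure Storage client; `VisibilityQueue`, `process_daily`, and the 600-second timeout are hypothetical choices for the example.

```python
import itertools

DAY = 24 * 60 * 60


class VisibilityQueue:
    """In-memory stand-in for a storage queue with visibility timeouts."""

    def __init__(self):
        self._messages = {}              # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def put(self, body, now, delay=0):
        """Add a message that becomes visible `delay` seconds after `now`."""
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, now + delay]
        return msg_id

    def get(self, now, visibility_timeout):
        """Return (id, body) of the first visible message and hide it, or None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


def process_daily(queue, now, crawl):
    """Take one message, run crawl, then schedule tomorrow's message."""
    item = queue.get(now, visibility_timeout=600)
    if item is None:
        return False
    msg_id, body = item
    crawl(body)                          # if this raises, the message reappears later
    queue.put(body, now, delay=DAY)      # next run becomes visible in a day
    queue.delete(msg_id)                 # only deleted after the work succeeded
    return True
```

Because the message is only deleted after `crawl` returns, a crash in the middle leaves it on the queue, and whichever instance polls after the timeout picks it up again.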

Scheduled Tasks with Sql Azure?

I wonder if there's a way to use scheduled tasks with SQL Azure?
Every help is appreciated.
The point is, that I want to run a simple, single line statement every day and would like to prevent setting up a worker role.
There's no SQL Agent equivalent for SQL Azure today. You'd have to call your single-line statement from a background task. However, if you have a Web Role already, you can easily spawn a thread to handle this in your web role without having to create a Worker Role. I blogged about the concept here. To spawn a thread, you can either do it in the OnStart() event handler (where the Role instance is not yet added to the load balancer), or in the Run() method (where the Role instance has been added to the load balancer). Usually it's a good idea to do setup in the OnStart().
One caveat that might not be obvious, whether you execute this call in its own worker role or in a background thread of an existing Web Role: If you scale your Role to, say, two instances, you need to ensure that the daily call only occurs from one of the instances (otherwise you could end up with either duplicates, or a possibly-costly operation being performed multiple times). There are a few techniques you can use to avoid this, such as a table row-lock or an Azure Storage blob lease. With the former, you can use that row to store the timestamp of the last time the operation was executed. If you acquire the lock, you can check to see if the operation occurred within a set time window (maybe an hour?) to decide whether one of the other instances already executed it. If you fail to acquire the lock, you can assume another instance has the lock and is executing the command. There are other techniques - this is just one idea.
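A minimal sketch of the row-lock-plus-timestamp technique, in Python with an in-memory object standing in for the locked table row. `DailyGate` and `try_run` are hypothetical names, and the non-blocking `threading.Lock` only imitates what a real row lock or blob lease would do across instances.

```python
import threading


class DailyGate:
    """Stand-in for a locked table row that stores the last-run timestamp."""

    def __init__(self, window_seconds):
        self._row_lock = threading.Lock()
        self._last_run = None
        self._window = window_seconds

    def try_run(self, now, operation):
        """Run operation unless the row is locked or it ran within the window."""
        if not self._row_lock.acquire(blocking=False):
            return False                 # another instance holds the lock
        try:
            recently_run = (
                self._last_run is not None
                and now - self._last_run < self._window
            )
            if recently_run:
                return False             # another instance already executed it
            operation()
            self._last_run = now         # recorded only after a successful run
            return True
        finally:
            self._row_lock.release()
```

Each instance can call `try_run` on its own timer; only the first caller in a given window actually executes the statement, and the rest return `False` without doing anything.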
In addition to David's answer, if you have a lot of scheduled tasks to do then it might be worth looking at:
lokad.cloud - which has good handling of periodic tasks - http://lokadcloud.codeplex.com/
quartz.net - which is a good all-round scheduling solution - http://quartznet.sourceforge.net/
(You could use quartz.net within the thread that David mentioned, but lokad.cloud would require a slightly bigger architectural change)
I hope it is acceptable to talk about your own company. We have a web based service that allows you to do this. You can click this link to see more details on how to schedule execution of SQL Azure queries.
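To overcome the issue of multiple role instances executing the same task, you can check the role instance ID and make sure that only the first instance executes the task.
using Microsoft.WindowsAzure.ServiceRuntime;

// Inside the method that starts the scheduled task:
string instanceId = RoleEnvironment.CurrentRoleInstance.Id;
// Assumes IDs end in an index such as "_IN_0"; a bare "0" would also match "_IN_10".
if (!instanceId.EndsWith("_0"))
{
    return; // not the first instance, so skip the task
}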

Microsoft Azure Master-Slave worker roles

I am trying to port an application to the Azure platform. I want to run an existing application multiple times. My initial idea is as follows: I have a master_process and many slave_processes. Each process is a worker role in Azure. Each slave_process will run an instance of the application independently. I want master_process to start many slave_processes and provide them with the input arguments. At the end, master_process will collect the results. Currently, I have a working setup for calling the whole application from a C# wrapper. So, to succeed, I need two things: first, a way to start slave workers from inside a master worker (just like threads); second, a way to store the results of the slave workers and reach those result files from the master worker. Can anyone help me?
I think I would try and solve the problem differently. Deploying a whole new instance can take 15 to 30 minutes. Adding extra instances to an already running worker role is a little quicker, but not by much. I'm going to presume that you want results faster than that and that this process is something that is run frequently.
I would have just one worker role type that runs your existing logic and as many instances of that worker role that you determine you'll need. Whatever your client is will decide that it needs to break the work up in a certain number of pieces, let's say 10 for the sake of argument. It will give each piece of work an ID (e.g. a guid) and then put 10 messages that contain the parameters and the ID into a queue. Your worker role instances take messages out of the queue, do their work and write their results to storage somewhere (either SQL Azure, Azure Table Storage or maybe even blob storage depending on what the results are). The client polls that storage to wait for all of the results to be complete and then carries on.
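The flow described above (the client splits the work, tags each piece with an ID, queues it, and then waits for all results) looks roughly like this in Python. Threads and an in-process queue stand in for worker role instances and an Azure queue, and a dictionary stands in for result storage; `run_batch` and its parameters are hypothetical names.

```python
import queue
import threading
import uuid


def run_batch(pieces, work, worker_count=3):
    """Queue one message per piece, let workers store results, then collect them."""
    tasks = queue.Queue()
    results = {}                         # stands in for table/blob storage
    results_lock = threading.Lock()
    ids = []
    for params in pieces:
        task_id = str(uuid.uuid4())      # the ID that ties a result to its piece
        ids.append(task_id)
        tasks.put((task_id, params))

    def worker():
        while True:
            try:
                task_id, params = tasks.get_nowait()
            except queue.Empty:
                return                   # queue exhausted, worker is done
            outcome = work(params)
            with results_lock:
                results[task_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # a real client would poll storage instead
    return [results[task_id] for task_id in ids]
```

For example, `run_batch([1, 2, 3, 4], lambda x: x * x)` returns the results in the order the pieces were submitted, regardless of which worker handled which piece.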
If this is a process that is only run infrequently, then rather than having the worker roles deployed all of the time, you could use the same method I've described, but in addition get the client code to deploy the worker roles when it starts and then delete them when it's finished through the management API. There are samples on MSDN on how to use this.
I have a similar situation you might find useful:
I have a large sequential batch process I run in Azure which requires pre and post processing. The technique I used was to use instances of a single multifunctional worker role, but to use a "quorum" to nominate a head node, which then controls the workflow.
The way I do it is by using an Azure page blob as the quorum (basically a kind of global mutex/lock), because once a node grabs it for writing, it's locked. For resilience, in case there's an issue with the head node, all nodes occasionally try to recapture the quorum.

How to implement critical section in Azure

How do I implement critical section across multiple instances in Azure?
We are implementing a payment system on Azure.
Whenever an account balance is updated in SQL Azure, we need to make sure that the value is 100% correct.
But we have multiple web roles running, so they could service two requests from different customers concurrently, both of which would update the current balance for one single product. Both instances may then read the old amount from the database at the same time, both add the purchase to the old value, and both store the new amount in the database. Whoever saves first will have its change overwritten. :-(
Thus we need to implement a critical section around all updates to the account balance in the database. But how do we do that in Azure? Guides suggest using Azure storage queues for inter-process communication. :-)
They ensure that the message does not get deleted from the queue until it has been processed.
Even if a process crashes, we are sure that the message will be processed by the next process (as Azure guarantees to launch a new process if something hangs).
I thought about running a singleton worker role to service requests on the queue. But Azure does not guarantee good uptime unless you run at least two instances in parallel. Also, when I deploy new versions to Azure, I would have to stop the running instance before I can start a new one. Our application cannot accept the "critical section worker role" failing to process messages on the queue within 2 seconds.
Thus we would need multiple worker roles to guarantee sufficiently small downtime. In which case we are back to the same problem of implementing critical sections across multiple instances in Azure.
Note: If the update transaction has not completed within 2 seconds, we should roll it back and start over.
Any idea how to implement critical section across instances in Azure would be deeply appreciated.
Doing synchronisation across instances is a complicated task and it's best to try and think around the problem so you don't have to do it.
In this specific case, if it is as critical as it sounds, I would just leave this up to SQL Server (it's pretty good at dealing with data contention). Rather than have the instances say "the new total value is X", call a stored procedure in SQL where you simply pass in the value of this transaction and the account you want to update. Something basic like this:
UPDATE Account
SET
    AccountBalance = AccountBalance + @TransactionValue
WHERE
    AccountId = @AccountId
If you need to update more than just one table, do it all in the same stored procedure and wrap it in a SQL transaction. I know it doesn't use any sexy technologies or frameworks, but it's much less complicated than any alternative I can think of.
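For what it's worth, here is a small Python sketch of the same idea, with SQLite standing in for SQL Azure and a parameterized statement standing in for the stored procedure. The function name is hypothetical; the table and column names follow the snippet above, and the balance is assumed to be stored as an integer (for example, cents).

```python
import sqlite3


def apply_transaction(conn, account_id, amount):
    """Add amount to the balance in a single UPDATE; return the rows affected."""
    with conn:  # commits on success, rolls back if the statement raises
        cursor = conn.execute(
            "UPDATE Account SET AccountBalance = AccountBalance + ? "
            "WHERE AccountId = ?",
            (amount, account_id),
        )
    return cursor.rowcount
```

The point is that the application never reads the old balance and writes back a new total. The database applies the increment itself, so two concurrent callers cannot overwrite each other's change the way a read-then-write sequence can.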
