Long running (or forever) task on Windows Azure - azure

I need to write some data to database every 50 seconds or so. It's similar to a Windows service that's running on background and silently doing its job. Starting and stopping is not an option in my case as I need a small amount of previously inserted data to be stored in memory. What's the best solution for this when using Windows Azure or AWS?
Thank you.

With Windows Azure, you can choose either a Web or Worker role (both basically Windows 2008 Server R2 or SP2) and have some type of timed event, as #Lucifure suggested. You could also run a scheduler, like Quartz.net, or take advantage of windows Azure queues or service bus queues to have messages show up at a certain time. However: You cannot have a "forever" task in a given role instance, in that periodically your VM instances will be rebooted (e.g. for host OS maintenance every month). With role shutdowns, you'll get notice, which you can handle these shutdown notices in Stopping() or OnStop(). If you have multiple instances, you can use a scheduler or queue to ensure your events still trigger every 50 seconds or so, and get handled across multiple instances (but only by one instance at any given time).
To preserve your in-memory information, one idea is to store that information in a cache. You have 2 choices:
Distributed (shared) cache service, which has been around for some time now. It runs independently of your role instances.
In-memory cache, just introduced in June 2012. Assuming you have more than one instance, the cache is spread across those instances. You can even run the cache inside of memory of your existing roles.
More information on caching is here.
There are a few StackOverflow answers regarding Quartz.net and Windows Azure, such as this one.

On Windows Azure, you can use a Worker Role, which can do this. It can be simple as a while loop.
Try this article for an introduction.
http://www.c-sharpcorner.com/uploadfile/40e97e/windows-azu-creating-and-deploying-worker-role/

You could setup a System.Threading.Timer to fire every 50 seconds or so, and do your work whenever the event occurs.

Related

Azure Worker Role data volatility

I would like to create an application that holds large amount of volatile data in memory. Only small part of this data needs to be persisted when host machine shuts down, or in case of maintenance. Outages should be rare, this in memory data needs to be accessible for most of the time, but rare restrats of service is bearable.
If I have been developing for a server, I would create a WindowsService, which runs reliably while the machine is up, and I would persist a fraction of the data in the OnStop() method.
I'm thinking of moving this whole thing to the cloud. The question is that if a Worker Role is similiar to a Windows Service from this point of view? Does it run most of the time with rare outages, or is it recycled / restarted from time to time or when it is idle?
Like Windows Service, Worker role is meant for processing background tasks. However one thing you would need to keep in mind that your worker role can go down any time. It may be because of hardware failure or software updates. Thus you can't always assume this to be highly available. That's why Windows Azure recommends deploying multiple instances of your application.
What you could do is have multiple instances of your worker role running and all of them sharing a common cache where you would put volatile data. Do take a look at Windows Azure Caching (http://msdn.microsoft.com/en-us/library/windowsazure/gg278356.aspx) where you could either dedicate some memory of a VM (i.e. an instance) for caching purpose or have a full VM dedicated for caching. That way you'll have your volatile data somewhere outside of your worker roles and thus making it available to all instances.

Creating a Web Crawler using Windows Azure

I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder, if I use this Worker Role, how would it work if I create two instances of it? I noticed using "Compute Emulator UI" that the command "Trace.WriteLine" works on both instances at the same time, can someone clarify this point.
I created the same crawler using php and set the cron job to start the script once a day, but it took 6 hours to grab the whole content, thats why I want to use Azure.
This is the right way to do it, as of Jan 2014 Microsoft introduced Azure WebJobs, where you can create a project (console for example), and run it as a scheduled task (occurrence once, recurrence)
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows 2008 Server, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
One thing to consider is breaking up your web-crawl into separate tasks (url's?) and place those individually on the queue? With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web-crawling is likely to be a blocking operation, rather than a cpu- and bandwidth-intensive one).
A single worker role running once a day is probably the best approach. I would not use thread sleep though, since you may want to restart the instance and then it may, depening on your programming, start before one day or later than one day. What about putting the task command as a message on the Azure Queue and dequeuing it once it has been picked up by a worker role, then adding a new task command on the Azure Queue once.

Azure Compute : OS Patching, Updates and Downtime

We know that Azure Compute is PaaS so the Operating System (Windows Server 2008 R2) has to be patched and upgraded automatically.
I just wanted to know will there be any downtime during the patching or Compute upgradation...?
If you only have a single instance of a particular VM role, then yes - you'll have a short bit of downtime, as you need to be rebooted. Likewise, if the host OS is patched, you'll have a bit of downtime.
If you run two or more instances, then the SLA kicks in, because your instances are separated into different containers/network branches/etc. These are fault domains. So even if a network segment, router, or entire rack were to go offline, you'd have another instance somewhere else.
During OS updates, your instances are divided into upgrade domains, so that they're not all upgraded at once. This leaves your service in an always-available state, as long as you have two or more instances of your roles. For background processes that aren't customer-facing (say, in a worker role that simply reads from queues and processes queue items asynchronously), you can probably get away with a single instance of that role, provided you can handle the work load and that it would be ok to have occasional processing delays.
See this recent TechNet blog post for more details around fault domains and upgrade domains.

Microsoft Azure Master-Slave worker roles

I am trying to port an application to azure platform. I want to run an existing application multiple times. My initial idea is as follows: I have a master_process. I have many slave_processes. Each process is a worker role in Azure. Each slave_process will run an instance of the application independently. I want master_process to start many slave_processes and provide them the input arguments. At the end, master_process will collect the results. Currently, I have a working setup for calling the whole application from a C# wrapper. So, for the success, I need two things: First, I have to find a way to start slave workers inside of a master worker (just like threads). Second, I need to find a way to store results of the slave workers and reach these result files from master worker. Can anyone help me?
I think I would try and solve the problem differently. Deploying a whole new instance can take 15 to 30 minutes. Adding extra instances to an already running worker role is a little quicker, but not by much. I'm going to presume that you want results faster than that and that this process is something that is run frequently.
I would have just one worker role type that runs your existing logic and as many instances of that worker role that you determine you'll need. Whatever your client is will decide that it needs to break the work up in a certain number of pieces, let's say 10 for the sake of argument. It will give each piece of work an ID (e.g. a guid) and then put 10 messages that contain the parameters and the ID into a queue. Your worker role instances take messages out of the queue, do their work and write their results to storage somewhere (either SQL Azure, Azure Table Storage or maybe even blob storage depending on what the results are). The client polls that storage to wait for all of the results to be complete and then carries on.
If this is a process that is only run infrequently, then rather than having the worker roles deployed all of the time, you could use the same method I've described, but in addition get the client code to deploy the worker roles when it starts and then delete them when it's finished through the management API. There are samples on MSDN on how to use this.
I have a similar situation you might find useful:
I have a large sequential batch process I run in Azure which requires pre and post processing. The technique I used was to use instances of a single multifunctional worker role, but to use a "quorum" to nominate a head node, which then controls the workflow.
They way I do it is using an azure page blob as the quorum (basically a kind of global mutex/lock), because once a node grabs it for writing it's locked. For resilience, in case there's an issue with the head node, all nodes occasionally try to recapture the quorum.

WF4 Affinity on Windows Azure and other NLB environments

I'm using Windows Azure and WF4 and my workflow service is hosted in a web-role (with N instances). My job now is find out how
to do an affinity, in a way that I can send messages to the right workflow instance. To explain this scenario, my workflow (attached) starts with a "StartWorkflow" receive activity, creates 3 "Person" and, in a parallel-for-each, waits for the confirmation of these 3 people ("ConfirmCreation" Receive Activity).
I then started to search how the affinity is made in others NLB environments (mainly looked for informations about how this works on Windows Server AppFabric), but I didn't find a precise answer. So how is it done in others NLB environments?
My next task is find out how I could implement a system to handle this affinity on Windows Azure and how much would this solution cost (in price, time and amount of work) to see if its viable or if it's better to work with only one web-role instance while we wait for the WF4 host for the Azure AppFabric. The only way I found was to persist the workflow instance. Is there other ways of doing this?
My third, but not last, task is to find out how WF4 handles multiple messages received at the same time. In my scenario, this means how it would handle if the 3 people confirmed at the same time and the confirmation messages are also received at the same time. Since the most logical answer for this problem seems to be to use a queue, I started looking for information about queues on WF4 and found people speaking about MSQM. But what is the native WF4 messages handler system? Is this handler really a queue or is it another system? How is this concurrency handled?
You shouldn't need any affinity. In fact that's kinda the whole point of durable Workflows. Whilst your workflow is waiting for this confirmation it should be persisted and unloaded from any one server.
As far as persistence goes for Windows Azure you would either need to hack the standard SQL persistence scripts so that they work on SQL Azure or write your own InstanceStore implementation that sits on top of Azure Storage. We have done the latter for a workflow we're running in Azure, but I'm unable to share the code. On a scale of 1 to 10 for effort, I'd rank it around an 8.
As far as multiple messages, what will happen is the messages will be received and delivered to the workflow instance one message at a time. Now, it's possible that every one of those messages goes to the same server or maybe each one goes to a diff. server. No matter how it happens, the workflow runtime will attempt to load the workflow from the instance store, see that it is currently locked and block/retry until the workflow becomes available to process the next message. So you don't have to worry about concurrent access to the same workflow instance as long as you configure everything correctly and the InstanceStore implementation is doing its job.
Here's a few other suggestions:
Make sure you use the PersistBeforeSend option on your SendReply actvities
Configure the following workflow service options
<workflowIdle timeToUnload="00:00:00" />
<sqlWorkflowInstanceStore ... instanceLockedExceptionAction="AggressiveRetry" />
Using the out of the box SQL instance store with SQL Azure is a bit of a problem at the moment with the Azure 1.3 SDK as each deployment, even if you made 0 code changes, results in a new service deployment meaning that already persisted workflows can't continue. That is a bug that will be solved but a PITA for now.
As Drew said your workflow instance should just move from server to server as needed, no need to pin it to a specific machine. And even if you could that would hurt scalability and reliability so something to be avoided.
Sending messages through MSMQ using the WCF NetMsmqBinding works just fine. Internally WF uses a completely different mechanism called bookmarks that allow a workflow to stop and resume. Each Receive activity, as well as others like Delay, will create a bookmark and wait for that to be resumed. You can only resume existing bookmarks. Even resuming a bookmark is not a direct action but put into an internal queue, not MSMQ, by the workflow scheduler and executed through a SynchronizationContext. You get no control over the scheduler but you can replace the SynchronizationContext when using the WorkflowApplication and so get some control over how and where activities are executed.

Resources