Communication with worker roles in Windows Azure

I'll present my problem; any help will be appreciated:
I have a web site (implemented in WebForms on Azure) where the user can make configurations that result in an XML document containing all the configuration data.
On the other hand, I have a Worker Role which is running and needs this XML configuration.
The question is: how can I send this XML file generated by the user to the Worker Role?
I have looked for something similar to a REST API, as inter-role communication does not seem to be the correct path to follow.

Save the XML to blob storage.
Use an Azure Queue - the web app adds a "job" message pointing to the XML (the XML URL plus maybe some other data), and worker roles periodically check for "jobs" in this queue and process them.
This is the recommended practice in Azure, and it works very well (I am using it on some projects with very high load - hundreds of thousands of "jobs" per day). Azure queues are very reliable and fast; those performance issues you have read about were certainly in some "user code".
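The blob-plus-queue handoff described above can be sketched as follows. This is a minimal in-memory simulation of the pattern - the dictionary and `queue.Queue` stand in for Azure Blob Storage and an Azure Queue, and all names (`submit_configuration`, `worker_poll_once`) are made up for illustration; a real deployment would use the Azure storage SDK instead.

```python
import json
import queue

# In-memory stand-ins for Azure Blob Storage and an Azure Queue.
blob_store = {}          # blob name -> content
job_queue = queue.Queue()

def submit_configuration(user_id, xml_config):
    """Web role side: save the XML to blob storage, then enqueue a job
    message that points at the blob rather than carrying the XML itself."""
    blob_name = f"configs/{user_id}.xml"
    blob_store[blob_name] = xml_config
    job_queue.put(json.dumps({"blob": blob_name, "user": user_id}))
    return blob_name

def worker_poll_once():
    """Worker role side: pick up one job message and fetch the referenced XML."""
    try:
        message = job_queue.get_nowait()
    except queue.Empty:
        return None          # no jobs right now; poll again later
    job = json.loads(message)
    return blob_store[job["blob"]]  # process the XML here

submit_configuration("alice", "<config><option>1</option></config>")
print(worker_poll_once())  # -> <config><option>1</option></config>
```

Keeping the queue message small (a pointer, not the payload) matters because Azure queue messages have a tight size limit, while blobs do not.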

Related

The Best Way to Handle Big File Uploads in Azure

We have a VM hosting our web application, where users upload big files to their profiles (mainly 3D models) via the web app's Web API.
What I'm looking to do is find a way to handle these long-running upload jobs/processes efficiently somewhere other than the VM where I'm hosting the application.
I know there are some Azure methods, but I want to do it correctly. Which is the most efficient way to do that? A Worker Role, a web page running a script to upload to storage, or maybe WebJobs over a queue?
The function which uploads the file also performs some other processing, like generating thumbnails and storing the data to SQL.

Multiple Instances of Azure Worker Roles for non-transaction integration tasks

We have an upcoming project where we'll need to integrate with 3rd parties over a variety of transports to get data from them.
Things like WCF Endpoints & Web API Rest Endpoints are fine.
However, in two scenarios we'll need to either pick up auto-generated emails containing XML from a POP3 account OR pull the XML files from an external SFTP account.
I'm about to start prototyping these now, but I'm wondering: are there any standard practices, patterns or guidelines about how to deal with these non-transactional systems in a multi-instance worker role environment? i.e.
What happens if 2 workers connect to the POP3 account at the same time, or the same SFTP at the same time?
What happens if 1 worker deletes the file from the SFTP while another is mid-download?
Controlling duplication shouldn't be an issue, as we'll be logging everything on the application side to a database, and everything should be uniquely identifiable, so we'll be able to add if-not-exists-create-else-skip logic to the workers. I'm just wondering whether there is anything else I should be considering to make it more resilient/idempotent.
Just thinking out loud: since the data is primarily files and emails, one possible approach is that instead of processing them directly in your worker roles, the first thing you do is save them to blob storage. So there would be some worker role instances which periodically poll the POP3 server / SFTP site, pull the data from there, and push it into blob storage. When the blob is written, the same instance can delete the data from the source as well. With this approach you don't have to worry about duplicate records, because the blob will simply be overwritten (assuming each message/file has a unique identifier and the name of the blob is that identifier).
Once the file is in your blob storage, you can write a message to a Windows Azure Queue with details about this blob (maybe the blob URL etc.). Then, using the 'Get' semantics of Windows Azure Queues, your worker role instances start fetching and processing these messages. Because of the 'Get' semantics, once a message is fetched from the queue it becomes invisible to other callers (worker role instances in this case). This way you take care of duplicate message processing.
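Those 'Get' semantics are the key to avoiding double processing, and they can be sketched with a toy queue. This `VisibilityQueue` class is an illustrative stand-in, not the Azure API: a fetched message becomes invisible to other consumers until its visibility timeout expires, and the consumer deletes it only after finishing the work.

```python
import time

class VisibilityQueue:
    """Toy sketch of Azure Queue 'Get' semantics: a fetched message is
    hidden from other consumers until its visibility timeout expires."""
    def __init__(self):
        self._messages = []  # each entry: [payload, invisible_until]

    def put(self, payload):
        self._messages.append([payload, 0.0])

    def get(self, visibility_timeout=30.0):
        now = time.monotonic()
        for entry in self._messages:
            if entry[1] <= now:                   # currently visible
                entry[1] = now + visibility_timeout
                return entry[0]
        return None

    def delete(self, payload):
        # Called after processing succeeds; an unprocessed message simply
        # becomes visible again when its timeout lapses.
        self._messages = [e for e in self._messages if e[0] != payload]

q = VisibilityQueue()
q.put("blob-url-1")
first = q.get()    # worker instance A fetches the message
second = q.get()   # worker instance B sees nothing: message is invisible
q.delete(first)    # A finishes processing and deletes it for good
print(first, second)  # -> blob-url-1 None
```

Note the failure behaviour this buys you: if a worker crashes mid-processing, it never calls `delete`, so the message reappears after the timeout and another instance retries it.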
UPDATE
So I'm trying to combat against two competing instances pulling the same file at the same moment from the SFTP
For this, I'll pitch my favorite master/slave concept :). Essentially, the idea is that each instance tries to acquire a lease on a single blob. The instance which acquires the lease becomes the master and the others become slaves. The master fetches the data from SFTP while the slaves wait. I've described this concept in a blog post which you can read here: http://gauravmantri.com/2013/01/23/building-a-simple-task-scheduler-in-windows-azure/, though the context of the blog is somewhat different.
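The election mechanics can be sketched like this. The `BlobLease` class below is a local simulation of the real thing (an Azure blob lease is an exclusive, time-limited lock granted by the storage service); the point is only to show that whichever instance acquires first becomes the master and everyone else backs off.

```python
import threading
import uuid

class BlobLease:
    """Local stand-in for an Azure blob lease: at most one holder at a time.
    The real lease also expires after a duration, which this sketch omits."""
    def __init__(self):
        self._lock = threading.Lock()
        self._holder = None

    def try_acquire(self, instance_id):
        with self._lock:
            if self._holder is None:
                self._holder = instance_id
                return True       # this instance is now the master
            return False          # someone else holds the lease: be a slave

    def release(self, instance_id):
        with self._lock:
            if self._holder == instance_id:
                self._holder = None

lease = BlobLease()
instances = [str(uuid.uuid4()) for _ in range(3)]
masters = [i for i in instances if lease.try_acquire(i)]
print(len(masters))  # -> 1: exactly one instance wins and polls the SFTP
```

In production you would renew the lease periodically from the master; if it dies, the lease lapses and a former slave wins the next election.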
Have a look at the recently released Cloud Design Patterns guide. You might be able to find the corresponding pattern and sample code for what you need.

Way to share task and results between Azure website and workers

We need to change our system to a two-tiered structure on Azure, with an Azure website handling requests and adding tasks to a queue, which will then be processed in priority order by a set of Azure worker roles. The website will then return the results to the end user. The data and result sets for each task will be largish (several megabytes). What's the best way to broker this exchange of data?
We could do it via an Azure storage blob, but blobs are quite slow. Is there a better way? Up until now we have been doing everything in a scaled Azure website, which allows all instances access to the same disk.
If this is a long-running process I doubt that using blob storage would add that much overhead, although you don't specify what the tasks are.
On Zudio, long-running tasks update Table Storage tables with progress and completion status, and we poll from the browser to check when a task has finished. In the case of a large result returning to the user, we include with the completion message a direct link to the blob with a shared access signature, so they can download it directly from storage. We're looking at replacing the polling with SignalR running over Service Bus, having the worker roles send updates directly to the client, but we haven't started that development work yet, so I can't tell you how that will actually work.

Architecture design and role communication with Azure in file bound app

I am considering moving my web application to Windows Azure for scalability purposes but I am wondering how best to partition my application.
I expect my scenario is typical and is as follows: my application allows users to upload raw data, this is processed and a report is generated. The user can then review their raw data and view their report.
So far I’m thinking a web role and a worker role. However, I understand that a VHD can be mounted to a single instance with read/write access so really both my web role and worker role need access to a common file store. So perhaps I need a web role and two separate worker roles, one worker role for the processing and the other for reading and writing to a file store. Is this a good approach?
I am having difficulty picturing the plumbing between the roles, and I am concerned about the overhead caused by the communication involved in this partitioning, so I would welcome any input here.
Adding to Stuart's excellent answer: Blobs can store anything, with sizes up to 200GB. If you needed / wanted to persist an entire directory structure that's durable, you can mount a VHD with just a few lines of code. It's an NTFS volume that your app can interact with, just like any other drive.
In your case, a VHD doesn't fit well, because your web app would have to mount the VHD and be its sole writer. And if you have more than one web role instance (which you would, if you wanted the SLA and wanted to scale), you could only have one writer. In this case, individual blobs fit much better.
As Stuart stated, this is a very normal and common pattern. And again, with only a few lines of code, you can call upon the storage sdk to copy a file from blob storage to your instance's local disk. Then you can process the file using regular File IO operations. When your report is complete, another few lines of code lets you copy your report into a new blob (most likely in a well-known container that the web role knows to look in).
You can take this a step further and insert rows into an Azure table that are partitioned by customer, with the row key identifying the individual uploaded file, and a third field holding the URI of the completed report. This makes it trivial for the web app to display a customer's completed reports.
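The table layout being suggested might look like the sketch below. The entity shape (PartitionKey = customer, RowKey = upload id, plus a report URI property) follows the answer; the container names and URIs are invented for illustration, and a plain list stands in for the Azure table so the partition-key lookup is visible.

```python
# Hypothetical entities for the reports table described above.
# PartitionKey = customer, RowKey = uploaded file id, plus the report URI.
reports_table = [
    {"PartitionKey": "customer-1", "RowKey": "upload-001",
     "ReportUri": "https://example.blob.core.windows.net/reports/upload-001.pdf"},
    {"PartitionKey": "customer-1", "RowKey": "upload-002",
     "ReportUri": "https://example.blob.core.windows.net/reports/upload-002.pdf"},
    {"PartitionKey": "customer-2", "RowKey": "upload-003",
     "ReportUri": "https://example.blob.core.windows.net/reports/upload-003.pdf"},
]

def completed_reports(customer):
    """Partition-key query: listing one customer's reports scans one partition,
    which is the cheap, indexed access path in Azure Table Storage."""
    return [e["ReportUri"] for e in reports_table
            if e["PartitionKey"] == customer]

print(completed_reports("customer-1"))  # two report URIs for customer-1
```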
Blob storage is the easiest place to store files which lots of roles and role instances can then access - with none of them requiring special access.
The normal pattern suggested seems to be:
- allow the raw files to be uploaded using instances of a web role
- these web role instances return the HTTP call without doing the processing - they store the raw files in blob storage and add a "do this work" message to a queue
- the worker role instances pick up the message from the queue, read the raw blob, do the work, store the report result, then delete the message from the queue
- all the web roles can then access the report when the user asks for it
That's the "normal pattern suggested", and you can see it implemented in things like the photo upload/thumbnail generation apps from the very first Azure PDC - it's also used in this training course - follow through to the second page.
Of course, in practice you may need to build on this pattern depending on the size and type of data you are processing.
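One detail of that pattern worth making explicit is the ordering: the worker deletes the queue message only after the report is safely stored, which gives you at-least-once processing. A local sketch, with dictionaries standing in for the blob containers and all names (`enqueue_upload`, `worker_step`) invented for illustration:

```python
import queue

jobs = queue.Queue()
blobs = {"raw/1.csv": "raw,data"}   # raw uploads container (stand-in)
reports = {}                        # well-known reports container (stand-in)

def enqueue_upload(blob_name):
    # Web role: the raw file is already in blob storage; just hand off
    # a small work message and return the HTTP call immediately.
    jobs.put(blob_name)

def worker_step():
    """One pass of the worker loop: read the raw blob, do the work, store
    the report, and only then remove the message from the queue."""
    blob_name = jobs.get()
    raw = blobs[blob_name]
    report_name = "reports/" + blob_name.split("/", 1)[1]
    reports[report_name] = raw.upper()   # stand-in for the real processing
    jobs.task_done()                     # "delete the message" after success
    return report_name

enqueue_upload("raw/1.csv")
print(worker_step())  # -> reports/1.csv
```

If the worker dies between `get` and the final delete, the message would reappear in a real Azure queue and be retried, so the processing step should be idempotent.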

Worker & Web Role in same application

We have a WebRole which deals with requests coming in off a WCF service; it validates them, then puts the messages into an Azure queue in cloud storage. In the same application we have a WorkerRole which reads the information from the queue and makes a call to our persistence layer, which processes the request and returns the result. We were wondering why the worker role didn't pick up any of our configuration settings and hence was not providing the trace information we were looking for. We realised that the worker role was likely looking for an app.config and couldn't find it.
Is there a way to point the worker role to the web config, or to be able to load our Enterprise Library settings into the ServiceConfiguration.cscfg file which in either case would mean both could read from a common place?
Many thanks in advance
Kindo
As far as I'm aware there is no way for your worker role to get access to the web config of a web role out of the box.
If you move your configuration items to the ServiceConfiguration.cscfg file and both the worker and web role are in the same cloud project, the settings will be in the same file. But because the web role and the worker role are different projects within that cloud project, their settings are in different sections of that .cscfg file. If you want the settings to be the same for both of them, you will have to duplicate the settings.
Putting your settings in this file gives you the advantage that you can change the settings while the roles are running and have the roles respond however you like e.g. you might want certain settings to restart the roles, for others you may just want to update a static variable. In order to update a web.config or app.config you need to redeploy that role.
You do need to be aware, though, that the ServiceConfiguration file is not a replacement for a web.config. If you're using tools that look for their settings in a web or app config, then unless they're particularly smart and aware of the Azure environment, they won't go looking for settings in the ServiceConfiguration file.
I know you didn't ask this question, but if you're expecting your worker role to be providing an almost synchronous response to your web role by using a request queue and a response queue, you'll probably find that this won't scale very well. If you want to do synchronous calls to a worker role in Azure you're best off using WCF (this isn't highlighted anywhere in the guides). As you said that all of this is just a WCF service anyway, the answer to your problem may be to just do away with the worker role.
You cannot share web.config files or .cscfg files across roles in Azure, as there is no guarantee a role is on the same host, cluster, or even in the same datacenter as another role.
If you are simply trying to share items like connection strings, app-specific variables, etc., I would simply create your own "config manager" class that obtains some XML and parses it into a collection of settings. This way, you could store that config in Azure Blob Storage, and changes would be as simple as updating that blob and signaling your apps to reload (very easy using the Service Management API).
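A minimal sketch of such a roll-your-own config manager is below. The XML schema, class name, and setting names are all invented for illustration; in production the XML text would be downloaded from the blob rather than embedded, and `reload` would be triggered by your change signal.

```python
import xml.etree.ElementTree as ET

class ConfigManager:
    """Sketch of the suggested "config manager": parse a settings XML
    (fetched from blob storage in production) into a name -> value dict."""
    def __init__(self, xml_text):
        self._settings = {}
        self.reload(xml_text)

    def reload(self, xml_text):
        # Re-parse on demand, e.g. after the app is signaled that the
        # config blob has changed.
        root = ET.fromstring(xml_text)
        self._settings = {s.get("name"): s.get("value")
                          for s in root.findall("setting")}

    def get(self, name, default=None):
        return self._settings.get(name, default)

# Hypothetical settings document, as it might be stored in the blob.
xml_blob = """<settings>
  <setting name="ConnectionString" value="DefaultEndpointsProtocol=https;..." />
  <setting name="RetryCount" value="3" />
</settings>"""

config = ConfigManager(xml_blob)
print(config.get("RetryCount"))  # -> 3
```

Because every role instance reads the same blob, the web role and the worker role finally share one source of truth, which is exactly what the .cscfg duplication problem above was missing.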