We have a Windows-based App Service that requires a large dataset to run (around 30 GB of files stored in Azure Blob Storage). This data is static per app version and therefore should be accessible to all instances within a given slot (a slot in our case represents a version).
Based on our initial research, it seems like Persistent Storage (%HOME%) would be the ideal place for this, since data stored there is shared across instances, but not across slots.
The next step is to load the required data as part of our DevOps deployment pipeline, since the App Service cannot operate without the underlying data. However, it seems the %HOME% directory is accessible only to the App Service itself, even though the underlying implementation uses Azure Storage.
At this point we're considering having the App Service download the data during its startup, but then we hit a snag: we have two instances. We could implement a mutex (using a blob lease), but that seems too complicated a solution for such a simple need.
Any thoughts about how to best implement this?
The problems I see with loading the file on container startup are the following:
It's going to be really slow, and you might hit one of the built-in App Service timeouts.
Every time your container restarts, or you add another instance, it will re-download all the data, and it might cause issues with blocked writes because of file handle locks, which can make files or directories on %HOME% completely inaccessible for reading and modification (I just had this happen to me).
Instead, I would suggest connecting the app to Azure Files over SMB and having, for example, a directory per version. That way you can write the data to Azure Files during your build pipeline and save an environment variable or file that tells each slot which directory holds the current version's data.
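If it helps, here is a minimal C# sketch of the app-side lookup, assuming the pipeline mounts the Azure Files share at a fixed path and sets a DATA_VERSION setting per slot (both the mount path and the variable name are hypothetical):

```csharp
using System;
using System.IO;

// Hypothetical mount path and app setting; the pipeline writes the data to
// <mount>\<version> and sets DATA_VERSION on the slot it deploys to.
var mountPath   = @"C:\mounts\appdata";
var dataVersion = Environment.GetEnvironmentVariable("DATA_VERSION")
                  ?? throw new InvalidOperationException("DATA_VERSION is not set");

var dataRoot = Path.Combine(mountPath, dataVersion);
if (!Directory.Exists(dataRoot))
    throw new DirectoryNotFoundException($"Dataset for version '{dataVersion}' not found at {dataRoot}");

// From here the app reads the ~30 GB dataset with ordinary file IO.
Console.WriteLine($"Using dataset at {dataRoot}");
```

Because the pipeline writes straight to the share and the app only reads, no startup download or blob-lease mutex is needed.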
I am relatively new to the cloud, so please guide me through the complete process.
I have an application that will be hosted in containers in a cloud environment. I want some temporary storage on the container or in the cloud environment that I can access from my web application (written in C#); I will generate a file and keep it there. First of all, is this possible without extra cost? Secondly, if it is possible, how can I access that area from C# code? And even if it does cost extra, will I have any access issues? Also, please let me know the limitations of that free space in terms of storage, accessibility, and cost.
Using an App Service you can store temporary files inside the %TMP% folder, which is mapped to %SYSTEMDRIVE%\local\Temp, at no extra cost.
Depending on your App Service plan you will have between 10 GB and 140 GB of free space to store files. Beware: your files will disappear if the App Service is restarted.
Refer to this link:
https://github.com/projectkudu/kudu/wiki/Understanding-the-Azure-App-Service-file-system
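For illustration, a minimal C# sketch of using that folder (the file name is arbitrary):

```csharp
using System;
using System.IO;

// Path.GetTempPath() resolves to the sandbox's %TMP% folder on App Service.
var tempFile = Path.Combine(Path.GetTempPath(), "generated-report.csv");

File.WriteAllText(tempFile, "id,value\n1,42\n");
Console.WriteLine(File.ReadAllText(tempFile));

// Note: this space is local to the instance and is wiped when the app restarts
// or moves to another worker, so treat it strictly as scratch storage.
```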
I have the following requirements, for which I am considering Azure Logic Apps:
Files placed in Azure Blob Storage must be migrated to a custom destination (which can differ from case to case)
The number of files is roughly 1,000,000
When the process is over, we should have a report saying how many records (files) failed
If the process stopped somewhere in the middle, the next run must take only files that have not been migrated
The process must be as fast as possible, and files must be migrated within N hours
What worries me is that I cannot find any examples or articles (including the official Azure documentation) where the same thing is achieved with Azure Logic Apps.
I have some ideas about my requirements and Azure Logic Apps:
I think that I must use pagination to deal with this number of files, because Azure Logic Apps will not be able to read millions of file names otherwise - https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-exceed-default-page-size-with-pagination
I can add a record to Azure Table Storage to track failed migrations (something like creating a record to say that the process started for a file and updating it when the file is moved to the destination) - a minimal sketch of this idea follows this list
I have no idea how I can restart the Azure Logic App without a custom tracking mechanism (for instance, the same Azure Table Storage instance)
And the question of how to split the work across several units is still open
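A minimal sketch of the custom tracking idea mentioned above, using the Azure Blob and Table SDKs (all names are hypothetical, and the actual copy step is left as a placeholder):

```csharp
using System;
using Azure.Data.Tables;
using Azure.Storage.Blobs;

var connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
var source  = new BlobContainerClient(connectionString, "source-files");
var tracker = new TableClient(connectionString, "MigrationStatus");
await tracker.CreateIfNotExistsAsync();

await foreach (var page in source.GetBlobsAsync().AsPages(pageSizeHint: 1000))
{
    foreach (var blob in page.Values)
    {
        // Table row keys cannot contain '/', so encode the blob name.
        var rowKey = blob.Name.Replace('/', '|');

        // Skip files already migrated on a previous run.
        if ((await tracker.GetEntityIfExistsAsync<TableEntity>("migration", rowKey)).HasValue)
            continue;

        try
        {
            // ... copy the blob to its destination here (placeholder) ...
            await tracker.UpsertEntityAsync(new TableEntity("migration", rowKey) { ["Status"] = "Succeeded" });
        }
        catch (Exception ex)
        {
            await tracker.UpsertEntityAsync(new TableEntity("migration", rowKey) { ["Status"] = "Failed", ["Error"] = ex.Message });
        }
    }
}
```

A final query over entities with Status "Failed" would then provide the failure report.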
Do you think Azure Logic Apps is the right choice for my needs, or should I consider something else? If Logic Apps can work for me, could you please share your thoughts and ideas on how I can meet the given requirements?
I don't think Logic Apps is a good fit for this requirement, because roughly 1,000,000 files is too many. For this requirement, I suggest you use Azure Data Factory.
To migrate data from Azure Blob Storage with Data Factory, you can refer to this document
I have heard many times that content uploaded to Azure Web Applications, such as images or other files, should be stored in the Azure Storage service, not in the app's file system.
But I would like to keep the solution simple and store those files in the local file system.
Does storing images or other files on the application's local file system somehow hamper deploying the application with more than one instance?
After a lot of research, I understand that, unlike AWS Elastic Beanstalk, all instances of a Web App share the same file storage: even though each instance runs on a different VM, the file system is the same.
The best way to think about it is that all the instances of your app map to the same network drive share that holds your files.
When you spin up two or more instances, they use that same storage-based file system, just as a single instance does.
You can see that easily by dropping a file (e.g. via FTP), and seeing it reflected instantly in all instances.
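As a quick illustration, a hedged C# sketch: a file written under %HOME% by one instance is immediately visible to the others (the sub-path here is arbitrary):

```csharp
using System;
using System.IO;

// %HOME% is the shared, storage-backed root on App Service (D:\home on Windows).
var home = Environment.GetEnvironmentVariable("HOME");
var path = Path.Combine(home, "site", "uploads", "example.txt");

Directory.CreateDirectory(Path.GetDirectoryName(path)!);
File.WriteAllText(path, $"written by instance {Environment.MachineName}");

// Any other instance of the same app sees the same file at the same path.
Console.WriteLine(File.ReadAllText(path));
```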
Sources: Microsoft, This question and this question
By storing files on the Web application, you're limiting your ability to scale.
The web server/app should do one thing: process your request and output the HTML. Everything else should be handled elsewhere if you truly want to take advantage of cloud computing. So in this case you should store any files off the web app.
Now if you want to keep this as simple as possible and you're not overly concerned with achieving scale, there's really nothing wrong with your approach. It's my understanding that Web Apps don't actually spin up new virtual machines for you, or if they do, they replicate exactly what is on the VM. For instance, think about this: if you had all your files stored on one VM and you spun up two new ones, you'd have to copy all those files over to the other two, and then you'd need a way to keep the uploaded files in sync across all your VMs.
I don't actually think you'd run into this problem with Azure Web Apps, but it is a problem that can arise if you're handling the VMs yourself or through an auto-scaling policy. You'll definitely run into the issue if you decide to spin up new web apps in different regions (say, the Ireland region so your EU customers get better performance): you'd then have two different locations where files could be uploaded, and you'd need to sync them, as opposed to uploading them to Azure Storage all along and keeping them centrally located.
I am trying to deploy an app which has a frontend app and a backend worker. The worker runs a CPU-intensive process. My requirements are to run the web app on an Azure A0 instance while the CPU-intensive process runs on a D2 instance. Both instances must be able to share files. I have read in a few places about SBS.
I tried creating the Linux VMs in the same cloud service but couldn't figure out how to SSH into them separately, since they use the same cloud service URL. I followed http://azure.microsoft.com/en-us/documentation/articles/cloud-services-connect-virtual-machine/ to create the second VM.
Can anyone suggest how to achieve this setup? Also, if possible, how do I check whether the disks are available to both instances?
Azure docs aren't as helpful as aws. :(
If the two VMs just need to share files and you don't want to go to the extra effort of coding against Blob Storage, then consider Azure Files, which exposes an SMB share backed by blob storage. This lets you use standard file I/O operations instead of custom blob storage code. See http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/12/introducing-microsoft-azure-file-service.aspx which shows how to create the file share and use it from Windows and Linux VMs.
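Once the share is mounted on both VMs, the code on each side is just ordinary file I/O; a small hedged C# sketch (the drive letter and file name are placeholders, and the same idea applies in any language):

```csharp
using System.IO;

// Assumes the Azure Files share is already mounted, e.g. as Z:\ on the Windows VMs
// (or under a mount point on Linux); the path below is hypothetical.
var sharedFile = @"Z:\jobs\result-001.csv";

// Written by the D2 worker VM...
File.WriteAllText(sharedFile, "id,value\n1,42\n");

// ...and read by the A0 web VM, or vice versa.
var contents = File.ReadAllText(sharedFile);
```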
[Probably easier to give an answer here]
Blob Storage is a universal storage container that can effectively act as the common drive you are looking for. Access to a blob storage container is made over HTTP/HTTPS, either through a blob storage client library or over REST, where you will have functions to upload, download, and list objects, etc.
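For example, a minimal C# sketch of those upload/list/download operations with the Azure Blob Storage SDK (the container and file names are placeholders); the same operations exist in the other language SDKs and over REST:

```csharp
using System;
using System.IO;
using Azure.Storage.Blobs;

var connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
var container = new BlobContainerClient(connectionString, "shared-data");
await container.CreateIfNotExistsAsync();

// Upload a local file as a blob.
await container.UploadBlobAsync("input/data.bin", File.OpenRead("data.bin"));

// List blobs under a prefix.
await foreach (var item in container.GetBlobsAsync(prefix: "input/"))
    Console.WriteLine(item.Name);

// Download a blob back to local disk.
await container.GetBlobClient("input/data.bin").DownloadToAsync("copy.bin");
```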
For Python, you'll hopefully find this article sufficient, although I have no experience with Python on Azure to comment; alternatively, choosing REST and plain HTTP requests should work fine too.
HTH
I am considering moving my web application to Windows Azure for scalability purposes but I am wondering how best to partition my application.
I expect my scenario is typical and is as follows: my application allows users to upload raw data, this is processed and a report is generated. The user can then review their raw data and view their report.
So far I’m thinking a web role and a worker role. However, I understand that a VHD can be mounted to a single instance with read/write access so really both my web role and worker role need access to a common file store. So perhaps I need a web role and two separate worker roles, one worker role for the processing and the other for reading and writing to a file store. Is this a good approach?
I am having difficulty picturing the plumbing between the roles, and I'm concerned about the overhead caused by the communication this partitioning introduces, so I would welcome any input here.
Adding to Stuart's excellent answer: blobs can store anything, with sizes up to 200GB. If you need or want to persist an entire durable directory structure, you can mount a VHD with just a few lines of code. It's an NTFS volume that your app can interact with just like any other drive.
In your case, a vhd doesn't fit well, because your web app would have to mount a vhd and be the sole writer to it. And if you have more than one web role instance (which you would if you wanted the SLA and wanted to scale), you could only have one writer. In this case, individual blobs fit MUCH better.
As Stuart stated, this is a very normal and common pattern. And again, with only a few lines of code, you can call upon the storage sdk to copy a file from blob storage to your instance's local disk. Then you can process the file using regular File IO operations. When your report is complete, another few lines of code lets you copy your report into a new blob (most likely in a well-known container that the web role knows to look in).
You can take this a step further and insert rows into an Azure table, partitioned by customer, with the row key identifying the individual uploaded file and a third field holding the URI of the completed report. This makes it trivial for the web app to display a customer's completed reports.
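A hedged C# sketch of that worker-side flow, using the current SDKs rather than the older storage client this answer predates; the names are placeholders and the report generation is stubbed out:

```csharp
using System;
using System.IO;
using Azure.Data.Tables;
using Azure.Storage.Blobs;

var connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
var uploads = new BlobContainerClient(connectionString, "raw-uploads");
var reports = new BlobContainerClient(connectionString, "reports");
var table   = new TableClient(connectionString, "CustomerReports");

// 1. Copy the raw blob down to the instance's local disk.
var localRaw = Path.Combine(Path.GetTempPath(), "upload.dat");
await uploads.GetBlobClient("customer1/upload.dat").DownloadToAsync(localRaw);

// 2. Process it with regular file IO (placeholder for your own logic).
var localReport = Path.Combine(Path.GetTempPath(), "report.pdf");
GenerateReport(localRaw, localReport);

// 3. Copy the finished report into the well-known reports container.
var reportBlob = reports.GetBlobClient("customer1/report.pdf");
await reportBlob.UploadAsync(localReport, overwrite: true);

// 4. Record it in a table partitioned by customer, row key = uploaded file name.
await table.UpsertEntityAsync(new TableEntity("customer1", "upload.dat")
{
    ["ReportUri"] = reportBlob.Uri.ToString()
});

// Hypothetical processing step standing in for the real report generation.
static void GenerateReport(string inputPath, string outputPath) =>
    File.WriteAllText(outputPath, $"report for {Path.GetFileName(inputPath)}");
```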
Blob storage is the easiest place to store files which lots of roles and role instances can then access - with none of them requiring special access.
The normal pattern suggested seems to be:
allow the raw files to be uploaded using instances of a web role
these web role instances return from the HTTP call without doing the processing - they store the raw files in blob storage and add a "do this work" message to a queue.
the worker role instances pick up the message from the queue, read the raw blob, do the work, store the report result, then delete the message from the queue (a sketch of this worker loop appears at the end of this answer)
all the web roles can then access the report when the user asks for it
That's the "normal pattern suggested", and you can see it implemented in things like the photo upload/thumbnail generation apps from the very first Azure PDC - it's also used in this training course - follow through to the second page.
Of course, in practice you may need to build on this pattern depending on the size and type of data you are processing.
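For concreteness, a hedged C# sketch of the worker-role loop in this pattern, written against the current queue and blob SDKs (queue and container names are placeholders, and the processing step is stubbed out):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Queues;

var connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
var queue   = new QueueClient(connectionString, "work-items");
var raw     = new BlobContainerClient(connectionString, "raw-files");
var reports = new BlobContainerClient(connectionString, "reports");

while (true)
{
    // Pick up the next "do this work" message, if any.
    var message = (await queue.ReceiveMessagesAsync(maxMessages: 1)).Value.FirstOrDefault();
    if (message == null) { await Task.Delay(TimeSpan.FromSeconds(5)); continue; }

    // Read the raw blob named in the message down to local disk.
    var blobName = message.MessageText;
    var localRaw = Path.Combine(Path.GetTempPath(), Path.GetFileName(blobName));
    await raw.GetBlobClient(blobName).DownloadToAsync(localRaw);

    // Do the work (placeholder) and store the report where the web roles look for it.
    var localReport = localRaw + ".report";
    File.WriteAllText(localReport, $"report for {blobName}");
    await reports.GetBlobClient(blobName + ".report").UploadAsync(localReport, overwrite: true);

    // Only delete the message once the work has succeeded.
    await queue.DeleteMessageAsync(message.MessageId, message.PopReceipt);
}
```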