I have a WCF service hosted on Windows Azure as a "cloud service." When the service starts, it needs to load data from files on disk into memory so that it can be accessed quickly (in other words, cached). Right now I'm using a folder like C:\Documents\Filestoprocess: the WCF service reads that folder and loads the data from the files in it into memory. I have about 5,000 small files. How do I do this in Azure? Is there a folder path that I can use within the WCF service so that it can open each of these files and load its data? I'm not really looking for complicated Blob access over the network that uses bandwidth. I'm looking for simple disk I/O access to these files from the WCF "cloud service" that is running on its own public web address.
You should use a cloud storage service to store data: anything you write to the local file system can be destroyed when the service restarts or is recycled.
You can look into using Azure Drive, which is like creating a disk drive. It sits on top of blob storage.
But if you really want to write and read data on the local file system check out this blog post http://blog.codingoutloud.com/2011/06/12/azure-faq-can-i-write-to-the-file-system-on-windows-azure/
It talks about setting up your service definition to allow writing to the local file system.
Depending on the size of your instances you'll get a non-persistent disk where you can store this kind of temporary data. The minimum is 20GB for an extra small instance. You shouldn't access the disk directly. Use a local resource instead, which you can configure in your service definition file or in Visual Studio (double click your Web / Worker Role).
This storage is non-persistent. This means that if you delete your deployment, if you decrease the number of instances, in case of hardware problems, ... you lose all data saved here. If you want to persist your files you should use blob storage instead. But in your case, where you need the files as some kind of caching mechanism, local resources are perfect.
And if your goal is to cache data you might want to take a look at the caching features included in Windows Azure: Caching in Windows Azure
Blob access is not complex. In fact, you could do a single download of a zip file from blob storage to local disk, unzip it, then prime your WCF service from those 5,000 small files.
Check out the MSDN page documenting CloudBlob.DownloadToFile(). The essential parts:
// Requires: using Microsoft.WindowsAzure; using Microsoft.WindowsAzure.StorageClient;
CloudBlobClient blobClient =
    new CloudBlobClient(blobEndpoint, new StorageCredentialsAccountAndKey(accountName, accountKey));
// Return a reference to the blob.
CloudBlob blob = blobClient.GetBlobReference("mycontainer/myblob.txt");
// Download the blob to a local file.
blob.DownloadToFile("c:\\mylocalblob.txt");
Now: I don't agree with saving to the root folder on C:. Rather, you should grab some local storage (easily configurable). Once you configure local storage in your role configuration, just ask the role environment for it, and ask for root path:
var localResource = RoleEnvironment.GetLocalResource("mylocalstorage");
var rootPath = localResource.RootPath;
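For reference, the local storage resource requested above is declared in the service definition file (ServiceDefinition.csdef). A minimal sketch follows; the service name, role name, and 1 GB size are hypothetical, and only the resource name "mylocalstorage" comes from the call above:

```xml
<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <!-- "MyWcfRole" is a placeholder role name -->
  <WebRole name="MyWcfRole">
    <LocalResources>
      <!-- name must match the string passed to GetLocalResource -->
      <LocalStorage name="mylocalstorage" sizeInMB="1024" cleanOnRoleRecycle="false" />
    </LocalResources>
  </WebRole>
</ServiceDefinition>
```

Setting cleanOnRoleRecycle to false asks the platform to keep the contents when the role recycles, but the storage is still not durable.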
Note: As @KingPancake mentioned, you could use an Azure drive. However: remember that an Azure drive can only be mounted writeable by one instance. You'd need to make additional snapshots for your other instances. I think it's much simpler for you to go with a simple blob, copy your files down (either as single zip or individual files), and go from there.
You mentioned concern with network+bandwidth. You don't pay for bandwidth within the same data center. Also: It's extremely fast: 100Mbps per core. So even with a Small instance, you'll have your files copied down very quickly, more so when you go to larger instance sizes.
One last thought: The only other ways to gain access to your 5,000 files, without using blob storage or Azure Drives (which are mounted as vhd's in blob storage) would be to either download the files from an external source or bundle them with your Windows Azure package (and then they'd show up in your app's folder, under whatever subfolder you stuck them in). Bundling has two downsides:
Longer time to upload your deployment package due to added size
Inability to change any of the individual files without redeploying the package.
By storing in a blob, you can easily change one (or all) of your small files without redeploying your code - you'd just need to signal it to either re-read from blob storage or restart the instances so they automatically download the new files.
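The unzip-and-prime step suggested earlier in this answer could look roughly like the sketch below. It assumes the zip has already been downloaded to local disk and that .NET 4.5 or later is available (ZipFile lives in the System.IO.Compression.FileSystem assembly). The FileCache class and its Prime method are hypothetical names, not part of any Azure SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class FileCache
{
    // Extracts the archive into targetDir and loads every extracted file into
    // memory, keyed by its path relative to targetDir.
    public static Dictionary<string, string> Prime(string zipPath, string targetDir)
    {
        // Start from an empty directory: ExtractToDirectory throws if a file
        // it wants to write already exists.
        if (Directory.Exists(targetDir))
        {
            Directory.Delete(targetDir, true);
        }
        ZipFile.ExtractToDirectory(zipPath, targetDir);

        var cache = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var file in Directory.EnumerateFiles(targetDir, "*", SearchOption.AllDirectories))
        {
            // Strip the targetDir prefix and any leading separator to get the key.
            var key = file.Substring(targetDir.Length)
                          .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
            cache[key] = File.ReadAllText(file);
        }
        return cache;
    }
}
```

With the rootPath from the local resource above, a call might be FileCache.Prime(Path.Combine(rootPath, "files.zip"), Path.Combine(rootPath, "files")), where both file names are placeholders. Extracting into a subfolder avoids deleting the local resource root itself.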
Related
I have a VM on Azure which is my content management system using nodejs and mongodb.
One of the things the CMS does is provide a social sharing function where HTML pages are created and users are given the URL to the page.
I expect a large volume of users (probably 5000 at a given time) to access the HTML pages. I do not want this load to be on the same server as my CMS.
So I was thinking about moving the html pages to another server. My question is do I need to look at Azure blob storage to do this or should I just use another VM and put files there?
The files are very small and minified. I want to keep my costs down whilst at the same time if I get more than 5000 requests, the server should auto scale.
The question itself is somewhat subjective/opinion-soliciting. And how you solve this problem is really up to you.
But from an objective perspective:
Blobs themselves are not the same as local file storage. If you're going to store content in them, either your CMS needs to support them natively or you're going to need to build that support into it (if that's even possible). Since they have their own REST API (and related SDKs) you cannot simply do file I/O operations against them. They are, however, accessible via URI (which may be made private or public).
Azure VMs store their disks (vhd's) in page blobs (so, you're already using blob storage technically speaking). And each VM may have attached disks (1TB each) also in page blobs, two disks per core (so a dual-core VM supports 4 attached 1TB disks). Just like your OS disk, these attached disks are durable, in blob storage. A CMS may access an attached disk once it's formatted and given a drive letter (Windows) or mounted (Linux). EDIT - forgot to mention: If you go with the attached-disk approach, you need to consider the fact that these disks are per-VM. That is, they are not shared across multiple VM's (in the event you scale your CMS to multiple instances).
Azure File Service is an SMB share sitting atop Azure Blob Storage. Again, durable storage, and drive-mappable. EDIT unlike attached disks, Azure File Service SMB shares are accessible across multiple VM's.
I have an Azure Web App, which will generate pdf files at runtime and write them to disk. Can I trust that these files will be persisted?
I am concerned that if my image is spun down and brought back up again then the files might have disappeared.
Or perhaps Azure decides to move the website to a different machine or different datacentre, where these files would not exist.
I know there are cloud based options such as blob storage, but I would prefer the simplicity of writing to disk and having access over FTP.
Anything that you write under the d:\home folder is guaranteed to be persisted. See the File System section in this for more details on this topic.
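As an illustration, generated PDFs could be written to a subfolder of the persisted home directory. This is a sketch under assumptions: it relies on the HOME environment variable pointing at d:\home on Windows-based Web Apps, the data\pdfs subfolder is a made-up location, and PdfStore is a hypothetical helper class.

```csharp
using System;
using System.IO;

public static class PdfStore
{
    // Returns (and creates if needed) a directory for generated PDFs under
    // the persisted home folder.
    public static string GetPdfDirectory()
    {
        // Assumption: HOME resolves to d:\home when running in an Azure Web App.
        // Fall back to the temp folder elsewhere, e.g. on a development machine.
        var home = Environment.GetEnvironmentVariable("HOME");
        if (string.IsNullOrEmpty(home))
        {
            home = Path.GetTempPath();
        }
        var dir = Path.Combine(home, "data", "pdfs");
        Directory.CreateDirectory(dir); // no-op if it already exists
        return dir;
    }

    // Writes the PDF bytes and returns the full path of the saved file.
    public static string Save(string fileName, byte[] content)
    {
        // GetFileName drops any directory part so callers cannot escape the folder.
        var path = Path.Combine(GetPdfDirectory(), Path.GetFileName(fileName));
        File.WriteAllBytes(path, content);
        return path;
    }
}
```

Files saved this way sit under the same d:\home tree that is reachable over FTP.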
I have an old legacy application built on .NET remoting that transfers data as XML files over FTP.
Essentially, a CRM system sends XML files to a directory on the web server, where a Windows service uses a file watcher to process each incoming XML file and update the database.
Similarly, changes in the web application are serialized into XML files in an out folder, which the CRM polls via FTP every 5 minutes.
I'm trying to work out which Azure services this would map to best.
You could use Azure Blobs or Azure Files for this.
Azure Blobs: This is the lowest cost option, while still providing high throughput. However, note that Azure Blobs do not have File Watcher functionality, so you would have to poll the directory every few minutes to check for a new file. If you delete files after processing them, then this is really easy - all you have to do is list and see if there are any files. If you want to retain the files, then you might have to do more, since the file list will get big over time. Let me know if this is the case and I can suggest some options.
Azure Files: This is an SMB share that you can mount from a VM. This will map pretty closely to your existing filesystem-based code, including FileWatcher. However, note that Azure Files can only be mounted by a VM in the same region.
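The list-process-delete polling described for the Blobs option can be sketched as below. To stay self-contained, the sketch polls a directory path (for instance a mounted share) as a stand-in; with blobs, the listing and delete calls would go through the storage SDK instead. InboxPoller and handleXml are hypothetical names.

```csharp
using System;
using System.IO;

public static class InboxPoller
{
    // Processes every *.xml file currently in inboxDir, then deletes it.
    // Returns the number of files handled in this pass.
    public static int ProcessPending(string inboxDir, Action<string> handleXml)
    {
        var handled = 0;
        foreach (var path in Directory.GetFiles(inboxDir, "*.xml"))
        {
            handleXml(File.ReadAllText(path));
            File.Delete(path); // deleting keeps the next listing small
            handled++;
        }
        return handled;
    }
}
```

A timer or background loop would call ProcessPending every few minutes. The sketch does not guard against picking up a file that is still being written, so a real implementation would likely need a rename-on-complete convention or similar.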
I've just set up an extra small VM instance in Windows Azure to run a help console for our company. The help files can be updated and published through a simple .NET interface. Obviously the flat HTML files are getting deployed to the local drive on the VM and exposed publicly through IIS. I'm just wondering how stable this is? If the VM suffers a hardware failure, presumably there's no automatic failover and any edits we've made to the help system will be lost?
Can anyone recommend a way I can shuttle the source files out of the VM into blob storage? I could write an application to do this, I'm just wondering if there is an out-of-the-box solution out there?
Additional information:
The VM instance is running Server 2008 R2 SP1 (As a Virtual Machine not a web-role)
A backup needs to be created once every 24 hours
Aged backups (3+ days old) need to be automatically cleared from the blob container
The help system we use is called HelpConsole 2012
New pages are added at a rate of maybe 2-3 per week
The answer depends on whether you are running this in a Windows Azure Virtual Machine or on a Windows Azure Web role.
If you are running this on a Windows Azure Virtual Machine, then the VHD is stored in BLOB storage. If the site is running off the C: drive rather than a data disk, host caching is turned on for both reads and writes. In this scenario it is possible (depending on the methods you use to write your files out) that the data has not been pushed back to the VHD in BLOB storage before a failure occurs. You can either ensure that your writing methods do a write-through operation, or turn off the write caching. Better yet, attach a data disk for your web site files. By default data disks have both read and write caching off (you could turn on read caching). Since the VHDs are persisted, you don't have to worry about edits getting lost. You can script taking a snapshot of the files and moving them to BLOB storage separately, or even push them somewhere else. Another thing to keep in mind with this option is that you have to care for the VM instances and keep them patched and up to date.
If you are running a Web Role, then yes, if a failure occurs and the VM goes through self healing it will indeed redeploy with the older files. In this case I'd recommend changing the code in the web role that when it writes the updates to the local file it also puts a copy of the local file into BLOB Storage. In addition, in the web role OnStart you could reach out to BLOB storage and pull down all the new content locally. BE VERY CAREFUL with this approach though because it only really works well for ONE instance, not multiple. If you plan on running multiple instances of the server (and you will have to if you want the SLA for uptime) then your code will need to be a little more robust and do writes out to BLOB storage and then alert all instances of the role that there is a new file to pull down locally.
Another option for web roles is to write a handler for the content so that incoming requests are mapped directly to a file in BLOB storage. Updates can then be made as direct edits to the file in BLOB storage. This offloads the serving of the flat files from your compute nodes to BLOB storage, and you could even implement some caching and stream the content back through the handler rather than having requests hit BLOB storage directly if you wanted to.
Now, another option, is to use Windows Azure Web Sites for this. The underlying storage of the web site files in Windows Azure Web Sites is a shared location and thus updating the files in it will immediately be reflected for all instances. Also, the content for the site is stored in BLOB storage and can be updated via FTP, source control, or directly from code. Lots of options here. You may end up moving to reserved instances to help keep away from some of the quotas that Web Sites have. Web Sites may not be an option for you currently depending on other requirements (as in how much control do you need over the environment since you don't get a lot of control for Web Sites).
How would I write to a tmp/temp directory in a Windows Azure website? I can write to a blob, but I'm using an npm package that requires me to give it file names so that it can write directly to those files.
Are you using Cloud Services (PaaS) or Virtual Machines (IaaS)?
If PaaS, look at Windows Azure Local Storage. This option gives you up to 250 GB of disk space per core. It's a great location for temporary storage that traditional apps will find familiar. However, it's not persistent, so if you put anything there that needs to survive the VM instance being repaved, copy it to Blob storage. Also, this storage is specific to a given role instance, so if you have two instances of the same role, they each have their own local storage.
Alternatively, you can use Azure Drive, which allows you to keep the information persisted, but still doesn't allow multiple parallel writes.
If IaaS, then you can just mount a data disk to the VM and write to it directly. Data disks are already persisted to blob storage so there's little risk of data loss.
This is just my understanding, so please correct me if anything is wrong.
In Windows Azure Web Site, the content of your website will be stored in blob storage and mounted as a drive, which will be used for all instances your web site is using. And since it's in blob storage it's persistent. So if you need the local file system I think you can use the folders under your web site root path. But I don't think you can use the system tmp or temp folder.