Why can't I store data in an Azure compute instance?

I have deployed a Worker Role to an Azure instance with remote access enabled.
When I remote to the server, I see disks C: and D: on the server.
I was told that Azure doesn't guarantee the durability of data stored in a compute instance. However, when I reboot or upgrade the service, I still see the previous data on disks C: and D:.
When will the data on disks C: and D: be lost?

Local disks are non-durable; in other words, they are not replicated. They may fail at any time, and there is no way to recover the data.
During role recycles (reboots), data typically will survive, but you cannot count on it surviving.
If your software must use a drive letter because you can't alter the code base, you can mount an NTFS volume inside a page blob (basically a Cloud Drive). You can do this from your OnStart() and then pass the drive letter to your app. Note: a cloud drive may have only one writer, so if you have multiple instances, each instance needs to create its own cloud drive.
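For reference, here is a rough sketch of what that looked like with the old Windows Azure Drive API (Microsoft.WindowsAzure.StorageClient, long since deprecated). The "DataConnectionString" setting, the "DriveCache" local resource, the "drives" container and the drive size are assumptions, not values from the question:

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.ServiceRuntime;
    using Microsoft.WindowsAzure.StorageClient;

    public class WorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            var account = CloudStorageAccount.Parse(
                RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));

            // Cloud Drives need a local read cache, backed by a LocalStorage
            // resource declared in ServiceDefinition.csdef.
            LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");
            CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

            // A cloud drive has a single writer, so each instance gets its own VHD blob.
            account.CreateCloudBlobClient()
                   .GetContainerReference("drives")
                   .CreateIfNotExist();
            CloudDrive drive = account.CreateCloudDrive(
                "drives/" + RoleEnvironment.CurrentRoleInstance.Id + ".vhd");
            try
            {
                drive.Create(1024); // size in MB; throws if the VHD already exists
            }
            catch (CloudDriveException)
            {
                // Already created on a previous start -- just mount it.
            }

            // Mount returns a drive letter (e.g. "F:\") that you can hand to the legacy app.
            string driveLetter = drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);

            return base.OnStart();
        }
    }

Because each instance mounts its own VHD blob, the one-writer-per-drive restriction is respected.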

Because Azure is a cloud service, the hardware that your instance is running on is not guaranteed to be the same at any given point in time. As a result, you shouldn't rely on the data being present. Even though it may persist across reboots/upgrades, it isn't guaranteed.
See the second paragraph on Local Storage in this article, which makes the following recommendation:
"If you require reliable durability of your data, want to share data between instances, or access your data outside of Windows Azure, consider using a Windows Azure Storage account or SQL Azure Database instead."
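As a concrete illustration of that recommendation, here is a minimal sketch of writing data to blob storage instead of the local disk, using the classic Microsoft.WindowsAzure.Storage client library; the connection string, container name and blob name are placeholders:

    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static class DurableStore
    {
        public static void Save(string connectionString, string blobName, string contents)
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
            CloudBlobClient client = account.CreateCloudBlobClient();

            // Blob storage is triple-replicated and outlives any single role instance.
            CloudBlobContainer container = client.GetContainerReference("app-data");
            container.CreateIfNotExists();

            CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
            blob.UploadText(contents); // creates or overwrites the blob
        }
    }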

It will usually be there after a reboot, but I have seen one case where I rebooted and something went wrong, so the instance was reset to a clean state. You cannot rely on the data surviving. I would imagine the same thing could happen with an upgrade.
Stopping and starting the instances will probably also lose the data, but I haven't checked.
Here's a quote from an MVP on the MSDN forums:
The local disk storage of Compute VMs (whether Web Role, Worker Role, or VM Role) is not persistent. It can go away at any time. The data center has the right to move and re-create your VMs whenever it deems it necessary. This could happen in response to a hardware failure, or simply because the data center needs to be reorganized. When this happens, you lose your VM disk files and go back to your deployment image. It is only a matter of time before this happens. This is normal behavior for cloud computing compute instances.

Related

Can you move/copy Azure virtual machines to a different instance?

If I set up a server running my application on an Azure instance, for example an A1, can I later change the instance to a D2?
I might want to experiment with a VM at a lower cost but then move to a higher-performing machine at a later date, without having to rebuild everything.
Yes, you can change the size of an Azure VM on demand. Changing the size will trigger a machine reboot, and if you're using a configuration with an SSD temporary drive, the contents of the SSD will be erased. Other than that, everything else will be left untouched.
Drew, the Principal PM in this area, has a great blog post about this here.
You can only resize a VM to another offering that does not have fundamentally different hardware. Since A-Series and D-Series VMs have similar hardware, you would be able to swap those two around; you would not be able to go from A-Series to G-Series, though. In addition, you need to check VM availability per region if you want to swap to a size that is only offered in certain areas, as well as whether you are using an ASM or ARM VM.
If you have an existing VM, you can check what it can swap out with in the new portal under "Size" in the VM Settings.
This will allow you to reboot into a different machine type; however, any temp storage will be erased, as with any VM reboot. You just need to ensure you are storing your persistent data in external storage.
You can learn more about the VM size offerings here.

How to store (and query) the MaxMind GeoIP2 database in Azure?

In an Azure Web App I need to efficiently query the MaxMind GeoIP2 City database (due to the volume of queries and the latency requirements, we cannot use MaxMind's REST API).
I'm wondering what the best approach is for storing the database (binary MMDB format, accessed via the official .NET API) so that it's easy to update with minimal downtime (we are going to subscribe to monthly updates) and still cost-effective with regard to Azure storage and transactions.
Apparently block blobs are the way to go, but I'm not sure how to handle the monthly updates, or about the fact that the GeoIP2 API loads the whole database into memory (I don't know whether this would be a problem for the Web App, whether I need a worker to keep it loaded, or whether I need something else). I actually don't know yet how large the file is.
What's the most cost-effective solution that preserves low latency over a huge volume of queries?
According to the API docs, you must have the database available on a file system (the API doesn't know anything about Azure Storage and its REST API). So, regardless of where you permanently store it, you'll need to have it on a disk somewhere.
I have no idea how large the database footprint is, but Web Apps, Cloud Services (web/worker roles) and Virtual Machines (whether Linux or Windows) all have local disks, and you have read/write access to these disks. So you'd need to copy the database binary file (or CSV) to local disk from somewhere. At that point, when you initialize the SDK, you'd create a DatabaseReader and point it to your locally downloaded copy of the database file.
You mentioned storing the database in blob storage. There's nothing stopping you from doing so and simply downloading a copy to local disk. And there's nothing stopping you from storing multiple versions in multiple blobs. Note: You may also take advantage of Azure File storage (an SMB share). Which you choose is up to you.
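To make that concrete, here is a rough sketch (using the classic Microsoft.WindowsAzure.Storage client and the MaxMind.GeoIP2 package) of downloading the .mmdb from a blob to local disk at startup and then querying it with DatabaseReader. The container/blob names, connection string and local folder are assumptions:

    using System.IO;
    using MaxMind.GeoIP2;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public class GeoIpService
    {
        private readonly DatabaseReader _reader;

        public GeoIpService(string storageConnectionString, string localFolder)
        {
            string localPath = Path.Combine(localFolder, "GeoIP2-City.mmdb");

            // Pull the current database from blob storage down to local disk.
            CloudBlockBlob blob = CloudStorageAccount.Parse(storageConnectionString)
                .CreateCloudBlobClient()
                .GetContainerReference("geoip")
                .GetBlockBlobReference("GeoIP2-City.mmdb");
            blob.DownloadToFile(localPath, FileMode.Create);

            // DatabaseReader keeps the database in memory; create it once and reuse it.
            _reader = new DatabaseReader(localPath);
        }

        public string CityNameFor(string ip)
        {
            // Throws AddressNotFoundException for addresses not in the database.
            return _reader.City(ip).City.Name;
        }
    }

For the monthly updates, one option is to upload the new .mmdb to blob storage (overwriting or versioning the old one) and then recycle the app, or swap in a new DatabaseReader once the fresh copy has been downloaded.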
As far as the most cost-effective solution goes: you'll need to do the pricing workup yourself to see what's most effective. You'd also need to evaluate how much RAM is available for the given VM size / role instance / Web App you choose. You mentioned Web Apps in your question: Web App instances scale from 0.5 GB to 14 GB, depending on the tier you choose (again, you'll need to evaluate this).

Possible to keep two VHDs in sync on Azure VMs?

This is more of a scenario than a specific technical question.
I have two Azure VMs that run a web application in load-balanced mode,
as per this article: http://asheej.blogspot.in/2014/03/load-balancing-using-windows-azure.html
Both virtual machines have an additional disk attached which stores images that are referenced by the web application hosted in the VMs' IIS.
Now, what would be the best way to keep the contents of the two VM hard drives in sync?
For example, if I delete or add data on the VHD of the first VM, that change should also be reflected on the second VM.
Is there any way to do this, perhaps using a common VHD for both machines, which would take syncing out of the question?
Before going into the solution, let me briefly touch on the relationship between a VM and its disks.
Typically a VM has three kinds of disks attached: 1. the OS disk, 2. the temporary disk, and 3. data disks. The VM holds a lease on all of these disks; the only way to write to the data disks is through the VM.
The C: disk is persistent, meaning that when the VM gets rebooted the data on the disk is retained. But D:\ is non-persistent: when the VM is recycled, the disk can be wiped clean. So D:\ shouldn't be used to store any user data at any point in time.
So writing a process to sync between the two VMs just to keep pictures in sync is not ideal. You might know this already, but I wanted to set the context for the options provided below.
Your potential options are as follows:
1. Set up a file share using the new Azure File Service (in preview): http://blogs.technet.com/b/uspartner_ts2team/archive/2014/06/09/setting-up-a-file-share-for-the-new-azure-file-service.aspx. This will be the single source for all your images, and you don't need to worry about syncing files.
2. Store the images in Azure blob storage and access them from the application running in the VMs (a minimal sketch of this approach follows below): http://blogs.msdn.com/b/yaohuang1/archive/2012/07/02/asp-net-web-api-and-azure-blob-storage.aspx and http://www.nickharris.net/2012/11/how-to-upload-an-image-to-windows-azure-storage-using-mobile-services/
3. Host another VM as a web server and serve your images from there. Then the two VMs can refer to the images. The cost here is hosting the extra VM.
The key point with all three options: there is no need to sync the files in two different places; everything is in a single place.
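A minimal sketch of option 2 (images in a single blob container, with both load-balanced VMs referencing the blob URLs instead of local VHDs); the container name and connection string are placeholders:

    using System;
    using System.IO;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static class ImageStore
    {
        public static Uri Upload(string connectionString, string localFile)
        {
            CloudBlobContainer container = CloudStorageAccount.Parse(connectionString)
                .CreateCloudBlobClient()
                .GetContainerReference("images");
            container.CreateIfNotExists();

            // Allow anonymous read on blobs so <img src="..."> can point straight at them.
            container.SetPermissions(new BlobContainerPermissions
            {
                PublicAccess = BlobContainerPublicAccessType.Blob
            });

            CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(localFile));
            using (FileStream fs = File.OpenRead(localFile))
            {
                blob.UploadFromStream(fs);
            }

            return blob.Uri; // both VMs reference this URL -- nothing to sync
        }
    }

If you don't want anonymous access, you could keep the container private and hand out shared access signatures instead.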
Edited based on new information:
In your scenario, hosting your files on the VMs is not the right approach. You should take the following into consideration, even for a short-term solution, if you are using the Azure load balancer.
Azure Load Balancer uses a 5-tuple (source IP, source port, destination IP, destination port, protocol type) to calculate the hash that maps traffic to the available servers, and the distribution is fairly random. So if you load-balance the VMs, you cannot control which VM the images are accessed from.
Manual updates are not possible in this scenario.
You either need to set up a virtual network to allow you to create and share a Windows file share, OR you should investigate the use of the Azure File Service for creating a share that both VMs connect to (see: http://blogs.technet.com/b/uspartner_ts2team/archive/2014/06/09/setting-up-a-file-share-for-the-new-azure-file-service.aspx).

About Windows Azure blob storage: the implementation of a project should not depend on the cloud platform

We plan to migrate an existing website to Windows Azure, and I have been told that we need to store files in blob storage.
My question is:
If we want to use blob storage, that means I need to rewrite the file storage function (we use the file system for now) to call the blob service API to store files. That seems very strange to me, just because we want to use Windows Azure. What if in the future we want to use Amazon EC2 or another cloud platform? They might have their own way to store files, and then I might need to rewrite the file storage function again. In my opinion, the implementation of a project should not depend on the cloud platform (or cloud server)! Can anybody correct me? Thanks!
I won't address the commentary about whether an app should have a dependency on a particular cloud environment (or specific ways to deal with that particular issue), as that's subjective and it's a nice debate to have somewhere else. What I will address is the actual storage in Azure, as your info is a bit out-of-date.
One reason to use blob storage directly (and possibly the reason you were told to use blob storage) is that it provides access from multiple instances of your app. Also, blob storage provides 500TB of storage per storage account, and it's triple-replicated within the deployed region (and optionally geo-replicated). With attached storage (either with local disk or blob-backed Azure Disk), the access is specific to a particular instance of your app. Shifting from file system access to blob storage access does require app modification.
If you choose not to modify your app's file I/O operations, then you can also consider the new Azure File Service, which provides SMB access to storage (backed by blob storage). Using File Service, your app would (hopefully) not need to be modified, although you might need to change your root path.
More information on Azure File Service may be found here.
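To illustrate the "change only your root path" point: once an Azure File share is mounted on the instance (or addressed through its UNC path), existing System.IO code keeps working unchanged. A small hypothetical sketch, with made-up account and share names:

    using System.IO;

    public static class FileStorage
    {
        // Before: a local folder. After: the mounted Azure File share, or its UNC path,
        // e.g. @"Z:\uploads" or @"\\mystorageaccount.file.core.windows.net\myshare\uploads".
        public static string Root = @"Z:\uploads";

        public static void Save(string name, byte[] contents)
        {
            Directory.CreateDirectory(Root);
            File.WriteAllBytes(Path.Combine(Root, name), contents);
        }

        public static byte[] Load(string name)
        {
            return File.ReadAllBytes(Path.Combine(Root, name));
        }
    }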
Why does it seem strange? You need to store your files somewhere, and the cloud is as good a place as any IF it suits your needs. The obvious advantages are redundancy and geo-replication, sharing files across multiple projects and servers; the list goes on. It's difficult to advise on whether it would be a good idea or not without hearing some specifics.
You could use Windows Azure Storage with Amazon in the future if you wanted to (you'd just need to set up access for it), obviously with a slightly longer delay. Then again, that slight performance drop may be significant, and you may end up rewriting it anyway.
Most importantly, swapping from one cloud provider to another is not trivial, depending on just how much you use it and how much data you've got in it, so I would strongly suggest looking closely at the advantages and disadvantages of each platform before putting your lot in with either one, and then fully learning that platform.
Personally, I went for Azure cloud services + storage etc. even though it was slightly more expensive at the time, because I'm a Microsoft person (not that I didn't do my research). It was annoying in the early days when key features were missing, but it has really matured now and I like the pace at which it's improving.
It's cheap to test, why not try both and see which one suits you? A small price to pay when you have big decisions to make.
Disclaimer: I don't know the current state of Amazon web services.
Nice question. We are in the middle of migrating an old PHP/MySQL/local-share ERP application to Web Role/SQL Azure/Azure Storage. We faced the same problem and decision. Let me write down some thoughts on the issue:
It is a nice option to be able to just switch the storage provider, but is it reasonable? You can always build the abstraction, but have you planned how to do the actual change of storage provider (migration/sync while in production)? What kind of argument would actually drive the transition to another storage provider? How many users and how much data do you have? Do you plan to shard or rebalance the storage in the future? How reliable must the system be during a storage provider switch? Do you want to move the data completely when you switch, or just shard it so that you start using the different provider for new data? Does the cost of developing these (reliable) storage layers, plus the cost of developing reliable transitions (or bi-directional syncs), outweigh the money difference between any two storage providers?
Just switching the storage mechanism from Azure blob storage to Amazon will incur a heavy latency penalty if your other services stay on Azure: when you create storage and services on Azure, you set affinity groups by region precisely to minimize network latency.
These are only a few of the questions to answer before doing all the heavy lifting. We abstracted the file repository (blob) because we planned to move from a local NFS share to blob storage transparently and gradually, and it meets our needs.
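For what it's worth, here is a minimal sketch of the kind of abstraction described above: the application codes against a small interface, with a file-system implementation and a blob implementation that can be swapped (or run side by side during a gradual migration). The interface and class names are illustrative, not from any particular library:

    using System.IO;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public interface IFileRepository
    {
        void Put(string key, Stream content);
        Stream Get(string key);
    }

    public class LocalFileRepository : IFileRepository
    {
        private readonly string _root;
        public LocalFileRepository(string root) { _root = root; }

        public void Put(string key, Stream content)
        {
            using (var file = File.Create(Path.Combine(_root, key)))
                content.CopyTo(file);
        }

        public Stream Get(string key)
        {
            return File.OpenRead(Path.Combine(_root, key));
        }
    }

    public class BlobFileRepository : IFileRepository
    {
        private readonly CloudBlobContainer _container;

        public BlobFileRepository(string connectionString, string containerName)
        {
            _container = CloudStorageAccount.Parse(connectionString)
                .CreateCloudBlobClient()
                .GetContainerReference(containerName);
            _container.CreateIfNotExists();
        }

        public void Put(string key, Stream content)
        {
            _container.GetBlockBlobReference(key).UploadFromStream(content);
        }

        public Stream Get(string key)
        {
            return _container.GetBlockBlobReference(key).OpenRead();
        }
    }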

Azure SQL Server VM recycle

When a SQL Server VM (IaaS VM) is recycled (for whatever reason: hardware failure, etc.) and comes up on a different VM, can my app continue using the database automatically (once the new VM comes up), or do I have to restore the database manually? Do I have to attach the disks again manually? Even if the disks are attached automatically, will the database restore also happen automatically? If the disks are available automatically, will my data files also be present on the new VM?
In case of a hardware failure, the experience for the guest OS (your SQL Server VM) would be as if someone had pulled the power cord and restarted the server. The configuration of the VM will remain the same, including all attached disks, etc., but operations that were not yet persisted to disk may have been lost. The impact depends on how you use the server in terms of transactions, etc.
I believe that scheduled maintenance is a bit more organised: the VM will be shut down in an orderly fashion, increasing consistency for your application.
