Azure VM pre-made storage account: prerequisites

In Azure, when creating a VM, you can choose to "Use an automatically generated storage account", which generates a geo-replicated storage account.
Can I create a zone-replicated storage account and use that instead? I have tried, but it does not show up in the list.
What are the prerequisites for a storage account to be usable for VMs?
Is this documented somewhere?

I think the only restriction is that you can't choose a ZRS account to be used by VMs, because VM disks are essentially stored as page blobs in blob storage and ZRS accounts only support block blobs. From the storage team blog here - http://blogs.msdn.com/b/windowsazurestorage/archive/2014/08/01/introducing-zone-redundant-storage.aspx:
As you can see, these options provide a continuum of durability and availability options. ZRS fits between LRS and GRS in terms of durability and price. ZRS stores 3 replicas of your data across 2 to 3 facilities. It is designed to keep all 3 replicas within a single region, but may span across two regions. ZRS currently only supports block blobs. ZRS allows customers to store blobs at a higher durability than a single facility can provide with LRS. ZRS accounts do not have metrics or logging capability enabled at this time.
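If you want to check programmatically which of your existing accounts could be offered to a VM, here's a minimal Python sketch, assuming the modern azure-mgmt-storage SDK and assuming the ZRS restriction above is the only filter that matters; the subscription id is a placeholder:

```python
# Sketch: flag storage accounts that cannot hold VM disks (page blobs),
# assuming a ZRS replication SKU is the only disqualifier.
# Requires: pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

for account in client.storage_accounts.list():
    sku = account.sku.name  # e.g. "Standard_LRS", "Standard_ZRS", "Standard_GRS"
    vm_capable = "ZRS" not in sku  # ZRS accounts only support block blobs
    print(f"{account.name}: {sku} -> {'usable' if vm_capable else 'not usable'} for VM disks")
```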

Related

How to increase availability for Azure storage account?

For Azure storage accounts the SLA for write requests is 99.9% regardless of whether I'm using LRS, ZRS, GRS or RA-GRS. Is there a way to increase the SLA for write requests on the storage account?
E.g is there a good way to fail over to another storage account in another region?
The accounts don't have to contain the same data. I just want to always be able to store the blobs.
Is there a good way to fail over to another storage account in another region?
Of course, Azure storage itself provides this feature, you can refer to the document Initiate a storage account failover.
Before you can perform an account failover on your storage account, make sure that your storage account is configured for geo-replication. Your storage account can use any of the following redundancy options:
Geo-redundant storage (GRS) or read-access geo-redundant storage (RA-GRS)
Geo-zone-redundant storage (GZRS) or read-access geo-zone-redundant storage (RA-GZRS)
Is there a way to increase the SLA for write requests on the storage account?
My suggestion is to increase the number of retries for write requests; for example, you can configure this with BlobClientOptions.
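In the Python SDK the same idea is a retry keyword argument on the client. Below is a sketch only: the account URLs, keys, and retry counts are placeholders, and the fallback account is one you provision yourself in another region (which matches the "accounts don't have to contain the same data" requirement):

```python
# Sketch: retry writes on the primary account, then fall back to a second
# account in another region. Names, keys, and retry counts are illustrative.
from azure.core.exceptions import AzureError
from azure.storage.blob import BlobServiceClient

primary = BlobServiceClient(
    "https://primaryacct.blob.core.windows.net",
    credential="<primary-key>",
    retry_total=5,  # raise the retry count for transient write failures
)
secondary = BlobServiceClient(
    "https://fallbackacct.blob.core.windows.net",  # different region
    credential="<fallback-key>",
    retry_total=5,
)

def upload_with_fallback(container: str, name: str, data: bytes) -> None:
    for client in (primary, secondary):
        try:
            client.get_blob_client(container, name).upload_blob(data, overwrite=True)
            return
        except AzureError:
            continue  # primary exhausted its retries; try the fallback
    raise RuntimeError("both accounts rejected the write")
```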

Azure Blob Storage: Does Microsoft Implement Redundant Backups?

I've searched the web and contacted technical support, yet no one seems to be able to give me a straight answer on whether items in Azure Blob Storage are backed up or not.
What I mean is, do I need to create a twin storage account as a "backup" and program copies of all content from one storage to another, or are the contents of a client's Blob Storage automatically redundantly backed up by Microsoft?
I know with AWS, storage is redundantly backed up via onsite drives as well as across other nodes in the cluster.
do I need to create a twin storage account as a "backup" and program copies of all content from one storage to another, or are the contents of a client's Blob Storage automatically redundantly backed up by Microsoft?
Yes, you will need to do backups manually. Azure Storage does not back up the contents of your storage account automatically.
Azure Storage does provide geo-redundant replication (provided you configure the redundancy level for your storage account as GRS or RA-GRS), but that is not backup. Once you delete content from your primary account (location), it will automatically be removed from the secondary account (geo-redundant location).
Both AWS (EBS) and Azure (Blob Storage) provide durability by replicating the data across different data centers. This is how the cloud provider guarantees high availability and durability of the data.
In order to ensure that your data is durable, Azure Storage has the ability to keep (and manage) multiple copies of your data. This is called replication, or sometimes redundancy. When you set up your storage account, you select a replication type. In most cases, this setting can be modified after the storage account is set up.
For more details, refer to the replication section in the documentation.
If you need to capture changes to the storage and allow restoring previous versions (e.g., in situations like data corruption, or application features like restore points and backups), you need to take a snapshot manually. This is common to both AWS and Azure.
For more details on creating a snapshot of a blob in Azure, refer to the documentation.
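As a concrete illustration, here is a minimal snapshot sketch with the Python SDK; the account, container, and blob names are placeholders:

```python
# Sketch: take a manual snapshot of a blob, then read it back by snapshot id.
# Account, container, and blob names are placeholders.
from azure.storage.blob import BlobClient

blob = BlobClient(
    "https://myaccount.blob.core.windows.net",
    container_name="data",
    blob_name="report.csv",
    credential="<account-key>",
)

snapshot = blob.create_snapshot()   # returns snapshot metadata
snapshot_id = snapshot["snapshot"]  # timestamp identifying the snapshot

# Reading the snapshot later, e.g. to restore a previous version:
old = BlobClient(
    "https://myaccount.blob.core.windows.net",
    container_name="data",
    blob_name="report.csv",
    snapshot=snapshot_id,
    credential="<account-key>",
)
previous_bytes = old.download_blob().readall()
```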

Azure storage account: general purpose vs blob storage

Having the need to store and access blobs, which type of storage account is the most appropriate? Both types (general purpose and blob storage) seem to support blobs. In addition, general purpose accounts allow selecting standard or premium performance, while blob storage accounts allow only standard performance but, on the other hand, also allow selecting the access tier (cool or hot).
In the end I find it unclear which would be the best option.
A few differences between Blob and General Purpose Storage Accounts:
Blob storage accounts only support blobs, while general purpose storage accounts support blobs, files, queues & tables (some exceptions apply - please see the note about replication below). So if you ever need these additional services, you may want to choose general purpose accounts over blob accounts.
Blob storage accounts only support block and append blobs, while general purpose storage accounts support block, append & page blobs (some exceptions apply - please see the note about replication below). So if you need to create virtual machines, you would want to choose general purpose accounts over blob accounts.
Blob storage accounts support both Hot and Cool access tiers, while general purpose storage accounts only support the Hot access tier. So if you need to use the Cool access tier, i.e. use storage primarily for near-line archiving, you would want to choose blob accounts over general purpose accounts.
You may want to be careful in choosing the replication type for general purpose accounts, as the features offered vary by replication type:
LRS, GRS, RAGRS: Supports everything. Blobs (Block, Append, Page), Files, Queues & Tables.
ZRS: Supports only block blobs and nothing else.
Premium LRS: Supports only page blobs and nothing else.
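To make the account-kind distinction concrete, here is a sketch of creating each kind with the Python management SDK; the resource group, account names, and location are placeholders:

```python
# Sketch: create a Blob storage account vs. a general-purpose v2 account.
# Resource group, account names, and location are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Blob storage account: blobs only, but lets you pick an access tier.
client.storage_accounts.begin_create(
    "my-rg", "myblobacct",
    StorageAccountCreateParameters(
        sku=Sku(name="Standard_LRS"),
        kind="BlobStorage",
        location="eastus",
        access_tier="Cool",  # access tiers are a blob-account feature
    ),
).result()

# General-purpose (v2) account: blobs, files, queues, tables, page blobs.
client.storage_accounts.begin_create(
    "my-rg", "mygpacct",
    StorageAccountCreateParameters(
        sku=Sku(name="Standard_LRS"),
        kind="StorageV2",
        location="eastus",
    ),
).result()
```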
Microsoft's guidance on this (as of 7/13/2018) recommends using General Purpose v2 (GPv2) Storage Accounts over Blob Storage accounts for two reasons I recently discovered (there might be more):
They offer more flexibility in terms of what you can store and do with them (Queues, Tables, Files and/or Blobs vs just Blobs). Ref: Microsoft Azure Documentation - Azure Storage account options
Microsoft recommends using general-purpose v2 storage accounts over Blob storage accounts for most scenarios.
There are more integration options with GPv2 accounts including Azure Function Triggers via Event Grid. Ref: Microsoft Azure Documentation - Azure Blob storage bindings for Azure Functions
Blob-only storage accounts are supported for blob input and output bindings but not for blob triggers. Blob storage triggers require a general-purpose storage account.
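As a concrete illustration of that trigger constraint, here is a blob trigger sketch in the Azure Functions Python v2 programming model; the container path is a placeholder, and the trigger only fires when the named connection points at a general-purpose account:

```python
# Sketch (Azure Functions, Python v2 programming model): this trigger only
# fires when "AzureWebJobsStorage" points at a general-purpose account.
import azure.functions as func

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob",
                  path="uploads/{name}",             # container/path to watch
                  connection="AzureWebJobsStorage")  # app setting with the account
def on_upload(blob: func.InputStream):
    print(f"New blob: {blob.name}, {blob.length} bytes")
```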

What is the benefit of having linked storage account for HDInsight cluster?

An HDInsight cluster must have at least one Azure storage account as its default storage account; this is required because it is treated as the cluster's filesystem. This I get. But what about the optional linked Azure storage accounts? From an ADF (Azure Data Factory) perspective at least, do we need to have a storage account added as a linked storage account to an HDInsight cluster? An Azure storage account is accessible purely by providing just two pieces of information: the account name and the key. Both of these are specified in Linked Services in ADF, which guarantees access to the storage account. What is the real benefit of having an account added as a linked storage account, from the ADF point of view or otherwise? Basically, what I am asking is: is there anything that we can't do purely using the account name and key, without adding the account as linked storage for the given HDInsight cluster?
The main reason to have additional accounts is that storage accounts have limits. A storage account can hold 500 TB of data and serve 20,000 requests per second. Depending on the size of your cluster and workload, you might hit the request limit. If you are worried about those limits and you don't want to manage a lot of storage accounts, you should look into Azure Data Lake.
I think I sort of figured out the answer. With linked storage accounts, the cluster, when used as compute, can directly access blobs on those storage accounts without requiring us to separately specify the storage keys in queries. That's the use case for which linked storage is a must-have.
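For example, from a Spark job on the cluster (a PySpark sketch; account and container names are placeholders), a linked account is readable directly, while an unlinked one needs its key set in the session first:

```python
# Sketch (PySpark on HDInsight): account and container names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A linked storage account: the cluster already has the key in its
# configuration, so a wasbs:// path just works.
linked_df = spark.read.csv("wasbs://data@linkedacct.blob.core.windows.net/input/")

# An unlinked account: you must supply the key yourself before reading.
spark.conf.set(
    "fs.azure.account.key.otheracct.blob.core.windows.net",
    "<storage-key>",
)
other_df = spark.read.csv("wasbs://data@otheracct.blob.core.windows.net/input/")
```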

Can I use Azure Storage geo-replication as source?

I know Azure will geo-replicate a copy of the current storage account to another location.
My question is: can I access that other location from a program, even just read-only?
I ask because this would allow me to build another deployment in a different geo-location for performance and disaster resilience, like what Azure does internally. With the current setup, if I use the same source storage from a different geo-location, I have to pay extra bandwidth costs.
You can only access your storage account by its primary name. In the event of failover, that name will be mapped to the alternate datacenter. You cannot access the failover storage directly, nor can you choose when to trigger a failover. For a multi-site setup as you described, you'd need to duplicate your data (which would then add the cost of storage in datacenter #2). This does give you ultimate flexibility in your DR and performance planning, but at an added cost of storage and bandwidth (egress-only).
Last week the storage team announced read-only access to the failover storage: Windows Azure Storage Redundancy Options and Read Access Geo Redundant Storage.
This means you can now deploy your application in a different datacenter which can be used for "full" failover (meaning that the storage will also be available there). Even if it's only read-only, your application will still be online - but simply in "degraded" mode.
The steps on how you can implement this with traffic manager are described here: http://fabriccontroller.net/blog/posts/adding-failover-to-your-application-with-read-access-geo-redundant-storage-and-the-windows-azure-traffic-manager/
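Once RA-GRS is enabled, the secondary is exposed at a fixed "-secondary" endpoint using the same account key. A minimal read-only Python sketch, with account and container names as placeholders:

```python
# Sketch: read-only access to the RA-GRS secondary endpoint.
# The secondary shares the primary's account key; names are placeholders.
from azure.storage.blob import BlobServiceClient

secondary = BlobServiceClient(
    "https://myaccount-secondary.blob.core.windows.net",
    credential="<account-key>",
)

# Reads work against the replicated copy; writes are rejected here.
container = secondary.get_container_client("assets")
for blob in container.list_blobs():
    print(blob.name)
```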
