I need to store and access blobs. Which type of storage account is the most appropriate? Both types (general purpose and Blob storage) seem to support blobs; in addition, general purpose accounts allow selecting standard or premium performance, while Blob storage accounts allow only standard performance but, on the other hand, let you select the access tier (cool or hot).
In the end, it's unclear to me what the best option would be.
A few differences between Blob and General Purpose Storage Accounts:
Blob storage accounts only support blobs, while general purpose storage accounts support blobs, files, queues & tables (some exceptions apply - please see the note about replication below). So if you ever need these additional services, you may want to choose a general purpose account over a blob account.
Blob storage accounts only support block and append blobs, while general purpose storage accounts support block, append & page blobs (some exceptions apply - please see the note about replication below). So if you need to create virtual machines, you would want to choose a general purpose account over a blob account.
Blob storage accounts support both Hot and Cool access tiers, while general purpose storage accounts only support the Hot access tier. So if you need the Cool access tier, i.e. you use storage primarily for infrequently accessed data or near-line archiving, you would want to choose a blob account over a general purpose account.
You may want to be careful in choosing the replication type in general purpose accounts, as the features offered vary by replication type.
LRS, GRS, RA-GRS: Supports everything - Blobs (Block, Append, Page), Files, Queues & Tables.
ZRS: Supports only block blobs and nothing else.
Premium LRS: Supports only page blobs and nothing else.
Microsoft's guidance on this (as of 7/13/2018) recommends using General Purpose v2 (GPv2) Storage Accounts over Blob Storage accounts for two reasons I recently discovered (there might be more):
They offer more flexibility in terms of what you can store and do with them (Queues, Tables, Files and/or Blobs vs just Blobs). Ref: Microsoft Azure Documentation - Azure Storage account options
Microsoft recommends using general-purpose v2 storage accounts over Blob storage accounts for most scenarios.
There are more integration options with GPv2 accounts including Azure Function Triggers via Event Grid. Ref: Microsoft Azure Documentation - Azure Blob storage bindings for Azure Functions
Blob-only storage accounts are supported for blob input and output bindings but not for blob triggers. Blob storage triggers require a general-purpose storage account.
Related
I have some e-mail attachments being saved to Azure Blob.
I am now trying to write an Azure Functions app that would connect to that blob storage, run some scripts, and re-save the file.
However, when selecting a storage account for the function, I couldn't select my blob storage account.
I went on the website and it said this:
When creating a function app, you must create or link to a general-purpose Azure Storage account that supports Blob, Queue, and Table storage. Some storage accounts don't support queues and tables. These accounts include blob-only storage accounts and Azure Premium Storage.
I'm wondering, is there any workaround for this? And if not, perhaps any other suggestions? I'm becoming a little lost in all the options and which one to actually choose.
Thanks!
EDIT: Might I add, I am writing the function in Python.
I think you are overlooking the fact that you can have multiple storage accounts. In order for an Azure Function to work, you need a storage account. That storage account is used to store runtime information of the Azure Function for internal purposes like state management. This storage account is subject to the restrictions you already found out about. There is no workaround for that.
However, if the function you are writing needs to access another storage account, it is free to do so. You just have to provide the details to connect to that specific storage account. In that case you also have a clear separation between the storage account that is used by the Azure Function for its internal operations and the storage account your application needs to connect to, over which you have total control, without having to worry that you break things by deleting internally used blobs/tables/queues.
You can have a blob-triggered function that gets triggered when changes occur on your specific blob storage. That doesn't need to be the storage account that the Azure Function uses internally, which is the one created/selected when creating the Azure Function.
Here is a sample that shows how to add a blob-triggered Azure Function in Python. MyStorageAccountAppSetting refers to an app setting that holds the connection string to the storage account that you use for storage.
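A minimal sketch of what such a function might look like (my own illustration, not an official sample; the container name attachments is a placeholder, and MyStorageAccountAppSetting is assumed to hold the connection string to your attachment storage account):

```python
# __init__.py for a blob-triggered function (Python v1 programming model).
# It pairs with a function.json containing a "blobTrigger" binding, e.g.
#   name: "myblob", path: "attachments/{name}", connection: "MyStorageAccountAppSetting"
import logging

import azure.functions as func


def main(myblob: func.InputStream):
    # myblob is the blob that fired the trigger; read its bytes and process them.
    logging.info("Processing blob %s (%s bytes)", myblob.name, myblob.length)
    contents = myblob.read()
    # ... run your scripts over `contents` and save the result wherever needed ...
```

The connection setting points the trigger at your attachment storage account, which can be a different account from the one the Functions runtime itself uses.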
The snippet from the website you are quoting is for storing the function app code itself and any related modules. It does not pertain to what your function can access when the code of your function executes.
When your function executes it will need to use the Azure Blob Storage SDK/modules to connect to your blob storage account and read the email attachments. Here's a quickstart guide for using Azure Storage with Python: Quickstart with Azure Storage Blobs SDK for Python
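As a rough sketch of that approach (assuming the connection string is available in an app setting and using a hypothetical attachments container):

```python
import os

from azure.storage.blob import BlobServiceClient

# Connection string for the storage account holding the e-mail attachments
# (the app setting name below is just an example).
service = BlobServiceClient.from_connection_string(
    os.environ["MyStorageAccountAppSetting"])
container = service.get_container_client("attachments")

for blob in container.list_blobs():
    data = container.download_blob(blob.name).readall()
    # ... run your scripts over the attachment bytes ...
    container.upload_blob(f"processed/{blob.name}", data, overwrite=True)
```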
General-purpose v2 storage accounts support the latest Azure Storage features and incorporate all of the functionality of general-purpose v1 and Blob storage accounts (see here).
There are more integration options with GPv2 accounts including Azure Function Triggers. See: Azure Blob storage bindings for Azure Functions
For further reference: Types of storage accounts
If you go with a Blob storage account, you can choose an access tier based on how frequently the data (the e-mail attachments) is accessed; see Access tiers for Azure Blob Storage - hot, cool, and archive. If you go with a general purpose storage account, it uses the standard performance tier.
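For example, with the azure-storage-blob v12 SDK you can move an individual blob to a different tier (a sketch only; the container and blob names are placeholders):

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection string>")
blob = service.get_blob_client("attachments", "message-123.eml")

# Move an infrequently accessed attachment to the Cool tier.
blob.set_standard_blob_tier("Cool")
```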
In my Azure subscription, I have used both Storage Accounts that are of the type BlobStorage and some that say Storage or StorageV2...
I know the difference: my BlobStorage accounts do NOT support Tables, Files, etc. containers.
But are there other differences that I should be aware of? Is StorageV2 any faster than blob-only storage?
General-purpose v2 storage accounts support the latest Azure Storage
features and incorporate all of the functionality of general-purpose
v1 and Blob storage accounts. General-purpose v2 accounts deliver the
lowest per-gigabyte capacity prices for Azure Storage, as well as
industry-competitive transaction prices.
From the description, the v2 general purpose storage account takes the features of the Blob storage accounts and combines them with those of the general storage account, plus tiering. And I think the most important thing to the customer is the price. Follow this link; there is an example analysing the difference in price between the two storage account types.
No real 'nuances', just feature differences as stated in the document: for new storage you want V2, and for older storage you want to migrate to V2 if possible.
Example:
General-purpose V2
Blob tier: Hot, Cool, Archive
Replication: LRS, ZRS, GRS, RA-GRS
Deployment model: Resource Manager

General-purpose V1
Blob tier: N/A
Replication: LRS, GRS, RA-GRS
Deployment model: Resource Manager, Classic
Hope this helps.
I've searched the web and contacted technical support yet no one seems to be able to give me a straight answer on whether items in Azure Blob Storage are backed up or not.
What I mean is, do I need to create a twin storage account as a "backup" and program copies of all content from one storage to another, or are the contents of a client's Blob Storage automatically redundantly backed up by Microsoft?
I know with AWS, storage is redundantly backed up via onsite drives as well as across other nodes in the cluster.
do I need to create a twin storage account as a "backup" and program
copies of all content from one storage to another, or are the contents
of a client's Blob Storage automatically redundantly backed up by
Microsoft?
Yes, you will need to do backup manually. Azure Storage does not back up the contents of your storage account automatically.
Azure Storage does provide geo-redundant replication (provided you configure the redundancy level for your storage account as GRS or RA-GRS), but that is not backup. Once you delete content from your primary account (location), it will automatically be removed from the secondary account (geo-redundant location).
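As a rough illustration of the "twin account" approach (a sketch only; the connection strings and container names are placeholders, and for a cross-account copy the source URL must be readable by the destination service, e.g. via a SAS token or public read access):

```python
from azure.storage.blob import BlobServiceClient

source = BlobServiceClient.from_connection_string("<primary account connection string>")
backup = BlobServiceClient.from_connection_string("<backup account connection string>")

src = source.get_container_client("data")
dst = backup.get_container_client("data-backup")

for blob in src.list_blobs():
    # Server-side copy of each blob into the backup account; append a SAS token
    # to the URL if the source container is not publicly readable.
    source_url = src.get_blob_client(blob.name).url
    dst.get_blob_client(blob.name).start_copy_from_url(source_url)
```

In practice a tool like AzCopy is often used for this kind of bulk copy as well.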
Both the AWS (EBS) and Azure (Blob Storage) options provide durability by replicating the data across different data centers. This is for the high availability and durability of the data, to provide the guarantee made by the cloud provider.
In order to ensure that your data is durable, Azure Storage has the
ability to keep (and manage) multiple copies of your data. This is
called replication, or sometimes redundancy. When you set up your
storage account, you select a replication type. In most cases, this
setting can be modified after the storage account is set up.
For more details, refer to the replication section in the documentation.
If you need to capture changes to the storage and allow restoring to previous versions (e.g. in situations like data corruption, or application feature requirements like restore points and backups), you need to take a snapshot manually. This is common to both AWS and Azure.
For more details on creating a snapshot of a blob in Azure, refer to the documentation.
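A small sketch of taking and reading a snapshot with the Python SDK (container and blob names are placeholders):

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection string>")
blob = service.get_blob_client("data", "report.csv")

# Create a point-in-time, read-only snapshot of the blob.
snapshot = blob.create_snapshot()
print("Snapshot taken at:", snapshot["snapshot"])

# Read the snapshot later, even if the base blob has changed since.
snapshot_blob = service.get_blob_client("data", "report.csv",
                                        snapshot=snapshot["snapshot"])
old_contents = snapshot_blob.download_blob().readall()
```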
Is it possible to make a blob be able to auto delete after a certain time?
I need to delete my blobs a few hours after they were uploaded to Azure; I don't need to store them for more than 10 days.
Not at this time, unfortunately. Using WebJobs or something similar, this is something that could be accomplished on top of Azure Storage, but there is nothing offered by the platform itself.
Since March 2019, this is possible with Lifecycle management support in Azure Blob Storage. See https://stackoverflow.com/a/57305518/347805
Azure Blob storage lifecycle management offers a rich, rule-based
policy for GPv2 and Blob storage accounts. Use the policy to
transition your data to the appropriate access tiers or expire at the
end of the data's lifecycle.
The lifecycle management policy lets you:
Transition blobs to a cooler storage tier (hot to cool, hot to archive, or cool to archive) to optimize for performance and cost
Delete blobs at the end of their lifecycles
Define rules to be run once per day at the storage account level
Apply rules to containers or a subset of blobs (using prefixes as filters)
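As an illustration (my own sketch, not taken from the linked answer), a rule that deletes blobs 10 days after their last modification might look like the following policy; the rule name and prefix are placeholders, and the resulting JSON can be applied to a GPv2 or Blob storage account through the portal, the Azure CLI, or the management SDK:

```python
import json

# Hypothetical lifecycle rule: delete block blobs under "attachments/"
# 10 days after they were last modified.
policy = {
    "rules": [
        {
            "name": "delete-old-attachments",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["attachments/"],
                },
                "actions": {
                    "baseBlob": {
                        "delete": {"daysAfterModificationGreaterThan": 10}
                    }
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```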
In short, it is NOT POSSIBLE to make a blob auto-delete after a certain time by any setting/configuration on the blob itself in Azure at this time.
You will need to rely on other services such as Azure WebJobs or Azure Automation to automate such a task.
In Azure, when creating a VM, you can choose to "Use an automatically generated storage account".
Which generates a geo-replicated storage account.
Can I create a zone replicated storage account and use that? I have tried, but it does not show up in the list.
What are the prerequisites for a storage account to be usable for VMs?
Is this documented somewhere?
I think the only restriction is that you can't choose a ZRS account to be used by VMs, because VMs are essentially stored as page blobs in blob storage and ZRS accounts only support block blobs. From the storage team blog here - http://blogs.msdn.com/b/windowsazurestorage/archive/2014/08/01/introducing-zone-redundant-storage.aspx:
As you can see, these options provide a continuum of durability and
availability options. ZRS fits between LRS and GRS in terms of
durability and price. ZRS stores 3 replicas of your data across 2 to 3
facilities. It is designed to keep all 3 replicas within a single
region, but may span across two regions. ZRS currently only supports
block blobs. ZRS allows customers to store blobs at a higher durability
than a single facility can provide with LRS. ZRS accounts do not have
metrics or logging capability enabled at this time.