What is the maximum capacity of Azure Blob Storage account? - azure

If I am creating an Azure Storage account v2, what is the maximum capacity (or maximum size) of files we can store in blob storage? I see some docs mentioning 500 TB as the limit. Does that mean that once the storage account reaches the 500 TB limit it will stop accepting uploads? Or is there a way to store more files by paying more?

It depends on the region. According to https://learn.microsoft.com/en-us/azure/azure-subscription-service-limits#storage-limits, storage accounts in the US and Europe can be up to 2 PB; all other regions are limited to 500 TB. As mentioned by Alfred below, you can request an increase if you need to (see the new maximum sizes here: https://azure.microsoft.com/en-us/blog/announcing-larger-higher-scale-storage-accounts/)
I have yet to see a storage account hit the limit, but I would anticipate you would hit an error trying to upload a file at max capacity. I would advise designing your application to make use of multiple storage accounts to avoid hitting this limit (if you are expecting to exceed 500TB).
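If you do expect to approach the cap, one way to spread load across multiple storage accounts is to pick the account for each blob deterministically from its name. A minimal Python sketch (the account names and the four-account split are made up for illustration):

```python
import hashlib

# Hypothetical account names for illustration only.
ACCOUNTS = ["mydata000", "mydata001", "mydata002", "mydata003"]

def account_for_blob(blob_name: str) -> str:
    """Pick a storage account for a blob by hashing its name.

    The same blob name always maps to the same account, so readers can
    recompute the location without a lookup table.
    """
    digest = hashlib.sha256(blob_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(ACCOUNTS)
    return ACCOUNTS[index]
```

Because the mapping is stable, adding capacity later means adding accounts and re-hashing only the blobs you choose to migrate.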

You can ask support to increase the limit:
https://azure.microsoft.com/en-us/blog/announcing-larger-higher-scale-storage-accounts/

Related

"Storage account management operations" and ClientThrottlingError

According to Storage account scale limits, each storage account in Azure can handle 20,000 requests per second.
But there is also Storage resource provider scale limits, which restricts storage account management operations (read) to 800 requests per 5 minutes.
We seem to have reached the latter limit, and we are wondering what kind of operations count as storage account management operations. We had a few minutes of intermittent 503 responses in our production system this morning, with 2,600 GetBlob operations in 5 minutes.
Which operations count as Storage account management operations?
Does it matter whether we use BlobClient from the blob storage SDK, or HttpClient from .NET?
How do we read blob properties and metadata, and download blobs, to (possibly) achieve 20,000 requests per second?
Are there any other ideas on what can lead to throttling when the load isn't that high altogether?
UPDATE:
After communication with Microsoft support (the proper ones...), they could inform us of the following:
The type of throttling you experienced is a partition throttling error. This type of error occurs when the client does too many requests against the same partition server. When such happens and the partition server gets overloaded, it does internal load balancing operations as part of the normal azure storage healing process.
When the partition being accessed suffers a load balancing operation (reassigning partitions to less loaded servers), the storage service returns 500 or 503 errors.
The limits I previously mentioned (the 800 reads for 5 minutes) are indeed for management operations and not for data ones. In your case, the GetBlob ones are data operations and are not covered by these hard limits. After analyzing the ingress/egress limit and also the transactions per second of your storage account, I verified that you also seem to be far away from hitting the threshold.
Just for the record and improved searchability: In Metrics these errors showed up as ClientOtherError and ClientThrottlingError.
Which operations count as Storage account management operations?
All the operations listed here are considered storage account management operations. Essentially, the operations that manage the storage accounts themselves (and not the data in them) are management operations.
Does it matter whether we use BlobClient from the blob storage SDK, or HttpClient from .NET?
No. These operations deal with the data and are not considered management operations; they have a separate throughput limit.
How do we read blob properties and metadata, and download blobs, to (possibly) achieve 20,000 requests per second?
Please see answer to previous question.
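Since the 500/503 responses during partition load balancing are transient, the standard client-side mitigation is retry with exponential backoff and jitter. A rough Python sketch, assuming a generic `fetch` callable that returns an HTTP status code (the real Azure SDKs ship with built-in retry policies that should be preferred):

```python
import random
import time

# Status codes the storage service returns while a partition is being
# load-balanced; both are safe to retry for idempotent reads like GetBlob.
RETRYABLE = {500, 503}

def with_backoff(fetch, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `fetch` on transient errors with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        # Full jitter: wait between 0 and base_delay * 2^attempt seconds,
        # giving the overloaded partition server time to recover.
        sleep(random.uniform(0, base_delay * (2 ** attempt)))
    return status, body
```

The `sleep` parameter is injectable so the policy can be tested without real delays.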

Estimating Azure Search Cost

I'm fairly new to Azure platform and need some help with cost estimates for Azure Search service.
Every month we will have about 500 GB worth of files that will be dropped into Azure Blob Storage. We would like to index these files using Azure Search, based only on the file names.
When I look at the Standard S2 pricing, it has the following text for storage:
100 GB/partition (max 1.2 TB documents per service). What does this mean? Does it mean that once my storage crosses 1.2TB, I'll need to purchase another Service?
Any help would be greatly appreciated.
If a tier's capacity turns out to be too low based on your needs, you will need to provision a new service at the higher tier and then reload your indexes. Kindly note that there is no in-place upgrade of the same service from one SKU to another.
Storage is constrained by disk space or by a hard limit on the maximum number of indexes, document, or other high-level resources, whichever comes first.
A service is provisioned at a specific tier. Jumping tiers to gain capacity involves provisioning a new service (there is no in-place upgrade). For more information, see Choose a SKU or tier. To learn more about adjusting capacity within a service you've already provisioned, see Scale resource levels for query and indexing workloads.
Check the document for more details on this topic.
S2 currently offers 100 GB of storage per partition, with up to 12 partitions per service (1.2 TB in total).
You could use Pricing Calculator (https://azure.microsoft.com/pricing/calculator/) for the cost estimation as well.
The storage limits for Azure Search refer to the size of documents in the index, which can be bigger or smaller than your original blobs depending on your use case.
For example, if you want to do text searches on blob content then the index size will be bigger than your original blobs. Whereas if you only want to search for file names, then size of the original blobs becomes irrelevant and only the number of blobs will affect your Azure Search index size.
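As a rough back-of-envelope for a filename-only index, you can estimate the partitions needed from the document count alone. The ~512-byte per-document figure below is an assumption covering the file name plus index overhead; measure with a sample index before committing to a tier:

```python
import math

# S2 tier limits as quoted in this thread; verify against current docs.
S2_PARTITION_GB = 100
S2_MAX_PARTITIONS = 12

def partitions_needed(doc_count, avg_doc_bytes=512):
    """Estimate S2 partitions for an index of doc_count small documents."""
    index_gb = doc_count * avg_doc_bytes / (1024 ** 3)
    needed = max(1, math.ceil(index_gb / S2_PARTITION_GB))
    if needed > S2_MAX_PARTITIONS:
        raise ValueError("exceeds S2 capacity; consider a higher tier")
    return needed

# 10 million file names at ~512 bytes each fit comfortably in one partition.
```

At that document size you would need on the order of billions of file names before the 1.2 TB service cap became a concern.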

Azure blob storage throttling

We are trying to move some data from one of our blob storage accounts and we are getting throttled.
Initially we were getting 9 Gbps, but soon after we were throttled down to 1.1 Gbps.
We also started receiving errors saying that Azure forcibly closed the connection and we were getting network timeouts.
Has anyone experienced this or have any knowledge around increasing limits?
According to the official document Storage limits in Azure subscription and service limits, quotas, and constraints, the following limits apply to your scenario and cannot be worked around:
Maximum request rate per storage account: 20,000 requests per second
Max egress:
for general-purpose v2 and Blob storage accounts (all regions): 50 Gbps
for general-purpose v1 storage accounts (US regions): 20 Gbps if RA-GRS/GRS enabled, 30 Gbps for LRS/ZRS
for general-purpose v1 storage accounts (non-US regions): 10 Gbps if RA-GRS/GRS enabled, 15 Gbps for LRS/ZRS
Target throughput for a single blob: up to 60 MiB per second, or up to 500 requests per second
When downloading data to a local environment, apart from your network bandwidth and stability, you have to keep the number of concurrent requests per blob below 500 and the total number of requests to the account below 20,000 if you want to move the data programmatically. So controlling this high concurrency is the key point.
If you are just moving data within Azure, or not doing it programmatically, the best way is to use the official data transfer tools: AzCopy (for Windows or Linux) or Azure Data Factory. Then you don't need to worry about these limits; just wait for the move to complete.
If you have any concerns, please feel free to let me know.
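If you do move the data programmatically, a simple way to stay under both targets is to split the account-wide request budget across the blobs you transfer concurrently. A sketch in Python, using the limits quoted above (real limits vary by account type and region):

```python
# Documented targets quoted in this thread; check current docs.
ACCOUNT_RPS_LIMIT = 20_000   # requests/second per storage account
BLOB_RPS_LIMIT = 500         # requests/second per individual blob

def per_blob_rps(concurrent_blobs):
    """Requests/second each transfer worker may issue against one blob.

    The account-wide budget is divided evenly, then capped by the
    per-blob target so a small transfer job doesn't overdrive one blob.
    """
    if concurrent_blobs <= 0:
        raise ValueError("need at least one blob")
    return min(BLOB_RPS_LIMIT, ACCOUNT_RPS_LIMIT // concurrent_blobs)
```

With 10 concurrent blobs each worker can run at the full per-blob target of 500 req/s; at 100 concurrent blobs the account limit becomes the bottleneck and each worker should throttle to 200 req/s.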

Azure: Why is it advised to use multiple storage accounts for a virtual machine scale set?

In the documentation for Virtual Machine Scale Sets it says
Spread out the first letters of storage account names as much as possible
I have two questions to this:
Why should you use multiple Storage Accounts at all?
Why is Azure creating 5 storage accounts if I create a new Virtual Machine Scale Set through the portal?
Why should I spread the first letters as much as possible?
The answer to this lies in the limits of Azure. If you look at the storage limits specifically, you will find that the storage account is capped at 20k IOPS.
Total Request Rate (assuming 1KB object size) per storage account
Up to 20,000 IOPS, entities per second, or messages per second
So that means that your Scale Set would effectively be capped at 20k IOPS, no matter how many VM's you put in it.
As for the storage Account naming, I have no clue, but looking at the templates they are linking to, they are not doing it:
"uniqueStringArray": [
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '0')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '1')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '2')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '3')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '4')))]"
],
I suspect this may somehow be linked to how the storage accounts are distributed among the nodes hosting them (say, accounts starting with 'A' are all hosted on the same cluster or nearby clusters).
It's about avoiding throttling
https://learn.microsoft.com/en-us/azure/storage/storage-scalability-targets
For standard storage accounts: A standard storage account has a maximum total request rate of 20,000 IOPS. The total IOPS across all of your virtual machine disks in a standard storage account should not exceed this limit.
You can roughly calculate the number of highly utilized disks supported by a single standard storage account based on the request rate limit. For example, for a Basic Tier VM, the maximum number of highly utilized disks is about 66 (20,000/300 IOPS per disk), and for a Standard Tier VM, it is about 40 (20,000/500 IOPS per disk), as shown in the table below.
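The quoted arithmetic is straightforward to reproduce:

```python
# Standard storage account request-rate cap from the quoted docs.
ACCOUNT_IOPS_CAP = 20_000

def max_disks(iops_per_disk):
    """Highly utilized disks one standard storage account can support."""
    return ACCOUNT_IOPS_CAP // iops_per_disk

# Basic tier VM disks (~300 IOPS each)    -> 66 disks per account
# Standard tier VM disks (~500 IOPS each) -> 40 disks per account
```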
There is no price difference between 5 storage accounts and 1, so why not 5?
If you create 5 storage accounts, they may land on different storage stamps (datacenter infrastructure), so you have less chance of being throttled and a better chance of distributing the traffic load. So I think those are the reasons.

Azure Cloud Role included storage = extra costs?

I'm currently working out the cost-analysis for my upcoming Azure project. I am tempted to use a Azure Cloud Role, because it has a certain amount of storage included in the offer. However, I have the feeling that it is too good to be true.
Therefore, I was wondering. Do you have to pay transaction-costs/ storage costs on this "included" storage? I can't find any information about this on the Azure website, and I want to be as accurate as possible (even if the cost of transactions is almost nothing).
EDIT:
To clarify, I specifically want to know about the transaction costs on the storage. Do you have to pay a small cost per transaction on the storage (like with Blob/Table storage), or is this included in the offer as well?
EDIT 2:
I am talking about the storage included with the Cloud Services (web/worker) and not a separate Table/blob storage.
Can you clarify which offer you're referring to?
With Cloud Services (web/worker roles), each VM instance has some local storage associated with it, which is free of charge and, because it's a local disk, there are no transactions or related fees associated with this storage. As Rik pointed out in his answer, that data is not durable: it's on a single disk and will be gone forever if, say, the disk crashes.
If you're storing data in Blobs, Tables, or Queues (Windows Azure Storage), then you pay per GB ($0.095 per GB per month for geo-redundant storage, or $0.07 per GB per month for locally-redundant storage), and a penny per 100,000 transactions. And as long as your storage account is in the same data center as your Cloud Service, there are no data egress fees.
Now we come back to the question of which offer you're referring to. The free 90-day trial, for instance, comes with 70GB of Windows Azure Storage, and 50M transactions monthly included. MSDN subscriptions come with included storage and transactions as well. If you're just working with a pay-as-you-go subscription, you'll pay for storage plus transactions.
The storage is included, but not guaranteed to be persistent. Your role could be shut down and started in a different physical location, which has no impact on the availability of your role, but you'll lose whatever you have in storage, i.e. the included storage is very much temporary.
As for transaction costs, you only pay for outgoing data, not incoming data or data within Azure (one role to another).
You pay per GB, and $0.01 per 100,000 transactions.
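For completeness, the cost arithmetic with the (now dated) prices quoted in this thread looks like this; check current pricing before relying on the numbers:

```python
# Prices as quoted above: $0.095/GB-month geo-redundant,
# $0.07/GB-month locally redundant, $0.01 per 100,000 transactions.
def monthly_storage_cost(gb, transactions, geo_redundant=True):
    per_gb = 0.095 if geo_redundant else 0.07
    return gb * per_gb + (transactions / 100_000) * 0.01

# e.g. 100 GB geo-redundant with 5 million transactions:
# 100 * 0.095 + 50 * 0.01 = $10.00
```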
