We are trying to move some data from one of our blob storage accounts and we are getting throttled.
Initially we were getting 9 Gbps, but soon afterwards we were throttled down to 1.1 Gbps.
We also started receiving errors saying that Azure forcibly closed the connection and we were getting network timeouts.
Has anyone experienced this or have any knowledge around increasing limits?
According to the official Storage limits section of Azure subscription and service limits, quotas, and constraints, the following limits apply to your scenario and cannot be worked around:
Maximum request rate per storage account: 20,000 requests per second
Max egress:
for general-purpose v2 and Blob storage accounts (all regions): 50 Gbps
for general-purpose v1 storage accounts (US regions): 20 Gbps if RA-GRS/GRS enabled, 30 Gbps for LRS/ZRS
for general-purpose v1 storage accounts (non-US regions): 10 Gbps if RA-GRS/GRS enabled, 15 Gbps for LRS/ZRS
Target throughput for a single blob: up to 60 MiB per second, or up to 500 requests per second
When downloading data to a local environment, besides your network bandwidth and stability, you have to keep the request rate per blob under 500 requests per second and the total request rate for the account under 20,000 requests per second if you want to move the data programmatically. Controlling that concurrency is the key point.
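As a rough illustration only (the official tools mentioned below handle this for you), one way to bound the in-flight request count when moving data programmatically is a fixed-size thread pool. The worker count, the SAS-bearing URLs, and the single-GET-per-blob approach are all assumptions you would tune for your own account:

# Minimal sketch of bounding concurrency when downloading blobs programmatically.
# Assumptions: each blob URL already carries a SAS token, and a plain HTTP GET is
# used per blob (the SDK or AzCopy would normally handle chunking and retries).
import concurrent.futures
import urllib.request

# Keep well below the documented targets: 500 requests/s per blob and
# 20,000 requests/s per storage account. Concurrency is only a rough proxy
# for request rate; tune it against your actual request latency.
MAX_CONCURRENT_REQUESTS = 64

def download(url: str, dest_path: str) -> str:
    # One GET per blob; large blobs would normally be read in ranges/chunks.
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
    return dest_path

def download_all(urls_and_paths):
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as pool:
        futures = [pool.submit(download, url, path) for url, path in urls_and_paths]
        for f in concurrent.futures.as_completed(futures):
            f.result()  # re-raise any download error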
If you are just moving data within Azure, or would rather not do it programmatically, the best approach is to use the official data transfer tools AzCopy (for Windows or Linux) or Azure Data Factory. You then do not need to worry about these limits yourself; just wait for the transfer to finish.
If you have any concerns, please feel free to let me know.
Related
According to Storage account scale limits, each storage account in Azure can handle 20,000 requests per second.
But there is also Storage resource provider scale limits, which restricts storage account management operations (read) to 800 requests per 5 minutes.
We seem to have reached the latter limit, and we are wondering what kind of operations are counted as Storage account management operations. We got a few minutes with intermittent 503 responses in our production system this morning, having 2600 GetBlob operations in 5 minutes.
Which operations count as Storage account management operations?
Does it matter whether we use BlobClient from the blob storage SDK, or HttpClient from .NET?
How do we read blob properties and metadata, and download blobs to (possibly) achieve 20,000 requests per second?
Are there any other ideas on what can lead to throttling when the load isn't that high altogether?
UPDATE:
After communicating with Microsoft support (the proper ones...), they informed us of the following:
The type of throttling you experienced is a partition throttling error. This type of error occurs when the client does too many requests against the same partition server. When such happens and the partition server gets overloaded, it does internal load balancing operations as part of the normal azure storage healing process.
When the partition being accessed suffers a load balancing operation (reassigning partitions to less loaded servers), the storage service returns 500 or 503 errors.
The limits I previously mentioned (the 800 reads for 5 minutes) are indeed for management operations and not for data ones. In your case, the GetBlob ones are data operations and are not covered by these hard limits. After analyzing the ingress/egress limit and also the transactions per second of your storage account, I verified that you also seem to be far away from hitting the threshold.
Just for the record and improved searchability: In Metrics these errors showed up as ClientOtherError and ClientThrottlingError.
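The practical client-side mitigation for these transient 500/503 responses is to retry with exponential backoff and jitter, which the Azure Storage SDKs generally do for you. A minimal hand-rolled sketch of the idea (plain HTTP, placeholder URL handling, not the SDK's actual retry policy):

# Sketch of retrying transient 500/503 responses with exponential backoff plus
# jitter, for the partition load-balancing case described above.
import random
import time
import urllib.error
import urllib.request

def get_with_retries(url: str, max_attempts: int = 5) -> bytes:
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in (500, 503) or attempt == max_attempts - 1:
                raise
            # Back off exponentially (1s, 2s, 4s, ...) with jitter so many
            # clients do not retry against the same partition in lockstep.
            time.sleep((2 ** attempt) + random.random())
    raise RuntimeError("unreachable")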
Which operations count as Storage account management operations?
All the operations listed here are considered storage account management operations. Essentially, the operations you perform to manage the storage accounts themselves (and not the data in them) are considered management operations.
Does it matter whether we use BlobClient from the blob storage SDK, or HttpClient from .NET?
No. These operations deal with the data and are not considered management operations. They have a separate throughput limit.
How do we read blob properties and metadata, and download blobs to (possibly) achieve 20,000 requests per second?
Please see answer to previous question.
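For completeness: reading properties, reading metadata, and downloading content are all data-plane calls and so count against the 20,000 requests/s data limit, not the 800-per-5-minutes management limit. A minimal sketch using the Python azure-storage-blob SDK (the .NET BlobClient exposes equivalent methods; the connection string, container, and blob names are placeholders):

# Data-plane reads against a blob: properties/metadata and content download.
from azure.storage.blob import BlobServiceClient

# Placeholders: supply your own connection string, container, and blob names.
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="my-container", blob="my-blob.txt")

props = blob.get_blob_properties()     # HEAD-style call: properties and metadata
print(props.size, props.metadata)

data = blob.download_blob().readall()  # GetBlob: downloads the content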
If I am creating an Azure Storage Account v2, what is the maximum capacity (or maximum size) of the files we can store in blob storage? I see some docs mentioning 500 TB as the limit. Does that mean once the storage account reaches that 500 TB limit it will stop accepting uploads? Or is there a way to store more files by paying more?
It depends on the region. According to https://learn.microsoft.com/en-us/azure/azure-subscription-service-limits#storage-limits, US and Europe can have storage accounts of up to 2 PB; all other regions are limited to 500 TB. As mentioned by Alfred below, you can request an increase if you need to (see the new maximum sizes here: https://azure.microsoft.com/en-us/blog/announcing-larger-higher-scale-storage-accounts/).
I have yet to see a storage account hit the limit, but I would anticipate you would hit an error trying to upload a file at max capacity. I would advise designing your application to make use of multiple storage accounts to avoid hitting this limit (if you are expecting to exceed 500TB).
You can ask support to increase the limit:
https://azure.microsoft.com/en-us/blog/announcing-larger-higher-scale-storage-accounts/
I am confused about Azure Bandwidth outbound data-transfer pricing. The official website says that the First 5 GB/Month is free.
Suppose I have used 5 GB in January; will the counter reset in February and start counting the 5 GB again? Is the first 5 GB free every month? Is this bandwidth free irrespective of the resource, i.e. Virtual Machines, App Services, etc.?
Yes, the first 5 GB of Outbound Data Transfer is free each month. This means any and all* outbound traffic from your Azure Resources.
So either when you're downloading data from Azure Storage, have a (data intensive) app running in a VM sending out a lot of data or download Azure SQL Database backups every night: you're consuming outbound data.
Please be advised that data going out of the Azure Region is counted as outbound traffic. So data that you copy between Azure Regions is counted as outbound data.
*As you can see in the article you shared:
Bandwidth refers to data moving in and out of Azure data centers other than those explicitly covered by the Content Delivery Network or ExpressRoute pricing.
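To make the monthly reset concrete, here is a tiny sketch of the math; the per-GB rate used below is a placeholder, not an official price, so check the Azure bandwidth pricing page for your region:

# Back-of-the-envelope sketch of monthly egress billing with the 5 GB free tier.
FREE_EGRESS_GB = 5
PRICE_PER_GB = 0.087  # placeholder rate, not an official price

def monthly_egress_cost(egress_gb: float) -> float:
    billable = max(0.0, egress_gb - FREE_EGRESS_GB)  # the free 5 GB resets every month
    return billable * PRICE_PER_GB

print(monthly_egress_cost(4))    # 0.0  -> still within the free 5 GB
print(monthly_egress_cost(105))  # 100 GB billed at the placeholder rate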
In the documentation for Virtual Machine Scale Sets it says
Spread out the first letters of storage account names as much as possible
I have two questions to this:
Why should you use multiple Storage Accounts at all?
Why does Azure create 5 Storage Accounts if I create a new Virtual Machine Scale Set through the portal?
Why should I spread the first letters as much as possible?
The answer to this lies in the limits of Azure. If you look at the storage limits specifically, you will find that the storage account is capped at 20k IOPS.
Total Request Rate (assuming 1 KB object size) per storage account: up to 20,000 IOPS, entities per second, or messages per second
So that means your Scale Set would effectively be capped at 20k IOPS, no matter how many VMs you put in it.
As for the storage account naming, I have no clue, but looking at the templates they link to, they are not doing it:
"uniqueStringArray": [
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '0')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '1')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '2')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '3')))]",
"[concat(uniqueString(concat(resourceGroup().id, variables('newStorageAccountSuffix'), '4')))]"
],
I suspect this may be somehow linked to how the storage accounts are distributed among the nodes hosting them (say, accounts starting with 'A' are all hosted on the same cluster or nearby clusters).
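Purely as an illustration of the "spread the first letters" advice (and building on the speculation above, which is not documented behavior), a hypothetical naming helper could force distinct leading letters like this:

# Hypothetical helper (not from the quoted template): generate storage account
# names whose first letters are spread across the alphabet.
import hashlib
import string

def spread_account_names(base: str, count: int) -> list[str]:
    names = []
    for i in range(count):
        digest = hashlib.sha1(f"{base}-{i}".encode()).hexdigest()
        # Force a different leading letter per account, then fill with hash chars.
        first = string.ascii_lowercase[i % 26]
        names.append((first + digest)[:24])  # account names: 3-24 lowercase alphanumerics
    return names

print(spread_account_names("myscaleset", 5))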
It's about avoiding throttling
https://learn.microsoft.com/en-us/azure/storage/storage-scalability-targets
For standard storage accounts: A standard storage account has a maximum total request rate of 20,000 IOPS. The total IOPS across all of your virtual machine disks in a standard storage account should not exceed this limit.
You can roughly calculate the number of highly utilized disks supported by a single standard storage account based on the request rate limit. For example, for a Basic Tier VM, the maximum number of highly utilized disks is about 66 (20,000/300 IOPS per disk), and for a Standard Tier VM, it is about 40 (20,000/500 IOPS per disk), as shown in the table below.
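The disk counts quoted above come straight from dividing the account's request-rate target by the per-disk IOPS; as a quick check:

# Reproducing the doc's arithmetic: highly utilized disks per standard storage account.
ACCOUNT_IOPS_LIMIT = 20_000

for tier, iops_per_disk in {"Basic": 300, "Standard": 500}.items():
    print(tier, ACCOUNT_IOPS_LIMIT // iops_per_disk)
# Basic 66, Standard 40 -- matching the figures quoted above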
There is no price difference between 5 storage accounts and 1, so why not 5?
If you create 5 storage accounts in different storage racks/stamps (datacenter infrastructure), you have less chance of being throttled, and there is a better chance of the traffic load being distributed. So I think those are the reasons.
I'm currently working out the cost analysis for my upcoming Azure project. I am tempted to use an Azure Cloud Role, because it has a certain amount of storage included in the offer. However, I have the feeling that it is too good to be true.
Therefore, I was wondering: do you have to pay transaction costs/storage costs on this "included" storage? I can't find any information about this on the Azure website, and I want to be as accurate as possible (even if the cost of transactions is almost nothing).
EDIT:
To clarify, I specifically want to know about the transaction costs on the storage. Do you have to pay a small cost per transaction on the storage (like with Blob/Table storage), or is this included in the offer as well?
EDIT 2:
I am talking about the storage included with Cloud Services (web/worker roles), not separate Table/Blob storage.
Can you clarify which offer you're referring to?
With Cloud Services (web/worker roles), each VM instance has some local storage associated with it, which is free of charge and, because it's a local disk, there are no transactions or related fees associated with this storage. As Rik pointed out in his answer, that data is not durable: it's on a single disk and will be gone forever if, say, the disk crashes.
If you're storing data in Blobs, Tables, or Queues (Windows Azure Storage), then you pay per GB ($0.095 per GB per month for geo-redundant storage, or $0.07 per GB per month for locally-redundant storage), and a penny per 100,000 transactions. And as long as your storage account is in the same data center as your Cloud Service, there are no data egress fees.
Now we come back to the question of which offer you're referring to. The free 90-day trial, for instance, comes with 70GB of Windows Azure Storage, and 50M transactions monthly included. MSDN subscriptions come with included storage and transactions as well. If you're just working with a pay-as-you-go subscription, you'll pay for storage plus transactions.
The storage is included, but not guaranteed to be persistent. Your role could be shut down and started in a different physical location, which has no impact on the availability of your role, but you'll lose whatever you have in that storage, i.e. the included storage is very much temporary.
As for transaction costs, you only pay for outgoing data, not incoming data or data within Azure (one role to another).
You pay per GB, and $0.01 per 100,000 transactions.
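Putting the two rates from this answer together, the monthly bill is a simple sum; the rates below are the ones quoted above and may well be out of date:

# Sketch of the pay-as-you-go math for Windows Azure Storage, using the prices
# quoted in this answer ($0.07/GB/month locally-redundant, $0.01 per 100,000 transactions).
def monthly_storage_cost(gb_stored: float, transactions: int,
                         price_per_gb: float = 0.07,
                         price_per_100k_tx: float = 0.01) -> float:
    return gb_stored * price_per_gb + (transactions / 100_000) * price_per_100k_tx

# e.g. 50 GB of locally-redundant storage plus 2 million transactions in a month
print(round(monthly_storage_cost(50, 2_000_000), 2))  # 3.7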