Azure BLOB storage phantom requests - azure

I see strange requests when uploading blobs to storage. The only methods I use is PutBlob and SetBlobTier. But metrics shows large amount of GetBlobProperties requests with time interval about 1 hour. It seems like Azure makes some extra requests for statistic purposes. It happens only when uploading process is running. At attached diagram you can see 4 peaks of GetBlobProperties requests.
Does anybody know what is it? Another question is, will I be billed for this requests?

Any API call is considered to be a transaction:
Transactions – Each individual Blob, Table and Queue REST request to the storage service is considered as a potential transaction for billing. Applications can then control their transaction costs by controlling how often and how many requests they send to the storage service. We analyze each request received and then classify it as billable or not billable based upon our ability to process the request and the request’s outcome.
For that specific transaction, my guess is you're using Blob Archive(correct me if i'm wrong), the prices are as below:
It will be considered as All other Operations(per 10,000) pricing.
What could be causing it?
Due to a bug in our system, some internal transactions are miscategorized as customer requests when blobs transition between cold and archive storage tiers. These unexpected transactions will also be visible in both Azure Monitor as well as Storage Analytics logging. We are working on a fix for this and will deploy it as soon as possible. We apologize for any inconvenience and confusion this may have caused.

Related

Handling long-running tasks using Azure Functions and Azure Storage

I want to use Azure to handle long running tasks that can’t be handled solely by a web server as they exceed the 2 min HTTP limit (and would put unnecessary load on it regardless). In this case, it’s the generation of a PDF report that can take some time (between 2-5 mins). I’ve seen examples of solutions for this using other technologies (Celery, RabbitMQ, AWS Lamda, etc.) but not much using what's available on Azure (Functions and Storage in this case).
Details of my proposed solution are as follows (a diagram is here)
API (that has 3 endpoints):
Generate report – post a message to Azure Queue Storage
Get report generation status – query Azure Table Storage for status
Get report – retrieve PDF from Azure Blob Storage
Azure Queue Storage
Receives a message from the API containing parameters of the requested report
Azure Function
Triggered when a message is added to Azure Queue Storage
Creates report generation status record in Azure Table storage, set to ‘Generating’
Generate a report based on parameters contained in the message
Stores output PDF in Azure Blob Storage
Updates report generation status record in Azure Table storage to ‘Completed’
Azure Table Storage
Contains a table of report generation requests and associated status
Azure Blob Storage
Stores PDF reports
Other points
The app isn’t built yet – so there is no base case I’m comparing against (e.g. Celery/RabbitMQ)
The time it takes to run the report isn’t super important (i.e. I’m not concerned about Azure Function cold starts)
There’s no requirement for immediate notification of completion using something like Webhooks – the client will poll the API every so often using the get report generation endpoint.
There won't be much usage of the app, so having an always active server to handle tasks (vs Azure Function) seems to be a waste of money.
If I find that report generation takes longer than 10 mins, I can split it up into more than one Azure Function (to avoid consumption plan hard limit of 10 mins execution time per function)
My question is essential whether or not this is a good approach (to me it seems good, and relatively cost-effective, I’m just not sure if there’s something I’m missing).
This can be simplified using Durable Functions. Most of the job is already handled by the framework and you also can query an endpoint to check for the completion status.
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp

Fastest Azure Blob Storage Replication method?

We have a situation where we now need to replicate all data from our Blob Storage account to another node in the same rough region (think WE Europe to NE Europe).
Currently we have only 1 application that writes content to certain containers, and other applications that write to other containers.
I'd ideally like to replicate all of them.
I've looked into the following:
Azure Blob Storage Geo-replication.
Azure Event Grid
Just writing the blobs in other applications themselves.
The issue with 1. is that the delivery time of the updated blob is not guaranteed. This could take upto 5 minutes, for example, which is too long in our use case. This is defined in their SLA.
The issue with 2. is that there is no guarantee of delivery time - only that delivery will take place at least once, whether it's to the intended destination, or a deadletter queue.
The issue with 3. is latency, cost & having to reengineer a few applications at once.
Expanding on 3. - we have an application that listens to messages published to Azure Service Bus. It acts upon those messages and creates blobs based off data that is stored in a DB on-site. It then uploads the blobs to the specific container it pertains to, and publshes a message saying it has done so.
The problem here is that we'd need to write from one data center, to another, incurring bandwidth charges, and ask this application to do double the work.
We would also need to ensure that blobs are written, and implement a method to ensure consistency - i.e. if one location goes down, we need to be able to replicate once it's back up.
So - what's the fastest replication method here? I'm assuming it's number 3. However, could there be another solution that has not been thought of so far?
Edit: 4. Have the generated files sent to Service Bus as messages and be picked up by applications dedicated to write, hosted in each geo-location.
Issue here is limitations on message size (makes sense!) we have files over 1MB.

"Storage account management operations" and ClientThrottlingError

According to Storage account scale limits each storage account in Azure can handle 20.000 requests per second.
But there is also Storage resource provider scale limits that restricts Storage account management operations (read) to 800 request per 5 minutes.
We seem to have reached the latter limit, and we are wondering what kind of operations are counted as Storage account management operations. We got a few minutes with intermittent 503 responses in our production system this morning, having 2600 GetBlob operations in 5 minutes.
Which operations count as Storage account management operations?
Does it matter whether we use BlobClient from the blob storage SDK, or HttpClient from .NET?
How do we read blob properties and metadata, and download blobs to (possibly) achieve 20.000 requests per second?
Are there any other ideas on what can lead to throttling when the load isn't that high altogether?
UPDATE:
After communication with Microsoft support (the proper ones...), they could inform us of the following:
The type of throttling you experienced is a partition throttling error. This type of error occurs when the client does too many requests against the same partition server. When such happens and the partition server gets overloaded, it does internal load balancing operations as part of the normal azure storage healing process.
When the partition being accessed suffers a load balancing operation (reassigning partitions to less loaded servers), the storage service returns 500 or 503 errors.
The limits I previously mentioned (the 800 reads for 5 minutes) are indeed for management operations and not for data ones. In your case, the GetBlob ones are data operations and are not covered by these hard limits. After analyzing the ingress/egress limit and also the transactions per second of your storage account, I verified that you also seem to be far away from hitting the threshold.
Just for the record and improved searchability: In Metrics these errors showed up as ClientOtherError and ClientThrottlingError.
Which operations count as Storage account management operations?
All the operations listed here are considered as storage account management operations. Essentially the operations you perform on managing the storage account themselves (and not the data in them) are considered as management operations.
Does it matter whether we use BlobClient from the blob storage SDK, or
HttpClient from .NET?
No. These operations deal with the data and not considered as part of management operations. These operations have separate throughput limit.
How do we read blob properties and metadata, and download blobs to
(possibly) achieve 20.000 requests per second?
Please see answer to previous question.

MS Azure Storage transactions

I'm confused with Storage transactions in Azure Storage for page bolb, so what is mean Storage transactions in Azure Storage.
I want to use to upload all of my website image to their, so I do not know how many transaction i need.
so I i have 1000 photos in my website and every day 1000 gusts visit my website how many Storage transactions required to retrieve from Azure storage to client.
so what about Bandwidth? most of my big file will be in Azure storage only small amount of html and css will be in Azure Web App Storage so I do not know that I need to buy bandwidth or not required since most of load come to Storage transactions.
The first area we would want to cover for transactions is what equals 1 transaction to Windows Azure Storage. Each and every REST call to Windows Azure Blobs, Tables and Queues counts as 1 transaction (whether that transaction is counted towards billing is determined by the billing classification discussed later in this posting). The REST calls are detailed here:
Blobs
Table
Queues
Each one of the above REST calls counts as 1 transaction. This includes the following types of requests:
Query/List Requests and Continuation Tokens – A Table Query, and Listing Blob Containers, Tables and Queues can return continuation tokens. This means that the query/listing must be continued to complete it. As described above, each REST request to the storage service counts as 1 transaction. Therefore, each continuation of the query/list counts as an additional 1 transaction, since it is another REST request to the storage service.
For more information refer: https://blogs.msdn.microsoft.com/windowsazurestorage/2010/07/08/understanding-windows-azure-storage-billing-bandwidth-transactions-and-capacity/

Availability of Azure storage account

This page says that the availability of an azure storage account is computed as (billable requests)/(total requests). Billable requests mean all the requests excluding those which experienced anonymous failures (except network errors), throttled requests, server timeout errors and unknown errors.
Now,what I see on the azure portal for my storage account is a straight line continuously at 100%, meaning that the account is available at 100% availability continuously. The line is without any break which means that the availability is being calculated continuously.
I know for sure that I don't throw requests to the storage account continuously. Then, how is this metric calculated for times when there are no requests?
Additionally, even a slight drop in storage availability means that some requests failed due to some server side issues. How can we ensure that these failed requests are retried and they pass?
When there isn't any incoming request, the availability is 100%. If your request encounters server side failures, you should retry the request via your code explicitly (in .NET client library, you can easily leverage RetryPolicy. See more details about RetryPolicy here).

Resources