Azure Functions - Unexplained storage account cost related to files

We are making use of Azure Functions (v2) extensively to fulfill a number of business requirements.
We have recently introduced a durable function to handle a more complex business process, which includes both fan-out and a chain of functions.
Our problem is related to how heavily the storage account is being used. I made a fresh deployment on an account we use for dev testing on Friday, and left the function idling over the weekend to monitor what happens. I also set a budget to alert me if the costs started shooting up.
Less than 48 hours later, I received an alert that I was at 80% of my budget, and saw that the storage account was single-handedly responsible for the entire bill. The most baffling part is that it's mostly egress and ingress on File storage, which I'm not using at all in the application! So it must be something internal to the Azure Functions implementation. I've dug around and found this. In that case the issue seems to have been solved by switching to an App Service plan, but this is not an option for us; we must stick to the Consumption plan. I also double-checked and made sure that I don't have the AzureWebJobsDashboard setting.
Any ideas what we can try next?
Below are some interesting charts from the storage account. Note how File egress and ingress make up most of the activity on the entire account.
A ticket for this issue has also been opened on GitHub

The link you provided actually points to AzureWebJobsDashboard as the culprit. AzureWebJobsDashboard is an optional storage account connection string for storing logs and displaying them in the Monitor tab in the portal. The storage account must be a general-purpose one that supports blobs, queues, and tables.
For performance and experience, it is recommended to use APPINSIGHTS_INSTRUMENTATIONKEY and App Insights for monitoring instead of AzureWebJobsDashboard.
When creating a function app in App Service, you must create or link to a general-purpose Azure Storage account that supports Blob, Queue, and Table storage. Internally, Functions uses Storage for operations such as managing triggers and logging function executions. Some storage accounts do not support queues and tables, such as blob-only storage accounts, Azure Premium Storage, and general-purpose storage accounts with ZRS replication. These accounts are filtered out of the Storage Account blade when creating a function app.
When using the Consumption hosting plan, your function code and binding configuration files are stored in Azure File storage in the main storage account. When you delete the main storage account, this content is deleted and cannot be recovered.
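Since the dashboard setting is easy to miss, here is a minimal sketch of how to verify and remove it with the Azure CLI; the app name, resource group, and instrumentation key are placeholders:

    # Check whether AzureWebJobsDashboard is still configured on the app
    az functionapp config appsettings list \
        --name <function-app-name> --resource-group <resource-group> \
        --query "[?name=='AzureWebJobsDashboard']"

    # Remove it and rely on Application Insights for monitoring instead
    az functionapp config appsettings delete \
        --name <function-app-name> --resource-group <resource-group> \
        --setting-names AzureWebJobsDashboard
    az functionapp config appsettings set \
        --name <function-app-name> --resource-group <resource-group> \
        --settings APPINSIGHTS_INSTRUMENTATIONKEY=<instrumentation-key>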

If you use the legacy "General Purpose V1" storage accounts, you may see your costs drop by up to 95%. I had a similar use case where my storage account costs exploded after the accounts were upgraded to "V2". In our case, we simply went back to V1 instead of changing our application.
Although V1 is now legacy, I don't see Azure dropping it any time soon. You can still create it using the Azure Portal (or the CLI; see the sketch below). It could be a medium-term solution.
Some alternatives to save costs:
Try the "premium" performance tier (V2 only). It is cheaper for such transaction-heavy workloads.
Try LRS or ZRS as the redundancy setting, depending on how critical this orchestration data is.
PS: Our use case was some Event Hub processors which used the storage accounts for coordination and checkpointing.
PS2: Regardless of the storage account configuration, there must be a way to reduce the traffic towards the storage account. It is just one more thing to try in order to reduce costs.
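For completeness, a minimal sketch of creating such a V1 account with the Azure CLI; names and region are placeholders, and pricing should be verified before committing:

    # "--kind Storage" creates a legacy General Purpose V1 account
    # (the default for new accounts is "StorageV2");
    # Standard_LRS is the cheapest, locally-redundant SKU.
    az storage account create \
        --name <account-name> \
        --resource-group <resource-group> \
        --location <region> \
        --kind Storage \
        --sku Standard_LRS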

Related

Dedicated or shared Storage Account for Azure Function Apps with names shorter than 32 characters

Short Version
We want to migrate to v4 and our app names are shorter than 32 characters.
Should we migrate to dedicated Storage Accounts or not?
Long Version
We use Azure Functions v3. From the start, one Storage Account has been shared between 10+ Azure Function Apps. It may have been luck, but the names are shorter than 32 characters and that is not going to change. We are not using slots, as they were initially not recommended and were then made generally available with no adoption time or updated recommendation.
Pre-question research revealed this question, but it looks more related to Durable Functions. Another question looks more on point but is outdated, and its accepted answer states that one Storage Account can be used.
Firstly, the official documentation has a page with storage considerations which states (props to ijabit for pointing to it):
It's possible for multiple function apps to share the same storage account without any issues. For example, in Visual Studio you can develop multiple apps using the Azure Storage Emulator. In this case, the emulator acts like a single storage account. The same storage account used by your function app can also be used to store your application data. However, this approach isn't always a good idea in a production environment.
Unfortunately it does not elaborate further on the rationale behind the last sentence.
The page with best practices for Azure Functions mentions:
To improve performance in production, use a separate storage account for each function app. This is especially true with Durable Functions and Event Hub triggered functions.
To add to my confusion, this page used to contain a subsection saying "Avoid sharing storage accounts", but it was later removed.
This issue is superficially related to the question, as the recommendation is mentioned in its thread.
Secondly, we had contacted Azure Support about different issues unrelated to this question, and two support engineers shared different opinions on this one. One said that we can share a Storage Account among Function Apps; the other said that we should not. So the recommendation from support was mixed.
Thirdly, we want to migrate to v4 and in the migration notes it is stated:
Function apps that share storage accounts will fail to start if their computed hostnames are the same. Use a separate storage account for each function app. (#2049)
Digging deeper into the topic, the only concrete issue is the collision of the function host names that are used to obtain a lock, which was known as early as Oct 2017. One can follow the thread and see how in Jan 2020 it was proposed to update the official Azure naming recommendation, but the update was only made in late Nov 2021. I also see that a non-intrusive solution, i.e. one without renaming, is to manually set the host ID (see the sketch below). The two arguments raised by balag0 are: single point of failure and better isolation. They sound good from the perspective of cleaner architecture, but pragmatically I personally find Storage Accounts reliable, especially if you read about redundancy or consider that MS is dog-fooding them for other services. So they look more like a backbone of Azure to me.
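For illustration, a minimal sketch of that fix with the Azure CLI; the host ID value is a placeholder you choose, and it must be unique per function app sharing the account:

    # Explicitly set the host ID so computed host IDs can never collide,
    # even when several function apps share one storage account.
    az functionapp config appsettings set \
        --name <function-app-name> \
        --resource-group <resource-group> \
        --settings AzureFunctionsWebHost__hostid=<unique-host-id>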
Finally, as we want to migrate to v4, should we migrate to dedicated Storage Accounts or not?
For the large project with 30+ Azure Functions I work on, we have gone with dedicated Storage Accounts. The reason is the Azure Storage account service limits. As the docs mention, this really comes into play with Durable Task Functions, but it can also come into play in other high-volume scenarios. There's a hard limit of 20k requests per second for a Storage Account. Hit that limit, and requests will fail with HTTP 429 responses, which means that your Azure Function invocation will fail too. We're running some high-volume scenarios and ran into this.
It can also cause problems with Durable Task Functions if two function apps have the same task hub name in host.json. This causes a collision when the Durable Task Framework does its internal bookkeeping using Storage Queues and Table Storage, and there's lots of pain and agony as things fail in spectacular fashion.
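To illustrate, a minimal host.json sketch (Functions v2+ schema) that gives an app its own task hub; the hub name here is purely illustrative and just needs to differ between apps:

    {
      "version": "2.0",
      "extensions": {
        "durableTask": {
          "hubName": "OrdersAppTaskHub"
        }
      }
    }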
Note that the 20k requests per second service limit can be raised with a support ticket to Azure. If approved, the max they'll raise it to is 50k requests/second.
So avoid the potential headaches and go with a Storage Account per Function App.

How can I find the source of my Hot LRS Write Operations on Azure Storage Account?

We are using an Azure Storage account to store some files that are downloaded by our app on the user's demand.
Even though there should be no write operations (at least none I can think of), we are exceeding the included write operations just a few days into the billing period (see image).
Regarding the price it's still within limits, but I'd still like to know whether this is normal and how I can analyze the matter. Besides the storage, we are using
Functions and
App Service (mobile app)
but none of them should cause that many write operations. I've checked the logs of our functions, and none of those that access the queues or the blobs have been active lately. There are some functions that run every now and then, but only once every few minutes, and those do not access the storage at all.
I don't know if this is related, but there is a kind of periodic ingress on our blob storage (see the image below). The period is roughly 1 h, but there is a baseline of 100 kB per 5 min.
Analyzing the metrics of the storage account further, I found that there is a constant stream of 1.90k transactions per hour for blobs and 1.3k transactions per hour for queues, which seems quite exceptional to me. (Please note that the resolution of this graph is 1 h, while the former has a resolution of 5 minutes.)
Is there anything else I can do to analyze where the write operations come from? It kind of bothers me, since it does not seem as if it's supposed to be like that.
I've had the exact same problem; after enabling Storage Analytics and inspecting the $logs container, I found many log entries indicating that upon every request towards my Azure Functions, write operations occur against the following container object:
https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease
In my Azure Functions code I do not explicitly write to any container or file, but I have the following two Application Settings configured:
AzureWebJobsDashboard
AzureWebJobsStorage
So I filed a support ticket with Azure, asking the following questions:
Are the write operations triggered by these application settings? I believe so, but could you please confirm.
Will the write operations stop if I delete these application settings?
Could you please describe, at a high level, in what context these operations occur (e.g. logging? resource locking? other?)
and I got the following answers from the Azure support team, respectively:
Yes, you are right. According to the logs information, we can see "https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease".
This azure-webjobs-hosts folder is associated with the function app and is created by default when the function app is created. When the function app is running, it records these logs in the storage account configured with AzureWebJobsStorage.
You can't stop the write operations, because these operations record logs necessary to the Azure Functions runtime. Please do not remove the application setting AzureWebJobsStorage. The Azure Functions runtime uses this storage account connection string for all functions except HTTP-triggered ones. Removing this application setting will leave your function app unable to start. By the way, you can remove AzureWebJobsDashboard; that will stop the Monitor tab, but not the operations above.
These operations record runtime logs of the function app. They occur when the backend allocates an instance for running the function app.
The best place to find information about storage usage is Storage Analytics, especially Storage Analytics Logging.
There's a special blob container called $logs in the same storage account which holds detailed information about every operation performed against that storage account. You can view the blobs in that container to find the information.
If you don't see this blob container in your storage account, you will need to enable storage analytics on your storage account. However, considering you can see the metrics data, my guess is that it is already enabled.
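If it isn't enabled yet, here is a minimal sketch using the Azure CLI; the account name is a placeholder:

    # Enable classic Storage Analytics logging for the blob service:
    # log (r)ead, (w)rite and (d)elete operations, retain logs for 7 days.
    # The resulting logs land in the special $logs container of the same account.
    az storage logging update \
        --account-name <account-name> \
        --services b \
        --log rwd \
        --retention 7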
Regarding the source of these write operations: have you enabled diagnostics for your Functions and App Service? Those write diagnostic logs to blob storage. Storage Analytics itself also writes to the same account, and that will cause write operations too.
In my case, I had an Azure App Insights instance which generated 10K transactions per minute on the storage for functions and app services, even though there were only a few HTTPS requests among them. I'm not sure what triggered them, but once I removed App Insights, everything went back to normal.

Resource Group, Storage Account and Availability Set in Microsoft Azure

Recently I started using the Microsoft Azure Free Trial and I have gone through the link.
I created a VM with the help of the references. I also read about Resource Groups, Storage Accounts and Availability Sets, but I couldn't understand the requirements and the differences among them.
Please be kind enough to explain the requirements and differences among Resource Group, Storage Account and Availability Set.
In my opinion, these three are logical resources: solutions for particular requirements.
The Resource Group is the basic solution for managing other resources in the ARM model. As the definition says:
A container that holds related resources for an Azure solution. The resource group includes those resources that you want to manage as a group. You decide how to allocate resources to resource groups based on what makes the most sense for your organization.
The Storage Account can be thought of as a logical group for storage: a storage solution. It defines several types for different data requirements, and in the end the data is still stored on physical disks.
Azure Storage offers a massively scalable object store for data objects, a file system service for the cloud, a messaging store for reliable messaging, and a NoSQL store.
The Availability Set is a solution for high availability. Create resources in an Availability Set and it will help you survive some incidents without downtime; with its update domains and fault domains it can also prevent some erroneous operations from spreading. In a word, it provides redundancy for your application.
Update
As you asked in the comment: first, when you create a VM, the resource group is necessary, but the Availability Set is not; it is created only for high availability. Whether you create one depends on your requirements.
For the storage account, there are two points I think you should pay attention to.
One is that a storage account is only needed for an unmanaged VM, to store the OS disk as a VHD file that you manage yourself. If you create a VM with managed disks, you do not need a storage account.
The other is that a storage account can also store logs, such as diagnostic logs. If you do not want to store logs, you do not need a storage account for that either.
Note: if needed, you can use a single storage account to store both the unmanaged VM OS disk and the logs.
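As a rough illustration of the difference with the Azure CLI (all names are placeholders, and the image alias may vary by CLI version):

    # Managed disks (the default): no storage account is needed for the OS disk.
    az vm create \
        --name <vm-name> \
        --resource-group <resource-group> \
        --image Ubuntu2204

    # Unmanaged: the OS disk is kept as a VHD blob in a storage account you manage.
    az vm create \
        --name <vm-name> \
        --resource-group <resource-group> \
        --image Ubuntu2204 \
        --use-unmanaged-disk \
        --storage-account <account-name>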

Windows Azure - how do you change the region of a Table Storage account?

I've created a Hosted Service that talks to a Storage Account in Azure. Both have their regions set to Anywhere US, but looking at the bills for the last couple of months I've found that I'm being charged for communication between the two, as one is in North-Central US and the other in South-Central US.
Am I correct in thinking there would be no charge if they were both hosted in the same sub-region?
If so, is it possible to move one of them and how do I go about doing it? I can't see anywhere in the Management Portal that allows me to do this.
Thanks in advance.
Adding to what astaykov said: my advice is to always select a specific region, even if you don't use affinity groups. You'll then be assured that your storage and services are in the same data center and you won't incur outbound bandwidth charges.
There isn't a way to move a storage account; you'll need to either transfer your data (and incur bandwidth costs), or re-deploy your hosted service to the region currently hosting your data (no bandwidth costs). To minimize downtime if your site is live, you can push your new hosted service up (to a new .cloudapp.net name), then change your DNS information to point to the new hosted service.
EDIT 5/23/2012 - If you re-visit the portal and create a new storage account or hosted service, you'll notice that the Anywhere options are no longer available. This doesn't impact existing accounts (although they'll now be shown at their current subregion).
In order to avoid such charges, the best guideline is to use Affinity Groups. You define an affinity group once, and then choose it when creating a new storage account or hosted service. You can still have the Affinity Group in "Anywhere US", but as long as both the storage account and the hosted service are in the same affinity group, they will be placed in one data center.
As for moving an account from one region to another: I don't think it is possible. You might have to create a new account and migrate the data if required. You can use a 3rd-party tool such as Cerebrata's Cloud Storage Studio to first export your data and then import it into the new account.
Don't forget: use affinity groups! This is the way to make 100% sure there will be no traffic charges between Compute, Storage, and SQL Azure.

Windows Azure role is stateful or not

According to MSDN, an Azure service can contain any number of worker roles. To my knowledge, a worker role can be recycled at any time by the Windows Azure Fabric. If that is true, then:
The worker role should be stateless, OR
The worker role should persist its state to the Windows Azure storage services.
But I want to make a service which contains client data and I do not want to use the Azure storage services. How can I accomplish this?
The "Velocity" component of AppFabric (whatever it is called now) is a distributed cache and can be used in these situations.
Azure's web and worker roles being stateless means all their local data is volatile; if you want to maintain state, you need some external resource to hold that state, plus logic in your app to handle it. For simplicity you can use Azure Drive, but internally that is again blob storage.
You can write to local storage on the worker role by using the standard file IO APIs, but this will be erased upon instance shutdown.
You could also use SQL Azure, or post your data off to another storage service by HTTP (e.g. Amazon S3, or your own server).
However, this is likely to have performance implications. Depending on how much data you'll be storing, how frequently, and how big it is, you might be better off with Azure Storage!
Why don't you want to use Azure Storage?
If the data can be stored in Azure, you have a good number of choices: the Azure distributed cache, SQL Azure, blobs, tables, queues, or Azure Drive. It sounds like you need persistence but can't use any of these Azure storage mechanisms. If data security is the problem, could you encrypt or hash the data? Understanding why would be useful.
One alternative might be not to persist at all, by chaining/nesting synchronous web service calls together, thus achieving reliable messaging.
Another might be to use Azure Connect to domain-join the Azure compute resource to your local data centre (if you have one), and use your on-premises storage.
