We are considering moving archived data after some retention period to the newer cool tier of Azure Storage (https://azure.microsoft.com/en-us/blog/introducing-azure-cool-storage/).
Can I programmatically set up something that will automatically change the tier or move content to a cool tier storage after some period of time?
In addition, you can change a blob's tier at any point. But when changing from cool to hot, you have to pay for a lot of I/O to convert the blob; converting from hot to cool is free. You can find more details in this document.
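To illustrate, changing a single blob's tier on demand is one call with the azure-storage-blob Python package. A minimal sketch (the connection string, container and blob names are placeholders):

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobClient

# <connection-string>, "mycontainer" and "archive/report.pdf" are
# placeholders - substitute your own values.
blob = BlobClient.from_connection_string(
    "<connection-string>",
    container_name="mycontainer",
    blob_name="archive/report.pdf",
)

# Hot -> Cool is free; moving back Cool -> Hot incurs the conversion
# (read) charges mentioned above.
blob.set_standard_blob_tier("Cool")
```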
Can I programmatically set up something that will automatically change the tier or move content to a cool tier storage after some period of time?
Yes. Since August 2021, Azure can automatically move blobs between the Hot, Cool, and Archive tiers - this is done with a lifecycle management policy that you configure.
This is documented in this page: https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview
Some caveats:
You can't force-run a policy: once you've set up the policy you'll simply have to wait - up to 48 hours, or even longer.
So if you want to archive or cool your blobs immediately then you'll need another approach. Unfortunately I can't make any recommendations at the time of writing.
Your blobs need to be readable by Azure itself, so this won't work with storage accounts locked down so tightly that Azure's own services can't reach them.
You need to enable blob access-tracking first (I describe how to do this in the section below).
Note that blob access tracking costs extra, but I can't find any published pricing for it.
The policy rules you can create are somewhat restrictive, as you can only filter the set of blobs to auto-archive or auto-move to the Cool tier on these parameters:
The Blob's Last-Modified time and Last-Accessed time - but only in relative days-ago (so you can't use absolute dates or complex interval/date-range expressions).
The blob type (only Block blobs can be auto-moved between tiers; Append blobs can only be auto-deleted, not auto-moved; Page blobs aren't supported at all).
The blob container name, and optionally a blob-name prefix match (you can't filter by blob-name prefix without specifying a container name).
Exact matches against Blob Tags already indexed by a Blob Index.
And Blob Indexes have their own limitations too.
The only possible rule actions are to move blobs between tiers or to delete blobs. You can't move blobs between containers or accounts or edit any blob metadata or change a blob's access-policy and so on...
While there is an action that moves a Cool blob back to the Hot tier on the first access/request (which is nice), I'd have liked an option to promote the blob back to Hot only after 2 or 3 (or more) requests in a given time window, instead of always on the first request: as it stands, junk HTTP requests (e.g. web spiders, your own blob-scanner tools, etc.) might incur the more expensive Cool-tier data transfer/read fees. And note you can't move blobs to Cool without a 30-day wait.
To set up a blob lifecycle policy to automatically move Hot blobs to the Cool tier if not accessed for 90 days (as an example), do this:
(Azure Portal screenshots taken as I wrote this, in August 2022 - if the screenshots or instructions are outdated leave a comment and I'll update them)
Open the Azure Portal (https://portal.azure.com) and navigate to your Storage Account's "blade".
BTW, if your storage account is more than a few years old, ensure it has been upgraded to General-purpose v2 (or is a Blob-only account) - lifecycle management isn't available on the older account types.
Look for "Lifecycle Management" in the sidebar.
Ensure "Enable access tracking" is enabled.
The tooltip says access tracking has additional charges, but I can't find any information about what those charges are.
NOTE: The access-tracking feature is distinct from Azure Blob Inventory, but lots of pages and Google results incorrectly cite Blob Inventory pricing for it.
Click "Add a rule" and complete the wizard:
Enter a Rule name like move-blobs-not-accessed-for-90-days-to-cool-tier
If you only want to auto-move blobs in specific containers (instead of the entire account), check the "Limit blobs with filters" option.
On the "Base blobs" tab of the wizard you can specify the Last-Modified and Last-Accessed rules and actions as described above.
Note that you can have up to 3 conditions in a single policy, for the 3 possible actions: move-to-cool (tierToCool), move-to-archive (tierToArchive), and delete-blob (delete).
And that's it - so now you have to wait up to 48 hours for the policy to start to take effect, and maybe wait even longer for Azure to actually finish moving the blobs between tiers.
As I write this, I've had my policy set-up for about 2 hours now and there's still no-sign that it's even started yet. YMMV.
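And since the original question asked about doing this programmatically: the same policy can be created in code with the azure-mgmt-storage Python package. A minimal sketch (subscription, resource-group and account names are placeholders):

```python
# pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# <subscription-id>, <resource-group> and <storage-account> are placeholders.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Same rule as the Portal walkthrough above: move block blobs to the
# Cool tier once they haven't been accessed for 90 days. The
# last-access condition requires access tracking to be enabled first.
policy = {
    "policy": {
        "rules": [
            {
                "enabled": True,
                "name": "move-blobs-not-accessed-for-90-days-to-cool-tier",
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterLastAccessTimeGreaterThan": 90
                            }
                        }
                    },
                },
            }
        ]
    }
}

# The management-policy name must be "default".
client.management_policies.create_or_update(
    "<resource-group>", "<storage-account>", "default", policy
)
```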
Related
We have a situation where we now need to replicate all data from our Blob Storage account to another node in the same rough region (think West Europe to North Europe).
Currently we have only 1 application that writes content to certain containers, and other applications that write to other containers.
I'd ideally like to replicate all of them.
I've looked into the following:
1. Azure Blob Storage geo-replication.
2. Azure Event Grid.
3. Just writing the blobs in the other applications themselves.
The issue with 1. is that the delivery time of the updated blob is not guaranteed. It could take up to 5 minutes, for example, which is too long in our use case. This is defined in their SLA.
The issue with 2. is that there is no guarantee of delivery time - only that delivery will take place at least once, whether it's to the intended destination, or a deadletter queue.
The issue with 3. is latency, cost & having to reengineer a few applications at once.
Expanding on 3. - we have an application that listens to messages published to Azure Service Bus. It acts upon those messages and creates blobs based on data that is stored in a DB on-site. It then uploads each blob to the specific container it pertains to, and publishes a message saying it has done so.
The problem here is that we'd need to write from one data center to another, incurring bandwidth charges, and ask this application to do double the work.
We would also need to ensure that blobs are written, and implement a method to ensure consistency - i.e. if one location goes down, we need to be able to replicate once it's back up.
So - what's the fastest replication method here? I'm assuming it's number 3. However, could there be another solution that has not been thought of so far?
Edit: 4. Have the generated files sent to Service Bus as messages, to be picked up by dedicated writer applications hosted in each geo-location.
The issue here is the limitation on message size (which makes sense!) - we have files over 1 MB.
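One way around the message-size limit in option 4 is the claim-check pattern: put only the source blob's URL on Service Bus and let a dedicated writer in each region perform a server-side copy. A minimal sketch with the azure-servicebus and azure-storage-blob Python packages (the queue, container and connection strings are hypothetical placeholders):

```python
# pip install azure-servicebus azure-storage-blob
from azure.servicebus import ServiceBusClient
from azure.storage.blob import BlobClient

SERVICEBUS_CONN = "<service-bus-connection-string>"
DEST_STORAGE_CONN = "<destination-storage-connection-string>"

with ServiceBusClient.from_connection_string(SERVICEBUS_CONN) as sb:
    with sb.get_queue_receiver(queue_name="blob-replication") as receiver:
        for msg in receiver:
            # The message body carries only the source blob's URL
            # (including a SAS token), not the file itself - so the
            # message-size limit no longer matters.
            source_url = str(msg)
            dest = BlobClient.from_connection_string(
                DEST_STORAGE_CONN,
                container_name="replicated",
                blob_name=source_url.rsplit("/", 1)[-1].split("?")[0],
            )
            # Server-side copy: the destination storage account pulls
            # the bytes directly from the source account, so nothing
            # flows through this process.
            dest.start_copy_from_url(source_url)
            receiver.complete_message(msg)
```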
We are making use of Azure Functions (v2) extensively to fulfill a number of business requirements.
We have recently introduced a durable function to handle a more complex business process which includes both fanning out, as well as a chain of functions.
Our problem is related to how much the storage account is being used. I made a fresh deployment on an account we use for dev testing on Friday, and left the function idling over the weekend to monitor what happens. I also set a budget to alert me if the costs started shooting up.
Less than 48 hours later, I received an alert that I was at 80% of my budget, and saw that the storage account was single-handedly responsible for the entire bill. The most baffling part is that it's mostly egress and ingress on file storage, which I'm not using at all in the application! So it must be something internal to the Azure Functions implementation. I've dug around and found this. In that case the issue seems to have been solved by switching to an App Service plan, but this is not an option in our case; we must stick to Consumption. I also double-checked and made sure that I don't have the AzureWebJobsDashboard setting.
Any ideas what we can try next?
Below are some interesting charts from the storage account. Note how file egress and ingress make up most of the activity on the entire account.
A ticket for this issue has also been opened on GitHub.
The link you provided actually points to AzureWebJobsDashboard as the culprit. AzureWebJobsDashboard is an optional storage account connection string for storing logs and displaying them in the Monitor tab in the portal. The storage account must be a general-purpose one that supports blobs, queues, and tables.
For performance and experience, it is recommended to use APPINSIGHTS_INSTRUMENTATIONKEY and App Insights for monitoring instead of AzureWebJobsDashboard.
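If you want to make that swap programmatically rather than through the portal, here's a minimal sketch using the azure-mgmt-web Python package (subscription, resource group, app name and instrumentation key are placeholders):

```python
# pip install azure-identity azure-mgmt-web
from azure.identity import DefaultAzureCredential
from azure.mgmt.web import WebSiteManagementClient

# <subscription-id>, <resource-group>, <function-app-name> and
# <instrumentation-key> are placeholders.
client = WebSiteManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, app = "<resource-group>", "<function-app-name>"

settings = client.web_apps.list_application_settings(rg, app)

# Drop the dashboard logging setting and point monitoring at
# Application Insights instead.
settings.properties.pop("AzureWebJobsDashboard", None)
settings.properties["APPINSIGHTS_INSTRUMENTATIONKEY"] = "<instrumentation-key>"

client.web_apps.update_application_settings(rg, app, settings)
```

The same change can of course be made by hand under the function app's Configuration > Application settings in the portal.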
When creating a function app in App Service, you must create or link to a general-purpose Azure Storage account that supports Blob, Queue, and Table storage. Internally, Functions uses Storage for operations such as managing triggers and logging function executions. Some storage accounts do not support queues and tables, such as blob-only storage accounts, Azure Premium Storage, and general-purpose storage accounts with ZRS replication. These accounts are filtered out of the Storage Account blade when creating a function app.
When using the Consumption hosting plan, your function code and binding configuration files are stored in Azure File storage in the main storage account. When you delete the main storage account, this content is deleted and cannot be recovered.
If you use the legacy "General Purpose V1" storage accounts, you may see your costs drop by up to 95%. I had a similar use case where my storage account costs exploded after the accounts were upgraded to "V2". In my case, we just went back to V1 instead of changing our application.
Although V1 is now legacy, I don't see Azure dropping it any time soon. You can still create it using the Azure Portal. It could be a medium-term solution.
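For what it's worth, a V1 account can also still be created programmatically. A minimal sketch with the azure-mgmt-storage Python package (subscription, resource-group and account names are placeholders):

```python
# pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# <subscription-id>, <resource-group> and <new-account-name> are placeholders.
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# kind="Storage" is the legacy General Purpose V1 account type;
# kind="StorageV2" would create a V2 account instead.
poller = client.storage_accounts.begin_create(
    "<resource-group>",
    "<new-account-name>",
    {
        "location": "westeurope",
        "kind": "Storage",
        "sku": {"name": "Standard_LRS"},
    },
)
account = poller.result()
print(account.provisioning_state)
```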
Some alternatives to save costs:
Try the "premium" performance tier (V2 only). It is cheaper for transaction-heavy workloads like this.
Try LRS or ZRS as the redundancy setting, depending on how critical this orchestration data is.
PS: Our use case was some Event Hub processors which used the storage accounts for coordination and checkpointing.
PS2: Regardless of the storage account configuration, there must be a way to reduce the traffic towards the storage account; that's just another thing to try to reduce costs.
We are using an Azure Storage account to store some files that are downloaded by our app on the user's demand.
Even though there should be no write operations (at least none I could think of), we are exceeding the included write operations just some days into the billing period (see image).
Regarding the price, it's still within limits, but I'd still like to know whether this is normal and how I can analyze the matter. Besides the storage, we are using
Functions and
App Service (mobile app)
but none of them should cause that many write operations. I've checked the logs of our functions and none of those that access the queues or the blobs have been active lately. There are some functions that run every now and then, but only once every few minutes, and those do not access the storage at all.
I don't know if this is related, but there is a kind of periodic ingress on our blob storage (see the image below). The period is roughly 1 h, but there is a baseline of 100 kB per 5 min.
Analyzing the metrics of the storage account further, I found that there is a constant stream of 1.90k transactions per hour for blobs and 1.3k transactions per hour for queues, which seems quite exceptional to me. (Please note that the resolution of this graph is 1 h, while the former has a resolution of 5 minutes.)
Is there anything else I can do to analyze where the write operations come from? It kind of bothers me, since it does not seem as if it's supposed to be like that.
I've had the exact same problem; after enabling Storage Analytics and inspecting the $logs container I found many log entries indicating that upon every request towards my Azure Functions, write operations occur against the following container object:
https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease
In my Azure Functions code I do not explicitly write to any container or file, but I have the following two Application Settings configured:
AzureWebJobsDashboard
AzureWebJobsStorage
So I filed a support ticket in Azure with the following questions:
Are the write operations triggered by these application settings? I believe so, but could you please confirm.
Will the write operations stop if I delete these application settings?
Could you please describe, at a high level, in what context these operations occur (e.g. logging, resource locking, other)?
and I got the following answers from Azure support team, respectively:
Yes, you are right. According to the log information, we can see “https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease”.
This azure-webjobs-hosts folder is associated with the function app and is created by default when the function app is created. When the function app is running, it records these logs in the storage account configured with AzureWebJobsStorage.
You can’t stop the write operations, because they record logs the Azure Functions runtime needs in the storage account it uses. Please do not remove the application setting AzureWebJobsStorage. The Azure Functions runtime uses this storage account connection string for all functions except HTTP-triggered functions. Removing this application setting will leave your function app unable to start. By the way, you can remove AzureWebJobsDashboard - that will stop the Monitor tab rather than the operations above.
These operations record the runtime logs of the function app. They occur when our backend allocates an instance for running the function app.
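If you want to reproduce this kind of analysis yourself, here's a minimal sketch with the azure-storage-blob Python package that tallies the operation types found in $logs (the connection string is a placeholder, and the field layout assumes Storage Analytics log format v1.0):

```python
# pip install azure-storage-blob
from collections import Counter
from azure.storage.blob import BlobServiceClient

# <connection-string> is a placeholder for the account being analyzed.
service = BlobServiceClient.from_connection_string("<connection-string>")
logs = service.get_container_client("$logs")

ops = Counter()
for blob in logs.list_blobs():
    content = logs.download_blob(blob.name).readall().decode("utf-8")
    for line in content.splitlines():
        # Storage Analytics v1.0 log entries are semicolon-delimited;
        # the third field is the operation type (PutBlob, LeaseBlob, ...).
        fields = line.split(";")
        if len(fields) > 2:
            ops[fields[2]] += 1

# The most frequent operations usually point straight at the culprit,
# e.g. lease renewals against azure-webjobs-hosts from the Functions runtime.
print(ops.most_common(10))
```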
The best place to find information about storage usage is Storage Analytics, especially Storage Analytics Logging.
There's a special blob container called $logs in the same storage account which will have detailed information about every operation performed against that storage account. You can view the blobs in that blob container and find the information.
If you don't see this blob container in your storage account, then you will need to enable storage analytics on your storage account. However, considering you can see the metrics data, my guess is that it is already enabled.
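For reference, enabling it doesn't require the portal. A minimal sketch using the azure-storage-blob Python package (the connection string is a placeholder):

```python
# pip install azure-storage-blob
from azure.storage.blob import (
    BlobAnalyticsLogging,
    BlobServiceClient,
    RetentionPolicy,
)

# <connection-string> is a placeholder.
service = BlobServiceClient.from_connection_string("<connection-string>")

# Log read, write and delete requests and keep the logs for 7 days;
# entries appear as blobs under the special $logs container.
service.set_service_properties(
    analytics_logging=BlobAnalyticsLogging(
        version="1.0",
        read=True,
        write=True,
        delete=True,
        retention_policy=RetentionPolicy(enabled=True, days=7),
    )
)
```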
Regarding the source of these write operations: have you enabled diagnostics for your Functions and App Service? Those write diagnostic logs to blob storage. Storage Analytics itself also writes to the same account, which will likewise show up as write operations.
In my case, I had Azure App Insights generating 10K transactions per minute on its storage for Functions and App Services, even though there were only a few HTTP requests among them. I'm not sure what triggered them, but once I removed App Insights, everything went back to normal.
I've created a Hosted Service that talks to a Storage Account in Azure. Both have their regions set to Anywhere US, but looking at the bills for the last couple of months I've found that I'm being charged for communication between the two, as one is in North-Central US and the other in South-Central US.
Am I correct in thinking there would be no charge if they were both hosted in the same sub-region?
If so, is it possible to move one of them and how do I go about doing it? I can't see anywhere in the Management Portal that allows me to do this.
Thanks in advance.
Adding to what astaykov said: My advice is to always select a specific region, even if you don't use affinity groups. You'll now be assured that your storage and services are in the same data center and you won't incur outbound bandwidth charges.
There isn't a way to move a storage account; you'll need to either transfer your data (and incur bandwidth costs), or re-deploy your hosted service to the region currently hosting your data (no bandwidth costs). To minimize downtime if your site is live, you can push your new hosted service up (to a new .cloudapp.net name), then change your DNS information to point to the new hosted service.
EDIT 5/23/2012 - If you re-visit the portal and create a new storage account or hosted service, you'll notice that the Anywhere options are no longer available. This doesn't impact existing accounts (although they'll now be shown at their current subregion).
In order to avoid such charges, the best guideline is to use Affinity Groups. You define an affinity group once, and then choose it when creating a new storage account or hosted service. You can still have the affinity group in "Anywhere US", but as long as both the storage account and the hosted service are in the same affinity group, they will be placed in one data center.
As for moving an account from one region to another - I don't think it is possible. You might have to create a new account and migrate the data if required. You can use a 3rd-party tool such as Cerebrata's Cloud Storage Studio to first export your data and then import it into the new account.
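With today's SDKs the export/import can also be scripted. A minimal sketch of a server-side copy migration using the azure-storage-blob Python package (both connection strings are placeholders, and it assumes the destination can read the source blobs):

```python
# pip install azure-storage-blob
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

# Both connection strings are placeholders; the source blobs must be
# readable by the destination account (public access, or append a SAS
# token to source_url below).
src = BlobServiceClient.from_connection_string("<source-connection-string>")
dst = BlobServiceClient.from_connection_string("<destination-connection-string>")

for container in src.list_containers():
    dst_container = dst.get_container_client(container.name)
    try:
        dst_container.create_container()
    except ResourceExistsError:
        pass  # already created on a previous run

    for blob in src.get_container_client(container.name).list_blobs():
        source_url = f"{src.url}/{container.name}/{blob.name}"
        # Server-side copy: data moves between the two accounts
        # directly, not through this machine.
        dst_container.get_blob_client(blob.name).start_copy_from_url(source_url)
```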
Don't forget - use affinity groups! This is the way to make 100% sure there will be no traffic charges between Compute, Storage, and SQL Azure.
I've recently begun using the Web Deployment Accelerator for my Windows Azure account. It is providing an immediate return in time saved and is an excellent offering.
However, since "everything" is now stored in Azure Storage rather than on the regular E: drive, I am immediately seeing a cost consequence for using the tool.
In one day I have racked up a mighty 4 cent NZD charge. In order to do that I had to burn through about 80,000 storage transactions, and frankly I can't figure out where they all went.
I uploaded 6 sites that are very small - none would have more than 300 files each. So I'm wondering:
a. Is there a profiling tool for the Web Deployment Accelerator that will allow me to see where and how 80,000 storage transactions were used for such a small offering? Is it a storage-transaction-intensive tool? Has any cost analysis been carried out in terms of how this tool operates? Has it been optimised with cost in mind?
b. If I'm using this tool, do I pay for 2 storage transactions per HTTP request to a site? Since the tool now writes the web server logs to table storage, that would be one storage request to pull the HTTP request's resource (img, script, etc.) and another storage request to write the log entry, would it not?
I'm not concerned about current charges; I'm concerned about the future if I start rolling all my hosted business into the cloud. I mean, I'm now being charged even just to "look" at my data, right? If I list the contents of a storage folder using a tool like Azure Storage Explorer, is that x storage transactions, where x = the number of files in the folder?
Not sure of a 3rd-party profiler tool, but Windows Azure Storage logging and metrics will give you very detailed info regarding both individual accesses and hourly rollups. It's pretty straightforward to enable, and the November 2011 SDK includes support for the API calls required for enabling. See here for an overview of what's offered for metrics and logging.
My team worked with Fullscale180 to build a storage library, Azure Store XRay, to demonstrate how to enable and query storage metrics and logging. Note: This was published before the SDK had logging and metrics support, so it uses the REST API calls instead. But that won't impact you if you try to use the library.
You can also look at another code demo, Cloud Ninja, which calls the XRay library for its metrics display (see here for running demo).
Regarding querying storage for objects in blob containers: that's not a 1:1 transaction:file scenario. You can specify the maximum number of blobs to return when listing items in a container. It's possible that all blobs are returned in one transaction. Of course, if you then grab each blob, each of these will be at least one transaction (depending on blob size). See here for details about listing blobs.
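To make the transaction accounting concrete, here's a minimal sketch with the modern azure-storage-blob Python package (the connection string and container name are placeholders) - each page fetched is one List Blobs transaction, regardless of how many blobs it contains:

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# <connection-string> and "mycontainer" are placeholders.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("mycontainer")

# Each *page* of results is a single List Blobs call - one billable
# transaction - no matter how many blobs that page contains.
pages = container.list_blobs(results_per_page=1000).by_page()
for page_number, page in enumerate(pages, start=1):
    blobs = list(page)
    print(f"page {page_number}: {len(blobs)} blobs in one transaction")
```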