What is using my Azure Blob storage accounts?

After using Azure for a while we have accumulated a bunch of storage accounts. There doesn't seem to be a way to figure out whether those storage accounts are in use, or what is using them. It seems that even spinning up a VM creates a storage account.
Is there a way (without PowerShell) to see what is being used and delete the unused storage accounts?

As others have said, it's not possible to give you a definitive answer, but you can iterate over the storage accounts, and within that loop iterate over each account's containers to see which ones contain blobs. I would approach this from within VS by creating a new project, then using NuGet to add a reference to the WindowsAzure.Storage client library, which makes iterating those collections easy. The client library is essentially a wrapper around the Azure Storage REST API; enumerating the storage accounts themselves goes through the Azure Management API. There is likely a way to do it with PowerShell as well.
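For example, here is a minimal sketch of the container loop, assuming the classic WindowsAzure.Storage library and a placeholder connection string (in practice you would obtain one connection string per account, e.g. via the Management API or the portal):

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;

class EmptyContainerFinder
{
    static void Main()
    {
        // Placeholder connection string; repeat this for each storage account.
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        var client = account.CreateCloudBlobClient();

        foreach (var container in client.ListContainers())
        {
            // A container with no blobs at all is a candidate for cleanup.
            bool hasBlobs = container.ListBlobs(useFlatBlobListing: true).Any();
            Console.WriteLine("{0}: {1}", container.Name,
                hasBlobs ? "in use" : "empty");
        }
    }
}
```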

Related

Best way to index data in Azure Blob Storage?

I plan on using Azure Blob storage to store images. I will have around 5000 categories of images, which I plan to keep separated using folders. The file names won't differ much across the board, and I may need to change metadata frequently.
My original plan was to use a SQL database to index all of these files and store my metadata there, but I'm second guessing that plan.
Is it feasible to index files in Azure Blob storage using a database, or should I just stick with using blob metadata?
Edit: I guess this question should really be "are there any downsides to indexing Azure Blob storage using a relational database?". I'm much more comfortable working with a DB than I am with Azure storage, so my preference is to use a DB.
I was second-guessing whether or not to use a DB, but after looking at Azure Storage more and discovering metadata tags and indexing, a DB may not be needed. Hope this helps.
You can use Azure Search for this task as well: store the images in Azure Storage (blob) and use Azure Search for crawling, indexing, and searching. Using metadata you can enhance your search as well. This way you might not even need folders to separate the different categories.
Blob Index is a very feasible option, and it can save on pricing, time, and overhead by not using SQL.
https://azure.microsoft.com/en-gb/blog/manage-and-find-data-with-blob-index-for-azure-storage-now-in-preview/
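As a rough illustration of the feature linked above, here is a hedged sketch of setting and querying blob index tags with the newer Azure.Storage.Blobs v12 SDK (the container name, blob name, and tag values are hypothetical, and the feature was still in preview at the time):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class BlobIndexDemo
{
    static async Task Main()
    {
        var service = new BlobServiceClient("<connection-string>");
        var blob = service.GetBlobContainerClient("images")
                          .GetBlobClient("cats/fluffy.png"); // hypothetical

        // Tag the blob instead of (or in addition to) rows in a SQL index.
        await blob.SetTagsAsync(new Dictionary<string, string>
        {
            ["category"] = "cats",
            ["reviewed"] = "true"
        });

        // Query across the whole account by tag, no folder walk required.
        await foreach (var hit in service.FindBlobsByTagsAsync(
            "\"category\" = 'cats' AND \"reviewed\" = 'true'"))
        {
            Console.WriteLine($"{hit.BlobContainerName}/{hit.BlobName}");
        }
    }
}
```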
If you are looking for more information on this preview feature, I would love to hear more and work closer with you on this issue. You can reach me at BlobIndexPreview#microsoft.com.

What does the "Storage Account" setting of an Azure Function App do?

It has a default selection of "functionb7be452dbab0" in my case, but I can change it and select other storage accounts. I can't find any documentation that explains this storage account setting.
It is used for several things:
In Consumption mode, it holds your files, using Azure Files, i.e. all your function files live in there.
In addition, the script runtime (based on the WebJobs SDK) uses blobs, queues, and tables as part of its infrastructure, e.g. to synchronize work between multiple instances. It also stores logging information there.
Note that you can easily see all this by using Microsoft Azure Storage Explorer and looking at all the things in there.
As an aside, you can optionally also make use of this storage account for your own queues and blobs that you want to use in your functions.
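A minimal sketch of that last point, assuming the classic WindowsAzure.Storage library (the queue name is hypothetical; AzureWebJobsStorage is the app setting through which the Functions runtime exposes this storage account):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class QueueReuseDemo
{
    static void Main()
    {
        // The Functions runtime exposes the configured storage account
        // through the AzureWebJobsStorage setting.
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"));

        var queue = account.CreateCloudQueueClient()
                           .GetQueueReference("my-app-queue"); // hypothetical
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage("hello from my function"));
    }
}
```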

Understanding Azure Storage (blobs) with accounts and containers. Test containers?

I am beginning to use Azure Storage (blob specifically) in my application but wanted to know what the norm was in the case of testing versus production storage.
So is it routine to create one storage account? i.e.:
http://<storage-account-name>.blob.core.windows.net/
and then have different containers for each environment? i.e.:
http://<storage-account-name>.blob.core.windows.net/testContainer
http://<storage-account-name>.blob.core.windows.net/productionContainer
so that with populated data it would end up looking like:
http://<storage-account-name>.blob.core.windows.net/testContainer/<whateverkey>
http://<storage-account-name>.blob.core.windows.net/productionContainer/<whateverkey>
or should I be creating two different storage accounts? I had assumed that the generated connection string was just for the storage account, and that later in my logic I would specify the containers and keys when adding data.
Thanks
There is no standard way, but keep in mind: Azure storage isn't multi-level; there are no real subfolders (though the paths can be simulated). So using containers to separate test from production will hinder your ability to take advantage of containers properly within your app (e.g. if you want /images/foo.png, you must now have /productioncontainer/images/foo.png).
Remember that storage accounts are free: You pay only for storage used. So it costs nothing extra to have both a test and a production storage account. And then, the only thing that changes is the base address (storage account name).
You're correct regarding the connection string: it is per storage account, and you just have accountname.blob.core.windows.net/container/object.
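To make that concrete, a minimal sketch assuming the classic WindowsAzure.Storage library, with a hypothetical app setting holding the per-environment connection string (only that setting differs between test and production; container and blob names stay identical):

```csharp
using System.Configuration;
using Microsoft.WindowsAzure.Storage;

class EnvironmentStorageDemo
{
    static void Main()
    {
        // "StorageConnectionString" is a hypothetical app setting that points
        // at the test account in test and the production account in production.
        var account = CloudStorageAccount.Parse(
            ConfigurationManager.AppSettings["StorageConnectionString"]);

        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("images");
        container.CreateIfNotExists();

        // The container/blob path is the same in both environments;
        // only the account (base address) changes.
        container.GetBlockBlobReference("notes/hello.txt")
                 .UploadText("placeholder");
    }
}
```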
You should use different storage accounts: that way, in addition to storage isolation, you can also ensure you have different security protection for accessing your development environment vs your production environment.

Is this a sensible Azure Blob Storage setup and are there restructuring tools to help me migrate to it?

I think we have gone slightly wrong in the way we have used Azure storage in a SaaS system. We created a storage account per client (security was the prime consideration) and containers per system area, e.g. Vehicle, Work, etc.
Having done further reading, it seems the suggestion would be that we should have used one account for all clients. Each client would have a container (so we can create it programmatically), which we then secure. Files would then be organized using a "virtual" folder structure, e.g. a container called "Client A", with files for the Jobs (in the Work area of the system) stored like Work/Jobs/{entity id}/blah.pdf. Does this sound sensible?
If so, we now have about 10 accounts that we need to restructure. Are there any tools that will let us easily copy one account's contents into containers in another account? I appreciate we probably can't move the files between accounts (we set them up ages ago, so can't use the native copy function), so I guess it would be some sort of copy. There are GBs of files across all the accounts.
It may not be such a bad idea to keep different storage accounts per client. The benefits of doing that (to me) are:
Better security, as you mentioned.
You'll be able to achieve better throughput per client, as each client will have their own storage account. If you keep one storage account for all clients and one client starts hitting that account hard, the other clients will be impacted.
Better scalability. Each storage account can hold up to 200 TB of data. So if you keep just one storage account and assuming each client consumes 100 GB of data, you'll be able to accommodate only 2000 clients (I hope my math is right :)). With individual storage accounts, you won't be restricted in that sense.
There are some downsides as well. Some of them are:
Management would be a nightmare. Imagine you have 2000 customers: you would end up managing 2000 storage accounts.
You may be limited by Windows Azure. Currently, by default, you get about 20 storage accounts per subscription, and you would need to contact support to raise that limit. They can do that for you, but I would imagine you want this to be a self-service model where you can create as many storage accounts as you need without contacting support.
Now coming to your question about tooling: you could write something on your own which makes use of the Copy Blob functionality. This functionality allows you to copy blob data across storage accounts asynchronously. Basically, this is what you would do (a sketch follows the list):
First create a blob container for each client in the target storage account.
Enumerate all blob containers in source storage account.
For each blob container in source storage account, enumerate the blobs.
Copy each blob asynchronously to target storage account in the client's blob container.
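A hedged sketch of those steps, assuming the classic WindowsAzure.Storage library and placeholder connection strings; a short-lived read-only SAS on each source blob lets the target account read private containers:

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class CrossAccountCopy
{
    static void Main()
    {
        // Placeholder connection strings for the source and target accounts.
        var src = CloudStorageAccount.Parse("<source-connection-string>")
                                     .CreateCloudBlobClient();
        var dst = CloudStorageAccount.Parse("<target-connection-string>")
                                     .CreateCloudBlobClient();

        foreach (CloudBlobContainer srcContainer in src.ListContainers())
        {
            // Step 1: matching container per client in the target account.
            var dstContainer = dst.GetContainerReference(srcContainer.Name);
            dstContainer.CreateIfNotExists();

            // Steps 2-4: enumerate the blobs and kick off async copies.
            foreach (var srcBlob in srcContainer
                .ListBlobs(useFlatBlobListing: true)
                .OfType<CloudBlockBlob>())
            {
                // Cross-account copies need a readable source URL;
                // a short-lived read-only SAS covers private containers.
                string sas = srcBlob.GetSharedAccessSignature(
                    new SharedAccessBlobPolicy
                    {
                        Permissions = SharedAccessBlobPermissions.Read,
                        SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
                    });

                dstContainer.GetBlockBlobReference(srcBlob.Name)
                            .StartCopy(new Uri(srcBlob.Uri.AbsoluteUri + sas));
            }
        }
    }
}
```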
If you're a PowerShell fan, you can also look into Cerebrata's Azure Management Cmdlets (http://www.cerebrata.com/Products/AzureManagementCmdlets), which wrap this functionality. I could recommend Cerebrata's Azure Management Studio as well, but I haven't tried this particular functionality there yet. [Disclosure: I'm one of the devs on the Cerebrata team.]
Hope this helps.
Adding to Gaurav Mantri's answer...
You can have a shared storage account for customers and use a Shared Access Signature (SAS) to limit access to a particular container or blob (this works for tables and queues as well)...
http://msdn.microsoft.com/en-us/library/windowsazure/hh508996.aspx
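For illustration, a minimal sketch of issuing a per-client SAS with the classic WindowsAzure.Storage library (the connection string and container name are hypothetical):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class PerClientSasDemo
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<connection-string>");
        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("client-a"); // hypothetical

        // Time-limited, read/list-only access to just this client's container.
        string sas = container.GetSharedAccessSignature(new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read
                        | SharedAccessBlobPermissions.List,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(24)
        });

        // Hand this URL to the client; it stops working when the SAS expires.
        Console.WriteLine(container.Uri + sas);
    }
}
```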

What is the best strategy for using Windows Azure as a file storage system - with http download capabilities

I need to store multiple files that users upload, and then provide these users with the capability of accessing their files via http. There are two key considerations:
- Storage (which is my primary concern here)
- Security (which let's leave aside for now)
The question is:
What is the most cost-efficient and performant way of storing all these files and giving access to them later? I believe the answer is:
- Store files within Azure Storage Account, and have a key that references them in an SQL Azure database.
Am I correct on this?
Is blob storage flat, or can I create something like folders inside it to better organize my files?
The idea of using SQL Azure to store metadata for your blobs is a pretty common scenario; it allows you to take advantage of SQL for searching and blobs for storage.
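As a rough sketch of that pattern, assuming the classic WindowsAzure.Storage library, plain ADO.NET, and a hypothetical Files table in SQL Azure:

```csharp
using System;
using System.Data.SqlClient;
using System.IO;
using Microsoft.WindowsAzure.Storage;

class UploadWithSqlIndex
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("uploads"); // hypothetical
        container.CreateIfNotExists();

        // Upload the file to blob storage...
        var blob = container.GetBlockBlobReference("report.pdf");
        using (var stream = File.OpenRead(@"C:\temp\report.pdf"))
            blob.UploadFromStream(stream);

        // ...then record a searchable row in SQL Azure pointing at it.
        // The Files table and its columns are hypothetical.
        using (var conn = new SqlConnection("<sql-connection-string>"))
        using (var cmd = new SqlCommand(
            "INSERT INTO Files (UserId, BlobUri, UploadedUtc) VALUES (@u, @b, @t)",
            conn))
        {
            cmd.Parameters.AddWithValue("@u", 42);
            cmd.Parameters.AddWithValue("@b", blob.Uri.ToString());
            cmd.Parameters.AddWithValue("@t", DateTime.UtcNow);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```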
Blobs are organized by container. So you'd have something like:
http://mystorage.blob.core.windows.net/mycontainer/myfile.doc
You can also simulate a hierarchy using a delimiter, but in reality there's just container plus blob.
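For example, a small sketch of the simulated hierarchy with the classic WindowsAzure.Storage library (names are hypothetical; the "/" in the blob name is just part of the name):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;

class VirtualFolderDemo
{
    static void Main()
    {
        var container = CloudStorageAccount.Parse("<connection-string>")
            .CreateCloudBlobClient()
            .GetContainerReference("mycontainer");
        container.CreateIfNotExists();

        // "images/2012/" is not a real folder, just a prefix in the blob name.
        container.GetBlockBlobReference("images/2012/notes.txt")
                 .UploadText("placeholder");

        // Listing with a prefix (and the default delimiter handling) makes
        // the flat namespace look hierarchical.
        foreach (var item in container.ListBlobs("images/"))
            Console.WriteLine(item.Uri);
    }
}
```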
If you keep the container or blob private, the user would either have to go through your web front end (or web service), or you'd have to provide them with a special URL with a Shared Access Signature appended, which is a time-limited URL.
I would recommend you take a look at the BlobShare sample, a simple file-sharing application that demonstrates the storage services of the Windows Azure platform together with the authentication and authorization capabilities of Access Control Service (ACS). The full sample code is located at the following link:
http://blobshare.codeplex.com/
You can use this sample code immediately, just by adding your own Windows Azure account credentials. The best thing about this sample is that you can provide blob access directly through Access Control Service. You can also modify the code to add SAS support, as well as blob download from public containers. Once you have it working and understand the concepts, you can tweak it to work the way you want.
