Why do duplicate folders exist in my Azure blob storage container?

I am aware that Azure blob storage does not use an actual folder structure but could not think of a better way to describe this.
The issue we're seeing is when opening Server Explorer (in Visual Studio) to browse through our blob storage container. We separate client resources and data by folder, so in this case we have a blob titled productdata/Client_5/testimage.jpg.
The problem is that this Client_5 folder appears twice when inspecting our blob storage. So far I've double-checked that there are no unusual special characters in either folder name, and I've checked case sensitivity. The two paths are EXACTLY the same; only their contents differ. Our application has no problems with this because the path to the resources it requests is still exactly the same. (For example, since the folders are named exactly the same, https://myazureaccount.blob.core.windows.net/productdata/Client_5/image.jpg still takes us exactly where we need to be.) It's just a pain when we use Server Explorer to view our blobs on Azure, because we have two folder locations to check. This could very well be a bug in Server Explorer for Visual Studio as well.
If anyone else has ever come across this, any info is appreciated. I couldn't find anything on the topic when searching online, but figured I would post the question here for reference. I'll also be contacting Azure support soon to see if they can shed some light on this, and will post what I learn from them here later.

It's true that blob storage doesn't have the concept of folders, but the API built on top of it does. I've seen exactly the same or similar problems in other tools as well: Microsoft Azure Storage Explorer and even the Azure portal. I tried to dig deeper, and when I executed:
CloudBlobContainer.ListBlobs(null, useFlatBlobListing: false)
it also returned duplicated directories. To be precise, it returned a list with several instances of CloudBlobDirectory that had the same Prefix. Sounds like a bug. If a tool uses this approach to get a list of directories, it will show the same duplicates. If the tool uses a flat listing and builds the folder structure in its own logic, it should be OK.
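The flat-listing approach can be sketched in Python. This is an illustration, not the code of any particular tool: it assumes you already have the full blob names (for example from a flat, non-hierarchical listing), and the sample names are hypothetical. Every prefix goes into a set, so each virtual directory can appear only once, regardless of how the service groups its results.

```python
def virtual_directories(blob_names, delimiter="/"):
    """Return the sorted, de-duplicated virtual directory prefixes implied by blob names."""
    prefixes = set()
    for name in blob_names:
        parts = name.split(delimiter)[:-1]  # drop the leaf (the blob's own file name)
        for depth in range(1, len(parts) + 1):
            # A set guarantees each prefix is recorded once, however many blobs share it
            prefixes.add(delimiter.join(parts[:depth]) + delimiter)
    return sorted(prefixes)

# Hypothetical flat listing
names = [
    "Client_5/testimage.jpg",
    "Client_5/image.jpg",
    "Client_5/sub/doc.pdf",
    "Client_6/logo.png",
]
print(virtual_directories(names))
```

Because the folder tree is derived from the blob names themselves, duplicated CloudBlobDirectory entries from a hierarchical listing never enter the picture.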
It's hard to say what causes this behaviour. In my case, the files in blob storage were copied by an Azure Data Factory activity with the concurrency option enabled, but I'm not sure whether that's the cause.
BTW, in my case Microsoft Azure Storage Explorer showed only a subset of the folders, which is much worse than displaying duplicated directories, so I switched to Azure Explorer, which is worth recommending.

I was experiencing an issue where the "folder" names appeared identical, but on closer inspection one had a trailing space.
Because folders don't really exist in blob storage and a space is a valid character in a blob name, it is possible to have trailing or leading spaces in the "folder" names.
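One way to spot this kind of near-duplicate is to compare prefixes after stripping whitespace from each path segment, then print the raw spellings with `repr()` so the spaces become visible. A minimal Python sketch, assuming you already have the list of prefixes; the function name and sample data are hypothetical:

```python
from collections import defaultdict

def whitespace_collisions(prefixes):
    """Group prefixes that become identical once each segment's outer whitespace is stripped."""
    groups = defaultdict(set)
    for prefix in prefixes:
        normalized = "/".join(segment.strip() for segment in prefix.split("/"))
        groups[normalized].add(prefix)
    # Keep only groups that have more than one distinct raw spelling
    return {norm: sorted(raw) for norm, raw in groups.items() if len(raw) > 1}

prefixes = ["Client_5/", "Client_5 /", "Client_6/"]
for norm, variants in whitespace_collisions(prefixes).items():
    # repr() makes the trailing space in "Client_5 /" visible
    print(norm, [repr(v) for v in variants])
```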

Azure blob storage does not have the concept of folders, only containers. You can simulate a folder by naming the blob something like 'folder/img.png', but 'folder/' is part of the blob's name.
Also, I always use Storage Explorer; try this one: http://azurestorageexplorer.codeplex.com/releases/view/125870

Related

Azure blob soft delete and versioning- how to restore files easily?

I am trying to understand how soft delete and versioning work within Azure blob storage.
It seems that if you have both soft delete and versioning turned on, you can’t just ‘undelete’ deleted files, because versioning actually saves a new version when a file is deleted.
So instead you have to promote the last version of each deleted file.
But what if you have a structure of nested folders and thousands of blobs? You can’t just promote the top version of the top-level folder. Do you need to use PowerShell to list files with no current version, and promote them? How would you do this?
This seems awfully complicated, when without versioning - a simple ‘undelete’ command is available from the GUI.
Am I missing something? What is the easiest way to ‘undelete’ a nested folder structure of thousands of blobs in folders, when versioning is turned on?
Thanks
As Rob Minson pointed out, the approach involves copying a blob version to the same container. For PowerShell, use the Copy-AzStorageBlob cmdlet; for Azure CLI, use the az storage blob copy start command. You can pass an account key or SAS token, or use Azure AD.
We've updated the documentation to shed some light on an approach to restoring blobs when soft-delete and/or versioning is enabled. Code samples are available for both PowerShell and Azure CLI.
Simply put, no.
The first point that needs to be emphasized is that blobs in blob storage are not nested the way you might think. Blob storage may look like a local file system, with some nested folders and many files inside them, but those folders are not real: the storage structure of blob storage is flat. Blob storage is not about putting a small box inside a box and then putting items in the small box. In fact, every blob sits directly in its container, and there is no such thing as a "small box".
Second, in blob storage, soft delete supports only two kinds of object: blobs and containers.
Check out this document:
https://learn.microsoft.com/en-us/azure/storage/blobs/soft-delete-container-overview?tabs=azure-cli#how-container-soft-delete-works
However, you can only use container soft delete to restore blobs if the container itself was deleted. To restore a deleted blob when its parent container has not been deleted, you must use blob soft delete or blob versioning.
So unfortunately, there is no easy way. You need to operate on each blob, because the nested structure does not actually exist.
If you are interested, you can read this blog:
https://medium.com/#loopjockey/structuring-azure-blobs-for-functions-8305ba427356
I completely agree that this seems really undocumented at the moment. I've raised a GitHub issue against this docs page to see if they can improve the situation.
The best path through that I've found is something like the following:
Using Azure Storage Explorer, open up the container with the soft-deleted, versioned blobs, then change the drop down to "All blobs and blobs without current version". Now you can select a blob and hit 'Promote Version'. The deleted blob will be restored and in the Activities pane you can expand the operation and hit 'Copy AzCopy Command to Clipboard'.
The result will show you something like the following:
./azcopy.exe copy
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>&versionid=2021-04-22T11%3A35%3A36.9385599Z"
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>"
--overwrite=true
--recursive
--trusted-microsoft-suffixes=;
Now, based on this you can see you have a building block for automating the process you're talking about. Your problem at this point is finding this thing:
versionid=2021-04-22T11%3A35%3A36.9385599Z
Unfortunately, that's a timestamp with 100-nanosecond precision (seven fractional digits), which you're not going to be able to infer. There's no functionality I can find in PowerShell, in the REST APIs or in AzCopy to get this data; the only way I have found is this sample for the .NET SDK.
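Once you do have a version ID, the source URL in the command above is just the base blob URL plus a percent-encoded `versionid` query parameter, and the destination is the same blob without it. A minimal Python sketch of building that pair; the function name is hypothetical, the account, container, blob and SAS values are placeholders, and the SAS is assumed to be passed without its leading `?`:

```python
from urllib.parse import quote

def version_copy_urls(account, container, blob_name, version_id, sas_token):
    """Build (source, destination) URLs for copying a blob version over its base blob."""
    # quote() keeps '/' by default, so virtual "folder" paths stay intact
    base = f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"
    # The version ID contains ':' characters, which must be percent-encoded (%3A)
    source = f"{base}?{sas_token}&versionid={quote(version_id, safe='')}"
    destination = f"{base}?{sas_token}"
    return source, destination

src, dst = version_copy_urls("accountname", "containername", "blobname",
                             "2021-04-22T11:35:36.9385599Z", "<sastoken>")
print(src)
print(dst)
```

The two URLs can then be passed to `azcopy copy` with `--overwrite=true`, as in the command Storage Explorer generates.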
All this probably means you can either:
Implement your own C# console app using the Azure.Storage.Blobs library to list the versions for each blob, then run the relevant copy command now that you know the magic version string
Wait for the REST API or PowerShell library to gain the ability to list versions
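For the first option, the core decision is which version to promote for each blob. A minimal Python sketch of that step only, assuming you have already listed every version as a (name, version ID, is-current) record; the record type and sample data are hypothetical, and producing the records (for example from an SDK listing that includes versions) is left out. It also assumes version IDs are fixed-width UTC timestamps like the one above, so plain string comparison orders them chronologically:

```python
from dataclasses import dataclass

@dataclass
class VersionRecord:
    name: str
    version_id: str
    is_current: bool

def versions_to_promote(records):
    """For every blob with no current version (i.e. deleted), return its newest version ID."""
    has_current = {r.name for r in records if r.is_current}
    latest = {}
    for r in records:
        if r.name in has_current:
            continue  # blob still exists; nothing to restore
        # Assumption: fixed-width ISO timestamps, so string order equals time order
        if r.name not in latest or r.version_id > latest[r.name]:
            latest[r.name] = r.version_id
    return latest

records = [
    VersionRecord("a/b/1.jpg", "2021-04-20T09:00:00.0000000Z", False),
    VersionRecord("a/b/1.jpg", "2021-04-22T11:35:36.9385599Z", False),
    VersionRecord("a/2.jpg", "2021-04-21T08:00:00.0000000Z", False),
    VersionRecord("a/2.jpg", "2021-04-23T08:00:00.0000000Z", True),
]
print(versions_to_promote(records))
```

Each resulting (name, version ID) pair is what one copy command per blob would need, which is why the nested "folders" make no difference: the work is always one blob at a time.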

Downloading files from Azure Blob Storage - Error: The given path's format is not supported

I've just started working with Azure Blob Storage and I'm trying to download block blobs to my local environment, without any success. The file names get copied, but no data is copied (i.e. the files are empty).
If I use the same settings with an upload task, then the files are copied.
The error I get from SSIS (SQL Server 2016 SP1 and VS 2015 professional) is:
Error: Download task has stopped with exception: The given path's format is not supported.
Below are the properties of the download and upload, any ideas anyone?
cheers,
Anthony
I tested both of these tasks, and my setup for each task looks pretty much identical to yours as far as slashes and backslashes go for the LocalDirectory, BlobContainer, BlobDirectory, etc. I did notice some variation in capitalization (e.g. DELTA vs Delta), but I tested that as well and it was not case sensitive. Because all of that looks perfect, I suspect there is an issue with permissions, or with a file name that the task doesn't like.
You probably need to escape the backslashes in the expression that sets the variable value used for LocalDirectory, like this:
C:\\Temp\\Delta\\GoodsIn
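The reason is that a backslash starts an escape sequence inside an SSIS expression string literal, so a path typed with single backslashes is not the path the task receives. A small Python helper that turns an ordinary Windows path into the escaped literal to paste into the expression; the helper name is hypothetical, and it assumes SSIS's backslash escape rules (including `\"` for an embedded quote):

```python
def to_ssis_string_literal(path):
    """Turn a Windows path into a quoted SSIS expression string literal."""
    # Double every backslash first, then escape any embedded double quotes
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

print(to_ssis_string_literal(r"C:\Temp\Delta\GoodsIn"))
```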

Azure convert blob to file

Some large disks containing hundreds of 30GB tar files have been prepared and ready to ship.
The disks have been prepared as BLOB using the WAImportExport tool.
The Azure share is expecting files.
Ideally we don't want to redo the disks as FILE instead of BLOB. Are we able to upload as BLOBs to one storage area and extract the millions of files from the tarballs to a FILE storage area without writing code?
Thanks
Kevin
azcopy will definitely do it and has been tested. We were able to move files from blobs to files using the CLI in Azure with the azcopy command.
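For reference, an azcopy blob-to-file copy takes a blob URL as the source and an Azure Files URL as the destination. The Python sketch below only assembles that command line. The account, container, share, path and SAS values are hypothetical placeholders, and note that this copies the blobs as-is into the share: it does not unpack the .tar files.

```python
import shlex

def azcopy_blob_to_file_args(account, container, share, path, blob_sas, file_sas):
    """Build an azcopy argument list that copies a blob 'folder' into an Azure Files share."""
    source = f"https://{account}.blob.core.windows.net/{container}/{path}?{blob_sas}"
    destination = f"https://{account}.file.core.windows.net/{share}/{path}?{file_sas}"
    # --recursive copies every blob under the given virtual directory
    return ["azcopy", "copy", source, destination, "--recursive"]

args = azcopy_blob_to_file_args("myaccount", "tarballs", "myshare", "batch1",
                                "<blob-sas>", "<file-sas>")
print(shlex.join(args))  # shell-quoted, ready to paste into a terminal
```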
The information provided below was proven not to be true.
Microsoft Partner told me yesterday there is no realistic way to convert Blobs to Files in the above-mentioned scenario.
Essentially, it is important to select either WAImportExport.exe Version 1 for BLOBS or WAImportExport.exe Version 2 for files. Information on this can be found at this location.
The mistake was easy to make, and a number of people here made it: the link to the tool we were sent pointed to the version 1 binary. Search results tended to direct users to version 1; version 2 only appears after a deeper dig. Version 2 seems to be an afterthought by Microsoft, added when they introduced the Files option to Azure. It's a pity they didn't use different binary names, or build a switch into version 2 to handle both and retire the version 1 offering.

Publish website to Azure, remove additional files at destination, but ignore specific folders

I currently delete obsolete folders from a published Azure website manually. I know there is an option in Visual Studio to Remove additional files at destination. My problem is that I have an Images folder (quite large) of user uploads, which will be deleted when I publish with this option checked. My question is: is there a way to use this option with exclusions? That is, to delete all files that are not in the local project except the "\Images" folder?
You can most likely customize the web deploy usage from VS to do what you want but I don't think I would recommend it since things like that tend to get fragile.
I would suggest changing your architecture to store the images in a blob container, then possibly mapping your blobs to a custom domain (https://azure.microsoft.com/en-us/documentation/articles/storage-custom-domain-name/).
Having your images in blob storage will also prevent any accidental deletion of the Images folder by someone else that doesn't know it shouldn't be touched (or you simply forgetting about it one day).
Using blob storage will also allow you to configure CDN usage if you ever find that you need it.
Another option would be to create a virtual directory on your WebApp configuration and put the Images there - that way your VS deploy/publish wouldn't be modifying that subdirectory. This link may help with that: https://blogs.msdn.microsoft.com/tomholl/2014/09/21/deploying-multiple-virtual-directories-to-a-single-azure-website/

Azure localisation - how to take resource files out of your packaged application?

I have a localised site reading from your standard .resx resource files. Everything works fine, however I am deploying to Azure. The .resx files are packaged along with the rest of the site and deployed onto each role instance. Meaning if I want to make a change to something I need to redeploy the entire package to Azure again and suffer a rolling update.
Is there a way I can get my site to read resource files from a single static location, such as blob storage? Is this a good idea or should I just do my best to get it right first time?
Thank you!
Well, rolling updates aren't the end of the world. If your site is hosted with multiple running instances, each instance will be taken out of the load-balanced loop, brought down and updated in sequence, so your users shouldn't experience any real downtime.
One option, though, would be to move to a non-resx-based localization setup. You can write your own ResourceProvider to override the built-in one. Rick Strahl had a nice example of reading resource information from a database.
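Whatever the backing store, a custom provider still needs the culture fallback that .resx files give you for free (for example fr-CA, then fr, then the invariant culture). The Python sketch below illustrates only that lookup logic; it is not Rick Strahl's implementation or the ASP.NET ResourceProvider API. The in-memory `store` dictionary is a hypothetical stand-in for a central location such as a database table or a blob that every instance reads:

```python
def lookup_resource(store, culture, key):
    """Resolve a localized string with .NET-style culture fallback.

    store maps a culture name ('' = invariant) to a {key: value} dict.
    """
    candidates = []
    while True:
        candidates.append(culture)
        if not culture:
            break
        # 'fr-CA' -> 'fr' -> '' (invariant)
        culture = culture.rsplit("-", 1)[0] if "-" in culture else ""
    for name in candidates:
        value = store.get(name, {}).get(key)
        if value is not None:
            return value
    raise KeyError(key)

store = {
    "": {"Greeting": "Hello", "Farewell": "Goodbye"},
    "fr": {"Greeting": "Bonjour"},
}
print(lookup_resource(store, "fr-CA", "Greeting"))
print(lookup_resource(store, "fr-CA", "Farewell"))
```

Because the strings live outside the deployment package, changing one means updating the store rather than redeploying every role instance.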
http://www.west-wind.com/weblog/posts/2009/Apr/01/Updated-WestwindGlobalization-Data-Driven-Resource-Provider-for-ASPNET
