Azure convert blob to file - azure

Some large disks containing hundreds of 30GB tar files have been prepared and ready to ship.
The disks have been prepared as BLOB using the WAImportExport tool.
The Azure share is expecting files.
Ideally we don't want to redo the disks as FILE instead of BLOB. Are we able to upload as BLOBs to one storage area and extract the millions of files from the tarballs to a FILE storage area without writing code?
Thanks
Kevin

azcopy will definitely do it and has been tested. We were able to move files from blobs to files using the CLI in Azure with the azcopy command.
The information provided below was proven not to be true.
Microsoft Partner told me yesterday there is no realistic way to convert Blobs to Files in the above-mentioned scenario.
Essentially, it is important to select either WAImportExport.exe Version 1 for BLOBS or WAImportExport.exe Version 2 for files. Information on this can be found at this location.
The mistake was easily made and done so by a number of people here: the link to the tool sent was to the binary version 1. Search results tended to direct users to version 1 but version 2 only appears only after deeper dig. Version 2 - seems to be an afterthought my Microsoft when they added the Files option to Azure. It's a pity they didn't use different binary names or build a switch into version 2 to do both and delete the version 1 offering.

Related

Azure blob soft delete and versioning- how to restore files easily?

I am trying to understand how soft delete and versioning work within azure blog storage.
It seems that if you have both soft delete and versioning turned on... you can’t just ‘undelete ’ deleted files, as versioning actually saves a new version as a deleted file.
So instead you have to promote the last version of each deleted file.
But what if you have a structure of nested folders and thousands of blobs... you can’t just promote the top version of the top level folder... you need to use Powershell to list files with no current version, and promote them? How would you do this?
This seems awfully complicated, when without versioning - a simple ‘undelete’ command is available from the GUI.
Am I missing something? What is the easiest way to ‘undelete’ a nested folder structure of thousand of blobs in folders, when versioning is turned on?
Thanks
As Rob Minson pointed out, the approach involves copying a blob version to the same container. For PowerShell, use the Copy-AzStorageBlob cmdlet; for Azure CLI, use the az storage blob copy start command. You can pass an account key or SAS token, or use Azure AD.
We've updated the documentation to shed some light on an approach to restoring blobs when soft-delete and/or versioning is enabled. Code samples are available for both PowerShell and Azure CLI.
Simply put, no.
The first point that needs to be emphasized is that blobs in blob storage are not nested as you might think. It seems that blob storage is the same as the local file system: some nested folders, and many files inside. But in fact these are fake, the storage structure of blob storage is flat. Blob storage is not about putting a small box in a box and then putting items in the small box. In fact, all blobs are items of blob storage, and there is no such thing as a "small box".
Then, the second point, for blob storage, the soft-delete operation only supports two objects, one is a blob and the other is a container.
Check out this document:
https://learn.microsoft.com/en-us/azure/storage/blobs/soft-delete-container-overview?tabs=azure-cli#how-container-soft-delete-works
However, you can only use container soft delete to restore blobs if
the container itself was deleted. To a restore a deleted blob when its
parent container has not been deleted, you must use blob soft delete
or blob versioning.
So unfortunately, there is no so-called easy way. You need to operate on each blob, the nested structure does not actually exist.
If you are interested, you can read this blog:
https://medium.com/#loopjockey/structuring-azure-blobs-for-functions-8305ba427356
I completely agree that this seems really un-documented at the moment. I've raised a github issue against this docs page to see if they can get the situation improved.
The best path through that I've found is something like the following:
Using Azure Storage Explorer, open up the container with the soft-deleted, versioned blobs, then change the drop down to "All blobs and blobs without current version". Now you can select a blob and hit 'Promote Version'. The deleted blob will be restored and in the Activities pane you can expand the operation and hit 'Copy AzCopy Command to Clipboard'.
The result will show you something like the following:
./azcopy.exe copy
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>&versionid=2021-04-22T11%3A35%3A36.9385599Z"
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>"
--overwrite=true
--recursive
--trusted-microsoft-suffixes=;
Now, based on this you can see you have a building block for automating the process you're talking about. Your problem at this point is finding this thing:
versionid=2021-04-22T11%3A35%3A36.9385599Z
Unfortunately that's a timestamp to nanosecond precision which you're not going to be able to infer. There's no functionality I can find in powershell, in the REST APIs or in AzCopy to get this data, the only way I have found is this sample for the .Net SDK.
All this probably means you can either:
Implement your own C# console app using the Azure.Storage.Blobs library to list the versions for each blob, then perform the relevant copy command now you know the magic version string
Wait for the REST API or Powershell library to get the ability to list versions

azcopy list function gives a different count (almost double) of objects than Storage Explorer

I am uploading files with AZCOPY (one by one as and when they are provided) to Azure Datalakes gen 2 and keep a track with Storage explorer and individual log of each file.
There have been 6253 file uploads and Storage explorer shows the same along with number of logs for each file upload
But when i use AZCOPY LIST it gives me 11254.
Making it difficult to script and automate.
Is there a logical explanation for this?
There is no access issue, in fact the same AZCOPY is working on copying the files
I have tried to redownload if that makes sense
This is a known bug, scheduled for fixing in our next release: https://github.com/Azure/azure-storage-azcopy/issues/692

Upload lot of files to Azure Blob Storage

I have directory where is about 100 000+ subdirectories. In every subdirectory is from one to ten files. All files are images with content type = image/jpeg.
Together this files have size over 54 GB. Is there any chance to upload this files with structure
/orders/1000000003/12345468878.jpeg.
I know that BLOB is not hierarchical. I don't have Windows, i don't have Powershell, i don't have Visual Studio.
Any suggestions?
Use the full path of your files as blob names. To upload from Linux or Mac, you can use Azure CLI (available as an NPM package).
Even though the structure is not hierarchical, you can "emulate" directories by adding /'s to the path name.
There are multiple clients available for Mac that support Azure Storage; my favorite is Cyberduck: https://cyberduck.io/ (free)

Why do duplicate folders exist in my Azure blob storage container?

I am aware that Azure blob storage does not use an actual folder structure but could not think of a better way to describe this.
The issue we're seeing is when opening Server Explorer (in Visual Studio) to browse through our blob storage container. We separate client resources and data by folder so in this case we have a blob titled productdata/Client_5/testimage.jpg.
The problem is that this Client_5 folder appears twice when inspecting our blob storage. So far I've double checked there are no weird special characters in either folder and double checked case sensitivity. The two paths are EXACTLY the same except its actual contents. Our application has no problems with this because the path is still exactly the same to the resources it's attempting to get. (For example, since the folders are named exactly the same, https://myazureaccount.blob.core.windows.net/productdata/Client_5/image.jpg still takes us to exactly where we need to be.) It's just a pain when we use Server Explorer to view our blobs on Azure because we have two folder locations to check. This could very well be a bug in Server Explorer for Visual Studio as well.
If anyone else has ever come across this, any info is appreciated. I couldn't find anything on the topic when searching online but figure I would post the question here for reference. Also, I'll be contacting Azure support soon to see if they can shed some light on any of this and will post what info I get from them here later.
It's true that blob storage doesn't have the concept of folder but the API built on the top of it does. I've seen exactly the same or similar problems in other tools as well: Microsoft Azure Storage Explorer and even Azure Portal. I tried to go deeper and when I executed:
CloudBlobContainer.ListBlobs(null, useFlatBlobListing: false)
it also returned duplicated directories. To be precise it returned the list with several instances of CloudBlobDirectory that had the same Prefix. Sounds like a bug. Now, if a tool uses this approach to get a list of directories it will fail. If the tool uses flat listing and builds the structure of folders in its logic it should be ok.
Hard to say what is the reason of such behaviour. In my case files in blob storage were copied by Azure Data Factory activity with concurrency option but I'm not sure if it's the rule.
BTW Microsoft Azure Storage Explorer in my case showed only some subset of folders which is much worse than displaying duplicated directories so I switched to Azure Explorer mentioned above and it's worth recommending.
I was experiencing an issue where the "folder" names appeared identical, but on closer inspection one had a trailing space.
Because folders don't really exist in blob storage and a space is a valid value, it is possible to have trailing or leading spaces in the names.
Azure blob storage does not ahve the concept of folder, only container, you can simulate folder setting the name of the blog to save like 'folder/img.png', but folder/ is part of the name of the blob.
Also, I ever use Storage Explorer, try with this: http://azurestorageexplorer.codeplex.com/releases/view/125870

Microsoft Azure - how to store backup data

I have a trivial and simple task - I want to store some my data (documents, photos etc) on Azure as backup. Which type of service should I select? Store as Blob? But I want to save a structure of data (folders, subfolders etc). Azure Backup? It stores only archive data, I don't want to archive it all in one. DocumentDB? I not need to have features like return json format etc... What is the best way to store many (thousands) files (big and small) to save folder structure without archivation to one file (so, I want to have a simple way to get only one file quickly)
I use Windows 7.
In addition to what knightpfhor wrote above you might also want to take a look at Getting Started with AzCopy - it is a Windows tool that helps you copy content to / from your local storage to Blob Storage. PowerShell and XPlatCLI are other options as well.
Based on what you've described, blob storage can achieve what you're after. While blob storage doesn't technically support folders, it does let you include \ in the file names which are interpreted by some blob storage clients as folders (for example the free Cerebrata Azure Explorer). So if you have
RootDrive
|
--My Pictures
|
Picture1.jpg
Picture2.jpg
--My Other Picutres
|
Picture1.jpg
You could create blobs with the names:
RootDrive\My Pictures\Picture1.jpg
RootDrive\My Pictures\Picture2.jpg
RootDrive\My Other Picutres\Picture1.jpg
And they have the ability to be interpreted as folders.
I examined Recovery Services deeply and it works now (from december 2014) for Windows client versions too (including Windows 7, Windows 8, Windows 8.1). It allows to backup selected files and folders.
This is a guide how to use it
http://azure.microsoft.com/blog/2014/11/04/back-up-your-data-to-the-cloud-with-3-simple-steps/

Resources