How to rename file in BLOB container - azure

Can you please let me know whether it is possible to rename a file in a blob container using the SDK? Most suggestions recommend creating a new blob, copying the content of the old blob into it, and then deleting the old blob. However, given the size of the files, I do not think that is an ideal solution for the architecture I am working with, so I am looking for an alternative way of achieving the same result.
Kind Regards,

Related

Azure blob soft delete and versioning- how to restore files easily?

I am trying to understand how soft delete and versioning work within Azure Blob Storage.
It seems that if you have both soft delete and versioning turned on... you can’t just ‘undelete ’ deleted files, as versioning actually saves a new version as a deleted file.
So instead you have to promote the last version of each deleted file.
But what if you have a structure of nested folders and thousands of blobs... you can’t just promote the top version of the top-level folder... do you need to use PowerShell to list files with no current version, and promote them? How would you do this?
This seems awfully complicated, when without versioning - a simple ‘undelete’ command is available from the GUI.
Am I missing something? What is the easiest way to ‘undelete’ a nested folder structure containing thousands of blobs when versioning is turned on?
Thanks
As Rob Minson pointed out, the approach involves copying a blob version to the same container. For PowerShell, use the Copy-AzStorageBlob cmdlet; for Azure CLI, use the az storage blob copy start command. You can pass an account key or SAS token, or use Azure AD.
We've updated the documentation to shed some light on an approach to restoring blobs when soft-delete and/or versioning is enabled. Code samples are available for both PowerShell and Azure CLI.
Simply put, no.
The first point to emphasize is that blobs in Blob Storage are not nested the way you might think. Blob Storage looks like a local file system, with nested folders and many files inside them, but those folders are not real: the storage structure is flat. Blob Storage does not put a small box inside a bigger box and then put items in the small box. Every blob is an item directly in the container, and there is no such thing as a "small box".
The second point is that soft delete in Blob Storage applies to only two kinds of object: blobs and containers.
Check out this document:
https://learn.microsoft.com/en-us/azure/storage/blobs/soft-delete-container-overview?tabs=azure-cli#how-container-soft-delete-works
However, you can only use container soft delete to restore blobs if
the container itself was deleted. To restore a deleted blob when its
parent container has not been deleted, you must use blob soft delete
or blob versioning.
So unfortunately, there is no easy way. You need to operate on each blob individually, because the nested structure does not actually exist.
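To make the "flat" point concrete, here is a minimal Python sketch of how a tool can derive folders from nothing but blob names. The blob names and the helper `group_by_virtual_folder` are made up for illustration; no Azure SDK is involved, and a real tool would get the names from a listing call.

```python
from collections import defaultdict


def group_by_virtual_folder(blob_names, delimiter="/"):
    """Group flat blob names by the prefix before the last delimiter."""
    folders = defaultdict(list)
    for name in blob_names:
        # rpartition returns ("", "", name) when the delimiter is absent,
        # so top-level blobs land under the empty prefix.
        prefix, _, leaf = name.rpartition(delimiter)
        folders[prefix].append(leaf)
    return dict(folders)


# Hypothetical blob names; the "folders" exist only inside the names.
names = [
    "reports/2021/april.csv",
    "reports/2021/may.csv",
    "reports/summary.csv",
    "readme.txt",
]

tree = group_by_virtual_folder(names)
for prefix, leaves in sorted(tree.items()):
    print(repr(prefix), leaves)
```

Deleting or restoring a "folder" therefore always means iterating over every blob whose name starts with that prefix.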
If you are interested, you can read this blog:
https://medium.com/#loopjockey/structuring-azure-blobs-for-functions-8305ba427356
I completely agree that this seems really undocumented at the moment. I've raised a GitHub issue against this docs page to see if they can get the situation improved.
The best path through that I've found is something like the following:
Using Azure Storage Explorer, open up the container with the soft-deleted, versioned blobs, then change the drop down to "All blobs and blobs without current version". Now you can select a blob and hit 'Promote Version'. The deleted blob will be restored and in the Activities pane you can expand the operation and hit 'Copy AzCopy Command to Clipboard'.
The result will show you something like the following:
./azcopy.exe copy
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>&versionid=2021-04-22T11%3A35%3A36.9385599Z"
"https://accountname.blob.core.windows.net/containername/blobname?<sastoken>"
--overwrite=true
--recursive
--trusted-microsoft-suffixes=;
Now, based on this you can see you have a building block for automating the process you're talking about. Your problem at this point is finding this thing:
versionid=2021-04-22T11%3A35%3A36.9385599Z
Unfortunately, that is a timestamp with seven fractional digits (100-nanosecond precision), which you are not going to be able to infer. I can find no functionality in PowerShell, the REST APIs, or AzCopy to get this data; the only way I have found is this sample for the .NET SDK.
All this probably means you can either:
Implement your own C# console app using the Azure.Storage.Blobs library to list the versions for each blob, then perform the relevant copy command now that you know the magic version string.
Wait for the REST API or PowerShell library to get the ability to list versions.
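Whichever way you obtain the version listing, the remaining logic is small. The Python sketch below assumes you already have (name, version ID, is-current) tuples from somewhere; that tuple shape and both helper names are hypothetical. It picks the newest version of every blob that has no current version and rebuilds the source and destination URLs in the same form as the AzCopy command above. It also assumes version IDs all share the fixed-width timestamp format shown above, so that plain string comparison orders them by time.

```python
from urllib.parse import quote


def latest_versions_to_promote(entries):
    """Return {blob_name: version_id} for blobs that have no current version.

    `entries` is an iterable of (name, version_id, is_current) tuples,
    a hypothetical shape for whatever version listing you obtain.
    """
    has_current = set()
    latest = {}
    for name, version_id, is_current in entries:
        if is_current:
            has_current.add(name)
        # Assumes fixed-width UTC timestamps, so string comparison
        # orders version IDs chronologically.
        if name not in latest or version_id > latest[name]:
            latest[name] = version_id
    return {n: v for n, v in latest.items() if n not in has_current}


def promote_urls(account, container, name, version_id, sas="<sastoken>"):
    """Build the source and destination URLs for the AzCopy copy command."""
    base = f"https://{account}.blob.core.windows.net/{container}/{quote(name)}"
    source = f"{base}?{sas}&versionid={quote(version_id, safe='')}"
    return source, f"{base}?{sas}"


# Made-up listing: a.txt was deleted (no current version), b.txt was not.
entries = [
    ("docs/a.txt", "2021-04-21T09:00:00.0000000Z", False),
    ("docs/a.txt", "2021-04-22T11:35:36.9385599Z", False),
    ("docs/b.txt", "2021-04-22T10:00:00.0000000Z", True),
]

to_promote = latest_versions_to_promote(entries)
for name, version_id in to_promote.items():
    src, dst = promote_urls("accountname", "containername", name, version_id)
    print(src)
    print(dst)
```

Each printed pair can be passed to `azcopy copy` as source and destination, one blob at a time.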

How to unzip .gz file from blob storage in Azure Data Factory?

I have a folder (say folder A) in blob storage that contains several compressed files (.gz format). I want to decompress all the files and save them back to another folder (say folder B) in blob storage.
This is the approach I was trying. GetMetadata-->ForEach Loop.
Inside foreach loop, I tried copy activity. However, the unzipped file is corrupted.
Glad to hear the issue is resolved now:
"Actually the issue was file extension. I added file extension and it
works for me now. "
I am posting it as an answer so that other community members can benefit from it.
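If you want to rule out a damaged source file before blaming the pipeline, a quick local check in Python can help. This is only a sketch and not Data Factory configuration: the helper names and the sample file name are made up, and it simply shows decompressing a gzip payload and deriving an output name that keeps the real extension.

```python
import gzip


def decompressed_name(blob_name):
    """Strip a trailing .gz so the output keeps its real extension."""
    if blob_name.lower().endswith(".gz"):
        return blob_name[:-3]
    return blob_name


def gunzip_bytes(data):
    """Decompress a gzip payload held in memory."""
    return gzip.decompress(data)


# Round-trip a small made-up payload to show the two helpers together.
original = b"id,value\n1,42\n"
packed = gzip.compress(original)

print(decompressed_name("folderA/sales.csv.gz"))
print(gunzip_bytes(packed) == original)
```

If the file decompresses cleanly here but opens as garbage after the copy activity, the sink file name or extension is the first thing to check.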

How to uncompress rar files using Azure DataFactory

We have a new client, while landing the project we gave them a blob storage for them to leave files so we could later automate and process the information.
The idea is to use Azure Data Factory, but we have found no way of dealing with .rar files, and even .zip files created on Windows are giving us trouble. Since it is the client supplying the .rar format, we wanted to make absolutely sure there is no way to process it before asking them to change it, or before deploying Databricks or a similar service just for the purpose of transforming the file.
Is there any way to get a .rar file from a blob storage, uncompress it, then process it?
I have been looking in posts like this and related official documentation and closest we have come is using ZipDeflate, but it does not seem to fill our requirement.
Thanks in advance!
The only compression types Data Factory supports are GZip, Deflate, BZip2, and ZipDeflate.
For unsupported file types and compression formats, Data Factory provides some workarounds:
You can use the extensibility features of Azure Data Factory to transform files that aren't supported. Two options include Azure Functions and custom tasks by using Azure Batch.
You can see a sample that uses an Azure function to extract the contents of a tar file. For more information, see Azure Functions activity.
You can also build this functionality using a custom dotnet activity. Further information is available here.
As a next step, you may need to figure out how to use an Azure Function to extract the contents of a rar file.
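One small building block for such a function is checking what the client actually sent, since the extension is not always reliable. The Python sketch below inspects the first bytes of a file and decides whether it needs the custom route. The helper names are made up, and mapping the zip signature to ZipDeflate is my assumption; the magic numbers themselves are the standard ones for each format.

```python
# (magic bytes, format name); the longer RAR 5 signature is checked first.
SIGNATURES = [
    (b"Rar!\x1a\x07\x01\x00", "rar"),  # RAR 5.x
    (b"Rar!\x1a\x07\x00", "rar"),      # RAR 1.5 to 4.x
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
]

# Assumption: these correspond to ZipDeflate, GZip and BZip2 in the list above.
NATIVE = {"zip", "gzip", "bzip2"}


def detect_format(header):
    """Return the archive format implied by the leading bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def needs_custom_extraction(header):
    """True when the file must go through a function or custom activity."""
    return detect_format(header) not in NATIVE


print(detect_format(b"Rar!\x1a\x07\x00" + b"payload"))
print(needs_custom_extraction(b"PK\x03\x04" + b"payload"))
```

Reading only the first eight bytes of each blob is enough for this check, so it is cheap to run before deciding which activity to trigger.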
You can use Logic Apps.
You can use a Webhook activity that calls a runbook.
Both are easier than using a custom activity.

azcopy list function gives a different count (almost double) of objects than Storage Explorer

I am uploading files with AzCopy (one by one, as and when they are provided) to Azure Data Lake Storage Gen2, and I keep track of them with Storage Explorer and an individual log for each file.
There have been 6253 file uploads, and Storage Explorer shows the same number, which also matches the number of per-file upload logs.
But when I use azcopy list, it gives me 11254.
Making it difficult to script and automate.
Is there a logical explanation for this?
There is no access issue, in fact the same AZCOPY is working on copying the files
I have tried to redownload if that makes sense
This is a known bug, scheduled for fixing in our next release: https://github.com/Azure/azure-storage-azcopy/issues/692

Why do duplicate folders exist in my Azure blob storage container?

I am aware that Azure blob storage does not use an actual folder structure but could not think of a better way to describe this.
The issue we're seeing is when opening Server Explorer (in Visual Studio) to browse through our blob storage container. We separate client resources and data by folder so in this case we have a blob titled productdata/Client_5/testimage.jpg.
The problem is that this Client_5 folder appears twice when inspecting our blob storage. So far I've double-checked that there are no weird special characters in either folder, and double-checked case sensitivity. The two paths are EXACTLY the same except for their contents. Our application has no problems with this because the path to the resources it's attempting to get is still exactly the same. (For example, since the folders are named exactly the same, https://myazureaccount.blob.core.windows.net/productdata/Client_5/image.jpg still takes us to exactly where we need to be.) It's just a pain when we use Server Explorer to view our blobs on Azure because we have two folder locations to check. This could very well be a bug in Server Explorer for Visual Studio as well.
If anyone else has ever come across this, any info is appreciated. I couldn't find anything on the topic when searching online but figure I would post the question here for reference. Also, I'll be contacting Azure support soon to see if they can shed some light on any of this and will post what info I get from them here later.
It's true that blob storage doesn't have the concept of folder but the API built on the top of it does. I've seen exactly the same or similar problems in other tools as well: Microsoft Azure Storage Explorer and even Azure Portal. I tried to go deeper and when I executed:
CloudBlobContainer.ListBlobs(null, useFlatBlobListing: false)
it also returned duplicated directories. To be precise it returned the list with several instances of CloudBlobDirectory that had the same Prefix. Sounds like a bug. Now, if a tool uses this approach to get a list of directories it will fail. If the tool uses flat listing and builds the structure of folders in its logic it should be ok.
It is hard to say what causes this behaviour. In my case the files in blob storage were copied by an Azure Data Factory activity with the concurrency option, but I'm not sure that is always the cause.
BTW Microsoft Azure Storage Explorer in my case showed only some subset of folders which is much worse than displaying duplicated directories so I switched to Azure Explorer mentioned above and it's worth recommending.
I was experiencing an issue where the "folder" names appeared identical, but on closer inspection one had a trailing space.
Because folders don't really exist in blob storage and a space is a valid value, it is possible to have trailing or leading spaces in the names.
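A quick way to check for this is to compare the folder prefixes after stripping whitespace from each segment. The Python sketch below works on a plain list of blob names; the names and the helper `find_lookalike_prefixes` are made up, and a real check would feed in the names from a flat listing of the container.

```python
from collections import defaultdict


def find_lookalike_prefixes(blob_names, delimiter="/"):
    """Group folder prefixes that differ only by surrounding whitespace."""
    groups = defaultdict(set)
    for name in blob_names:
        parts = name.split(delimiter)[:-1]  # drop the leaf name
        for depth in range(1, len(parts) + 1):
            prefix = delimiter.join(parts[:depth])
            # Normalise each segment so "Client_5" and "Client_5 " collide.
            key = delimiter.join(p.strip() for p in parts[:depth])
            groups[key].add(prefix)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Hypothetical blob names; the second one has a trailing space in the folder.
names = [
    "Client_5/testimage.jpg",
    "Client_5 /image.jpg",
    "Client_6/logo.png",
]

dupes = find_lookalike_prefixes(names)
for key, variants in dupes.items():
    # repr() makes leading and trailing spaces visible.
    print(key, [repr(v) for v in variants])
```

Anything this reports is a pair of distinct prefixes that most tools will render as the same folder name.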
Azure Blob Storage does not have the concept of a folder, only a container. You can simulate a folder by naming the blob something like 'folder/img.png', but folder/ is then part of the blob's name.
Also, I always use Storage Explorer; try this one: http://azurestorageexplorer.codeplex.com/releases/view/125870
