Copying multiple files from Azure Blob Storage

We are modelling a directory structure in Azure Blob storage. I would like to be able to copy all files from a folder to a local directory. Is there any way to copy multiple files from blob storage at once that match a pattern or do I have to get these files individually?

As you may already know, blob storage supports only a one-level hierarchy: you have blob containers (folders), and each container contains blobs (files). There's no concept of a folder hierarchy there. The illusion of a folder hierarchy is created with something called a blob prefix. For example, suppose the images blob container shows two folders, 16x16 and 24x24. In the cloud, the blob names include these folder names, so the name of the AddCertificate.png file in folder 24x24 is 24x24/AddCertificate.png.
Now, coming to your question: you would still need to download the files individually, but the storage client library does allow you to fetch a list of blobs by blob prefix. So if you want to download all files in folder 24x24 (in other words, all blobs with the prefix 24x24), you would first list the blobs with prefix 24x24 and then download each blob individually. On the local computer, you could create a folder named after the prefix.
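The filtering step can be sketched in Python, one of the languages used elsewhere on this page. This is a minimal illustration over an in-memory list of names, not a call to the storage service; with the current Python SDK (azure-storage-blob v12) the listing itself would be ContainerClient.list_blobs(name_starts_with="24x24/"), followed by one download per returned blob. blobs_in_folder is a hypothetical helper. It appends the "/" delimiter so that a sibling prefix such as 24x24-old is not matched by accident.

```python
def blobs_in_folder(blob_names, folder):
    """Return the blob names that live under a virtual folder.

    Blob storage is flat, so a "folder" is only a name prefix. Appending the
    "/" delimiter keeps "24x24" from also matching e.g. "24x24-old/a.png".
    """
    prefix = folder.rstrip("/") + "/"
    return [name for name in blob_names if name.startswith(prefix)]


names = [
    "16x16/AddCertificate.png",
    "24x24/AddCertificate.png",
    "24x24/Delete.png",
    "24x24-old/AddCertificate.png",
]
print(blobs_in_folder(names, "24x24"))  # ['24x24/AddCertificate.png', '24x24/Delete.png']
```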

You can use the code below as a sample reference (it's written in JavaScript, but you can easily map the logic to any language). This code is maintained by Microsoft.
https://github.com/WindowsAzure/azure-sdk-tools-xplat/blob/master/lib/commands/storage.blob._js#L144
https://github.com/WindowsAzure/azure-sdk-tools-xplat/blob/master/lib/commands/storage.blob._js#L689
The second link shows how to parse blob prefixes to get the folder hierarchy out of them.
It also shows how to download a blob using multiple threads and verify the blob's integrity using MD5.
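For the integrity check, a blob's Content-MD5 property holds the base64 encoding of the 16-byte MD5 digest of its content. The Python sketch below assumes you have already read that property and the downloaded bytes, and only shows the comparison; content_md5 and verify_download are hypothetical helpers. The property can be empty when the uploader never set it, so the sketch refuses to "verify" against nothing.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the base64-encoded MD5 digest, the format of the Content-MD5 property."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def verify_download(data: bytes, expected_content_md5: str) -> bool:
    """Compare downloaded bytes against the blob's stored Content-MD5 value."""
    if not expected_content_md5:
        # No stored hash (the uploader never set one): nothing to compare against.
        raise ValueError("blob has no Content-MD5 to verify against")
    return content_md5(data) == expected_content_md5


print(content_md5(b"abc"))  # kAFQmDzST7DWlj99KOF/cg==
```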
Below is just the high-level code that processes a blob path containing prefixes. Please refer to the links above for the full implementation; it is too long to copy and paste here.
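// Fragment only: specifiedBlobName, dirName, StorageUtil, utils and path
// are defined elsewhere in the linked source file.
if (!fileName) {
  // Split a name such as "24x24/AddCertificate.png" into folder part and file name
  var structure = StorageUtil.getStructureFromBlobName(specifiedBlobName);
  fileName = structure.fileName;
  fileName = utils.escapeFilePath(fileName);
  // Create the local folders implied by the blob prefix
  structure.dirName = StorageUtil.recursiveMkdir(dirName, structure.dirName);
  fileName = path.join(structure.dirName, fileName);
  dirName = '.'; // fileName already contains the directory name
}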
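A rough Python counterpart of that fragment, assuming blob names always use "/" as the separator: it splits the name into folder part and file name, creates the local folders, and returns the path to write to. local_path_for_blob is a hypothetical name. Unlike utils.escapeFilePath above, it does not escape characters that are illegal in local file names, and it rejects "." and ".." segments so a blob name cannot point outside the destination folder.

```python
import os
import posixpath
import tempfile


def local_path_for_blob(blob_name: str, dest_root: str) -> str:
    """Map a blob name such as '24x24/AddCertificate.png' to a local file path
    under dest_root, creating the folders implied by the blob prefix."""
    dir_part, file_name = posixpath.split(blob_name)  # blob names use '/'
    folders = [p for p in dir_part.split("/") if p]
    # Refuse names that would escape dest_root or have no file part.
    if file_name in ("", ".", "..") or any(p in (".", "..") for p in folders):
        raise ValueError(f"unsafe blob name: {blob_name!r}")
    local_dir = os.path.join(dest_root, *folders)
    os.makedirs(local_dir, exist_ok=True)  # plays the role of recursiveMkdir
    return os.path.join(local_dir, file_name)


with tempfile.TemporaryDirectory() as root:
    path = local_path_for_blob("24x24/AddCertificate.png", root)
    print(os.path.relpath(path, root))  # 24x24/AddCertificate.png (on POSIX)
```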

Related

Azure Blob Using Python

I am accessing a website that allows me to download a CSV file. I would like to store the CSV file directly in a blob container. I know that one way is to download the file locally and then upload it, but I would like to skip the local download step. Is there a way to achieve this?
I tried the following:
block_blob_service.create_blob_from_path('containername','blobname','https://*****.blob.core.windows.net/containername/FlightStats',content_settings=ContentSettings(content_type='application/CSV'))
but I keep getting errors stating that the path is not found.
Any help is appreciated. Thanks!
The file_path parameter of create_blob_from_path is the path of a local file, something like "C:\xxx\xxx". The path you passed ('https://*****.blob.core.windows.net/containername/FlightStats') is a blob URL.
You could download your file to a byte array or stream, then use the create_blob_from_bytes or create_blob_from_stream method.
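A minimal sketch of that approach using only the Python standard library, assuming the CSV can be fetched with a plain GET (no login or cookies). The upload argument is a stand-in: with the legacy SDK from the question you would pass block_blob_service.create_blob_from_bytes, whose first three positional parameters are the container name, the blob name and the bytes. copy_url_to_blob is a hypothetical helper, and the whole file is held in memory, so this suits files of moderate size.

```python
import urllib.request


def copy_url_to_blob(url, container_name, blob_name, upload):
    """Download `url` into memory and hand the bytes straight to `upload`.

    `upload` is any callable shaped like upload(container, blob, data), for
    example the legacy block_blob_service.create_blob_from_bytes.
    Nothing is written to the local disk. Returns the number of bytes copied.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()  # the whole file, held in memory as bytes
    upload(container_name, blob_name, data)
    return len(data)
```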
The other answer uses the so-called "Azure SDK for Python legacy".
If it's a new implementation, I recommend using a Gen2 storage account (instead of Gen1 or Blob storage).
For a Gen2 storage account, see this example:
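from azure.storage.filedatalake import DataLakeFileClient
data = b"abc"
file = DataLakeFileClient.from_connection_string("my_connection_string",
    file_system_name="myfilesystem", file_path="myfile")
file.create_file()  # the file must exist before data can be appended to it
file.append_data(data, offset=0, length=len(data))  # stage the bytes at offset 0
file.flush_data(len(data))  # commit; the argument is the total length written so far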
It's a bit painful: if you append multiple times, you have to keep track of the offset on the client side.
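If you do append in several chunks, one way to keep that bookkeeping in one place is a small wrapper that remembers the next offset. This is a sketch, not part of the SDK: it assumes a client object exposing append_data(data, offset, length) and flush_data(offset), which is how DataLakeFileClient is called above, and AppendingWriter is a hypothetical name. When appending to a file that already has content, start_offset must be its current size.

```python
class AppendingWriter:
    """Track the running offset so repeated appends to a Data Lake file line up.

    `file_client` is assumed to expose append_data(data, offset, length) and
    flush_data(offset), as azure.storage.filedatalake.DataLakeFileClient does.
    """

    def __init__(self, file_client, start_offset=0):
        self._client = file_client
        self.offset = start_offset  # total bytes appended so far

    def append(self, data: bytes) -> None:
        # Each chunk is staged right after the previous one.
        self._client.append_data(data, offset=self.offset, length=len(data))
        self.offset += len(data)

    def flush(self) -> None:
        # Commit everything staged so far; the position is the total length.
        self._client.flush_data(self.offset)
```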

Storing blob in folder in blob container

I'm rewriting a backend call in Node.js/Express that downloads a file from a large folder in Azure Storage, resizes the image, and uploads it to another folder in the same blob container. Downloading the pictures is pretty straightforward with request, but when I try to upload, I can't find an option to upload into a folder, only into the root of the container.
I've tried to add the folder to the URL manually; however, the library won't let me change it, since it's a constant. Adding a '/' to the file name makes the library replace the '/' with '%2F' (the same happens with the container name).
I'm using the library @azure/storage-blob
const containerName = 'test/Small';
// upload file
BlockBlobURL.fromContainerURL(
ContainerURL.fromServiceURL(serviceURL, containerName),
(req.body.NewFileName != null ? req.body.NewFileName : req.body.FileName) + '.png'
);
The URL ends up being
https://*****.blob.core.windows.net/test%2FSmall/test.png
Instead of
https://*****.blob.core.windows.net/test/Small/test.png
Actually, there aren't folders in Azure Blob Storage; all blobs are stored in a flat hierarchy under a container. Therefore, use test as the container name and specify Small/test.png directly as the name of the uploaded blob.
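To illustrate why the folder belongs in the blob name rather than the container name, here is a small Python sketch (not the JavaScript library) that splits a "container/folder" string the way the answer suggests and builds the blob URL while leaving "/" unescaped. split_container_path and blob_url are hypothetical helpers, and myaccount is a placeholder account name.

```python
from urllib.parse import quote


def split_container_path(path: str, file_name: str):
    """Turn 'test/Small' + 'test.png' into ('test', 'Small/test.png').

    Only the first segment is the container; the rest belongs in the blob name.
    """
    container, _, folder = path.strip("/").partition("/")
    blob_name = f"{folder}/{file_name}" if folder else file_name
    return container, blob_name


def blob_url(account: str, container: str, blob_name: str) -> str:
    # safe="/" keeps the virtual-folder separators instead of encoding them as %2F
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"


container, blob = split_container_path("test/Small", "test.png")
print(container, blob)                          # test Small/test.png
print(blob_url("myaccount", container, blob))   # https://myaccount.blob.core.windows.net/test/Small/test.png
```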
The @azure/storage-blob library replaces the / with %2F if I use the .fromServiceURL() and .fromContainerURL() functions, so that doesn't work; I've tried it.
I ended up creating ContainerURL and BlockBlobURL with their constructors instead of those factory functions, which bypasses the encoding. That seemed to fix it.

How to save a file to a subfolder in an Azure blob container?

I'm trying to save an image to our Azure Blob storage. Thanks to a helpful answer to another question, I was able to save the file to our top-level container. Unfortunately, I would like the file to be saved to a subdirectory in that container, and for the life of me, I cannot get it to work.
Currently, the image is saved to our "images" container. Within that container is a folder, "members", and I would like the files to be saved to that subdirectory, i.e. "images/members". I tried passing "images/members" to GetBlockBlobReference, but then the file just didn't save at all (or at least I can't find it).
This seems like it should be pretty simple. Thanks in advance.
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("images");
CloudBlockBlob blockBlob = container.GetBlockBlobReference(filename);
blockBlob.UploadFromStream(stream);
(Screenshots omitted: the top-level container, where the image named with a GUID is the one I uploaded, and the "members" directory sorted by most recent, where nothing recent appears.)
"images" is the name of your container.
What you need to do is change this line from
CloudBlockBlob blockBlob = container.GetBlockBlobReference(filename);
to
CloudBlockBlob blockBlob = container.GetBlockBlobReference("members/" + filename);
Then you can use the Azure Storage Explorer to view your files and folders:
https://azure.microsoft.com/en-us/features/storage-explorer/
Using a slash (/) in the file name creates a 'folder'. I put 'folder' in quotes because it's a virtual folder, a trick that gives us humans the idea of folders. There's actually only one level of grouping, which is the container.
Each slash (/) in a file name stands for one 'folder', so creating a blob named firstFolder/secondFolder/filename.txt creates a blob with exactly that name, which looks like a file at the path firstFolder -> secondFolder. You can ask a container to ListBlobs with useFlatBlobListing set to true, which returns all blobs in that container, i.e. all blobs in all folders.
You can also ask for all blobs in a virtual folder by getting a CloudBlobDirectory from CloudBlobContainer.GetDirectoryReference and listing the blobs under it.
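A delimiter-based listing can be simulated in Python over a plain list of names to show what the service does: it returns the blobs directly under a prefix plus the "sub-folder" prefixes one level down (the BlobPrefix entries of the List Blobs REST operation). list_one_level is a hypothetical helper; in the Python v12 SDK the corresponding call would be ContainerClient.walk_blobs(name_starts_with=..., delimiter="/"), while a flat listing is simply every name that starts with the prefix.

```python
def list_one_level(blob_names, prefix="", delimiter="/"):
    """Return (blobs, folders) directly under `prefix`, like a delimiter listing.

    blobs:   names with no further delimiter after the prefix
    folders: distinct virtual sub-folder prefixes one level down
    """
    blobs, folders = [], []
    for name in sorted(blob_names):
        if not name.startswith(prefix):
            continue
        head, sep, _ = name[len(prefix):].partition(delimiter)
        if sep:
            folder = prefix + head + delimiter
            if folder not in folders:
                folders.append(folder)
        else:
            blobs.append(name)
    return blobs, folders


names = ["firstFolder/secondFolder/filename.txt", "firstFolder/a.txt", "root.txt"]
print(list_one_level(names))                  # (['root.txt'], ['firstFolder/'])
print(list_one_level(names, "firstFolder/"))  # (['firstFolder/a.txt'], ['firstFolder/secondFolder/'])
```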
More info here: Work with Blob resources
You can follow this snippet to do the same in Python with the legacy Azure Storage SDK:
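from azure.storage.blob import BlockBlobService  # legacy SDK (azure-storage-blob 2.x)
blob_service = BlockBlobService(account_name="<account>", account_key="<key>")
container_name = "<container>"
# detected_anomaly is a pandas DataFrame; to_csv() with no path returns a string
output = detected_anomaly.to_csv(encoding="utf-8", index=False)
blob_path = 'blobfolder1/blobfolder2/'
blob_path_anomalies = blob_path + '<created_blob_folder_name>/demo.csv'  # each '/' adds a virtual folder
blob_service.create_blob_from_text(container_name, blob_path_anomalies, output)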

IFileProvider Azure File storage

I am thinking about implementing the IFileProvider interface on top of Azure File Storage.
What I'm trying to find in the docs is whether there is a way to send the whole path of a file to the Azure API, like rootDirectory/sub1/sub2/example.file, or whether that should be mapped to a recursive function that takes the path and traverses the directory structure on the file share.
I just want to make sure I'm not missing something and reinventing the wheel for something that already exists.
[UPDATE]
I'm using Azure Storage Client for .NET. I would not like to mount anything.
My intention is to have several IFileProviders that I could switch between based on the environment and other conditions.
So, for example, if my environment is Cloud, I would use an IFileProvider implementation that uses Azure File Services through the Azure Storage Client. If the environment is MyServer, I would use the server's local file system. A third option would be an environment someOther with its own implementation.
Now, for all of them, IFileProvider operates with path like root/sub1/sub2/sub3. For Azure File Storage, is there a way to send the whole path at once to get sub3 info/content or should the path be broken into individual directories and get reference/content for each step?
I hope that clarifies the question.
To access a specific subdirectory several levels deep, you can pass the whole path to the GetDirectoryReference method when constructing the CloudFileDirectory:
var fileshare = storageAccount.CreateCloudFileClient().GetShareReference("myshare");
var rootDir = fileshare.GetRootDirectoryReference();
var dir = rootDir.GetDirectoryReference("2017-10-24/15/52");
var items = dir.ListFilesAndDirectories();
To access a specific file inside the subdirectory, you can pass the full path to the GetFileReference method, which returns the CloudFile instance:
var file = rootDir.GetFileReference("2017-10-24/15/52/2017-10-13-2.png");
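The path handed to an IFileProvider can carry a leading slash or, on Windows, backslashes. A small normalization step before passing the full path to GetDirectoryReference or GetFileReference keeps things consistent. This is a Python sketch of that step only (the calls above are .NET); to_share_path is a hypothetical helper, and stripping the leading slash is a precaution, not something the answer says the API requires.

```python
import posixpath


def to_share_path(subpath: str) -> str:
    """Turn an IFileProvider-style subpath ('/root/sub1/sub2/example.file')
    into a path relative to the share root ('root/sub1/sub2/example.file')."""
    unified = subpath.replace("\\", "/")
    # Anchoring at "/" before normpath means ".." can never climb above the share root.
    normalized = posixpath.normpath("/" + unified)
    return normalized.lstrip("/")


print(to_share_path("/root/sub1/sub2/example.file"))  # root/sub1/sub2/example.file
print(to_share_path("2017-10-24\\15\\52"))            # 2017-10-24/15/52
```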

How to create a sub container in azure storage location

How can I create a sub container in the azure storage location?
Windows Azure doesn't provide the concept of hierarchical containers, but it does provide a mechanism to traverse a hierarchy by convention and API. All containers are stored at the same level. You can get similar functionality by using naming conventions for your blob names.
For instance, you may create a container named "content" and create blobs with the following names in that container:
content/blue/images/logo.jpg
content/blue/images/icon-start.jpg
content/blue/images/icon-stop.jpg
content/red/images/logo.jpg
content/red/images/icon-start.jpg
content/red/images/icon-stop.jpg
Note that these blobs are a flat list in your "content" container. That said, using "/" as a conventional delimiter lets you traverse them in a hierarchical fashion.
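To make "traverse these in a hierarchical fashion" concrete, this Python sketch folds the flat list of names above into nested dictionaries by splitting on "/". build_tree is a hypothetical helper; it assumes no blob name is also used as a folder prefix (for example a blob named content/blue alongside content/blue/images/logo.jpg), which blob storage itself does allow.

```python
def build_tree(blob_names, delimiter="/"):
    """Fold flat blob names into nested dicts: folders map to dicts, files to None."""
    tree = {}
    for name in blob_names:
        *folders, leaf = name.split(delimiter)
        node = tree
        for folder in folders:
            node = node.setdefault(folder, {})
        node[leaf] = None
    return tree


names = [
    "content/blue/images/logo.jpg",
    "content/blue/images/icon-start.jpg",
    "content/blue/images/icon-stop.jpg",
    "content/red/images/logo.jpg",
    "content/red/images/icon-start.jpg",
    "content/red/images/icon-stop.jpg",
]
tree = build_tree(names)
print(sorted(tree["content"]))                   # ['blue', 'red']
print(sorted(tree["content"]["red"]["images"]))  # ['icon-start.jpg', 'icon-stop.jpg', 'logo.jpg']
```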
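protected IEnumerable<IListBlobItem> GetDirectoryList(string directoryName, string subDirectoryName)
{
    // Legacy storage client library: reads the "DataConnectionString" setting from configuration
    CloudStorageAccount account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    CloudBlobClient client = account.CreateCloudBlobClient();
    // directoryName is a "/"-delimited path such as "content/blue"
    CloudBlobDirectory directory = client.GetBlobDirectoryReference(directoryName);
    CloudBlobDirectory subDirectory = directory.GetSubdirectory(subDirectoryName);
    // Returns the items stored under the combined prefix, e.g. content/blue/images/
    return subDirectory.ListBlobs();
}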
You can then call this as follows:
GetDirectoryList("content/blue", "images")
Note the use of the GetBlobDirectoryReference and GetSubdirectory methods and the CloudBlobDirectory type instead of CloudBlobContainer. These provide the traversal functionality you are likely looking for.
This should help you get started. Let me know if this doesn't answer your question.
[ Thanks to Neil Mackenzie for inspiration ]
Are you referring to blob storage? If so, the hierarchy is simply StorageAccount/Container/BlobName. There are no nested containers.
Having said that, you can use slashes in your blob name to simulate nested containers in the URI. See this article on MSDN for naming details.
I agree with tobint's answer, and I want to add something because I was in the same situation:
I needed to upload my HTML games to Azure Storage with this directory structure:
Games\Beautyshop\index.html
Games\Beautyshop\assets\apple.png
Games\Beautyshop\assets\aromas.png
Games\Beautyshop\customfont.css
Games\Beautyshop\jquery.js
Following your recommendations, I tried to upload my content with Azure Storage Explorer; you can download the tool and its source code from this URL: Azure Storage Explorer
First, I tried to upload with the tool itself, but it doesn't support uploading a directory hierarchy, because you don't need to create one (see How to create sub directory in a blob container).
Finally, I debugged the Azure Storage Explorer source code and edited the Background_UploadBlobs method and the UploadFileList field in StorageAccountViewModel.cs. You can adapt them to your needs; that's only my recommendation.
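Modifying Storage Explorer is one option; another is to script the upload. The key step is turning each local path into a blob name with "/" separators, sketched here in Python. blob_names_for_upload is a hypothetical helper; each (local_path, blob_name) pair could then be passed to an upload call such as the legacy SDK's create_blob_from_path(container_name, blob_name, file_path).

```python
import os
import tempfile


def blob_names_for_upload(local_root: str, blob_prefix: str = ""):
    """Yield (local_path, blob_name) for every file under local_root.

    Local separators (backslashes on Windows) become '/', so the directory
    structure reappears as virtual folders in the container.
    """
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for file_name in sorted(filenames):
            local_path = os.path.join(dirpath, file_name)
            relative = os.path.relpath(local_path, local_root)
            yield local_path, blob_prefix + relative.replace(os.sep, "/")


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "assets"))
    for rel in ("index.html", os.path.join("assets", "apple.png")):
        open(os.path.join(root, rel), "w").close()
    for _, name in blob_names_for_upload(root, "Games/Beautyshop/"):
        print(name)
```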
If you are trying to upload files from the Azure portal:
To create a subfolder in a container, go to Advanced options while uploading a file and choose to upload to a folder; this creates a new folder in the container and uploads the file into it.
Kotlin Code
val blobClient = blobContainerClient.getBlobClient("$subDirNameTimeStamp/$fileName$extension")
This creates a directory named after the timestamp, with your blob file inside it. Note the slash (/) in the code above: it nests the blob file in a folder named after the text before the slash.
It will look like this on portal
Sample code
string myfolder = "<folderName>";
string myfilename = "<fileName>";
string fileName = String.Format("{0}/{1}.csv", myfolder, myfilename);
CloudBlockBlob blob = container.GetBlockBlobReference(fileName);

Resources