I have some TFRecords files stored on Azure Blob storage. I wonder is there some way to read them directly using tensorflow's Dataset API.
The way I currently use is to download those tfreocrds to local file system and then read from via Dataset API. Is there a more efficient way?
The way to go here is to construct SAS tokens for your files and use the resulting urls to download the files in tensorflow. You can generate the tokens via Azure Portal if you only have a few files, or use something like the Azure CLI.
Related
I have a folder called data with a bunch of csvs (about 80), file sizes are fairly small. This data is clean and has already been preprocessed. I want to upload this data folder and register as a datastore in azureml. Which would be best for this scenario data store created with file share or a data store created with blob storage?
AFAIK, based on your scenario you can make use of Azure File Share to create data store.
Please note that, Azure Blob storage is suitable for uploading large amount of unstructured data whereas Azure File Share is suitable for uploading and processing the structured files in chunks (more interaction with app to share files).
I have a folder called data with a bunch of csvs (about 80), file sizes are fairly small. This data is clean and has already been preprocessed.
As you mentioned CSV data is clean and preprocessed it comes under structured data. So, you can make you of Azure File Share to create data store.
To register a data store with Azure File Share you can make use of this MsDoc
To know more about Azure File Share and Azure Blob storage, please find below links:
Azure Blob Storage or Azure File Storage by Mike
azureml.data.azure_storage_datastore.AzureFileDatastore class - Azure Machine Learning Python | Microsoft Docs
Recently I have been working on adding documents to Azure storage using blob and file share. But then I realized that in file share using rest API I can upload in two steps
Creating a file
Adding content
I am able to doing that but my requirement here is to upload the .pdf, .docx document at once
and then there should be a way to download them as well.
Could some one please help.
Thanks
Unfortunately, there's no batch download capability available in Azure Blob Storage. You will need to download each blob individually. What you could do is download blobs in parallel to speed things up.
There is an alternative way you can approach using C# or PowerShell.
Would recommend you to please go through this MS document :
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-scalable-app-download-files?tabs=dotnet
And this one also
https://azurelessons.com/upload-and-download-file-in-azure-blob-storage/
Reference: How to download multiple files in a single request from Azure Blob Storage using c#?
I want to automate the transfer of files from a website not hosted in Azure to my client’s premises.
I am considering having an API on the website send the files to Azure Blob Storage , and then having another API running at the client site, download them.
Both would make use of the Azure storage API, which I like because it is easy to implement.
The files do not need to stay in Azure and can be deleted from storage once they are downloaded.
However I am wondering if there is a faster way.
Should I be using Hot Blob Storage or File Storage perhaps?
I looked at https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers but am still unclear as to the fastest method for my use case.
I suggest you can use File share, which can be mapped to local as a mapped drive and can be easily and faster operation like read / delete.
If you choose code only, from the comparison of blob and file, they can be up to Up to 60 MiB/s, I cannot see which is faster. There is a Azure Storage Data Movement Library , which is designed for high-performance uploading, downloading and copying Azure Storage Blob and File, you can use it for your purpose.
I would recommend blob storage for this application. Logic apps can also be used to automate this pipeline based on timer triggers or some other trigger.
I have a problem that I have been wracking my brain about and figured I would need some perspective and insight from people who are a lot more knowledgeable about this.
What I have currently: Web based application hosted in azure uses azure blob store to store files that are generated as part of data import processes. We have a seperate application that extends the original web application that allows users to upload files and these files are currently also stored in azure blob store.
Where I am trying to go: I have a requirement that wants the ability to map network file shares on a users laptop and be able to access these files that currently reside in the blob.
Since Azure blob does not support SMB I have no way of actually doing this with a blob store.
I could use Azure files in conjunction with a File Server running the sync agent. However, this requires a lot of work both in terms of refactoring, setup and some custom service that add remove permissions on the file server.
I'm wondering if there is a service or a piece of software that exists in the market currently that allows me to continue using blob and perhaps sync the blob files into a file server that can then allow users to access and open files using windows file explorer? I found one that looks like an open source project but only does a one way sync from the blob to the file share. Ideally I'd like to find a solution that does a two way sync like azure file sync does.
Any thoughts and ideas will be appreciated.
Since the max number of blob containers, file shares is unlimited. Per my understanding, you could leverage the following approaches:
Migrate the data from blob storage to azure file share instead of blob storage, then the subsequent file store is azure file storage.
Note: Currently you must specify storage account key when mounting file shares, details you could follow this feedback. I recommend that you'd better do not map network file shares on a users laptop.
You could still use the blob storage, and you could create each blob container for each user and generate each blob container SAS token for your users, then the users could leverage Azure Storage Explorer to manage their blob files or use AzCopy and other command tools to download the blob files into their laptop file system.
Note: For security consideration, you could combine a stored access policy with a SAS, in order to revoke the permissions, you just need to invalidate the related access policy instead of regenerating the account key. Details you could follow Controlling a SAS with a stored access policy and Shared Access Signatures, Part 2: Create and use a SAS with Blob storage.
I am in the middle of developing a cloud server and I need to store HDF files ( http://www.hdfgroup.org/HDF5/ ) using blob storage.
Functions related to creating, reading writing and modifying data elements within the file come from HDF APIs.
I need to get the file path to create the file or read or write it.
Can anyone please tell me how to create a custom file on Azure Blob ?
I need to be able to use the API like shown below, but passing the Azure storage path to the file.
http://davis.lbl.gov/Manuals/HDF5-1.4.3/Tutor/examples/C/h5_crtfile.c
These files i am trying to create can get really huge ~10-20GB, So downloading them locally and modifying them is not an option for me.
Thanks
Shashi
One possible approach, admittedly fraught with challenges, would be to create the file in a temporary location using the code you included, and then use the Azure API to upload the file to Azure as a file input stream. I am in the process of researching how size restrictions are handled in Azure storage, so I can't say whether an entire 10-20GB file could be moved in a single upload operation, but since the Azure API reads from an input stream, you should be able to create a combination of operations that would result in the information you need residing in Azure storage.
Can anyone please tell me how to create a custom file on Azure Blob ?
I need to be able to use the API like shown below, but passing the
Azure storage path to the file.
http://davis.lbl.gov/Manuals/HDF5-1.4.3/Tutor/examples/C/h5_crtfile.c
Windows Azure Blob storage is a service for storing large amounts of unstructured data that can be accessed via HTTP or HTTPS. So from application point of view Azure Blob does not work as regular disk.
Microsoft provides quite good API (c#, Java) to work with the blob storage. They also provide Blob Service REST API to access blobs from any other language (where specific blob storage API is not provided like C++).
A single block blob can be up to 200GB so it should easily store files of ~10-20GB size.
I am afraid that the provided example will not work with Windows Azure Blob. However, I do not know HDF file storage; maybe they provide some Azure Blob storage support.